Add a new model for recording basic information about Realms on remote
servers, to go with the other analytics data. Also add the necessary
changes to the bouncer endpoint and the send_analytics_to_push_bouncer()
function to submit such Realm information.
This calculates the largest number of messages sent within a month over
the last 3 months. The query is targeted at the specific use case in
this function - finding the count for a single server. Calculating this
in bulk for a large number of remote servers would need an adapted bulk
query, rather than running this one in a loop, which would likely be
very inefficient.
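A minimal sketch of such a single-server query, assuming a daily analytics
count table along the lines of zilencer's RemoteInstallationCount (the
import path, field names, and the property string below are illustrative
rather than the actual implementation):

    from datetime import timedelta

    from django.db.models import Sum
    from django.utils import timezone

    from zilencer.models import RemoteInstallationCount  # assumed import path


    def max_messages_sent_in_a_month(server_id: int) -> int:
        now = timezone.now()
        monthly_totals = []
        for months_ago in range(3):
            window_end = now - timedelta(days=30 * months_ago)
            window_start = window_end - timedelta(days=30)
            total = (
                RemoteInstallationCount.objects.filter(
                    server_id=server_id,
                    property="messages_sent:message_type:day",  # assumed property name
                    end_time__gt=window_start,
                    end_time__lte=window_end,
                ).aggregate(total=Sum("value"))["total"]
                or 0
            )
            monthly_totals.append(total)
        return max(monthly_totals)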
This parameter appeared here on the function definition,
but because it lacked a `REQ` call it didn't actually connect
to any parameter passed in the HTTP request.
It doesn't make any sense on this endpoint anyway -- presumably
it was copy-pasted from its "register" counterpart -- so just cut it.
We'll need this information in order to properly direct APNs
notifications. Happily, the Zulip server always sends it when
registering an APNs token; and it appears it always has done so
since the commit:
cddee49e7 Add support infrastructure for push notification bouncer service.
back in 2016. So there's no compatibility issue from requiring it.
This missing `REQ` call has meant we just drop this parameter:
even though the remote Zulip server passes it (for all APNs tokens),
we never notice and never store it. Fix that.
So that all child classes of BillingSession generate the same data
structure for customers that are created in Stripe, revise
`get_data_for_stripe_customer` to return a specific dataclass:
StripeCustomerData.
So that `update_or_create_stripe_customer` can work for Customer
objects with either a realm or remote_server, we create an abstract
base class, BillingSession, and implement a child class for the
current implementation of Customer objects with a realm.
Refactoring `update_or_create_stripe_customer` also moves
`create_stripe_customer` and `replace_payment_method` to the
BillingSession class.
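A rough sketch of the resulting shape; the concrete fields of
StripeCustomerData and the name of the realm-backed child class are
assumptions for illustration:

    from abc import ABC, abstractmethod
    from dataclasses import dataclass
    from typing import Any, Dict


    @dataclass
    class StripeCustomerData:
        description: str
        email: str
        metadata: Dict[str, Any]


    class BillingSession(ABC):
        @abstractmethod
        def get_data_for_stripe_customer(self) -> StripeCustomerData:
            ...


    class RealmBillingSession(BillingSession):  # child class name assumed
        def __init__(self, user, realm) -> None:
            self.user = user
            self.realm = realm

        def get_data_for_stripe_customer(self) -> StripeCustomerData:
            return StripeCustomerData(
                description=f"{self.realm.string_id} ({self.realm.name})",
                email=self.user.delivery_email,
                metadata={"realm_id": self.realm.id, "realm_str": self.realm.string_id},
            )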
Earlier, the 'automatically_follow_topics_policy' and
'automatically_unmute_topics_in_muted_streams_policy' were set
to 'NEVER' for both the test and dev databases.
This commit fixes that incorrect behavior.
Now, we set them to 'NEVER' only for the test database; the dev database
should have the actual default values.
We explicitly set them to NEVER for the test database because other
values would skew other tests by generating extra events and db queries.
We have separate tests with different values to check the intended
behavior of these settings.
The bug was introduced in 58568a6.
Add an optional `automatic_new_visibility_policy` enum field
in the success response to indicate the new visibility policy
value due to the `automatically_follow_topics_policy` and
`automatically_unmute_topics_in_muted_streams_policy` user settings
during the send message action.
Only present if there is a change in the visibility policy.
This commit adds two user settings, named
* `automatically_follow_topics_policy`
* `automatically_unmute_topics_in_muted_streams_policy`
The settings control which topics the user will automatically
'follow' or 'unmute in muted streams'.
The policies offer four options:
1. Topics I participate in
2. Topics I send a message to
3. Topics I start
4. Never (default)
There is no support for configuring the settings through the UI yet.
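For illustration, the four options might be represented as integer
constants along these lines (the actual constant names and values in the
codebase may differ):

    # Illustrative only; "Never" (4 here) is the default for both settings.
    AUTOMATICALLY_CHANGE_VISIBILITY_POLICY_ON_PARTICIPATION = 1
    AUTOMATICALLY_CHANGE_VISIBILITY_POLICY_ON_SEND = 2
    AUTOMATICALLY_CHANGE_VISIBILITY_POLICY_ON_INITIATION = 3
    AUTOMATICALLY_CHANGE_VISIBILITY_POLICY_NEVER = 4

    AUTOMATICALLY_CHANGE_VISIBILITY_POLICY_CHOICES = [
        AUTOMATICALLY_CHANGE_VISIBILITY_POLICY_ON_PARTICIPATION,
        AUTOMATICALLY_CHANGE_VISIBILITY_POLICY_ON_SEND,
        AUTOMATICALLY_CHANGE_VISIBILITY_POLICY_ON_INITIATION,
        AUTOMATICALLY_CHANGE_VISIBILITY_POLICY_NEVER,
    ]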
There is no reason to have an index on just `realm_id` or `remote_id`,
as those values mean nothing outside of the scope of a specific
`server_id`. Remove those never-used single-column indexes from the
two tables that have them.
By contrast, the pair of `server_id` and `remote_id` is quite useful
and specific -- it is a unique pair, and every POST of statistics from
a remote host requires looking up the highest `remote_id` for a given
`server_id`, which (without this index) is otherwise a quite large
scan.
Add a unique constraint, which (in PostgreSQL) is implemented as a
unique index.
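A simplified sketch of the model-level change, assuming one of the
affected tables looks roughly like this (field list trimmed; the
constraint name is illustrative):

    from django.db import models


    class RemoteInstallationCount(models.Model):
        server = models.ForeignKey("RemoteZulipServer", on_delete=models.CASCADE)
        remote_id = models.IntegerField()  # no longer indexed on its own

        class Meta:
            constraints = [
                models.UniqueConstraint(
                    fields=["server", "remote_id"],
                    name="unique_remote_installation_count_server_remote_id",
                )
            ]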
_default_manager is the same as objects on most of our models. But
when a model class is stored in a variable, the type system doesn’t
know which model the variable is referring to, so it can’t know that
objects even exists (Django doesn’t add it if the user added a custom
manager of a different name). django-stubs used to incorrectly assume
it exists unconditionally, but it no longer does.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
This migration applies under the assumption that extra_data_json has
been populated for all existing and coming audit log entries.
- This removes the manual conversions back and forth for extra_data
throughout the codebase including the orjson.loads(), orjson.dumps(),
and str() calls.
- The custom handler used for converting Decimal is removed since
DjangoJSONEncoder handles that for extra_data.
- We remove None-checks for extra_data because it is now no longer
nullable.
- Meanwhile, we want the bouncer to support processing RealmAuditLog entries for
remote servers before and after the JSONField migration on extra_data.
- Since extra_data should now always be a dict for newer, migrated remote
servers, the test cases are updated to create RealmAuditLog objects by
passing a dict for extra_data before sending over the analytics data.
Note that while JSONField allows non-dict values, a proper remote server
always passes a dict for extra_data.
- We still test out the legacy extra_data format because not all
remote servers have migrated to use JSONField extra_data.
This verifies that support for extra_data being a string or None has not
been dropped.
Co-authored-by: Siddharth Asthana <siddharthasthana31@gmail.com>
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
The curl examples of reordering linkifiers require there to be some
linkifiers in the database to be reordered. This adjusts some test cases
so they do not assume that there is no linkifier in the test db.
This commit updates the select_related calls in queries that
get UserProfile objects in the populate_db code to pass "realm"
as an argument to the select_related call.
Also, note that "realm" is the only non-null foreign key field
in the UserProfile object, so select_related() was only fetching the
realm object previously as well. But we should still pass "realm"
as an argument in the select_related call so that we can make sure that
only the required fields are selected in case we add more foreign
keys to UserProfile in the future.
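A before/after sketch of the change (the concrete querysets in
populate_db differ; this only illustrates the select_related argument):

    from zerver.models import UserProfile

    # Before: with no arguments, select_related() follows every non-null
    # foreign key on UserProfile - today that is only "realm".
    profiles = UserProfile.objects.select_related()

    # After: name the relation explicitly, so only realm is joined and
    # selected even if more non-null foreign keys are added later.
    profiles = UserProfile.objects.select_related("realm")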
Translators benefit from the extra information in the field names, and
need the reordering freedom that isn’t available with multiple
positional fields.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
The previous function was poorly named, asked for a
Realm object when realm_id sufficed, and returned a
tuple of strings that had different semantics.
I also avoid calling it twice in a couple of places,
although it was probably rarely the case that both
invocations actually happened if upstream validations
were working.
Note that there is a TypedDict called EmojiInfo, so I
chose EmojiData here. Perhaps a better name would be
TinyEmojiData or something.
I also simplify the reaction tests with a verify
helper.
This migration is reasonably complex because of various anomalies in existing
data.
Note that there are cases where extra_data is not proper JSON (for
example, it may use single quotes), so we need to use
"ast.literal_eval" to cover those.
There is also a special case for "event_type == USER_FULL_NAME_CHANGED",
where extra_data is a plain str. This event_type is only used for
RealmAuditLog, so the zilencer migration script does not need to handle
it.
The migration does not handle "event_type == REALM_DISCOUNT_CHANGED"
because ast.literal_eval only allows Python literals. We expect the admin
to manually populate the jsonified extra_data into extra_data_json
beforehand.
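A rough sketch of the JSON-then-literal_eval parsing fallback described
above (the helper name is illustrative):

    import ast
    from typing import Any, Dict

    import orjson


    def parse_legacy_extra_data(extra_data: str) -> Dict[str, Any]:
        try:
            return orjson.loads(extra_data)
        except orjson.JSONDecodeError:
            # Older entries can be Python reprs (single quotes), not valid JSON.
            return ast.literal_eval(extra_data)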
This chunks the backfilling migration to reduce potential block time.
The migration for zilencer is mostly similar to the one for zerver,
except that the backfill helper is added in a wrapper and unrelated
events are removed.
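The chunking might look roughly like the following in a RunPython helper
(the batch size and function name are assumptions; it reuses the
parse_legacy_extra_data helper sketched above):

    from django.db.models import Max

    BATCH_SIZE = 5000


    def backfill_extra_data(apps, schema_editor) -> None:
        RealmAuditLog = apps.get_model("zerver", "RealmAuditLog")
        max_id = RealmAuditLog.objects.aggregate(Max("id"))["id__max"] or 0
        lower_bound = 0
        while lower_bound < max_id:
            entries = list(
                RealmAuditLog.objects.filter(
                    id__gt=lower_bound,
                    id__lte=lower_bound + BATCH_SIZE,
                    extra_data__isnull=False,
                )
            )
            for entry in entries:
                entry.extra_data_json = parse_legacy_extra_data(entry.extra_data)
            RealmAuditLog.objects.bulk_update(entries, ["extra_data_json"])
            lower_bound += BATCH_SIZE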
**Logging and error recovery**
We print out a warning when the extra_data_json field of an entry
would have been overwritten by a value inconsistent with what we derived
from extra_data. Usually this only happens when the extra_data was
corrupted before this migration. This prevents data loss by backing up
possibly corrupted data in extra_data_json with the keys
"inconsistent_old_extra_data" and "inconsistent_old_extra_data_json".
More roundtrips to the database are needed for inconsistent data, which are
expected to be infrequent.
This also outputs messages when there are audit log entries with decimals,
indicating that such entries are not backfilled. Do note that audit log
entries with decimals are not populated with "inconsistent_old_extra_data_*"
in the JSONField, because they are not overwritten.
Audit log entries whose "extra_data_json" is already marked as
inconsistent are skipped by the migration: once anomalies have been
discovered in a previous run, there is no need to overwrite them again
and nest the extra keys we added.
**Testing**
We create a migration test case utilizing the property of bulk_create
that it doesn't call our modified save method.
We extend ZulipTestCase to support verifying console output at the test
case level. The implementation is crude but the use case should be rare
enough that we don't need it to be too elaborate.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
Note that we use the DjangoJSONEncoder so that we have built-in support
for serializing Decimal and datetime values.
During this intermediate state, the migration that creates
extra_data_json field has been run. We prepare for running the backfilling
migration that populates extra_data_json from extra_data.
This change implements double-write, which is important to keep the
state of extra_data consistent. For most extra_data usage, this is
handled by the overridden `save` method on `AbstractRealmAuditLog`, where
we generate extra_data_json using either orjson.loads or
ast.literal_eval.
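A simplified sketch of that double-write, assuming field definitions
along these lines (the real save method also handles the anomalies
described elsewhere in this series):

    import ast

    import orjson
    from django.core.serializers.json import DjangoJSONEncoder
    from django.db import models


    class AbstractRealmAuditLog(models.Model):
        extra_data = models.TextField(null=True)
        extra_data_json = models.JSONField(default=dict, encoder=DjangoJSONEncoder)

        class Meta:
            abstract = True

        def save(self, *args, **kwargs) -> None:
            if self.extra_data is not None:
                try:
                    self.extra_data_json = orjson.loads(self.extra_data)
                except orjson.JSONDecodeError:
                    self.extra_data_json = ast.literal_eval(self.extra_data)
            super().save(*args, **kwargs)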
While backfilling ensures that old realm audit log entries have
extra_data_json populated, double-write ensures that any newly generated
entries will also have extra_data_json set, so that we can then safely
rename extra_data_json to extra_data while preserving the non-nullable
invariant.
For completeness, we additionally set RealmAuditLog.NEW_VALUE for
the USER_FULL_NAME_CHANGED event. This cannot be handled with the
overridden `save`.
This addresses: https://github.com/zulip/zulip/pull/23116#discussion_r1040277795
Note that extra_data_json at this point is not used yet. So the test
cases do not need to switch to testing extra_data_json. This is later
done after we rename extra_data_json to extra_data.
Double-write for the remote server audit logs is special, because we only
get the dumped bytes from an external source. Luckily, for audit logs of
the event types in SYNC_BILLING_EVENTS, all of the extra_data in the
payload is generated using orjson.dumps. This can be verified
by looking at:
`git grep -A 6 -E "event_type=.*(USER_CREATED|USER_ACTIVATED|USER_DEACTIVATED|USER_REACTIVATED|USER_ROLE_CHANGED|REALM_DEACTIVATED|REALM_REACTIVATED)"`
Therefore, we just need to populate extra_data_json with an
orjson.loads call after a None-check.
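In other words, the bouncer-side handling can be as small as something
like this (the function name is illustrative):

    from typing import Any, Dict, Optional

    import orjson


    def build_extra_data_json(extra_data: Optional[str]) -> Dict[str, Any]:
        # SYNC_BILLING_EVENTS extra_data is always produced by orjson.dumps,
        # so a plain loads after a None-check is sufficient.
        if extra_data is None:
            return {}
        return orjson.loads(extra_data)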
Co-authored-by: Zixuan James Li <p359101898@gmail.com>
This adds support for accepting extra_data as a dict in remote
servers' RealmAuditLog entries, so that the bouncer is forward-compatible
with servers that have already migrated to use JSONField for
RealmAuditLog, just in case. This prepares us for migrating zilencer's
audit log models to use JSONField for extra_data.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
Part of splitting creating and editing scheduled messages.
Should be merged with final commit in series. Breaks tests.
Removes `scheduled_message_id` parameter from the create scheduled
message path.
Prep commit for splitting create/edit endpoint for scheduled
messages.
Because `test-api` runs the tests in alphabetical order based on
the `operationId`, we need two scheduled messages in the test database.
The first for the curl example delete (delete-scheduled-message) and
the second for the curl example update (update-scheduled-message).
Prep commit for adding the scheduled-message endpoints to the API
documentation.
Adds a scheduled message for Iago in the test database so that it
can be deleted in the delete cURL example in the api-test suite.
This implements the core of the rewrite, described in the linked issue,
of the backend data model for UserPresence to one that supports much
more efficient queries and is more correct around handling of multiple
clients. The main loss of functionality is that we no longer track
which Client sent presence data (so we will no longer be able to say
using UserPresence "the user was last online on their desktop 15
minutes ago, but was online with their phone 3 minutes ago"). If we
consider that information important for the occasional investigation
query, we can construct that answer via UserActivity already. It's
not worth making Presence much more expensive/complex to support it.
For slim_presence clients, this sends the same data format we sent
before, albeit with less complexity involved in constructing it. Note
that we at present will always send both last_active_time and
last_connected_time; we may revisit that in the future.
This commit doesn't include the finalizing migration, which drops the
UserPresenceOld table.
The way to deploy is to start the backfill migration with the server
down and then start the server *without* the user_presence queue worker,
to let the migration finish without having new data interfering with it.
Once the migration is done, the queue worker can be started, and the
presence data catches up to the current state as the worker processes
the queued-up events and updates the UserPresence table.
Co-authored-by: Mateusz Mandera <mateusz.mandera@zulip.com>
Servers that had upgraded from a Zulip server version that did not yet
support the user_uuid field to one that did could end up with some
mobile devices having two push notifications registrations, one with a
user_id and the other with a user_uuid.
Fix this issue by sending both user_id and user_uuid, and clearing
duplicate registrations.
This commit updates the pattern for dealing with tuples
returned by the delete() query.
The '(num_deleted, ignored) = ModelName.objects.filter().delete()'
pattern is preferred due to better readability.
We avoid the pattern '(num_deleted, _)' because Django uses _
for translation, which may lead to future bugs.
To avoid people calling "create_user_group" instead of
"check_add_user_group", we rename it to make its purpose clearer.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
Since this function creates a new user group in the database,
it is more appropriate to have it not as a generic "lib" function
but as an "action".
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
The Django convention is for __repr__ to include the type and __str__
to omit it. In fact its default __repr__ implementation for models
automatically adds a type prefix to __str__, which has resulted in the
type being duplicated:
>>> UserProfile.objects.first()
<UserProfile: <UserProfile: emailgateway@zulip.com <Realm: zulipinternal 1>>>
Signed-off-by: Anders Kaseorg <anders@zulip.com>
The post-delete signal on AlertWord clears the realm cache; when it is
called repeatedly, this results in re-fetching the realm object O(n)
times, where n scales with the number of users in the database.
Disconnect this cache-clearing signal before removing the AlertWord
entries, and reconnect it afterwards. This is not thread-safe, but
this section is single-threaded. It is also probably unnecessary to
re-connect the signal, as the rest of `./manage.py populate_db` does not
delete AlertWord objects, but cleanliness dictates doing the
re-connection.
This drops the time to repeatedly run:
python3 ./manage.py populate_db --num-messages=0 --extra-users=1000
...from 47 seconds to 36 seconds.
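The pattern described above might look roughly like this (the receiver
name is an assumption about how the cache-clearing handler is
registered):

    from django.db.models.signals import post_delete

    from zerver.models import AlertWord, flush_realm_alert_words  # receiver name assumed

    post_delete.disconnect(flush_realm_alert_words, sender=AlertWord)
    try:
        AlertWord.objects.all().delete()
    finally:
        # Probably unnecessary for populate_db, but reconnect for cleanliness.
        post_delete.connect(flush_realm_alert_words, sender=AlertWord)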
This commit updates the code to use the user-level email_address_visibility
setting instead of the realm-level setting to set or update the value of
the UserProfile.email field and to send the emails to clients.
Major changes are -
- UserProfile.email field is set while creating the user according to
RealmUserDefault.email_address_visibility.
- UserProfile.email field is updated according to change in the setting.
- 'email_address_visibility' is added to person objects in user add event
and in avatar change event.
- client_gravatar can be different for different users when computing
avatar_url for messages and user objects, since the email available to
clients depends on the user-level setting.
- For bots, email_address_visibility is set to EVERYONE while creating
them irrespective of realm-default value.
- Test changes are basically setting user-level setting instead of realm
setting and modifying the checks accordingly.
Black 23 enforces some slightly more specific rules about empty line
counts and redundant parenthesis removal, but the result is still
compatible with Black 22.
(This does not actually upgrade our Python environment to Black 23
yet.)
Signed-off-by: Anders Kaseorg <anders@zulip.com>
These were useful as a transitional workaround to ignore type errors
that only show up with django-stubs, while avoiding errors about
unused type: ignore comments without django-stubs. Now that the
django-stubs transition is complete, switch to type: ignore comments
so that mypy will tell us if they become unnecessary. Many already
have.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
This commit adds the OPTIONAL .realm attribute to Message
(and ArchivedMessage), with the server changes for making new Messages
have this set. Old Messages still have to be migrated to backfill this,
before it can be non-nullable.
Appropriate test changes to correctly set .realm for Messages the tests
manually create are included here as well.
Previously, we typed the model fields manually with explicit,
approximate type annotations, because of the lack of types for Django.
django-stubs provides more specific types for all these fields that are
incompatible with our previous approximate annotations. So now we can
remove the inline type annotations and rely on the types defined in the
stubs. This allows mypy to infer the types of the model fields for us.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
Small follow-up to d86e4ac34d.
get_ makes it sound like it doesn't have side effects, when these are
actually much like the Django ORM .get_or_create method.
These are used for creating huddles and private messages (and some
UserPresence objects). It'd be really weird, and potentially create some
Messages that break our assumptions, for this to end up involving users
in multiple realms.
I believe this hasn't been happening currently, because when this line
runs, there are only users in the "zulip" realm and system bots in
"zulipinternal" - but the query has been excluding bots already.
Still, this query should be explicit about grabbing users from a single
realm. This will also be helpful for the work adding the denormalized
Message.realm field - so that the realm of Message objects that get
manually created in generate_and_send_messages can be simply set to
"zulip" with confidence that it's correct.
In the presence of **kwargs, this is required by the Concatenate type
expected by default_never_cache_responses.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
In 5c49e4ba06, we neglected to include
the CSRF and caching decorators required for all API views in the new
remote_server_dispatch function.
I'm not sure why our automated tests didn't catch this, but this made
the remote server API endpoints nonfunctional in a production
environment.
This change incorporates should_rate_limit into rate_limit_user and
rate_limit_request_by_ip. Note that there is a slight behavior change for
other callers of rate_limit_request_by_ip: we now check whether the
client is eligible to be exempted from rate limiting, which was
previously only done as part of zerver.lib.rate_limiter.rate_limit.
Now we mock zerver.lib.rate_limiter.RateLimitedUser instead of
zerver.decorator.rate_limit_user in
zerver.tests.test_decorators.RateLimitTestCase, because rate_limit_user
will always be called but rate limiting only happens when the
should_rate_limit check passes;
we can continue to mock zerver.lib.rate_limiter.rate_limit_ip, because the
decorated view functions call rate_limit_request_by_ip, which calls
rate_limit_ip when the should_rate_limit check passes.
We need to mock zerver.decorator.rate_limit_user for SkipRateLimitingTest
now because rate_limit has been removed. We don't need to mock
RateLimitedUser in this case because we are only verifying that
the skip_rate_limiting flag works.
To ensure coverage in add_logging_data, a new test case is added to use
a web_public_view (which decorates the view function with
add_logging_data) with a new flag to check_rate_limit_public_or_user_views.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
This allows us to avoid importing from zilencer conditionally in
zerver.lib.rate_limiter, as we make rate limiting self-contained now.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
This refactors the test case alongside, since normal views accessed by
remote server do not get rate limited by remote server anymore.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
`remote_server_path` allows us to get rid of all the `validate_entity`
calls in `zilencer.views` and remove all the `Union` type annotations
in the signatures of the authenticated view functions.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
This allows us to separate the zilencer paths from other JSON paths,
with explicit type annotation expecting `RemoteZulipServer` as the
second parameter of the handler using
authenticated_remote_server_view.
The test case is also updated to remove a test for a situation that no
longer occurs, since we don't perform subdomain checks on remote
servers.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
Adds an API endpoint for accessing read receipts for other users, as
well as a modal UI for displaying that information.
Enables the previously merged privacy settings UI for managing whether
a user makes read receipts data available to other users.
Documentation is pending, and we'll likely want to link to the
documentation with help_settings_link once it is complete.
Fixes #3618.
Co-authored-by: Tim Abbott <tabbott@zulip.com>
This commit modifies bulk_create_users to add the users to the
respective system groups. Due to this change, bots in the development
environment are now also added to system groups.
Tests are changed accordingly as more UserGroupMembership objects
are created.
`_cache` is not an attribute defined on `BaseCache`, but an
implementation detail of django_bmemcached.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
The field_data sent from the client while creating a select
type field is a dict with a number as the key.
In the development database, the field data for the "Favorite editor"
field was of a different form, where the option label was used
as the key in the dict.
This commit fixes it to be of the same form as when creating
a field from the web app. As a result, we also need to update
the tests, and this commit also updates the field_data for the other
select-type fields.
We increase the total number of messages, since increasing the number
of topics would otherwise have the side effect of making it hard to
find longer conversations.
We do not allow keeping vacant private streams, as we deactivate them
when all users are unsubscribed, so it is better to add at least one
user to the initial private stream created while creating the realm.
This commit adds code to copy the realm-level defaults of settings
while creating users through bulk_create_users.
We do not directly call 'copy_default_settings' as it
calls ".save()", but here we want to bulk_create the objects
for efficiency.
We also add code to set the realm default of enter_sends to
True for the Zulip dev server, as done in 754b547e8, and thus
we remove the enter_sends argument from create_user_profile as
it is of no use now.
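An illustrative sketch of copying the realm defaults onto not-yet-saved
profiles before bulk_create (the property_types iteration is an
assumption about how the settings are enumerated):

    from zerver.models import RealmUserDefault, UserProfile


    def bulk_create_with_realm_defaults(realm, profiles_to_create) -> None:
        realm_user_default = RealmUserDefault.objects.get(realm=realm)
        for settings_name in RealmUserDefault.property_types:
            value = getattr(realm_user_default, settings_name)
            for profile in profiles_to_create:
                setattr(profile, settings_name, value)
        # copy_default_settings() would call .save() per user; bulk_create
        # avoids the per-row queries.
        UserProfile.objects.bulk_create(profiles_to_create)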
Bots don't generally do API requests to mark messages as read. If they
did, it's likely because the developer of the bot wants them to appear
in read receipts or similar (E.g. as an indication of what messages
have been processed).
So we should avoid setting the read flag on bot messages the test
database.
76deb30312 changed this to not just be the URL, but rather a
prefixed hash of the URL, but failed to update this location which
wrote to it. This meant that this pre-population step was writing to
the wrong keys in the durable cache, and thus ineffective.
Then, da33b72848 switched the cache to be in-memory, making this
write to the wrong keys in an in-process memory store. There is no
way to pre-fill this sort of cache, except at server start-up.
Finally, and most fundamentally, 8c0c9ca7a4 then disabled
`inline_url_embed_preview` by default, making the code entirely moot.
Remove the triply-unnecessary code.
We now call this function inside do_create_user(...,
realm_creation=True), which generally improves readability and
robustness of the codebase.
This fixes a bug where this onboarding content was not correctly done
when creating a realm via LDAP, and also will be important as we add
new code paths that might let you create a realm.
This commit adds users to the appropriate system user group
based on their role. We also change the user groups when
changing a user's role.
We also add a migration to add existing users to the appropriate
user groups.
This commit adds update_users_in_full_members_system_group, which
is currently used to update the full members group on changing the
role of a user. This function will be modified in the next commit so
that it can be used to update the full members group on changing the
waiting_period_threshold setting of the realm.
This is the first step toward making the full switch to having
self-hosted servers use user uuids, per issue #18017. The old id format
is still supported, of course, for backward compatibility.
This commit is separate in order to allow deploying *just* the bouncer
API change to production first.
This reverts bc15085098 (which provided
no justification for its change) and moves further, down to 2 retries
from the default of 5.
10 retries, with exponential backoff, is equivalent to sleeping 2^11
seconds, or just about 34 minutes (though the code uses a jitter which
may make this up to 51 minutes). This is an unreasonable amount of
time to spend in this codepath -- as only one worker is used, and it
is single-threaded, this could effectively block all missed message
notifications for half an hour or longer.
This is also necessary because messages sent through the push bouncer
are sent synchronously; the sending server uses a 30-second timeout,
set in PushBouncerSession. Having retries which linger longer than
this can cause duplicate messages; the sending server will time out
and re-queue the message in RabbitMQ, while the push bouncer's request
will continue, and may succeed.
Limit to 2 retries (APNs currently uses 3), resulting in an expected
max of 4 seconds of sleep, potentially up to 6. If this fails, there
exists another retry loop above it, at the RabbitMQ layer (either
locally, or via the remote server's queue), which will result in up to
3 additional retries -- all told, the request will be made to FCM up
to 12 times.
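As a back-of-the-envelope check of the sleep arithmetic above, assuming
a plain doubling backoff starting at 2 seconds and ignoring jitter:

    def total_backoff_seconds(retries: int) -> int:
        # 2 + 4 + ... + 2**retries == 2**(retries + 1) - 2
        return sum(2**attempt for attempt in range(1, retries + 1))

    print(total_backoff_seconds(10))  # 2046 seconds, roughly 34 minutes
    print(total_backoff_seconds(2))   # 6 seconds, the worst case for 2 retries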
After failing to notice a place where we wanted to hide timezone
information, we decided to add timezones to some of the test
users, so that we can better consider the effects of timezones
when manually testing.
Testing:
* ran populate_db and confirmed users had timezones in the UI
* updated test_populate_db.py
Since we do not allow removing owners from bots, it is better
to keep owners for the bots in the development environment as well.
We need to change the puppeteer tests here because desdemona
now already has bots in the dev server, and thus the "Active bots"
section is opened by default in the settings instead of the "Add a new
bot" section.
Adds request as a parameter to json_success as a refactor towards
making `ignored_parameters_unsupported` functionality available
for all API endpoints.
Also, removes any data parameters that are an empty dict or
a dict with the generic success response values.
As @timabbott mentioned on #20577, this command was mostly useful
during early development of the feature, and is no longer needed now
that we have an API for accomplishing the same thing.
Fixes “RemovedInDjango40Warning: Passing None for the middleware
get_response argument is deprecated.” from LogRequests().
Signed-off-by: Anders Kaseorg <anders@zulip.com>
get_remote_server_by_uuid (called in validate_api_key) raises
ValidationError when given an invalid UUID due to how Django handles
UUIDField. We don't want that exception and prefer the ordinary
DoesNotExist exception to be raised.
APNs payloads nest the zulip-custom data further than the top level,
unlike Android notifications. This led to APNs data silently never
being truncated; this case was not caught in tests because the mocks
provided the wrong data for the APNs structure.
Adjust to look in the appropriate place within the APNs data, and
truncate that.
As explained in the comments in the code, just doing UUID(string) and
catching ValueError is not enough, because the uuid library sometimes
tries to modify the string to convert it into a valid UUID:
>>> a = '18cedb98-5222-5f34-50a9-fc418e1ba972'
>>> uuid.UUID(a, version=4)
UUID('18cedb98-5222-4f34-90a9-fc418e1ba972')
Given that these values are uuids, it's better to use UUIDField which is
meant for exactly that, rather than an arbitrary CharField.
This requires modifying some tests to use valid uuids.
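The stricter check implied here can be sketched as a round-trip
comparison (the helper name is illustrative):

    from uuid import UUID


    def is_valid_uuid4(value: str) -> bool:
        try:
            return str(UUID(value, version=4)) == value
        except ValueError:
            return False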