There is no reason to have an index on just `realm_id` or `remote_id`,
as those values mean nothing outside of the scope of a specific
`server_id`. Remove those never-used single-column indexes from the
two tables that have them.
By contrast, the pair of `server_id` and `remote_id` is quite useful
and specific -- it is a unique pair, and every POST of statistics from
a remote host requires looking up the highest `remote_id` for a given
`server_id`, which (without this index) is otherwise a quite large
scan.
Add a unique constraint, which (in PostgreSQL) is implemented as a
unique index.
_default_manager is the same as objects on most of our models. But
when a model class is stored in a variable, the type system doesn’t
know which model the variable is referring to, so it can’t know that
objects even exists (Django doesn’t add it if the user added a custom
manager of a different name). django-stubs used to incorrectly assume
it exists unconditionally, but it no longer does.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
This migration applies under the assumption that extra_data_json has
been populated for all existing and coming audit log entries.
- This removes the manual conversions back and forth for extra_data
throughout the codebase including the orjson.loads(), orjson.dumps(),
and str() calls.
- The custom handler used for converting Decimal is removed since
DjangoJSONEncoder handles that for extra_data.
- We remove None-checks for extra_data because it is now no longer
nullable.
- Meanwhile, we want the bouncer to support processing RealmAuditLog entries for
remote servers before and after the JSONField migration on extra_data.
- Since now extra_data should always be a dict for the newer remote
server, which is now migrated, the test cases are updated to create
RealmAuditLog objects by passing a dict for extra_data before
sending over the analytics data. Note that while JSONField allows for
non-dict values, a proper remote server always passes a dict for
extra_data.
- We still test out the legacy extra_data format because not all
remote servers have migrated to use JSONField extra_data.
This verifies that support for extra_data being a string or None has not
been dropped.
Co-authored-by: Siddharth Asthana <siddharthasthana31@gmail.com>
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
The curl examples of reordering linkifiers require there to be some
linkifiers in the database to be reordered. This adjusts some test cases
so they do not assume that there is no linkifier in the test db.
This commit updates the select_related calls in queries to
get UserProfile objects in populate_db code to pass "realm"
as argument to select_related call.
Also, note that "realm" is the only non-null foreign key field
in UserProfile object, so select_related() was only fetching
realm object previously as well. But we should still pass "realm"
as argument in select_related call so that we can make sure that
only required fields are selected in case we add more foreign
keys to UserProfile in future.
Translators benefit from the extra information in the field names, and
need the reordering freedom that isn’t available with multiple
positional fields.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
The previous function was poorly named, asked for a
Realm object when realm_id sufficed, and returned a
tuple of strings that had different semantics.
I also avoid calling it duplicate times in a couple
places, although it was probably rarely the case that
both invocations actually happened if upstream
validations were working.
Note that there is a TypedDict called EmojiInfo, so I
chose EmojiData here. Perhaps a better name would be
TinyEmojiData or something.
I also simplify the reaction tests with a verify
helper.
This migration is reasonably complex because of various anomalies in existing
data.
Note that there are cases when extra_data does not contain data that is
proper json with possibly single quotes. Thus we need to use
"ast.literal_eval" to cover that.
There is also a special case for "event_type == USER_FULL_NAME_CHANGED",
where extra_data is a plain str. This event_type is only used for
RealmAuditLog, so the zilencer migration script does not need to handle
it.
The migration does not handle "event_type == REALM_DISCOUNT_CHANGED"
because ast.literal_eval only allow Python literals. We expect the admin
to populate the jsonified extra_data for extra_data_json manually
beforehand.
This chunks the backfilling migration to reduce potential block time.
The migration for zilencer is mostly similar to the one for zerver; except that
the backfill helper is added in a wrapper and unrelated events are
removed.
**Logging and error recovery**
We print out a warning when the extra_data_json field of an entry
would have been overwritten by a value inconsistent with what we derived
from extra_data. Usually this only happens when the extra_data was
corrupted before this migration. This prevents data loss by backing up
possibly corrupted data in extra_data_json with the keys
"inconsistent_old_extra_data" and "inconsistent_old_extra_data_json".
More roundtrips to the database are needed for inconsistent data, which are
expected to be infrequent.
This also outputs messages when there are audit log entries with decimals,
indicating that such entries are not backfilled. Do note that audit log
entries with decimals are not populated with "inconsistent_old_extra_data_*"
in the JSONField, because they are not overwritten.
For such audit log entries with "extra_data_json" marked as inconsistent,
we skip them in the migration. Because when we have discovered anomalies in a
previous run, there is no need to overwrite them again nesting the extra keys
we added to it.
**Testing**
We create a migration test case utilizing the property of bulk_create
that it doesn't call our modified save method.
We extend ZulipTestCase to support verifying console output at the test
case level. The implementation is crude but the use case should be rare
enough that we don't need it to be too elaborate.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
Note that we use the DjangoJSONEncoder so that we have builtin support
for parsing Decimal and datetime.
During this intermediate state, the migration that creates
extra_data_json field has been run. We prepare for running the backfilling
migration that populates extra_data_json from extra_data.
This change implements double-write, which is important to keep the
state of extra data consistent. For most extra_data usage, this is
handled by the overriden `save` method on `AbstractRealmAuditLog`, where
we either generates extra_data_json using orjson.loads or
ast.literal_eval.
While backfilling ensures that old realm audit log entries have
extra_data_json populated, double-write ensures that any new entries
generated will also have extra_data_json set. So that we can then safely
rename extra_data_json to extra_data while ensuring the non-nullable
invariant.
For completeness, we additionally set RealmAuditLog.NEW_VALUE for
the USER_FULL_NAME_CHANGED event. This cannot be handled with the
overridden `save`.
This addresses: https://github.com/zulip/zulip/pull/23116#discussion_r1040277795
Note that extra_data_json at this point is not used yet. So the test
cases do not need to switch to testing extra_data_json. This is later
done after we rename extra_data_json to extra_data.
Double-write for the remote server audit logs is special, because we only
get the dumped bytes from an external source. Luckily, none of the
payload carries extra_data that is not generated using orjson.dumps for
audit logs of event types in SYNC_BILLING_EVENTS. This can be verified
by looking at:
`git grep -A 6 -E "event_type=.*(USER_CREATED|USER_ACTIVATED|USER_DEACTIVATED|USER_REACTIVATED|USER_ROLE_CHANGED|REALM_DEACTIVATED|REALM_REACTIVATED)"`
Therefore, we just need to populate extra_data_json doing an
orjson.loads call after a None-check.
Co-authored-by: Zixuan James Li <p359101898@gmail.com>
This adds support to accepting extra_data being dict from remote
servers' RealmAuditLog entries. So that it is forward-compatible with
servers that have migrated to use JSONField for RealmAuditLog just in
case. This prepares us for migrating zilencer's audit log models to use
JSONField for extra_data.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
Part of splitting creating and editing scheduled messages.
Should be merged with final commit in series. Breaks tests.
Removes `scheduled_message_id` parameter from the create scheduled
message path.
Prep commit for splitting create/edit endpoint for scheduled
messages.
Because of `test-api` runs the tests in alphabetical order based on
the `operationId`, we need two scheduled messages in the test database.
The first for the curl example delete (delete-scheduled-message) and
the second for the curl example update (update-scheduled-message).
Prep commit for adding the scheduled-message endpoints to the API
documentation.
Adds a scheduled message for Iago in the test database so that it
can be deleted in the delete cURL example in the api-test suite.
This implements the core of the rewrite described in:
For the backend data model for UserPresence to one that supports much
more efficient queries and is more correct around handling of multiple
clients. The main loss of functionality is that we no longer track
which Client sent presence data (so we will no longer be able to say
using UserPresence "the user was last online on their desktop 15
minutes ago, but was online with their phone 3 minutes ago"). If we
consider that information important for the occasional investigation
query, we have can construct that answer data via UserActivity
already. It's not worth making Presence much more expensive/complex
to support it.
For slim_presence clients, this sends the same data format we sent
before, albeit with less complexity involved in constructing it. Note
that we at present will always send both last_active_time and
last_connected_time; we may revisit that in the future.
This commit doesn't include the finalizing migration, which drops the
UserPresenceOld table.
The way to deploy is to start the backfill migration with the server
down and then start the server *without* the user_presence queue worker,
to let the migration finish without having new data interfering with it.
Once the migration is done, the queue worker can be started, leading to
the presence data catching up to the current state as the queue worker
goes over the queued up events and updating the UserPresence table.
Co-authored-by: Mateusz Mandera <mateusz.mandera@zulip.com>
Servers that had upgraded from a Zulip server version that did not yet
support the user_uuid field to one that did could end up with some
mobile devices having two push notifications registrations, one with a
user_id and the other with a user_uuid.
Fix this issue by sending both user_id and user_uuid, and clearing
This commit updates the pattern for dealing with tuples
returned by the delete() query.
The '(num_deleted, ignored) = ModelName.objects.filter().delete()'
pattern is preferred due to better readability.
We avoid the pattern '(num_deleted, _)' because Django uses _
for translation, which may lead to future bugs.
To avoid people calling "create_user_group" instead of
"check_add_user_group", we rename it to make its purpose clearer.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
Since this function creates a new user group into the database,
it is more appropriate to have it not as a generic "lib" function
but as an "action".
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
The Django convention is for __repr__ to include the type and __str__
to omit it. In fact its default __repr__ implementation for models
automatically adds a type prefix to __str__, which has resulted in the
type being duplicated:
>>> UserProfile.objects.first()
<UserProfile: <UserProfile: emailgateway@zulip.com <Realm: zulipinternal 1>>>
Signed-off-by: Anders Kaseorg <anders@zulip.com>
The post-delete signal on AlertWord clears the realm cache; when it is
called repeatedly, this results in re-fetching the realm object O(n)
times, where n scales by number of users in the database.
Disconnect this cache-clearing signal before removing the AlertWord
entries, and reconnect it afterwards. This is not thread-safe, but
this section is single-threaded. It is also probably unnecessary to
re-connect the signal, as rest of `./manage.py populate_db` does not
delete AlertWord objects, but cleanliness dictates doing the
re-connection.
This drops the time to repeatedly run:
python3 ./manage.py populate_db --num-messages=0 --extra-users=1000
...from 47 seconds to 36 seconds.
This commits update the code to use user-level email_address_visibility
setting instead of realm-level to set or update the value of UserProfile.email
field and to send the emails to clients.
Major changes are -
- UserProfile.email field is set while creating the user according to
RealmUserDefault.email_address_visbility.
- UserProfile.email field is updated according to change in the setting.
- 'email_address_visibility' is added to person objects in user add event
and in avatar change event.
- client_gravatar can be different for different users when computing
avatar_url for messages and user objects since email available to clients
is dependent on user-level setting.
- For bots, email_address_visibility is set to EVERYONE while creating
them irrespective of realm-default value.
- Test changes are basically setting user-level setting instead of realm
setting and modifying the checks accordingly.
Black 23 enforces some slightly more specific rules about empty line
counts and redundant parenthesis removal, but the result is still
compatible with Black 22.
(This does not actually upgrade our Python environment to Black 23
yet.)
Signed-off-by: Anders Kaseorg <anders@zulip.com>