We now ensure that all message ids are sorted BEFORE
we split them into batches.
We now do a few extra "slim" queries to get message
ids up front.
But, now, when we divide them into batches, we no
longer run 2 or 3 different complicated queries in
a loop. We just basically hydrate our message ids,
so `write_message_partials` should be easy to reason
about.
This change also means that for tiny realms with
< 1000 messages you will always have just one
json file, since we aggregate the ids from the
queries before batching.
This accomplishes a few things:
* It extracts `chunkify` rather than having us
clumsily track chunking-related stuff in a
big loop that is doing other stuff.
* It makes it so that all message ids
in message-000001.json < message-000002.json.
* It makes it easier for us to customize
the messages we send to a single user
(coming soon).
BTW we probably have a slicker version of chunkify
somewhere in our codebase, but I couldn't remember
where.
Following b3c58f454f, we want to clean up
old topics that may contain the disallowed characters. The Message table
is large, so we go in batches, making sure we limit topic fetches and
UPDATE query to no more than BATCH_SIZE Message rows per query.
Now all file writes go through our three
helper functions, and we consistently
write a single log message after the file
gets written.
I killed off write_message_exports, since
all but one of its callers can call
write_table_data, which automatically
sorts data. In particular, our Message
and UserMessage data will now be sorted
by ids.
This probably just postpones the list creation until
Django builds the "IN" query, but semantically it's
good to work in sets where we don't have any
meaningful ordering of the list that gets used.
The immediate benefit of this is stronger mypy
checks (avoiding the ugly union caused by message
files).
The subsequent commit will add sorting.
We have test coverage on all these lines insofar
as if you comment out the lines, tests will
explode (i.e. more than superficial line
coverage).
The distinction here wasn't super meaningful
due to the way we order our "elif" statements,
but we want to reserver "normal_parent" for the
majority of use cases, where you simply tell
the Config what the "foreign_key" is.
For realm-wide exports, there is no reason to query
inefficiently against a list of modified users.
We move the Config out of the common child configs.
Even though Django usually treats foo__in
and foo_id__in identically for filters where
foo is a ForeignKey type, we want to insist
on somewhat more consistent syntax, because
we have the odd combo of type and type_id
in Recipient, where type_id is kinda like a
foreign key, but not a ForeignKey.
So we assert for now that all our include_rows
values end in "_id__in".
Zulip shows two guides on How to reply, first one by
the welcome bot and second one is intro_reply hotspot.
To simply and avoid redundancy, intro_reply hotspot is
removed.
Fixes#20482.
Force postgres to give reactions in ID order - which
is generally chronological order. Results in frontend
displaying reactions in said order.
Fixes#20060.
In many of our stream notification messages, we make use of the
same silent user mention syntax, the template for which was always
hardcoded. This commit adds a helper function that all relevant
callers can call to get the right syntax when mentioning users.
Thanks to Tim Abbott for this suggestion!
Removed existing empty narrow divs from app/home.html and created
a new javascript module to dynamically load empty narrow messages
using handlebar template.
Fixes#18797
The original intention of this was to prevent coding
errors with realm getters that don't, um, filter
on realm.
Unfortunately, you can still write a broken realm getter
that forgets to filter on realm, but which returns a
Set, and the new safeguards won't see any difference.
We could make all the getters return sorted lists
instead, but that's for another day.
This code does serve another purpose, which is to
prevet egregious bugs in the import itself.
The diff here is ugly, but to summarize:
BEFORE IMPORT:
define get_user_id
define get_huddle_hashes
AFTER IMPORT AND MAKING GETTERS:
check realm id
define assert_realm_values
verify emoji codes
check huddle hashes
We don't have automated test coverage on this yet,
but below are the results from manual testing.
Note that we include the realm icon and logo even
though they were not created by Cordelia.
./manage.py export_single_user cordelia@zulip.com
$ (cd /tmp/zulip-export-4v3mo802/ && find .)
.
./emoji
./emoji/2
./emoji/2/emoji
./emoji/2/emoji/images
./emoji/2/emoji/images/3.jpg
./emoji/records.json
./messages-000001.json
./realm_icons
./realm_icons/2
./realm_icons/2/night_logo.original
./realm_icons/2/night_logo.png
./realm_icons/2/icon.png
./realm_icons/2/icon.original
./realm_icons/records.json
./avatars
./avatars/2
./avatars/2/c5125af0447f4d66ce34c1b32eac75ac27ebe0e7.original
./avatars/2/c5125af0447f4d66ce34c1b32eac75ac27ebe0e7.png
./avatars/records.json
./uploads
./uploads/2
./uploads/2/68
./uploads/2/68/xyEkC5dTIp8m42_6HJ3kBfdt
./uploads/2/68/xyEkC5dTIp8m42_6HJ3kBfdt/denver.jpg
./uploads/2/96
./uploads/2/96/ol5WE6RTUntvuPDSpJUrYTim
./uploads/2/96/ol5WE6RTUntvuPDSpJUrYTim/denver.jpg
./uploads/records.json
./user.json
There are tactical reasons to remove this assertion.
Basically, the reason it's safe to remove is that it's
been around a long time and we would have seen this
operationally. Also, the check to make sure that the
S3 filename thingy matches the avatar hash is a much
stronger check.
We will soon restore a stronger version of this check
that applies to all of our asset types (emojis/avatars/etc.).