Sometimes, because of export bugs, malformed data, etc., we may get
import data that violates the invariant that RealmEmoji.author is always
set. The import code should fix that by choosing a reasonable default
and setting it.
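Here is a minimal sketch of that fix. It assumes the imported RealmEmoji rows are plain dicts with an `author` key that holds a user id, and that the caller has already picked the default (for example, a realm administrator). The helper name `fix_missing_emoji_authors` and its signature are hypothetical. They are not the actual import code.

```python
from typing import Any


def fix_missing_emoji_authors(
    realm_emoji_rows: list[dict[str, Any]], default_author_id: int
) -> int:
    """Hypothetical helper: set RealmEmoji.author on rows where it is unset.

    Mutates the rows in place and returns how many were fixed.
    """
    fixed = 0
    for row in realm_emoji_rows:
        # A missing "author" key and an explicit None both count as unset.
        if row.get("author") is None:
            row["author"] = default_author_id
            fixed += 1
    return fixed
```

Returning the count lets the caller log how many rows needed repair, instead of silently patching them.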
This allows verify_uploads to use the database
as the authoritative source for what attachments
we need to look for when we're verifying that the
images were exported properly, while still
verifying that attachment.json is correct.
It is better for the verifying code to just explicitly
ensure that the exported file bytes match the bytes
in the test image. This introduces a tiny bit more
I/O.
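Below is a minimal sketch of such a byte-for-byte check. It assumes the exported file and the original test image both sit on local disk. `assert_same_bytes` is a hypothetical helper name. Reading both files in full is the small amount of extra I/O the check costs.

```python
import os
import tempfile


def assert_same_bytes(exported_path: str, expected_path: str) -> None:
    """Hypothetical check: the exported file must match the test image exactly."""
    with open(exported_path, "rb") as f:
        exported = f.read()
    with open(expected_path, "rb") as f:
        expected = f.read()
    assert exported == expected, f"{exported_path} differs from {expected_path}"


# Usage: write a stand-in "test image" and an identical "export", then compare.
with tempfile.TemporaryDirectory() as tmp:
    expected_file = os.path.join(tmp, "img.png")
    exported_file = os.path.join(tmp, "exported.png")
    for path in (expected_file, exported_file):
        with open(path, "wb") as f:
            f.write(b"\x89PNG stand-in image bytes")
    assert_same_bytes(exported_file, expected_file)  # identical bytes, so no error
```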
It's easier to read the code without the intermediate
full_data dictionary that obscures where the files live.
We also avoid some unnecessary file I/O in the tests.
We do a sanity check for every table
that gets written to user.json as part of
the single-user export.
If we add more tables to the single-user export,
the test that I modified here will now ask
the author to add a new checker function, which
means we should always have at least a basic
sanity check for every exported table as long
as we stay in this new paradigm.
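One way to enforce "a checker for every exported table" is a registry keyed by table name. The sketch below is hypothetical: the table names, the checker functions, and `verify_user_json` are illustrative assumptions, not the actual test code. It assumes user.json has been parsed into a dict that maps table names to lists of row dicts.

```python
from typing import Any, Callable

Rows = list[dict[str, Any]]


def check_userprofile(rows: Rows) -> None:
    # A single-user export should contain exactly one user.
    assert len(rows) == 1


def check_subscription(rows: Rows) -> None:
    # Every subscription row should point at a recipient.
    assert all("recipient" in row for row in rows)


# Hypothetical registry: one sanity-check function per table in user.json.
CHECKERS: dict[str, Callable[[Rows], None]] = {
    "zerver_userprofile": check_userprofile,
    "zerver_subscription": check_subscription,
}


def verify_user_json(data: dict[str, Rows]) -> None:
    # A newly exported table with no checker fails here, which prompts
    # the author to write one.
    missing = sorted(set(data) - set(CHECKERS))
    assert not missing, f"add a checker function for: {missing}"
    for table, rows in data.items():
        CHECKERS[table](rows)
```

The set-difference assertion is what makes the paradigm self-enforcing: exporting a new table without registering a checker fails the test.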
We also remove a little bit of old code that
became redundant.
We now ensure that all message ids are sorted BEFORE
we split them into batches.
We now do a few extra "slim" queries to get message
ids up front.
But now, when we divide them into batches, we no
longer run two or three different complicated queries in
a loop. We basically just hydrate our message ids,
so `write_message_partials` should be easier to reason
about.
This change also means that for tiny realms with
fewer than 1000 messages, you will always get just one
JSON file, since we aggregate the ids from the
queries before batching.
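The "collect, sort, then batch" order can be sketched as follows. This assumes each slim query yields a set of message ids. The function name, the de-duplication step, and the default chunk size of 1000 (chosen to match the threshold above) are assumptions for illustration only.

```python
def batch_message_ids(
    id_sets: list[set[int]], chunk_size: int = 1000
) -> list[list[int]]:
    """Merge ids from several slim queries, sort them, then split into batches."""
    # Aggregate (and de-duplicate) ids from all queries before sorting, so
    # batch boundaries never depend on which query an id came from.
    all_ids = sorted(set().union(*id_sets))
    return [all_ids[i : i + chunk_size] for i in range(0, len(all_ids), chunk_size)]
```

For example, `batch_message_ids([{5, 3}, {4, 3}, {10}])` returns a single batch `[[3, 4, 5, 10]]`. Because the ids are merged first, a small realm never ends up with one partial file per query.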
The original intention of this was to prevent coding
errors with realm getters that don't, um, filter
on realm.
Unfortunately, you can still write a broken realm getter
that forgets to filter on realm, but which returns a
Set, and the new safeguards won't see any difference.
We could make all the getters return sorted lists
instead, but that's for another day.
This code does serve another purpose, which is to
prevent egregious bugs in the import itself.
The diff here is ugly, but to summarize:

BEFORE IMPORT:
    define get_user_id
    define get_huddle_hashes

AFTER IMPORT AND MAKING GETTERS:
    check realm id
    define assert_realm_values
    verify emoji codes
    check huddle hashes