This allows verify_uploads to use the database
as the authoritative source for what attachments
we need to look for when we're verifying the
images got exported properly, while still
also verifying attachment.json is correct.
It is better for the verifying code to just explicitly
ensure that the exported file bytes match the bytes
in the test image. This introduces a tiny bit more
of I/O.
It's easier to read the code without the intermediate
full_data dictionary that obscures where the files live.
We also avoid some unnecessary file i/o in the tests.
We do a sanity check for every table
that gets written to user.json as part of
the single-user export.
If we add more tables to the single-user export,
the test that I modified here will now ask
the author to add a new checker function, which
means we should always have at least a basic
sanity check for every exported table as long
as we stay in this new paradigm.
We also remove a little bit of old code that
became redundant.
We now ensure that all message ids are sorted BEFORE
we split them into batches.
We now do a few extra "slim" queries to get message
ids up front.
But, now, when we divide them into batches, we no
longer run 2 or 3 different complicated queries in
a loop. We just basically hydrate our message ids,
so `write_message_partials` should be easy to reason
about.
This change also means that for tiny realms with
< 1000 messages you will always have just one
json file, since we aggregate the ids from the
queries before batching.
The original intention of this was to prevent coding
errors with realm getters that don't, um, filter
on realm.
Unfortunately, you can still write a broken realm getter
that forgets to filter on realm, but which returns a
Set, and the new safeguards won't see any difference.
We could make all the getters return sorted lists
instead, but that's for another day.
This code does serve another purpose, which is to
prevet egregious bugs in the import itself.
The diff here is ugly, but to summarize:
BEFORE IMPORT:
define get_user_id
define get_huddle_hashes
AFTER IMPORT AND MAKING GETTERS:
check realm id
define assert_realm_values
verify emoji codes
check huddle hashes
It is confusing to have the plan type constants not be namespaced
by the thing they represent. We already have a namespacing
convention in place for constants, so we should use it for
Realm.plan_type as well.
Because we create all realms with do_create_user (including in the
test suite), we just need to change that function, add a migration for
existing realms, and ensure the data import code path correctly
creates these objects.
Note that the import code path will create a RealmUserDefault row with
default values if it is not present in the import data, which is
important for importing data from other tools like Slack.
This fixes a batch of mypy errors of the following format:
'Item "None" of "Optional[Something]" has no attribute "abc"
Since we have already been recklessly using these attritbutes
in the tests, adding assertions beforehand is justified presuming
that they oughtn't to be None.