The comments explain this pretty well, but basically because we
rewrite the realm ID during the import process, we need to edit all
the message bodies that link to an attachment to instead link to the
post-processed URL where that file will be hosted on the new server.
Fixes#8926.
'processing_emojis' check is added in the 'import_uploads'
function, so that the emoji files present in the to be imported
data file can be uploaded.
The procedure of saving emoji files in slack importer is same as
saving attachments and avatars, and the import has the similar
procedure too.
In importing avatars, we use the implementation where the 'avatar_path'
is seperately calculated using realm and user ID and then the content
of the path provided in the avatar's 'records.json' are copied to this
'avatar_path'.
Similary, here for the uploads, 's3_file_name' is seperately calculated
using the realm ID and uploaded file name and then the content of the
path provided in upload's 'records.json' are copied to this 's3_file_name'.
'recipient_field' is added as a bool variable in the function
'update_id_map' to update the recipient foreign keys.
Recipient Foreign Key is equal to the UserProfile ID, if the
type is 1, and the same is equal to Stream ID, if the type is 2.
Hence a check is added in the 'update_id_map' field for this.
All the objects with realm ID as the foreign keys need to
be remapped with updated with the allocated ID.
Also the ID of the realm object itself is updated with the allocated
ID.
The 'id_field' bool variable is added to the function just to check
if the field is the ID of that object, and not the foreign key relation.
For foreign key field names, a "_id" has to be added after the field name,
however we don't need that for the ID field of the object.
During a slack import, we don't have medium-size avatars already
available in the export data set (and possibly also with a normal
import/export?). The medium size avatar can be created by the
'ensure_medium_avatar_image' function, which checks if the medium
image exists, and if it doesn't, it creates the image.
This commit was substantially edited by tabbott to get rid of an
undefined variable bug, avoid initializing the upload backend classes
in a loop, and add some TODO notes on things that could be improved
later.
This fixes some subtle JavaScript exceptions we've been getting in
zulipchat.com, caused by the system bot realm there not being "zulip"
interacting with get_cross_realm_users.
ScheduledJob was written for much more generality than it ended up being
used for. Currently it is used by send_future_email, and nothing
else. Tailoring the model to emails in particular will make it easier to do
things like selectively clear emails when people unsubscribe from particular
email types, or seamlessly handle using the same email on multiple realms.
This system hasn't been in active use for several years, and had some
problems with it's design. So it makes sense to just remove it to declutter
the codebase.
Fixes#5655.
boto's stubs have been updated in mypy 0.4.7, which has given us
more information about what type of strings are expected as
parameters in various functions.
This is some of the code we'd need if we wanted to have Zulip generate
avatars for things. Since it is so little useful code, and it's not
clear we will need this feature ever, we can remove this code to make
the codebase less confusing. It'd be easy to dig this out of history
if we ever want it.
Fixes#2101.
This fixes a nasty bug where exporting messages sent by a single user
might only contain some of the messages in the event that the
unspecified sort order by the database didn't happen to be sorted by
message ID.
This commit only addresses tables that currently derive from
user_profile_config in get_realm_config:
zerver_userpresence
zerver_useractivity
zerver_useractivityinterval
zerver_subscription
zerver_recipient
zerver_stream
zerver_huddle
It also introduces an entry in realm.json for a virtual
table called "zerver_userprofile_mirrordummy" for dummy users,
which include prior dummy users and users excluded from the call
to do_export_realm().
Note that this feature is not yet exposed in the management command.
Now attachment data gets written to its own json file. We are
splitting this out so that will be easier for us to cross-check
attachments against messages without holding up writing a lot
of the other realm data. (message cross-checking is coming soon)
This commit doesn't change any behavior; it just moves fetching
attachments out of the Config scheme and into its own method.
This prepares us to start writing attachment data to its own
file and cross-checking against message ids (coming soon).
We now just have a single configuration get_realm_config() that
handles most of the top-down realm export tables. (It basically
does everything not related to messages or uploads/avatars.)
Unifying the configs allows us to be more strict in our
configuration about checking for anomalies. In the future
we may need to loosen up some of those restrictions again,
but for now we are picky and paranoid.
Fetch stream data only for stream recipients, instead of
getting streams via realm_id.
(This change is kind of moot for now, since our stream recipients
include all possible stream recipients in the realm, but this
sets us up for when we start restricting users that we export
within the realm.)
This commit introduces the ability to do custom fetches
and to essentially use temp tables for intermediate results.
(The temp table stuff deals with recipients/subscriptions
having three different flavors--user, stream, and huddle.)
The function to create the message partial files has been
renamed to export_partial_message_files(). It now gets its own
list of user profile ids and recipient ids from the response,
so that we can de-clutter do_export_realm().
The name avatar_bucket was confusing for a boolean, and
in some places it was used for non-S3 paths.
I considered the more concise 'is_avatar', but that
was still confusing when you are processing multiple
files, because you think it's a calculated property
on one file instead of an overall codepath switch.
I also considered splitting up some functions, but
there is a lot of common logic between handling
file uploads and avatars that's not trivial to extract
into helpers, especially on the S3 side.
I did some minor moving around of code that made us have
one fewer function without any additional conditional
logic. The names are more explicit about saying
"from_local" and "from_s3". Also, there is less clutter
now in do_export_realm(), which is evolving into more of
a dispatcher and less of a worker.
This is pretty minor cleanup, but it makes it a little more
explicit what we're writing to the shard file, and it allows
us to use a more specific mypy type when calling
floatify_datetime_fields.
We no longer have an in-process code path to export
UserMessage rows. We want to only maintain the
subprocess code, which we'll always use in production,
and which will work fine in dev.
I also fixed some small things like removing unnecessary return
statements, and adding a TODO.
In some cases I explicitly cast stuff at run-time to set() or
str() to appease mypy, as well as make it clear to somebody
reading the code that the callee might not respect ordering
or tolerate unicode.
The previous export tool would only work properly for small realms,
and was missing a number of important features:
* Export of avatars and uploads from S3
* Export of presence data, activity data, etc.
* Faithful export/import of timestamps
* Parallel export of messages
* Not OOM killing for large realms
The new tool runs as a pair of documented management commands, and
solves all of those problems.
Also we add a new management command for exporting the data of an
individual user.