Commit Graph

87 Commits

Author SHA1 Message Date
Steve Howell 7a429d1e30 export: Add sanity_check_stream_data(). 2016-08-12 15:27:22 -07:00
Steve Howell ec86e475b4 export: Add Config.post_process_data 2016-08-12 15:12:01 -07:00
Steve Howell 0c2c331905 export: Flip how we fetch stream subscription data.
We now get stream subscriptions BEFORE stream recipients.
2016-08-12 15:12:01 -07:00
Steve Howell 70a916aae3 export: Flip how we fetch user subscription data.
We now get user subscriptions BEFORE user recipients.
2016-08-12 15:12:01 -07:00
Steve Howell 2a2ce6ada1 export: Remove hard-to-maintain code comment.
Subsequent changes are gonna make the top-down/bottom-up comment
no longer valid.
2016-08-12 15:12:01 -07:00
Steve Howell 6fdd42c08b export: Create convenient soft links. 2016-08-12 10:48:33 -07:00
Steve Howell 70b68ddcc3 export: Use a config for export_single_user(). 2016-08-12 10:37:41 -07:00
Steve Howell c69a5bdec3 export: Handle more tables via export_from_config().
This commit introduces the ability to do custom fetches
and to essentially use temp tables for intermediate results.

(The temp table stuff deals with recipients/subscriptions
having three different flavors--user, stream, and huddle.)
2016-08-12 10:37:35 -07:00
Steve Howell f471a1779e export: Handle simple exports with export_from_config().
This handles the simple tables that don't need custom fetches.
2016-08-12 09:54:57 -07:00
Steve Howell 682155778d export: Add export_with_config().
Subsequent commits will start to use this.
2016-08-12 09:54:57 -07:00
Steve Howell b0e6d20321 export: Write stats.txt for `./manage.py export <realm>`. 2016-08-12 09:06:10 -07:00
Steve Howell df3aa39be3 export: Extract write_data_to_file(). 2016-08-11 15:51:22 -07:00
Steve Howell f29b32bbb2 export: Clarify message exporting code.
The function to create the message partial files has been
renamed to export_partial_message_files().  It now gets its own
list of user profile ids and recipient ids from the response,
so that we can de-clutter do_export_realm().
2016-08-11 15:51:22 -07:00
Steve Howell 5cd915694a export: Extract launch_user_message_subprocesses().
This is the last in a series of commits that makes it
so that do_export_realm() mostly delegates work out
to other functions.
2016-08-11 15:21:30 -07:00
Steve Howell b383f5ca5d export: Extract fetch_user_profile_cross_realm(). 2016-08-11 15:21:30 -07:00
Steve Howell fee2106c6f export: Extract fetch_huddle_objects().
This also removes the dead codepath for include_private=False.
2016-08-11 15:21:30 -07:00
Steve Howell a6235f6a60 export: Add comments to export_single_user().
(This is a bit of a prefactoring to hopefully create a nice
diff in a subsequent commit.)
2016-08-11 15:21:30 -07:00
Steve Howell 6e7fe76cf4 export: s/avatar_bucket/processing_avatars
The name avatar_bucket was confusing for a boolean, and
in some places it was used for non-S3 paths.

I considered the more concise 'is_avatar', but that
was still confusing when you are processing multiple
files, because you think it's a calculated property
on one file instead of an overall codepath switch.

I also considered splitting up some functions, but
there is a lot of common logic between handling
file uploads and avatars that's not trivial to extract
into helpers, especially on the S3 side.
2016-08-11 15:21:30 -07:00
Steve Howell 3dab366733 export: Clean up names of upload/avatar export functions.
I did some minor moving around of code that made us have
one fewer function without any additional conditional
logic. The names are more explicit about saying
"from_local" and "from_s3".  Also, there is less clutter
now in do_export_realm(), which is evolving into more of
a dispatcher and less of a worker.
2016-08-11 15:21:30 -07:00
Steve Howell d62a351107 export: Add sanity_check_output(). 2016-08-11 15:21:30 -07:00
Steve Howell 06b0df5efc export: Remove spurious select_related() call for Client. 2016-08-10 14:16:17 -07:00
Steve Howell cb59a11f0a export: Extract get_primary_ids(). 2016-08-10 14:16:17 -07:00
Steve Howell 90e9083b81 export: Extract filter_by_realm(). 2016-08-10 14:16:17 -07:00
Steve Howell 4b6b1b8ad4 export: Extract filter_by_users(). 2016-08-10 14:16:17 -07:00
Steve Howell db9edfce34 export: Use DATE_FIELDS in fix_datetime_fields().
Now we only call this once per table and use DATE_FIELDS to
look up the data fields.
2016-08-10 14:16:17 -07:00
Steve Howell 35c59fc4d7 export: Clean up export_messages().
This is pretty minor cleanup, but it makes it a little more
explicit what we're writing to the shard file, and it allows
us to use a more specific mypy type when calling
floatify_datetime_fields.
2016-08-10 14:16:17 -07:00
Steve Howell 1d1f36c0b8 export: Always use subprocesses to export UserMessage.
We no longer have an in-process code path to export
UserMessage rows.  We want to only maintain the
subprocess code, which we'll always use in production,
and which will work fine in dev.
2016-08-10 14:16:17 -07:00
Steve Howell 78bbefbf94 export: Create import_attachments. 2016-08-10 14:16:17 -07:00
Steve Howell 7ec6a394fe export: Filter Attachment objects by realm. 2016-08-09 16:47:14 -07:00
Steve Howell cecfaa7761 export: Extract import_message_data(). 2016-08-09 16:47:14 -07:00
Steve Howell 5386ed280e export: Extract update_id_map().
We also use a vanilla dictionary instead of a defaultdict, so
that we explicitly initialize what tables are being re-mapped.
2016-08-09 16:47:14 -07:00
Steve Howell 217ef8a4d2 export: Split fix_foreign_keys() into two functions.
We now have convert_to_id_fields for the simple case, and
re_map_foreign_keys for the more complex case. I also
renamed some parameters and variables.
2016-08-09 16:47:14 -07:00
Steve Howell dd88ffccfd export: Extract make_raw() in lib/export.py. 2016-08-09 15:58:27 -07:00
Steve Howell 09fa343bdd export: Use DATE_FIELDS in floatify_datetime_fields.
This avoids a little bit of code duplication, plus it should
make it a little easier to add new date fields in the future.
2016-08-09 15:58:27 -07:00
Steve Howell c14ab3c91f export: Add annotations to zerver/lib/export.py.
I also fixed some small things like removing unnecessary return
statements, and adding a TODO.

In some cases I explicitly cast stuff at run-time to set() or
str() to appease mypy, as well as make it clear to somebody
reading the code that the callee might not respect ordering
or tolerate unicode.
2016-08-09 15:58:27 -07:00
Steve Howell f18cc4ae3a export: Added export_avatars_local_helper(). 2016-08-09 15:58:27 -07:00
Tim Abbott 6264ff7039 Add new Zulip realm import/export tool.
The previous export tool would only work properly for small realms,
and was missing a number of important features:
* Export of avatars and uploads from S3
* Export of presence data, activity data, etc.
* Faithful export/import of timestamps
* Parallel export of messages
* Not OOM killing for large realms

The new tool runs as a pair of documented management commands, and
solves all of those problems.

Also we add a new management command for exporting the data of an
individual user.
2016-08-08 14:58:18 -07:00