zulip

Commit Graph

Author	SHA1	Message	Date
Steve Howell	a5f651b81a	export: Add some user_id filtering to do_export_realm(). This commit only addresses tables that currently derive from user_profile_config in get_realm_config: zerver_userpresence, zerver_useractivity, zerver_useractivityinterval, zerver_subscription, zerver_recipient, zerver_stream, and zerver_huddle. It also introduces an entry in realm.json for a virtual table called "zerver_userprofile_mirrordummy" for dummy users, which include prior dummy users and users excluded from the call to do_export_realm(). Note that this feature is not yet exposed in the management command.	2016-08-16 13:38:37 -07:00
Tim Abbott	4a46b879ee	import_uploads_s3: Fix setting of content-type.	2016-08-13 11:26:53 -07:00
Tim Abbott	8edc5110cd	export: Set re_map_foreign_keys verbose default to False. Otherwise, this is super spammy when doing a large import.	2016-08-13 11:26:16 -07:00
Tim Abbott	8b170665e4	export: Add assertion that ./manage.py exists in current directory. Otherwise we'll fail starting the UserMessage export, later on.	2016-08-13 10:13:11 -07:00
Tim Abbott	856ab48ec6	export: Fix stream export sanity check.	2016-08-13 10:02:34 -07:00
Steve Howell	e932b03999	export: Clean up path.join/makedirs for avatars/uploads.	2016-08-13 09:31:49 -07:00
Steve Howell	12c25e7d5c	export: Filter attachments by message id.	2016-08-13 09:31:49 -07:00
Steve Howell	0f493d5000	export: Return msg ids from export_partial_message_files().	2016-08-13 09:31:49 -07:00
Steve Howell	0c3e98fa91	export: Introduce attachment.json file. Now attachment data gets written to its own json file. We are splitting this out so that it will be easier for us to cross-check attachments against messages without holding up writing a lot of the other realm data. (message cross-checking is coming soon)	2016-08-12 18:59:14 -07:00
Steve Howell	ea0a7d87c8	export: Refactor how we fetch attachment data. This commit doesn't change any behavior; it just moves fetching attachments out of the Config scheme and into its own method. This prepares us to start writing attachment data to its own file and cross-checking against message ids (coming soon).	2016-08-12 18:59:14 -07:00
Steve Howell	fba7a9ca21	export: Unify top-down export configuration. We now just have a single configuration get_realm_config() that handles most of the top-down realm export tables. (It basically does everything not related to messages or uploads/avatars.) Unifying the configs allows us to be more strict in our configuration about checking for anomalies. In the future we may need to loosen up some of those restrictions again, but for now we are picky and paranoid.	2016-08-12 15:27:23 -07:00
Steve Howell	5a5353b846	export: Fetch stream data only for stream recipients. Fetch stream data only for stream recipients, instead of getting streams via realm_id. (This change is kind of moot for now, since our stream recipients include all possible stream recipients in the realm, but this sets us up for when we start restricting users that we export within the realm.)	2016-08-12 15:27:22 -07:00
Steve Howell	7a429d1e30	export: Add sanity_check_stream_data().	2016-08-12 15:27:22 -07:00
Steve Howell	ec86e475b4	export: Add Config.post_process_data	2016-08-12 15:12:01 -07:00
Steve Howell	0c2c331905	export: Flip how we fetch stream subscription data. We now get stream subscriptions BEFORE stream recipients.	2016-08-12 15:12:01 -07:00
Steve Howell	70a916aae3	export: Flip how we fetch user subscription data. We now get user subscriptions BEFORE user recipients.	2016-08-12 15:12:01 -07:00
Steve Howell	2a2ce6ada1	export: Remove hard-to-maintain code comment. Subsequent changes are gonna make the top-down/bottom-up comment no longer valid.	2016-08-12 15:12:01 -07:00
Steve Howell	6fdd42c08b	export: Create convenient soft links.	2016-08-12 10:48:33 -07:00
Steve Howell	70b68ddcc3	export: Use a config for export_single_user().	2016-08-12 10:37:41 -07:00
Steve Howell	c69a5bdec3	export: Handle more tables via export_from_config(). This commit introduces the ability to do custom fetches and to essentially use temp tables for intermediate results. (The temp table stuff deals with recipients/subscriptions having three different flavors--user, stream, and huddle.)	2016-08-12 10:37:35 -07:00
Steve Howell	f471a1779e	export: Handle simple exports with export_from_config(). This handles the simple tables that don't need custom fetches.	2016-08-12 09:54:57 -07:00
Steve Howell	682155778d	export: Add export_with_config(). Subsequent commits will start to use this.	2016-08-12 09:54:57 -07:00
Steve Howell	b0e6d20321	export: Write stats.txt for `./manage.py export <realm>`.	2016-08-12 09:06:10 -07:00
Steve Howell	df3aa39be3	export: Extract write_data_to_file().	2016-08-11 15:51:22 -07:00
Steve Howell	f29b32bbb2	export: Clarify message exporting code. The function to create the message partial files has been renamed to export_partial_message_files(). It now gets its own list of user profile ids and recipient ids from the response, so that we can de-clutter do_export_realm().	2016-08-11 15:51:22 -07:00
Steve Howell	5cd915694a	export: Extract launch_user_message_subprocesses(). This is the last in a series of commits that makes it so that do_export_realm() mostly delegates work out to other functions.	2016-08-11 15:21:30 -07:00
Steve Howell	b383f5ca5d	export: Extract fetch_user_profile_cross_realm().	2016-08-11 15:21:30 -07:00
Steve Howell	fee2106c6f	export: Extract fetch_huddle_objects(). This also removes the dead codepath for include_private=False.	2016-08-11 15:21:30 -07:00
Steve Howell	a6235f6a60	export: Add comments to export_single_user(). (This is a bit of a prefactoring to hopefully create a nice diff in a subsequent commit.)	2016-08-11 15:21:30 -07:00
Steve Howell	6e7fe76cf4	export: s/avatar_bucket/processing_avatars The name avatar_bucket was confusing for a boolean, and in some places it was used for non-S3 paths. I considered the more concise 'is_avatar', but that was still confusing when you are processing multiple files, because you think it's a calculated property on one file instead of an overall codepath switch. I also considered splitting up some functions, but there is a lot of common logic between handling file uploads and avatars that's not trivial to extract into helpers, especially on the S3 side.	2016-08-11 15:21:30 -07:00
Steve Howell	3dab366733	export: Clean up names of upload/avatar export functions. I did some minor moving around of code that made us have one fewer function without any additional conditional logic. The names are more explicit about saying "from_local" and "from_s3". Also, there is less clutter now in do_export_realm(), which is evolving into more of a dispatcher and less of a worker.	2016-08-11 15:21:30 -07:00
Steve Howell	d62a351107	export: Add sanity_check_output().	2016-08-11 15:21:30 -07:00
Steve Howell	06b0df5efc	export: Remove spurious select_related() call for Client.	2016-08-10 14:16:17 -07:00
Steve Howell	cb59a11f0a	export: Extract get_primary_ids().	2016-08-10 14:16:17 -07:00
Steve Howell	90e9083b81	export: Extract filter_by_realm().	2016-08-10 14:16:17 -07:00
Steve Howell	4b6b1b8ad4	export: Extract filter_by_users().	2016-08-10 14:16:17 -07:00
Steve Howell	db9edfce34	export: Use DATE_FIELDS in fix_datetime_fields(). Now we only call this once per table and use DATE_FIELDS to look up the data fields.	2016-08-10 14:16:17 -07:00
Steve Howell	35c59fc4d7	export: Clean up export_messages(). This is pretty minor cleanup, but it makes it a little more explicit what we're writing to the shard file, and it allows us to use a more specific mypy type when calling floatify_datetime_fields.	2016-08-10 14:16:17 -07:00
Steve Howell	1d1f36c0b8	export: Always use subprocesses to export UserMessage. We no longer have an in-process code path to export UserMessage rows. We want to only maintain the subprocess code, which we'll always use in production, and which will work fine in dev.	2016-08-10 14:16:17 -07:00
Steve Howell	78bbefbf94	export: Create import_attachments.	2016-08-10 14:16:17 -07:00
Steve Howell	7ec6a394fe	export: Filter Attachment objects by realm.	2016-08-09 16:47:14 -07:00
Steve Howell	cecfaa7761	export: Extract import_message_data().	2016-08-09 16:47:14 -07:00
Steve Howell	5386ed280e	export: Extract update_id_map(). We also use a vanilla dictionary instead of a defaultdict, so that we explicitly initialize what tables are being re-mapped.	2016-08-09 16:47:14 -07:00
Steve Howell	217ef8a4d2	export: Split fix_foreign_keys() into two functions. We now have convert_to_id_fields for the simple case, and re_map_foreign_keys for the more complex case. I also renamed some parameters and variables.	2016-08-09 16:47:14 -07:00
Steve Howell	dd88ffccfd	export: Extract make_raw() in lib/export.py.	2016-08-09 15:58:27 -07:00
Steve Howell	09fa343bdd	export: Use DATE_FIELDS in floatify_datetime_fields. This avoids a little bit of code duplication, plus it should make it a little easier to add new date fields in the future.	2016-08-09 15:58:27 -07:00
Steve Howell	c14ab3c91f	export: Add annotations to zerver/lib/export.py. I also fixed some small things like removing unnecessary return statements, and adding a TODO. In some cases I explicitly cast stuff at run-time to set() or str() to appease mypy, as well as make it clear to somebody reading the code that the callee might not respect ordering or tolerate unicode.	2016-08-09 15:58:27 -07:00
Steve Howell	f18cc4ae3a	export: Added export_avatars_local_helper().	2016-08-09 15:58:27 -07:00
Tim Abbott	6264ff7039	Add new Zulip realm import/export tool. The previous export tool would only work properly for small realms, and was missing a number of important features: * Export of avatars and uploads from S3 * Export of presence data, activity data, etc. * Faithful export/import of timestamps * Parallel export of messages * Not OOM killing for large realms The new tool runs as a pair of documented management commands, and solves all of those problems. Also we add a new management command for exporting the data of an individual user.	2016-08-08 14:58:18 -07:00

99 Commits