zulip

Commit Graph

Author	SHA1	Message	Date
Vishnu Ks	ce1d6044db	import: Replace data-stream-id in rendered_content. See the data-user-id commit for details.	2019-05-28 12:53:20 -07:00
Vishnu Ks	cb5b3f347b	import: Replace data-user-id in rendered_content with new user id. Previously, if you exported a Zulip organization and then re-imported it, we'd end up renumbering the user IDs and all direct foreign key references to them in the database, but not the data-user-id references in mentions. Fix this by parsing the message content and doing that renumbering. (Because we import raw markdown, not HTML, from third-party tools, these changes won't affect data import from slack etc.) Fixes the high-priority part of #11293.	2019-05-28 12:53:19 -07:00
Anders Kaseorg	643bd18b9f	lint: Fix code that evaded our lint checks for string % non-tuple. Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2019-04-23 15:21:37 -07:00
Challa Venkata Raghava Reddy	b69aec2dbc	streams: Add first_message_id tracking first message in stream. This field is primarily intended to support avoiding displaying the "more topics" feature in new organizations and streams, where we might know that all messages in the stream are already available in the browser. Based on original work by Roman Godov, and significantly modified by tabbott. The second migration involved here could be expensive on Zulip Cloud, but is unlikely to be an issue on other servers.	2019-03-11 13:30:49 -07:00
Hemanth V. Alluri	ae126c452b	stream-descriptions: Create wrapper for rendering stream descriptions. In commit `de65a04` we can see that if the need ever arises to modify how stream descriptions are rendered, we would need to make changes at 5 different call points which can be quite cumbersome. So this functionality has been extracted to a new method called 'render_stream_descriptions'.	2019-03-06 17:16:14 -08:00
Bennet Sunder	7c5f316cb8	alert_words: Performance improvements in looking for alert_words. This commit leverages the ahocorasick algorithm to build a set of user_ids that have their alert_words present in the message. It runs in linear time of the order of length of the input message as opposed to number of alert_words. This is after building an ahocorasick Automaton which runs in O(number of alert_words in entire realm) which is usually cached.	2019-03-01 15:36:39 -08:00
Tim Abbott	de65a04ae0	streams: Disable inline URL preview when rendering stream descriptions. We want to use the baseline features of bugdown, but not fancy things like inline URL previews, since the whole structure of stream descriptions is to have a single-line thing supporting some formatting. The migration part of this change fixes a bug encountered by some organizations upgrading from older versions of Zulip.	2019-02-28 17:00:40 -08:00
Tim Abbott	4d08461ab1	import: Set plan_type to SELF_HOSTED on import. We've for a while had logic to set plan_type to LIMITED when importing into Zulip Cloud; we need corresponding logic to set it to SELF_HOSTED when importing into a self-hosted server. Fixes #11541.	2019-02-12 16:01:02 -08:00
Wyatt Hoodes	9c68a97472	import/export: Use separate analytics.json for analytics data. This helps keep the realm.json small and easy to process; previously, almost the entire size of that file was the analytics data. We implement this by refactoring the analytics Config objects into a separate subroutine that writes to a separate file, plus the corresponding import code. Manual testing was performed by exporting the 'analytics' realm, and importing back to a newly created 'test' realm. The 'test' realm was then exported and the json files were inspected. The data appeared consistent with no abnormalities. Fixes: #11220.	2019-02-04 10:59:24 -08:00
Hemanth V. Alluri	73d26c8b28	streams: Render and store the stream description from the backend. This commit does the following three things: 1. Update stream model to accommodate rendered description. 2. Render and save the stream rendered description on update. 3. Render and save stream descriptions on creation. Further, the stream's rendered description is also sent whenever the stream's description is being sent. This is preparatory work for eliminating the use of the non-authoritative marked.js markdown parser for stream descriptions.	2019-02-01 22:24:18 -08:00
Vishnu Ks	bec875a9af	import realm: Use processes for resizing avatar images. This should significantly improve the data import performance when importing large open source realms from Slack. Fixes #11009.	2019-01-25 12:37:12 -08:00
Tim Abbott	dfaa2e481d	import: Log a warning when avatars can't be thumbnailed. This fixes a potential crash in the import tool if a single user has a broken avatar image.	2019-01-15 16:48:04 -08:00
Tim Abbott	6eda129741	export: Export and import analytics table data. This should eliminate the need to do manual analytics work when importing organizations imported/exported using the zulip -> zulip import/export tools.	2019-01-04 16:22:18 -08:00
Tim Abbott	48ccb3ad18	import: Move realm_tables to the appropriate file. These had ended up in the wrong place when we split export from import.	2019-01-04 16:22:18 -08:00
Tim Abbott	b33e0ad539	import: Fix pointer logic to sort by message_id. Previously, the pointer calculation logic wasn't sorting by message ID, which caused the database queries to not properly use the indexes they should.	2019-01-04 16:22:18 -08:00
Tim Abbott	a1919971e4	import: Handle invalid data-user-id values for mentions. This is an issue with zulip -> zulip server data imports.	2019-01-02 15:23:09 -08:00
Tim Abbott	b63f8b59b2	import: Handle corner case around EMAIL_GATEWAY_BOT emails.	2019-01-02 15:23:09 -08:00
Tim Abbott	8cfea958de	import: Fix pointer logic for zulip->zulip imports. Previously, the pointer was almost guaranteed to be an invalid random value, because we renumber message IDs unconditionally now.	2019-01-02 15:23:09 -08:00
Tim Abbott	74ff77d366	import: Always set a valid content-type for S3 backend. The octet-stream content type is potentially under-specified, but it's better than potentially submitting None and increases consistency of this part of the codebase.	2018-12-29 22:13:11 -08:00
Tim Abbott	f0c7424957	import: Fix sending floats to boto S3 metadata keys. The boto library's s3 interface allows setting only string-format metadata keys. So we need to cast the last_modified floating-point timestamp into a string before storing on the S3 object. This bug mostly broke uploading avatars when using the S3 storage backend.	2018-12-29 22:09:31 -08:00
Tim Abbott	c995e8e2ae	import: Ensure presence of basic avatar images for HipChat. Our HipChat conversion tool didn't properly handle basic avatar images, resulting in only the medium-size avatar images being imported properly. This fixes that bug by asking the import tool to do the thumbnailing for the basic avatar image (from the .original file) as well as the medium avatar image.	2018-12-27 17:47:09 -08:00
Rishi Gupta	8a95526ced	billing: Always transition to Realm.LIMITED via do_change_plan_type. Fixes a bug in import_realm where secondary attributes like message visibility weren't being set, and also makes bugs like this less likely in the future. Also, putting the plan_type change at the end of import_realm, so that future restrictions to LIMITED realms don't affect the import process.	2018-12-13 13:26:24 -08:00
Tim Abbott	1adc40f014	import: Deduplicate functions for uploading to S3/files. We've had a long stream of bugs that existed because only one of these two code paths was tested (usually the local uploads backend). By deduplicating these functions, we ensure that this category of bugs no longer happens. Following my recent refactor, this is just a straightforward merge, with code for one or the other backend ending up inside an if statement.	2018-12-05 16:15:01 -08:00
Tim Abbott	c9b801efde	import: Use the s3_path attribute for path_maps unconditionally. While the s3_path is almost always the same as the path, structurally, `path` is the location in the export object, whereas s3_path is the URL path.	2018-12-05 16:15:01 -08:00
Tim Abbott	f4c5a45f4f	import: Fix S3 paths for imported avatar PNG. Previously, we were incorrectly importing avatar PNGs to a filename without the .png extension, resulting in them effectively not being imported. This was mitigated by the fact that we imported the originals and ran the appropriate `ensure_` functions, but still a bug.	2018-12-05 16:15:01 -08:00
Tim Abbott	412dc8dcda	import: Set last_modified in import_uploads_local. This has no effect other than to make the S3 and local code paths more nearly identical.	2018-12-05 16:15:01 -08:00
Tim Abbott	d8d0492d64	import: Restructure uploads path logic to be more similar. This is preparation for future deduplication of the two redundant uploads backends.	2018-12-05 16:15:01 -08:00
Tim Abbott	671ceccd78	import: Deduplicate medium avatars special logic. This requires a bit of care with upload_backend to avoid breaking how we mock that class in our tests.	2018-12-05 16:15:01 -08:00
Tim Abbott	36b43a6d7a	import: Deduplicate first block of import_uploads logic.	2018-12-05 16:15:01 -08:00
Tim Abbott	f80bab58c0	import_realm: Add progress indicator for importing uploads. This makes it easier to see how we're doing when uploading a very large number of files.	2018-12-05 16:15:01 -08:00
Steve Howell	88f50b97fd	import: Render content before inserting messages. By rendering content before bulk importing messages, we avoid O(N) database hops.	2018-11-07 10:33:11 -08:00
Steve Howell	bf3f7d93d0	Simplify params for fix_message_rendered_content.	2018-11-07 10:33:11 -08:00
Steve Howell	0878d86706	import: Avoid unnecessary Message lookups. We now no longer go to the DB to get a Message object during render.	2018-11-07 10:33:11 -08:00
Steve Howell	1e12b13a56	import: Avoid unnecessary sender lookups. This commit speeds up the import by avoiding sender lookups and instead using the data for users that we already have in memory. This avoids a few DB hops, many hops to memcached, plus some object construction. We now call do_render_markdown() directly. This also makes it more explicit that the import has never rendered alert words.	2018-11-07 10:33:10 -08:00
Steve Howell	f9a7451167	import: Pass in realm to render codepath. We avoid querying the same realm multiple times.	2018-11-07 10:08:46 -08:00
Steve Howell	92a7f04149	import: Inline save_message_rendered_content(). This function requires a message object, whereas we want to work with JSON data to avoid unnecessary queries when we import data. Inlining the function sets us up for a subsequent refactoring. We change the way we deal with theoretical return values of `None` to use an assertion; otherwise, we would have to loosen up a bunch of mypy types from `str` to `Optional[str]`. It's not clear `None` is even possible--we've moved toward throwing exceptions there instead of silently failing.	2018-11-07 10:08:45 -08:00
Tim Abbott	e14a35b490	import: Don't assume a last_modified key is present. This fixes an exception when importing uploaded file data from Slack/HipChat.	2018-11-07 09:52:35 -08:00
Tim Abbott	1bf385e35f	import: Avoid sending a content-type of None to S3. The previous logic was incorrect, in that if `content_type` was set to None (which happens with Slack/HipChat export, among other things), then we wouldn't run the `guess_type` logic to auto-detect the Content-Type to send to S3.	2018-11-06 13:03:14 -08:00
Steve Howell	a092bee6b3	import: Reduce memory usage for UserMessage ids. The UserMessage table can be huge, so creating a bunch of entries in `ID_MAP` can overflow memory. We don't have any tables that depend on `UserMessage`, and we don't send the 'id' fields from `zerver_usermessage` to the database, so re-mapping them was just busy-work.	2018-11-05 10:18:01 -08:00
Steve Howell	53436b4b41	import: Rename id_maps -> ID_MAP.	2018-10-23 17:27:37 -05:00
Steve Howell	bd9e4ef0c8	import: Use pub_date to sort message ids. When we create new ids for message rows, we now sort the new ids by their corresponding pub_date values in the rows. This takes a sizable chunk of memory. This feature only gets turned on if you set sort_by_date to True in realm.json.	2018-10-23 17:27:37 -05:00
Steve Howell	2d4b09f59d	utils: Add process_list_in_batches().	2018-10-15 10:54:23 -07:00
Steve Howell	493aae2958	imports: Make loading UserMessage faster and more robust. We use UserMessageLite to avoid Django overhead, and we do updates in chunks of 10000. (The export may be broken into several files already, but a reasonable chunking at import time is good defense against running out of memory.)	2018-10-13 16:43:28 -07:00
Steve Howell	329154da32	import: Speed up create_subscription_events(). The code was needlessly querying the DB to get full objects for entities where we only needed user_id, realm_id, and stream_id. With my test data of ~1000 records this sped up the function from ~8s to ~0.5s. The speedup would probably be even more for larger data sets.	2018-10-02 16:55:16 -07:00
Tim Abbott	a0451b692f	import: Move zerver_client import before realm import. This table is independent of the realm/stream table dance, and moving it here helps make the flow read more clearly.	2018-09-21 10:58:24 -07:00
Rishi Gupta	b470cef864	import: Set Realm.plan_type to SELF_HOSTED on import. Tweaked by tabbott to avoid an unnecessary .save().	2018-09-21 10:57:22 -07:00
Tim Abbott	e2bd03365e	import: Fix handling of recipient IDs for welcome bot. If any user had sent the reply to the welcome bot recommended by our tutorial, then the Zulip export/import process didn't work properly, because we weren't including (and then remapping) the recipient ID for sending PMs to the cross-realm bots. This commit fixes that gap, by recording the necessary data on the export side, and doing the appropriate remapping on the import side.	2018-09-20 17:55:17 -07:00
Tim Abbott	c9189439de	import: Handle signup_notifications_stream_id. Previously, our realm import logic only did the special remapping logic for the original notifications_stream_id; when we added the new signup_notifications_stream_id field, we neglected to handle it in the same way.	2018-09-20 17:41:55 -07:00
Rhea Parekh	20bca1409f	import: Set emoji records 'last_modified' value in 'import_uploads_s3'. The 'last_modified' value in emoji records is needed for uploading the file to the S3 backend. We set the same in the function 'import_uploads_s3'. We also have to remove the keyword 'last_modified' while building the RealmEmoji dict, as it is not a field which exists in RealmEmoji objects.	2018-08-10 16:20:36 -07:00
Tim Abbott	2f6f38fa7f	import: Guess upload content-types when unavailable from export. This is mostly for exports from other software like Slack, that might not provide a content-type.	2018-08-10 09:32:28 -07:00
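
Commits cb5b3f347b and a1919971e4 above describe renumbering the data-user-id attributes of mentions in rendered_content during a zulip -> zulip import. The following is only a minimal sketch of that idea, not the code from those commits: the function name `remap_user_mentions`, the `user_id_map` argument, and the use of a regular expression instead of a real HTML parser are all assumptions made for illustration.

```python
import re
from typing import Dict, Match

# Hypothetical pattern: matches only numeric data-user-id attribute values.
USER_ID_ATTR = re.compile(r'data-user-id="(\d+)"')


def remap_user_mentions(rendered_content: str, user_id_map: Dict[int, int]) -> str:
    """Rewrite data-user-id attributes using an old-id -> new-id mapping.

    Ids missing from the mapping, and non-numeric values, are left
    untouched rather than raising an error.
    """

    def substitute(match: Match[str]) -> str:
        old_id = int(match.group(1))
        new_id = user_id_map.get(old_id)
        if new_id is None:
            # Unknown user: keep the original attribute as-is.
            return match.group(0)
        return 'data-user-id="{}"'.format(new_id)

    return USER_ID_ATTR.sub(substitute, rendered_content)


# Illustrative input; the markup shape is an assumption, not taken from the source.
html = '<span class="user-mention" data-user-id="7">@Iago</span>'
remapped = remap_user_mentions(html, {7: 42})
```

A production version would more likely parse the HTML properly, as the commit message says the fix works by parsing the message content; the regex here only keeps the sketch short.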


81 Commits
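
Several of the commits listed above (2f6f38fa7f, 1bf385e35f, 74ff77d366) concern choosing a Content-Type for imported uploads when the export does not supply one. The fallback order they describe might be sketched as follows; the helper name `pick_content_type` and its parameters are hypothetical, and only the standard-library `mimetypes.guess_type` call is a real API.

```python
import mimetypes
from typing import Optional

# Generic fallback mentioned in commit 74ff77d366.
DEFAULT_CONTENT_TYPE = "application/octet-stream"


def pick_content_type(content_type: Optional[str], file_name: str) -> str:
    """Return a usable Content-Type, never None.

    1. Use the value from the export record when present.
    2. Otherwise guess from the file name.
    3. Otherwise fall back to a generic binary type.
    """
    if content_type is not None:
        return content_type
    guessed, _encoding = mimetypes.guess_type(file_name)
    if guessed is not None:
        return guessed
    return DEFAULT_CONTENT_TYPE


# Example: a Slack-style record with no content type recorded.
avatar_type = pick_content_type(None, "avatar.png")
```

Checking `is not None` explicitly, rather than relying on a missing dictionary key, reflects the bug described in 1bf385e35f, where a key that was present but set to None skipped the guessing step.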