zulip

Commit Graph

Author	SHA1	Message	Date
Steve Howell	ea26372083	hipchat: Make conversion work with UUID ids from Stride. Normal hipchat exports use integer ids for their users and "rooms," which we just borrowed during conversion. Atlassian Stride uses stride UUIDs for these instead, but otherwise has the same export format. We now introduce IdMapper to handle external ids that aren't integers. The IdMapper will map UUID ids to ints and remember them. For ints it just leaves them alone. Fixes #10805.	2018-11-14 23:22:40 -08:00
Steve Howell	aff84cd1e9	hipchat: Skip attachments without paths. This is a short term workaround. Some variants of HipChat exports are missing `path`, and we just punt for now.	2018-11-14 23:14:13 -08:00
Steve Howell	d86dd165da	gitter/slack/hipchat: Remove "subject" from conversions. We (lexically) remove "subject" from the conversion code. The `build_message` helper calls `set_topic_name` under the hood, so things still have "subject" in the JSON. There was good code coverage on `build_message`.	2018-11-12 15:47:11 -08:00
Tim Abbott	e88998e6d4	import: Fix buggy handling of avatars in Slack conversion. This was a pretty nasty error, where we were accidentally accessing the parent list in this inner loop function. This appears to have been introduced as a refactoring bug in `7822ef38c2`.	2018-11-08 15:03:39 -08:00
Tim Abbott	8b661f2f03	slack import: Correctly detect the commenting user. Fixes #10772.	2018-11-06 13:14:23 -08:00
Tim Abbott	81a4c846f4	hipchat: Set s3_path for exported emoji. This fixes an issue where the import process would fail when importing to a server using the S3 backend.	2018-11-06 13:02:04 -08:00
Tim Abbott	539e84e9a1	hipchat import: Stop setting last_modified=None. The last_modified field is intended to support setting the orig-last-modified field in the S3 backend when importing, basically to keep track of this bit of pre-export data for debugging. In the event that it isn't available, the correct thing to do is not write out an invalid `last_modified` field; we should just not write it out at all.	2018-11-06 12:50:36 -08:00
Tim Abbott	d54af3cb5b	hipchat import: Handle deactivated users without an email address. We saw this in a recent HipChat import data set.	2018-11-01 10:09:19 -07:00
Steve Howell	30c493ed24	slack import: Generate message_id/reaction_id with NEXT_ID. This avoids the need to pass tuples of ints around, which is pretty brittle.	2018-10-29 13:24:50 -07:00
Steve Howell	2f58eb1057	slack import: Extract process_message_files(). This is mostly an extraction, but it does change the way we calculate `content`. We append the markdown links from ALL files to any content that came in the message itself. Separating this out also allows us to add more test coverage for the extracted code.	2018-10-29 13:24:50 -07:00
Steve Howell	00f822a26a	conversion: Generate attachment_ids with helpers.	2018-10-29 13:24:50 -07:00
Steve Howell	5cb60f7bea	conversions: Use subscriber_map for Slack/Gitter. We now use subscriber_map for building UserMessage rows in Slack/Gitter conversions. This is mostly designed to simplify the code, rather than having to scan the entire subscribers for each message. I am guessing this will improve performance for most conversions. We sort small lists on every message, in order to be deterministic, but the sorting cost is probably more than offset by avoiding the O(N) scans across all subscriptions. Also, it's probably negligible in the grand scheme of things, compared to JSON parsing, file I/O, etc. This commit also fixes some typos with mentioned_users_id -> mentioned_user_ids and cleans up a test a bit as well.	2018-10-29 13:24:50 -07:00
Steve Howell	adb458a5df	refactor: Use build_user_message for Slack/Gitter. We now have all three third party conversions (Gitter/Slack/Hipchat) go through build_user_message(). Hipchat was already using this helper. We also avoid callers having to pass in an id to build_user_message().	2018-10-29 13:24:50 -07:00
Steve Howell	5194701787	conversions: Use NEXT_ID for usermessage_id. This is mostly complicated due to the way that the Slack import passes around tuples of ids to maintain four different parallel sequences.	2018-10-29 13:24:50 -07:00
Steve Howell	9145cd16cf	minor: Change topic for imported hipchat messages.	2018-10-25 14:16:11 -05:00
Steve Howell	78f6e3ac7d	hipchat import: Fix data issues with PMs. We now set the is_private flag on UserMessage rows for PMs and set their subject to ''.	2018-10-25 09:11:36 -05:00
Steve Howell	272b954790	hipchat import: Add option to mask content. Masking content can be useful for testing out conversions where you're dealing with data from customers and want to avoid inadvertently reading their content (while still having semi-realistic messages).	2018-10-25 08:31:01 -05:00
Steve Howell	6e8ae2e3fd	hipchat import: Support private stream subscribers. We now create private stream subscriptions that are based off of `members` and `owner` from room data in `rooms.json`.	2018-10-25 08:31:01 -05:00
Steve Howell	25f532ca2f	refactor: Break up build_subscriptions. Having two smaller functions should make it easier to customize the behavior for each specific use case. The only reason they were ever coupled was to keep ids in sequence, but the recent NEXT_ID changes make that a non-issue now.	2018-10-25 08:31:01 -05:00
Steve Howell	2ed9fbd25b	conversions: Use NEXT_ID for recipient and subscription ids. The NEXT_ID scheme seems pretty robust, so I'm fixing a few easy places.	2018-10-25 08:31:01 -05:00
Steve Howell	50f76e58ce	conversions: Make NEXT_ID a true singleton. We now instantiate NEXT_ID in sequencer.py, which avoids having multiple modules make multiple copies of a sequencer and possibly causing id collisions.	2018-10-25 08:31:01 -05:00
Steve Howell	fe6df1c222	hipchat import: Fix bug w/rogue UserMessage records. This bug was introduced very recently and is an aliasing bug. It caused extra UserMessage rows to be created as we inadvertently updated the underlying subscriber_map sets for multiple messages. This probably mostly affected PMs. It's doubtful the bug ever got out into the field.	2018-10-24 18:44:18 -05:00
Steve Howell	409e2b4134	hipchat import: Support sender_id == 0 use case.	2018-10-23 17:27:37 -05:00
Steve Howell	876a72c467	hipchat import: Extract get_hipchat_sender_id().	2018-10-23 17:27:37 -05:00
Steve Howell	481488a35e	Extract make_subscriber_map(). We extract this function and put it in the shared library `import_util.py`. Also, we make it one time higher up in the call stack, rather than re-building it for every batch of messages. I doubt this was super expensive, but there's no reason to repeatedly execute this.	2018-10-23 17:27:37 -05:00
Steve Howell	737e02a2e6	hipchat import: Fix PM messages. Before this fix, we were creating two copies of every PM Message in zerver_message with only one corresponding UserMessage row. Now we only create one PM Message per message, which we accomplish by making sure we only use imported messages from the sender's history.json file. And then we write UserMessage rows for both participants by making sure to include sender_id in the set of user_ids that feeds into making UserMessage. For the case where you PM yourself, there's just one UserMessage row. It does not appear that we need to support huddles yet.	2018-10-23 17:27:37 -05:00
Steve Howell	bd9e4ef0c8	import: Use pub_date to sort message ids. When we create new ids for message rows, we now sort the new ids by their corresponding pub_date values in the rows. This takes a sizable chunk of memory. This feature only gets turned on if you set sort_by_date to True in realm.json.	2018-10-23 17:27:37 -05:00
Steve Howell	d1ff903534	refactor: Rename build_user -> build_user_profile. This makes greps less confusing.	2018-10-23 17:27:37 -05:00
Steve Howell	ff61c56f47	hipchat import: Add NotificationMessage support.	2018-10-17 12:11:08 -07:00
Tim Abbott	f9b6eeb488	import: Migrate from json to ujson for better perf. We expect to get better memory performance from ujson than json. We also do a better job of closing file handles. This likely fixes #10377.	2018-10-17 12:11:08 -07:00
Tim Abbott	78a15dd715	slack import: Fix obscure email address for Slackbot. Since we know what slackbot is, we don't need to give it a crazy hash as its email address.	2018-10-16 16:33:41 -07:00
Steve Howell	b1dd9a251b	hipchat import: Break messages into smaller batches. Even individual "room" files from hipchat can be large, so we process only 1000 messages at a time within each file, which produces smaller JSON files.	2018-10-15 10:54:23 -07:00
Steve Howell	6650bb2240	minor: Move fix_mentions() closer to caller.	2018-10-15 10:54:23 -07:00
Steve Howell	219ff0f749	hipchat import: Extract UserHandler class.	2018-10-15 10:54:23 -07:00
Steve Howell	2d523fd668	hipchat import: Extract make_user_messages().	2018-10-15 10:54:23 -07:00
Steve Howell	ca0495cbe6	hipchat import: Support attachments.	2018-10-15 10:54:23 -07:00
Steve Howell	d71f3eb1bf	hipchat import: Add some more logging.	2018-10-14 09:29:04 -07:00
Steve Howell	d933779477	hipchat import: Support PrivateUserMessage data. We now import PM data from HipChat.	2018-10-13 16:47:44 -07:00
Steve Howell	f0c3ee0a2e	hipchat import: Write smaller message files. We now write new message files for each new input file + message type we process. This helps the importer not run out of memory later.	2018-10-13 16:47:44 -07:00
Steve Howell	75fc5d41c9	hipchat import: Refactor write_message_data. The goal here is to make it easier to handle other message types by moving the key-specific stuff to the top of the file.	2018-10-13 16:47:44 -07:00
Steve Howell	cc55eb8154	hipchat import: Only process UserMessage rows for now.	2018-10-13 16:47:44 -07:00
Steve Howell	3baac7ddf3	hipchat import: Handle missing emails for guest users.	2018-10-13 16:47:44 -07:00
Steve Howell	8accc60ca7	import_util: Support multiple message ids for attachments.	2018-10-13 16:47:44 -07:00
Steve Howell	23d7b3d2cc	import: De-dup create_converted_data_files helper.	2018-10-13 16:47:41 -07:00
Steve Howell	91905bd66a	import: Add sequencer library. This avoids some tedious code related to making ids in conversion programs.	2018-10-13 16:47:39 -07:00
Steve Howell	9f2aad55b5	hipchat import: Handle users without avatars.	2018-10-12 07:03:25 -04:00
Steve Howell	4b82326376	hipchat import: Support guest users. We simplify the code for is_realm_admin and set is_guest as well. I verified that build_user() is not used by Slack/Gitter, so the extra argument there should be fine. Fixes #10639.	2018-10-11 15:28:58 -07:00
Steve Howell	4da664817b	hipchat conversion: Add messages.	2018-10-02 16:55:16 -07:00
Steve Howell	f296d60dad	hipchat conversion: Add emoji support.	2018-10-02 16:55:16 -07:00
Steve Howell	9518b1344a	hipchat conversion: Process avatars. This processes the avatar payloads that we get in users.json.	2018-10-02 16:55:16 -07:00
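
The id-mapping behaviour described in commit ea26372083 (UUID ids are mapped to ints and remembered, integer ids are left alone) can be sketched roughly as follows. This is a minimal illustration only, not the code from that commit: the `get` method name, the internal `_map` dict, and the starting counter value are all assumptions made for the example.

```python
from typing import Dict, Union

# External ids are ints in normal HipChat exports and UUID strings in Stride exports.
ExternalId = Union[int, str]


class IdMapper:
    """Hypothetical sketch: turn non-integer external ids into stable ints."""

    def __init__(self) -> None:
        self._map: Dict[str, int] = {}  # remembers ids we have already assigned
        self._next_id = 1               # assumed starting value for new ids

    def get(self, external_id: ExternalId) -> int:
        # Integer ids are passed through unchanged.
        if isinstance(external_id, int):
            return external_id
        # Non-integer ids get the next free int the first time we see them,
        # and the same int on every later lookup.
        if external_id not in self._map:
            self._map[external_id] = self._next_id
            self._next_id += 1
        return self._map[external_id]


mapper = IdMapper()
first = mapper.get("5bd8a5e5-aaaa-bbbb-cccc-000000000001")  # made-up UUID
again = mapper.get("5bd8a5e5-aaaa-bbbb-cccc-000000000001")
unchanged = mapper.get(42)
```

One caveat of this sketch: it does nothing to keep assigned ints from colliding with integer ids that are passed through, so it assumes a given export uses one id style or the other.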

76 Commits