Commit Graph

30 Commits

Author SHA1 Message Date
Steve Howell 50f76e58ce conversions: Make NEXT_ID a true singleton.
We now instantiate NEXT_ID in sequencer.py, which avoids
having multiple modules make multiple copies of a sequencer
and possibly causing id collisions.
2018-10-25 08:31:01 -05:00
Steve Howell fe6df1c222 hipchat import: Fix bug w/rogue UserMessage records.
This bug was introduced very recently and is an
aliasing bug.  It caused extra UserMessage rows to
be created as we inadvertently updated the underlying
subscriber_map sets for multiple messages.

This probably mostly affected PMs.

It's doubtful the bug ever got out into the field.
2018-10-24 18:44:18 -05:00
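A hedged sketch of this kind of aliasing bug, under the assumption that subscriber_map maps a recipient id to a set of user ids. The function names here are hypothetical and are not taken from the importer.

```python
from typing import Dict, Set


def user_ids_aliased(subscriber_map: Dict[int, Set[int]],
                     recipient_id: int, sender_id: int) -> Set[int]:
    # Buggy: user_ids is the very same set object stored in the map,
    # so adding the sender changes the map for all later messages.
    user_ids = subscriber_map[recipient_id]
    user_ids.add(sender_id)
    return user_ids


def user_ids_copied(subscriber_map: Dict[int, Set[int]],
                    recipient_id: int, sender_id: int) -> Set[int]:
    # Fixed: copy the set first, then modify only the copy.
    user_ids = set(subscriber_map[recipient_id])
    user_ids.add(sender_id)
    return user_ids
```

With the aliased version, each message can leave an extra user id behind in the shared set, so later messages for the same recipient get extra UserMessage rows.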
Steve Howell 409e2b4134 hipchat import: Support sender_id == 0 use case. 2018-10-23 17:27:37 -05:00
Steve Howell 876a72c467 hipchat import: Extract get_hipchat_sender_id(). 2018-10-23 17:27:37 -05:00
Steve Howell 481488a35e Extract make_subscriber_map().
We extract this function and put it in the shared
library `import_util.py`.

Also, we now build it once, higher up in the call
stack, rather than re-building it for every batch
of messages.  I doubt this was super expensive, but
there's no reason to repeatedly execute this.
2018-10-23 17:27:37 -05:00
Steve Howell 737e02a2e6 hipchat import: Fix PM messages.
Before this fix, we were creating two copies of every
PM Message in zerver_message with only one corresponding
UserMessage row.

Now we only create one PM Message per message, which
we accomplish by making sure we only use imported
messages from the sender's history.json file.  And
then we write UserMessage rows for both participants
by making sure to include sender_id in the set of
user_ids that feeds into making UserMessage.  For
the case where you PM yourself, there's just one
UserMessage row.

It does not appear that we need to support huddles
yet.
2018-10-23 17:27:37 -05:00
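The two rules in this fix can be sketched roughly as below. The sketch assumes that each user's history.json holds both sent and received PMs, and the names `keep_message`, `pm_user_ids`, and `file_owner_id` are hypothetical.

```python
from typing import Set


def keep_message(file_owner_id: int, sender_id: int) -> bool:
    # Assumption: a PM shows up in both participants' history files.
    # Keeping only the copy from the sender's file yields exactly
    # one Message per PM.
    return file_owner_id == sender_id


def pm_user_ids(sender_id: int, receiver_id: int) -> Set[int]:
    # Both participants get a UserMessage row.  Because this is a
    # set, a user who PMs themselves appears only once.
    return {sender_id, receiver_id}
```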
Steve Howell bd9e4ef0c8 import: Use pub_date to sort message ids.
When we create new ids for message rows, we
now sort the new ids by their corresponding
pub_date values in the rows.

This takes a sizable chunk of memory.

This feature only gets turned on if you
set sort_by_date to True in realm.json.
2018-10-23 17:27:37 -05:00
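One way to picture the date-based ordering is the sketch below. It assumes each row is a dict with "id" and "pub_date" keys; the function name `assign_ids_by_date` and the `first_id` parameter are hypothetical, and the real import code may work differently.

```python
from typing import Any, Dict, List


def assign_ids_by_date(rows: List[Dict[str, Any]],
                       first_id: int = 1) -> Dict[int, int]:
    # Sort all rows by pub_date, then hand out new ids in that
    # order, so that a larger id always means a later message.
    ordered = sorted(rows, key=lambda row: row["pub_date"])
    id_map: Dict[int, int] = {}
    for offset, row in enumerate(ordered):
        id_map[row["id"]] = first_id + offset
    return id_map
```

Sorting needs the id and pub_date of every row at once, which is consistent with the memory cost noted in the commit message.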
Steve Howell d1ff903534 refactor: Rename build_user -> build_user_profile.
This makes greps less confusing.
2018-10-23 17:27:37 -05:00
Steve Howell ff61c56f47 hipchat import: Add NotificationMessage support. 2018-10-17 12:11:08 -07:00
Tim Abbott f9b6eeb488 import: Migrate from json to ujson for better perf.
We expect to get better memory performance from
ujson than json.

We also do a better job of closing file handles.

This likely fixes #10377.
2018-10-17 12:11:08 -07:00
Steve Howell b1dd9a251b hipchat import: Break messages into smaller batches.
Even individual "room" files from hipchat can be large,
so we process only 1000 messages at a time
within each file, which produces smaller JSON files.
2018-10-15 10:54:23 -07:00
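The batching described above can be sketched with a small generator. The name `chunked` is hypothetical; only the batch size of 1000 comes from the commit message.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int = 1000) -> Iterator[List[T]]:
    # Yield consecutive slices of at most `size` items each.
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

Each yielded batch could then be converted and written to its own JSON file, so no output file holds more than 1000 messages.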
Steve Howell 6650bb2240 minor: Move fix_mentions() closer to caller. 2018-10-15 10:54:23 -07:00
Steve Howell 219ff0f749 hipchat import: Extract UserHandler class. 2018-10-15 10:54:23 -07:00
Steve Howell 2d523fd668 hipchat import: Extract make_user_messages(). 2018-10-15 10:54:23 -07:00
Steve Howell ca0495cbe6 hipchat import: Support attachments. 2018-10-15 10:54:23 -07:00
Steve Howell d71f3eb1bf hipchat import: Add some more logging. 2018-10-14 09:29:04 -07:00
Steve Howell d933779477 hipchat import: Support PrivateUserMessage data.
We now import PM data from HipChat.
2018-10-13 16:47:44 -07:00
Steve Howell f0c3ee0a2e hipchat import: Write smaller message files.
We now write new message files for each new input
file + message type we process.  This helps the
importer not run out of memory later.
2018-10-13 16:47:44 -07:00
Steve Howell 75fc5d41c9 hipchat import: Refactor write_message_data.
The goal here is to make it easier to handle other
message types by moving the key-specific stuff
to the top of the file.
2018-10-13 16:47:44 -07:00
Steve Howell cc55eb8154 hipchat import: Only process UserMessage rows for now. 2018-10-13 16:47:44 -07:00
Steve Howell 3baac7ddf3 hipchat import: Handle missing emails for guest users. 2018-10-13 16:47:44 -07:00
Steve Howell 23d7b3d2cc import: De-dup create_converted_data_files helper. 2018-10-13 16:47:41 -07:00
Steve Howell 91905bd66a import: Add sequencer library.
This avoids some tedious code related to making ids
in conversion programs.
2018-10-13 16:47:39 -07:00
Steve Howell 9f2aad55b5 hipchat import: Handle users without avatars. 2018-10-12 07:03:25 -04:00
Steve Howell 4b82326376 hipchat import: Support guest users.
We simplify the code for is_realm_admin
and set is_guest as well.

I verified that build_user() is not used
by Slack/Gitter, so the extra argument there
should be fine.

Fixes #10639
2018-10-11 15:28:58 -07:00
Steve Howell 4da664817b hipchat conversion: Add messages. 2018-10-02 16:55:16 -07:00
Steve Howell f296d60dad hipchat conversion: Add emoji support. 2018-10-02 16:55:16 -07:00
Steve Howell 9518b1344a hipchat conversion: Process avatars.
This processes the avatar payloads that we
get in users.json.
2018-10-02 16:55:16 -07:00
Steve Howell c0f15c3860 hipchat conversion: Include deactivated users/streams.
We now include deleted/deactivated data from the old system.
2018-10-02 16:55:16 -07:00
Steve Howell faea26783b Create convert_hipchat_data.
This is a very early version of a tool to convert HipChat
tar files into data files that can be used by the Zulip
import process.

We include the most fundamental entities--users and
streams.  Customers who don't care about past messages
or customizations could start an instance off of this
and start communicating.

Of course, there are a lot of things missing in the
initial version:

    * messages!
    * file assets -- avatars, emojis, attachments
    * probably lots of other minor things

We currently ignore any incoming dates from HipChat data
and just use the current time.  This is consistent with
other imports.

We also don't have any docs yet, although the process
will be extremely similar to the "Slack" process:

    https://zulipchat.com/help/import-from-slack

Also, there's a comment at the top of convert_hipchat_data.py
that describes how to test this in dev mode.

I tested this by following the steps in the comment above.
The users just "show up" in /devlogin, so that's nice, and
you can send messages to other users.  To verify the stream
data you have to go into the gear menu and click on "All
Streams", then you can subscribe and send a message.

Production users will need to get new passwords and
re-subscribe to streams.  We will probably auto-subscribe
all users to public streams.
2018-10-02 16:55:16 -07:00