Commit Graph

58 Commits

Author SHA1 Message Date
Anders Kaseorg 56a675d5ec export: Remove unused imports.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
2019-02-02 17:25:27 -08:00
Tim Abbott 9b25f8789f hipchat: Fix handling of user IDs in Stride import.
We've had this code oscillated a few times; the original comparison
was added as part of Stride import but broke HipChat import.
c34a8f2e69 fixed HipChat import but
regressed Stride.

This change fixes this for both HipChat + Stride.
2019-01-31 12:40:05 -08:00
Tim Abbott 4c603990d2 hipchat: Use HTML2Text for the content.
While the result is by no means perfect, it's significantly cleaner
than what we had before this.
2019-01-09 16:59:45 -08:00
Tim Abbott 53436766c1 hipchat: Improve import of public room subscribers.
Now, if you pass an api_key, we'll initialize the public room
subscribers to be whatever they were at the time the import happened.

Also, document the situation on the caveats section.
2019-01-09 16:50:00 -08:00
Tim Abbott 035138dd98 hipchat: Refactor code for building subscriptions.
This moves the filtering of invite-only into the caller, and also
adjusts the indentation.
2019-01-09 16:50:00 -08:00
Tim Abbott c34a8f2e69 hipchat: Fix importing of private messages.
Apparently a stupid typing issue meant that we broke this a few weeks
ago.
2019-01-09 16:50:00 -08:00
Tim Abbott b7fb919dfa hipchat: Don't enable slim_mode by default.
For small organizations, generally one prefers the non-slim_mode
default behavior.
2019-01-04 11:30:25 -08:00
Tim Abbott 492230f405 hipchat import: Always include deleted users in import.
The slim_mode setting had been incorrectly configured to skip
"deleted" users, resulting in bugs where private messages with deleted
users would not be imported.
2019-01-04 11:23:02 -08:00
Tim Abbott 41b7d9a4c8 hipchat: Handle unusual emoticons.json format.
Apparently, hc-migrate can generate emoticons.json files with a
somewhat different format.  Assuming that other files are in the
normal format, we should be able to handle it like this.

See report in #11135.
2018-12-29 18:59:31 -08:00
Tim Abbott 44f117ac72 hipchat: Handle case where emoticons.json is not in export.
Apparently, some methods of exporting from HipChat do not include an
emoticons.json file.  We could test for this using the
`include_emoticons` field in `metadata.json`, but we currently don't
even bother to read that file.  Rather than changing that, we just
print a warning and proceed.  This is arguably better anyway, in that
often not having emoticons.json is the result of user error when
exporting, and it's nice to flag that this is happening.

Fixes #11135.
2018-12-29 18:54:50 -08:00
Tim Abbott 00826486bd hipchat: Fix typo in logging output. 2018-11-26 16:44:31 -08:00
Steve Howell 38f81d5d20 hipchat: Skip public stream subs in slim mode. 2018-11-26 16:37:30 -08:00
Steve Howell c2e9f5eb0a hipchat: Limit messages in slim mode.
For messages with strange senders, we don't import
messages.  Basically, we only import a message if
it has sender with an id that maps to a non-deleted
user.
2018-11-26 16:37:30 -08:00
Steve Howell 3a7788217e hipchat: Skip really long messages. 2018-11-26 16:37:30 -08:00
Steve Howell e57a932692 hipchat: Fix avatars.
This code was not reading any avatars because
it was not referencing 'User' to get to the avatar,
and it was not re-mapping user ids for some reason.
2018-11-26 16:37:30 -08:00
Steve Howell ad35e371fe hipchat: Support slim_mode flag.
We now skip deleted users.  There is a flag
here that's hard coded to True--we may decide
later to make this a command line option.
2018-11-26 16:37:30 -08:00
Steve Howell bd1e96cf63 hipchat: Rework stream/subscriber logic.
We now account for streams having users that
may be deleted.  We do a couple things:

    - use a loop instead of map
    - only pass in users to hipchat_subscriber
    - early-exit if there are not users
    - skip owner/members logic for public streams
2018-11-26 16:37:30 -08:00
Steve Howell 1335dfd295 hipchat: Handle messages with missing recipients.
If a message is for a stream or user that we didn't
load, then we just skip it.
2018-11-26 16:37:30 -08:00
Steve Howell ea26372083 hipchat: Make conversion work with UUID ids from Stride.
Normal hipchat exports use integer ids for their
users and "rooms," which we just borrowed during
conversion.

Atlassian Stride uses stride UUIDs for these instead, but otherwise
has the same export format.

We now introduce IdMapper to handle external ids
that aren't integer.  The IdMapper will map UUID
ids to ints and remember them.  For ints it just
leaves them alone.

Fixes #10805.
2018-11-14 23:22:40 -08:00
Steve Howell d86dd165da gitter/slack/hipchat: Remove "subject" from conversions.
We (lexically) remove "subject" from the conversion code.  The
`build_message` helper calls `set_topic_name` under the hood,
so things still have "subject" in the JSON.

There was good code coverage on `build_message`.
2018-11-12 15:47:11 -08:00
Tim Abbott 81a4c846f4 hipchat: Set s3_path for exported emoji.
This fixes an issue where the import process would fail when importing
to a server using the S3 backend.
2018-11-06 13:02:04 -08:00
Tim Abbott d54af3cb5b hipchat import: Handle deactivated users without an email address.
We saw this in a recent HipChat import data set.
2018-11-01 10:09:19 -07:00
Steve Howell adb458a5df refactor: Use build_user_message for Slack/Gitter.
We now have all three third party
conversions (Gitter/Slack/Hipchat)
go through build_user_message().

Hipchat was already using this helper.

We also avoid callers having to pass in
an id to build_user_message().
2018-10-29 13:24:50 -07:00
Steve Howell 9145cd16cf minor: Change topic for imported hipchat messages. 2018-10-25 14:16:11 -05:00
Steve Howell 78f6e3ac7d hipchat import: Fix data issues with PMs.
We now set the is_private flag on UserMessage
rows for PMs and set their subject to ''.
2018-10-25 09:11:36 -05:00
Steve Howell 272b954790 hipchat import: Add option to mask content.
Masking content can be useful for testing
out conversions where you're dealing
with data from customers and want to avoid
inadvertently reading their content (while
still having semi-realistic messages).
2018-10-25 08:31:01 -05:00
Steve Howell 6e8ae2e3fd hipchat import: Support private stream subscribers.
We now create private stream subscriptions that are
based off of `members` and `owner` from room data
in `rooms.json`.
2018-10-25 08:31:01 -05:00
Steve Howell 25f532ca2f refactor: Break up build_subscriptions.
Having two smaller functions should make it
easier to customize the behavior for each specific
use case.  The only reason they were ever coupled
was to keep ids in sequence, but the recent NEXT_ID
changes make that a non-issue now.
2018-10-25 08:31:01 -05:00
Steve Howell 50f76e58ce conversions: Make NEXT_ID a true singleton.
We now instantiate NEXT_ID in sequencer.py, which avoids
having multiple modules make multiple copies of a sequencer
and possibly causing id collisions.
2018-10-25 08:31:01 -05:00
Steve Howell fe6df1c222 hipchat import: Fix bug w/rogue UserMessage records.
This bug was introduced very recently and is an
aliasing bug.  It caused extra UserMessage rows to
be created as we inadvertently updated the underlying
subscriber_map sets for multiple messages.

This probably mostly affected PMs.

It's doubtful the bug ever got out into the field.
2018-10-24 18:44:18 -05:00
Steve Howell 409e2b4134 hipchat import: Support sender_id == 0 use case. 2018-10-23 17:27:37 -05:00
Steve Howell 876a72c467 hipchat import: Extract get_hipchat_sender_id(). 2018-10-23 17:27:37 -05:00
Steve Howell 481488a35e Extract make_subscriber_map().
We extract this function and put it in the shared
library `import_util.py`.

Also, we make it one time higher up in the call
stack, rather than re-building it for every batch
of messages.  I doubt this was super expensive, but
there's no reason to repeatedly execute this.
2018-10-23 17:27:37 -05:00
Steve Howell 737e02a2e6 hipchat import: Fix PM messages.
Before this fix, we were creating two copies of every
PM Message in zerver_message with only corresponding
UserMessage row.

Now we only create one PM Message per message, which
we accomplish by making sure we only use imported
messages from the sender's history.json file.  And
then we write UserMessage rows for both participants
by making sure to include sender_id in the set of
user_ids that feeds into making UserMessage.  For
the case where you PM yourself, there's just one
UserMessage row.

It does not appear that we need to support huddles
yet.
2018-10-23 17:27:37 -05:00
Steve Howell bd9e4ef0c8 import: Use pub_date to sort message ids.
When we create new ids for message rows, we
now sort the new ids by their corresponding
pub_date values in the rows.

This takes a sizable chunk of memory.

This feature only gets turned on if you
set sort_by_date to True in realm.json.
2018-10-23 17:27:37 -05:00
Steve Howell d1ff903534 refactor: Rename build_user -> build_user_profile.
This makes greps less confusing.
2018-10-23 17:27:37 -05:00
Steve Howell ff61c56f47 hipchat import: Add NotificationMessage support. 2018-10-17 12:11:08 -07:00
Tim Abbott f9b6eeb488 import: Migrate from json to ujson for better perf.
We expect to get better memory performace from
ujson than json.

We also do a better job of closing file handles.

This likely fixes #10377.
2018-10-17 12:11:08 -07:00
Steve Howell b1dd9a251b hipchat import: Break messages into smaller batches.
Even individual "room" files from hipchat can be large,
so we process only 1000 messages at a time
within each file, which produces smaller JSON files.
2018-10-15 10:54:23 -07:00
Steve Howell 6650bb2240 minor: Move fix_mentions() closer to caller. 2018-10-15 10:54:23 -07:00
Steve Howell 219ff0f749 hipchat import: Extract UserHandler class. 2018-10-15 10:54:23 -07:00
Steve Howell 2d523fd668 hipchat import: Extract make_user_messages(). 2018-10-15 10:54:23 -07:00
Steve Howell ca0495cbe6 hipchat import: Support attachments. 2018-10-15 10:54:23 -07:00
Steve Howell d71f3eb1bf hipchat import: Add some more logging. 2018-10-14 09:29:04 -07:00
Steve Howell d933779477 hipchat import: Support PrivateUserMessage data.
We now import PM data from HipChat.
2018-10-13 16:47:44 -07:00
Steve Howell f0c3ee0a2e hipchat import: Write smaller message files.
We now write new message files for each new input
file + message type we process.  This helps the
importer not run out of memory later.
2018-10-13 16:47:44 -07:00
Steve Howell 75fc5d41c9 hipchat import: Refactor write_message_data.
The goal here is to make it easier to handle other
message types by moving the key-specific stuff
to the top of the file.
2018-10-13 16:47:44 -07:00
Steve Howell cc55eb8154 hipchat import: Only process UserMessage rows for now. 2018-10-13 16:47:44 -07:00
Steve Howell 3baac7ddf3 hipchat import: Handle missing emails for guest users. 2018-10-13 16:47:44 -07:00
Steve Howell 23d7b3d2cc import: De-dup create_converted_data_files helper. 2018-10-13 16:47:41 -07:00