Commit Graph

105 Commits

Author SHA1 Message Date
Anders Kaseorg 56a675d5ec export: Remove unused imports.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
2019-02-02 17:25:27 -08:00
Hemanth V. Alluri 73d26c8b28 streams: Render and store the stream description from the backend.
This commit does the following three things:
    1. Update stream model to accomodate rendered description.
    2. Render and save the stream rendered description on update.
    3. Render and save stream descriptions on creation.

Further, the stream's rendered description is also sent whenever the
stream's description is being sent.

This is preparatory work for eliminating the use of the
non-authoritative marked.js markdown parser for stream descriptions.
2019-02-01 22:24:18 -08:00
Tim Abbott 9b25f8789f hipchat: Fix handling of user IDs in Stride import.
We've had this code oscillated a few times; the original comparison
was added as part of Stride import but broke HipChat import.
c34a8f2e69 fixed HipChat import but
regressed Stride.

This change fixes this for both HipChat + Stride.
2019-01-31 12:40:05 -08:00
Pragati Agrawal e1772b3b8f tools: Upgrade Pycodestyle and fix new linter errors.
Here, we are upgrading pycodestyle version from 2.4.0 to 2.5.0.

Fixes: #11396.
2019-01-31 12:21:41 -08:00
Matthew Wegner 370cf1a2cb import: Normalize Slackbot String Comparison.
In very old Slack workspaces, slackbot can appear as "Slackbot", and
the import script only checks for "slackbot" (case sensitive).  This
breaks the import--it throws the assert that immediately follows the
test.  I don't know how common this is, but it definitely affected our
import.

The simple fix is to compare against a lowercased-version of the
user's full name.
2019-01-28 14:59:41 -08:00
Tim Abbott 4c603990d2 hipchat: Use HTML2Text for the content.
While the result is by no means perfect, it's significantly cleaner
than what we had before this.
2019-01-09 16:59:45 -08:00
Tim Abbott 53436766c1 hipchat: Improve import of public room subscribers.
Now, if you pass an api_key, we'll initialize the public room
subscribers to be whatever they were at the time the import happened.

Also, document the situation on the caveats section.
2019-01-09 16:50:00 -08:00
Tim Abbott 035138dd98 hipchat: Refactor code for building subscriptions.
This moves the filtering of invite-only into the caller, and also
adjusts the indentation.
2019-01-09 16:50:00 -08:00
Tim Abbott c34a8f2e69 hipchat: Fix importing of private messages.
Apparently a stupid typing issue meant that we broke this a few weeks
ago.
2019-01-09 16:50:00 -08:00
Tim Abbott b7fb919dfa hipchat: Don't enable slim_mode by default.
For small organizations, generally one prefers the non-slim_mode
default behavior.
2019-01-04 11:30:25 -08:00
Tim Abbott 492230f405 hipchat import: Always include deleted users in import.
The slim_mode setting had been incorrectly configured to skip
"deleted" users, resulting in bugs where private messages with deleted
users would not be imported.
2019-01-04 11:23:02 -08:00
Tim Abbott 41b7d9a4c8 hipchat: Handle unusual emoticons.json format.
Apparently, hc-migrate can generate emoticons.json files with a
somewhat different format.  Assuming that other files are in the
normal format, we should be able to handle it like this.

See report in #11135.
2018-12-29 18:59:31 -08:00
Tim Abbott 44f117ac72 hipchat: Handle case where emoticons.json is not in export.
Apparently, some methods of exporting from HipChat do not include an
emoticons.json file.  We could test for this using the
`include_emoticons` field in `metadata.json`, but we currently don't
even bother to read that file.  Rather than changing that, we just
print a warning and proceed.  This is arguably better anyway, in that
often not having emoticons.json is the result of user error when
exporting, and it's nice to flag that this is happening.

Fixes #11135.
2018-12-29 18:54:50 -08:00
Tim Abbott c995e8e2ae import: Ensure presence of basic avatar images for HipChat.
Our HipChat conversion tool didn't properly handle basic avatar
images, resulting in only the medium-size avatar images being imported
properly.  This fixes that bug by asking the import tool to do the
thumbnailing for the basic avatar image (from the .original file) as
well as the medium avatar image.
2018-12-27 17:47:09 -08:00
Tim Abbott 8a90441d2f slack import: Import long-inactive users as long-term idle.
This avoids creating UserMessage rows for long-inactive users in
organizations with many thousands of users.
2018-12-16 18:52:20 -08:00
Tim Abbott a6ca95dfc4 slack import: Fix all messages being imported to one channel.
This was an ugly variable-escape-from-loop regression introduced in
e59ff6e6db.
2018-12-12 17:54:37 -08:00
Tim Abbott d6217eb862 slack import: Fix empty values for custom profile fields.
The Slack import process would incorrectly issue
CustomProfileFieldValue entries with a value of "" for users who
didn't have a given CustomProfileField (especially common for the
"skype" and "phone" fields).  This had no user-visible effect, but
certainly added some clutter in the database.
2018-12-12 12:58:27 -08:00
Tim Abbott e9900b2bdf gitter: Do something reasonable with invalid fullnames. 2018-12-12 10:07:52 -08:00
rht e59ff6e6db slack import: Eliminate need to load all messages into memory.
This works by yielding messages sorted based on timestamp.  Because
the Slack exports are broken into files by date, it's convenient to do
a 2-layer sorting process, where we open all the files for a given
day, and then sort their messages by timestamp before yielding them.

Fixes #10930.
2018-12-05 12:20:50 -08:00
Tim Abbott 48a3975ec0 import: Avoid unnecessary forks when downloading attachments.
The previous implementation used run_parallel incorrectly, passing it
a set of very small jobs (each was to download a single file), which
meant that we'd end up forking once for every file to download.

This correct implementation sends each of N threads 1/N of the files
to download, which is more consistent with the goal of distributing
the download work between N threads.
2018-12-02 13:50:27 -08:00
Tim Abbott 00826486bd hipchat: Fix typo in logging output. 2018-11-26 16:44:31 -08:00
Steve Howell 38f81d5d20 hipchat: Skip public stream subs in slim mode. 2018-11-26 16:37:30 -08:00
Steve Howell c2e9f5eb0a hipchat: Limit messages in slim mode.
For messages with strange senders, we don't import
messages.  Basically, we only import a message if
it has sender with an id that maps to a non-deleted
user.
2018-11-26 16:37:30 -08:00
Steve Howell 3a7788217e hipchat: Skip really long messages. 2018-11-26 16:37:30 -08:00
Steve Howell e57a932692 hipchat: Fix avatars.
This code was not reading any avatars because
it was not referencing 'User' to get to the avatar,
and it was not re-mapping user ids for some reason.
2018-11-26 16:37:30 -08:00
Steve Howell ad35e371fe hipchat: Support slim_mode flag.
We now skip deleted users.  There is a flag
here that's hard coded to True--we may decide
later to make this a command line option.
2018-11-26 16:37:30 -08:00
Steve Howell bd1e96cf63 hipchat: Rework stream/subscriber logic.
We now account for streams having users that
may be deleted.  We do a couple things:

    - use a loop instead of map
    - only pass in users to hipchat_subscriber
    - early-exit if there are not users
    - skip owner/members logic for public streams
2018-11-26 16:37:30 -08:00
Steve Howell 1335dfd295 hipchat: Handle messages with missing recipients.
If a message is for a stream or user that we didn't
load, then we just skip it.
2018-11-26 16:37:30 -08:00
Steve Howell ff68757358 hipchat: Just skip over missing attachments.
It seems like we get a lot of exports with bad
attachment data, and some folks don't necessarily
care, so we just skip for now.
2018-11-26 16:37:30 -08:00
Steve Howell ea26372083 hipchat: Make conversion work with UUID ids from Stride.
Normal hipchat exports use integer ids for their
users and "rooms," which we just borrowed during
conversion.

Atlassian Stride uses stride UUIDs for these instead, but otherwise
has the same export format.

We now introduce IdMapper to handle external ids
that aren't integer.  The IdMapper will map UUID
ids to ints and remember them.  For ints it just
leaves them alone.

Fixes #10805.
2018-11-14 23:22:40 -08:00
Steve Howell aff84cd1e9 hipchat: Skip attachments without paths.
This is a short term workaround.  Some variants
of HipChat exports are missing `path`, and we just
punt for now.
2018-11-14 23:14:13 -08:00
Steve Howell d86dd165da gitter/slack/hipchat: Remove "subject" from conversions.
We (lexically) remove "subject" from the conversion code.  The
`build_message` helper calls `set_topic_name` under the hood,
so things still have "subject" in the JSON.

There was good code coverage on `build_message`.
2018-11-12 15:47:11 -08:00
Tim Abbott e88998e6d4 import: Fix buggy handling of avatars in Slack conversion.
This was a pretty nasty error, where we were accidentally accessing
the parent list in this inner loop function.

This appears to have been introduced as a refactoring bug in
7822ef38c2.
2018-11-08 15:03:39 -08:00
Tim Abbott 8b661f2f03 slack import: Correctly detect the commenting user.
Fixes #10772.
2018-11-06 13:14:23 -08:00
Tim Abbott 81a4c846f4 hipchat: Set s3_path for exported emoji.
This fixes an issue where the import process would fail when importing
to a server using the S3 backend.
2018-11-06 13:02:04 -08:00
Tim Abbott 539e84e9a1 hipchat import: Stop setting last_modified=None.
The last_modified field is intended to support setting the
orig-last-modified field in the S3 backend when importing, basically
to keep track of this bit of pre-export data for debugging.  In the
event that it isn't available, the correct thing to do is not write
out an invalid `last_modified` field; we should just not write it out
at all.
2018-11-06 12:50:36 -08:00
Tim Abbott d54af3cb5b hipchat import: Handle deactivated users without an email address.
We saw this in a recent HipChat import data set.
2018-11-01 10:09:19 -07:00
Steve Howell 30c493ed24 slack import: Generate message_id/reaction_id with NEXT_ID.
This avoids the need to pass tuples of ints around, which
is pretty brittle.
2018-10-29 13:24:50 -07:00
Steve Howell 2f58eb1057 slack import: Extract process_message_files().
This is mostly an extraction, but it does change the
way we calculate `content`.  We append the markdown
links from ALL files to any content that came in the
message itself.

Separating this out also allows us to add more
test coverage for the extracted code.
2018-10-29 13:24:50 -07:00
Steve Howell 00f822a26a conversion: Generate attachment_ids with helpers. 2018-10-29 13:24:50 -07:00
Steve Howell 5cb60f7bea conversions: Use subscriber_map for Slack/Gitter.
We now use subscriber_map for building UserMessage
rows in Slack/Gitter conversions.

This is mostly designed to simplify the code, rather
than having to scan the entire subscribers for each
message.

I am guessing this will improve performance for most
conversions.  We sort small lists on every message,
in order to be deterministic, but the sorting cost
is probably more than offset by avoiding the O(N)
scans across all subscriptions.  Also, it's probably
negligible in the grand scheme of things, compared
to JSON parsing, file I/O, etc.

This commits also fixes some typos with mentioned_users_id ->
mentioned_user_ids and cleans up a test a bit as well.
2018-10-29 13:24:50 -07:00
Steve Howell adb458a5df refactor: Use build_user_message for Slack/Gitter.
We now have all three third party
conversions (Gitter/Slack/Hipchat)
go through build_user_message().

Hipchat was already using this helper.

We also avoid callers having to pass in
an id to build_user_message().
2018-10-29 13:24:50 -07:00
Steve Howell 5194701787 conversions: Use NEXT_ID for usermessage_id.
This is mostly complicated due to the way that the
Slack import passes around tuples of ids to maintain
four different parallel sequences.
2018-10-29 13:24:50 -07:00
Steve Howell 9145cd16cf minor: Change topic for imported hipchat messages. 2018-10-25 14:16:11 -05:00
Steve Howell 78f6e3ac7d hipchat import: Fix data issues with PMs.
We now set the is_private flag on UserMessage
rows for PMs and set their subject to ''.
2018-10-25 09:11:36 -05:00
Steve Howell 272b954790 hipchat import: Add option to mask content.
Masking content can be useful for testing
out conversions where you're dealing
with data from customers and want to avoid
inadvertently reading their content (while
still having semi-realistic messages).
2018-10-25 08:31:01 -05:00
Steve Howell 6e8ae2e3fd hipchat import: Support private stream subscribers.
We now create private stream subscriptions that are
based off of `members` and `owner` from room data
in `rooms.json`.
2018-10-25 08:31:01 -05:00
Steve Howell 25f532ca2f refactor: Break up build_subscriptions.
Having two smaller functions should make it
easier to customize the behavior for each specific
use case.  The only reason they were ever coupled
was to keep ids in sequence, but the recent NEXT_ID
changes make that a non-issue now.
2018-10-25 08:31:01 -05:00
Steve Howell 2ed9fbd25b conversions: Use NEXT_ID for recipient and subscription ids.
The NEXT_ID scheme seems pretty robust, so I'm fixing a
few easy places.
2018-10-25 08:31:01 -05:00
Steve Howell 50f76e58ce conversions: Make NEXT_ID a true singleton.
We now instantiate NEXT_ID in sequencer.py, which avoids
having multiple modules make multiple copies of a sequencer
and possibly causing id collisions.
2018-10-25 08:31:01 -05:00