Commit Graph

288 Commits

Author SHA1 Message Date
Anders Kaseorg 50e6cba1af ruff: Fix UP032 Use f-string instead of `format` call.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-07-19 16:14:59 -07:00
Anders Kaseorg 052984bc14 utils: Remove make_safe_digest wrapper.
It’s unclear what was supposed to be “safe” about this wrapper.  The
hashlib API is fine without it, and we don’t want to encourage further
use of SHA-1.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-07-19 10:54:05 -07:00
Steve Howell 67cdf1a7b4 emojis: Use get_emoji_data.
The previous function was poorly named, asked for a
Realm object when realm_id sufficed, and returned a
tuple of strings that had different semantics.

I also avoid calling it duplicate times in a couple
places, although it was probably rarely the case that
both invocations actually happened if upstream
validations were working.

Note that there is a TypedDict called EmojiInfo, so I
chose EmojiData here.  Perhaps a better name would be
TinyEmojiData or something.

I also simplify the reaction tests with a verify
helper.
2023-07-17 09:35:53 -07:00
Alex Vandiver 21aeb4a040 slack: Handle the special case of permissions denied on team.info call.
This is a follow-up to 4c8915c8e4, for
the case when the `team:read` permission is missing, which causes the
`team.info` call itself to fail.  The error message supplies
information about the provided and missing permissions -- but it also
still sends the `X-OAuth-Scopes` header which we normall read, so we can
use that as normal.
2023-06-27 11:04:41 -07:00
Lauryn Menard d3f7cfccbc zerver: Update comments with "private message" or "PM".
Updates comments/doc-strings that use "private message" or "PM" in
files in the `/zerver` directory to instead use "direct message".
2023-06-23 11:24:13 -07:00
Lauryn Menard 2eeeda7694 mattermost: Update references to "private message" and "PM".
Updates references to "private message" and "PM" in the data import
and related tests for Mattermost to be "direct message" or "DM"
instead.
2023-06-23 11:24:13 -07:00
Alex Vandiver 4c8915c8e4 slack: Provide more information when a Slack token fails to validate. 2023-06-23 11:09:45 -07:00
rht 1c84f02f57 slack import: Convert threads to nicely named Zulip topics.
Fixes #9006.
2023-05-30 16:35:19 -07:00
Anders Kaseorg 9797de52a0 ruff: Fix RUF010 Use conversion in f-string.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-05-26 22:09:18 -07:00
Alex Vandiver 724de9cd49 rocketchat: Treat users with "bot" roles as bots when importing.
We previously relied on `type`, but we have observed bots typed with a
`bot` role as well.
2023-05-16 15:10:58 -07:00
Alex Vandiver 34394cec9a rocketchat: Handle users with no email address set.
Fixes: #25596.
2023-05-16 15:10:58 -07:00
Mateusz Mandera ffa3aa8487 auth: Rewrite data model for tracking enabled auth backends.
So far, we've used the BitField .authentication_methods on Realm
for tracking which backends are enabled for an organization. This
however made it a pain to add new backends (requiring altering the
column and a migration - particularly troublesome if someone wanted to
create their own custom auth backend for their server).

Instead this will be tracked through the existence of the appropriate
rows in the RealmAuthenticationMethods table.
2023-04-18 09:22:56 -07:00
Alex Vandiver 567d1d54e7 upload: Rename upload_message_file to use word "attachment".
For consistency with the table, which is named Attachment.
2023-03-02 16:36:19 -08:00
Alex Vandiver fe654b76b7 data_import: Stop tar'ing up converted data.
`./manage.py import` does not take a tarball; it takes a directory.
Making a separate tarball is a waste of CPU time and disk, as it is
never used.

This was included in the commit of the initial Slack conversion code
in 5b37c5562b and propagated from there into every conversion tool.

Remove the unnecessary tarball creation.
2023-02-26 17:42:01 -08:00
Anders Kaseorg df001db1a9 black: Reformat with Black 23.
Black 23 enforces some slightly more specific rules about empty line
counts and redundant parenthesis removal, but the result is still
compatible with Black 22.

(This does not actually upgrade our Python environment to Black 23
yet.)

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-02-02 10:40:13 -08:00
Alex Vandiver 92c8c17190 import: Add the UTF-8 flag on file entries in zipfiles from Slack.
Fixes: #22533.
2023-01-31 16:07:48 -08:00
Anders Kaseorg e5d671bf2b ruff: Fix SIM210 Use `bool(…)` instead of `True if … else False`.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-01-23 11:18:36 -08:00
Anders Kaseorg b8b29dc3ad ruff: Fix SIM110 Use `return any(…)` instead of `for` loop.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-01-23 11:18:36 -08:00
Anders Kaseorg 8f7a7877fe python: Clean up janky URL matching code with urlsplit.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-01-18 17:25:46 -05:00
Alex Vandiver 7c0d414aff uploads: Split out S3 and local file backends into separate files.
The uploads file is large, and conceptually the S3 and local-file
backends are separable.
2023-01-09 18:23:58 -05:00
Anders Kaseorg e1ed44907b ruff: Fix SIM118 Use `key in dict` instead of `key in dict.keys()`.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-01-04 16:25:07 -08:00
Anders Kaseorg edab4ec997 rocketchat: Import timezone-aware datetimes.
The bson library creates naive datetime objects by default.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-12-27 10:34:30 -08:00
Anders Kaseorg 872f4b41c1 ci: Check that non-scripts aren’t marked executable.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-12-07 09:54:01 -08:00
M@ 280523ad48
slack import: Merge and dedupe same-base emoji reactions and userlists.
The naive solution #23465 creates situations where the same user can have
multiple reactions as the base emojis are not unique, e.g. +1::skin2
and +1::skin4 would both reduce to +1 but the userlists are separate.
This solution handles the reduction, merges the same-base reactions,
and deduplicates the userlist.

Co-authored-by: Alex Vandiver <alexmv@zulip.com>
Co-authored-by: rht <rhtbot@protonmail.com>
2022-11-16 11:11:43 -08:00
Anders Kaseorg 0258fba345 ruff: Fix N811 constant imported as non-constant.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-11-16 09:29:11 -08:00
rht 5ce2103b87 Slack import: Cache emoji.json in static/generated/emoji.
Previously, emoji.json was read from
"$ZULIP_PATH/node_modules/emoji-datasource-google/emoji.json".
This path doesn't exist in production when installing from scratch from
a release tarball. And so, we ensure emoji.json exists by copying it to
`static/generated/emoji`.

With tweaks to comments by tabbott.

Fixes: #23469
2022-11-15 10:43:11 -08:00
Matt Keller 958b58f174 slack: Skip reactions for deleted users.
Fixes #23552.
2022-11-14 13:08:15 -08:00
Matt Keller 8aa7ff4bbb slack: Parse emoji skin tone variants.
Fixes part of #23276.
2022-11-07 14:25:49 -08:00
Matt Keller 4d87bf291c slack: Skip files where file_access: file_not_found. 2022-10-25 12:18:20 -07:00
Matt Keller c5f106ce1b slack: Skip files where file_access: access_denied.
These stubs are incomplete and should be treated akin to tombstones.
2022-10-11 10:53:16 -07:00
Mateusz Mandera 00b3546c9f models: Add denormalized .realm column to Message.
This commit adds the OPTIONAL .realm attribute to Message
(and ArchivedMessage), with the server changes for making new Messages
have this set. Old Messages still have to be migrated to backfill this,
before it can be non-nullable.

Appropriate test changes to correctly set .realm for Messages the tests
manually create are included here as well.
2022-10-07 10:09:38 -07:00
Anders Kaseorg 47c5deeccd python: Mark dict parameters with defaults as read-only.
Found by semgrep 0.115 more accurately applying the rule added in
commit 0d6c771baf (#15349).

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-10-06 13:48:28 -07:00
Mateusz Mandera 2811a1228f import_util: Make build_message only take kwargs.
build_message has a lot of arguments, so it's hard to verify correctness
of callers that just try to get the order right. It's much clearer to be
explicit via kwargs. mattermost.py and rocketchat.py already do this, so
let's bring slack.py and gitter.py up to par.
2022-09-27 15:04:48 -07:00
Matt Keller fd996c286e slack: Filter out non-.json files for processing. 2022-09-23 09:59:34 -07:00
rht a7cff0f091 Slack import: Translate to emoji name to codepoint using iamcal data.
Because Slack emoji naming is different from Zulip's.
According to https://emojipedia.org/slack/, Slack's emoji shortcodes are
derived from https://github.com/iamcal/emoji-data.
There are probably some deviations from that dataset, but this PR should
at least catch the ones that are identical to iamcal's.
2022-09-17 12:04:07 -07:00
Florian Pritz a276603766 rocketchat: Deduplicate and ignore huddle rooms with same users.
If there are more than 1 room with the same set of users, the import
will fail due to a unique constraint on the huddle_hash. Figuring out
why and which room is causing this database error is kinda difficult.

We deduplicate those cases here and simply merge the rooms together.
Note however, that the deduplication does not work as expected so we
simply ignore them all together for now and only raise an exception
along some logging output. At least this way, it is pretty clear what is
wrong and you do not have to wait to get a database error during the
actual import.

We also ignore empty huddle rooms since those are the duplicates that
caused problems for me and if they are empty, ignoring them is easier
than trying to get the merge to work.

Not sure where those channels come from since we discovered this with
production data.

Signed-off-by: Florian Pritz <bluewind@xinu.at>
2022-09-09 16:57:24 -07:00
Florian Pritz 3677aabcbd rocketchat: Ignore mention mapping failures.
Not sure where those come from since we discovered this with production
data.

Signed-off-by: Florian Pritz <bluewind@xinu.at>
2022-09-09 16:57:24 -07:00
Florian Pritz c308799133 rocketchat: Only set message content if it exists.
Not sure where those come from since we discovered this with production
data.

Signed-off-by: Florian Pritz <bluewind@xinu.at>
2022-09-09 16:57:24 -07:00
Florian Pritz 1cc2764d45 rocketchat: Ignore reactions from non-existant users.
Not sure where those come from since we discovered this with production
data. Somehow there were reactions with usernames that were old and no
longer existed.

Signed-off-by: Florian Pritz <bluewind@xinu.at>
2022-09-09 16:57:24 -07:00
Florian Pritz 26fe028534 rocketchat: Truncate long stream names.
These will lead to an error during import otherwise.

Signed-off-by: Florian Pritz <bluewind@xinu.at>
2022-09-09 16:57:24 -07:00
Florian Pritz 3a27919b5b rocketchat: Ignore rocketchat attachments without types.
Not sure where those come from since we discovered this with production
data.

There only was a single instance of this in my entire batch of data in
an old message from the time when we started using Rocket.Chat. This
might be an old issue or it might require some special settings that
were later changed.

Signed-off-by: Florian Pritz <bluewind@xinu.at>
2022-09-09 16:57:24 -07:00
Florian Pritz 5ec8f4ef09 rocketchat: Ignore missing rocketchat attachments.
Not sure where those come from since we discovered this with production
data.

Signed-off-by: Florian Pritz <bluewind@xinu.at>
2022-09-09 16:57:24 -07:00
Florian Pritz 96fa0991f8 rocketchat: Handle long or invalid rocketchat attachment names.
Signed-off-by: Florian Pritz <bluewind@xinu.at>
2022-09-09 16:57:24 -07:00
Mateusz Mandera 5bcf78e0cb import: Fix timestamp check in long_term_idle_helper.
This is supposed to be 60 days, but timestamps are in seconds.
2022-08-29 15:18:00 -07:00
Mateusz Mandera d350406991 gitter: Make imported Realm start with only GitHub auth enabled.
Users will only be able to login via GitHub, because imported users
get GitHub's generated noreply email addresses - so this should be the
only auth method enabled at first, to avoid confusion.
2022-08-29 11:10:18 -07:00
Mateusz Mandera eed8800573 long_term_idle_helper: Change all_user_ids arg to an Iterator. 2022-08-29 11:03:27 -07:00
Mateusz Mandera 4c7a9816ff gitter: Soft deactivate appropriate imported users.
We want to use the long_term_idle_helper logic for gitter imports just
like we do for slack.
2022-08-29 11:03:27 -07:00
Mateusz Mandera 75f26bb8ff long_term_idle_helper: Take list of user_ids as arg instead of dicts.
Only ["id"] is accessed on the dicts (representing the external tool
users). Given that for some tools the id may be under a different name
etc. due to different user dicts format, it's best to just pass those
ids to the function so that it can stay generalized and not reliant
on a specific user dict format.
2022-08-29 11:03:27 -07:00
Mateusz Mandera 7ac31223e8 gitter: Extract get_user_from_message helper. 2022-08-29 11:03:27 -07:00
Mateusz Mandera c4c270380a slack: Use get_timestamp_from_message helper function where relevant.
get_timestamp_from_message was extracted in the previous commit. We can
deduplicate and the code a bit cleaner by using it where appropriate
instead of message["ts"].
2022-08-29 11:03:27 -07:00