Commit Graph

32 Commits

Author SHA1 Message Date
Anders Kaseorg 052984bc14 utils: Remove make_safe_digest wrapper.
It’s unclear what was supposed to be “safe” about this wrapper.  The
hashlib API is fine without it, and we don’t want to encourage further
use of SHA-1.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-07-19 10:54:05 -07:00
Lauryn Menard d3f7cfccbc zerver: Update comments with "private message" or "PM".
Updates comments/doc-strings that use "private message" or "PM" in
files in the `/zerver` directory to instead use "direct message".
2023-06-23 11:24:13 -07:00
Alex Vandiver 724de9cd49 rocketchat: Treat users with "bot" roles as bots when importing.
We previously relied on `type`, but we have observed bots typed with a
`bot` role as well.
2023-05-16 15:10:58 -07:00
Alex Vandiver 34394cec9a rocketchat: Handle users with no email address set.
Fixes: #25596.
2023-05-16 15:10:58 -07:00
Alex Vandiver fe654b76b7 data_import: Stop tar'ing up converted data.
`./manage.py import` does not take a tarball; it takes a directory.
Making a separate tarball is a waste of CPU time and disk, as it is
never used.

This was included in the commit of the initial Slack conversion code
in 5b37c5562b and propagated from there into every conversion tool.

Remove the unnecessary tarball creation.
2023-02-26 17:42:01 -08:00
Anders Kaseorg e5d671bf2b ruff: Fix SIM210 Use `bool(…)` instead of `True if … else False`.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-01-23 11:18:36 -08:00
Alex Vandiver 7c0d414aff uploads: Split out S3 and local file backends into separate files.
The uploads file is large, and conceptually the S3 and local-file
backends are separable.
2023-01-09 18:23:58 -05:00
Anders Kaseorg edab4ec997 rocketchat: Import timezone-aware datetimes.
The bson library creates naive datetime objects by default.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-12-27 10:34:30 -08:00
Mateusz Mandera 00b3546c9f models: Add denormalized .realm column to Message.
This commit adds the OPTIONAL .realm attribute to Message
(and ArchivedMessage), with the server changes for making new Messages
have this set. Old Messages still have to be migrated to backfill this,
before it can be non-nullable.

Appropriate test changes to correctly set .realm for Messages the tests
manually create are included here as well.
2022-10-07 10:09:38 -07:00
Florian Pritz a276603766 rocketchat: Deduplicate and ignore huddle rooms with same users.
If there are more than 1 room with the same set of users, the import
will fail due to a unique constraint on the huddle_hash. Figuring out
why and which room is causing this database error is kinda difficult.

We deduplicate those cases here and simply merge the rooms together.
Note however, that the deduplication does not work as expected so we
simply ignore them all together for now and only raise an exception
along some logging output. At least this way, it is pretty clear what is
wrong and you do not have to wait to get a database error during the
actual import.

We also ignore empty huddle rooms since those are the duplicates that
caused problems for me and if they are empty, ignoring them is easier
than trying to get the merge to work.

Not sure where those channels come from since we discovered this with
production data.

Signed-off-by: Florian Pritz <bluewind@xinu.at>
2022-09-09 16:57:24 -07:00
Florian Pritz 3677aabcbd rocketchat: Ignore mention mapping failures.
Not sure where those come from since we discovered this with production
data.

Signed-off-by: Florian Pritz <bluewind@xinu.at>
2022-09-09 16:57:24 -07:00
Florian Pritz c308799133 rocketchat: Only set message content if it exists.
Not sure where those come from since we discovered this with production
data.

Signed-off-by: Florian Pritz <bluewind@xinu.at>
2022-09-09 16:57:24 -07:00
Florian Pritz 1cc2764d45 rocketchat: Ignore reactions from non-existant users.
Not sure where those come from since we discovered this with production
data. Somehow there were reactions with usernames that were old and no
longer existed.

Signed-off-by: Florian Pritz <bluewind@xinu.at>
2022-09-09 16:57:24 -07:00
Florian Pritz 26fe028534 rocketchat: Truncate long stream names.
These will lead to an error during import otherwise.

Signed-off-by: Florian Pritz <bluewind@xinu.at>
2022-09-09 16:57:24 -07:00
Florian Pritz 3a27919b5b rocketchat: Ignore rocketchat attachments without types.
Not sure where those come from since we discovered this with production
data.

There only was a single instance of this in my entire batch of data in
an old message from the time when we started using Rocket.Chat. This
might be an old issue or it might require some special settings that
were later changed.

Signed-off-by: Florian Pritz <bluewind@xinu.at>
2022-09-09 16:57:24 -07:00
Florian Pritz 5ec8f4ef09 rocketchat: Ignore missing rocketchat attachments.
Not sure where those come from since we discovered this with production
data.

Signed-off-by: Florian Pritz <bluewind@xinu.at>
2022-09-09 16:57:24 -07:00
Florian Pritz 96fa0991f8 rocketchat: Handle long or invalid rocketchat attachment names.
Signed-off-by: Florian Pritz <bluewind@xinu.at>
2022-09-09 16:57:24 -07:00
Alex Vandiver e653bb2733 rocketchat: Handle PMs with only one recipient.
These are either to a deleted user, or actually to the same user.  In
any case, treat them as self-messages.
2022-08-09 10:58:58 -07:00
Alex Vandiver 51421f378b rocketchat: Skip mentions of unknown users.
It is apparently possible to have a mention of a user who is not (or
no longer?) in the `users.bson` table.

Skip such mention for the purposes of Zulip import; there's nothing
better for us to do.
2022-08-09 10:58:58 -07:00
Alex Vandiver 28a29e64a0 rocketchat: File upload chunks may exist without their metadata.
This is likely an error somewhere in rocketchat's MongoDB "eventual
consistency," but there is no problem with skipping the chunks at this
step.

In the one case where this was observed so far, the upload-id was not
referenced in any message -- if it is referenced and has chunks, but
has no metadata, we will fail later, at that reference.
2022-08-09 10:58:58 -07:00
Zixuan James Li 63e9ae8389 typing: Apply trivial fixes to adjust edge cases in typing.
Add none-checks, rename variables (to avoid redefinition of
the same variable with different types error), add necessary
type annotations.

This is a part of #18777.

Signed-off-by: Zixuan James Li <359101898@qq.com>
2022-05-30 12:03:51 -07:00
Priyansh Garg 42f231c85c data_import: Ignore Rocket.Chat livechat streams/messages.
This resolves the issues reported in #20108, major chunk of which were
due to the incomplete support for importing the livechat streams/messages
in the tool. So, it's best not to import any livechat streams/messages for
now until a complete support for importing the same is developed.
2021-11-07 09:50:55 -08:00
Priyansh Garg 17409a78be data_import: Fix a few KeyError bugs in Rocket.Chat import tool.
This commit fixes a few bugs in Rocket.Chat import tool as reported on CZO.

Link: https://chat.zulip.org/#narrow/stream/9-issues/topic/Rocketchat.20Import
2021-11-03 16:50:56 -07:00
Priyansh Garg 0c2e4eec20 data_import: Import Rocket.Chat threads as separate topics. 2021-11-01 17:13:35 -07:00
Priyansh Garg 0db9b7287b data_import: Import Rocket.Chat messages from direct discussions.
This commit adds functionality to import messages from the
Discussions having direct channels as their parent. As we don't
have topics in the PMs, the messages are imported in interleaved
form in the imported direct channels/PMs.

This was completely unsupported earlier and would have resulted in
an error.
2021-11-01 17:09:11 -07:00
Priyansh Garg 5f1e246230 data_import: Import wildcard mention data from Rocket.Chat. 2021-11-01 17:06:15 -07:00
Priyansh Garg 26f16b9eec data_import: Separate logic for naming Rocket.Chat streams and topics. 2021-11-01 16:48:25 -07:00
Priyansh Garg 54452fef6c data_import: Fix channel mentions in Rocket.Chat import.
While the STREAM_LINK_REGEX and STREAM_TOPIC_LINK_REGEX
identifies the stream and topic mentions in the content
correctly (tested by printing out the matches), the
stream/topic mentions are still not linked to the
corresponding streams/topics for imported messages, as
a `zulip_message` instance is required for linking these
mentions to actual streams/topics (see `StreamPattern`
class in `markdown/__init__.py`) which is not provided
while processing the markdown for imported messages.
2021-08-09 06:38:26 -07:00
Priyansh Garg aed4e48da7 data_import: Import attachments from Rocket.Chat. 2021-08-09 06:38:26 -07:00
Priyansh Garg 65e28907cb data_import: Import custom emoji from Rocket.Chat. 2021-08-09 06:38:26 -07:00
Priyansh Garg 044fe547d3 data_import: Add huddle import support for Rocket.Chat. 2021-07-28 15:45:54 -07:00
Priyansh Garg 24dd0ff96c data_import: Add rocket chat import tool.
This commit allows to import the following from rocketchat:
* All users
* All public/private channels
* All teams and its public/private channels
* All discussion rooms as topics in their parent channel
* All the messages in all the channels
* All private conversations
* Reactions on messages (except for custom emojis)
* Mentions in messages (except @all, @here mentions)
2021-07-28 15:28:56 -07:00