Commit Graph

53 Commits

Author SHA1 Message Date
PieterCK 0d7199b22e data_import: Add migration status file to converted exports.
This commit updates all third-party importer tools (Slack, Mattermost,
and Rocket Chat) in the `zerver/data_import` directory to also output a
migration_status.json file in their output tarball.

This is required because all importable tarball will be checked for
migration compatibility during import.

Fixes #28443.
2024-11-08 15:52:45 -08:00
Harsh e468818d2b import: Remove skipping of too-long messages during import.
This commit eliminates the skipping of messages longer than 10K characters during the import process.
2024-11-07 16:04:14 -08:00
Tim Abbott 010410c849 rocketchat: Validate custom emoji before larger data sets.
This is a data set that's relatively likely to have weird failures,
and also likely to be fairly small.
2024-10-17 12:25:18 -07:00
Tim Abbott 6e4da50577 rocketchat: Complete metadata verification before importing uploads.
This is not the best factored version of this, but it saves effort
changing the tests, and importantly should make failures involving
metadata only take a couple seconds rather than first doing a giant
BSON read before learning about them.
2024-10-17 12:25:18 -07:00
Tim Abbott 79b6f43d0e rocketchat: Move bson_code_options to a global variable.
This will make it a lot easier to only read files in when we actually
need them.
2024-10-17 12:25:18 -07:00
PieterCK 6289a551aa data_import: Add email validation to third-party data converters.
This commit makes the third-party data converters check for invalid user
emails. If it finds any, it’ll raise an Exception and show an error
message with all the bad emails listed out.

Fixes: #31783
2024-10-15 16:04:43 -07:00
roanster007 c6a06d4684 direct_message_group: Add new `group_size` field.
This commit adds a new `group_size` field to the `DirectMessageGroup`
model, and backfills its value to each of the existing direct message
groups.

Fixes part of #25713
2024-08-23 11:09:41 -07:00
Prakhar Pratyush 19d56f77b5 rocketchat: Fix "OSError: [Errno 36] File name too long" error.
Earlier, we were replacing too long attachment name with random uuid
when the character count of the file name was greater than 255.

This results in "OSError: [Errno 36] File name too long" error in
few cases when the file name has less than 255 characters but more
than 255 bytes (file name with Non-ASCII characters).

This commit updates the code to check the file name's byte size
instead of characters count.
2024-08-14 18:18:31 -07:00
roanster007 7b3e163d55 refactor: Rename `huddle` to `direct_message_group` in non api files.
This commit completes rename of "huddle" to "direct_message_group"
in all the non API files.

Part of #28640
2024-07-31 23:25:56 -07:00
Anders Kaseorg 722842a0aa rocketchat: Remove unnecessary SHA-1 hashing of direct message groups.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-07-17 15:56:00 -07:00
Anders Kaseorg 27b0618704 data_import: Fix IdMapper typing.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-07-17 15:56:00 -07:00
Anders Kaseorg 6412c2d630 ruff: Fix FURB142 Use of set.add() in a for loop.
This is a preview rule, not yet enabled by default.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-07-14 13:52:59 -07:00
Anders Kaseorg e08a24e47f ruff: Fix UP006 Use `list` instead of `List` for type annotation.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-07-13 22:28:22 -07:00
roanster007 52692a6448 refactor: Rename `huddle` to `direct_message_group` in non API.
This commit performs a sweep on the first batch of non API
files to rename "huddle" to "direct_message_group`.

It also renames variables and methods of type -
"huddle_message" to "group_direct_message".

This is a part of #28640
2024-07-04 07:56:31 -07:00
Alex Vandiver 17fb23746f upload: Move methods into zerver.lib.upload from .base. 2024-06-26 16:43:11 -07:00
Lauryn Menard c931966e1b help: Rename and redirect stream-sending-policy for channel. 2024-05-03 13:02:20 -07:00
John Lu a5cf0ec526
refactor: Replace HUDDLE with DIRECT_MESSAGE_GROUP.
Replaced HUDDLE attribute with DIRECT_MESSAGE_GROUP using VS Code search,
part of a general renaming of the object class.

Fixes part of #28640.

Co-authored-by: JohnLu2004 <JohnLu10212004@gmail.com>
2024-03-21 16:39:33 -07:00
Anders Kaseorg 4aa2d76bea models: Extract zerver.models.streams.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-12-16 22:08:44 -08:00
Anders Kaseorg 51f1dc257d models: Extract zerver.models.recipients.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-12-16 22:08:44 -08:00
Anders Kaseorg e601d0ae7c models: Rename zerver/models.py to zerver/models/__init__.py.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-12-16 22:08:44 -08:00
Anders Kaseorg 2665a3ce2b python: Elide unnecessary list wrappers.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-09-13 12:41:23 -07:00
Anders Kaseorg 052984bc14 utils: Remove make_safe_digest wrapper.
It’s unclear what was supposed to be “safe” about this wrapper.  The
hashlib API is fine without it, and we don’t want to encourage further
use of SHA-1.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-07-19 10:54:05 -07:00
Lauryn Menard d3f7cfccbc zerver: Update comments with "private message" or "PM".
Updates comments/doc-strings that use "private message" or "PM" in
files in the `/zerver` directory to instead use "direct message".
2023-06-23 11:24:13 -07:00
Alex Vandiver 724de9cd49 rocketchat: Treat users with "bot" roles as bots when importing.
We previously relied on `type`, but we have observed bots typed with a
`bot` role as well.
2023-05-16 15:10:58 -07:00
Alex Vandiver 34394cec9a rocketchat: Handle users with no email address set.
Fixes: #25596.
2023-05-16 15:10:58 -07:00
Alex Vandiver fe654b76b7 data_import: Stop tar'ing up converted data.
`./manage.py import` does not take a tarball; it takes a directory.
Making a separate tarball is a waste of CPU time and disk, as it is
never used.

This was included in the commit of the initial Slack conversion code
in 5b37c5562b and propagated from there into every conversion tool.

Remove the unnecessary tarball creation.
2023-02-26 17:42:01 -08:00
Anders Kaseorg e5d671bf2b ruff: Fix SIM210 Use `bool(…)` instead of `True if … else False`.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-01-23 11:18:36 -08:00
Alex Vandiver 7c0d414aff uploads: Split out S3 and local file backends into separate files.
The uploads file is large, and conceptually the S3 and local-file
backends are separable.
2023-01-09 18:23:58 -05:00
Anders Kaseorg edab4ec997 rocketchat: Import timezone-aware datetimes.
The bson library creates naive datetime objects by default.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-12-27 10:34:30 -08:00
Mateusz Mandera 00b3546c9f models: Add denormalized .realm column to Message.
This commit adds the OPTIONAL .realm attribute to Message
(and ArchivedMessage), with the server changes for making new Messages
have this set. Old Messages still have to be migrated to backfill this,
before it can be non-nullable.

Appropriate test changes to correctly set .realm for Messages the tests
manually create are included here as well.
2022-10-07 10:09:38 -07:00
Florian Pritz a276603766 rocketchat: Deduplicate and ignore huddle rooms with same users.
If there are more than 1 room with the same set of users, the import
will fail due to a unique constraint on the huddle_hash. Figuring out
why and which room is causing this database error is kinda difficult.

We deduplicate those cases here and simply merge the rooms together.
Note however, that the deduplication does not work as expected so we
simply ignore them all together for now and only raise an exception
along some logging output. At least this way, it is pretty clear what is
wrong and you do not have to wait to get a database error during the
actual import.

We also ignore empty huddle rooms since those are the duplicates that
caused problems for me and if they are empty, ignoring them is easier
than trying to get the merge to work.

Not sure where those channels come from since we discovered this with
production data.

Signed-off-by: Florian Pritz <bluewind@xinu.at>
2022-09-09 16:57:24 -07:00
Florian Pritz 3677aabcbd rocketchat: Ignore mention mapping failures.
Not sure where those come from since we discovered this with production
data.

Signed-off-by: Florian Pritz <bluewind@xinu.at>
2022-09-09 16:57:24 -07:00
Florian Pritz c308799133 rocketchat: Only set message content if it exists.
Not sure where those come from since we discovered this with production
data.

Signed-off-by: Florian Pritz <bluewind@xinu.at>
2022-09-09 16:57:24 -07:00
Florian Pritz 1cc2764d45 rocketchat: Ignore reactions from non-existant users.
Not sure where those come from since we discovered this with production
data. Somehow there were reactions with usernames that were old and no
longer existed.

Signed-off-by: Florian Pritz <bluewind@xinu.at>
2022-09-09 16:57:24 -07:00
Florian Pritz 26fe028534 rocketchat: Truncate long stream names.
These will lead to an error during import otherwise.

Signed-off-by: Florian Pritz <bluewind@xinu.at>
2022-09-09 16:57:24 -07:00
Florian Pritz 3a27919b5b rocketchat: Ignore rocketchat attachments without types.
Not sure where those come from since we discovered this with production
data.

There only was a single instance of this in my entire batch of data in
an old message from the time when we started using Rocket.Chat. This
might be an old issue or it might require some special settings that
were later changed.

Signed-off-by: Florian Pritz <bluewind@xinu.at>
2022-09-09 16:57:24 -07:00
Florian Pritz 5ec8f4ef09 rocketchat: Ignore missing rocketchat attachments.
Not sure where those come from since we discovered this with production
data.

Signed-off-by: Florian Pritz <bluewind@xinu.at>
2022-09-09 16:57:24 -07:00
Florian Pritz 96fa0991f8 rocketchat: Handle long or invalid rocketchat attachment names.
Signed-off-by: Florian Pritz <bluewind@xinu.at>
2022-09-09 16:57:24 -07:00
Alex Vandiver e653bb2733 rocketchat: Handle PMs with only one recipient.
These are either to a deleted user, or actually to the same user.  In
any case, treat them as self-messages.
2022-08-09 10:58:58 -07:00
Alex Vandiver 51421f378b rocketchat: Skip mentions of unknown users.
It is apparently possible to have a mention of a user who is not (or
no longer?) in the `users.bson` table.

Skip such mention for the purposes of Zulip import; there's nothing
better for us to do.
2022-08-09 10:58:58 -07:00
Alex Vandiver 28a29e64a0 rocketchat: File upload chunks may exist without their metadata.
This is likely an error somewhere in rocketchat's MongoDB "eventual
consistency," but there is no problem with skipping the chunks at this
step.

In the one case where this was observed so far, the upload-id was not
referenced in any message -- if it is referenced and has chunks, but
has no metadata, we will fail later, at that reference.
2022-08-09 10:58:58 -07:00
Zixuan James Li 63e9ae8389 typing: Apply trivial fixes to adjust edge cases in typing.
Add none-checks, rename variables (to avoid redefinition of
the same variable with different types error), add necessary
type annotations.

This is a part of #18777.

Signed-off-by: Zixuan James Li <359101898@qq.com>
2022-05-30 12:03:51 -07:00
Priyansh Garg 42f231c85c data_import: Ignore Rocket.Chat livechat streams/messages.
This resolves the issues reported in #20108, major chunk of which were
due to the incomplete support for importing the livechat streams/messages
in the tool. So, it's best not to import any livechat streams/messages for
now until a complete support for importing the same is developed.
2021-11-07 09:50:55 -08:00
Priyansh Garg 17409a78be data_import: Fix a few KeyError bugs in Rocket.Chat import tool.
This commit fixes a few bugs in Rocket.Chat import tool as reported on CZO.

Link: https://chat.zulip.org/#narrow/stream/9-issues/topic/Rocketchat.20Import
2021-11-03 16:50:56 -07:00
Priyansh Garg 0c2e4eec20 data_import: Import Rocket.Chat threads as separate topics. 2021-11-01 17:13:35 -07:00
Priyansh Garg 0db9b7287b data_import: Import Rocket.Chat messages from direct discussions.
This commit adds functionality to import messages from the
Discussions having direct channels as their parent. As we don't
have topics in the PMs, the messages are imported in interleaved
form in the imported direct channels/PMs.

This was completely unsupported earlier and would have resulted in
an error.
2021-11-01 17:09:11 -07:00
Priyansh Garg 5f1e246230 data_import: Import wildcard mention data from Rocket.Chat. 2021-11-01 17:06:15 -07:00
Priyansh Garg 26f16b9eec data_import: Separate logic for naming Rocket.Chat streams and topics. 2021-11-01 16:48:25 -07:00
Priyansh Garg 54452fef6c data_import: Fix channel mentions in Rocket.Chat import.
While the STREAM_LINK_REGEX and STREAM_TOPIC_LINK_REGEX
identifies the stream and topic mentions in the content
correctly (tested by printing out the matches), the
stream/topic mentions are still not linked to the
corresponding streams/topics for imported messages, as
a `zulip_message` instance is required for linking these
mentions to actual streams/topics (see `StreamPattern`
class in `markdown/__init__.py`) which is not provided
while processing the markdown for imported messages.
2021-08-09 06:38:26 -07:00
Priyansh Garg aed4e48da7 data_import: Import attachments from Rocket.Chat. 2021-08-09 06:38:26 -07:00