This commit updates all third-party importer tools (Slack, Mattermost,
and Rocket Chat) in the `zerver/data_import` directory to also output a
migration_status.json file in their output tarball.
This is required because all importable tarballs will be checked for
migration compatibility during import.
Fixes #28443.
This is not the best factored version of this, but it saves effort
changing the tests, and importantly should make failures involving
metadata only take a couple seconds rather than first doing a giant
BSON read before learning about them.
This commit makes the third-party data converters check for invalid user
emails. If it finds any, it’ll raise an Exception and show an error
message with all the bad emails listed out.
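A minimal sketch of this kind of check, not the converters' actual code: the function name `check_user_emails`, the `email` key, and the deliberately simple regex are all assumptions made for illustration.

```python
import re

# Deliberately simple pattern for illustration only; a real converter would
# likely rely on a stricter, well-tested validator.
SIMPLE_EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_user_emails(users: list[dict[str, str]]) -> None:
    """Raise an Exception listing every invalid email (hypothetical helper)."""
    bad_emails = []
    for user in users:
        email = user.get("email", "")
        if not SIMPLE_EMAIL_RE.match(email):
            bad_emails.append(email)
    if bad_emails:
        # Report all bad emails at once instead of failing on the first one.
        raise Exception("Invalid user emails found: " + ", ".join(bad_emails))
```

Collecting every bad email before raising lets the person running the conversion fix the whole export in one pass.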
Fixes: #31783
This commit adds a new `group_size` field to the `DirectMessageGroup`
model, and backfills its value for each of the existing direct message
groups.
Fixes part of #25713
Earlier, we replaced an overly long attachment name with a random UUID
only when the character count of the file name was greater than 255.
This resulted in an "OSError: [Errno 36] File name too long" error in
a few cases where the file name had fewer than 255 characters but more
than 255 bytes (a file name with non-ASCII characters).
This commit updates the code to check the file name's byte size
instead of its character count.
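The difference between the two checks can be sketched as follows. The function name `sanitize_attachment_name` and the constant are hypothetical, and UTF-8 is assumed as the file system encoding.

```python
import uuid

# Assumed limit: many file systems cap a single file name at 255 bytes.
MAX_FILENAME_BYTES = 255


def sanitize_attachment_name(file_name: str) -> str:
    """Return file_name if it fits in the byte limit, else a random UUID."""
    # len(file_name) counts characters; the file system limit is in bytes,
    # so measure the encoded form instead.
    if len(file_name.encode("utf-8")) > MAX_FILENAME_BYTES:
        return str(uuid.uuid4())
    return file_name
```

For example, a name made of 200 copies of "é" has 200 characters but 400 bytes in UTF-8, so a character-count check would let it through while the byte check replaces it.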
This commit performs a sweep on the first batch of non-API
files to rename "huddle" to "direct_message_group".
It also renames variables and methods of the form
"huddle_message" to "group_direct_message".
This is a part of #28640.
Replaced HUDDLE attribute with DIRECT_MESSAGE_GROUP using VS Code search,
part of a general renaming of the object class.
Fixes part of #28640.
Co-authored-by: JohnLu2004 <JohnLu10212004@gmail.com>
It’s unclear what was supposed to be “safe” about this wrapper. The
hashlib API is fine without it, and we don’t want to encourage further
use of SHA-1.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
`./manage.py import` does not take a tarball; it takes a directory.
Making a separate tarball is a waste of CPU time and disk, as it is
never used.
This was included in the commit of the initial Slack conversion code
in 5b37c5562b and propagated from there into every conversion tool.
Remove the unnecessary tarball creation.
This commit adds the OPTIONAL .realm attribute to Message
(and ArchivedMessage), with the server changes for making new Messages
have this set. Old Messages still have to be migrated to backfill this,
before it can be non-nullable.
Appropriate test changes to correctly set .realm for Messages the tests
manually create are included here as well.
If more than one room has the same set of users, the import
will fail due to a unique constraint on the huddle_hash. Figuring out
why and which room is causing this database error is kinda difficult.
We deduplicate those cases here and simply merge the rooms together.
Note, however, that the deduplication does not work as expected, so we
simply ignore them altogether for now and only raise an exception
along with some logging output. At least this way, it is pretty clear what is
wrong and you do not have to wait to get a database error during the
actual import.
We also ignore empty huddle rooms since those are the duplicates that
caused problems for me and if they are empty, ignoring them is easier
than trying to get the merge to work.
Not sure where those channels come from since we discovered this with
production data.
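A rough sketch of how such duplicates can be detected before the import. The room field names (`id`, `user_ids`, `msg_count`) and the function name are hypothetical and do not reflect the actual Rocket.Chat export schema.

```python
from collections import defaultdict


def find_duplicate_rooms(rooms: list[dict]) -> dict[frozenset, list[str]]:
    """Group non-empty rooms by member set; return sets with several rooms."""
    rooms_by_members: dict[frozenset, list[str]] = defaultdict(list)
    for room in rooms:
        # Empty rooms are ignored rather than merged.
        if room["msg_count"] == 0:
            continue
        # A frozenset makes the key independent of member order.
        rooms_by_members[frozenset(room["user_ids"])].append(room["id"])
    return {
        members: room_ids
        for members, room_ids in rooms_by_members.items()
        if len(room_ids) > 1
    }
```

Any entry in the returned mapping names rooms that would collide on the hash, so the converter can log them and raise an exception early instead of failing with a database error during the import.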
Signed-off-by: Florian Pritz <bluewind@xinu.at>
Not sure where those come from since we discovered this with production
data. Somehow there were reactions with usernames that were old and no
longer existed.
Signed-off-by: Florian Pritz <bluewind@xinu.at>
Not sure where those come from since we discovered this with production
data.
There was only a single instance of this in my entire batch of data in
an old message from the time when we started using Rocket.Chat. This
might be an old issue or it might require some special settings that
were later changed.
Signed-off-by: Florian Pritz <bluewind@xinu.at>
It is apparently possible to have a mention of a user who is not (or
no longer?) in the `users.bson` table.
Skip such mentions for the purposes of Zulip import; there's nothing
better for us to do.
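The skip can be sketched roughly like this. The names `resolve_mentions` and `user_id_map`, and the `_id` key on each mention, are assumptions for illustration, not the converter's actual data structures.

```python
def resolve_mentions(mentions: list[dict], user_id_map: dict[str, int]) -> list[int]:
    """Map mentioned Rocket.Chat user IDs to Zulip user IDs, skipping unknowns."""
    zulip_user_ids = []
    for mention in mentions:
        rc_user_id = mention.get("_id")
        # The mentioned user may be missing from the exported users;
        # there is nothing to link to, so skip the mention.
        if rc_user_id not in user_id_map:
            continue
        zulip_user_ids.append(user_id_map[rc_user_id])
    return zulip_user_ids
```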
This is likely an error somewhere in rocketchat's MongoDB "eventual
consistency," but there is no problem with skipping the chunks at this
step.
In the one case where this was observed so far, the upload-id was not
referenced in any message -- if it is referenced and has chunks, but
has no metadata, we will fail later, at that reference.
Add None-checks, rename variables (to avoid redefinition of
the same variable with different types error), add necessary
type annotations.
This is a part of #18777.
Signed-off-by: Zixuan James Li <359101898@qq.com>
This resolves the issues reported in #20108, a major chunk of which were
due to the incomplete support for importing the livechat streams/messages
in the tool. So, it's best not to import any livechat streams/messages for
now, until complete support for importing them is developed.
This commit adds functionality to import messages from the
Discussions having direct channels as their parent. As we don't
have topics in the PMs, the messages are imported in interleaved
form in the imported direct channels/PMs.
This was completely unsupported earlier and would have resulted in
an error.
While the STREAM_LINK_REGEX and STREAM_TOPIC_LINK_REGEX
identify the stream and topic mentions in the content
correctly (tested by printing out the matches), the
stream/topic mentions are still not linked to the
corresponding streams/topics for imported messages, as
a `zulip_message` instance is required for linking these
mentions to actual streams/topics (see `StreamPattern`
class in `markdown/__init__.py`) which is not provided
while processing the markdown for imported messages.