This adds `normalize_body_for_import` to normalize messages from
third-party importers by removing NUL bytes and also updates import
test files data to test this.
Fixes#31930.
html2text mangles Unicode by default, with a --unicode-snob option to
disable it. If I have to get called a “snob” for wanting to correctly
support non-English languages, then uh, I’ll take one for the team.
https://github.com/Alir3z4/html2text/blob/2024.2.26/html2text/config.py#L111-L150
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Currently, Slack messages containing hyperlinks
(e.g.,<http://foo.com|Foo!>) are converted like
normal links. This commit reformats Slack
hyperlinks into Zulip-friendly markdown
(e.g., [Foo!](http://foo.com)).
Part of #32165.
This is not the best factored version of this, but it saves effort
changing the tests, and importantly should make failures involving
metadata only take a couple seconds rather than first doing a giant
BSON read before learning about them.
This commit makes the third-party data converters check for invalid user
emails. If it finds any, it’ll raise an Exception and show an error
message with all the bad emails listed out.
Fixes: #31783
This lets slack conversions be done on development hosts, which have a
trailing :9991 on their EXTERNAL_HOST; otherwise, we generate fake
emails like `imported-slack-bot@host.name:9991` which fail to
validate.
This commit adds a new `group_size` field to the `DirectMessageGroup`
model, and backfills its value to each of the existing direct message
groups.
Fixes part of #25713
The Slack API when returning the emoji records, returns the record for
its thumbsup_all emoji with the url ending with .png, even though the
file is a gif.
For that reason, we have to make that code correct file extensions based
on the response content-type. Emojis are the smallest set of images to
download, so for simplicity of implementation, we remove the
parallelization of the downloads in favor of just processing them
serially.
Earlier, we were replacing too long attachment name with random uuid
when the character count of the file name was greater than 255.
This results in "OSError: [Errno 36] File name too long" error in
few cases when the file name has less than 255 characters but more
than 255 bytes (file name with Non-ASCII characters).
This commit updates the code to check the file name's byte size
instead of characters count.
We use a truncated SHA256 of the id and a server-side secret to make
emoji have non-guessable filenames, while also making collisions
unlikely.
We also adjust the Slack import to use the same SHA-based name,
instead of taking the same name as it had in Slack.
This commit renames the "Huddle" Django model class to
"DirectMessageGroup", while maintaining the same table --
"zerver_huddle".
Fixes part of #28640.
Hash the salt, user-id, and now avatar version into the filename.
This allows the URL contents to be immutable, and thus to be marked as
immutable and cacheable. Since avatars are served unauthenticated,
hashing with a server-side salt makes the current and past avatars not
enumerable.
This requires plumbing the current (or future) avatar version through
various parts of the upload process.
Since this already requires a full migration of current avatars, also
take the opportunity to fix the missing `.png` on S3 uploads (#12852).
We switch from SHA-1 to SHA-256, but truncate it such that avatar URL
data does not substantially increase in size.
Fixes: #12852.
This commit performs a sweep on the first batch of non API
files to rename "huddle" to "direct_message_group`.
It also renames variables and methods of type -
"huddle_message" to "group_direct_message".
This is a part of #28640
Gitter broke their older API as part of being integrated
into Matrix.
Their announcement blog says:
"Anything left using the Gitter APIs will need to be
updated to use the Matrix API"
This commit drops the legacy Gitter import tool and
we plan to build a new one for Matrix in future.
The returns plugin hasn’t been updated for mypy ≥ 1.6. This
annotation is more limited in that it only supports a fixed number of
positional arguments and no keyword arguments, but is good enough for
our purposes.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Replaced HUDDLE attribute with DIRECT_MESSAGE_GROUP using VS Code search,
part of a general renaming of the object class.
Fixes part of #28640.
Co-authored-by: JohnLu2004 <JohnLu10212004@gmail.com>
Only affects zulipchat, by being based on the BILLING_ENABLED setting.
The restricted backends in this commit are
- AzureAD - restricted to Standard plan
- SAML - restricted to Plus plan, although it was already practically
restricted due to requiring server-side configuration to be done by us
This restriction is placed upon **enabling** a backend - so
organizations that already have a backend enabled, will continue to be
able to use it. This allows us to make exceptions and enable a backend
for an org manually via the shell, and to grandfather organizations into
keeping the backend they have been relying on.
It is possible to have multiple users with the same email address --
for instance, when two users are guests in shared channels via two
different other Slack instances.
Combine those Slack user-ids into one Zulip user, by their user-id;
otherwise, we run into problems during import due to duplicate keys.