The S3 backend implementation of upload_emoji_image was accessing
emoji_file.name, which is redundant because emoji_file_name is already
passed in and can be used; moreover, an object of type IO[bytes] may not
have a .name attribute at all. Spotted by @Fingel.
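For example, the standard in-memory byte stream, which satisfies IO[bytes], has no .name attribute at all:
```
import io

data = io.BytesIO(b"gif bytes")   # a typical IO[bytes] object
hasattr(data, "name")             # False: accessing data.name would raise AttributeError
```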
The target realm was not being passed to create_attachment in the
upload_message_file implementations. This was a bug in the edge case of
cross-realm messages, and in particular broke the email gateway:
When an email with an attachment is sent, the message is mirrored to
Zulip with Email Gateway Bot as the message sender and uploader of the
attachment. Because the realm was not passed to create_attachment, the
Attachment would be created with .realm set to the system bot realm,
making the attachment inaccessible under some conditions because it
failed the following check (which is expected to pass, provided that
.realm is set correctly):
```
if (
attachment.is_realm_public
and attachment.realm == user_profile.realm
and user_profile.can_access_public_streams()
):
# Any user in the realm can access realm-public files
return True
```
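A rough sketch of the shape of the fix, with hypothetical signatures (not the actual Zulip ones): the upload_message_file implementations need to forward the message's target realm explicitly instead of letting the attachment default to the uploader's realm.
```
# Hypothetical sketch; not the real Zulip signatures.
def upload_message_file(path_id, uploaded_file, user_profile, target_realm=None):
    if target_realm is None:
        target_realm = user_profile.realm
    # For cross-realm senders such as Email Gateway Bot, target_realm is the
    # realm the message is mirrored into, not user_profile.realm (the system
    # bot realm), so it must be forwarded to create_attachment explicitly.
    save_to_storage(path_id, uploaded_file)                 # hypothetical helper
    create_attachment(path_id, user_profile, target_realm)  # hypothetical argument order
```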
Fixes “DeprecationWarning: 'jinja2.Markup' is deprecated and will be
removed in Jinja 3.1. Import 'markupsafe.Markup' instead.”
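The fix is a one-line change at each import site:
```
# Before (removed in Jinja 3.1):
from jinja2 import Markup

# After:
from markupsafe import Markup
```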
Signed-off-by: Anders Kaseorg <anders@zulip.com>
GIF files can have a `.GIF` extension, and we determine the file format
by inspecting the image data anyway, so there's no reason to have this
assertion.
(The code for serving still images does not rely on the file being a
GIF.)
Now, when we add a custom animated emoji to the realm, we also save a
still image of it (the first frame of the GIF), so that we can avoid
showing the animated version every time.
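A minimal Pillow sketch of generating such a still image (function and file names here are illustrative, not the actual Zulip helper):
```
from PIL import Image

def make_still_image(gif_path: str, out_path: str) -> None:
    with Image.open(gif_path) as im:
        im.seek(0)                  # first frame of the animation
        still = im.convert("RGBA")  # detach it from the animation, keeping transparency
        still.save(out_path, "PNG")
```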
get_public_upload_root_url and construct_public_upload_url_base were
both doing basically the same thing in the same way. We deduplicate
this, making them share the same code, using the urljoin-based approach
from get_public_upload_root_url.
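Roughly, the shared urljoin-based construction looks like this (host and path values are illustrative):
```
from urllib.parse import urljoin

# Illustrative values; the real code derives the host from the S3 settings.
host = "https://example-bucket.s3.amazonaws.com"
public_upload_root_url = urljoin(host, "/")
# -> "https://example-bucket.s3.amazonaws.com/"
public_upload_url = urljoin(public_upload_root_url, "avatars/1234/photo.png")
# -> "https://example-bucket.s3.amazonaws.com/avatars/1234/photo.png"
```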
Using a format string is not a great idea, as it doesn't handle the case
of the URL already having parts that will be interpreted as format
string metacharacters. On the downside, this approach negatively affects
performance:
```
...: s = time.time()
...: for i in range(0, 250):
...:     r = u.get_public_upload_url("foo")
...: print(time.time()-s)
0.020366191864013672
```
up from 0.001 before this change.
This is much faster than calling generate_presigned_url each time.
```
In [3]: t = time.time()
   ...: for i in range(250):
   ...:     x = u.get_public_upload_url("foo")
   ...: print(time.time()-t)
0.0010945796966552734
```
Fixes #18915.
This was very slow, causing performance issues. After investigating, it
turns out that generate_presigned_url is the cheap part of this; the
session.client() call is the expensive one, so that's what we should
cache.
Before the change:
```
In [4]: t = time.time()
   ...: for i in range(250):
   ...:     x = u.get_public_upload_url("foo")
   ...: print(time.time()-t)
6.408717393875122
```
After:
```
In [4]: t = time.time()
   ...: for i in range(250):
   ...:     x = u.get_public_upload_url("foo")
   ...: print(time.time()-t)
0.48990607261657715
```
This is not good enough to avoid doing something ugly like replacing
generate_presigned_url with some manual URL manipulation, but it's a
helpful structure that we may find useful with further refactoring.
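A minimal sketch of the caching idea using standard boto3 APIs (the free-standing helper and bucket name are illustrative; the real code caches on the upload backend object):
```
from functools import lru_cache

import boto3

@lru_cache(maxsize=None)
def get_s3_client():
    # Creating the client is the expensive part, so do it once and reuse it;
    # generate_presigned_url itself is cheap.
    return boto3.session.Session().client("s3")

def get_public_upload_url(key: str) -> str:
    return get_s3_client().generate_presigned_url(
        ClientMethod="get_object",
        Params={"Bucket": "example-bucket", "Key": key},  # illustrative bucket name
    )
```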
This causes avatars and emoji which are hosted by Zulip in S3 (or
compatible) servers to no longer go through camo. Routing these
requests through camo does not add any privacy benefit (as the request
logs there go to the Zulip admins regardless), and may break emoji
imported from Slack before 1bf385e35f,
which have `application/octet-stream` as their stored Content-Type.
Our current logic only allows S3 object storage providers whose upload
URL matches the format used by AWS. This change also allows other
styles, such as the "virtual host" format used by Oracle Cloud.
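For illustration, the two URL styles and a hedged sketch of extracting the key from either (not the exact check in the codebase):
```
from urllib.parse import urlsplit

# Path style (classic AWS):  https://s3.<region>.amazonaws.com/<bucket>/<key>
# Virtual-host style:        https://<bucket>.s3.<region>.amazonaws.com/<key>
# Oracle Cloud's S3-compatible endpoints use the virtual-host style on their own domain.
def extract_key(url: str, bucket: str) -> str:
    parts = urlsplit(url)
    path = parts.path.lstrip("/")
    if parts.netloc.startswith(bucket + "."):
        return path                    # virtual-host style: the path is the key
    return path[len(bucket) + 1 :]     # path style: strip the "<bucket>/" prefix
```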
Fixes #17762.
In Django 3.2, slugify strips trailing dashes and underscores:
0382ecfe02
sanitize_name doesn't, so this difference should be documented like the
others.
The underscore character is already covered by \w, so _ in the regex is
redundant. The docstring was also mildly incorrect: underscore is
already an allowed character in Django's slugify (and always was), for
the aforementioned reason.
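A small illustration of both points with django.utils.text.slugify:
```
from django.utils.text import slugify

slugify("My_File Name")  # "my_file-name": underscores are kept (they are part of \w)
slugify("report-_")      # Django >= 3.2: "report" (trailing dashes/underscores stripped)
                         # Django < 3.2:  "report-_"
```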
`ensure_basic_avatar_image` and `ensure_medium_avatar_image` are
essentially the same thing, except for a size parameter.
So, refactor them into a single function.
This doesn't introduce any functional changes.
django.utils.translation.ugettext is a deprecated alias of
django.utils.translation.gettext as of Django 3.0, and will be removed
in Django 4.0.
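In practice this is just a rename of the import (gettext_lazy likewise replaces ugettext_lazy):
```
# Before (deprecated since Django 3.0, removed in Django 4.0):
from django.utils.translation import ugettext as _

# After:
from django.utils.translation import gettext as _
```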
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Boto3 does not allow setting the endpoint URL from
the config file. Thus we create a Django setting
(`S3_ENDPOINT_URL`) which is passed to the
service clients and resources of `boto3.Session`.
We also update the uploads-backend documentation
and remove the config environment variable, as AWS
now supports the SIGv4 signature format by default.
The region name is also passed as a parameter instead
of creating a config file for just this value.
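A minimal sketch of how these settings are forwarded to boto3 (the `S3_REGION` name and the bucket are illustrative; only `S3_ENDPOINT_URL` is from the text above):
```
import boto3
from django.conf import settings

session = boto3.session.Session()
client = session.client(
    "s3",
    region_name=settings.S3_REGION,          # region passed as a parameter
    endpoint_url=settings.S3_ENDPOINT_URL,   # None means the regular AWS endpoint
)
bucket = session.resource(
    "s3",
    region_name=settings.S3_REGION,
    endpoint_url=settings.S3_ENDPOINT_URL,
).Bucket("example-bucket")
```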
Fixes #16246.
This preserves the alpha layer on GIF images that need to be resized
before being uploaded. Two important changes occur here:
1. The new frame is a *copy* of the original image, which preserves the
GIF info.
2. The disposal method of the original GIF is preserved. This
essentially determines what state each frame of the GIF starts from
when it is drawn; see PIL's docs:
https://pillow.readthedocs.io/en/stable/handbook/image-file-formats.html#saving
for more info.
This resolves some but not all of the test cases in #16370.
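A rough Pillow sketch of the two changes (simplified; the real code also handles per-frame durations and other details, and the function and file names here are illustrative):
```
from PIL import Image, ImageSequence

def resize_gif(im, size=50):
    frames = []
    for frame in ImageSequence.Iterator(im):
        new_frame = frame.copy()       # (1) copy() carries over the original GIF info
        new_frame.thumbnail((size, size))
        frames.append(new_frame)
    frames[0].save(
        "out.gif",
        save_all=True,
        append_images=frames[1:],
        duration=im.info.get("duration", 0),
        loop=im.info.get("loop", 0),
        disposal=im.disposal_method,   # (2) keep the original disposal method
    )
```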
Replaced ImageOps.fit with ImageOps.pad in zerver/lib/upload.py;
ImageOps.pad returns a resized and padded version of the image, expanded
to fill the requested aspect ratio and size.
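For comparison, an illustrative call (file names are placeholders):
```
from PIL import Image, ImageOps

with Image.open("avatar.png") as im:
    # fit() would crop the image so it exactly fills the 100x100 box; pad()
    # scales it to fit inside the box and fills the leftover area instead.
    padded = ImageOps.pad(im.convert("RGBA"), (100, 100), color=(0, 0, 0, 0))
    padded.save("avatar_padded.png")
```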
Fixes part of #16370.
Previously, S3UploadBackend.delete_export_tarball failed to strip the
leading ‘/’ from the export path. This mistake is now caught by Moto
1.3.15. I expect it caused deletion failures in the real S3, although
I haven’t verified this.
We store export_path in the audit log with a leading ‘/’, but the
actual S3 keys do not have a leading ‘/’. Changing either system
would require a migration. So the new convention is that the
variables named ‘export_path’ have a leading ‘/’, while variables
named ‘path_id’ or ‘key’ do not.
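So converting between the two representations is just a matter of the leading ‘/’ (values are illustrative):
```
export_path = "/exports/20/abc123/export.tar.gz"  # as stored in the audit log (illustrative)
path_id = export_path[1:]                         # the actual S3 key has no leading "/"
assert "/" + path_id == export_path
```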
Signed-off-by: Anders Kaseorg <anders@zulip.com>
There are three functional side effects:
• Correct an insignificant but mathematically offensive bias toward
repeated characters in generate_api_key introduced in commit
47b4283c4b4c70ecde4d3c8de871c90ee2506d87; its entropy is increased
from 190.52864 bits to 190.53428 bits.
• Use the base32 alphabet in confirmation.models.generate_key; its
entropy is reduced from 124.07820 bits to the documented 120 bits, but
now it uses 1 syscall instead of 24.
• Use the base32 alphabet in get_bigbluebutton_url; its entropy is
reduced from 51.69925 bits to 50 bits, but now it uses 1 syscall
instead of 10.
(The base32 alphabet is A-Z 2-7. We could probably replace all of
these with plain secrets.token_urlsafe, since I expect most callers
can handle the full urlsafe_b64 alphabet A-Z a-z 0-9 - _ without
problems.)
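A sketch of the single-read base32 approach described above (not the exact code):
```
import base64
import secrets

def generate_key() -> str:
    # One urandom read of 15 bytes = 120 bits of entropy; base32 encodes 5 bits
    # per character, so this yields exactly 24 characters from the A-Z 2-7
    # alphabet (lowercased here), with no padding.
    return base64.b32encode(secrets.token_bytes(15)).decode().lower()

# For comparison, drawing 24 characters one at a time from a 36-character
# alphabet gives log2(36) * 24 ≈ 124.08 bits, at the cost of one urandom
# read per character.
```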
Signed-off-by: Anders Kaseorg <anders@zulip.com>
This commit adds the is_web_public field to the AbstractAttachment
class. This is useful when validating user access to the attachment,
as otherwise we would have to query the database to check whether
that attachment was sent in a message in a web-public stream or not.
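The field itself is just a flag on the model; a sketch (exact field options may differ):
```
from django.db import models

class AbstractAttachment(models.Model):
    # ... existing fields ...

    # Denormalized flag: True if the attachment was sent in a message to a
    # web-public stream, so access checks don't need an extra query.
    is_web_public = models.BooleanField(default=False)

    class Meta:
        abstract = True
```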
The S3 data export tool's upload code path uses this nice boto
callback feature for showing a progress bar, which is nice for the
management command. It's spammy/broken in production and in the backend
tests, so we change percent_callback to be a parameter that is passed
in, so that it is only used in the contexts where it makes sense.
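Roughly, the upload helper now takes an optional callback and forwards it to boto3's Callback hook only when one is provided (names and signature here are illustrative):
```
from typing import Any, Callable, Optional

def upload_export_tarball(bucket: Any, key: str, local_path: str,
                          percent_callback: Optional[Callable[[int], None]] = None) -> None:
    # boto3's upload_file accepts a Callback that is invoked with the number of
    # bytes transferred; only the management command passes one, to draw a
    # progress bar, so production and the tests stay quiet.
    if percent_callback is not None:
        bucket.upload_file(local_path, key, Callback=percent_callback)
    else:
        bucket.upload_file(local_path, key)
```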
With #14378, we regressed back to the state prior to 7e0ea61b00.
We fix this by getting our avatar bucket on
object initialization, and using the appropriate means
of gathering the network location for the URLs.
Fixes #14484.
Fixes #2665.
Regenerated by tabbott with `lint --fix` after a rebase and change in
parameters.
Note from tabbott: In a few cases, this converts technical debt in the
form of unsorted imports into different technical debt in the form of
our largest files having very long, ugly import sequences at the
start. I expect this change will increase pressure for us to split
those files, which isn't a bad thing.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Automatically generated by the following script, based on the output
of lint with flake8-comma:
```
import re
import sys

last_filename = None
last_row = None
lines = []

for msg in sys.stdin:
    m = re.match(
        r"\x1b\[35mflake8 \|\x1b\[0m \x1b\[1;31m(.+):(\d+):(\d+): (\w+)", msg
    )
    if m:
        filename, row_str, col_str, err = m.groups()
        row, col = int(row_str), int(col_str)
        if filename == last_filename:
            assert last_row != row
        else:
            if last_filename is not None:
                with open(last_filename, "w") as f:
                    f.writelines(lines)
            with open(filename) as f:
                lines = f.readlines()
            last_filename = filename
        last_row = row

        line = lines[row - 1]
        if err in ["C812", "C815"]:
            lines[row - 1] = line[: col - 1] + "," + line[col - 1 :]
        elif err in ["C819"]:
            assert line[col - 2] == ","
            lines[row - 1] = line[: col - 2] + line[col - 1 :].lstrip(" ")

if last_filename is not None:
    with open(last_filename, "w") as f:
        f.writelines(lines)
```
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
This commit adds three `.pysa` model files: `false_positives.pysa`
for ruling out false positive flows with `Sanitize` annotations,
`req_lib.pysa` for educating pysa about Zulip's `REQ()` pattern for
extracting user input, and `redirects.pysa` for capturing the risk
of open redirects within Zulip code. Additionally, this commit
introduces `mark_sanitized`, an identity function which can be used
to selectively clear taint in cases where `Sanitize` models will not
work. This commit also puts `mark_sanitized` to work removing known
false positive flows.
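mark_sanitized is deliberately trivial at runtime; its meaning lives entirely in the pysa models (a sketch, not necessarily the exact definition):
```
from typing import TypeVar

T = TypeVar("T")

def mark_sanitized(content: T) -> T:
    # Runtime no-op. A corresponding .pysa model declares the return value
    # sanitized, so pysa stops tracking taint through values passed here.
    # Use sparingly, only where a `Sanitize` annotation can't express the rule.
    return content
```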