This is much faster than calling generate_presigned_url each time.
```
In [3]: t = time.time()
...: for i in range(250):
...: x = u.get_public_upload_url("foo")
...: print(time.time()-t)
0.0010945796966552734
```
Fixes#18915
This was very slow, causing performance issues. After investigating,
generate_presigned_url is the cheap part of this, but the
session.client() call is expensive - so that's what we should cache.
Before the change:
```
In [4]: t = time.time()
...: for i in range(250):
...: x = u.get_public_upload_url("foo")
...: print(time.time()-t)
6.408717393875122
```
After:
```
In [4]: t = time.time()
...: for i in range(250):
...: x = u.get_public_upload_url("foo")
...: print(time.time()-t)
0.48990607261657715
```
This is not good enough to avoid doing something ugly like replacing
generate_presigned_url with some manual URL manipulation, but it's a
helpful structure that we may find useful with further refactoring.
This causes avatars and emoji which are hosted by Zulip in S3 (or
compatible) servers to no longer go through camo. Routing these
requests through camo does not add any privacy benefit (as the request
logs there go to the Zulip admins regardless), and may break emoji
imported from Slack before 1bf385e35f,
which have `application/octet-stream` as their stored Content-Type.
Our current logic only allows S3 block storage providers whose
upload URL matches with the format used by AWS. This also allows
other styles such as the "virtual host" format used by Oracle cloud.
Fixes#17762.
In Django 3.2 slugify strips trailing dashes and underscores:
0382ecfe02
sanitize_name doesn't so this difference should be documented like the
others.
Underscore character is already covered by \w, so _ in the regex is
redundant. Also the docstring is mildly incorrect - underscore already
is an allowed character by django's slugify (and always was) for the
aforementioned reason.
`ensure_basic_avatar_image` and `ensure_medium_avatar_image` are
essentially the same thing, except a size parameter.
So, refactor them into a single function.
This doesn't introduce any functional changes.
django.utils.translation.ugettext is a deprecated alias of
django.utils.translation.gettext as of Django 3.0, and will be removed
in Django 4.0.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Boto3 does not allow setting the endpoint url from
the config file. Thus we create a django setting
variable (`S3_ENDPOINT_URL`) which is passed to
service clients and resources of `boto3.Session`.
We also update the uploads-backend documentation
and remove the config environment variable as now
AWS supports the SIGv4 signature format by default.
And the region name is passed as a parameter instead
of creating a config file for just this value.
Fixes#16246.
This preserves the alpha layer on GIF images that need to be resized
before being uploaded. Two important changes occur here:
1. The new frame is a *copy* of the original image, which preserves the
GIF info.
2. The disposal method of the original GIF is preserved. This
essentially determines what state each frame of the GIF starts from
when it is drawn; see PIL's docs:
https://pillow.readthedocs.io/en/stable/handbook/image-file-formats.html#saving
for more info.
This resolves some but not all of the test cases in #16370.
Replaced ImageOps.fit by ImageOps.pad, in zerver/lib/upload.py, which
returns a sized and padded version of the image, expanded to fill the
requested aspect ratio and size.
Fixes part of #16370.
Previously, S3UploadBackend.delete_export_tarball failed to strip the
leading ‘/’ from the export path. This mistake is now caught by Moto
1.3.15. I expect it caused deletion failures in the real S3, although
I haven’t verified this.
We store export_path in the audit log with a leading ‘/’, but the
actual S3 keys do not have a leading ‘/’. Changing either system
would require a migration. So the new convention is that the
variables named ‘export_path’ have a leading ‘/’, while variables
named ‘path_id’ or ‘key’ do not.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
There are three functional side effects:
• Correct an insignificant but mathematically offensive bias toward
repeated characters in generate_api_key introduced in commit
47b4283c4b4c70ecde4d3c8de871c90ee2506d87; its entropy is increased
from 190.52864 bits to 190.53428 bits.
• Use the base32 alphabet in confirmation.models.generate_key; its
entropy is reduced from 124.07820 bits to the documented 120 bits, but
now it uses 1 syscall instead of 24.
• Use the base32 alphabet in get_bigbluebutton_url; its entropy is
reduced from 51.69925 bits to 50 bits, but now it uses 1 syscall
instead of 10.
(The base32 alphabet is A-Z 2-7. We could probably replace all of
these with plain secrets.token_urlsafe, since I expect most callers
can handle the full urlsafe_b64 alphabet A-Z a-z 0-9 - _ without
problems.)
Signed-off-by: Anders Kaseorg <anders@zulip.com>
This commit adds the is_web_public field in the AbstractAttachment
class. This is useful when validating user access to the attachment,
as otherwise we would have to make a query in the db to check if
that attachment was sent in a message in a web-public stream or not.
The S3 data export tool's upload code path uses this nice boto
callback feature for showing a progress bar, which is nice for the
management command. It's spammy/broken in production and the backend
tests, so we change percent_callback to be a parameter passed in so
that it can only be used in the contexts where it makes sense.
With #14378, we regressed back to the state of that
prior to 7e0ea61b00.
We fix this by getting our avatar bucket on
object initialization, and use the appropriate means
of gathering the network location for the urls.
Fixes#14484.
Fixes#2665.
Regenerated by tabbott with `lint --fix` after a rebase and change in
parameters.
Note from tabbott: In a few cases, this converts technical debt in the
form of unsorted imports into different technical debt in the form of
our largest files having very long, ugly import sequences at the
start. I expect this change will increase pressure for us to split
those files, which isn't a bad thing.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Automatically generated by the following script, based on the output
of lint with flake8-comma:
import re
import sys
last_filename = None
last_row = None
lines = []
for msg in sys.stdin:
m = re.match(
r"\x1b\[35mflake8 \|\x1b\[0m \x1b\[1;31m(.+):(\d+):(\d+): (\w+)", msg
)
if m:
filename, row_str, col_str, err = m.groups()
row, col = int(row_str), int(col_str)
if filename == last_filename:
assert last_row != row
else:
if last_filename is not None:
with open(last_filename, "w") as f:
f.writelines(lines)
with open(filename) as f:
lines = f.readlines()
last_filename = filename
last_row = row
line = lines[row - 1]
if err in ["C812", "C815"]:
lines[row - 1] = line[: col - 1] + "," + line[col - 1 :]
elif err in ["C819"]:
assert line[col - 2] == ","
lines[row - 1] = line[: col - 2] + line[col - 1 :].lstrip(" ")
if last_filename is not None:
with open(last_filename, "w") as f:
f.writelines(lines)
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
This commit adds three `.pysa` model files: `false_positives.pysa`
for ruling out false positive flows with `Sanitize` annotations,
`req_lib.pysa` for educating pysa about Zulip's `REQ()` pattern for
extracting user input, and `redirects.pysa` for capturing the risk
of open redirects within Zulip code. Additionally, this commit
introduces `mark_sanitized`, an identity function which can be used
to selectively clear taint in cases where `Sanitize` models will not
work. This commit also puts `mark_sanitized` to work removing known
false postive flows.
This is be useful for the mobile and desktop apps to hand an uploaded
file off to the system browser so that it can render PDFs (Etc.).
The S3 backend implementation is simple; for the local upload backend,
we use Django's signing feature to simulate the same sort of 60-second
lifetime token.
Co-Author-By: Mateusz Mandera <mateusz.mandera@protonmail.com>
For some mobile use cases, 15 seconds is potentially too short for a
busy+slow device to open a browser and fetch the URL. 60 seconds is
plenty, and doesn't carry a materially increased security risk.
Generated by `pyupgrade --py3-plus --keep-percent-format` on all our
Python code except `zthumbor` and `zulip-ec2-configure-interfaces`,
followed by manual indentation fixes.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
In 3892a8afd8, we restructured the
system for managing uploaded files to a much cleaner model where we
just do parsing inside bugdown.
That new model had potentially buggy handling of cases around both
relative URLs and URLS starting with `realm.host`.
We address this by further rewriting the handling of attachments to
avoid regular expressions entirely, instead relying on urllib for
parsing, and having bugdown output `path_id` values, so that there's
no need for any conversions between formats outside bugdowm.
The check_attachment_reference_change function for processing message
updates is significantly simplified in the process.
The new check on the hostname has the side effect of requiring us to
fix some previously weird/buggy test data.
Co-Author-By: Anders Kaseorg <anders@zulipchat.com>
Co-Author-By: Rohitt Vashishtha <aero31aero@gmail.com>
This commit wraps up the work to remove basic regex based parsing
of messages to handle attachment claiming/unclaiming. We now use
the more dependable Bugdown processor to find potential links and
only operate upon those links instead of parsing the full message
content again.
Previously, we were hardcoding the domain s3.amazonaws.com. Given
that we already have an interface for configuring the host in
/etc/zulip/boto.cfg (which in turn, automatically configures boto), we
just need to actually use the value configured in boto for what S3
hostname to use.
We don't have tests for this new use case, in part because they're
likely annoying to write with `moto` and there hasn't been a huge
amount of demand for it. Since this doesn't regress existing S3
backend support, it seems worth merging.
Apparently, the Zulip notifications (and resulting emails) were
correct, but the download links inside the Zulip UI were incorrectly
not including S3 prefix on the URL, making them not work.
While we're at this, we rewrite the somewhat convoluted previous
system for formatting the data export output.
* Whitelist a small number of image/ types to be served as
non-attachments.
* Serve the file using the type that we validated rather than relying
on an independent guess to match.
This issue can lead to a stored XSS security vulnerability for older
browsers that don't support Content-Security-Policy.
It primarily affects servers using Zulip's local file uploads backend
for servers running Ubuntu 16.04 Xenial or newer; the legacy local
file upload backend for (now EOL) Ubuntu 14.04 Trusty was not affected
and it has limited impact for the S3 upload backend (which uses an
unprivileged S3 bucket domain to serve files).
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
The conditional block containing the tarball upload logic for both S3
and local uploads was deconstructed and moved to the more appropriate
location within `zerver/lib/upload.py`.