Part of splitting creating and editing scheduled messages.
Should be merged with final commit in series. Breaks tests.
Removes `scheduled_message_id` parameter from the create scheduled
message path.
Fixes#25414.
We add Attachment.scheduled_messages relation to track ScheduledMessages
which reference the attachment.
The import bits can be done after merging this, by updating #25345.
Previously, tests that exercised code paths that added local
uploads did not always clean up `settings.LOCAL_UPLOADS_DIR`
after the test was complete.
Updates the `ZulipTestCase` class to remove any local uploads
in the unique `settings.LOCAL_UPLOADS_DIR` in `tearDown` for
all tests.
The Django convention is for __repr__ to include the type and __str__
to omit it. In fact its default __repr__ implementation for models
automatically adds a type prefix to __str__, which has resulted in the
type being duplicated:
>>> UserProfile.objects.first()
<UserProfile: <UserProfile: emailgateway@zulip.com <Realm: zulipinternal 1>>>
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Black 23 enforces some slightly more specific rules about empty line
counts and redundant parenthesis removal, but the result is still
compatible with Black 22.
(This does not actually upgrade our Python environment to Black 23
yet.)
Signed-off-by: Anders Kaseorg <anders@zulip.com>
When file uploads are stored in S3, this means that Zulip serves as a
302 to S3. Because browsers do not cache redirects, this means that
no image contents can be cached -- and upon every page load or reload,
every recently-posted image must be re-fetched. This incurs extra
load on the Zulip server, as well as potentially excessive bandwidth
usage from S3, and on the client's connection.
Switch to fetching the content from S3 in nginx, and serving the
content from nginx. These have `Cache-control: private, immutable`
headers set on the response, allowing browsers to cache them locally.
Because nginx fetching from S3 can be slow, and requests for uploads
will generally be bunched around when a message containing them are
first posted, we instruct nginx to cache the contents locally. This
is safe because uploaded file contents are immutable; access control
is still mediated by Django. The nginx cache key is the URL without
query parameters, as those parameters include a time-limited signed
authentication parameter which lets nginx fetch the non-public file.
This adds a number of nginx-level configuration parameters to control
the caching which nginx performs, including the amount of in-memory
index for he cache, the maximum storage of the cache on disk, and how
long data is retained in the cache. The currently-chosen figures are
reasonable for small to medium deployments.
The most notable effect of this change is in allowing browsers to
cache uploaded image content; however, while there will be many fewer
requests, it also has an improvement on request latency. The
following tests were done with a non-AWS client in SFO, a server and
S3 storage in us-east-1, and with 100 requests after 10 requests of
warm-up (to fill the nginx cache). The mean and standard deviation
are shown.
| | Redirect to S3 | Caching proxy, hot | Caching proxy, cold |
| ----------------- | ------------------- | ------------------- | ------------------- |
| Time in Django | 263.0 ms ± 28.3 ms | 258.0 ms ± 12.3 ms | 258.0 ms ± 12.3 ms |
| Small file (842b) | 586.1 ms ± 21.1 ms | 266.1 ms ± 67.4 ms | 288.6 ms ± 17.7 ms |
| Large file (660k) | 959.6 ms ± 137.9 ms | 609.5 ms ± 13.0 ms | 648.1 ms ± 43.2 ms |
The hot-cache performance is faster for both large and small files,
since it saves the client the time having to make a second request to
a separate host. This performance improvement remains at least 100ms
even if the client is on the same coast as the server.
Cold nginx caches are only slightly slower than hot caches, because
VPC access to S3 endpoints is extremely fast (assuming it is in the
same region as the host), and nginx can pool connections to S3 and
reuse them.
However, all of the 648ms taken to serve a cold-cache large file is
occupied in nginx, as opposed to the only 263ms which was spent in
nginx when using redirects to S3. This means that to overall spend
less time responding to uploaded-file requests in nginx, clients will
need to find files in their local cache, and skip making an
uploaded-file request, at least 60% of the time. Modeling shows a
reduction in the number of client requests by about 70% - 80%.
The `Content-Disposition` header logic can now also be entirely shared
with the local-file codepath, as can the `url_only` path used by
mobile clients. While we could provide the direct-to-S3 temporary
signed URL to mobile clients, we choose to provide the
served-from-Zulip signed URL, to better control caching headers on it,
and greater consistency. In doing so, we adjust the salt used for the
URL; since these URLs are only valid for 60s, the effect of this salt
change is minimal.
Moving `/user_avatars/` to being served partially through Django
removes the need for the `no_serve_uploads` nginx reconfiguring when
switching between S3 and local backends. This is important because a
subsequent commit will move S3 attachments to being served through
nginx, which would make `no_serve_uploads` entirely nonsensical of a
name.
Serve the files through Django, with an offload for the actual image
response to an internal nginx route. In development, serve the files
directly in Django.
We do _not_ mark the contents as immutable for caching purposes, since
the path for avatar images is hashed only by their user-id and a salt,
and as such are reused when a user's avatar is updated.
The `django-sendfile2` module unfortunately only supports a single
`SENDFILE` root path -- an invariant which subsequent commits need to
break. Especially as Zulip only runs with a single webserver, and
thus sendfile backend, the functionality is simple to inline.
It is worth noting that the following headers from the initial Django
response are _preserved_, if present, and sent unmodified to the
client; all other headers are overridden by those supplied by the
internal redirect[^1]:
- Content-Type
- Content-Disposition
- Accept-Ranges
- Set-Cookie
- Cache-Control
- Expires
As such, we explicitly unset the Content-type header to allow nginx to
set it from the static file, but set Content-Disposition and
Cache-Control as we want them to be.
[^1]: https://www.nginx.com/resources/wiki/start/topics/examples/xsendfile/
sendfile already applied a Content-Disposition header, but the
algorithm may provide both `filename=` and `filename*=` values (which
is potentially confusing to clients) and incorrectly slash-escapes
quotes in Unicode strings.
Django provides a correct implementation, but it is only accessible to
FileResponse objects. Since the entire point is to offload the
filehandle handling, we cannot use a FileResponse.
Django 4.2 will make the function available outside of FileResponse.
Until then, extract our own Content-Disposition handling, based on
Django's.
We remove the very verbose comment added in d4360e2287, describing
Content-Disposition headers, as it does not add much.
This adds a helper based on testing patterns of using the "queries_captured"
context manager with "assert_length" to check the number of queries
executed for preventing performance regression.
It explains the rationale of checking the query count through an
"AssertionError" and prints the queries captured as assert_length does,
but with a format optimized for displaying the queries in a more
readable manner.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
This is preparatory commit for #18941.
Importing `do_delete_message` from `message_edit.py` was causing a
circular import error. In order to avoid that, we create a separate
message_delete.py file which has all the functions related to deleting
messages.
The tests for deleting messages are present in
`zerver/tests/test_message_edit.py`.
Fixes a part of #18941
This `mimetype` parameter was introduced in c4fa29a and its last
usage removed in 5bab2a3. This parameter was undocumented in the
OpenAPI endpoint documentation for `/user_uploads`, therefore
there shouldn't be client implementations that rely on it's
presence.
Removes the `request.GET` call for the `mimetype` parameter and
replaces it by getting the `content_type` value from the file,
which is an instance of Django's `UploadedFile` class and stores
that file metadata as a property.
If that returns `None` or an empty string, then we try to guess
the `content_type` from the filename, which is the same as the
previous behaviour when `mimetype` was `None` (which we assume
has been true since it's usage was removed; see above).
If unable to guess the `content_type` from the filename, we now
fallback to "application/octet-stream", instead of an empty string
or `None` value.
Also, removes the specific test written for having `mimetype` as
a url parameter in the request, and replaces it with a test that
covers when we try to guess `content_type` from the filename.
This function is oblivious to the existence of ArchivedAttachment, which
is incorrect. A file can be removed if and only if it is not referenced
by any Messages or ArchivedMessages.
Using http://localhost:9991 is incorrect - e.g. messages sent with file
urls constructed trigger do_claim_attachments to be called with empty
list in potential_path_ids.
realm.host should be used in all these places, like in the other tests
in the file.
In these tests, the code ends up with a logged in session when it's
undesired - later on these tests make requests to a different subdomain
- where a logged in session is not supposed to exist. This leads to an
unintended, strange situation where request.user is a user from the old
subdomain but the request itself is to a *different* subdomain. This
throws off get_realm_from_request, which will return the realm from
request.user.realm - which is not what these tests want and can lead to
these tests failing when some of the production code being tested
switches to using get_realm_from_request instead of
get_realm(get_subdomain).
Various backend tests use the `PATCH /messages/{msg_id}` endpoint.
For that endpoint, the message ID is encoded in the URL path and
ignored if provided as a parameter in the the query.
Verified that the tests were providing the same message ID to both
the path and then removed the ignored parameter in the query.
Putting all of the logic in a `finally` block is equivalent to a bare
`except` block, which silently consumes all exceptions.
Move only the most-necessary parts into the except; this lets
`BadImageError` exceptions from `zerver/lib/upload.py` to escape,
allowing better the generic "Image file upload failed" to be replaced
with a more specific message.
It also allows unexpected exceptions, as the previous commit resolved,
to escape and 500. This lets them be detected and resolved, rather
than give users a silently bad experience.
5dab6e9d31 began honoring the list of disposals for every frame.
Unfortunately, passing a list of disposals for a non-animated image
raises an exception:
```
File "zerver/lib/upload.py", line 212, in resize_emoji
image_data = resize_gif(im, size)
File "zerver/lib/upload.py", line 165, in resize_gif
frames[0].save(
File "[...]/PIL/Image.py", line 2212, in save
save_handler(self, fp, filename)
File "[...]/PIL/GifImagePlugin.py", line 605, in _save
_write_single_frame(im, fp, palette)
File "[...]/PIL/GifImagePlugin.py", line 506, in _write_single_frame
_write_local_header(fp, im, (0, 0), flags)
File "[...]/PIL/GifImagePlugin.py", line 647, in _write_local_header
disposal = int(im.encoderinfo.get("disposal", 0))
TypeError: int() argument must be a string, a bytes-like object or a
number, not 'list'
```
`check_add_realm_emoji` calls this as:
```
try:
is_animated = upload_emoji_image(image_file, emoji_file_name, a
uthor)
emoji_uploaded_successfully = True
finally:
if not emoji_uploaded_successfully:
realm_emoji.delete()
return None
# ...
```
This is equivalent to dropping _all_ exceptions silently. As such,
Zulip has silently rejected all non-animated images larger than 64x64
since 5dab6e9d31.
Adjust to only pass a single disposal if there are no additional
frames. Add a test for non-animated images, which requires also
fixing the incidental bug that all GIF images were being recorded as
animated, regardless of if they had more than 1 frame or not.
The target realm was not being passed to create_attachment in
upload_message_file implementations. This was a bug in the edge-case of
cross-realm messages - in particular, causing a bug in the email
gateway:
When an email with an attachment is sent, the message is mirrored to
Zulip with Email Gateway Bot as the message sender and uploader of the
attachment. Due to the realm not being passed to create_attachment, the
Attachment would get created with .realm being the system bot realm,
making the attachment inaccessible under some conditions due to failing
the following condition check (that's expected to pass, provided that
the .realm is set correctly):
```
if (
attachment.is_realm_public
and attachment.realm == user_profile.realm
and user_profile.can_access_public_streams()
):
# Any user in the realm can access realm-public files
return True
```
This also fixes a warning from
RealmExportTest.test_endpoint_local_uploads: “ResourceWarning:
unclosed file <_io.BufferedReader
name='/srv/zulip/var/…/test-export.tar.gz'>”.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
We now complain if a test author sends a stream message
that does not result in the sender getting a
UserMessage row for the message.
This is basically 100% equivalent to complaining that
the author failed to subscribe the sender to the stream
as part of the test setup, as far as I can tell, so the
AssertionError instructs the author to subscribe the
sender to the stream.
We exempt bots from this check, although it is
plausible we should only exempt the system bots like
the notification bot.
I considered auto-subscribing the sender to the stream,
but that can be a little more expensive than the
current check, and we generally want test setup to be
explicit.
If there is some legitimate way than a subscribed human
sender can't get a UserMessage, then we probably want
an explicit test for that, or we may want to change the
backend to just write a UserMessage row in that
hypothetical situation.
For most tests, including almost all the ones fixed
here, the author just wants their test setup to
realistically reflect normal operation, and often devs
may not realize that Cordelia is not subscribed to
Denmark or not realize that Hamlet is not subscribed to
Scotland.
Some of us don't remember our Shakespeare from high
school, and our stream subscriptions don't even
necessarily reflect which countries the Bard placed his
characters in.
There may also be some legitimate use case where an
author wants to simulate sending a message to an
unsubscribed stream, but for those edge cases, they can
always set allow_unsubscribed_sender to True.
If realm is web_public, spectators can now view avatar of other
users.
There is a special exception we had to introduce in rest model to
allow `/avatar` type of urls for `anonymous` access, because they
don't have the /api/v1 prefix.
Fixes#19838.
It is confusing to have the plan type constants not be namespaced
by the thing they represent. We already have a namespacing
convention in place for constants, so we should use it for
Realm.plan_type as well.
Now, when we add a custom animated emoji to the realm
we also save a still image of it (1st frame of the gif). So
we can avoid showing an animated emoji every time.
This commit modifies the copy_user_settings code such that instead
of source user profile, we can have two types of sources - a user
profile and RealmUserDefault table of realm and then set the
settings from RealmUserDefault only is there is no user profile
as a source.
We also rename copy_user_settings to copy_default_settings for
clarity.
This causes avatars and emoji which are hosted by Zulip in S3 (or
compatible) servers to no longer go through camo. Routing these
requests through camo does not add any privacy benefit (as the request
logs there go to the Zulip admins regardless), and may break emoji
imported from Slack before 1bf385e35f,
which have `application/octet-stream` as their stored Content-Type.
`ensure_basic_avatar_image` and `ensure_medium_avatar_image` are
essentially the same thing, except a size parameter.
So, refactor them into a single function.
This doesn't introduce any functional changes.
Previously, S3UploadBackend.delete_export_tarball failed to strip the
leading ‘/’ from the export path. This mistake is now caught by Moto
1.3.15. I expect it caused deletion failures in the real S3, although
I haven’t verified this.
We store export_path in the audit log with a leading ‘/’, but the
actual S3 keys do not have a leading ‘/’. Changing either system
would require a migration. So the new convention is that the
variables named ‘export_path’ have a leading ‘/’, while variables
named ‘path_id’ or ‘key’ do not.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
This commit adds the is_web_public field in the AbstractAttachment
class. This is useful when validating user access to the attachment,
as otherwise we would have to make a query in the db to check if
that attachment was sent in a message in a web-public stream or not.
The S3 data export tool's upload code path uses this nice boto
callback feature for showing a progress bar, which is nice for the
management command. It's spammy/broken in production and the backend
tests, so we change percent_callback to be a parameter passed in so
that it can only be used in the contexts where it makes sense.
Fixes#2665.
Regenerated by tabbott with `lint --fix` after a rebase and change in
parameters.
Note from tabbott: In a few cases, this converts technical debt in the
form of unsorted imports into different technical debt in the form of
our largest files having very long, ugly import sequences at the
start. I expect this change will increase pressure for us to split
those files, which isn't a bad thing.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Automatically generated by the following script, based on the output
of lint with flake8-comma:
import re
import sys
last_filename = None
last_row = None
lines = []
for msg in sys.stdin:
m = re.match(
r"\x1b\[35mflake8 \|\x1b\[0m \x1b\[1;31m(.+):(\d+):(\d+): (\w+)", msg
)
if m:
filename, row_str, col_str, err = m.groups()
row, col = int(row_str), int(col_str)
if filename == last_filename:
assert last_row != row
else:
if last_filename is not None:
with open(last_filename, "w") as f:
f.writelines(lines)
with open(filename) as f:
lines = f.readlines()
last_filename = filename
last_row = row
line = lines[row - 1]
if err in ["C812", "C815"]:
lines[row - 1] = line[: col - 1] + "," + line[col - 1 :]
elif err in ["C819"]:
assert line[col - 2] == ","
lines[row - 1] = line[: col - 2] + line[col - 1 :].lstrip(" ")
if last_filename is not None:
with open(last_filename, "w") as f:
f.writelines(lines)
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
Generated by pyupgrade --py36-plus --keep-percent-format, but with the
NamedTuple changes reverted (see commit
ba7906a3c6, #15132).
Signed-off-by: Anders Kaseorg <anders@zulip.com>