Commit Graph

281 Commits

Author SHA1 Message Date
Alex Vandiver 880a3f95a7 tests: Split out s3 and local tests.
This mirrors the split of the code in 7c0d414aff.
2023-03-02 16:36:19 -08:00
Alex Vandiver bd80c048be upload: Rename delete_message_image to use word "attachment".
The table is named Attachment, and not all of them are images.
2023-03-02 16:36:19 -08:00
Alex Vandiver 567d1d54e7 upload: Rename upload_message_file to use word "attachment".
For consistency with the table, which is named Attachment.
2023-03-02 16:36:19 -08:00
Alex Vandiver e31767dda4 settings: Make DEFAULT_LOGO_URI/DEFAULT_AVATAR_URI use staticfiles. 2023-02-14 17:17:06 -05:00
Anders Kaseorg df001db1a9 black: Reformat with Black 23.
Black 23 enforces some slightly more specific rules about empty line
counts and redundant parenthesis removal, but the result is still
compatible with Black 22.

(This does not actually upgrade our Python environment to Black 23
yet.)

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-02-02 10:40:13 -08:00
Alex Vandiver 04cf68b45e uploads: Serve S3 uploads directly from nginx.
When file uploads are stored in S3, this means that Zulip serves as a
302 to S3.  Because browsers do not cache redirects, this means that
no image contents can be cached -- and upon every page load or reload,
every recently-posted image must be re-fetched.  This incurs extra
load on the Zulip server, as well as potentially excessive bandwidth
usage from S3, and on the client's connection.

Switch to fetching the content from S3 in nginx, and serving the
content from nginx.  These have `Cache-control: private, immutable`
headers set on the response, allowing browsers to cache them locally.

Because nginx fetching from S3 can be slow, and requests for uploads
will generally be bunched around when a message containing them are
first posted, we instruct nginx to cache the contents locally.  This
is safe because uploaded file contents are immutable; access control
is still mediated by Django.  The nginx cache key is the URL without
query parameters, as those parameters include a time-limited signed
authentication parameter which lets nginx fetch the non-public file.

This adds a number of nginx-level configuration parameters to control
the caching which nginx performs, including the amount of in-memory
index for he cache, the maximum storage of the cache on disk, and how
long data is retained in the cache.  The currently-chosen figures are
reasonable for small to medium deployments.

The most notable effect of this change is in allowing browsers to
cache uploaded image content; however, while there will be many fewer
requests, it also has an improvement on request latency.  The
following tests were done with a non-AWS client in SFO, a server and
S3 storage in us-east-1, and with 100 requests after 10 requests of
warm-up (to fill the nginx cache).  The mean and standard deviation
are shown.

|                   | Redirect to S3      | Caching proxy, hot  | Caching proxy, cold |
| ----------------- | ------------------- | ------------------- | ------------------- |
| Time in Django    | 263.0 ms ±  28.3 ms | 258.0 ms ±  12.3 ms | 258.0 ms ±  12.3 ms |
| Small file (842b) | 586.1 ms ±  21.1 ms | 266.1 ms ±  67.4 ms | 288.6 ms ±  17.7 ms |
| Large file (660k) | 959.6 ms ± 137.9 ms | 609.5 ms ±  13.0 ms | 648.1 ms ±  43.2 ms |

The hot-cache performance is faster for both large and small files,
since it saves the client the time having to make a second request to
a separate host.  This performance improvement remains at least 100ms
even if the client is on the same coast as the server.

Cold nginx caches are only slightly slower than hot caches, because
VPC access to S3 endpoints is extremely fast (assuming it is in the
same region as the host), and nginx can pool connections to S3 and
reuse them.

However, all of the 648ms taken to serve a cold-cache large file is
occupied in nginx, as opposed to the only 263ms which was spent in
nginx when using redirects to S3.  This means that to overall spend
less time responding to uploaded-file requests in nginx, clients will
need to find files in their local cache, and skip making an
uploaded-file request, at least 60% of the time.  Modeling shows a
reduction in the number of client requests by about 70% - 80%.

The `Content-Disposition` header logic can now also be entirely shared
with the local-file codepath, as can the `url_only` path used by
mobile clients.  While we could provide the direct-to-S3 temporary
signed URL to mobile clients, we choose to provide the
served-from-Zulip signed URL, to better control caching headers on it,
and greater consistency.  In doing so, we adjust the salt used for the
URL; since these URLs are only valid for 60s, the effect of this salt
change is minimal.
2023-01-09 18:23:58 -05:00
Alex Vandiver ed6d62a9e7 avatars: Serve /user_avatars/ through Django, which offloads to nginx.
Moving `/user_avatars/` to being served partially through Django
removes the need for the `no_serve_uploads` nginx reconfiguring when
switching between S3 and local backends.  This is important because a
subsequent commit will move S3 attachments to being served through
nginx, which would make `no_serve_uploads` entirely nonsensical of a
name.

Serve the files through Django, with an offload for the actual image
response to an internal nginx route.  In development, serve the files
directly in Django.

We do _not_ mark the contents as immutable for caching purposes, since
the path for avatar images is hashed only by their user-id and a salt,
and as such are reused when a user's avatar is updated.
2023-01-09 18:23:58 -05:00
Alex Vandiver f0f4aa66e0 uploads: Inline the one callsite of get_local_file_path.
This helps make more explicit the assert_is_local_storage_path which
makes using local_path safe.
2023-01-09 18:23:58 -05:00
Alex Vandiver 7ad06473b6 uploads: Add LOCAL_AVATARS_DIR / LOCAL_FILES_DIR computed settings.
This avoids strewing "avatars" and "files" constants throughout.
2023-01-09 18:23:58 -05:00
Alex Vandiver 24f95a3788 uploads: Move internal upload serving path to under /internal/. 2023-01-09 18:23:58 -05:00
Alex Vandiver cc9b028312 uploads: Set X-Accel-Redirect manually, without using django-sendfile2.
The `django-sendfile2` module unfortunately only supports a single
`SENDFILE` root path -- an invariant which subsequent commits need to
break.  Especially as Zulip only runs with a single webserver, and
thus sendfile backend, the functionality is simple to inline.

It is worth noting that the following headers from the initial Django
response are _preserved_, if present, and sent unmodified to the
client; all other headers are overridden by those supplied by the
internal redirect[^1]:
 - Content-Type
 - Content-Disposition
 - Accept-Ranges
 - Set-Cookie
 - Cache-Control
 - Expires

As such, we explicitly unset the Content-type header to allow nginx to
set it from the static file, but set Content-Disposition and
Cache-Control as we want them to be.

[^1]: https://www.nginx.com/resources/wiki/start/topics/examples/xsendfile/
2023-01-09 18:23:58 -05:00
Alex Vandiver 679fb76acf uploads: Provide our own Content-Disposition header.
sendfile already applied a Content-Disposition header, but the
algorithm may provide both `filename=` and `filename*=` values (which
is potentially confusing to clients) and incorrectly slash-escapes
quotes in Unicode strings.

Django provides a correct implementation, but it is only accessible to
FileResponse objects.  Since the entire point is to offload the
filehandle handling, we cannot use a FileResponse.

Django 4.2 will make the function available outside of FileResponse.
Until then, extract our own Content-Disposition handling, based on
Django's.

We remove the very verbose comment added in d4360e2287, describing
Content-Disposition headers, as it does not add much.
2023-01-09 18:23:58 -05:00
Alex Vandiver 7c0d414aff uploads: Split out S3 and local file backends into separate files.
The uploads file is large, and conceptually the S3 and local-file
backends are separable.
2023-01-09 18:23:58 -05:00
Zixuan James Li 46329a2710 test_classes: Create a dedicate helper for query count check.
This adds a helper based on testing patterns of using the "queries_captured"
context manager with "assert_length" to check the number of queries
executed for preventing performance regression.

It explains the rationale of checking the query count through an
"AssertionError" and prints the queries captured as assert_length does,
but with a format optimized for displaying the queries in a more
readable manner.

Signed-off-by: Zixuan James Li <p359101898@gmail.com>
2022-10-17 11:32:52 -07:00
madrix01 4303ba1efc actions: Create a separate message_delete.py file.
This is preparatory commit for #18941.
Importing `do_delete_message` from `message_edit.py` was causing a
circular import error. In order to avoid that, we create a separate
message_delete.py file which has all the functions related to deleting
messages.
The tests for deleting messages are present in
`zerver/tests/test_message_edit.py`.

Fixes a part of #18941
2022-09-01 14:18:38 -07:00
Lauryn Menard aa796af0a8 upload: Remove `mimetype` url parameter in `get_file_info`.
This `mimetype` parameter was introduced in c4fa29a and its last
usage removed in 5bab2a3. This parameter was undocumented in the
OpenAPI endpoint documentation for `/user_uploads`, therefore
there shouldn't be client implementations that rely on it's
presence.

Removes the `request.GET` call for the `mimetype` parameter and
replaces it by getting the `content_type` value from the file,
which is an instance of Django's `UploadedFile` class and stores
that file metadata as a property.

If that returns `None` or an empty string, then we try to guess
the `content_type` from the filename, which is the same as the
previous behaviour when `mimetype` was `None` (which we assume
has been true since it's usage was removed; see above).

If unable to guess the `content_type` from the filename, we now
fallback to "application/octet-stream", instead of an empty string
or `None` value.

Also, removes the specific test written for having `mimetype` as
a url parameter in the request, and replaces it with a test that
covers when we try to guess `content_type` from the filename.
2022-08-08 16:06:09 -07:00
Zixuan James Li a142fbff85 tests: Refactor away result.json() calls with helpers.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
2022-06-06 23:06:00 -07:00
Mateusz Mandera 09dc166b45 do_delete_old_unclaimed_attachments: Consider ArchivedAttachment rows.
This function is oblivious to the existence of ArchivedAttachment, which
is incorrect. A file can be removed if and only if it is not referenced
by any Messages or ArchivedMessages.
2022-06-02 17:32:23 -07:00
Mateusz Mandera 5ff4754090 test_upload: Fix some URLs to uploaded files.
Using http://localhost:9991 is incorrect - e.g. messages sent with file
urls constructed trigger do_claim_attachments to be called with empty
list in potential_path_ids.

realm.host should be used in all these places, like in the other tests
in the file.
2022-06-02 17:32:23 -07:00
Zixuan James Li bb6a934c8d typing: Add appropriate none-checks for LOCAL_UPLOADS_DIR.
This is a part of django-stubs refactorings.

Signed-off-by: Zixuan James Li <359101898@qq.com>
2022-05-31 09:43:55 -07:00
Anders Kaseorg 6331a314d4 Correctly hyphenate “non-”.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-04-27 22:10:31 -07:00
Anders Kaseorg d58fece832 Correctly hyphenate “web-public”.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-04-27 22:10:31 -07:00
Aman Agrawal 5ee4f71701 avatar: Add rate limit similar to attachments on medium avatars.
Followup on #20136
2022-04-27 16:51:18 -07:00
Anders Kaseorg e01faebd7e actions: Split out zerver.actions.create_realm.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-04-14 17:14:37 -07:00
Anders Kaseorg 59f6b090c7 actions: Split out zerver.actions.realm_settings.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-04-14 17:14:37 -07:00
Anders Kaseorg 975066e3f0 actions: Split out zerver.actions.message_send.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-04-14 17:14:34 -07:00
Anders Kaseorg ec6355389a actions: Split out zerver.actions.user_settings.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-04-14 17:14:34 -07:00
Anders Kaseorg e230ea2598 actions: Split out zerver.actions.uploads.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-04-14 17:14:32 -07:00
Anders Kaseorg 3d7aa98c45 actions: Split out zerver.actions.realm_icon.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-04-14 17:14:31 -07:00
Anders Kaseorg 7f088f3403 actions: Split out zerver.actions.realm_logo.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-04-14 17:14:31 -07:00
Aman Agrawal b799ec32b0 upload: Allow rate limited access to spectators for uploaded files.
We allow spectators access to uploaded files in web public streams
but rate limit the daily requests to 1000 per file by default.
2022-03-24 10:50:00 -07:00
Alex Vandiver abed174b12 uploads: Add an endpoint which forces a download.
This is most useful for images hosted in S3, which are otherwise
always displayed in the browser.
2022-03-22 15:05:02 -07:00
Alex Vandiver 95892a5ed3 emoji: Support animated PNGs. 2022-03-15 12:47:21 -07:00
Alex Vandiver fc793c10fa tests: Refactor tests of resizing animated images. 2022-03-15 12:47:21 -07:00
Mateusz Mandera ec5e12ef4e tests: Fix some instances of logged in session polluting test state.
In these tests, the code ends up with a logged in session when it's
undesired - later on these tests make requests to a different subdomain
- where a logged in session is not supposed to exist. This leads to an
unintended, strange situation where request.user is a user from the old
subdomain but the request itself is to a *different* subdomain. This
throws off get_realm_from_request, which will return the realm from
request.user.realm - which is not what these tests want and can lead to
these tests failing when some of the production code being tested
switches to using get_realm_from_request instead of
get_realm(get_subdomain).
2022-02-25 14:02:24 -08:00
Lauryn Menard 7a7f3337c1 tests: Fix unused `message_id` parameter in tests.
Various backend tests use the `PATCH /messages/{msg_id}` endpoint.
For that endpoint, the message ID is encoded in the URL path and
ignored if provided as a parameter in the the query.

Verified that the tests were providing the same message ID to both
the path and then removed the ignored parameter in the query.
2022-02-21 08:52:33 -08:00
Alex Vandiver a40b3e1118 realm_emoji: Stop swallowing all exceptions from upload_emoji_image.
Putting all of the logic in a `finally` block is equivalent to a bare
`except` block, which silently consumes all exceptions.

Move only the most-necessary parts into the except; this lets
`BadImageError` exceptions from `zerver/lib/upload.py` to escape,
allowing better the generic "Image file upload failed" to be replaced
with a more specific message.

It also allows unexpected exceptions, as the previous commit resolved,
to escape and 500.  This lets them be detected and resolved, rather
than give users a silently bad experience.
2022-02-17 12:19:47 -08:00
Alex Vandiver 96a5fa9d78 upload: Fix resizing non-animated images.
5dab6e9d31 began honoring the list of disposals for every frame.
Unfortunately, passing a list of disposals for a non-animated image
raises an exception:
```
  File "zerver/lib/upload.py", line 212, in resize_emoji
    image_data = resize_gif(im, size)
  File "zerver/lib/upload.py", line 165, in resize_gif
    frames[0].save(
  File "[...]/PIL/Image.py", line 2212, in save
    save_handler(self, fp, filename)
  File "[...]/PIL/GifImagePlugin.py", line 605, in _save
    _write_single_frame(im, fp, palette)
  File "[...]/PIL/GifImagePlugin.py", line 506, in _write_single_frame
    _write_local_header(fp, im, (0, 0), flags)
  File "[...]/PIL/GifImagePlugin.py", line 647, in _write_local_header
    disposal = int(im.encoderinfo.get("disposal", 0))
TypeError: int() argument must be a string, a bytes-like object or a
number, not 'list'
```

`check_add_realm_emoji` calls this as:

```
    try:
        is_animated = upload_emoji_image(image_file, emoji_file_name, a
uthor)
        emoji_uploaded_successfully = True
    finally:
        if not emoji_uploaded_successfully:
            realm_emoji.delete()
            return None
        # ...
```

This is equivalent to dropping _all_ exceptions silently.  As such,
Zulip has silently rejected all non-animated images larger than 64x64
since 5dab6e9d31.

Adjust to only pass a single disposal if there are no additional
frames.  Add a test for non-animated images, which requires also
fixing the incidental bug that all GIF images were being recorded as
animated, regardless of if they had more than 1 frame or not.
2022-02-17 12:19:47 -08:00
Aman Agrawal 7614f2203a pricing: Replace "Zulip Standard" with "Zulip Cloud Standard".
Case sensitive replace.
2022-02-09 11:00:24 -08:00
Anders Kaseorg b0ce4f1bce docs: Fix many spelling mistakes.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-02-07 18:51:06 -08:00
Mateusz Mandera 4102816240 upload: Pass the target realm to create_attachment.
The target realm was not being passed to create_attachment in
upload_message_file implementations. This was a bug in the edge-case of
cross-realm messages - in particular, causing a bug in the email
gateway:
When an email with an attachment is sent, the message is mirrored to
Zulip with Email Gateway Bot as the message sender and uploader of the
attachment. Due to the realm not being passed to create_attachment, the
Attachment would get created with .realm being the system bot realm,
making the attachment inaccessible under some conditions due to failing
the following condition check (that's expected to pass, provided that
the .realm is set correctly):
```
    if (
        attachment.is_realm_public
        and attachment.realm == user_profile.realm
        and user_profile.can_access_public_streams()
    ):
        # Any user in the realm can access realm-public files
        return True
```
2022-01-27 17:23:44 -08:00
Anders Kaseorg b729f00fc2 test_upload: Uncomment subTest contexts.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-01-23 22:14:43 -08:00
Anders Kaseorg ba7ea7cc80 test_classes: Extract assert_streaming_content helper.
This also fixes a warning from
RealmExportTest.test_endpoint_local_uploads: “ResourceWarning:
unclosed file <_io.BufferedReader
name='/srv/zulip/var/…/test-export.tar.gz'>”.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-01-21 13:37:26 -08:00
Anders Kaseorg 4147da24dd tests: Use read_test_image_file.
Fixes a ResourceWarning from the unclosed file at test_upload.py:1954.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-01-13 14:59:46 -08:00
Tim Abbott e152f255f5 test_upload: Remove GIF file extension test.
This change should have been in the previous commit.
2021-12-16 16:16:34 -08:00
Steve Howell 2902f8b931 tests: Ensure stream senders get a UserMessage row.
We now complain if a test author sends a stream message
that does not result in the sender getting a
UserMessage row for the message.

This is basically 100% equivalent to complaining that
the author failed to subscribe the sender to the stream
as part of the test setup, as far as I can tell, so the
AssertionError instructs the author to subscribe the
sender to the stream.

We exempt bots from this check, although it is
plausible we should only exempt the system bots like
the notification bot.

I considered auto-subscribing the sender to the stream,
but that can be a little more expensive than the
current check, and we generally want test setup to be
explicit.

If there is some legitimate way than a subscribed human
sender can't get a UserMessage, then we probably want
an explicit test for that, or we may want to change the
backend to just write a UserMessage row in that
hypothetical situation.

For most tests, including almost all the ones fixed
here, the author just wants their test setup to
realistically reflect normal operation, and often devs
may not realize that Cordelia is not subscribed to
Denmark or not realize that Hamlet is not subscribed to
Scotland.

Some of us don't remember our Shakespeare from high
school, and our stream subscriptions don't even
necessarily reflect which countries the Bard placed his
characters in.

There may also be some legitimate use case where an
author wants to simulate sending a message to an
unsubscribed stream, but for those edge cases, they can
always set allow_unsubscribed_sender to True.
2021-12-10 09:40:04 -08:00
Eeshan Garg 2cdaae681d actions: Rename do_change_plan_type -> do change_realm_plan_type.
We will soon be adding an equivalent function for RemoteZulipServer,
so it makes sense to rename this function to be more descriptive.
2021-12-06 16:18:53 -08:00
Sahil Batra 88e21d0387 misc: Replace "night mode" with "dark theme" in comments. 2021-11-26 22:03:29 -08:00
Aman Agrawal 3e689ebae9 users: Allow spectators to view user avatars.
If realm is web_public, spectators can now view avatar of other
users.

There is a special exception we had to introduce in rest model to
allow `/avatar` type of urls for `anonymous` access, because they
don't have the /api/v1 prefix.

Fixes #19838.
2021-11-02 11:26:19 -07:00
Eeshan Garg b325a4f1be realm: Rename plan type constants to be more descriptive.
It is confusing to have the plan type constants not be namespaced
by the thing they represent. We already have a namespacing
convention in place for constants, so we should use it for
Realm.plan_type as well.
2021-10-19 12:20:39 -07:00