docs: Add description of thumbnailing system.

2024-08-29 15:59:19 -04:00 · 2024-08-29 15:59:19 -04:00 · 07ce4f0bc0
parent 9933165e75
commit 07ce4f0bc0
2 changed files with 150 additions and 0 deletions
--- a/docs/subsystems/index.md
+++ b/docs/subsystems/index.md
@ -38,4 +38,5 @@ unread_messages
 billing
 widgets
 slash-commands
+thumbnailing
 ```
--- a/docs/subsystems/thumbnailing.md
+++ b/docs/subsystems/thumbnailing.md
@ -0,0 +1,149 @@
+# Thumbnailing
+
+## libvips
+
+Zulip uses the [`libvips`](https://www.libvips.org/) image processing toolkit
+for thumbnailing, because it is a low-memory, high-performance image processing
+library. Some smaller images are thumbnailed synchronously inside the Django
+process, but the majority of the work is offloaded to one or more `thumbnail`
+worker processes.
+
+Thumbnailing is a notoriously high-risk surface from a security standpoint,
+since it parses arbitrary binary user input, often with complex grammars. On
+versions of `libvips` which support it (>= 8.13, available on Ubuntu 24.04 or
+Debian 12 and later), Zulip limits `libvips` to only the image parsers and libraries whose
+image formats we expect to parse, all of which are fuzz-tested by
+[`oss-fuzz`](https://google.github.io/oss-fuzz/).
+
+## Avatars
+
+Avatar images are served at two resolutions (100x100 and 500x500,
+the latter of which is called "medium"), and always as PNGs. These are served
+from a "dumb" endpoint -- that is, if S3 is used, we provide a direct link to
+the content in the S3 bucket (or a CloudFront distribution in front of it), and
+the request does not pass through the Zulip server. This is because avatars are
+referenced in emails, and thus their URLs need to be permanent and
+publicly-accessible. This also means that any choice of resolution and file
+format must be made entirely by the client.
+
+Avatars are thumbnailed synchronously upon upload into 100x100 and 500x500 PNGs.
+The smaller dimension is scaled to fit, and the larger dimension is
+center-cropped; the image may be scaled _up_ to fit the 100x100 or 500x500
+dimensions. To generate the filename, the server hashes the avatar salt (a
+server-side secret), the user ID, and a per-user sequence (the "version") to
+produce a filename which is not enumerable, and can only be determined by the
+server. Because the version is part of the hash, every avatar change produces a
+new URL, so avatars can be served with long-lasting caching headers.
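+
+As a rough illustration of the two steps above, the following sketch computes
+the scale-and-center-crop geometry and derives a non-enumerable filename. The
+function names, the use of SHA-256, and the payload layout are assumptions made
+for this example; they are not taken from the Zulip codebase.
+
+```python
+import hashlib
+
+
+def cover_geometry(width: int, height: int, size: int) -> tuple[int, int, int, int]:
+    """Return (scaled_width, scaled_height, crop_left, crop_top) for a square avatar."""
+    # Scale so the smaller dimension exactly fits; this may scale the image up.
+    scale = size / min(width, height)
+    scaled_w = max(size, round(width * scale))
+    scaled_h = max(size, round(height * scale))
+    # Crop the overflow of the larger dimension equally from both sides.
+    return scaled_w, scaled_h, (scaled_w - size) // 2, (scaled_h - size) // 2
+
+
+def avatar_basename(salt: str, user_id: int, version: int) -> str:
+    """Hypothetical filename: a hash over the secret salt, user ID, and version."""
+    payload = f"{user_id}:{version}:{salt}".encode()
+    return hashlib.sha256(payload).hexdigest()
+
+
+# A 50x25 upload is scaled up to 200x100, then cropped to the central 100x100.
+print(cover_geometry(50, 25, 100))
+# Bumping the version yields a different, unrelated-looking filename.
+print(avatar_basename("example-salt", 7, 1))
+print(avatar_basename("example-salt", 7, 2))
+```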
+
+The original avatar image is stored adjacent to the thumbnailed versions,
+enabling later re-thumbnailing to other dimensions or formats without requiring
+users to re-upload it.
+
+## Emoji
+
+Emoji URLs are hard-coded into emails, and as such their URLs need to be
+permanent and publicly-accessible. They are served at a consistent 1:1 aspect
+ratio, and while they may be rendered at different scales based on the
+line-height of the client, we only need to store them at one resolution.
+
+Emoji are thumbnailed synchronously into 64x64 images, and they are saved in
+the same file format that they were uploaded in. Transparent pixels are added
+to the smaller dimension to make the image square after resizing. The filename
+of the emoji is based on a hash of the avatar salt (a server-side secret) and
+the emoji's id -- but because the filename is stored in the database, it can be
+anything with sufficient entropy to not be enumerable or have collisions.
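+
+The padding step can be sketched as plain arithmetic. This example assumes the
+larger dimension is scaled to exactly 64 pixels and that the transparent
+padding is split evenly on both sides; the text above does not specify either
+detail, and `pad_to_square` is a name invented for the illustration.
+
+```python
+def pad_to_square(width: int, height: int, size: int = 64) -> tuple[int, int, int, int]:
+    """Return (scaled_width, scaled_height, pad_left, pad_top) on a square canvas."""
+    # Scale so the larger dimension fills the canvas.
+    scale = size / max(width, height)
+    scaled_w = min(size, max(1, round(width * scale)))
+    scaled_h = min(size, max(1, round(height * scale)))
+    # The smaller dimension is centered; the remainder is transparent padding.
+    return scaled_w, scaled_h, (size - scaled_w) // 2, (size - scaled_h) // 2
+
+
+# A 128x64 upload becomes 64x32, with 16 transparent rows above and below.
+print(pad_to_square(128, 64))
+```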
+
+For animated emoji, a separate "still" version of the emoji is generated from
+the first frame, as a 64x64 PNG image. This is not currently used, but is
+intended to be part of a user preference to disable emoji animations (see
+[#13434](https://github.com/zulip/zulip/issues/13434)).
+
+The original emoji is stored adjacent to the thumbnailed version, enabling later
+re-thumbnailing to other dimensions or formats without requiring users to
+re-upload it.
+
+There is no technical reason that we preserve the uploader's choice of file
+format, or that we use PNGs as the file format for the "still" version. Both of
+these would plausibly benefit from being WebP images.
+
+## Realm logos
+
+Realm logos are converted to PNGs and thumbnailed down to fit within 800x100;
+they are never scaled up. For example, a 1000x10 pixel image will end up as
+800x8, and a 10x20 image will stay 10x20. The original is stored adjacent to the
+converted thumbnail.
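+
+The shrink-only behavior can be checked with a few lines of arithmetic. This is
+a sketch of the sizing rule only, with a hypothetical `fit_within` helper; the
+exact rounding that `libvips` applies may differ.
+
+```python
+def fit_within(
+    width: int, height: int, max_width: int = 800, max_height: int = 100
+) -> tuple[int, int]:
+    """Shrink (never enlarge) a size to fit inside the box, keeping aspect ratio."""
+    # Capping the scale at 1.0 is what prevents small images from being enlarged.
+    scale = min(max_width / width, max_height / height, 1.0)
+    return max(1, round(width * scale)), max(1, round(height * scale))
+
+
+print(fit_within(1000, 10))  # the wide example from the text
+print(fit_within(10, 20))  # already fits, so it is left unchanged
+```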
+
+## Realm icons
+
+Realm icons are converted to PNGs, and treated identically to avatars, albeit only
+producing the 100x100 size.
+
+## File uploads
+
+### Images
+
+When an image file (as determined by the browser-supplied content-type) is
+uploaded, we immediately upload the original content into S3 or onto disk. Its
+headers are then examined, and used to create an ImageAttachment row, with
+properties determined from the image; `thumbnailed_metadata` is left empty. A
+task is dispatched to the `thumbnail` worker to generate thumbnails in all of
+the format/size combinations that the server currently has configured.
+
+Because we parse the image headers enough to extract size information at upload
+time, this also serves as a check that the upload is indeed a valid image. If
+the image is determined to be invalid at this stage, the file upload returns
+200, but the message content is left with a link to the uploaded content, not an
+inline image.
+
+When a message is sent, it checks the ImageAttachment rows for each referenced
+image; if they have a non-empty `thumbnailed_metadata`, then it writes out an
+`img` tag pointing to one of them (see below); otherwise, it writes out a
+specially-tagged "spinner" image, which indicates the server is still processing
+the upload. The image tag encodes the original dimensions, and whether the image
+is animated, into the rendered content, so clients can reserve the appropriate
+space in the viewport.
+
+If a message is rendered with a spinner, it also inserts the image into the
+`thumbnail` worker's queue. This is generally redundant -- the image was
+inserted into the queue when the image was uploaded. The exception is if the
+image was uploaded prior to the existence of thumbnailing support, in which case
+the additional insertion is required for the spinner to ever resolve. Since the worker
+takes no action if all necessary thumbnails already exist, this has little cost
+in general.
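+
+A minimal sketch of why the redundant enqueue is cheap, assuming the worker can
+compare the configured variants against those already recorded for the image.
+The variant strings and the `missing_variants` helper are hypothetical and only
+illustrate the idea.
+
+```python
+def missing_variants(configured: list[str], existing: list[str]) -> list[str]:
+    """Return the configured variants that have not been generated yet, in order."""
+    already = set(existing)
+    return [variant for variant in configured if variant not in already]
+
+
+configured = ["840x560.webp", "300x200.webp"]
+
+# A first run has everything to do; a redundant second run has nothing to do.
+print(missing_variants(configured, []))
+print(missing_variants(configured, configured))
+```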
+
+The `thumbnail` worker generates the thumbnails, uploads them to S3 or disk, and
+then updates the `thumbnailed_metadata` of the ImageAttachment row to contain a
+list of the formats/sizes in which thumbnails were generated. When that update
+is committed, if there are already messages which reference the attachment row,
+then we do a "silent" update of all of them to remove the "spinner" and insert
+an image.
+
+In either case, the image which is inserted into the message body is at a
+"reasonable" scale and format, as decided by the server. The paths to all the
+generated thumbnails are not specified in the message content -- instead, the
+client is told at registration time the set of formats/sizes which the server
+supports, and knows how to transform any single thumbnailed path into any of the
+other supported thumbnail variants. The client is responsible for choosing the
+most appropriate format/size based on viewport size and format support, and
+rewriting the URL accordingly.
+
+All requests for images go through `/user_uploads`, which is processed by
+Django. Any request for an ImageAttachment URL is first determined to be a valid
+format/size for the server's current configuration; if it is not valid, the
+server may return any other thumbnail of its choosing (preferring similar sizes,
+and accepted formats based on the client's `Accept` header).
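+
+One way such a fallback could be chosen is sketched below: prefer formats the
+client accepts, then the closest size. The `Variant` type, the distance metric,
+and the simplified `Accept` parsing (quality values are ignored) are all
+assumptions for illustration, not a description of the server's actual logic.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True)
+class Variant:
+    width: int
+    height: int
+    content_type: str  # e.g. "image/webp"
+
+
+def pick_fallback(requested: Variant, available: list[Variant], accept: str) -> Variant:
+    """Choose a substitute thumbnail; `available` is assumed to be non-empty."""
+    # Keep only the media types from the header, dropping parameters like ";q=0.8".
+    accepted = {part.split(";")[0].strip().lower() for part in accept.split(",")}
+
+    def is_accepted(variant: Variant) -> bool:
+        return bool(accepted & {variant.content_type, "image/*", "*/*"})
+
+    def size_distance(variant: Variant) -> int:
+        return abs(variant.width - requested.width) + abs(variant.height - requested.height)
+
+    # Accepted formats sort first (False < True), then the nearest size wins.
+    return min(available, key=lambda v: (not is_accepted(v), size_distance(v)))
+
+
+available = [
+    Variant(840, 560, "image/webp"),
+    Variant(300, 200, "image/webp"),
+    Variant(300, 200, "image/jpeg"),
+]
+print(pick_fallback(Variant(320, 240, "image/avif"), available, "image/jpeg,image/png"))
+```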
+
+If the request is for a thumbnail format/size which is supported by the server,
+but not in the ImageAttachment's `thumbnailed_metadata` (as would happen if the
+server's supported set has grown over time), then the server should generate,
+store, and return the requested format/size on demand.
+
+### Migrations
+
+Historical image uploads have ImageAttachment rows generated for them, but not
+thumbnails. If the message content is re-rendered (for instance, due to being
+edited) then it will trigger the image to be thumbnailed.
+
+### Videos and PDFs
+
+The thumbnailing system only processes images; it does not transcode videos or produce
+image renderings of documents (e.g., PDFs), though those are natural potential
+extensions.