7.8 KiB
Thumbnailing
libvips
Zulip uses the libvips
image processing toolkit
for thumbnailing, as a low-memory and high-performance image processing
library. Some smaller images are thumbnailed synchronously inside the Django
process, but the majority of the work is offloaded to one or more thumbnail
worker processes.
Thumbnailing is a notoriously high-risk surface from a security standpoint,
since it parses arbitrary binary user input with often complex grammars. On
versions of libvips
which support it (>= 8.13, on or after Ubuntu 24.04 or
Debian 12), Zulip limits libvips
to only the image parsers and libraries whose
image formats we expect to parse, all of which are fuzz-tested by
oss-fuzz
.
Avatars
Avatar images are served at two of potential resolutions (100x100 and 500x500, the latter of which is called "medium"), and always as PNGs. These are served from a "dumb" endpoint -- that is, if S3 is used, we provide a direct link to the content in the S3 bucket (or a Cloudfront distribution in front of it), and the request does not pass through the Zulip server. This is because avatars are referenced in emails, and thus their URLs need to be permanent and publicly-accessible. This also means that any choice of resolution and file format needs to be entirely done by the client.
Avatars are thumbnailed synchronously upon upload into 100x100 and 500x500 PNGs; the originals are not preserved. The smallest dimension is scaled to fit, and the largest dimension is cropped centered; the image may be scaled up to fit the 100x100 or 500x500 dimensions. To generate the filename, the server hashes the avatar salt (a server-side secret), the user-id, and a per-user sequence (the "version") to produce a filename which is not enumerable, and can only be determined by the server. Hashing the version means that avatars can be served with long-lasting caching headers.
The original avatar image is stored adjacent to the thumbnailed versions, enabling later re-thumbnailing to other dimensions or formats without requiring users to re-upload it.
Emoji
Emoji URLs are hard-coded into emails, and as such their URLs need to be permanent and publicly-accessible. They are served at a consistent 1:1 aspect ratio, and while they may be rendered at different scales based on the line-height of the client, we only need to store them at one resolution.
Emoji are thumbnailed synchronously into 64x64 images, and they are saved in the same file format that they were uploaded in. Transparent pixels are added to the smaller dimension to make the image square after resizing. The filename of the emoji is based on a hash of the avatar salt (a server-side secret) and the emoji's id -- but because the filename is stored in the database, it can be anything with sufficient entropy to not be enumerable or have collisions.
For animated emoji, a separate "still" version of the emoji is generated from the first frame, as a 64x64 PNG image. This is currently mostly unused, but is intended to be part of a user preference to disable emoji animations (see #13434). Current use is limited to user status display in the the buddy list. When a user uses an animated emoji as their status, the "still" version is used.
The original emoji is stored adjacent to the thumbnailed version, enabling later re-thumbnailing to other dimensions or formats without requiring users to re-upload it.
There is no technical reason that we preserve the uploader's choice of file format, or that we use PNGs as the file format for the "still" version. Both of these would plausibly benefit from being WebP images.
Realm logos
Realm logos are converted to PNGs, thumbnailed down to fit within 800x100; a 1000x10 pixel image will end up as 800x8, and a 10x20 will end up 10x20. The original is stored adjacent to the converted thumbnail.
Realm icons
Realm icons are converted to PNGs, and treated identical to avatars, albeit only producing the 100x100 size.
File uploads
Images
When an image file (as determined by the browser-supplied content-type) is
uploaded, we immediately upload the original content into S3 or onto disk. Its
headers are then examined, and used to create an ImageAttachment row, with
properties determined from the image; thumbnailed_metadata
is left empty. A
task is dispatched to the thumbnail
worker to generate thumbnails in all of
the format/size combinations that the server currently has configured.
Because we parse the image headers enough to extract size information at upload time, this also serves as a check that the upload is indeed a valid image. If the image is determined to be invalid at this stage, the file upload returns 200, but the message content is left with a link to the uploaded content, not an inline image.
When a message is sent, it checks the ImageAttachment rows for each referenced
image; if they have a non-empty thumbnailed_metadata
, then it writes out an
img
tag pointing to one of them (see below); otherwise, it writes out a
specially-tagged "spinner" image, which indicates the server is still processing
the upload. The image tag encodes the original dimensions and if the image is
animated into the rendered content so clients can reserve the appropriate space
in the viewport.
If a message is rendered with a spinner, it also inserts the image into the
thumbnail
worker's queue. This is generally redundant -- the image was
inserted into the queue when the image was uploaded. The exception is if the
image was uploaded prior to the existence of thumbnailing support, in which case
the additional queue insertion is required to have the spinner ever resolve.
Since the worker takes no action if all necessary thumbnails already exist,
this has little cost in general.
The thumbnail
worker generates the thumbnails, uploads them to S3 or disk, and
then updates the thumbnailed_metadata
of the ImageAttachment row to contain a
list of formats/sizes which thumbnails were generated in. At the time of commit,
if there are already messages which reference the attachment row, then we do a
"silent" update of all of them to remove the "spinner" and insert an image.
In either case, the image which is inserted into the message body is at a "reasonable" scale and format, as decided by the server. The paths to all the generated thumbnails are not specified in the message content -- instead, the client is told at registration time the set of formats/sizes which the server supports, and knows how to transform any single thumbnailed path into any of the other supported thumbnail variants. The client is responsible for choosing the most appropriate format/size based on viewport size and format support, and rewriting the URL accordingly.
All requests for images go through /user_uploads
, which is processed by
Django. Any request for an ImageAttachment URL is first determined to be a valid
format/size for the server's current configuration; if is not valid, the server
may return any other thumbnail of its choosing (preferring similar sizes, and
accepted formats based on the client's Accepts
header).
If the request is for a thumbnail format/size which is supported by the server,
but not in the ImageAttachment's thumbnailed_metadata
(as would happen if the
server's supported set is added to over time) then the server should generate,
store, and return the requested format/size on-demand.
Migrations
Historical image uploads have ImageAttachment rows generated for them, but not thumbnails. If the message content is re-rendered (for instance, due to being edited) then it will trigger the image to be thumbnailed.
Videos and PDFs
The thumbnailing system only processes images; it does not transcode videos or produce image renderings of documents (e.g., PDFs), though those are natural potential extensions.