Commit Graph

56 Commits

Author SHA1 Message Date
Mateusz Mandera da4443f392 thumbnail: Make thumbnailing work with data import.
We didn't have thumbnailing for images coming from data import and this
commit adds the functionality.

There are a few fundamental issues that the implementation needs to
solve.

1. The images come from an untrusted source and therefore we don't want
   to just pass them through to thumbnailing without checking. For that
   reason, we cannot just import ImageAttachment rows from the export
   data, even for zulip=>zulip imports.
   The right way to process images is to pass them to maybe_thumbail(),
   which runs libvips_check_image() on them to verify we're okay with
   thumbnailing, creates ImageAttachment rows for them and sends them
   to the thumbnailing queue worker. This approach lets us handle both
   zulip=>zulip and 3rd party=>zulip imports in the same way,

2. There is a somewhat circular dependency between the Message,
   Attachment and ImageAttachment import process:

- ImageAttachments would ideally be created after importing
  Attachments, but they need to already exist at the time of Message
  import. Otherwise, the markdown processor doesn't know it has to add
  HTML for image previews to messages that reference images. This would
  mean that messages imported from 3rd party tools don't get image
  previews.
- Attachments only get created after Message import however, due to the
  many-to-many relationship between Message and Attachment.

This is solved by fixing up some data of Attachments pre-emptively, such
as the path_ids. This gives us the necessary information for creating
ImageAttachments before importing Messages.

While we generate ImageAttachment rows synchronously, the actual
thumbnailing job is sent to the queue worker. Theoretically, the worker
could be very backlogged and not process the thumbnails anytime soon.
This is fine - if the app is loaded and tries to display a message with
such a not-yet-generated thumbnail, the code in `serve_file` will
generate the thumbnails synchronously on the fly and the user will see
the image preview displayed normally. See:

1b47134d0d/zerver/views/upload.py (L333-L342)
2024-10-24 10:32:51 -07:00
Alex Vandiver 9a1f78db22 thumbnail: Support checking for images from streaming sources.
We may not always have trivial access to all of the bytes of the
uploaded file -- for instance, if the file was uploaded previously, or
by some other process.  Downloading the entire image in order to check
its headers is an inefficient use of time and bandwidth.

Adjust `maybe_thumbnail` and dependencies to potentially take a
`pyvips.Source` which supports streaming data from S3 or disk.  This
allows making the ImageAttachment row, if deemed appropriate, based on
only a few KB of data, and not the entire image.
2024-09-17 12:51:30 -07:00
Alex Vandiver ef21dd9b99 thumbnail: Set a stable ordering on ImageAttachment rows for locking.
Failure to have a stable ordering can lead to deadlocks.
2024-09-17 09:14:52 -07:00
Alex Vandiver 8bacdbc895 thumbnail: Put the original dimensions on spinner images.
This lets us reserve the right amount of space in the message feed
immediately.
2024-09-09 15:59:02 -07:00
Anders Kaseorg e3abd09e67 thumbnail: Fix corrupted email notifications due to HTML5 entities.
BeautifulSoup with formatter="html5" unnecessarily escapes many
characters with HTML5-specific entities that cannot be correctly
parsed by lxml during generation of email notifications.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-09-05 16:00:45 -07:00
Anders Kaseorg 91ade25ba3 python: Simplify with str.removeprefix, str.removesuffix.
These are available in Python ≥ 3.9.
https://docs.python.org/3/library/stdtypes.html#str.removeprefix

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-09-03 12:30:16 -07:00
Alex Vandiver 0c07c6531c thumbnail: Enqueue thumbnails when we render a spinner.
Thumbnails are usually enqueued in the worker when the image is
uploaded.  However, for images which were uploaded before the
existence of the thumbnailing worker, and whose metadata was
backfilled (see previous commit) this leaves a permanent spinner,
since nothing triggers the thumbnail worker for them.

Enqueue a thumbnail worker for every spinner which we render into
Markdown.  This ensures that _something_ is attempting to resolve the
spinner which the user sees.  In the case of freshly-uploaded images
which are still in the queue, this results in a duplicate entry in the
thumbnailing queue -- this is harmless, since the worker determines
that all of the thumbnails we need have already been generated, and it
does no further work.  However, in the case of historical uploads, it
properly kicks off the thumbnailing process and results in a
subsequent message update to include the freshly-generated thumbnail.

While specifically useful for backfilled uploads, this is also
generally a good safety step for a good user experience, as it also
prevents dropped events in the queue from unknown causes from leaving
perpetual spinners in the message feed.

Because `get_user_upload_previews` is potentially called twice for
every message with spinners (see 6f20c15ae9), we add an additional
flag to `get_user_upload_previews` to suppress a _second_ event from
being enqueued for every spinner generated.
2024-08-29 12:11:51 -07:00
Alex Vandiver 6f20c15ae9 thumbnail: Resolve a race condition when rendering messages.
Messages are rendered outside of a transaction, for performance
reasons, and then sent inside of one.  This opens thumbnailing up to a
race where the thumbnails have not yet been written when the message
is rendered, but the message has not been sent when thumbnailing
completes, causing `rewrite_thumbnailed_images` to be a no-op and the
message being left with a spinner which never resolves.

Explicitly lock and use he ImageAttachment data inside the
message-sending transaction, to rewrite the message content with the
latest information about the existing thumbnails.

Despite the thumbnailing worker taking a lock on Message rows to
update them, this does not lead to deadlocks -- the INSERT of the
Message rows happens in a transaction, ensuring that either the
message rending blocks the thumbnailing until the Message row is
created, or that the `rewrite_thumbnailed_images` and Message INSERT
waits until thumbnailing is complete (and updated no Message rows).
2024-08-01 16:48:16 -07:00
Mateusz Mandera a0971934d9 thumbnail: Fix typo in comment. 2024-07-30 00:17:59 +02:00
Alex Vandiver c726d2ec01 thumbnail: Do not Camo old thumbor URLs; serve images directly.
Providing a signed Camo URL for arbitrary URLs opened the server up to
being an open redirector.  Return 403 if the URL is not a user upload,
and the backend image if it is.  Since we do not have ImageAttachment
rows for uploads at a time we wrote `/thumbnail?` URLs, return the
full-size content.
2024-07-24 16:04:34 -07:00
Alex Vandiver e3a238fc89 thumbnail: Remove unused thumbnail sizes.
47683144ff switched the web client to prefer the 840x560 size, as the
mobile apps prefer; remove the now-unused 300x200 size.  No client was
using the generated `.jpg` formats, as all clients support `.webp`, so
remove the unused `.jpg` thumbnail as well.
2024-07-24 09:57:20 -07:00
Alex Vandiver e4a8304f57 thumbnail: Store the post-orientation-transformation dimensions.
Modern browsers respect the EXIF orientation information of images,
applying rotation and/or mirroring as specified in those tags.  The
the `width="..."` and `height="..."` tags are to size the image
_after_ applying those orientation transformations.

The `.width` and `.height` properties of libvips' images are _before_
any transformations are applied.  Since we intend to use these to hint
to rendering clients the size that the image should be _rendered at_,
change to storing (and providing to clients) the dimensions of the
rendered image, not the stored bytes.
2024-07-24 09:56:42 -07:00
Alex Vandiver 2ea0cc0005 thumbnail: Add a data-original-dimensions attribute.
This allows clients to potentially lay out the thumbnails more
intelligently, or to provide a better "progressive-load" experience
when enlarging the thumbnail.
2024-07-22 22:41:10 -04:00
Alex Vandiver 65828b20e9 thumbnail: Factor out a dataclass for markdown image metadata. 2024-07-22 22:41:10 -04:00
Alex Vandiver 3ac14632d8 thumbnail: Disable libvips cache.
The libvips cache is 100MB, 100 operations, or 100 files, whichever is
less.  A single Django process or worker is extremely unlikely to ever
see the same image twice, much less within those timeframes.

Disable the cache, since it is mostly useless memory usage for our use
case.
2024-07-22 10:19:33 -07:00
Alex Vandiver b42863be4b markdown: Show thumbnails for uploaded images.
Fixes: #16210.
2024-07-21 18:41:59 -07:00
Alex Vandiver 71406ac767 thumbnail: Factor frames into account for IMAGE_BOMB_TOTAL_PIXELS. 2024-07-21 18:41:59 -07:00
Alex Vandiver 4351cc5914 thumbnail: Move get_image_thumbnail_path and split_thumbnail_path. 2024-07-18 13:50:28 -07:00
Alex Vandiver 2e38f426f4 upload: Generate thumbnails when images are uploaded.
A new table is created to track which path_id attachments are images,
and for those their metadata, and which thumbnails have been created.
Using path_id as the effective primary key lets us ignore if the
attachment is archived or not, saving some foreign key messes.

A new worker is added to observe events when rows are added to this
table, and to generate and store thumbnails for those images in
differing sizes and formats.
2024-07-16 13:22:15 -07:00
Anders Kaseorg 0fa5e7f629 ruff: Fix UP035 Import from `collections.abc`, `typing` instead.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-07-13 22:28:22 -07:00
Anders Kaseorg 531b34cb4c ruff: Fix UP007 Use `X | Y` for type annotations.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-07-13 22:28:22 -07:00
Anders Kaseorg e08a24e47f ruff: Fix UP006 Use `list` instead of `List` for type annotation.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-07-13 22:28:22 -07:00
Alex Vandiver 544d3df057 thumbnail: Stop applying MAX_EMOJI_GIF_FILE_SIZE_BYTES before resizing.
b14a33c659 attempted to make the 128k limit apply _after_ resizing,
but left this check, which examines the pre-resized image size.
2024-07-12 13:26:47 -07:00
Alex Vandiver f6b99171ce emoji: Derive the file extension from a limited set of content-types.
We thumbnail and serve emoji with the same format as they were
uploaded.  However, we preserved the original extension, which might
mismatch with the provided content-type.

Limit the content-type to a subset which is both (a) an image format
we can thumbnail, and (b) a media format which is widely-enough
supported that we are willing to provide it to all browsers.  This
prevents uploading a `.tiff` emoji, for instance.

Based on this limited content-type, we then reverse to find the
reasonable extension to use when storing it.  This is particularly
important because the local file storage uses the file extension to
choose what content-type to re-serve the emoji as.

This does nothing for existing emoji, which may have odd or missing
file extensions.
2024-07-12 13:26:47 -07:00
Alex Vandiver 382cb5bb13 thumbnail: Lock down which formats we parse. 2024-07-11 07:31:39 -07:00
Alex Vandiver 4bc563128e thumbnail: Use a consistent set of supported image types. 2024-07-11 07:31:39 -07:00
Alex Vandiver a091b9ef81 thumbnail: Provide a more explicit hint than "Bad". 2024-07-11 07:31:39 -07:00
Alex Vandiver fb929ca218 thumbnailing: Remove unnecessary third return value from resize_emoji. 2024-06-26 16:43:09 -07:00
Alex Vandiver b14a33c659 thumbnailing: Switch to libvips, from PIL/pillow.
This is done in as much of a drop-in fashion as possible.  Note that
libvips does not support animated PNGs[^1], and as such this
conversion removes support for them as emoji; however, libvips
includes support for webp images, which future commits will take
advantage of.

This removes the MAX_EMOJI_GIF_SIZE limit, since that existed to work
around bugs in Pillow.  MAX_EMOJI_GIF_FILE_SIZE_BYTES is fixed to
actually be 128KiB (not 128MiB, as it actually was), and is counted
_after_ resizing, since the point is to limit the amount of data
transfer to clients.

[^1]: https://github.com/libvips/libvips/discussions/2000
2024-06-26 16:42:57 -07:00
Alex Vandiver 0153d6dbcd thumbnailing: Move resizing functions into zerver.lib.thumbnail. 2024-06-20 23:06:08 -04:00
Alex Vandiver 2c5dff7f59 thumbnailing: Remove unnecessary os.path adjustment.
This is a library file, not a binary; os.path is already set up.
2024-06-20 23:06:08 -04:00
Mateusz Mandera 2299aa3382 docs: Remove some outdated references to thumbnailing.md doc.
The doc was removed in 405bc8dabf
2022-07-12 17:44:24 -07:00
Anders Kaseorg 0b795e492f thumbnail: Remove unused is_camo_url parameter.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-08-19 01:51:37 -07:00
Anders Kaseorg 405bc8dabf requirements: Remove Thumbor.
Thumbor and tc-aws have been dragging their feet on Python 3 support
for years, and even the alphas and unofficial forks we’ve been running
don’t seem to be maintained anymore.  Depending on these projects is
no longer viable for us.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-05-06 20:07:32 -07:00
Anders Kaseorg dcdb00a5e6 python: Convert deprecated Django is_safe_url.
django.utils.http.is_safe_url is a deprecated alias of
django.utils.http.url_has_allowed_host_and_scheme as of Django 3.0,
and will be removed in Django 4.0.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-04-15 18:01:34 -07:00
Anders Kaseorg 6e4c3e41dc python: Normalize quotes with Black.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-02-12 13:11:19 -08:00
Anders Kaseorg 11741543da python: Reformat with Black, except quotes.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-02-12 13:11:19 -08:00
Anders Kaseorg 72d6ff3c3b docs: Fix more capitalization issues.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-23 11:46:55 -07:00
Anders Kaseorg 365fe0b3d5 python: Sort imports with isort.
Fixes #2665.

Regenerated by tabbott with `lint --fix` after a rebase and change in
parameters.

Note from tabbott: In a few cases, this converts technical debt in the
form of unsorted imports into different technical debt in the form of
our largest files having very long, ugly import sequences at the
start.  I expect this change will increase pressure for us to split
those files, which isn't a bad thing.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-11 16:45:32 -07:00
Anders Kaseorg 69730a78cc python: Use trailing commas consistently.
Automatically generated by the following script, based on the output
of lint with flake8-comma:

import re
import sys

last_filename = None
last_row = None
lines = []

for msg in sys.stdin:
    m = re.match(
        r"\x1b\[35mflake8    \|\x1b\[0m \x1b\[1;31m(.+):(\d+):(\d+): (\w+)", msg
    )
    if m:
        filename, row_str, col_str, err = m.groups()
        row, col = int(row_str), int(col_str)

        if filename == last_filename:
            assert last_row != row
        else:
            if last_filename is not None:
                with open(last_filename, "w") as f:
                    f.writelines(lines)

            with open(filename) as f:
                lines = f.readlines()
            last_filename = filename
        last_row = row

        line = lines[row - 1]
        if err in ["C812", "C815"]:
            lines[row - 1] = line[: col - 1] + "," + line[col - 1 :]
        elif err in ["C819"]:
            assert line[col - 2] == ","
            lines[row - 1] = line[: col - 2] + line[col - 1 :].lstrip(" ")

if last_filename is not None:
    with open(last_filename, "w") as f:
        f.writelines(lines)

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-06-11 16:04:12 -07:00
Anders Kaseorg 67e7a3631d python: Convert percent formatting to Python 3.6 f-strings.
Generated by pyupgrade --py36-plus.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-10 15:02:09 -07:00
arpit551 d60efa1478 thumbor: Fix __file__ typo.
Replaced '__file__' typo with __file__ which used to add
wrong path to sys.path.
2020-04-12 11:23:03 -07:00
Anders Kaseorg c734bbd95d python: Modernize legacy Python 2 syntax with pyupgrade.
Generated by `pyupgrade --py3-plus --keep-percent-format` on all our
Python code except `zthumbor` and `zulip-ec2-configure-interfaces`,
followed by manual indentation fixes.

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-04-09 16:43:22 -07:00
Mateusz Mandera 0e7c97378e is_safe_url: Use allowed_hosts instead of depreciated host argument.
Judging by comparing django 1.11 with django 2.2 code of this function,
this shouldn't change any behavior.
2020-02-04 12:46:53 -08:00
Anders Kaseorg 319e2231b8 thumbnail: Tighten fix for CVE-2019-19775 open redirect.
Due to a known but unfixed bug in the Python standard library’s
urllib.parse module (CVE-2015-2104), a crafted URL could bypass the
validation in the previous patch and still achieve an open redirect.

https://bugs.python.org/issue23505

Switch to using django.utils.http.is_safe_url, which already contains
a workaround for this bug.

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-01-16 12:36:24 -08:00
Anders Kaseorg 8e37862b69 CVE-2019-19775: Close open redirect in thumbnail view.
This closes an open redirect vulnerability, one case of which was
found by Graham Bleaney and Ibrahim Mohamed using Pysa.

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2019-12-12 17:29:20 -08:00
Aditya Bansal 079dfadf1a camo: Add endpoint to handle camo requests.
This endpoint serves requests which might originate from an image
preview link which had an http url and the message holding the image
link was rendered before we introduced thumbnailing. In that case
we would have used a camo proxy to proxy http content over https and
avoid mix content warnings.

In near future, we plan to drop use of camo and just rely on thumbor
to serve such images. This endpoint helps maintain backward
compatibility for links which were already rendered.
2019-01-04 10:27:04 -08:00
Aditya Bansal 26c6ef1834 thumbnails: Fix bug with use of filters in thumbnail generation.
We used to add sharpen filter for all the image sizes whereas it was
intended for resized images only which would have been smoothened
out a bit by the resize operation.
This unnecessary use of the filter used to result in weird issues
with full size images.
For example: Image located at this url:-
http://arqex.com/wp-content/uploads/2015/02/trees.png
When rendered in full size would have just boundaries visible.
2019-01-04 19:06:01 +05:30
Aditya Bansal a16bf34c7f thumbnailing: Fix oversharpening of thumbnails.
We seemed to have been doing too much of sharpening on the thumbnails.
The purpose of sharpening here was to just counter the softening
effects of a resize on an image but overdoing it is bad.

Value sharpen(0.5,0.2,true) seems to look good for achieving the
best results here on different displays as revealed in the manual
hit and trial based testing.

Thanks to @borisyankov for pointing out the issue and suggesting
the values.
2018-10-22 22:28:04 +05:30
Aditya Bansal 8324e2c976 thumbnails: Return original path if url is not supposed to be thumbnailed. 2018-10-16 16:00:47 -07:00