zulip

Commit Graph

Author	SHA1	Message	Date
Mateusz Mandera	da4443f392	thumbnail: Make thumbnailing work with data import. We didn't have thumbnailing for images coming from data import and this commit adds the functionality. There are a few fundamental issues that the implementation needs to solve. 1. The images come from an untrusted source and therefore we don't want to just pass them through to thumbnailing without checking. For that reason, we cannot just import ImageAttachment rows from the export data, even for zulip=>zulip imports. The right way to process images is to pass them to maybe_thumbnail(), which runs libvips_check_image() on them to verify we're okay with thumbnailing, creates ImageAttachment rows for them and sends them to the thumbnailing queue worker. This approach lets us handle both zulip=>zulip and 3rd party=>zulip imports in the same way. 2. There is a somewhat circular dependency between the Message, Attachment and ImageAttachment import process: - ImageAttachments would ideally be created after importing Attachments, but they need to already exist at the time of Message import. Otherwise, the markdown processor doesn't know it has to add HTML for image previews to messages that reference images. This would mean that messages imported from 3rd party tools don't get image previews. - Attachments only get created after Message import however, due to the many-to-many relationship between Message and Attachment. This is solved by fixing up some data of Attachments pre-emptively, such as the path_ids. This gives us the necessary information for creating ImageAttachments before importing Messages. While we generate ImageAttachment rows synchronously, the actual thumbnailing job is sent to the queue worker. Theoretically, the worker could be very backlogged and not process the thumbnails anytime soon. This is fine - if the app is loaded and tries to display a message with such a not-yet-generated thumbnail, the code in `serve_file` will generate the thumbnails synchronously on the fly and the user will see the image preview displayed normally. See: `1b47134d0d/zerver/views/upload.py (L333-L342)`	2024-10-24 10:32:51 -07:00
Alex Vandiver	9a1f78db22	thumbnail: Support checking for images from streaming sources. We may not always have trivial access to all of the bytes of the uploaded file -- for instance, if the file was uploaded previously, or by some other process. Downloading the entire image in order to check its headers is an inefficient use of time and bandwidth. Adjust `maybe_thumbnail` and dependencies to potentially take a `pyvips.Source` which supports streaming data from S3 or disk. This allows making the ImageAttachment row, if deemed appropriate, based on only a few KB of data, and not the entire image.	2024-09-17 12:51:30 -07:00
Alex Vandiver	ef21dd9b99	thumbnail: Set a stable ordering on ImageAttachment rows for locking. Failure to have a stable ordering can lead to deadlocks.	2024-09-17 09:14:52 -07:00
Alex Vandiver	8bacdbc895	thumbnail: Put the original dimensions on spinner images. This lets us reserve the right amount of space in the message feed immediately.	2024-09-09 15:59:02 -07:00
Anders Kaseorg	e3abd09e67	thumbnail: Fix corrupted email notifications due to HTML5 entities. BeautifulSoup with formatter="html5" unnecessarily escapes many characters with HTML5-specific entities that cannot be correctly parsed by lxml during generation of email notifications. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2024-09-05 16:00:45 -07:00
Anders Kaseorg	91ade25ba3	python: Simplify with str.removeprefix, str.removesuffix. These are available in Python ≥ 3.9. https://docs.python.org/3/library/stdtypes.html#str.removeprefix Signed-off-by: Anders Kaseorg <anders@zulip.com>	2024-09-03 12:30:16 -07:00
Alex Vandiver	0c07c6531c	thumbnail: Enqueue thumbnails when we render a spinner. Thumbnails are usually enqueued in the worker when the image is uploaded. However, for images which were uploaded before the existence of the thumbnailing worker, and whose metadata was backfilled (see previous commit) this leaves a permanent spinner, since nothing triggers the thumbnail worker for them. Enqueue a thumbnail worker for every spinner which we render into Markdown. This ensures that _something_ is attempting to resolve the spinner which the user sees. In the case of freshly-uploaded images which are still in the queue, this results in a duplicate entry in the thumbnailing queue -- this is harmless, since the worker determines that all of the thumbnails we need have already been generated, and it does no further work. However, in the case of historical uploads, it properly kicks off the thumbnailing process and results in a subsequent message update to include the freshly-generated thumbnail. While specifically useful for backfilled uploads, this is also generally a good safety step for a good user experience, as it also prevents dropped events in the queue from unknown causes from leaving perpetual spinners in the message feed. Because `get_user_upload_previews` is potentially called twice for every message with spinners (see `6f20c15ae9`), we add an additional flag to `get_user_upload_previews` to suppress a _second_ event from being enqueued for every spinner generated.	2024-08-29 12:11:51 -07:00
Alex Vandiver	6f20c15ae9	thumbnail: Resolve a race condition when rendering messages. Messages are rendered outside of a transaction, for performance reasons, and then sent inside of one. This opens thumbnailing up to a race where the thumbnails have not yet been written when the message is rendered, but the message has not been sent when thumbnailing completes, causing `rewrite_thumbnailed_images` to be a no-op and the message being left with a spinner which never resolves. Explicitly lock and use the ImageAttachment data inside the message-sending transaction, to rewrite the message content with the latest information about the existing thumbnails. Despite the thumbnailing worker taking a lock on Message rows to update them, this does not lead to deadlocks -- the INSERT of the Message rows happens in a transaction, ensuring that either the message rendering blocks the thumbnailing until the Message row is created, or that the `rewrite_thumbnailed_images` and Message INSERT waits until thumbnailing is complete (and updated no Message rows).	2024-08-01 16:48:16 -07:00
Mateusz Mandera	a0971934d9	thumbnail: Fix typo in comment.	2024-07-30 00:17:59 +02:00
Alex Vandiver	c726d2ec01	thumbnail: Do not Camo old thumbor URLs; serve images directly. Providing a signed Camo URL for arbitrary URLs opened the server up to being an open redirector. Return 403 if the URL is not a user upload, and the backend image if it is. Since we do not have ImageAttachment rows for uploads at the time we wrote `/thumbnail?` URLs, return the full-size content.	2024-07-24 16:04:34 -07:00
Alex Vandiver	e3a238fc89	thumbnail: Remove unused thumbnail sizes. `47683144ff` switched the web client to prefer the 840x560 size, as the mobile apps prefer; remove the now-unused 300x200 size. No client was using the generated `.jpg` formats, as all clients support `.webp`, so remove the unused `.jpg` thumbnail as well.	2024-07-24 09:57:20 -07:00
Alex Vandiver	e4a8304f57	thumbnail: Store the post-orientation-transformation dimensions. Modern browsers respect the EXIF orientation information of images, applying rotation and/or mirroring as specified in those tags. The `width="..."` and `height="..."` attributes are to size the image _after_ applying those orientation transformations. The `.width` and `.height` properties of libvips' images are _before_ any transformations are applied. Since we intend to use these to hint to rendering clients the size that the image should be _rendered at_, change to storing (and providing to clients) the dimensions of the rendered image, not the stored bytes.	2024-07-24 09:56:42 -07:00
Alex Vandiver	2ea0cc0005	thumbnail: Add a data-original-dimensions attribute. This allows clients to potentially lay out the thumbnails more intelligently, or to provide a better "progressive-load" experience when enlarging the thumbnail.	2024-07-22 22:41:10 -04:00
Alex Vandiver	65828b20e9	thumbnail: Factor out a dataclass for markdown image metadata.	2024-07-22 22:41:10 -04:00
Alex Vandiver	3ac14632d8	thumbnail: Disable libvips cache. The libvips cache is 100MB, 100 operations, or 100 files, whichever is less. A single Django process or worker is extremely unlikely to ever see the same image twice, much less within those timeframes. Disable the cache, since it is mostly useless memory usage for our use case.	2024-07-22 10:19:33 -07:00
Alex Vandiver	b42863be4b	markdown: Show thumbnails for uploaded images. Fixes: #16210.	2024-07-21 18:41:59 -07:00
Alex Vandiver	71406ac767	thumbnail: Factor frames into account for IMAGE_BOMB_TOTAL_PIXELS.	2024-07-21 18:41:59 -07:00
Alex Vandiver	4351cc5914	thumbnail: Move get_image_thumbnail_path and split_thumbnail_path.	2024-07-18 13:50:28 -07:00
Alex Vandiver	2e38f426f4	upload: Generate thumbnails when images are uploaded. A new table is created to track which path_id attachments are images, and for those their metadata, and which thumbnails have been created. Using path_id as the effective primary key lets us ignore if the attachment is archived or not, saving some foreign key messes. A new worker is added to observe events when rows are added to this table, and to generate and store thumbnails for those images in differing sizes and formats.	2024-07-16 13:22:15 -07:00
Anders Kaseorg	0fa5e7f629	ruff: Fix UP035 Import from `collections.abc`, `typing` instead. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2024-07-13 22:28:22 -07:00
Anders Kaseorg	531b34cb4c	ruff: Fix UP007 Use `X | Y` for type annotations. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2024-07-13 22:28:22 -07:00
Anders Kaseorg	e08a24e47f	ruff: Fix UP006 Use `list` instead of `List` for type annotation. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2024-07-13 22:28:22 -07:00
Alex Vandiver	544d3df057	thumbnail: Stop applying MAX_EMOJI_GIF_FILE_SIZE_BYTES before resizing. `b14a33c659` attempted to make the 128k limit apply _after_ resizing, but left this check, which examines the pre-resized image size.	2024-07-12 13:26:47 -07:00
Alex Vandiver	f6b99171ce	emoji: Derive the file extension from a limited set of content-types. We thumbnail and serve emoji with the same format as they were uploaded. However, we preserved the original extension, which might mismatch with the provided content-type. Limit the content-type to a subset which is both (a) an image format we can thumbnail, and (b) a media format which is widely-enough supported that we are willing to provide it to all browsers. This prevents uploading a `.tiff` emoji, for instance. Based on this limited content-type, we then reverse to find the reasonable extension to use when storing it. This is particularly important because the local file storage uses the file extension to choose what content-type to re-serve the emoji as. This does nothing for existing emoji, which may have odd or missing file extensions.	2024-07-12 13:26:47 -07:00
Alex Vandiver	382cb5bb13	thumbnail: Lock down which formats we parse.	2024-07-11 07:31:39 -07:00
Alex Vandiver	4bc563128e	thumbnail: Use a consistent set of supported image types.	2024-07-11 07:31:39 -07:00
Alex Vandiver	a091b9ef81	thumbnail: Provide a more explicit hint than "Bad".	2024-07-11 07:31:39 -07:00
Alex Vandiver	fb929ca218	thumbnailing: Remove unnecessary third return value from resize_emoji.	2024-06-26 16:43:09 -07:00
Alex Vandiver	b14a33c659	thumbnailing: Switch to libvips, from PIL/pillow. This is done in as much of a drop-in fashion as possible. Note that libvips does not support animated PNGs[^1], and as such this conversion removes support for them as emoji; however, libvips includes support for webp images, which future commits will take advantage of. This removes the MAX_EMOJI_GIF_SIZE limit, since that existed to work around bugs in Pillow. MAX_EMOJI_GIF_FILE_SIZE_BYTES is fixed to actually be 128KiB (not 128MiB, as it actually was), and is counted _after_ resizing, since the point is to limit the amount of data transfer to clients. [^1]: https://github.com/libvips/libvips/discussions/2000	2024-06-26 16:42:57 -07:00
Alex Vandiver	0153d6dbcd	thumbnailing: Move resizing functions into zerver.lib.thumbnail.	2024-06-20 23:06:08 -04:00
Alex Vandiver	2c5dff7f59	thumbnailing: Remove unnecessary os.path adjustment. This is a library file, not a binary; os.path is already set up.	2024-06-20 23:06:08 -04:00
Mateusz Mandera	2299aa3382	docs: Remove some outdated references to thumbnailing.md doc. The doc was removed in `405bc8dabf`	2022-07-12 17:44:24 -07:00
Anders Kaseorg	0b795e492f	thumbnail: Remove unused is_camo_url parameter. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-08-19 01:51:37 -07:00
Anders Kaseorg	405bc8dabf	requirements: Remove Thumbor. Thumbor and tc-aws have been dragging their feet on Python 3 support for years, and even the alphas and unofficial forks we’ve been running don’t seem to be maintained anymore. Depending on these projects is no longer viable for us. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-05-06 20:07:32 -07:00
Anders Kaseorg	dcdb00a5e6	python: Convert deprecated Django is_safe_url. django.utils.http.is_safe_url is a deprecated alias of django.utils.http.url_has_allowed_host_and_scheme as of Django 3.0, and will be removed in Django 4.0. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-04-15 18:01:34 -07:00
Anders Kaseorg	6e4c3e41dc	python: Normalize quotes with Black. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-02-12 13:11:19 -08:00
Anders Kaseorg	11741543da	python: Reformat with Black, except quotes. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-02-12 13:11:19 -08:00
Anders Kaseorg	72d6ff3c3b	docs: Fix more capitalization issues. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-10-23 11:46:55 -07:00
Anders Kaseorg	365fe0b3d5	python: Sort imports with isort. Fixes #2665. Regenerated by tabbott with `lint --fix` after a rebase and change in parameters. Note from tabbott: In a few cases, this converts technical debt in the form of unsorted imports into different technical debt in the form of our largest files having very long, ugly import sequences at the start. I expect this change will increase pressure for us to split those files, which isn't a bad thing. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-11 16:45:32 -07:00
Anders Kaseorg	69730a78cc	python: Use trailing commas consistently. Automatically generated by the following script, based on the output of lint with flake8-comma: import re import sys last_filename = None last_row = None lines = [] for msg in sys.stdin: m = re.match( r"\x1b\[35mflake8 \|\x1b\[0m \x1b\[1;31m(.+):(\d+):(\d+): (\w+)", msg ) if m: filename, row_str, col_str, err = m.groups() row, col = int(row_str), int(col_str) if filename == last_filename: assert last_row != row else: if last_filename is not None: with open(last_filename, "w") as f: f.writelines(lines) with open(filename) as f: lines = f.readlines() last_filename = filename last_row = row line = lines[row - 1] if err in ["C812", "C815"]: lines[row - 1] = line[: col - 1] + "," + line[col - 1 :] elif err in ["C819"]: assert line[col - 2] == "," lines[row - 1] = line[: col - 2] + line[col - 1 :].lstrip(" ") if last_filename is not None: with open(last_filename, "w") as f: f.writelines(lines) Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2020-06-11 16:04:12 -07:00
Anders Kaseorg	67e7a3631d	python: Convert percent formatting to Python 3.6 f-strings. Generated by pyupgrade --py36-plus. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-10 15:02:09 -07:00
arpit551	d60efa1478	thumbor: Fix __file__ typo. Replaced '__file__' typo with __file__ which used to add wrong path to sys.path.	2020-04-12 11:23:03 -07:00
Anders Kaseorg	c734bbd95d	python: Modernize legacy Python 2 syntax with pyupgrade. Generated by `pyupgrade --py3-plus --keep-percent-format` on all our Python code except `zthumbor` and `zulip-ec2-configure-interfaces`, followed by manual indentation fixes. Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2020-04-09 16:43:22 -07:00
Mateusz Mandera	0e7c97378e	is_safe_url: Use allowed_hosts instead of deprecated host argument. Judging by comparing django 1.11 with django 2.2 code of this function, this shouldn't change any behavior.	2020-02-04 12:46:53 -08:00
Anders Kaseorg	319e2231b8	thumbnail: Tighten fix for CVE-2019-19775 open redirect. Due to a known but unfixed bug in the Python standard library’s urllib.parse module (CVE-2015-2104), a crafted URL could bypass the validation in the previous patch and still achieve an open redirect. https://bugs.python.org/issue23505 Switch to using django.utils.http.is_safe_url, which already contains a workaround for this bug. Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2020-01-16 12:36:24 -08:00
Anders Kaseorg	8e37862b69	CVE-2019-19775: Close open redirect in thumbnail view. This closes an open redirect vulnerability, one case of which was found by Graham Bleaney and Ibrahim Mohamed using Pysa. Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2019-12-12 17:29:20 -08:00
Aditya Bansal	079dfadf1a	camo: Add endpoint to handle camo requests. This endpoint serves requests which might originate from an image preview link which had an http url and the message holding the image link was rendered before we introduced thumbnailing. In that case we would have used a camo proxy to proxy http content over https and avoid mixed content warnings. In the near future, we plan to drop use of camo and just rely on thumbor to serve such images. This endpoint helps maintain backward compatibility for links which were already rendered.	2019-01-04 10:27:04 -08:00
Aditya Bansal	26c6ef1834	thumbnails: Fix bug with use of filters in thumbnail generation. We used to add sharpen filter for all the image sizes whereas it was intended for resized images only which would have been smoothened out a bit by the resize operation. This unnecessary use of the filter used to result in weird issues with full size images. For example: Image located at this url:- http://arqex.com/wp-content/uploads/2015/02/trees.png When rendered in full size would have just boundaries visible.	2019-01-04 19:06:01 +05:30
Aditya Bansal	a16bf34c7f	thumbnailing: Fix oversharpening of thumbnails. We seemed to have been doing too much of sharpening on the thumbnails. The purpose of sharpening here was to just counter the softening effects of a resize on an image but overdoing it is bad. Value sharpen(0.5,0.2,true) seems to look good for achieving the best results here on different displays as revealed in the manual hit and trial based testing. Thanks to @borisyankov for pointing out the issue and suggesting the values.	2018-10-22 22:28:04 +05:30
Aditya Bansal	8324e2c976	thumbnails: Return original path if url is not supposed to be thumbnailed.	2018-10-16 16:00:47 -07:00
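
Commit `e4a8304f57` above stores image dimensions as they appear after the EXIF orientation is applied. A minimal sketch of that idea follows, assuming the standard EXIF orientation values 1 through 8, of which 5 through 8 include a 90-degree rotation. The helper name `rendered_dimensions` is hypothetical and is not taken from Zulip's code; the actual implementation may differ.

```python
def rendered_dimensions(width: int, height: int, orientation: int = 1) -> tuple[int, int]:
    """Return (width, height) as a browser would render the image.

    Hypothetical helper for illustration only. `width` and `height` are the
    stored (pre-transformation) dimensions; `orientation` is the EXIF
    orientation value (1-8), defaulting to 1 (no transformation).
    """
    # EXIF orientations 5, 6, 7 and 8 include a 90-degree rotation,
    # so the rendered image has its width and height swapped.
    if orientation in (5, 6, 7, 8):
        return height, width
    # Orientations 1-4 only mirror or rotate by 180 degrees,
    # which leaves the dimensions unchanged.
    return width, height


# A 4000x3000 stored image tagged with orientation 6 renders as 3000x4000.
print(rendered_dimensions(4000, 3000, orientation=6))
```

Storing the rendered dimensions means clients can reserve the correct amount of space without parsing EXIF data themselves.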

56 Commits