For resizing the icon.png files, we use resize_avatar, not resize_logo.
This is pretty confusing - sure, for icons we use the same function as
for avatars, but we should have a proper name for the function called in
the icon context. So this commit also adds resize_realm_icon, and
changes the calls to resize_avatar in icon contexts to
resize_realm_icon.
This commit adds `durable=True` to the outermost db transactions
created in the following:
* confirm_email_change
* handle_upload_pre_finish_hook
* deliver_scheduled_emails
* restore_data_from_archive
* do_change_realm_subdomain
* do_create_realm
* do_deactivate_realm
* do_reactivate_realm
* do_delete_user
* do_delete_user_preserving_messages
* create_stripe_customer
* process_initial_upgrade
* do_update_plan
* request_sponsorship
* upload_message_attachment
* register_remote_server
* do_soft_deactivate_users
* maybe_send_batched_emails
It helps to avoid creating unintended savepoints in the future.
This is as a part of our plan to explicitly mark all the
transaction.atomic calls with either 'savepoint=False' or
'durable=True' as required.
* 'savepoint=True' is used in special cases.
We didn't have thumbnailing for images coming from data import and this
commit adds the functionality.
There are a few fundamental issues that the implementation needs to
solve.
1. The images come from an untrusted source and therefore we don't want
to just pass them through to thumbnailing without checking. For that
reason, we cannot just import ImageAttachment rows from the export
data, even for zulip=>zulip imports.
The right way to process images is to pass them to maybe_thumbail(),
which runs libvips_check_image() on them to verify we're okay with
thumbnailing, creates ImageAttachment rows for them and sends them
to the thumbnailing queue worker. This approach lets us handle both
zulip=>zulip and 3rd party=>zulip imports in the same way,
2. There is a somewhat circular dependency between the Message,
Attachment and ImageAttachment import process:
- ImageAttachments would ideally be created after importing
Attachments, but they need to already exist at the time of Message
import. Otherwise, the markdown processor doesn't know it has to add
HTML for image previews to messages that reference images. This would
mean that messages imported from 3rd party tools don't get image
previews.
- Attachments only get created after Message import however, due to the
many-to-many relationship between Message and Attachment.
This is solved by fixing up some data of Attachments pre-emptively, such
as the path_ids. This gives us the necessary information for creating
ImageAttachments before importing Messages.
While we generate ImageAttachment rows synchronously, the actual
thumbnailing job is sent to the queue worker. Theoretically, the worker
could be very backlogged and not process the thumbnails anytime soon.
This is fine - if the app is loaded and tries to display a message with
such a not-yet-generated thumbnail, the code in `serve_file` will
generate the thumbnails synchronously on the fly and the user will see
the image preview displayed normally. See:
1b47134d0d/zerver/views/upload.py (L333-L342)
The Content-Type, Content-Disposition, StorageClass, and general
metadata are not set according to our patterns by tusd; copy the file
to itself to update those properties.
Setting `ResponseContentDisposition=attachment` means that we override
the stored `ContentDisposition`, which includes a filename. This
means that using the "Download" link on servers with S3 storage
produced a file named the sanitized version we stored.
Explicitly build a `ContentDisposition` to tell S3 to return, which
includes both `attachment` as well as the filename (if we have it
locally).
This allows finer-grained access control and auditing. The links
generated also expire after one week, and the suggested configuration
is that the underlying data does as well.
Co-authored-by: Prakhar Pratyush <prakhar@zulip.com>
The `get_signed_upload_url` code is called for every S3 file serve
request, and is thus in the hot path. The boto3 client caching
optimization is thus potentially useful as a performance optimization.
We may not always have trivial access to all of the bytes of the
uploaded file -- for instance, if the file was uploaded previously, or
by some other process. Downloading the entire image in order to check
its headers is an inefficient use of time and bandwidth.
Adjust `maybe_thumbnail` and dependencies to potentially take a
`pyvips.Source` which supports streaming data from S3 or disk. This
allows making the ImageAttachment row, if deemed appropriate, based on
only a few KB of data, and not the entire image.
A new table is created to track which path_id attachments are images,
and for those their metadata, and which thumbnails have been created.
Using path_id as the effective primary key lets us ignore if the
attachment is archived or not, saving some foreign key messes.
A new worker is added to observe events when rows are added to this
table, and to generate and store thumbnails for those images in
differing sizes and formats.
We thumbnail and serve emoji with the same format as they were
uploaded. However, we preserved the original extension, which might
mismatch with the provided content-type.
Limit the content-type to a subset which is both (a) an image format
we can thumbnail, and (b) a media format which is widely-enough
supported that we are willing to provide it to all browsers. This
prevents uploading a `.tiff` emoji, for instance.
Based on this limited content-type, we then reverse to find the
reasonable extension to use when storing it. This is particularly
important because the local file storage uses the file extension to
choose what content-type to re-serve the emoji as.
This does nothing for existing emoji, which may have odd or missing
file extensions.
Hash the salt, user-id, and now avatar version into the filename.
This allows the URL contents to be immutable, and thus to be marked as
immutable and cacheable. Since avatars are served unauthenticated,
hashing with a server-side salt makes the current and past avatars not
enumerable.
This requires plumbing the current (or future) avatar version through
various parts of the upload process.
Since this already requires a full migration of current avatars, also
take the opportunity to fix the missing `.png` on S3 uploads (#12852).
We switch from SHA-1 to SHA-256, but truncate it such that avatar URL
data does not substantially increase in size.
Fixes: #12852.
Due to recent refactoring in 9fb03cb2c7, a user could not
upload avatar if the server uses local upload backend and there
was already an avatar file for that user.
This commit fixes it to just check if there exists a file only
when importing and not when the user is actually trying to
change the avatar.
Fixes#30676.
This is done in as much of a drop-in fashion as possible. Note that
libvips does not support animated PNGs[^1], and as such this
conversion removes support for them as emoji; however, libvips
includes support for webp images, which future commits will take
advantage of.
This removes the MAX_EMOJI_GIF_SIZE limit, since that existed to work
around bugs in Pillow. MAX_EMOJI_GIF_FILE_SIZE_BYTES is fixed to
actually be 128KiB (not 128MiB, as it actually was), and is counted
_after_ resizing, since the point is to limit the amount of data
transfer to clients.
[^1]: https://github.com/libvips/libvips/discussions/2000