zulip/zerver
Alex Vandiver 3cbbf2307b thumbnail: Only lock the message row, not the Attachment row.
This prevents a deadlock between the thumbnailing worker and message
sending, as follows:

1. A user uploads an image, making Attachment and ImageAttachment
   rows, as well as enqueuing a job in the thumbnailing queue.

2. Message sending starts a transaction, creates the Message row,
   and calls `do_claim_attachments`, which edits the Attachment row
   of the upload (implicitly locking it).

3. The thumbnailing worker starts a transaction, locks the
   ImageAttachment row for its image, thumbnails it, and then
   attempts to `select_for_update()` the message objects (joined to
   the Attachments table) to find the ones which link to the
   attachment in question. This query blocks, since "a locking
   clause without a table list affects all tables used in the
   statement"[^1] and the message-send request already has a write
   lock on the Attachments row in question.

4. The message-send request attempts to re-fetch the ImageAttachment
   row inside the transaction, which tries to pull a lock on it.

5. Deadlock, because the message-send request has the Attachment
   lock, and waits for the ImageAttachment lock; the thumbnailing
   worker has the ImageAttachment lock, and waits for the Attachment
   lock.

We break this deadlock by limiting the
`update_message_rendered_content` `select_for_update` to only take
the lock on the Message table, and not also the Attachments table --
no changes will be made to the Attachments, so no lock is necessary
there. This allows the thumbnailing worker to successfully pull the
empty list of messages (since the message-send request has not
commits its transaction, and thus the Message row is not visible
yet), and release its ImageAttachment lock so that the message-send
request can proceed.

[^1]: https://www.postgresql.org/docs/current/sql-select.html#SQL-FOR-UPDATE-SHARE
2024-10-04 09:10:14 -07:00
..
actions analytics: Pass subgroup=None to improve indexing. 2024-10-02 14:11:44 -04:00
data_import slack_import: Strip port from "domain_name". 2024-09-26 12:01:11 -07:00
integration_fixtures/nagios
lib message: Do not differentiate topics by case when aggregating. 2024-10-03 16:35:45 -07:00
management backup: Dereference symlinks when building tarball. 2024-10-01 09:51:54 -07:00
migrations realm: Remove user_group_edit_policy from the backend. 2024-10-01 17:35:14 -07:00
models analytics: Pass subgroup=None to improve indexing. 2024-10-02 14:11:44 -04:00
openapi settings: Add group_creator as default for can_manage_group. 2024-10-01 17:35:14 -07:00
tests message: Do not differentiate topics by case when aggregating. 2024-10-03 16:35:45 -07:00
tornado django_api: Rename 'send_event' to 'send_event_rollback_unsafe'. 2024-09-20 15:20:18 -07:00
transaction_tests settings: Add group_creator as default for can_manage_group. 2024-10-01 17:35:14 -07:00
views tusd: Do not delete the .info files. 2024-10-02 13:21:04 -07:00
webhooks slack_incoming: add ok=true to json in case of success. 2024-10-04 08:42:27 -07:00
worker thumbnail: Only lock the message row, not the Attachment row. 2024-10-04 09:10:14 -07:00
__init__.py
apps.py ruff: Fix UP007 Use `X | Y` for type annotations. 2024-07-13 22:28:22 -07:00
context_processors.py ruff: Fix UP035 Import from `collections.abc`, `typing` instead. 2024-07-13 22:28:22 -07:00
decorator.py upload: Use tusd for resumable, larger uploads. 2024-09-19 11:37:29 -07:00
filters.py ruff: Fix UP007 Use `X | Y` for type annotations. 2024-07-13 22:28:22 -07:00
forms.py register: Ask which review site for how found zulip. 2024-09-27 13:23:08 -07:00
logging_handlers.py ruff: Fix UP007 Use `X | Y` for type annotations. 2024-07-13 22:28:22 -07:00
middleware.py ruff: Fix UP035 Import from `collections.abc`, `typing` instead. 2024-07-13 22:28:22 -07:00
signals.py ruff: Bump target-version from py38 to py310. 2024-07-13 22:28:22 -07:00