For exporting full with consent:
* Earlier, a message advertising users to react with thumbs up
was sent and later used to determine the users who consented.
* Now, we no longer need to send such a message. This commit
updates the logic to use `allow_private_data_export` user-setting
to determine users who consented.
Fixes part of #31201.
Messages are rendered outside of a transaction, for performance
reasons, and then sent inside of one. This opens thumbnailing up to a
race where the thumbnails have not yet been written when the message
is rendered, but the message has not been sent when thumbnailing
completes, causing `rewrite_thumbnailed_images` to be a no-op and the
message being left with a spinner which never resolves.
Explicitly lock and use he ImageAttachment data inside the
message-sending transaction, to rewrite the message content with the
latest information about the existing thumbnails.
Despite the thumbnailing worker taking a lock on Message rows to
update them, this does not lead to deadlocks -- the INSERT of the
Message rows happens in a transaction, ensuring that either the
message rending blocks the thumbnailing until the Message row is
created, or that the `rewrite_thumbnailed_images` and Message INSERT
waits until thumbnailing is complete (and updated no Message rows).
This allows clients to potentially lay out the thumbnails more
intelligently, or to provide a better "progressive-load" experience
when enlarging the thumbnail.
A new table is created to track which path_id attachments are images,
and for those their metadata, and which thumbnails have been created.
Using path_id as the effective primary key lets us ignore if the
attachment is archived or not, saving some foreign key messes.
A new worker is added to observe events when rows are added to this
table, and to generate and store thumbnails for those images in
differing sizes and formats.
The "invites" worker exists to do two things -- make a Confirmation
object, and send the outgoing email. Making the Confirmation object
in a background process from where the PreregistrationUser is created
temporarily leaves the PreregistrationUser in invalid state, and
results in 500's, and the user not immediately seeing the sent
invitation. That the "invites" worker also wants to create the
Confirmation object means that "resending" an invite invalidates the
URL in the previous email, which can be confusing to the user.
Moving the Confirmation creation to the same transaction solves both
of these issues, and leaves the "invites" worker with nothing to do
but send the email; as such, we remove it entirely, and use the
existing "email_senders" worker to send the invites. The volume of
invites is small enough that this will not affect other uses of that
worker.
Fixes: #21306Fixes: #24275
The returns plugin hasn’t been updated for mypy ≥ 1.6. This
annotation is more limited in that it only supports a fixed number of
positional arguments and no keyword arguments, but is good enough for
our purposes.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
PostgreSQL's estimate of the number of usermessage rows for a single
message can be wildly off, due to poor statistics generation. This
causes this query, with 100-message batch sizes, to incorrectly
estimate millions of matched rows, causing it to perform a full-table
index scan, rather than piecemeal using the `message_id` index.
Reduce the batch size to 50, which is enough to tip in favor of a
rational query plan.
The presence of `len(messages)` outside the transaction caused the
full resultset to be fetched outside of the transaction. This should
ideally be inside the transaction, and also only need be the count.
However, also note that the process of counting matching rows, and
then executing a second query which embeds the same query, is
susceptible to phantom reads, where a query with the same conditions
returns different resultsets, under PostgreSQL's default transaction
isolation of "read committed." While this is possible to resolve by
pulling the returned IDs into a Python list, it would not address the
issue that concurrent updates which change the resultset would make
the overall algorithm still incorrect.
Add a comment clarifying the conditions under which the algorithm is
correct. A more correct algorithm would walk the UserMessage rows
which are unread and in the stream, but this requires a
whole-UserMessage index which would be quite large for such an
infrequent use case.
This leads to significant speedups. In a test, with 100 random unique
event classes, the old code processed a batch of 100 rows (on average
66-ish unique in the batch) in 0.45 seconds. Doing this in a single
query processes the same batch in 0.0076 seconds.
We previously created the connection to the outgoing email server when
the EmailSendingWorker was first created. Since creating the
connection can fail (e.g. because of firewalls or typos in the
hostname), this can cause the `QueueProcessingWorker` creation to
raise an exception. In multi-threaded mode, exceptions in the worker
threads which are _not_ during the handling of a specific event
percolate out to `log_and_exit_if_exception` and trigger the
termination of the entire process -- stopping all worker threads from
making forward progress.
Contain the blast radius of misconfigured email servers by deferring
the opening of the connection until it is first needed. This will not
cause any overall performance change, since it only affects the
latency of the very first email after startup.
Given that most of the use cases for realms-only code path would
really like to upload audit logs too, and the others would likely
produce a better user experience if they upoaded audit logs, we
should just have a single main code path here i.e.
'send_analytics_to_push_bouncer'.
We still only upload usage statistics according to documented
option, and only from the analytics cron job.
The error handling takes place in 'send_analytics_to_push_bouncer'
itself.
Change the url in the notification message to point to the settings
interface rather than linking to the export directly.
This is a much better user experience in the case that the export has
been deleted since the time the export was requested.
Fixes: #26923.
The type annotation for functools.partial uses unchecked Any for all
the function parameters (both early and late). returns.curry.partial
uses a mypy plugin to check the parameters safely.
https://returns.readthedocs.io/en/latest/pages/curry.html
Signed-off-by: Anders Kaseorg <anders@zulip.com>