The query to get "occupied" streams has been expensive
in the past. I'm not sure how much any recent attempts
to optimize that query have mitigated the issue, but
since we clearly aren't sending this data, there is no
reason to compute it.
Using web_public_guest for anonymous users is confusing since
'guest' is actually a logged-in user compared to
web_public_guest which is not logged-in and has only
read access to messages. So, we rename it to
web_public_visitor.
This is a more thorough test of adding multiple
streams for multiple users, including streams
that users have already subscribed to.
The extra queries here are due to the fact
that we call `principal_to_user_profile` in
a loop in the view. So that's an example
of O(N) overhead. We may be able to bulk-fetch
these users eventually.
This is a pure extraction, except that I remove a
redundant check that `len(principals) > 0`. Whenever
that value is false, then `new_subscriptions` will
only have one possible entry, which is the current
user, and we skip that in the loop.
We no longer do O(N) queries to get existing streams.
This is a somewhat contrived use case--generally, we
are not trying to re-subscribe a user to several
streams. Still, we want to avoid this.
This commit also makes `test_bulk_subscribe_many`
do more work, and the change to the test helped
me discover this bug.
If a user asks to be subscribed to a stream
that they are already subscribed to, then
that stream won't be in new_stream_user_ids,
and we won't need to send an event for it.
This change makes that happen more automatically.
Let
U = number of users to subscribe
S = number of streams to subscribe
We were technically doing N^3 amount of work
when we sent certain events, or to be more
precise, U * S * S amount of work. For each
stream, we were looping through a list of tuples
of size U * S to find the users for the stream.
In practice either U or S is usually 1, so the
performance gains here are probably negligible,
especially since the constant factors here
were just slinging around Python data.
But the code is actually more readable now, so
it's a double win.
We rename needs_new_sub (which sounds like
a boolean!) to new_recipient_ids, and we
calculate it explicitly within the loop, so
that we don't need to worry as much about
subsequent passes through the loop mutating it.
This allows us to also remove recipient_ids,
which in turn lets us remove recipients_map,
albeit with a small tweak for stream_map.
I also introduce the my_subs local, which
I use to more directly populate used_colors,
as well as using it as the loop var.
I think it's important that the callers understand
that bulk_add_subscriptions assumes all streams
are being created within a single realm, so I make
it an explicit parameter.
This may be overkill--I would also be happy if we
just included the assertions from this commit.
This function now does all the work that we used
to do with notify_subscriptions_added happening
inside a loop.
There's a small fine-tuning here, where we only
get recent traffic on streams that we're actually
sending events for.
We now just pass in all_subscribers_by_stream, rather
than a callback.
We also move sub_tuples_by_user closer to the
loop where we call notify_subscriptions_added.
This preserves the alpha layer on GIF images that need to be resized
before being uploaded. Two important changes occur here:
1. The new frame is a *copy* of the original image, which preserves the
GIF info.
2. The disposal method of the original GIF is preserved. This
essentially determines what state each frame of the GIF starts from
when it is drawn; see PIL's docs:
https://pillow.readthedocs.io/en/stable/handbook/image-file-formats.html#saving
for more info.
This resolves some but not all of the test cases in #16370.
ssh always runs its command through a shell (after naïvely joining
multiple arguments with spaces), so it needs an extra level of shell
quoting. This should have no effect because we already validated user
with a regex, but it’s better for escaping to be locally correct in
case the context changes.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
do_send_messages has side effects outside the database and may not
work reliably if its database effects are reordered by being inside a
transaction.
This also fixes a bug where we were doing the update incorrectly on
the Message table.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Since this was using repead individual get() calls previously, it
could not be monitored for having a consumer. Add it in, by marking
it of queue type "consumer" (the default), and adding Nagios lines for
it.
Also adjust missedmessage_emails to be monitored; it stopped using
LoopQueueProcessingWorker in 5cec566cb9, but was never added back
into the set of monitored consumers.
This low-level interface allows consuming from a queue with timeouts.
This can be used to either consume in batches (with an upper timeout),
or one-at-a-time. This is notably more performant than calling
`.get()` repeatedly (what json_drain_queue does under the hood), which
is "*highly discouraged* as it is *very inefficient*"[1].
Before this change:
```
$ ./manage.py queue_rate --count 10000 --batch
Purging queue...
Enqueue rate: 11158 / sec
Dequeue rate: 3075 / sec
```
After:
```
$ ./manage.py queue_rate --count 10000 --batch
Purging queue...
Enqueue rate: 11511 / sec
Dequeue rate: 19938 / sec
```
[1] https://www.rabbitmq.com/consumers.html#fetching
`loopworker_sleep_mock` is a file-level variable used to mock out the
sleep() call in LoopQueueProcessingWorker; don't reuse the variable
name for something else.
Despite its name, the `queue_size` method does not return the number
of items in the queue; it returns the number of items that the local
consumer has delivered but unprocessed. These are often, but not
always, the same.
RabbitMQ's queues maintain the queue of unacknowledged messages; when
a consumer connects, it sends to the consumer some number of messages
to handle, known as the "prefetch." This is a performance
optimization, to ensure the consumer code does not need to wait for a
network round-trip before having new data to consume.
The default prefetch is 0, which means that RabbitMQ immediately dumps
all outstanding messages to the consumer, which slowly processes and
acknowledges them. If a second consumer were to connect to the same
queue, they would receive no messages to process, as the first
consumer has already been allocated them. If the first consumer
disconnects or crashes, all prior events sent to it are then made
available for other consumers on the queue.
The consumer does not know the total size of the queue -- merely how
many messages it has been handed.
No change is made to the prefetch here; however, future changes may
wish to limit the prefetch, either for memory-saving, or to allow
multiple consumers to work the same queue.
Rename the method to make clear that it only contains information
about the local queue in the consumer, not the full RabbitMQ queue.
Also include the waiting message count, which is used by the
`consume()` iterator for similar purpose to the pending events list.