Commit Graph

12629 Commits

Author SHA1 Message Date
Steve Howell 1bcb8d8ee8 performance: Avoid computing page_params.streams in webapp.
The query to get "occupied" streams has been expensive
in the past.  I'm not sure how much any recent attempts
to optimize that query have mitigated the issue, but
since we clearly aren't sending this data, there is no
reason to compute it.
2020-10-14 10:53:10 -07:00
Steve Howell 79803f01f4 minor: Format some code in events.py. 2020-10-14 10:53:10 -07:00
Steve Howell 193ca397f9 tests: Include deactivated users for subscribe test. 2020-10-14 10:53:10 -07:00
Aman Agrawal fbf7cb82a7 web_public_guest: Rename to web_public_visitor for clarity.
Using web_public_guest for anonymous users is confusing since
'guest' is actually a logged-in user compared to
web_public_guest which is not logged-in and has only
read access to messages. So, we rename it to
web_public_visitor.
2020-10-13 16:59:52 -07:00
Steve Howell e7a8c7ac48 test: Improve tests for bulk-adding subscribers.
This is a more thorough test of adding multiple
streams for multiple users, including streams
that users have already subscribed to.

The extra queries here are due to the fact
that we call `principal_to_user_profile` in
a loop in the view.  So that's an example
of O(N) overhead.  We may be able to bulk-fetch
these users eventually.
2020-10-13 18:54:55 -04:00
Steve Howell c29ba75135 refactor: Extract send_messages_for_new_subscribers.
This is a pure extraction, except that I remove a
redundant check that `len(principals) > 0`.  Whenever
that value is false, then `new_subscriptions` will
only have one possible entry, which is the current
user, and we skip that in the loop.
2020-10-13 18:54:55 -04:00
Steve Howell 3b338ec32e performance: Optimize filter_stream_authorization.
We no longer do O(N) queries to get existing streams.

This is a somewhat contrived use case--generally, we
are not trying to re-subscribe a user to several
streams.  Still, we want to avoid this.

This commit also makes `test_bulk_subscribe_many`
do more work, and the change to the test helped
me discover this bug.
2020-10-13 18:54:55 -04:00
Anders Kaseorg 6564540d15 docs: Fix some spelling errors.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-13 15:47:13 -07:00
Anders Kaseorg dd48dbd912 docs: Add spaces to “check out”, “log in”, “set up”, “sign up” as verbs.
“Checkout”, “login”, “setup”, and “signup” are nouns, not verbs.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-13 15:47:13 -07:00
Steve Howell 598601e8fc stream events: Prevent spurious events.
If a user asks to be subscribed to a stream
that they are already subscribed to, then
that stream won't be in new_stream_user_ids,
and we won't need to send an event for it.

This change makes that happen more automatically.
2020-10-13 11:28:17 -07:00
Steve Howell 18771099e4 performance: Introduce new_stream_user_ids.
Let
    U = number of users to subscribe
    S = number of streams to subscribe

We were technically doing N^3 amount of work
when we sent certain events, or to be more
precise, U * S * S amount of work.  For each
stream, we were looping through a list of tuples
of size U * S to find the users for the stream.

In practice either U or S is usually 1, so the
performance gains here are probably negligible,
especially since the constant factors here
were just slinging around Python data.

But the code is actually more readable now, so
it's a double win.
2020-10-13 11:28:17 -07:00
Steve Howell ebb605319b refactor: Rename stream_map to recipient_id_to_stream.
I want to make a new dict called stream_id_to_stream,
and stream_map would be confusing.
2020-10-13 11:28:17 -07:00
Steve Howell b502957184 refactor: Extract new_recipient_ids local.
We rename needs_new_sub (which sounds like
a boolean!) to new_recipient_ids, and we
calculate it explicitly within the loop, so
that we don't need to worry as much about
subsequent passes through the loop mutating it.

This allows us to also remove recipient_ids,
which in turn lets us remove recipients_map,
albeit with a small tweak for stream_map.

I also introduce the my_subs local, which
I use to more directly populate used_colors,
as well as using it as the loop var.
2020-10-13 11:28:17 -07:00
Steve Howell 766892d8aa import: Reuse get_last_message_id() helper. 2020-10-13 11:28:17 -07:00
Steve Howell 188cc9bb3b minor: Fix user/stream in test_subscriptions. 2020-10-13 11:28:17 -07:00
Steve Howell 9df9934ed6 refactor: Pass realm to bulk_add_subscriptions.
I think it's important that the callers understand
that bulk_add_subscriptions assumes all streams
are being created within a single realm, so I make
it an explicit parameter.

This may be overkill--I would also be happy if we
just included the assertions from this commit.
2020-10-13 11:28:17 -07:00
Steve Howell efc931a671 minor: Extract realm local. 2020-10-13 11:28:17 -07:00
Steve Howell b2d0a2efb9 refactor: Extract send_subscription_add_events.
This function now does all the work that we used
to do with notify_subscriptions_added happening
inside a loop.

There's a small fine-tuning here, where we only
get recent traffic on streams that we're actually
sending events for.
2020-10-13 11:28:17 -07:00
Steve Howell 223ce83a0a refactor: Clean up call to notify_subscriptions_added.
We now just pass in all_subscribers_by_stream, rather
than a callback.

We also move sub_tuples_by_user closer to the
loop where we call notify_subscriptions_added.
2020-10-13 11:28:17 -07:00
Steve Howell 811426b345 Extract send_stream_creation_events_for_private_streams.
We can probably avoid passing in users here.
2020-10-12 16:40:37 -07:00
Steve Howell 1cfaef0d1a refactor: Simplify pick_color logic.
This removes the need to jankily mutate
the active flag in the caller, and we don't
need to mutate our subs_by_user either.
2020-10-12 16:40:37 -07:00
Steve Howell 13569ff97a refactor: Eliminate new_subs.
We now just process new subs for a user immediately
within the loop.
2020-10-12 16:40:37 -07:00
Steve Howell 8c70fbde78 refactor: Use subs_to_add in return value.
The subs_to_add is directly related to a var
called new_subs, which I hope to eliminate
soon.
2020-10-12 16:40:37 -07:00
Steve Howell 1afca3d430 minor: Extract local for stream. 2020-10-12 16:40:37 -07:00
Steve Howell 84aa1389d8 Extract bulk_add_subs_to_db_with_logging.
This is a trivial code extraction.
2020-10-12 16:40:37 -07:00
Steve Howell 3ff9ce78ea refactor: Extract send_peer_add_events. 2020-10-12 16:40:37 -07:00
Alex Vandiver f3ba227614 create_user: Strip whitespace from initial password file.
Fixes #12144.
2020-10-11 16:29:00 -07:00
Cody Piersall 5dab6e9d31 emoji-upload: Fix transparency issues on GIF emoji upload.
This preserves the alpha layer on GIF images that need to be resized
before being uploaded.  Two important changes occur here:

1. The new frame is a *copy* of the original image, which preserves the
   GIF info.
2. The disposal method of the original GIF is preserved.  This
   essentially determines what state each frame of the GIF starts from
   when it is drawn; see PIL's docs:
   https://pillow.readthedocs.io/en/stable/handbook/image-file-formats.html#saving
   for more info.

This resolves some but not all of the test cases in #16370.
2020-10-11 16:23:07 -07:00
Anders Kaseorg b7a94be152 python: Catch BaseException when we need to clean something up.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-11 16:16:16 -07:00
Anders Kaseorg 7f69c1d3d5 python: Catch specific exceptions from requests.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-11 16:11:41 -07:00
Anders Kaseorg 17ac17286c python: Catch specific exceptions from subprocess.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-11 16:11:41 -07:00
Anders Kaseorg aabef3d9be python: Catch specific exceptions from orjson.
Followup to #16120.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-11 16:11:41 -07:00
Anders Kaseorg 234f7245cf export_usermessage_batch: Use os.rename.
This avoids an extra stat call to check whether the target is a
directory.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-11 16:11:35 -07:00
Anders Kaseorg 83eca256a4 compilemessages: Use polib for get_name_from_po_file.
This also corrects the name of zh_TW from “Chinese” to
“Chinese (Taiwan)”.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-11 16:11:35 -07:00
Anders Kaseorg 1346c5397a zephyr: Use correct shell quoting for ssh.
ssh always runs its command through a shell (after naïvely joining
multiple arguments with spaces), so it needs an extra level of shell
quoting.  This should have no effect because we already validated user
with a regex, but it’s better for escaping to be locally correct in
case the context changes.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-11 16:11:35 -07:00
Anders Kaseorg 82593338ba report: Show Git commit in a way that works for merges.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-11 16:11:35 -07:00
Anders Kaseorg c9fec8f021 deliver_scheduled_messages: Don’t do_send_messages inside a transaction.
do_send_messages has side effects outside the database and may not
work reliably if its database effects are reordered by being inside a
transaction.

This also fixes a bug where we were doing the update incorrectly on
the Message table.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-11 16:09:22 -07:00
Alex Vandiver c2132a4f9c queue: Drop register_json_consumer / json_drain_queue interface.
Now that all callsites use the same interface, drop the now-unused
ones, and their tests.
2020-10-11 14:19:42 -07:00
Alex Vandiver 5477b9d9a1 queue: Switch tests to start_json_consumer interface. 2020-10-11 14:19:42 -07:00
Alex Vandiver 179c387409 tornado: Switch to start_json_consumer interface. 2020-10-11 14:19:42 -07:00
Alex Vandiver f0b23b0752 queue: Switch non-batch consumer to also use start_json_consumer.
This has no effect on consumption rate, but unifies the codepaths.
Before:
```
$ ./manage.py queue_rate --count 50000
Purging queue...
Enqueue rate: 11187 / sec
Dequeue rate: 4158 / sec
```

After:
```
$ ./manage.py queue_rate --count 50000
Purging queue...
Enqueue rate: 11010 / sec
Dequeue rate: 4113 / sec
```
2020-10-11 14:19:42 -07:00
Alex Vandiver 45c9c3cc30 queue: Monitor user_activity queue, now that it has a consumer.
Since this was using repead individual get() calls previously, it
could not be monitored for having a consumer.  Add it in, by marking
it of queue type "consumer" (the default), and adding Nagios lines for
it.

Also adjust missedmessage_emails to be monitored; it stopped using
LoopQueueProcessingWorker in 5cec566cb9, but was never added back
into the set of monitored consumers.
2020-10-11 14:19:42 -07:00
Alex Vandiver f9358d5330 queue: Switch batch interface to use the channel.consume iterator.
This low-level interface allows consuming from a queue with timeouts.
This can be used to either consume in batches (with an upper timeout),
or one-at-a-time.  This is notably more performant than calling
`.get()` repeatedly (what json_drain_queue does under the hood), which
is "*highly discouraged* as it is *very inefficient*"[1].

Before this change:
```
$ ./manage.py queue_rate --count 10000 --batch
Purging queue...
Enqueue rate: 11158 / sec
Dequeue rate: 3075 / sec
```

After:
```
$ ./manage.py queue_rate --count 10000 --batch
Purging queue...
Enqueue rate: 11511 / sec
Dequeue rate: 19938 / sec
```

[1] https://www.rabbitmq.com/consumers.html#fetching
2020-10-11 14:19:40 -07:00
Alex Vandiver 571f8b8664 queue: Use low-level queue_purge to empty at the end of tests.
This is O(1) at the RabbitMQ API level, and doesn't rely on the code
under test to function correctly during test cleanup.
2020-10-09 20:43:49 -07:00
Alex Vandiver ac0ba21c2c tests: Stop reusing a variable name.
`loopworker_sleep_mock` is a file-level variable used to mock out the
sleep() call in LoopQueueProcessingWorker; don't reuse the variable
name for something else.
2020-10-09 20:42:20 -07:00
Alex Vandiver 754638f673 tests: Refactor test_queue_worker to separate queues. 2020-10-09 20:42:12 -07:00
Alex Vandiver 2547bdbf4a queue: Rename consume_wrapper to a better name. 2020-10-09 20:40:51 -07:00
Alex Vandiver d5a6b0f99a queue: Rename queue_size, and update for all local queues.
Despite its name, the `queue_size` method does not return the number
of items in the queue; it returns the number of items that the local
consumer has delivered but unprocessed.  These are often, but not
always, the same.

RabbitMQ's queues maintain the queue of unacknowledged messages; when
a consumer connects, it sends to the consumer some number of messages
to handle, known as the "prefetch."  This is a performance
optimization, to ensure the consumer code does not need to wait for a
network round-trip before having new data to consume.

The default prefetch is 0, which means that RabbitMQ immediately dumps
all outstanding messages to the consumer, which slowly processes and
acknowledges them.  If a second consumer were to connect to the same
queue, they would receive no messages to process, as the first
consumer has already been allocated them.  If the first consumer
disconnects or crashes, all prior events sent to it are then made
available for other consumers on the queue.

The consumer does not know the total size of the queue -- merely how
many messages it has been handed.

No change is made to the prefetch here; however, future changes may
wish to limit the prefetch, either for memory-saving, or to allow
multiple consumers to work the same queue.

Rename the method to make clear that it only contains information
about the local queue in the consumer, not the full RabbitMQ queue.
Also include the waiting message count, which is used by the
`consume()` iterator for similar purpose to the pending events list.
2020-10-09 20:40:39 -07:00
Alex Vandiver a1ce1aca3b queue: Update comment to be more accurate about import errors. 2020-10-09 20:40:32 -07:00
Alex Vandiver 2d71ca1fb8 email: Remove unused `log_digest_event` function.
Its last callsite was removed in e46cbaffa2.

Also ref #6786.
2020-10-08 20:35:53 -07:00