zulip

Commit Graph

Author	SHA1	Message	Date
Anders Kaseorg	b0ce4f1bce	docs: Fix many spelling mistakes. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2022-02-07 18:51:06 -08:00
Alex Vandiver	faeffa2466	queue_processors: Set a bounded prefetch size on rabbitmq queues. RabbitMQ clients have a setting called prefetch[1], which controls how many un-acknowledged events the server forwards to the local queue in the client. The default is 0; this means that when clients first connect, the server must send them every message in the queue. This itself may cause unbounded memory usage in the client, but also has other detrimental effects. While the client is attempting to process the head of the queue, it may be unable to read from the TCP socket at the rate that the server is sending to it -- filling the TCP buffers, and causing the server's writes to block. If the server blocks for more than 30 seconds, it times out the send, and closes the connection with: ``` closing AMQP connection <0.30902.126> (127.0.0.1:53870 -> 127.0.0.1:5672): {writer,send_failed,{error,timeout}} ``` This is https://github.com/pika/pika/issues/753#issuecomment-318119222. Set a prefetch limit of 100 messages, or the batch size, to better handle queues which start with large numbers of outstanding events. Setting prefetch=1 causes significant performance degradation in the no-op queue worker, to 30% of the prefetch=0 performance. Setting prefetch=100 achieves 90% of the prefetch=0 performance, and higher values offer only minor gains above that. For batch workers, their performance is not notably degraded by prefetch equal to their batch size, and they cannot function on smaller prefetches than their batch size. We also set a 100-count prefetch on Tornado workers, as they are potentially susceptible to the same effect. [1] https://www.rabbitmq.com/confirms.html#channel-qos-prefetch	2021-11-16 11:48:50 -08:00
Alex Vandiver	7c3507feef	queue: Allow passing down a prefetch count to pika.	2021-11-16 11:48:50 -08:00
PIG208	aa9d73c9f6	typing: Improve typing with assertions. This fixes some mypy errors discovered with django-stubs.	2021-08-20 05:54:19 -07:00
Anders Kaseorg	04feadd917	mypy: Add pika-stubs. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-08-02 22:31:46 -07:00
Anders Kaseorg	9f8ba913fd	queue: Fix _on_connection_open_error type to accept reason: str. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-08-02 22:31:46 -07:00
Anders Kaseorg	f7e2426fc5	queue: Fix ensure_queue type to accept a callback returning any object. channel.basic_consume actually returns str. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-08-02 22:31:46 -07:00
Anders Kaseorg	5e355abe2e	queue: Add missing imports. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-08-02 22:31:46 -07:00
Anders Kaseorg	87799177b5	queue: Fix channel type for TornadoQueueClient. The BlockingChannel annotations in TornadoQueueClient were flat-out wrong. BlockingChannel and Channel have no common base classes. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-08-02 22:31:46 -07:00
Anders Kaseorg	5751479932	queue: Switch TornadoQueueClient to the new base QueueClient. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-08-02 22:31:46 -07:00
Anders Kaseorg	bd6a2b149c	queue: Split common part of SimpleQueueClient into new base class. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-08-02 22:31:46 -07:00
Anders Kaseorg	6e4c3e41dc	python: Normalize quotes with Black. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-02-12 13:11:19 -08:00
Anders Kaseorg	11741543da	python: Reformat with Black, except quotes. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-02-12 13:11:19 -08:00
Anders Kaseorg	b7a94be152	python: Catch BaseException when we need to clean something up. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-10-11 16:16:16 -07:00
Alex Vandiver	c2132a4f9c	queue: Drop register_json_consumer / json_drain_queue interface. Now that all callsites use the same interface, drop the now-unused ones, and their tests.	2020-10-11 14:19:42 -07:00
Alex Vandiver	179c387409	tornado: Switch to start_json_consumer interface.	2020-10-11 14:19:42 -07:00
Alex Vandiver	f9358d5330	queue: Switch batch interface to use the channel.consume iterator. This low-level interface allows consuming from a queue with timeouts. This can be used to either consume in batches (with an upper timeout), or one-at-a-time. This is notably more performant than calling `.get()` repeatedly (what json_drain_queue does under the hood), which is "highly discouraged as it is very inefficient"[1]. Before this change: ``` $ ./manage.py queue_rate --count 10000 --batch Purging queue... Enqueue rate: 11158 / sec Dequeue rate: 3075 / sec ``` After: ``` $ ./manage.py queue_rate --count 10000 --batch Purging queue... Enqueue rate: 11511 / sec Dequeue rate: 19938 / sec ``` [1] https://www.rabbitmq.com/consumers.html#fetching	2020-10-11 14:19:40 -07:00
Alex Vandiver	2547bdbf4a	queue: Rename consume_wrapper to a better name.	2020-10-09 20:40:51 -07:00
Alex Vandiver	d5a6b0f99a	queue: Rename queue_size, and update for all local queues. Despite its name, the `queue_size` method does not return the number of items in the queue; it returns the number of items that the local consumer has delivered but unprocessed. These are often, but not always, the same. RabbitMQ's queues maintain the queue of unacknowledged messages; when a consumer connects, it sends to the consumer some number of messages to handle, known as the "prefetch." This is a performance optimization, to ensure the consumer code does not need to wait for a network round-trip before having new data to consume. The default prefetch is 0, which means that RabbitMQ immediately dumps all outstanding messages to the consumer, which slowly processes and acknowledges them. If a second consumer were to connect to the same queue, they would receive no messages to process, as the first consumer has already been allocated them. If the first consumer disconnects or crashes, all prior events sent to it are then made available for other consumers on the queue. The consumer does not know the total size of the queue -- merely how many messages it has been handed. No change is made to the prefetch here; however, future changes may wish to limit the prefetch, either for memory-saving, or to allow multiple consumers to work the same queue. Rename the method to make clear that it only contains information about the local queue in the consumer, not the full RabbitMQ queue. Also include the waiting message count, which is used by the `consume()` iterator for similar purpose to the pending events list.	2020-10-09 20:40:39 -07:00
Alex Vandiver	a1ce1aca3b	queue: Update comment to be more accurate about import errors.	2020-10-09 20:40:32 -07:00
Alex Vandiver	baf882a133	queue: Only ACK drain_queue once it has completed work on the list. Currently, drain_queue and json_drain_queue ack every message as it is pulled off of the queue, until the queue is empty. This means that if the consumer crashes between pulling a batch of messages off the queue, and actually processing them, those messages will be permanently lost. Sending an ACK on every message also results in a significant amount of traffic to rabbitmq, with notable performance implications. Send a singular ACK after the processing has completed, by making `drain_queue` into a contextmanager. Additionally, use the `multiple` flag to ACK all of the messages at once -- or explicitly NACK the messages if processing failed. Sending a NACK will re-queue them at the front of the queue. Performance of a no-op dequeue before this change: ``` $ ./manage.py queue_rate --count 50000 --batch Purging queue... Enqueue rate: 10847 / sec Dequeue rate: 2479 / sec ``` Performance of a no-op dequeue after this change (a 25% increase): ``` $ ./manage.py queue_rate --count 50000 --batch Purging queue... Enqueue rate: 10752 / sec Dequeue rate: 3079 / sec ```	2020-10-06 17:26:14 -07:00
Alex Vandiver	2b6989a40f	queue: Remove a no-longer-correct comment. This comment stopped being true in `5686821150`, and very much stopped being relevant in `dd40649e04` when the middleware entirely stopped publishing to a queue.	2020-08-14 11:30:13 -07:00
Anders Kaseorg	61d0417e75	python: Replace ujson with orjson. Fixes #6507. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-08-11 10:55:12 -07:00
Anders Kaseorg	23b815bb50	queue: Fix types to reflect that Pika channels receive bytes. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-08-07 11:12:32 -07:00
Anders Kaseorg	489d73f63a	queue: Fix strict_optional errors. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-07-06 11:25:48 -07:00
Anders Kaseorg	1ed2d9b4a0	logging: Use logging.exception and exc_info for unexpected exceptions. logging.exception() and logging.debug(exc_info=True), etc. automatically include a traceback. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-14 23:27:22 -07:00
Anders Kaseorg	4b6d2cf25f	logging: Pass more format arguments to logging. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-14 23:27:22 -07:00
Anders Kaseorg	365fe0b3d5	python: Sort imports with isort. Fixes #2665. Regenerated by tabbott with `lint --fix` after a rebase and change in parameters. Note from tabbott: In a few cases, this converts technical debt in the form of unsorted imports into different technical debt in the form of our largest files having very long, ugly import sequences at the start. I expect this change will increase pressure for us to split those files, which isn't a bad thing. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-11 16:45:32 -07:00
Anders Kaseorg	67e7a3631d	python: Convert percent formatting to Python 3.6 f-strings. Generated by pyupgrade --py36-plus. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-10 15:02:09 -07:00
Anders Kaseorg	19cc22e5ab	queue: Fix types to reflect that Pika channels transmit bytes. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-07 11:09:24 -07:00
Anders Kaseorg	bdc365d0fe	logging: Pass format arguments to logging. https://docs.python.org/3/howto/logging.html#optimization Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-05-02 10:18:02 -07:00
Anders Kaseorg	fead14951c	python: Convert assignment type annotations to Python 3.6 style. This commit was split by tabbott; this piece covers the vast majority of files in Zulip, but excludes scripts/, tools/, and puppet/ to help ensure we at least show the right error messages for Xenial systems. We can likely further refine the remaining pieces with some testing. Generated by com2ann, with whitespace fixes and various manual fixes for runtime issues: - invoiced_through: Optional[LicenseLedger] = models.ForeignKey( + invoiced_through: Optional["LicenseLedger"] = models.ForeignKey( -_apns_client: Optional[APNsClient] = None +_apns_client: Optional["APNsClient"] = None - notifications_stream: Optional[Stream] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE) - signup_notifications_stream: Optional[Stream] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE) + notifications_stream: Optional["Stream"] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE) + signup_notifications_stream: Optional["Stream"] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE) - author: Optional[UserProfile] = models.ForeignKey('UserProfile', blank=True, null=True, on_delete=CASCADE) + author: Optional["UserProfile"] = models.ForeignKey('UserProfile', blank=True, null=True, on_delete=CASCADE) - bot_owner: Optional[UserProfile] = models.ForeignKey('self', null=True, on_delete=models.SET_NULL) + bot_owner: Optional["UserProfile"] = models.ForeignKey('self', null=True, on_delete=models.SET_NULL) - default_sending_stream: Optional[Stream] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE) - default_events_register_stream: Optional[Stream] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE) + default_sending_stream: Optional["Stream"] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE) + default_events_register_stream: Optional["Stream"] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE) -descriptors_by_handler_id: Dict[int, ClientDescriptor] = {} +descriptors_by_handler_id: Dict[int, "ClientDescriptor"] = {} -worker_classes: Dict[str, Type[QueueProcessingWorker]] = {} -queues: Dict[str, Dict[str, Type[QueueProcessingWorker]]] = {} +worker_classes: Dict[str, Type["QueueProcessingWorker"]] = {} +queues: Dict[str, Dict[str, Type["QueueProcessingWorker"]]] = {} -AUTH_LDAP_REVERSE_EMAIL_SEARCH: Optional[LDAPSearch] = None +AUTH_LDAP_REVERSE_EMAIL_SEARCH: Optional["LDAPSearch"] = None Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2020-04-22 11:02:32 -07:00
Mateusz Mandera	5252b081bd	queue_processors: Gather statistics on queue worker operations.	2020-04-01 16:44:06 -07:00
Anders Kaseorg	a681ca6cf5	queue: Update error callback signatures for Pika 1.1. The expected signatures for these callbacks seem to have changed somewhere in https://github.com/pika/pika/pull/1002. Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2019-11-20 17:23:48 -08:00
Andrew Szeto	b312001fd9	rabbitmq: Set a short TCP keepalive idle time on BlockingConnection. The code comment explains this issue in some detail, but essentially in Kubernetes and Docker Swarm systems, the container overlay network has a relatively short TCP idle lifetime (about 15 minutes), which can lead to it killing the connection between Tornado and RabbitMQ. We fix this by setting a TCP keepalive on that connection shorter than 15 minutes. Fixes #10776.	2019-10-30 16:15:44 -07:00
Rafid Aslam	447f74ae63	Upgrade pika to 1.1. Upgrade pika to 1.1 and make some changes accordingly to comply with the new version. Fixes #12899.	2019-10-29 17:01:12 -07:00
neiljp (Neil Pilgrim)	ba7a0934e3	requirements: Upgrade mypy to 0.711. This comes with it a big performance improvement; mypy is now only barely our slowest linter even if it wasn't previously running. Fixes: #12058	2019-07-22 17:12:50 -07:00
Wyatt Hoodes	5686821150	middleware: Change write_log_line to publish as a dict. We were seeing errors when publishing typical events in the form of `Dict[str, Any]` as the expected type to be a `Union`. So we instead change the only non-dictionary call, to pass a dict instead of `str`.	2019-07-22 17:06:41 -07:00
Vishnu Ks	0d0007742f	requirements: Upgrade pika from 0.12.0 to 0.13.0. The important changes to pika for us are based on this PR of ours: https://github.com/pika/pika/pull/1129 Fixes #11394.	2019-01-31 10:04:07 -08:00
Tim Abbott	c94deff920	mypy: Remove some now-unnecessary type: ignores.	2018-03-28 10:39:05 -07:00
Greg Price	73559e5320	queue: Suppress error mail from brief rabbitmq downtimes. Details in comment. Together with a few previous commits, this should completely eliminate sending error mail to admins when the RabbitMQ server is simply restarted and comes back up normally.	2018-03-21 18:03:05 -07:00
Greg Price	3b3154527f	queue: Don't blow up when a connection closes quickly.	2018-03-20 16:49:05 -07:00
Greg Price	9dcc436766	queue: Fix __init__ logic so heartbeat choice works fully. Because the base class's __init__ calls `_connect`, when we set the value after that call has already returned, our new value only takes effect if the first connection fails and we have to reconnect. Make it take effect from the beginning.	2018-03-20 16:49:05 -07:00
Greg Price	5edc26a0df	queue: Cut disused, broken parameter to `_connect`. This parameter isn't used anywhere. A good thing, because if it were, the code would immediately raise an exception -- `self._on_open_cbs` hasn't been initialized yet when we first call `_connect`, from the base class's `__init__`. So, just cut it. If we later need something like this, it's easy to add a working version then.	2018-03-20 16:49:05 -07:00
Greg Price	4926228071	rabbitmq: Do a better job of retrying failed connections. Empirically, the retry in `_on_connection_closed` didn't actually work -- if a reconnect failed, that was it, and the exception handler didn't get run. A traceback would get logged, but all its frames were in Tornado or Pika, not our own code; presumably something magic and async was happening to the exception. Moreover, though we would make one attempt to reconnect if we had a connection that got closed, we didn't have any form of retry if the original attempt at connecting failed in the first place. Happily, upstream offers a perfectly reasonable bit of API that avoids both of these problems: the on-open-error callback. So use that.	2017-11-29 16:56:29 -08:00
Greg Price	7ac2b58584	rabbitmq: Reorder a bit to group our reconnect logic together.	2017-11-29 16:56:29 -08:00
Greg Price	c32b16715d	tornado: Use spiffy new `call_later` rather than `add_timeout`. This method was new in Tornado 4.0. It saves us from having to get the time ourselves and do the arithmetic -- which not only makes the code a bit shorter, but also easier to get right. Tornado docs (see http://www.tornadoweb.org/en/stable/ioloop.html) say we should have been getting the time from `ioloop.time()` rather than hardcoding `time.time()`, because the loop could e.g. be running on the `time.monotonic()` clock.	2017-11-29 16:56:29 -08:00
Greg Price	73886f57d2	zerver/lib/queue: Clean up import order.	2017-11-29 16:56:29 -08:00
Greg Price	3c4e4c14c9	rabbitmq: Add on-close callback atomically in creating the connection. Adding it afterward is inherently racy, and upstream's API is quite reasonable for avoiding that -- just like we can pass an on-open callback up front, we can do the same with the on-close callback. This is a more thorough version of `4adf2d5c2` from back in 2013-04.	2017-11-29 16:56:29 -08:00
Greg Price	e88c2a7ee4	rabbitmq: Cut redundant `stop_ioloop_on_close` parameter. The default value of this parameter is already False upstream. (It was already False in pika version 0.9.6, which we were supposedly using when we introduced this in 4baeaaa52; not sure what the story was there.)	2017-11-29 16:56:29 -08:00
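
Commit baf882a133 above describes turning `drain_queue` into a context manager that sends one ACK with the `multiple` flag after a batch has been processed, or a NACK if processing fails. A minimal sketch of that pattern follows. It assumes an in-memory `FakeChannel` stand-in invented here for illustration; this is not pika's API and not Zulip's actual implementation, although the method names loosely mirror pika's `basic_get`, `basic_ack`, and `basic_nack`.

```python
from collections import deque
from contextlib import contextmanager
from typing import Deque, Dict, Iterator, List, Optional, Tuple


class FakeChannel:
    """Hypothetical in-memory stand-in for a broker channel (not pika)."""

    def __init__(self, messages: List[str]) -> None:
        self.ready: Deque[str] = deque(messages)  # messages waiting in the queue
        self.unacked: Dict[int, str] = {}  # delivered but not yet acknowledged
        self.next_tag = 1
        self.acks_sent = 0

    def basic_get(self) -> Optional[Tuple[int, str]]:
        # Deliver one message, or None when the queue is empty.
        if not self.ready:
            return None
        tag = self.next_tag
        self.next_tag += 1
        self.unacked[tag] = self.ready.popleft()
        return tag, self.unacked[tag]

    def _matching(self, delivery_tag: int, multiple: bool) -> List[int]:
        # With multiple=True, every tag up to and including delivery_tag matches.
        return sorted(
            t for t in self.unacked if t == delivery_tag or (multiple and t <= delivery_tag)
        )

    def basic_ack(self, delivery_tag: int, multiple: bool = False) -> None:
        self.acks_sent += 1
        for tag in self._matching(delivery_tag, multiple):
            del self.unacked[tag]

    def basic_nack(self, delivery_tag: int, multiple: bool = False) -> None:
        # Requeue at the front of the queue, preserving the original order.
        for tag in reversed(self._matching(delivery_tag, multiple)):
            self.ready.appendleft(self.unacked.pop(tag))


@contextmanager
def drain_queue(channel: FakeChannel) -> Iterator[List[str]]:
    """Yield everything currently queued; ACK once on success, NACK on failure."""
    batch: List[str] = []
    last_tag = 0
    while (item := channel.basic_get()) is not None:
        last_tag, body = item
        batch.append(body)
    try:
        yield batch
    except BaseException:
        if last_tag:
            channel.basic_nack(last_tag, multiple=True)
        raise
    if last_tag:
        channel.basic_ack(last_tag, multiple=True)
```

A caller would write `with drain_queue(channel) as events:` and process `events` inside the block. If the block raises, the messages go back on the queue instead of being lost, and a successful batch costs a single acknowledgement rather than one per message.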

87 Commits