zulip

Commit Graph

Author	SHA1	Message	Date
Mateusz Mandera	30ac291eba	emoji: Add migration to reupload all RealmEmoji and ensure .author. Fixes #19732.	2022-02-10 17:45:31 -08:00
Lauryn Menard	c532829c35	backend: Change `do_report_error` return value. As a preparatory step to refactoring json_success to accept request as a parameter, change `do_report_error`, which is called from the events queue for "error_reports", to return None instead of json_success. Adds an assertion error to `ErrorReporter` queue processor and removes `JsonableError` from `do_report_error`. It is likely that `do_error_report` was moved from a view in a previous refactor, but was not updated to no longer return an HttpReponse.	2022-02-04 15:16:55 -08:00
Alex Vandiver	3efed5f1e6	queue_processors: Shut down background missedmessage_emails thread. Python's behaviour on `sys.exit` is to wait for all non-daemon threads to exit. In the context of the missedmessage_emails worker, if any work is pending, a non-daemon Timer thread exists, which is waiting for 5 seconds. As soon as that thread is serviced, it sets up another 5-second Timer, a process which repeats until all ScheduledMessageNotificationEmail records have been handled. This likely takes two minutes, but may theoretically take up to a week until the thread exits, and thus sys.exit can complete. Supervisor only gives the process 30 seconds to shut down, so something else must prevent this endless Timer. When `stop` is called, take the lock so we can mutate the timer. However, since `stop` may have been called from a signal handler, our thread may _already_ have the lock. As Python provides no way to know if our thread is the one which has the lock, make the lock a re-entrant one, allowing us to always try to take it. With the lock in hand, cancel any outstanding timers. A race exists where the timer may not be able to be canceled because it has finished, maybe_send_batched_emails has been called, and is itself blocked on the lock. Handle this case by timing out the thread join in `stop()`, and signal the running thread to exit by unsetting the timer event, which will be detected once it claims the lock.	2021-11-23 10:45:49 -08:00
Mateusz Mandera	8af7ffd9da	rate_limit: Fix logging string when rate limiting email gateway. realm.name is not the right "name" to log, we should use realm.subdomain like everywhere else.	2021-11-22 10:28:56 -08:00
Alex Vandiver	faeffa2466	queue_processors: Set a bounded prefetch size on rabbitmq queues. RabbitMQ clients have a setting called prefetch[1], which controls how many un-acknowledged events the server forwards to the local queue in the client. The default is 0; this means that when clients first connect, the server must send them every message in the queue. This itself may cause unbounded memory usage in the client, but also has other detrimental effects. While the client is attempting to process the head of the queue, it may be unable to read from the TCP socket at the rate that the server is sending to it -- filling the TCP buffers, and causing the server's writes to block. If the server blocks for more than 30 seconds, it times out the send, and closes the connection with: ``` closing AMQP connection <0.30902.126> (127.0.0.1:53870 -> 127.0.0.1:5672): {writer,send_failed,{error,timeout}} ``` This is https://github.com/pika/pika/issues/753#issuecomment-318119222. Set a prefetch limit of 100 messages, or the batch size, to better handle queues which start with large numbers of outstanding events. Setting prefetch=1 causes significant performance degradation in the no-op queue worker, to 30% of the prefetch=0 performance. Setting prefetch=100 achieves 90% of the prefetch=0 performance, and higher values offer only minor gains above that. For batch workers, their performance is not notably degraded by prefetch equal to their batch size, and they cannot function on smaller prefetches than their batch size. We also set a 100-count prefetch on Tornado workers, as they are potentially susceptible to the same effect. [1] https://www.rabbitmq.com/confirms.html#channel-qos-prefetch	2021-11-16 11:48:50 -08:00
Alex Vandiver	64268f47e8	queue_processors: Drop unused current_queue_size, which was local size. The `current_queue_size` key in the queue monitoring stats file was the local queue size, not the global queue size -- `d5a6b0f99a` renamed the function, but did not adjust the queue monitoring JSON, despite the last use of it having been removed in `cd9b194d88`. The function is still used to mark "we emptied our queue," and it remains a reasonable metric for that.	2021-11-16 11:48:50 -08:00
Alex Vandiver	800e38016a	queue_rate: Output to CSV, and run multiple prefetch values.	2021-11-16 11:48:50 -08:00
Shlok Patel	893c9bc896	export: Remove `--delete-after-upload` flag in realm export. For export realm following changes have been made: - `./manage.py export --upload` would delete `.tar.gz` and unpacked dir - `./manage.py export` would only delete `unpacked dir` Besides, we have removed `--delete-after-upload` as we have set it as the default. Fixes #20081	2021-11-03 11:14:02 -07:00
Alex Vandiver	75f1070881	queue_processors: Disable timeouts with PushNotificationsWorker. Since `3853285241`, PushNotificationsWorker uses the aioapns library to send Apple push notifications. This introduces an asyncio event loop into this worker process, which, if unlucky, can respond poorly when a SIGALRM is introduced to it: ``` [asyncio] Task exception was never retrieved future: <Task finished coro=<send_apple_push_notification.<locals>.attempt_send() done, defined at /path/to/zerver/lib/push_notifications.py:166> exception=WorkerTimeoutException(30, 1)> Traceback (most recent call last): File "/path/to/zerver/lib/push_notifications.py", line 169, in attempt_send result = await apns_context.apns.send_notification(request) File "/path/to/zulip-py3-venv/lib/python3.6/site-packages/aioapns/client.py", line 57, in send_notification response = await self.pool.send_notification(request) File "/path/to/zulip-py3-venv/lib/python3.6/site-packages/aioapns/connection.py", line 407, in send_notification response = await connection.send_notification(request) File "/path/to/zulip-py3-venv/lib/python3.6/site-packages/aioapns/connection.py", line 189, in send_notification data = json.dumps(request.message, ensure_ascii=False).encode() File "/usr/lib/python3.6/json/__init__.py", line 238, in dumps **kw).encode(obj) File "/usr/lib/python3.6/json/encoder.py", line 199, in encode chunks = self.iterencode(o, _one_shot=True) File "/usr/lib/python3.6/json/encoder.py", line 257, in iterencode return _iterencode(o, 0) File "/path/to/zerver/worker/queue_processors.py", line 353, in timer_expired raise WorkerTimeoutException(limit, len(events)) zerver.worker.queue_processors.WorkerTimeoutException: Timed out after 30 seconds processing 1 events ``` ...which subsequently leads to the worker failing to make any progress on the queue. Remove the timeout on the worker. This may result in failing to make forward progress if Apple/Google take overly long handling requests, but is likely preferable to failing to make forward progress if _one_ request takes too long and gets unlucky with when the signal comes through.	2021-10-21 08:59:56 -07:00
Alex Vandiver	ab985c0066	queue_processors: Add a comment clarifying that timeouts only happen when single-threaded.	2021-10-21 08:59:56 -07:00
shanukun	8c1ea78d7d	invite: Extend invite api for handling expiration duration. This extends the invite api endpoints to handle an extra argument, expiration duration, which states the number of days before the invitation link expires. For prereg users, expiration info is attached to event object to pass it to invite queue processor in order to create and send confirmation link. In case of multiuse invites, confirmation links are created directly inside do_create_multiuse_invite_link(), For filtering valid user invites, expiration info stored in Confirmation object is used, which is accessed by a prereg user using reverse generic relations. Fixes #16359.	2021-09-10 16:53:03 -07:00
Anders Kaseorg	646c04eff2	Rename default branch to ‘main’. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-09-06 12:56:35 -07:00
Alex Vandiver	5b45f8a128	queue_processors: Include queue name in the timeout exception. This information can be gleaned from the stacktrace, but making it explicit in the stringification makes it much easier to differentiate types of errors at a glance, particularly in Sentry.	2021-09-02 02:48:34 -07:00
Alex Vandiver	4d98b0552e	missedmessage_emails: Ensure forward progress. maybe_send_batched_emails handles batches of emails from different users at once; as it processes each user's batch, it enqueues messages onto the `email_senders` queue. If `handle_missedmessage_emails` raises an exception when processing a single user's email, no events are marked as handled -- including those that were already handled and enqueued onto `email_senders`. This results in an increasing number of users being sent repeated emails about the same missed messages. Catch and log any exceptions when handling an individual user's events. This guarantees forward progress, and that notifications are sent at-most-once, not at-least-once.	2021-08-20 07:21:39 -07:00
Mateusz Mandera	a01594e72b	bots: Pass realm to get_system_bot call in DeferredWorker.	2021-07-26 15:33:13 -07:00
Abhijeet Prasad Bodas	dd5e12d112	MissedMessageWorker: Use custom batching periods from UserProfile.	2021-07-23 12:13:46 -07:00
Abhijeet Prasad Bodas	9fcb6e51ce	MissedMessageWorker: Handle deleted messages. The test for the try-except block is hacky. See the comment for explaination.	2021-07-23 12:13:46 -07:00
Abhijeet Prasad Bodas	de78b015d9	MissedMessageWorker: Remove unnecessary transaction.atomic. We only have one query which will change database state in this function, and we already have a lock on the process itself, so there's no need for a transaction. This was added in `ebb4eab0f9`.	2021-07-23 12:13:46 -07:00
Abhijeet Prasad Bodas	ebb4eab0f9	worker: Rewrite MissedMessageWorker to not be lossy. Previously, we stored up to 2 minutes worth of email events in memory before processing them. So, if the server were to go down we would lose those events. To fix this, we store the events in the database. This is a prep change for allowing users to set custom grace period for email notifications, since the bug noted above will aggravate with longer grace periods.	2021-07-13 17:21:38 -07:00
Abhijeet Prasad Bodas	e63e86dcb2	worker: Ensure complete coverage for PushNotificationsWorker. The `# nocoverage` was unnecessary apart from for the compatibility code, so add a test for that code and remove the `# nocoverage`. The `message_id` -> `message_ids` conversion was done in `9869153ae8`.	2021-07-13 08:30:31 -07:00
Mateusz Mandera	58d9975cca	embed_links: Interrupt consume() function on worker timeout. This fixes a bug introduced in `95b46549e1` which made the worker simply log a warning about the timeout and then continue consume()ing the event that should have also been interrupted. The idea here is to introduce an exception which can be used to interrupt the consume() process without triggering the regular handling of exceptions that happens in _handle_consume_exception.	2021-07-07 09:24:50 -07:00
Mateusz Mandera	95b46549e1	embed_links: Only log warning if worker times out. Throwing an exception is excessive in case of this worker, as it's expected for it to time out sometimes if the urls take too long to process. With a test added by tabbott.	2021-07-06 14:17:24 -07:00
Mateusz Mandera	d9ab70bdde	queue_processors: Make timer_expired receive list of events as argument. This will give queue workers more flexibility when defining their own override of the method.	2021-07-06 13:46:48 -07:00
Mateusz Mandera	c101f3acd6	queue_processors: Make timer_expired() a method. This allows specific queue workers to override the defaut behavior and implement their own response to the timer expiring. We will want to use this for embed_links queue at least.	2021-07-06 13:46:48 -07:00
PIG208	75cea329b4	markdown: Refactor out additional properties added to Message. This adds a new class called MessageRenderingResult to contain the additional properties we added to the Message object (like alert_words) as well as the rendered content to ensure typesafe reference. No behavioral change is made except changes in typing. This is a preparatory change for adding django-stubs to the backend. Related: #18777	2021-06-24 18:14:53 -07:00
sahil839	37bf160298	queue_processor: Add langauge to the events added to invites queue. This is a prep commit for adding realm-level default for various user settings. We add the language, in which the invite email will be sent, to the dict added to queue itself to avoid making queries in a loop when sending multiple emails from queue. We also handle the case for old events in the queue.	2021-06-22 16:55:32 -07:00
Mateusz Mandera	496e744053	queue_processors: Log more detailed info when marking messages as read.	2021-05-26 11:17:21 -07:00
PIG208	7150fe5dc5	backend: Extract check_update_message from update_message_backend.	2021-05-09 20:44:04 -07:00
Cyril Pletinckx	e4ff372fc3	emails: Transform SMTPException into EmailNotDeliveredException. Django's default SMTP implementation can raise various exceptions when trying to send an email. In order to allow Zulip calling code to catch fewer exceptions to handle any cause of "email not sent", we translate most of them into EmailNotDeliveredException. The non-translated exceptions concern the connection with the SMTP server. They were not merged with the rest to keep some details about the nature of these. Tests are implemented in the test_send_email.py module.	2021-05-05 20:16:11 -07:00
Alex Vandiver	a9688ceb75	worker: Allow long MissedMessageWorker consumes. This will stop dropping events in the case that the background `maybe_send_batched_email` thread takes longer than 30s. However, see also #15280 and the TODO comment about how we lose events upon restart; this worker is still lossy.	2021-05-04 08:45:48 -07:00
Cyril Pletinckx	9afde790c6	email: Open a single SMTP connection to send email batches. Previously the outgoing emails were sent over several SMTP connections through the EmailSendingWorker; establishing a new connection each time adds notable overhead. Redefine EmailSendingWorker worker to be a LoopQueueProcessingWorker, which allows it to handle batches of events. At the same time, persist the connection across email sending, if possible. The connection is initialized in the constructor of the worker in order to keep the same connection throughout the whole process. The concrete implementation of the consume_batch function is simply processing each email one at a time until they have all been sent. In order to reuse the previously implemented decorator to retry sending failures a new method that meets the decorator's required arguments is declared inside the EmailSendingWorker class. This allows to retry the sending process of a particular email inside the batch if the caught exception leaves this process retriable. A second retry mechanism is used inside the initialize_connection function to redo the opening of the connection until it works or until three attempts failed. For this purpose the backoff module has been added to the dependencies and a test has been added to ensure that this retry mechanism works well. The connection is closed when the stop method is called. Fixes: #17672.	2021-04-26 17:27:22 -07:00
Alex Vandiver	0ad17925eb	send_email: Remove unnecessary send_email_from_dict. This was introduced in `8321bd3f92` to serve as a sort of drop-in replacement for zerver.lib.queue.queue_json_publish, but its use has been subsequently cut out (e.g. `9fcdb6c83ac5`). Remote its last callsite.	2021-04-26 17:27:22 -07:00
Anders Kaseorg	178736c8eb	docs: Fix spelling errors caught by codespell. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-04-26 09:31:08 -07:00
Tim Abbott	260861426c	queue_processors: Document when can remove compatibility code.	2021-04-16 09:55:14 -07:00
Anders Kaseorg	e7ed907cf6	python: Convert deprecated Django ugettext alias to gettext. django.utils.translation.ugettext is a deprecated alias of django.utils.translation.gettext as of Django 3.0, and will be removed in Django 4.0. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-04-15 18:01:34 -07:00
Anders Kaseorg	1fe29aad42	queue_processors: Simplify unnecessary use of Optional. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-04-13 08:54:26 -07:00
Alex Vandiver	a280905a89	outgoing_webhook: Join build_bot_request and send_data_to_server. The existing organization, of returning an opaque blob from `build_bot_request`, which was later consumed by `send_data_to_server`, is not particularly sensible; the steps become oddly split between the OutgoingWebhookWorker, `do_rest_call`, and the `OutgoingWebhookServiceInterface`. Make the `OutgoingWebhookServiceInterface` in charge of building, making, and returning the request in one method; another method handles extracting content from a successful response. `do_rest_call` is responsible for calling both halves of this, and doing common error handling.	2021-03-29 18:24:44 -07:00
Anders Kaseorg	d55dc6f8f1	requirements: Upgrade python-zulip-api from Git. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-03-26 16:31:03 -07:00
Mateusz Mandera	b9c1fed18c	invites: Delete old compat code in the invites queue worker. 1.7.* is old enough at this point that we can clean up this code.	2021-02-26 08:26:43 -08:00
Mateusz Mandera	09fc79f911	actions: Remove realm argument to internal_send_private_message. The argument is redundant.	2021-02-23 15:26:47 -08:00
Anders Kaseorg	6e4c3e41dc	python: Normalize quotes with Black. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-02-12 13:11:19 -08:00
Anders Kaseorg	11741543da	python: Reformat with Black, except quotes. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-02-12 13:11:19 -08:00
Anders Kaseorg	5028c081cb	python: Merge concatenated string literals that Black would uglify. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-02-12 13:11:19 -08:00
Alex Vandiver	d0f0c2f2ed	digest: Fix the structure that we enqueue across when digesting. This rename was missed in `bfa0bdf3d6`. Without this fix, digest messages fail to send.	2021-02-08 17:28:59 -08:00
Anders Kaseorg	454144c35f	queue_processors: Fix retry_send_email_failures type. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-01-26 13:27:50 -08:00
Steve Howell	bfa0bdf3d6	email digests: Process users in chunks of 30. This should make the queue empty more quickly, because we do bulk queries to prevent database hops.	2021-01-17 11:28:30 -08:00
Alex Vandiver	c2526844e9	worker: Remove SignupWorker and friends. ZULIP_FRIENDS_LIST_ID and MAILCHIMP_API_KEY are not currently used in production. This removes the unused 'signups' queue and worker.	2021-01-17 11:16:35 -08:00
Alex Vandiver	d688e18de2	errors: Remove references to "deployment", use "host". The `deployment` key was only set in `do_report_error`, which is now only used in one codepath (the queue worker). The logging handlers on staging call notify_server_error directly, which omits the `deployment` key. Remove the odd one-of key, and instead simply do dispatch in `do_report_error`.	2021-01-17 11:08:12 -08:00
Anders Kaseorg	cc55393671	python: Open text files as text to skip decode operations. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-10-30 11:36:38 -07:00
Abhijeet Prasad Bodas	e98a8856c7	logging: Add logging in deferred_work queue processor. Adds logging statements in deferred_work queue consume.	2020-10-29 10:34:53 -07:00
Alex Vandiver	142de0f670	queue: Increase default timeout to 30s, from 10s. Not all of the workers are known to be safe to interrupt; they might leave inconsistent state. As such, terminating them with timeouts should currently only be a last-resort against stalled queues, not a regular occurrence.	2020-10-27 16:39:31 -07:00
Alex Vandiver	c73dd194f0	sentry: Group all worker timeouts together, by queue. Since the exception can be triggered at arbitrary places in the stack based on whenever the alarm happens to fire, they do not often group together. Explicitly group them together, grouped only by which queue the work is in.	2020-10-27 16:39:31 -07:00
Alex Vandiver	7cf737988d	queue: Be more explicit about test/real queue division.	2020-10-26 12:32:47 -07:00
Mateusz Mandera	716df658fa	queue_processors: Don't run test queues with run-dev.py.	2020-10-18 14:07:31 -07:00
Steve Howell	378062cc83	performance: Avoid call to access_stream_by_id. We already trust ids that are put on our queue for deferred work. For example, see the code for "mark_stream_messages_as_read_for_everyone" We now pass stream_recipient_id when we queue up work for do_mark_stream_messages_as_read. This generally saves about 3 queries per user when we unsubscribe them from a stream.	2020-10-16 12:58:11 -07:00
Steve Howell	31eb97ddde	performance: Fix do_mark_stream_messages_as_read. This function no longer asks for data that it doesn't need.	2020-10-16 12:58:11 -07:00
Anders Kaseorg	6564540d15	docs: Fix some spelling errors. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-10-13 15:47:13 -07:00
Alex Vandiver	f0b23b0752	queue: Switch non-batch consumer to also use start_json_consumer. This has no effect on consumption rate, but unifies the codepaths. Before: ``` $ ./manage.py queue_rate --count 50000 Purging queue... Enqueue rate: 11187 / sec Dequeue rate: 4158 / sec ``` After: ``` $ ./manage.py queue_rate --count 50000 Purging queue... Enqueue rate: 11010 / sec Dequeue rate: 4113 / sec ```	2020-10-11 14:19:42 -07:00
Alex Vandiver	45c9c3cc30	queue: Monitor user_activity queue, now that it has a consumer. Since this was using repead individual get() calls previously, it could not be monitored for having a consumer. Add it in, by marking it of queue type "consumer" (the default), and adding Nagios lines for it. Also adjust missedmessage_emails to be monitored; it stopped using LoopQueueProcessingWorker in `5cec566cb9`, but was never added back into the set of monitored consumers.	2020-10-11 14:19:42 -07:00
Alex Vandiver	f9358d5330	queue: Switch batch interface to use the channel.consume iterator. This low-level interface allows consuming from a queue with timeouts. This can be used to either consume in batches (with an upper timeout), or one-at-a-time. This is notably more performant than calling `.get()` repeatedly (what json_drain_queue does under the hood), which is "highly discouraged as it is very inefficient"[1]. Before this change: ``` $ ./manage.py queue_rate --count 10000 --batch Purging queue... Enqueue rate: 11158 / sec Dequeue rate: 3075 / sec ``` After: ``` $ ./manage.py queue_rate --count 10000 --batch Purging queue... Enqueue rate: 11511 / sec Dequeue rate: 19938 / sec ``` [1] https://www.rabbitmq.com/consumers.html#fetching	2020-10-11 14:19:40 -07:00
Alex Vandiver	2547bdbf4a	queue: Rename consume_wrapper to a better name.	2020-10-09 20:40:51 -07:00
Alex Vandiver	d5a6b0f99a	queue: Rename queue_size, and update for all local queues. Despite its name, the `queue_size` method does not return the number of items in the queue; it returns the number of items that the local consumer has delivered but unprocessed. These are often, but not always, the same. RabbitMQ's queues maintain the queue of unacknowledged messages; when a consumer connects, it sends to the consumer some number of messages to handle, known as the "prefetch." This is a performance optimization, to ensure the consumer code does not need to wait for a network round-trip before having new data to consume. The default prefetch is 0, which means that RabbitMQ immediately dumps all outstanding messages to the consumer, which slowly processes and acknowledges them. If a second consumer were to connect to the same queue, they would receive no messages to process, as the first consumer has already been allocated them. If the first consumer disconnects or crashes, all prior events sent to it are then made available for other consumers on the queue. The consumer does not know the total size of the queue -- merely how many messages it has been handed. No change is made to the prefetch here; however, future changes may wish to limit the prefetch, either for memory-saving, or to allow multiple consumers to work the same queue. Rename the method to make clear that it only contains information about the local queue in the consumer, not the full RabbitMQ queue. Also include the waiting message count, which is used by the `consume()` iterator for similar purpose to the pending events list.	2020-10-09 20:40:39 -07:00
Anders Kaseorg	9bfbb29763	queue_processors: Use try…finally to prevent leaking an alarm. Otherwise, if consume_func raised an exception for any reason other than the alarm being fired, the still-pending alarm would have fired later at some arbitrary point in the calling code. We need two try…finally blocks in case the signal arrives just before signal.alarm(0). Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-10-07 15:37:46 -07:00
Alex Vandiver	d47637fa40	queue: Set a max consume timeout with SIGALRM. SIGALRM is the simplest way to set a specific maximum duration that queue workers can take to handle a specific message. This only works in non-threaded environments, however, as signal handlers are per-process, not per-thread. The MAX_CONSUME_SECONDS is set quite high, at 10s -- the longest average worker consume time is embed_links, which hovers near 1s. Since just knowing the recent mean does not give much information[1], it is difficult to know how much variance is expected. As such, we set the threshold to be such that only events which are significant outliers will be timed out. This can be tuned downwards as more statistics are gathered on the runtime of the workers. The exception to this is DeferredWorker, which deals with quite-long requests, and thus has no enforceable SLO. [1] https://www.autodesk.com/research/publications/same-stats-different-graphs	2020-10-06 17:26:14 -07:00
Alex Vandiver	baf882a133	queue: Only ACK drain_queue once it has completed work on the list. Currently, drain_queue and json_drain_queue ack every message as it is pulled off of the queue, until the queue is empty. This means that if the consumer crashes between pulling a batch of messages off the queue, and actually processing them, those messages will be permanently lost. Sending an ACK on every message also results in a significant amount lot of traffic to rabbitmq, with notable performance implications. Send a singular ACK after the processing has completed, by making `drain_queue` into a contextmanager. Additionally, use the `multiple` flag to ACK all of the messages at once -- or explicitly NACK the messages if processing failed. Sending a NACK will re-queue them at the front of the queue. Performance of a no-op dequeue before this change: ``` $ ./manage.py queue_rate --count 50000 --batch Purging queue... Enqueue rate: 10847 / sec Dequeue rate: 2479 / sec ``` Performance of a no-op dequeue after this change (a 25% increase): ``` $ ./manage.py queue_rate --count 50000 --batch Purging queue... Enqueue rate: 10752 / sec Dequeue rate: 3079 / sec ```	2020-10-06 17:26:14 -07:00
Alex Vandiver	df86a564dc	queue: Let stop() work with LoopQueueProcessingWorker.	2020-10-06 17:26:14 -07:00
Alex Vandiver	8cf37a0d4b	queue: Add a tool to profile no-op enqueue and dequeue actions.	2020-10-06 17:26:14 -07:00
Tim Abbott	7fa8bafe81	lint: Fix type of initial 0 in queue monitoring.	2020-09-21 15:47:30 -07:00
Mateusz Mandera	810514dd9d	queue: Update stats file every 30 seconds. This system can't update stats while the queue is idle, without using threads for this, but at least we ensure to update the file after consuming an event if more than MAX_SECONDS_BEFORE_UPDATE_STATS passed since the last update, regardless of the number of iterations done so far.	2020-09-21 15:24:02 -07:00
Mateusz Mandera	40c4511a9c	queue: Fix misspelled consume_iteration_counter variable.	2020-09-21 15:22:58 -07:00
Mateusz Mandera	2365a53496	queue: Fix a race condition in monitoring after queue stops being idle. The race condition is described in the comment block removed by this commit. This leaves room for another, remaining race condition that should be virtually impossible, but nevertheless it seems worthwhile to have it documented in the code, so we put a new comment describing it. As a final note, this is not a new race condition, it was hypothetically possible with the old code as well.	2020-09-21 15:22:56 -07:00
Alex Vandiver	de1db2c838	sentry: Provide more metadata in queue processors. This allows aggregation by queue, makes the event data more readily accessible, and clears out the breadcrumbs upon every batch that is serviced.	2020-09-18 15:13:08 -07:00
Mateusz Mandera	bb4567f57e	queue: Extract get_remaining_queue_size method.	2020-09-11 15:51:07 -07:00
Mateusz Mandera	1d466a4fc5	queue: Make embed_link updates stats on every iteration.	2020-09-11 15:51:07 -07:00
Anders Kaseorg	bef46dab3c	python: Prefer kwargs form of dict.update. For less inflation by Black. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-09-03 17:51:09 -07:00
Anders Kaseorg	ab120a03bc	python: Replace unnecessary intermediate lists with generators. Mostly suggested by the flake8-comprehension plugin. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-09-02 11:15:41 -07:00
Mateusz Mandera	9b50c49ea7	streams: Mark all messages as read when deactivating a stream. The query to finds and marks all unread UserMessages in the stream as read can be quite expensive, so we'll move that work to the deferred_work queue and split it into batches. Fixes #15770.	2020-09-01 11:24:27 -07:00
Mateusz Mandera	06151672ba	queue: Use locking to avoid race conditions in missedmessage_emails. This queue had a race condition with creation of another Timer while maybe_send_batched_emails is still doing its work, which may cause two or more threads to be running maybe_send_batched_emails at the same time, mutating the shared data simultaneously. Another less likely potential race condition was that maybe_send_batched_emails after sending out its email, can call ensure_timer(). If the consume function is run simultaneously in the main thread, it will call ensure_timer() too, which, given unfortunate timings, might lead to both calls setting a new Timer. We add locking to the queue to avoid such race conditions. Tested manually, by print debugging with the following setup: 1. Making handle_missedmessage_emails sleep 2 seconds for each email, and changed BATCH_DURATION to 1s to make the queue start working right after launching. 2. Putting a bunch of events in the queue. 3. ./manage.py process_queue --queue_name missedmessage_emails 4. Once maybe_send_batched_emails is called and while it's processing the events, I pushed more events to the queue. That triggers the consume() function and ensure_timer(). Before implementing the locking mechanism, this causes two threads to run maybe_send_batched_emails at the same time, mutating each other's shared data, causing a traceback such as Exception in thread Thread-3: Traceback (most recent call last): File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner self.run() File "/usr/lib/python3.6/threading.py", line 1182, in run self.function(self.args, *self.kwargs) File "/srv/zulip/zerver/worker/queue_processors.py", line 507, in maybe_send_batched_emails del self.events_by_recipient[user_profile_id] KeyError: '5' With the locking mechanism, things get handled as expected, and ensure_timer() exits if it can't obtain the lock due to maybe_send_batched_emails still working. Co-authored-by: Tim Abbott <tabbott@zulip.com>	2020-08-26 12:40:59 -07:00
Anders Kaseorg	61d0417e75	python: Replace ujson with orjson. Fixes #6507. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-08-11 10:55:12 -07:00
Anders Kaseorg	60a25b2721	docs: Fix spelling errors caught by codespell. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-08-11 10:23:06 -07:00
Alex Vandiver	2928bbc8bd	logging: Report stack_info on logging.exception calls. The exception trace only goes from where the exception was thrown up to where the `logging.exception` call is; any context as to where _that_ was called from is lost, unless `stack_info` is passed as well. Having the stack is particularly useful for Sentry exceptions, which gain the full stack trace. Add `stack_info=True` on all `logging.exception` calls with a non-trivial stack; we omit `wsgi.py`. Adjusts tests to match.	2020-08-11 10:16:54 -07:00
Mateusz Mandera	a7039c815e	queue_processors: Fix UnboundLocalError in QueueProcessingWorker. consume_time_seconds wasn't properly defined at the beginning, so when a BaseException that isn't a subclass of Exception is thrown, the finally: block could be entered with it still undefined.	2020-08-11 10:09:42 -07:00
Anders Kaseorg	8e6a439529	queue_processors: Fix strict_optional errors. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-07-06 11:25:48 -07:00
Tim Abbott	52676c0670	lint: Work around a pyflakes bug. Without this change, pyflakes reports this exception: pyflakes \| zerver/worker/queue_processors.py:152:9 local variable 'e' is assigned to but never used pyflakes \| zerver/worker/queue_processors.py:155:81 undefined name 'e'	2020-07-03 17:24:36 -07:00
Mateusz Mandera	d51afcf485	emails: Improve handling of timeouts when sending. We use the EMAIL_TIMEOUT django setting to timeout after 15s of trying to send an email. This will nicely lead to retries in the email_senders queue, due to the retry_send_email_failures decorator. smtlib documentation suggests that socket.timeout can be raised as the result of timing out, so in attempts I'm getting smtplib.SMTPServerDisconnected. Either way, seems appropriate to add socket.timeout to the exception that we catch.	2020-07-03 16:52:50 -07:00
Vishnu KS	0a36f04c20	i18n: Mark notification bot message in queue_processors for translation.	2020-06-26 14:57:18 -07:00
Mateusz Mandera	85d4536486	docs: Update some comments for the new release versioning scheme. With the new scheme, the equivalent of 2.3 is 4.0.	2020-06-25 10:33:03 -07:00
Anders Kaseorg	579f05f3ed	queue_processors: Avoid unchecked casts. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-22 17:18:19 -07:00
Tim Abbott	4d7550d705	views: Extract message_edit.py for message editing views. This is a pretty clean extraction of files that lets us shrink one of our largest files.	2020-06-22 15:08:34 -07:00
Mateusz Mandera	8d2d64c100	CVE-2020-14215: Fix validation in PreregistrationUser queries. The most import change here is the one in maybe_send_to_registration codepath, as the insufficient validation there could lead to fetching an expired PreregistrationUser that was invited as an administrator admin even years ago, leading to this registration ending up in the new user being a realm administrator. Combined with the buggy migration in 0198_preregistrationuser_invited_as.py, this led to users incorrectly joining as organizations administrators by accident. But even without that bug, this issue could have allowed a user who was invited as an administrator but then had that invitation expire and then joined via social authentication incorrectly join as an organization administrator. The second change is in ConfirmationEmailWorker, where this wasn't a security problem, but if the server was stopped for long enough, with some invites to send out email for in the queue, then after starting it up again, the queue worker would send out emails for invites that had already expired.	2020-06-16 23:35:39 -07:00
Anders Kaseorg	5dc9b55c43	python: Manually convert more percent-formatting to f-strings. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-14 23:27:22 -07:00
Anders Kaseorg	1ed2d9b4a0	logging: Use logging.exception and exc_info for unexpected exceptions. logging.exception() and logging.debug(exc_info=True), etc. automatically include a traceback. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-14 23:27:22 -07:00
Anders Kaseorg	a803e68528	email-mirror-postfix: Handle 8-bit messages correctly. Since JSON can’t represent bytes, we encode them with base64. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-14 20:24:06 -07:00
Anders Kaseorg	bff3dcadc8	email: Migrate to new Python ≥ 3.3 email API. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-14 20:24:06 -07:00
Anders Kaseorg	365fe0b3d5	python: Sort imports with isort. Fixes #2665. Regenerated by tabbott with `lint --fix` after a rebase and change in parameters. Note from tabbott: In a few cases, this converts technical debt in the form of unsorted imports into different technical debt in the form of our largest files having very long, ugly import sequences at the start. I expect this change will increase pressure for us to split those files, which isn't a bad thing. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-11 16:45:32 -07:00
Anders Kaseorg	69730a78cc	python: Use trailing commas consistently. Automatically generated by the following script, based on the output of lint with flake8-comma: import re import sys last_filename = None last_row = None lines = [] for msg in sys.stdin: m = re.match( r"\x1b\[35mflake8 \\|\x1b\[0m \x1b\[1;31m(.+):(\d+):(\d+): (\w+)", msg ) if m: filename, row_str, col_str, err = m.groups() row, col = int(row_str), int(col_str) if filename == last_filename: assert last_row != row else: if last_filename is not None: with open(last_filename, "w") as f: f.writelines(lines) with open(filename) as f: lines = f.readlines() last_filename = filename last_row = row line = lines[row - 1] if err in ["C812", "C815"]: lines[row - 1] = line[: col - 1] + "," + line[col - 1 :] elif err in ["C819"]: assert line[col - 2] == "," lines[row - 1] = line[: col - 2] + line[col - 1 :].lstrip(" ") if last_filename is not None: with open(last_filename, "w") as f: f.writelines(lines) Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2020-06-11 16:04:12 -07:00
Graham Bleaney	461d5b1a3e	pysa: Introduce sanitizers, models, and inline marking safe. This commit adds three `.pysa` model files: `false_positives.pysa` for ruling out false positive flows with `Sanitize` annotations, `req_lib.pysa` for educating pysa about Zulip's `REQ()` pattern for extracting user input, and `redirects.pysa` for capturing the risk of open redirects within Zulip code. Additionally, this commit introduces `mark_sanitized`, an identity function which can be used to selectively clear taint in cases where `Sanitize` models will not work. This commit also puts `mark_sanitized` to work removing known false postive flows.	2020-06-11 12:57:49 -07:00
Anders Kaseorg	67e7a3631d	python: Convert percent formatting to Python 3.6 f-strings. Generated by pyupgrade --py36-plus. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-10 15:02:09 -07:00
Anders Kaseorg	8dd83228e7	python: Convert "".format to Python 3.6 f-strings. Generated by pyupgrade --py36-plus --keep-percent-format, but with the NamedTuple changes reverted (see commit `ba7906a3c6`, #15132). Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-08 15:31:20 -07:00
Anders Kaseorg	19cc22e5ab	queue: Fix types to reflect that Pika channels transmit bytes. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-07 11:09:24 -07:00
Mateusz Mandera	200ce821a2	user_activity: Put client id instead of name in event dicts. This saves the completely unnecessary work of mapping the Client name to its ID. Because we had in-process caching of the immutable Client objects, this isn't a material performance win, but it will eventually let us delete that caching logic and have a simpler system.	2020-05-29 15:19:55 -07:00
Mateusz Mandera	e2262b0b64	queue_processors: Log time spent getting data for url in embed_links.	2020-05-21 12:13:46 -07:00
Mateusz Mandera	dd40649e04	queue_processors: Remove the slow_queries queue. While this functionality to post slow queries to a Zulip stream was very useful in the early days of Zulip, when there were only a few hundred accounts, it's long since been useless since (1) the total request volume on larger Zulip servers run by Zulip developers, and (2) other server operators don't want real-time notifications of slow backend queries. The right structure for this is just a log file. We get rid of the queue and replace it with a "zulip.slow_queries" logger, which will still log to /var/log/zulip/slow_queries.log for ease of access to this information and propagate to the other logging handlers. Reducing the amount of queues is good for lowering zulip's memory footprint and restart performance, since we run at least one dedicated queue worker process for each one in most configurations.	2020-05-11 00:45:13 -07:00
Anders Kaseorg	bdc365d0fe	logging: Pass format arguments to logging. https://docs.python.org/3/howto/logging.html#optimization Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-05-02 10:18:02 -07:00
Wyatt Hoodes	82e7ad8e25	data exports: Handle pending and failed exports. Prior to this change, there were reports of 500s in production due to `export.extra_data` being a Nonetype. This was reproducible using the s3 backend in development when a row was created in the `RealmAuditLog` table, but the export failed in the `DeferredWorker`. This left an entry lying about that was never updated with an `extra_data` field. To fix this, we catch any exceptions in the `DeferredWorker`, and then update `extra_data` to encode the failure. We also fix the fact that we never updated the export UI table with pending exports. These changes also negated the use for the somewhat hacky `clear_success_banner` logic.	2020-04-30 13:00:59 -07:00
Anders Kaseorg	fead14951c	python: Convert assignment type annotations to Python 3.6 style. This commit was split by tabbott; this piece covers the vast majority of files in Zulip, but excludes scripts/, tools/, and puppet/ to help ensure we at least show the right error messages for Xenial systems. We can likely further refine the remaining pieces with some testing. Generated by com2ann, with whitespace fixes and various manual fixes for runtime issues: - invoiced_through: Optional[LicenseLedger] = models.ForeignKey( + invoiced_through: Optional["LicenseLedger"] = models.ForeignKey( -_apns_client: Optional[APNsClient] = None +_apns_client: Optional["APNsClient"] = None - notifications_stream: Optional[Stream] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE) - signup_notifications_stream: Optional[Stream] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE) + notifications_stream: Optional["Stream"] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE) + signup_notifications_stream: Optional["Stream"] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE) - author: Optional[UserProfile] = models.ForeignKey('UserProfile', blank=True, null=True, on_delete=CASCADE) + author: Optional["UserProfile"] = models.ForeignKey('UserProfile', blank=True, null=True, on_delete=CASCADE) - bot_owner: Optional[UserProfile] = models.ForeignKey('self', null=True, on_delete=models.SET_NULL) + bot_owner: Optional["UserProfile"] = models.ForeignKey('self', null=True, on_delete=models.SET_NULL) - default_sending_stream: Optional[Stream] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE) - default_events_register_stream: Optional[Stream] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE) + default_sending_stream: Optional["Stream"] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE) + default_events_register_stream: Optional["Stream"] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE) -descriptors_by_handler_id: Dict[int, ClientDescriptor] = {} +descriptors_by_handler_id: Dict[int, "ClientDescriptor"] = {} -worker_classes: Dict[str, Type[QueueProcessingWorker]] = {} -queues: Dict[str, Dict[str, Type[QueueProcessingWorker]]] = {} +worker_classes: Dict[str, Type["QueueProcessingWorker"]] = {} +queues: Dict[str, Dict[str, Type["QueueProcessingWorker"]]] = {} -AUTH_LDAP_REVERSE_EMAIL_SEARCH: Optional[LDAPSearch] = None +AUTH_LDAP_REVERSE_EMAIL_SEARCH: Optional["LDAPSearch"] = None Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2020-04-22 11:02:32 -07:00
Mateusz Mandera	fe8f57b8b7	queue_processors: Write a newline char at the end of stats files.	2020-04-10 13:48:16 -07:00
Mateusz Mandera	5252b081bd	queue_processors: Gather statistics on queue worker operations.	2020-04-01 16:44:06 -07:00
arpit551	8f7733cb20	emails: Added placeholders strings in FormAddress. We've had a bug for a while that if any ScheduledEmail objects get created with the wrong email sender address, even after the sysadmin corrects the problem, they'll still get errors because of the objects stored with the wrong format. We solve this by using FromAddress placeholders strings in send_future_email function, so that ScheduledEmail objects end up setting the final `from_address` value when mail is actually sent using the setting in effect at that time. Fixes #11008.	2020-03-27 16:41:02 -07:00
Mateusz Mandera	5da2f80140	queue_processors: Extract a duplicated logic block into do_consume.	2020-03-22 18:45:46 -07:00
Tim Abbott	783a77c532	queue processors: Flush per-request caches after each item. Several of our queues are capable of doing work that includes rendering markdown (outgoing_webhook, embedded_bots, embed_links, and email_mirror). As a result, it's essential that these don't cache per-request data (specifically, realm filters) longer than they should, making editing/deleting linkifiers potentially use old settings until the relevant process was restarted. Flushing these caches is extremely cheap (just clearing two dictionaries) and thus is reasonable to do after every queue event, rather than trying to do it only the ~1/3 of queues that specifically do markdown processing. We do the same in our middleware for reset_queries. It's not worth writing a test for this because it's very difficult to create the test setup situation for this bug with a single test worker process; one needs to edit the linkifier configuration in a different process than the one sending the message in order to see the bug. This was a much larger visible bug on Zulip 2.1.x, where the presence of the message_sender queue meant that this would apply to messages sent via a browser. Fixes #14095.	2020-03-03 15:29:11 -08:00
Steve Howell	2e8dec233e	slow queries: Use internal_send_stream_message(). Note that while the test mocks the actual message send, we now have a `get_stream` call in the queue worker, so we have to set up a real stream for testing (or we could have mocked that as well, but it didn't seem necessary). The setup queries add to the amount of queries reported by the test, plus the `get_stream` call. I just made the query count a digits regex, which is a little bit lame, but I don't think it's worth risking test flakes for this.	2020-02-11 12:20:54 -08:00
Mateusz Mandera	4c5a8e6f0c	queue: Remove missedmessage_email_senders.	2020-01-31 12:13:51 -08:00
Tim Abbott	d70e799466	bots: Remove FEEDBACK_BOT implementation. This legacy cross-realm bot hasn't been used in several years, as far as I know. If we wanted to re-introduce it, I'd want to implement it as an embedded bot using those common APIs, rather than the totally custom hacky code used for it that involves unnecessary queue workers and similar details. Fixes #13533.	2020-01-25 22:41:39 -08:00
Anders Kaseorg	ea6934c26d	dependencies: Remove WebSockets system for sending messages. Zulip has had a small use of WebSockets (specifically, for the code path of sending messages, via the webapp only) since ~2013. We originally added this use of WebSockets in the hope that the latency benefits of doing so would allow us to avoid implementing a markdown local echo; they were not. Further, HTTP/2 may have eliminated the latency difference we hoped to exploit by using WebSockets in any case. While we’d originally imagined using WebSockets for other endpoints, there was never a good justification for moving more components to the WebSockets system. This WebSockets code path had a lot of downsides/complexity, including: * The messy hack involving constructing an emulated request object to hook into doing Django requests. * The `message_senders` queue processor system, which increases RAM needs and must be provisioned independently from the rest of the server). * A duplicate check_send_receive_time Nagios test specific to WebSockets. * The requirement for users to have their firewalls/NATs allow WebSocket connections, and a setting to disable them for networks where WebSockets don’t work. * Dependencies on the SockJS family of libraries, which has at times been poorly maintained, and periodically throws random JavaScript exceptions in our production environments without a deep enough traceback to effectively investigate. * A total of about 1600 lines of our code related to the feature. * Increased load on the Tornado system, especially around a Zulip server restart, and especially for large installations like zulipchat.com, resulting in extra delay before messages can be sent again. As detailed in https://github.com/zulip/zulip/pull/12862#issuecomment-536152397, it appears that removing WebSockets moderately increases the time it takes for the `send_message` API query to return from the server, but does not significantly change the time between when a message is sent and when it is received by clients. We don’t understand the reason for that change (suggesting the possibility of a measurement error), and even if it is a real change, we consider that potential small latency regression to be acceptable. If we later want WebSockets, we’ll likely want to just use Django Channels. Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2020-01-14 22:34:00 -08:00
Mateusz Mandera	89046ea1a9	email_mirror: Give extract_and_validate a more descriptive name.	2020-01-12 11:30:18 -08:00
Mateusz Mandera	c011d2c6d3	email_mirror: Migrate missed message addresses from redis to database. Addresses point 1 of #13533. MissedMessageEmailAddress objects get tied to the specific that was missed by the user. A useful benefit of that is that email message sent to that address will handle topic changes - if the message that was missed gets its topic changed, the email response will get posted under the new topic, while in the old model it would get posted under the old topic, which could potentially be confusing. Migrating redis data to this new model is a bit tricky, so the migration code has comments explaining some of the compromises made there, and test_migrations.py tests handling of the various possible cases that could arise.	2020-01-07 13:03:22 -08:00
Mateusz Mandera	e90866876c	queue: Take advantage of ABC for defining abstract worker base classes. QueueProcessingWorker and LoopQueueProcessingWorker are abstract classes meant to be subclassed by a class that will define its own consume() or consume_batch() method. ABCs are suited for that and we can tag consume/consume_batch with the @abstractmethod wrapper which will prevent subclasses that don't define these methods properly to be impossible to even instantiate (as opposed to only crashing once consume() is called). It's also nicely detected by mypy, which will throw errors such as this on invalid use: error: Only concrete class can be given where "Type[TestWorker]" is expected error: Cannot instantiate abstract class 'TestWorker' with abstract attribute 'consume' Due to it being detected by mypy, we can remove the test test_worker_noconsume which just tested the old version of this - raising an exception when the unimplemented consume() gets called. Now it can be handled already on the linter level.	2019-12-28 10:52:17 -08:00
Mateusz Mandera	a54640fc68	queue: Share exception handling code between loop and normal workers. LoopQueueProcessingWorker can handle exceptions inside consume_batch in a similar manner to how QueueProcessingWorker handles exceptions inside consume.	2019-12-28 10:47:36 -08:00
Tim Abbott	1465628c95	queue workers: Use self.queue_name in retry_event calls. This just adds a bit of robustness if we ever end up renaming queues.	2019-12-04 10:08:48 -08:00
Mateusz Mandera	7d0444f903	push_notifs: Improve handling of errors when talking to the bouncer. We use the plumbing introduced in a previous commit, to now raise PushNotificationBouncerRetryLaterError in send_to_push_bouncer in case of issues with talking to the bouncer server. That's a better way of dealing with the errors than the previous approach of returning a "failed" boolean, which generally wasn't checked in the code anyway and did nothing. The PushNotificationBouncerRetryLaterError exception will be nicely handled by queue processors to retry sending again, and due to being a JsonableError, it will also communicate the error to API users.	2019-12-04 09:58:22 -08:00
Mateusz Mandera	20b30e1503	push_notifs: Set up plumbing for retrying in case of bouncer error. We add PushNotificationBouncerRetryLaterError as an exception to signal an error occurred when trying to communicate with the bouncer and it should be retried. We use JsonableError as the base class, because this signal will need to work in two roles: 1. When the push notification was being issued by the queue worker PushNotificationsWorker, it will signal to the worker to requeue the event and try again later. 2. The exception will also possibly be raised (this will be added in the next commit) on codepaths coming from a request to an API endpoint (for example to add a token, to users/me/apns_device_token). In that case, it'll be needed to provide a good error to the API user - and basing this exception on JsonableError will allow that.	2019-12-04 09:58:22 -08:00
Tim Abbott	6407d0b1f9	push_notifications: Clear PushDeviceToken on API key change. This includes adding a new endpoint to the push notification bouncer interface, and code to call it appropriately after resetting a user's personal API key. When we add support for a user having multiple API keys, we may need to add an additional key here to support removing keys associated with just one client.	2019-11-19 15:37:43 -08:00
Tim Abbott	bb64b0fa4d	queue processors: Switch SignupWorker to logging user IDs. This is a better setup than logging emails, especially with EMAIL_ADDRESS_VISIBILITY_ADMINS.	2019-11-15 17:07:24 -08:00
Tim Abbott	d2970a56c2	lint: Remove some unused imports. These were introduced in `ae5bc92602`.	2019-10-10 18:06:30 -07:00
Vishnu KS	ae5bc92602	queue: Don't create confirmation objects twice during invite. A confirmation object is already created when do_send_confirmation_email is called just above. Tweaked by tabbott to remove an unnecessary somewhat hacky database query.	2019-10-10 16:19:42 -07:00
Tim Abbott	1c73ce2450	user_activity: Use LoopQueueProcessingWorker strategy. This should dramatically improve the queue processor's performance in cases where there's a very high volume of requests on a given endpoint by a given user, as described in the new docstring. Until we test this more broadly in production, we won't know if this is a full solution to the problem, but I think it's likely. We've never seen the UserActivityInterval worker end up backlogged without a total queue processor outage, and it should have a similar workload. Fixes #13180.	2019-09-21 11:48:24 -07:00
Tim Abbott	f0d8951035	do_update_user_activity: Refactor to support passing a count. We'll use this in upcoming commits.	2019-09-21 11:47:14 -07:00
Tim Abbott	5c960b3e0f	user_activity: Make the queue processor a bit more efficient. We don't actually need to go to the memcached (falling back to the database) to fetch either user or client objects on every event. For user objects, we actually can just pass through the user ID transparently; for client objects, we can use an in-process cache, since the mapping of string to ID never changes.	2019-09-21 11:47:14 -07:00
Rishi Gupta	e058558a52	emails: Send invitation reminder email two days before expiry. Hopefully this does a better job of spurring people to action, and also suggests a self-service fix if they don't (i.e. contacting the person that invited them).	2019-08-23 12:53:11 -07:00
Rishi Gupta	2d260031ed	emails: Use referrer.delivery_email in invitation emails.	2019-08-23 12:53:11 -07:00
Anders Kaseorg	a5596011a0	queue_processors, python_examples: Fix mypy errors. zerver/openapi/python_examples.py:105: error: Argument 1 to "get_user_presence" of "Client" has incompatible type "str"; expected "Dict[str, Any]" zerver/openapi/python_examples.py:563: error: Argument 1 to "add_reaction" of "Client" has incompatible type "Dict[str, object]"; expected "Dict[str, str]" zerver/openapi/python_examples.py:576: error: Argument 1 to "remove_reaction" of "Client" has incompatible type "Dict[str, object]"; expected "Dict[str, str]" zerver/worker/queue_processors.py:587: error: Argument "client" to "extract_query_without_mention" has incompatible type "EmbeddedBotHandler"; expected "ExternalBotHandler" These were only missed because mypy daemon mode requires us to set `follow_imports = skip` for the `zulip` package. Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2019-08-16 14:13:40 -07:00
Tim Abbott	1d63f129bf	data export: Add summary logging of runtime. This should help us investigate cases where this runs a very long time.	2019-08-12 18:21:08 -07:00
Wyatt Hoodes	7d178bbb0f	queue_processors: Clean up the extra_data dict code. We don't want to add a `deleted_timestamp` key until the export is actually deleted.	2019-08-12 17:51:46 -07:00
Wyatt Hoodes	22842dab34	events: Rename notify_export_completed. notify_realm_export is more reasonable for the context of doing deletion events as well.	2019-08-07 14:18:27 -07:00
Wyatt Hoodes	11db0c23fb	exports: Update extra_data field to a JSON structure. We add the `deleted_timestamp` key to the new `extra_data` dictionary.	2019-08-07 12:04:28 -07:00
neiljp (Neil Pilgrim)	accf4411f0	mypy: Remove type ignore on MissedMessageWorker.stop_timer.	2019-08-06 23:24:56 -07:00
Wyatt Hoodes	bbbea9ec87	events: Rewrite system for managing realm exports. This feature is intended to cover all of our ways of exporting a realm, not just the initial "public export" feature, so we should name things appropriately for that goal. Additionally, we don't want to include data exports in page_params; the original implementation was actually buggy and would have.	2019-07-26 16:38:52 -07:00
Wyatt Hoodes	d070f27359	queue_processors: Change the extra_data field to a relative url path. A better approach as compared to saving the full public url.	2019-07-26 15:50:02 -07:00
Wyatt Hoodes	5686821150	middleware: Change write_log_line to publish as a dict. We were seeing errors when pubishing typical events in the form of `Dict[str, Any]` as the expected type to be a `Union`. So we instead change the only non-dictionary call, to pass a dict instead of `str`.	2019-07-22 17:06:41 -07:00
Wyatt Hoodes	db69cdbcde	public_export: Add support for deleting export after access. The RealmAuditLog object ID was stored in the event sent to the deferred_work queue as a means to update the row's extra_data field. The extra_data field then stores the location of the export.	2019-05-31 22:54:27 -07:00
Wyatt Hoodes	c0ef6c2fc6	export: Add LOCAL_UPLOADS_DIR support to the export feature. A unique path was created using the `LOCAL_UPLOADS_DIR` backend, similar to the code used in `LocalUploadBackend`. The exported tarball was copied to the directory, and an nginx url was created to serve the file publicly. Tweaked by tabbott to output an actual URL.	2019-05-27 20:06:35 -07:00
Wyatt Hoodes	4dd8c133a9	export: Rename `--upload-to-s3` to be `--upload`. The upload option will no longer be limited to strictly S3 uploads. This commit serves as a preliminary step for supporting LOCAL_UPLOADS_DIR as part of the public only export feature.	2019-05-20 19:59:57 -07:00
Wyatt Hoodes	d4715f23d7	public_export: Add backend API endpoint for triggering export. An endpoint was created in zerver/views. Basic rate-limiting was implemented using RealmAuditLog. The idea here is to simply log each export event as a realm_exported event. The number of events occurring in the time delta is checked to ensure that the weekly limit is not exceeded. The event is published to the 'deferred_work' queue processor to prevent the export process from being killed after 60s. Upon completion of the export the realm admin(s) are notified.	2019-04-26 17:24:29 -07:00
Anders Kaseorg	643bd18b9f	lint: Fix code that evaded our lint checks for string % non-tuple. Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2019-04-23 15:21:37 -07:00
Mateusz Mandera	1901775383	email_mirror: Add realm-based rate limiting. Closes #2420 We add rate limiting (max X emails withing Y seconds per realm) to the email mirror. By creating RateLimitedRealmMirror class, inheriting from RateLimitedObject, and rate_limit_mirror_by_realm function, following a mechanism used by rate_limit_user, we're able to have this implementation mostly rely on the already existing, and proven over time, rate_limiter.py code. The rules are configurable in settings.py in RATE_LIMITING_MIRROR_REALM_RULES, analogically to RATE_LIMITING_RULES. Rate limit verification happens in the MirrorWorker in queue_processors.py. We don't rate limit missed message emails, as due to using one time addresses, they're not a spam threat. test_mirror_worker is adapted to the altered MirrorWorker code and a new test - test_mirror_worker_rate_limiting is added in test_queue_worker.py to provide coverage for these changes.	2019-03-18 11:16:58 -07:00
Tim Abbott	50dc317466	notifications: Rename notifications.py to email_notifications.py. This library is entirely about email notifications specifically, and this rename should help make the codebase more readable.	2019-03-15 11:02:17 -07:00
Greg Price	9869153ae8	push notif: Send a batch of message IDs in one `remove` payload. When a bunch of messages with active notifications are all read at once -- e.g. by the user choosing to mark all messages, or all in a stream, as read, or just scrolling quickly through a PM conversation -- there can be a large batch of this information to convey. Doing it in a single GCM/FCM message is better for server congestion, and for the device's battery. The corresponding client-side logic is in zulip/zulip-mobile#3343 . Existing clients today only understand one message ID at a time; so accommodate them by sending individual GCM/FCM messages up to an arbitrary threshold, with the rest only as a batch. Also add an explicit test for this logic. The existing tests that happen to cause this function to run don't exercise the last condition, so without a new test `--coverage` complains.	2019-02-26 16:41:54 -08:00
Anders Kaseorg	f0ecb93515	zerver core: Remove unused imports. Signed-off-by: Anders Kaseorg <andersk@mit.edu>	2019-02-02 17:41:24 -08:00
Pragati Agrawal	e1772b3b8f	tools: Upgrade Pycodestyle and fix new linter errors. Here, we are upgrading pycodestyle version from 2.4.0 to 2.5.0. Fixes: #11396.	2019-01-31 12:21:41 -08:00

1 2 3 4 5 ...

479 Commits