zulip

Commit Graph

Author	SHA1	Message	Date
Anders Kaseorg	ea6934c26d	dependencies: Remove WebSockets system for sending messages. Zulip has had a small use of WebSockets (specifically, for the code path of sending messages, via the webapp only) since ~2013. We originally added this use of WebSockets in the hope that the latency benefits of doing so would allow us to avoid implementing a markdown local echo; they did not. Further, HTTP/2 may have eliminated the latency difference we hoped to exploit by using WebSockets in any case. While we’d originally imagined using WebSockets for other endpoints, there was never a good justification for moving more components to the WebSockets system. This WebSockets code path had a lot of downsides/complexity, including: * The messy hack involving constructing an emulated request object to hook into doing Django requests. * The `message_senders` queue processor system, which increases RAM needs and must be provisioned independently from the rest of the server. * A duplicate check_send_receive_time Nagios test specific to WebSockets. * The requirement for users to have their firewalls/NATs allow WebSocket connections, and a setting to disable them for networks where WebSockets don’t work. * Dependencies on the SockJS family of libraries, which has at times been poorly maintained, and periodically throws random JavaScript exceptions in our production environments without a deep enough traceback to effectively investigate. * A total of about 1600 lines of our code related to the feature. * Increased load on the Tornado system, especially around a Zulip server restart, and especially for large installations like zulipchat.com, resulting in extra delay before messages can be sent again. As detailed in https://github.com/zulip/zulip/pull/12862#issuecomment-536152397, it appears that removing WebSockets moderately increases the time it takes for the `send_message` API query to return from the server, but does not significantly change the time between when a message is sent and when it is received by clients. We don’t understand the reason for that change (suggesting the possibility of a measurement error), and even if it is a real change, we consider that potential small latency regression to be acceptable. If we later want WebSockets, we’ll likely want to just use Django Channels. Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2020-01-14 22:34:00 -08:00
Mateusz Mandera	89046ea1a9	email_mirror: Give extract_and_validate a more descriptive name.	2020-01-12 11:30:18 -08:00
Mateusz Mandera	c011d2c6d3	email_mirror: Migrate missed message addresses from redis to database. Addresses point 1 of #13533. MissedMessageEmailAddress objects get tied to the specific message that was missed by the user. A useful benefit of that is that an email message sent to that address will handle topic changes - if the message that was missed gets its topic changed, the email response will get posted under the new topic, while in the old model it would get posted under the old topic, which could potentially be confusing. Migrating redis data to this new model is a bit tricky, so the migration code has comments explaining some of the compromises made there, and test_migrations.py tests handling of the various possible cases that could arise.	2020-01-07 13:03:22 -08:00
Mateusz Mandera	e90866876c	queue: Take advantage of ABC for defining abstract worker base classes. QueueProcessingWorker and LoopQueueProcessingWorker are abstract classes meant to be subclassed by a class that will define its own consume() or consume_batch() method. ABCs are suited for that and we can tag consume/consume_batch with the @abstractmethod decorator, which will make subclasses that don't define these methods properly impossible to even instantiate (as opposed to only crashing once consume() is called). It's also nicely detected by mypy, which will throw errors such as this on invalid use: error: Only concrete class can be given where "Type[TestWorker]" is expected error: Cannot instantiate abstract class 'TestWorker' with abstract attribute 'consume' Due to it being detected by mypy, we can remove the test test_worker_noconsume which just tested the old version of this - raising an exception when the unimplemented consume() gets called. Now it can be handled already on the linter level.	2019-12-28 10:52:17 -08:00
Mateusz Mandera	a54640fc68	queue: Share exception handling code between loop and normal workers. LoopQueueProcessingWorker can handle exceptions inside consume_batch in a similar manner to how QueueProcessingWorker handles exceptions inside consume.	2019-12-28 10:47:36 -08:00
Tim Abbott	1465628c95	queue workers: Use self.queue_name in retry_event calls. This just adds a bit of robustness if we ever end up renaming queues.	2019-12-04 10:08:48 -08:00
Mateusz Mandera	7d0444f903	push_notifs: Improve handling of errors when talking to the bouncer. We use the plumbing introduced in a previous commit, to now raise PushNotificationBouncerRetryLaterError in send_to_push_bouncer in case of issues with talking to the bouncer server. That's a better way of dealing with the errors than the previous approach of returning a "failed" boolean, which generally wasn't checked in the code anyway and did nothing. The PushNotificationBouncerRetryLaterError exception will be nicely handled by queue processors to retry sending again, and due to being a JsonableError, it will also communicate the error to API users.	2019-12-04 09:58:22 -08:00
Mateusz Mandera	20b30e1503	push_notifs: Set up plumbing for retrying in case of bouncer error. We add PushNotificationBouncerRetryLaterError as an exception to signal an error occurred when trying to communicate with the bouncer and it should be retried. We use JsonableError as the base class, because this signal will need to work in two roles: 1. When the push notification was being issued by the queue worker PushNotificationsWorker, it will signal to the worker to requeue the event and try again later. 2. The exception will also possibly be raised (this will be added in the next commit) on codepaths coming from a request to an API endpoint (for example to add a token, to users/me/apns_device_token). In that case, it'll be needed to provide a good error to the API user - and basing this exception on JsonableError will allow that.	2019-12-04 09:58:22 -08:00
Tim Abbott	6407d0b1f9	push_notifications: Clear PushDeviceToken on API key change. This includes adding a new endpoint to the push notification bouncer interface, and code to call it appropriately after resetting a user's personal API key. When we add support for a user having multiple API keys, we may need to add an additional key here to support removing keys associated with just one client.	2019-11-19 15:37:43 -08:00
Tim Abbott	bb64b0fa4d	queue processors: Switch SignupWorker to logging user IDs. This is a better setup than logging emails, especially with EMAIL_ADDRESS_VISIBILITY_ADMINS.	2019-11-15 17:07:24 -08:00
Tim Abbott	d2970a56c2	lint: Remove some unused imports. These were introduced in `ae5bc92602`.	2019-10-10 18:06:30 -07:00
Vishnu KS	ae5bc92602	queue: Don't create confirmation objects twice during invite. A confirmation object is already created when do_send_confirmation_email is called just above. Tweaked by tabbott to remove an unnecessary somewhat hacky database query.	2019-10-10 16:19:42 -07:00
Tim Abbott	1c73ce2450	user_activity: Use LoopQueueProcessingWorker strategy. This should dramatically improve the queue processor's performance in cases where there's a very high volume of requests on a given endpoint by a given user, as described in the new docstring. Until we test this more broadly in production, we won't know if this is a full solution to the problem, but I think it's likely. We've never seen the UserActivityInterval worker end up backlogged without a total queue processor outage, and it should have a similar workload. Fixes #13180.	2019-09-21 11:48:24 -07:00
Tim Abbott	f0d8951035	do_update_user_activity: Refactor to support passing a count. We'll use this in upcoming commits.	2019-09-21 11:47:14 -07:00
Tim Abbott	5c960b3e0f	user_activity: Make the queue processor a bit more efficient. We don't actually need to go to the memcached (falling back to the database) to fetch either user or client objects on every event. For user objects, we actually can just pass through the user ID transparently; for client objects, we can use an in-process cache, since the mapping of string to ID never changes.	2019-09-21 11:47:14 -07:00
Rishi Gupta	e058558a52	emails: Send invitation reminder email two days before expiry. Hopefully this does a better job of spurring people to action, and also suggests a self-service fix if they don't (i.e. contacting the person that invited them).	2019-08-23 12:53:11 -07:00
Rishi Gupta	2d260031ed	emails: Use referrer.delivery_email in invitation emails.	2019-08-23 12:53:11 -07:00
Anders Kaseorg	a5596011a0	queue_processors, python_examples: Fix mypy errors. zerver/openapi/python_examples.py:105: error: Argument 1 to "get_user_presence" of "Client" has incompatible type "str"; expected "Dict[str, Any]" zerver/openapi/python_examples.py:563: error: Argument 1 to "add_reaction" of "Client" has incompatible type "Dict[str, object]"; expected "Dict[str, str]" zerver/openapi/python_examples.py:576: error: Argument 1 to "remove_reaction" of "Client" has incompatible type "Dict[str, object]"; expected "Dict[str, str]" zerver/worker/queue_processors.py:587: error: Argument "client" to "extract_query_without_mention" has incompatible type "EmbeddedBotHandler"; expected "ExternalBotHandler" These were only missed because mypy daemon mode requires us to set `follow_imports = skip` for the `zulip` package. Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2019-08-16 14:13:40 -07:00
Tim Abbott	1d63f129bf	data export: Add summary logging of runtime. This should help us investigate cases where this runs a very long time.	2019-08-12 18:21:08 -07:00
Wyatt Hoodes	7d178bbb0f	queue_processors: Clean up the extra_data dict code. We don't want to add a `deleted_timestamp` key until the export is actually deleted.	2019-08-12 17:51:46 -07:00
Wyatt Hoodes	22842dab34	events: Rename notify_export_completed. notify_realm_export is more reasonable for the context of doing deletion events as well.	2019-08-07 14:18:27 -07:00
Wyatt Hoodes	11db0c23fb	exports: Update extra_data field to a JSON structure. We add the `deleted_timestamp` key to the new `extra_data` dictionary.	2019-08-07 12:04:28 -07:00
neiljp (Neil Pilgrim)	accf4411f0	mypy: Remove type ignore on MissedMessageWorker.stop_timer.	2019-08-06 23:24:56 -07:00
Wyatt Hoodes	bbbea9ec87	events: Rewrite system for managing realm exports. This feature is intended to cover all of our ways of exporting a realm, not just the initial "public export" feature, so we should name things appropriately for that goal. Additionally, we don't want to include data exports in page_params; the original implementation was actually buggy and would have.	2019-07-26 16:38:52 -07:00
Wyatt Hoodes	d070f27359	queue_processors: Change the extra_data field to a relative url path. A better approach as compared to saving the full public url.	2019-07-26 15:50:02 -07:00
Wyatt Hoodes	5686821150	middleware: Change write_log_line to publish as a dict. We were seeing errors when publishing typical events in the form of `Dict[str, Any]` because the expected type was a `Union`. So we instead change the only non-dictionary call to pass a dict instead of `str`.	2019-07-22 17:06:41 -07:00
Wyatt Hoodes	db69cdbcde	public_export: Add support for deleting export after access. The RealmAuditLog object ID was stored in the event sent to the deferred_work queue as a means to update the row's extra_data field. The extra_data field then stores the location of the export.	2019-05-31 22:54:27 -07:00
Wyatt Hoodes	c0ef6c2fc6	export: Add LOCAL_UPLOADS_DIR support to the export feature. A unique path was created using the `LOCAL_UPLOADS_DIR` backend, similar to the code used in `LocalUploadBackend`. The exported tarball was copied to the directory, and an nginx url was created to serve the file publicly. Tweaked by tabbott to output an actual URL.	2019-05-27 20:06:35 -07:00
Wyatt Hoodes	4dd8c133a9	export: Rename `--upload-to-s3` to be `--upload`. The upload option will no longer be limited to strictly S3 uploads. This commit serves as a preliminary step for supporting LOCAL_UPLOADS_DIR as part of the public only export feature.	2019-05-20 19:59:57 -07:00
Wyatt Hoodes	d4715f23d7	public_export: Add backend API endpoint for triggering export. An endpoint was created in zerver/views. Basic rate-limiting was implemented using RealmAuditLog. The idea here is to simply log each export event as a realm_exported event. The number of events occurring in the time delta is checked to ensure that the weekly limit is not exceeded. The event is published to the 'deferred_work' queue processor to prevent the export process from being killed after 60s. Upon completion of the export the realm admin(s) are notified.	2019-04-26 17:24:29 -07:00
Anders Kaseorg	643bd18b9f	lint: Fix code that evaded our lint checks for string % non-tuple. Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2019-04-23 15:21:37 -07:00
Mateusz Mandera	1901775383	email_mirror: Add realm-based rate limiting. Closes #2420. We add rate limiting (max X emails within Y seconds per realm) to the email mirror. By creating RateLimitedRealmMirror class, inheriting from RateLimitedObject, and rate_limit_mirror_by_realm function, following a mechanism used by rate_limit_user, we're able to have this implementation mostly rely on the already existing, and proven over time, rate_limiter.py code. The rules are configurable in settings.py in RATE_LIMITING_MIRROR_REALM_RULES, analogously to RATE_LIMITING_RULES. Rate limit verification happens in the MirrorWorker in queue_processors.py. We don't rate limit missed message emails, as due to using one time addresses, they're not a spam threat. test_mirror_worker is adapted to the altered MirrorWorker code and a new test - test_mirror_worker_rate_limiting is added in test_queue_worker.py to provide coverage for these changes.	2019-03-18 11:16:58 -07:00
Tim Abbott	50dc317466	notifications: Rename notifications.py to email_notifications.py. This library is entirely about email notifications specifically, and this rename should help make the codebase more readable.	2019-03-15 11:02:17 -07:00
Greg Price	9869153ae8	push notif: Send a batch of message IDs in one `remove` payload. When a bunch of messages with active notifications are all read at once -- e.g. by the user choosing to mark all messages, or all in a stream, as read, or just scrolling quickly through a PM conversation -- there can be a large batch of this information to convey. Doing it in a single GCM/FCM message is better for server congestion, and for the device's battery. The corresponding client-side logic is in zulip/zulip-mobile#3343 . Existing clients today only understand one message ID at a time; so accommodate them by sending individual GCM/FCM messages up to an arbitrary threshold, with the rest only as a batch. Also add an explicit test for this logic. The existing tests that happen to cause this function to run don't exercise the last condition, so without a new test `--coverage` complains.	2019-02-26 16:41:54 -08:00
Anders Kaseorg	f0ecb93515	zerver core: Remove unused imports. Signed-off-by: Anders Kaseorg <andersk@mit.edu>	2019-02-02 17:41:24 -08:00
Pragati Agrawal	e1772b3b8f	tools: Upgrade Pycodestyle and fix new linter errors. Here, we are upgrading pycodestyle version from 2.4.0 to 2.5.0. Fixes: #11396.	2019-01-31 12:21:41 -08:00
Raymond Akornor	254bf4c08f	send_email: Add support for passing language into send_future_email. This adds a language parameter to send_future_email. As a result, this properly internationalizes invitation reminder emails, by passing the correct language into send_future_email. Fixes #11240.	2019-01-09 17:47:58 -08:00
Tim Abbott	02a79b677b	send_email: Extract handle_email_format_changes and use. Apparently, we have a second code path where we might try to call send_email library functions on old data, namely in the queue_processors codebase. So we apply the same migration logic here.	2018-12-04 16:08:18 -08:00
Raymond Akornor	92dc3637df	send_email: Add support for multiple recipients. This adds a function that sends provided email to all administrators of a realm, but in a single email. As a result, send_email now takes arguments to_user_ids and to_emails instead of to_user_id and to_email. We adjust other APIs to match, but note that send_future_email does not yet support the multiple recipients model for good reasons. Tweaked by tabbott to modify `manage.py deliver_email` to handle backwards-compatibility for any ScheduledEmail objects already in the database. Fixes #10896.	2018-12-03 15:12:11 -08:00
Tim Abbott	adf27aae4c	python: Remove now-unnecessary str_utils library. This library was absolutely essential as part of our Python 2->3 migration process, but all of its calls should be either no-ops or encode/decode operations. Note also that the library has been wrong since the incorrect refactoring in `1f9244e060`. Fixes #10807.	2018-11-27 11:57:54 -08:00
Tim Abbott	e06668c7e8	queue_processors: Fix misleading copied comment. This comment was clearly copied from the previous processor.	2018-11-27 11:44:09 -08:00
Tim Abbott	38a6003472	push notifications: Improve logging for missing configuration. While it could make sense to print these logging statements at WARN level on server startup, it doesn't make sense to do so on every message (though it perhaps did make sense to do so before more recent changes added good ways to discover you forgot to configure push notifications). Instead, we now just do a WARN log on queue processor startup, and then at DEBUG level for individual messages. Fixes #10894.	2018-11-27 09:37:57 -08:00
Tim Abbott	48810f43be	queue_processors: Remove unnecessary spammy logging output. This logging statement was incorrectly not removed before merging `5cec566cb9`.	2018-10-31 16:31:35 -07:00
Tim Abbott	5cec566cb9	queue_processors: Rewrite MissedMessageWorker to always wait. Previously, MissedMessageWorker used a batching strategy of just grabbing all the events from the last 2 minutes, and then sending them off as emails. This suffered from the problem that you had a random time, between 0s and 120s, to edit your message before it would be sent out via an email. Additionally, this made the queue hard to monitor, because it was expected to pile up large numbers of events, even if everything was fine. We fix this by batching together the events using a timer; the queue processor itself just tracks the items, and then a timer-handler process takes care of ensuring that the emails get sent at least 120s (and at most 130s) after the first triggering message was sent in Zulip. This introduces a new unpleasant bug, namely that when we restart a Zulip server, we can now lose some missed_message email events; further work is required on this point. Fixes #6839.	2018-10-24 14:43:36 -07:00
Tim Abbott	9ed3fe3596	events: Improve logging for batched missed-message email handler.	2018-10-24 11:21:51 -07:00
Steve Howell	69ee84bb14	refactor: Extract build_bot_request(). This fixes a couple things: * process_event() is a pretty vague name * returning tuples should generally be avoided * we were producing the same REST parameters in both subclasses * relative_url_path was always blank * request_kwargs was always empty Now process_event() is called build_bot_request(), and it only returns request data, not a tuple of `rest_operation` and `request_data`. By no longer returning `rest_operation`, there are fewer moving parts. We just have `do_rest_call` make a POST call.	2018-10-11 16:12:07 -07:00
Steve Howell	16eff75e49	refactor: Simplify how we use base_url. Before this change, we instantiated base_url into a superclass of subclasses that returned base_url into a dictionary that gets returned to our caller. Now we just pull base_url out of service when we need to make the REST call.	2018-10-11 16:12:07 -07:00
Tim Abbott	165078b484	queue_processors: Fix bug in handling removed push notifications. Apparently, we were falling through to the "add" case after correctly processing the "remove" case, throwing a 500.	2018-09-20 17:36:54 -07:00
Tim Abbott	da8f4bc0e9	push notifications: Add support for removing GCM push notifications. This uses the recently introduced active_mobile_push_notification flag; messages that have had a mobile push notification sent will have a removal push notification sent as soon as they are marked as read. Note that this feature is behind a setting, SEND_REMOVE_PUSH_NOTIFICATIONS, since the notification format is not supported by the mobile apps yet, and we want to give a grace period before we start sending notifications that appear as (null) to clients. But the tracking logic to maintain the set of message IDs with an active push notification runs unconditionally. This is designed with at-least-once semantics; so mobile clients need to handle the possibility that they receive duplicate requests to remove a push notification. We reuse the existing missedmessage_mobile_notifications queue processor for the work, to avoid materially impacting the latency of marking messages as read. Fixes #7459, though we'll need to open a follow-up issue for using these data on iOS.	2018-08-10 13:58:39 -07:00
Rhea Parekh	cf60b8821d	outgoing webhooks: Warn user that PMs are not supported in Slack-format webhook. Private messages are not supported in Slack-format webhook. Instead of raising a NotImplementedError, we warn the user that PM service is not supported by sending a message to the user. Added tests for the same. Fixes #9239	2018-08-09 17:44:26 -07:00
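
Commit e90866876c in the log above describes marking consume() with @abstractmethod so that a worker subclass lacking it cannot be instantiated at all. The following is a minimal sketch of that pattern, not Zulip's actual code: the class bodies are simplified stand-ins, and IncompleteWorker, EchoWorker and try_instantiate are hypothetical names introduced only for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Type


class QueueProcessingWorker(ABC):
    """Simplified stand-in for an abstract worker base class (assumption)."""

    @abstractmethod
    def consume(self, event: Dict[str, Any]) -> None:
        """Subclasses must override this to handle a single event."""


class IncompleteWorker(QueueProcessingWorker):
    """Hypothetical subclass that forgets to define consume()."""


class EchoWorker(QueueProcessingWorker):
    """Hypothetical concrete subclass that records every event it consumes."""

    def __init__(self) -> None:
        self.seen: List[Dict[str, Any]] = []

    def consume(self, event: Dict[str, Any]) -> None:
        self.seen.append(event)


def try_instantiate(cls: Type[QueueProcessingWorker]) -> str:
    """Report whether cls can be instantiated with no arguments."""
    try:
        cls()
    except TypeError:
        # ABCs refuse to instantiate a class that still has abstract methods.
        return "TypeError"
    return "ok"


if __name__ == "__main__":
    print(try_instantiate(IncompleteWorker))
    print(try_instantiate(EchoWorker))
```

The failure happens at construction time, before any event is consumed, which is the difference from the older approach of raising only once the unimplemented consume() is called.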

265 Commits