diff --git a/docs/architecture-overview.md b/docs/architecture-overview.md
index e53f1bfc58..73495cefe1 100644
--- a/docs/architecture-overview.md
+++ b/docs/architecture-overview.md
@@ -87,10 +87,12 @@ Components
 
 ![architecture-simple](images/architecture_simple.png)
 
-### Tornado and Django
+### Django and Tornado
 
-We use both the [Tornado](http://www.tornadoweb.org) and
-[Django](https://www.djangoproject.com/) Python web frameworks.
+Zulip is primarily implemented in the
+[Django](https://www.djangoproject.com/) Python web framework. We
+also make use of [Tornado](http://www.tornadoweb.org) for the
+real-time push system.
 
 Django is the main web application server; Tornado runs the
 server-to-client real-time push system. The app servers are configured
@@ -114,6 +116,10 @@ click on something) are processed by the Django application server.
 One exception to this is that Zulip uses websockets through Tornado
 to minimize latency on the code path for **sending** messages.
 
+There is detailed documentation on the
+[real-time push and event queue system](events-system.html); most of
+the code is in `zerver/tornado`.
+
 ### nginx
 
 nginx is the front-end web server to all Zulip traffic; it serves static
@@ -198,12 +204,10 @@ and the Tornado push system.
 
 Two simple wrappers around `pika` (the Python RabbitMQ client) are in
 `zulip/zerver/lib/queue.py`. There's an asynchronous client for use in
-Tornado and a more general client for use elsewhere.
-
-`zerver/tornado/event_queue.py` has helper functions for putting
-events into one queue or another. Most of the processes started by
-Supervisor are queue processors that continually pull things out of a
-RabbitMQ queue and handle them.
+Tornado and a more general client for use elsewhere. Most of the
+processes started by Supervisor are queue processors that continually
+pull things out of a RabbitMQ queue and handle them; they are defined
+in `zerver/worker/queue_processors.py`.
 
 Also see [the queuing guide](queuing.html).
 
diff --git a/docs/events-system.md b/docs/events-system.md
new file mode 100644
index 0000000000..919206f879
--- /dev/null
+++ b/docs/events-system.md
@@ -0,0 +1,245 @@
+# Real-time Push and Events
+
+Zulip's "events system" is the server-to-client push system that
+powers our real-time sync. This document explains how it works; to
+see an example of a complete feature built on this system, check out
+the [new application feature tutorial](new-feature-tutorial.html).
+
+Any single-page web application like Zulip needs a story for how
+changes made by one client are synced to other clients, and a good
+architecture for this is particularly important for a chat tool like
+Zulip, since the state is constantly changing. When we talk about
+clients, think of a browser tab, mobile app, or API bot that needs to
+receive updates to the Zulip data. The simplest example is a new
+message being sent by one client; other clients must be notified in
+order to display the message. But a complete application like Zulip
+has dozens of different types of data that need to be synced to other
+clients, whether it be new streams, changes to a user's name or
+avatar, settings changes, etc. In Zulip, we call these updates that
+need to be sent to other clients **events**.
+
+An important thing to understand when designing such a system is that
+events need to be synced to every client that has a copy of the old
+data if one wants to avoid clients displaying inaccurate data to
+users. So if a user has two browser windows open and sends a message,
+every client controlled by that user, as well as any recipients of
+the message, including both of those browser windows, will receive
+that event. (Technically, we don't need to send events to the client
+that triggered the change, but doing so avoids a lot of duplicate UI
+update code, since the client making the change can just use the same
+code as every other client, perhaps plus a small notification that
+the operation succeeded.)
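+
+To make the term concrete, here is a sketch of roughly what an event
+looks like on the wire. The field values are hypothetical, and each
+event type defines its own schema:
+
+```python
+# A sketch of an event as a client might receive it when another
+# user changes their name. The exact fields vary by event type;
+# "id" is the per-queue event ID discussed in the delivery section
+# below.
+event = {
+    "id": 4,
+    "type": "realm_user",
+    "op": "update",
+    "person": {
+        "user_id": 13,
+        "full_name": "New Full Name",
+    },
+}
+```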
+
+Architecturally, there are a few things needed to make a successful
+real-time sync system work:
+
+* **Generation**. Generating events when changes happen to data, and
+  determining which users should receive each event.
+* **Delivery**. Efficiently delivering those events to interested
+  clients, ideally in an exactly-once fashion.
+* **UI updates**. Updating the UI in the client once it has received
+  events from the server.
+
+Reactive JavaScript libraries like React and Vue can help simplify the
+last piece, but there aren't good standard systems for doing
+generation and delivery, so we have to build them ourselves.
+
+This document discusses how Zulip solves the generation and delivery
+problems in a scalable, correct, and predictable way.
+
+## Generation system
+
+Zulip's generation system is built around a Python function,
+`send_event(event, users)`. It accepts an event data structure (just
+a Python dictionary with some keys and values; `type` is always one
+of the keys, but the rest depends on the specific event) and a list
+of user IDs for the users whose clients should receive the event. In
+special cases such as message delivery, the list of users will
+instead be a list of dicts mapping user IDs to user-specific data
+like whether that user was mentioned in that message. The data
+passed to `send_event` is simply marshalled as JSON and placed in the
+`notify_tornado` RabbitMQ queue to be consumed by the delivery
+system.
+
+Usually, this list of users is one of three things:
+
+* A single user (e.g. for user-level settings changes).
+* Everyone in the realm (e.g. for organization-level settings
+  changes, like new realm emoji).
+* Everyone who would receive a given message (for messages, emoji
+  reactions, message editing, etc.); i.e. the subscribers to a stream
+  or the people on a private message thread.
+
+It is the responsibility of the caller of `send_event` to choose the
+list of user IDs correctly. There can be security problems if e.g.
+an event containing private message content is sent to the entire
+organization. Conversely, if an event isn't sent to enough clients,
+there will likely be user-visible real-time sync bugs.
+
+Most of the hard work in event generation is in defining consistent
+event dictionaries that are clear, readable, useful to the wide
+range of possible clients, and easy for developers to work with; a
+typical caller looks something like the sketch below.
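+
+For illustration, here is a minimal sketch of a caller of
+`send_event`. The event contents and the `avatar_url` and
+`active_user_ids` helpers are illustrative assumptions, not the
+exact production code:
+
+```python
+# Sketch of generating an event after a successful change to a
+# user's avatar; details are illustrative, not Zulip's exact code.
+def do_change_avatar(user_profile):
+    # ... the database/cache updates for the new avatar happen
+    # first ...
+    event = {
+        "type": "realm_user",
+        "op": "update",
+        "person": {"user_id": user_profile.id,
+                   "avatar_url": avatar_url(user_profile)},
+    }
+    # Anyone in the realm can see the user's avatar, so every
+    # active user's clients should receive this event.
+    send_event(event, active_user_ids(user_profile.realm_id))
+```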
+
+## Delivery system
+
+Zulip's event delivery (real-time push) system is based on Tornado,
+which is ideal for handling a large number of open requests. Details
+on Tornado are available in the
+[architecture overview](architecture-overview.html), but in short it
+is good at holding open a large number of connections for a long
+time. The complete system is about 1500 lines of code in
+`zerver/tornado/`, primarily `zerver/tornado/event_queue.py`.
+
+Zulip's event delivery system is based on "long-polling"; basically,
+clients make `GET /json/events` calls to the server, and the server
+doesn't respond to the request until it has an event to deliver to
+the client. This approach is reasonably efficient and works
+everywhere (unlike websockets, which have a decreasing but nonzero
+level of client compatibility problems).
+
+For each connected client, the **Event Queue Server** maintains an
+**event queue**, which contains any events that are to be delivered
+to that client and have not yet been acknowledged by that client.
+Ignoring the subtle details around error handling, the protocol is
+pretty simple: when a client does a `GET /json/events` call, the
+server checks if there are any events in the queue. If there are, it
+returns the events immediately. If there aren't, it records that
+queue as having a waiting client (often called a `handler` in the
+code).
+
+When the queue server pulls an event off the `notify_tornado`
+RabbitMQ queue, it delivers the event to each queue associated with
+one of the target users. If a queue has a waiting client, it breaks
+the long-poll connection by returning an HTTP response to the waiting
+client request. If there is no waiting client, it simply pushes the
+event onto the queue.
+
+When starting up, each client makes a `POST /json/register` request
+to the server, which creates a new event queue for that client and
+returns the `queue_id` as well as an initial `last_event_id` to the
+client (it can also, optionally, fetch the initial data in the same
+request to save a round trip and avoid races; see the section on
+initial data fetches below for details on why this is useful). Once
+the event queue is registered, the client can just run an infinite
+loop calling `GET /json/events` with those parameters, updating
+`last_event_id` each time to acknowledge any events it has received
+(see `call_on_each_event` in the
+[Zulip Python API bindings][api-bindings-code] for a complete example
+implementation). When handling each `GET /json/events` request, the
+queue server can safely delete any events that have an event ID less
+than or equal to the client's `last_event_id` (event IDs are just a
+counter for the events a given queue has received).
+
+If network failures were impossible, the `last_event_id` parameter in
+the protocol would not be required, but it is important for enabling
+exactly-once delivery in the presence of potential failures.
+(Without it, the queue server would have to delete events from the
+queue as soon as it attempted to send them to the client; if that
+specific HTTP response didn't reach the client due to a TCP failure,
+those events could be lost.)
+
+[api-bindings-code]: https://github.com/zulip/zulip/blob/master/api/zulip/__init__.py
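+
+Put together, a client's event loop looks roughly like the following
+sketch, a simplified version of what `call_on_each_event` does;
+error handling, authentication details, and retries are omitted:
+
+```python
+import requests  # illustrative; the real bindings manage sessions, retries, etc.
+
+def run_event_loop(server, auth, callback):
+    # Register a new event queue and remember where we left off.
+    resp = requests.post(server + "/json/register", auth=auth).json()
+    queue_id = resp["queue_id"]
+    last_event_id = resp["last_event_id"]
+
+    while True:
+        # Long-poll: this request blocks until events are available.
+        resp = requests.get(server + "/json/events", auth=auth,
+                            params={"queue_id": queue_id,
+                                    "last_event_id": last_event_id}).json()
+        for event in resp["events"]:
+            # Acknowledging the event ID lets the server delete it.
+            last_event_id = max(last_event_id, event["id"])
+            callback(event)
+```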
+
+The queue servers are a very high-traffic system, processing at a
+minimum a request for every message delivered to every Zulip client.
+Additionally, as a workaround for low-quality NAT servers that kill
+HTTP connections that have been open without activity for more than
+60s, the queue servers also send a heartbeat event to each queue at
+least once every 45s or so (if no other events have arrived in the
+meantime).
+
+To avoid leaking memory and other resources, event queues are
+garbage-collected after (by default) 10 minutes of inactivity from a
+client, on the theory that the client has likely lost Internet
+access or no longer exists; this happens constantly. If the client
+returns, it will receive a "queue not found" error when requesting
+events; its handler for this case should just restart the client /
+reload the browser so that it refetches initial data the same way it
+would on startup. Since clients have to implement their startup
+process anyway, this approach adds minimal technical complexity to
+clients. A nice side effect is that if the Event Queue Server (which
+stores queues in memory) were to crash and lose its data, clients
+would recover, just as if they had briefly lost Internet access
+(there is some DoS risk to manage, though).
+
+(The Event Queue Server is designed to save any event queues to disk
+and reload them when the server is restarted, and catches exceptions
+carefully, so such incidents are very rare, but it's nice to have a
+design that handles them without leaving broken out-of-date clients
+anyway.)
+
+## The initial data fetch
+
+When a client starts up, it usually wants to get two things from the
+server:
+
+* The "current state" of various pieces of data, e.g. the current
+  settings, set of users in the organization (for typeahead),
+  streams, messages, etc. (aka the "initial state").
+* A subscription to receive updates to those data when they are
+  changed by a client (aka an event queue).
+
+Ideally, one would get those two things atomically, i.e. if some
+other user changes their name, either the name change happens before
+the fetch (and thus the old name is in the initial state and there
+will be an event in the queue for the name change) or after (the new
+name is in the initial state, and there is no event for that name
+change in the queue).
+
+Achieving this atomicity goal saves a huge amount of work: the N
+different Zulip clients don't need to worry about a wide range of
+rare and hard-to-reproduce race conditions; we just have to
+implement things correctly once in the Zulip server.
+
+This is quite challenging to do technically, because fetching the
+initial state for a complex web application like Zulip might involve
+dozens of queries to the database, caches, etc. over the course of
+100ms or more, and it is thus nearly impossible to do all of those
+things together atomically. So instead, we use a more complicated
+algorithm that can produce the atomic result from non-atomic
+subroutines. Here's how it works when you make a `register` API
+request, which is handled directly by Django:
+
+* Django makes an HTTP request to Tornado, requesting that a new
+  event queue be created, and records its queue ID.
+* Django does all the various database/cache/etc. queries to fetch
+  the data, non-atomically, from the various data sources (see the
+  `fetch_initial_state_data` function).
+* Django makes a second HTTP request to Tornado, requesting any
+  events that had been added to the Tornado event queue since it was
+  created.
+* Finally, Django "applies" the events (see the `apply_events`
+  function) to the initial state that it fetched. E.g. for a name
+  change event, it finds the user's data in the `realm_user` data
+  structure, and updates it to have the new name.
+
+This achieves everything we desire, at the cost that we need to
+write the `apply_events` function. That is a difficult function to
+implement correctly, because the situations it handles are race
+conditions that almost never happen in practice. So we have a
+special test class, `EventsRegisterTest`, that is specifically
+designed to test this function by ensuring the possible race
+condition always happens. In particular, it does the following:
+
+* Call `fetch_initial_state_data` to get the current state.
+* Call a state change function that issues an event,
+  e.g. `do_change_full_name`, and capture any events that are
+  generated.
+* Call `apply_events(state, events)`, and compare the result with
+  the output of calling `fetch_initial_state_data` again afterwards.
+
+The `apply_events` code is correct if those two results are
+identical.
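+
+In condensed form, the heart of that test pattern looks something
+like the sketch below. The real `EventsRegisterTest` has more
+machinery, and `capture_events` and the simplified function
+signatures here are illustrative assumptions:
+
+```python
+# Sketch of the EventsRegisterTest strategy; helper names and
+# signatures are simplified relative to the real test suite.
+def do_test(self, user_profile, action):
+    # Fetch the current state, as /json/register would.
+    state = fetch_initial_state_data(user_profile)
+
+    # Perform the change (e.g. do_change_full_name), capturing the
+    # events it generates; this makes the race condition that is
+    # rare in production happen every time.
+    events = capture_events(action)
+
+    # Applying those events to the old state must produce exactly
+    # the state we'd get by fetching from scratch afterwards.
+    apply_events(state, events)
+    assert state == fetch_initial_state_data(user_profile)
+```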
+
+The final detail needed to ensure that `apply_events` always works
+correctly is to make sure that we have `EventsRegisterTest` tests
+for every event type that can be generated by Zulip. This can be
+checked manually by running `test-backend --coverage
+EventsRegisterTest` and then verifying that all the calls to
+`send_event` are covered. Someday we'll add automation that
+verifies this directly by inspecting the coverage data.
+
+### Messages
+
+One exception to the protocol described in the last section is the
+actual messages. Because Zulip clients usually fetch them in a
+separate AJAX call after the rest of the site is loaded, we don't
+need them to be included in the initial state data. To handle them
+correctly, clients are responsible for discarding events related to
+messages that the client has not yet fetched.
diff --git a/docs/index.rst b/docs/index.rst
index e7e74bb1d3..f0dfed229d 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -115,6 +115,7 @@ Contents:
    :caption: Subsystem documentation
 
    settings
+   events-system
    queuing
    bots-guide
    custom-apps
diff --git a/docs/life-of-a-request.md b/docs/life-of-a-request.md
index 5a0d0107ab..50cfd1e3ec 100644
--- a/docs/life-of-a-request.md
+++ b/docs/life-of-a-request.md
@@ -40,9 +40,9 @@ location /static/ {
 }
 ```
 
-## Nginx routes other requests [between tornado and django][tornado-django]
+## Nginx routes other requests [between django and tornado][tornado-django]
 
-[tornado-django]: architecture-overview.html?highlight=tornado#tornado-and-django
+[tornado-django]: architecture-overview.html?highlight=tornado#django-and-tornado
 
 All our connected clients hold open long-polling connections so that
 they can receive events (messages, presence notifications, and so on) in
diff --git a/docs/new-feature-tutorial.md b/docs/new-feature-tutorial.md
index 2dff51418b..2de5fa71c9 100644
--- a/docs/new-feature-tutorial.md
+++ b/docs/new-feature-tutorial.md
@@ -6,6 +6,11 @@ as an example of the specific steps needed to add a new feature: adding
 a new option to the application that is dynamically synced through the
 data system in real-time to all browsers the user may have open.
 
+As you read this, you may find you need to learn about Zulip's
+real-time push system; the
+[real-time push and events](events-system.html) documentation has a
+detailed explanation of how everything works.
+
 ## General Process in brief
 
 ### Adding a field to the database