mirror of https://github.com/zulip/zulip.git
docs: Add a long document explaining the events system.
This is probably one of Zulip's most important systems, and thus worth documenting carefully.
parent 092bdc7645
commit 92219aa3dd
@@ -87,10 +87,12 @@ Components

![architecture-simple](images/architecture_simple.png)

### Tornado and Django

### Django and Tornado

We use both the [Tornado](http://www.tornadoweb.org) and
[Django](https://www.djangoproject.com/) Python web frameworks.

Zulip is primarily implemented in the
[Django](https://www.djangoproject.com/) Python web framework. We
also make use of [Tornado](http://www.tornadoweb.org) for the
real-time push system.

Django is the main web application server; Tornado runs the
server-to-client real-time push system. The app servers are configured

@@ -114,6 +116,10 @@ click on something) are processed by the Django application server. One
exception to this is that Zulip uses websockets through Tornado to
minimize latency on the code path for **sending** messages.

There is detailed documentation on the
[real-time push and event queue system](events-system.html); most of
the code is in `zerver/tornado`.

### nginx

nginx is the front-end web server to all Zulip traffic; it serves static

@@ -198,12 +204,10 @@ and the Tornado push system.

Two simple wrappers around `pika` (the Python RabbitMQ client) are in
`zulip/zerver/lib/queue.py`. There's an asynchronous client for use in
Tornado and a more general client for use elsewhere.

`zerver/tornado/event_queue.py` has helper functions for putting
events into one queue or another. Most of the processes started by
Supervisor are queue processors that continually pull things out of a
RabbitMQ queue and handle them.

Tornado and a more general client for use elsewhere. Most of the
processes started by Supervisor are queue processors that continually
pull things out of a RabbitMQ queue and handle them; they are defined
in `zerver/worker/queue_processors.py`.

Also see [the queuing guide](queuing.html).

@@ -0,0 +1,245 @@

# Real-time Push and Events

Zulip's "events system" is the server-to-client push system that
powers our real-time sync. This document explains how it works; to
see a worked example of a complete feature built on this system, check
out the [new application feature tutorial](new-feature-tutorial.html).

Any single-page web application like Zulip needs a story for how
changes made by one client are synced to other clients, though having
a good architecture for this is particularly important for a chat tool
like Zulip, since the state is constantly changing. When we talk
about clients, think of a browser tab, mobile app, or API bot that
needs to receive updates to the Zulip data. The simplest example is a
new message being sent by one client; other clients must be notified
in order to display the message. But a complete application like
Zulip has dozens of different types of data that need to be synced to
other clients, whether it be new streams, changes in a user's name or
avatar, settings changes, etc. In Zulip, we call these updates that
need to be sent to other clients **events**.

An important thing to understand when designing such a system is that
events need to be synced to every client that has a copy of the old
data if one wants to avoid clients displaying inaccurate data to
users. So if a user has two browser windows open and sends a message,
every client controlled by that user as well as any recipients of the
message, including both of those two browser windows, will receive
that event. (Technically, we don't need to send events to the client
that triggered the change, but this approach saves a bunch of
unnecessary duplicate UI update code, since the client making the
change can just use the same code as every other client, maybe plus a
little notification that the operation succeeded.)

Architecturally, there are a few things needed to make a successful
real-time sync system work:

* **Generation**. Generating events when changes happen to data, and
  determining which users should receive each event.
* **Delivery**. Efficiently delivering those events to interested
  clients, ideally in an exactly-once fashion.
* **UI updates**. Updating the UI in the client once it has received
  events from the server.

Reactive JavaScript libraries like React and Vue can help simplify the
last piece, but there aren't good standard systems for doing
generation and delivery, so we have to build them ourselves.

This document discusses how Zulip solves the generation and delivery
problems in a scalable, correct, and predictable way.

## Generation system

Zulip's generation system is built around a Python function,
`send_event(event, users)`. It accepts an event data structure (just
a Python dictionary with some keys and values; `type` is always one of
the keys but the rest depends on the specific event) and a list of
user IDs for the users whose clients should receive the event. In
special cases such as message delivery, the list of users will instead
be a list of dicts mapping user IDs to user-specific data like whether
that user was mentioned in that message. The data passed to
`send_event` are simply marshalled as JSON and placed in the
`notify_tornado` RabbitMQ queue to be consumed by the delivery system.

Usually, this list of users is one of three things:

* A single user (e.g. for user-level settings changes).
* Everyone in the realm (e.g. for organization-level settings changes,
  like new realm emoji).
* Everyone who would receive a given message (for messages, emoji
  reactions, message editing, etc.); i.e. the subscribers to a stream
  or the people on a private message thread.

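For a concrete sense of the shape, here is a minimal, runnable sketch
of a caller for the realm-wide case; `send_event` is stubbed out here,
and the event fields beyond `type` are illustrative rather than
Zulip's exact schema:

```python
import json

def send_event(event, users):
    # Stub standing in for Zulip's real send_event; the real one
    # marshals the data as JSON and publishes it to the
    # notify_tornado RabbitMQ queue for the delivery system.
    print('would publish to notify_tornado:',
          json.dumps({'event': event, 'users': users}))

# Hypothetical caller: tell clients that a user changed their name.
def notify_name_change(user_id, new_name, realm_user_ids):
    event = {
        'type': 'realm_user',  # every event carries a `type`
        'op': 'update',
        'person': {'user_id': user_id, 'full_name': new_name},
    }
    # A name change is visible realm-wide, so everyone gets the event.
    send_event(event, realm_user_ids)

notify_name_change(7, 'Ada Lovelace', [1, 2, 7, 42])
```
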
It is the responsibility of the caller of `send_event` to choose the
list of user IDs correctly. There can be security problems if e.g. an
event containing private message content is sent to the entire
organization. However, if an event isn't sent to enough clients,
there will likely be user-visible real-time sync bugs.

Most of the hard work in event generation is about defining consistent
event dictionaries that are clear, readable, useful to the wide range
of possible clients, and easy for developers to work with.

## Delivery system

Zulip's event delivery (real-time push) system is based on Tornado,
which is ideal for handling a large number of open requests. Details
on Tornado are available in the
[architecture overview](architecture-overview.html), but in short it
is good at holding open a large number of connections for a long time.
The complete system is about 1500 lines of code in `zerver/tornado/`,
primarily `zerver/tornado/event_queue.py`.

Zulip's event delivery system is based on "long-polling"; basically,
clients make `GET /json/events` calls to the server, and the server
doesn't respond to the request until it has an event to deliver to the
client. This approach is reasonably efficient and works everywhere
(unlike websockets, which have a decreasing but nonzero level of
client compatibility problems).

For each connected client, the **Event Queue Server** maintains an
**event queue**, which contains any events that are to be delivered to
that client which have not yet been acknowledged by that client.
Ignoring the subtle details around error handling, the protocol is
pretty simple; when a client does a `GET /json/events` call, the
server checks if there are any events in the queue. If there are, it
returns the events immediately. If there aren't, it records that
queue as having a waiting client (often called a `handler` in the
code).

When the Event Queue Server pulls an event off the `notify_tornado`
RabbitMQ queue, it simply delivers the event to each queue associated
with one of the target users. If the queue has a waiting client, it
breaks the long-poll connection by returning an HTTP response to the
waiting client request. If there is no waiting client, it simply
pushes the event onto the queue.
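
Ignoring locking, persistence, and the many details of real error
handling, the per-queue bookkeeping can be sketched as follows; the
names are simplified and this is not the actual code in
`zerver/tornado/event_queue.py`:

```python
# Minimal sketch of the per-client event queue; not Zulip's real code.
class EventQueue:
    def __init__(self):
        self.events = []          # delivered-but-unacknowledged events
        self.next_event_id = 0    # per-queue event ID counter
        self.handler = None       # waiting long-poll request, if any

    def handle_get_events(self, respond):
        # Called on GET /json/events; `respond` completes the HTTP
        # response for the client.
        if self.events:
            respond(self.events)    # events waiting: reply immediately
        else:
            self.handler = respond  # otherwise, park the request

    def deliver(self, event):
        # Called for each event pulled off the notify_tornado queue
        # that targets this queue's user.
        event = dict(event, id=self.next_event_id)
        self.next_event_id += 1
        self.events.append(event)
        if self.handler is not None:
            respond, self.handler = self.handler, None
            respond(self.events)    # break the long-poll now
```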

When starting up, each client makes a `POST /json/register` to the
server, which creates a new event queue for that client and returns
the `queue_id` as well as an initial `last_event_id` to the client (it
can also, optionally, fetch the initial data to save an RTT and avoid
races; see the below section on initial data fetches for details on
why this is useful). Once the event queue is registered, the client
can just do an infinite loop calling `GET /json/events` with those
parameters, updating `last_event_id` each time to acknowledge any
events it has received (see `call_on_each_event` in the
[Zulip Python API bindings][api-bindings-code] for a complete example
implementation). When handling each `GET /json/events` request, the
queue server can safely delete any events that have an event ID less
than or equal to the client's `last_event_id` (event IDs are just a
counter for the events a given queue has received).
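
Putting the pieces together, a client's lifecycle looks roughly like
the sketch below; the endpoints are real, but authentication, error
handling, and the response details are simplified (see
`call_on_each_event` for the genuine article):

```python
import requests

SERVER = 'https://zulip.example.com'  # hypothetical server

def run_event_loop(auth, callback):
    # Register an event queue; the response includes our queue_id and
    # an initial last_event_id.
    resp = requests.post(SERVER + '/json/register', auth=auth).json()
    queue_id, last_event_id = resp['queue_id'], resp['last_event_id']

    while True:
        # Long-poll: the server holds this request open until it has
        # at least one event (or a heartbeat) to deliver.
        resp = requests.get(SERVER + '/json/events', auth=auth,
                            params={'queue_id': queue_id,
                                    'last_event_id': last_event_id}).json()
        for event in resp['events']:
            callback(event)
            # Acknowledge: the server may now discard events with
            # id <= last_event_id.
            last_event_id = max(last_event_id, event['id'])
```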

If network failures were impossible, the `last_event_id` parameter in
the protocol would not be required, but it is important for enabling
exactly-once delivery in the presence of potential failures. (Without
it, the queue server would have to delete events from the queue as
soon as it attempted to send them to the client; if that specific HTTP
response didn't reach the client due to a network TCP failure, then
those events could be lost.)

[api-bindings-code]: https://github.com/zulip/zulip/blob/master/api/zulip/__init__.py

The queue servers are a very high-traffic system, processing at a
minimum a request for every message delivered to every Zulip client.
Additionally, as a workaround for low-quality NAT servers that kill
HTTP connections that are open without activity for more than 60s, the
queue servers also send a heartbeat event to each queue at least once
every 45s or so (if no other events have arrived in the meantime).

To avoid leaking memory and other resources, event queues are
garbage-collected after (by default) 10 minutes of inactivity from a
client, under the theory that the client has likely gone off the
Internet (or no longer exists); this happens constantly. If the
client returns, it will receive a "queue not found" error when
requesting events; its handler for this case should just restart the
client / reload the browser so that it refetches initial data the same
way it would on startup. Since clients have to implement their
startup process anyway, this approach adds minimal technical
complexity to clients. A nice side effect is that if the Event Queue
Server (which stores queues in memory) were to crash and lose its
data, clients would recover, just as if they had lost Internet access
briefly (there is some DoS risk to manage, though).
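
The garbage-collection policy itself is simple enough to sketch;
assume each queue records when its client last polled (the attribute
and function names here are made up for illustration):

```python
import time

IDLE_TIMEOUT = 10 * 60  # seconds; Zulip's default is 10 minutes

def gc_event_queues(queues):
    # `queues` maps queue_id -> queue; each queue records the time of
    # its client's most recent poll in `last_poll_time`.
    now = time.time()
    for queue_id in list(queues):
        if now - queues[queue_id].last_poll_time > IDLE_TIMEOUT:
            # The client is presumed gone; if it comes back, it gets a
            # "queue not found" error and restarts/re-registers.
            del queues[queue_id]
```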

(The Event Queue Server is designed to save any event queues to disk
and reload them when the server is restarted, and catches exceptions
carefully, so such incidents are very rare, but it's nice to have a
design that handles them without leaving broken, out-of-date clients
anyway.)

## The initial data fetch

When a client starts up, it usually wants to get two things from the
server:

* The "current state" of various pieces of data, e.g. the current
  settings, the set of users in the organization (for typeahead),
  streams, messages, etc. (aka the "initial state").
* A subscription to receive updates to those data when they are
  changed by a client (aka an event queue).

Ideally, one would get those two things atomically, i.e. if some other
user changes their name, either the name change happens before the
fetch (and thus the old name is in the initial state and there will be
an event in the queue for the name change) or after (the new name is
in the initial state, and there is no event for that name change in
the queue).

Achieving this atomicity goal saves a huge amount of work: the N
clients for Zulip don't need to worry about a wide range of potential
rare and hard-to-reproduce race conditions; we just have to implement
things correctly once, in the Zulip server.

This is quite challenging to do technically, because fetching the
initial state for a complex web application like Zulip might involve
dozens of queries to the database, caches, etc. over the course of
100ms or more, and it is thus nearly impossible to do all of those
things together atomically. So instead, we use a more complicated
algorithm that can produce the atomic result from non-atomic
subroutines. Here's how it works when you make a `register` API
request. The request is directly handled by Django:

* Django makes an HTTP request to Tornado, requesting that a new event
  queue be created, and records its queue ID.
* Django does all the various database/cache/etc. queries to fetch the
  data, non-atomically, from the various data sources (see
  the `fetch_initial_state_data` function).
* Django makes a second HTTP request to Tornado, requesting any events
  that had been added to the Tornado event queue since it
  was created.
* Finally, Django "applies" the events (see the `apply_events`
  function, sketched below) to the initial state that it fetched.
  E.g. for a name change event, it finds the user data in the
  `realm_user` data structure, and updates it to have the new name.
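
A heavily simplified sketch of that last step for just the name-change
case; the real `apply_events` has a branch like this for every event
type Zulip can generate, and the state keys shown are simplified:

```python
# Simplified sketch of apply_events; not Zulip's real implementation.
def apply_events(state, events):
    for event in events:
        apply_event(state, event)

def apply_event(state, event):
    if event['type'] == 'realm_user' and event['op'] == 'update':
        # Someone's profile changed after the initial fetch: patch our
        # copy of the user data so it matches post-change reality.
        person = event['person']
        for user in state['realm_users']:
            if user['user_id'] == person['user_id']:
                user.update(person)
    # ... one branch like the above for each other event type ...
```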

This achieves everything we desire, at the cost that we need to write
the `apply_events` function. This is a difficult function to
implement correctly, because the situations it handles almost never
happen in practice (being race conditions). So we have a special test
class, `EventsRegisterTest`, that is specifically designed to test
this function by ensuring the possible race condition always happens.
In particular, it does the following:

* Call `fetch_initial_state_data` to get the current state.
* Call a state change function that issues an event,
  e.g. `do_change_full_name`, and capture any events that are generated.
* Call `apply_events(state, events)`, and compare the result to
  calling `fetch_initial_state_data` again now.

The `apply_events` code is correct if those two results are identical.
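
In outline, a single check follows the pattern below;
`capture_events` is a stand-in for the test machinery that records
what `send_event` was called with:

```python
# Sketch of the EventsRegisterTest pattern; the real test class has
# more machinery (schema checks, per-event-type helpers, etc.).
def check_apply_events(user, action):
    state = fetch_initial_state_data(user)  # 1. state "before"
    events = capture_events(action)         # 2. run change, grab events
    apply_events(state, events)             # 3. patch the old state
    # 4. apply_events is correct iff the patched state matches a fresh
    #    fetch performed after the change.
    assert state == fetch_initial_state_data(user)
```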

The final detail needed to ensure that `apply_events` always works
correctly is making sure that we have `EventsRegisterTest` tests for
every event type that can be generated by Zulip. This can be checked
manually using `test-backend --coverage EventsRegisterTest` and then
verifying that all the calls to `send_event` are covered. Someday
we'll add automation that verifies this directly by inspecting the
coverage data.

### Messages

One exception to the protocol described in the last section is the
actual messages. Because Zulip clients usually fetch them in a
separate AJAX call after the rest of the site is loaded, we don't need
them to be included in the initial state data. To handle them
correctly, clients are responsible for discarding events related to
messages that the client has not yet fetched.
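
That rule is easy to state in code; the event `type` values below are
real, but `message_store` and the handler are hypothetical names for
the client's local data structures:

```python
def maybe_apply_message_event(message_store, apply_event, event):
    # message_store: dict of message_id -> message the client has
    # fetched; apply_event: the client's normal event handler.
    if event['type'] in ('update_message', 'reaction') \
            and event['message_id'] not in message_store:
        # We never fetched this message; when we do fetch it, the
        # server will return the already-updated version.
        return
    apply_event(event)
```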

@@ -115,6 +115,7 @@ Contents:

   :caption: Subsystem documentation

   settings
   events-system
   queuing
   bots-guide
   custom-apps

@@ -40,9 +40,9 @@ location /static/ {

}
```

## Nginx routes other requests [between tornado and django][tornado-django]
## Nginx routes other requests [between django and tornado][tornado-django]

[tornado-django]: architecture-overview.html?highlight=tornado#tornado-and-django
[tornado-django]: architecture-overview.html?highlight=tornado#django-and-tornado

All our connected clients hold open long-polling connections so that
they can receive events (messages, presence notifications, and so on) in

@@ -6,6 +6,11 @@ as an example of the specific steps needed to add a new feature: adding

a new option to the application that is dynamically synced through the
data system in real-time to all browsers the user may have open.

As you read this, you may find you need to learn about Zulip's
real-time push system; the
[real-time push and events](events-system.html) documentation has a
detailed explanation of how everything works.

## General Process in brief

### Adding a field to the database