docs: Update some notes about Tornado scalability.

This commit is contained in:
Tim Abbott 2019-01-16 11:25:43 -08:00
parent 5050b595a0
commit f9b60b4803
2 changed files with 31 additions and 22 deletions


@@ -95,16 +95,20 @@ by the Supervisor configuration (which explains how to start the server
processes; see "Supervisor" below) and the nginx configuration (which
explains which HTTP requests get sent to which app server).
-Tornado is an asynchronous server and is meant specifically to hold open
-tens of thousands of long-lived (long-polling or websocket) connections
--- that is to say, routes that maintain a persistent connection from
-every running client. For this reason, it's responsible for event
-(message) delivery, but not much else. We try to avoid any blocking
-calls in Tornado because we don't want to delay delivery to thousands of
-other connections (as this would make Zulip very much not real-time).
-For instance, we avoid doing cache or database queries inside the
-Tornado code paths, since those blocking requests carry a very high
-performance penalty for a single-threaded, asynchronous server.
+Tornado is an asynchronous server and is meant specifically to hold
+open tens of thousands of long-lived (long-polling or websocket)
+connections -- that is to say, routes that maintain a persistent
+connection from every running client. For this reason, it's
+responsible for event (message) delivery, but not much else. We try to
+avoid any blocking calls in Tornado because we don't want to delay
+delivery to thousands of other connections (as this would make Zulip
+very much not real-time). For instance, we avoid doing cache or
+database queries inside the Tornado code paths, since those blocking
+requests carry a very high performance penalty for a single-threaded,
+asynchronous server. (In principle, we could do non-blocking requests
+to those services, but the Django-based database libraries we use in
+most of our codebase don't support that, and in any case, our
+architecture doesn't require Tornado to do that.)
The parts that are activated relatively rarely (e.g. when people type or
click on something) are processed by the Django application server. One

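The penalty described above can be illustrated with a hedged, stdlib-only sketch (illustrative, not Zulip code): ten simulated long-poll connections run on one event loop, once with a blocking call (like a synchronous cache or database query) and once with a non-blocking wait. The blocking version serializes everything; the non-blocking one overlaps.

```python
import asyncio
import time

async def handle_connection(work):
    # Simulates one long-poll connection waiting for its event.
    await work()

async def serve(n_connections, work):
    # Run n simulated connections concurrently and time the whole batch.
    start = time.monotonic()
    await asyncio.gather(*(handle_connection(work) for _ in range(n_connections)))
    return time.monotonic() - start

async def blocking_query():
    # A blocking cache/DB call: stalls the entire single-threaded event loop.
    time.sleep(0.05)

async def nonblocking_wait():
    # Yields control to the loop, so the other connections proceed in parallel.
    await asyncio.sleep(0.05)

blocked = asyncio.run(serve(10, blocking_query))       # roughly 10 * 0.05s
overlapped = asyncio.run(serve(10, nonblocking_wait))  # roughly 0.05s total
print(f"blocking: {blocked:.2f}s  non-blocking: {overlapped:.2f}s")
```

With thousands of connections instead of ten, the blocking version's delay is what would make event delivery visibly non-real-time.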

@@ -436,18 +436,23 @@ running Zulip with larger teams (especially >1000 users).
S3 backend for storing user-uploaded files and avatars and will want
to make sure secrets are available on the hot spare.
-* Zulip does not support dividing traffic for a given Zulip realm
-  between multiple application servers. There are two issues: you
-  need to share the memcached/Redis/RabbitMQ instance (these can be
-  moved to a network service shared by multiple servers with a bit of
-  configuration) and the Tornado event system for pushing to browsers
-  currently has no mechanism for multiple frontend servers (or event
-  processes) talking to each other. One can probably get a factor of
-  10 in a single server's scalability by [supporting multiple tornado
-  processes on a single
-  server](https://github.com/zulip/zulip/issues/372), which is also
-  likely the first part of any project to support exchanging events
-  amongst multiple servers.
+* Zulip 2.0 and later supports running multiple Tornado servers
+  sharded by realm/organization, which is how we scale Zulip Cloud.
+* However, Zulip does not yet support dividing traffic for a single
+  Zulip realm between multiple application servers. There are two
+  issues: you need to share the memcached/Redis/RabbitMQ instance
+  (these can be moved to a network service shared by multiple servers
+  with a bit of configuration), and the Tornado event system for
+  pushing to browsers currently has no mechanism for multiple
+  frontend servers (or event processes) talking to each other. One
+  can probably get a factor of 10 in a single server's scalability by
+  [supporting multiple tornado processes on a single server](https://github.com/zulip/zulip/issues/372),
+  which is also likely the first part of any project to support
+  exchanging events among multiple servers. The work to change this
+  is pretty far along, though, so while it is not generally available
+  yet, we can set it up for users with an enterprise support
+  contract.
Questions, concerns, and bug reports about this area of Zulip are very
welcome! This is an area we are hoping to improve.