docs: Update some notes about Tornado scalability.

Tim Abbott 2019-01-16 11:25:43 -08:00
parent 5050b595a0
commit f9b60b4803
2 changed files with 31 additions and 22 deletions


@@ -95,16 +95,20 @@ by the Supervisor configuration (which explains how to start the server
 processes; see "Supervisor" below) and the nginx configuration (which
 explains which HTTP requests get sent to which app server).

-Tornado is an asynchronous server and is meant specifically to hold open
-tens of thousands of long-lived (long-polling or websocket) connections
--- that is to say, routes that maintain a persistent connection from
-every running client. For this reason, it's responsible for event
-(message) delivery, but not much else. We try to avoid any blocking
-calls in Tornado because we don't want to delay delivery to thousands of
-other connections (as this would make Zulip very much not real-time).
-For instance, we avoid doing cache or database queries inside the
-Tornado code paths, since those blocking requests carry a very high
-performance penalty for a single-threaded, asynchronous server.
+Tornado is an asynchronous server and is meant specifically to hold
+open tens of thousands of long-lived (long-polling or websocket)
+connections -- that is to say, routes that maintain a persistent
+connection from every running client. For this reason, it's
+responsible for event (message) delivery, but not much else. We try to
+avoid any blocking calls in Tornado because we don't want to delay
+delivery to thousands of other connections (as this would make Zulip
+very much not real-time). For instance, we avoid doing cache or
+database queries inside the Tornado code paths, since those blocking
+requests carry a very high performance penalty for a single-threaded,
+asynchronous server. (In principle, we could do non-blocking requests
+to those services, but the Django-based database libraries we use in
+most of our codebase don't support that, and in any case, our
+architecture doesn't require Tornado to do that.)

 The parts that are activated relatively rarely (e.g. when people type or
 click on something) are processed by the Django application server. One
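
For readers less familiar with the long-polling pattern this hunk describes, here is a minimal, self-contained Tornado sketch of the idea. It is not Zulip's actual code: the broker, handler names, and port are illustrative. A client request on `/events` awaits a Future instead of blocking, so a single-threaded process can hold many idle connections open while still delivering events promptly.

```python
import tornado.concurrent
import tornado.ioloop
import tornado.web


class EventBroker:
    """Tracks one pending Future per waiting client; resolves them when an event is published."""

    def __init__(self):
        self.waiters = set()

    def wait_for_event(self):
        # Each long-polling client parks on its own Future.
        future = tornado.concurrent.Future()
        self.waiters.add(future)
        return future

    def publish(self, event):
        # Wake every waiting client without doing any blocking work on the IOLoop.
        for future in self.waiters:
            if not future.done():
                future.set_result(event)
        self.waiters.clear()


broker = EventBroker()


class EventsHandler(tornado.web.RequestHandler):
    async def get(self):
        # Await (rather than block) until the next event arrives, so the
        # server keeps serving thousands of other open connections meanwhile.
        event = await broker.wait_for_event()
        self.write({"events": [event]})


class PublishHandler(tornado.web.RequestHandler):
    def post(self):
        broker.publish({"type": "message", "content": self.get_argument("content")})
        self.write({"result": "success"})


if __name__ == "__main__":
    app = tornado.web.Application([
        (r"/events", EventsHandler),
        (r"/publish", PublishHandler),
    ])
    app.listen(9993)  # illustrative port
    tornado.ioloop.IOLoop.current().start()
```

The key point matching the text above is that the handler never performs a blocking cache or database call; it only awaits an in-memory Future that another request resolves.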


@@ -436,18 +436,23 @@ running Zulip with larger teams (especially >1000 users).
   S3 backend for storing user-uploaded files and avatars and will want
   to make sure secrets are available on the hot spare.

-* Zulip does not support dividing traffic for a given Zulip realm
-  between multiple application servers. There are two issues: you
-  need to share the memcached/Redis/RabbitMQ instance (these should
-  can be moved to a network service shared by multiple servers with a
-  bit of configuration) and the Tornado event system for pushing to
-  browsers currently has no mechanism for multiple frontend servers
-  (or event processes) talking to each other. One can probably get a
-  factor of 10 in a single server's scalability by [supporting
-  multiple tornado processes on a single
-  server](https://github.com/zulip/zulip/issues/372), which is also
-  likely the first part of any project to support exchanging events
-  amongst multiple servers.
+* Zulip 2.0 and later supports running multiple Tornado servers
+  sharded by realm/organization, which is how we scale Zulip Cloud.
+
+* However, Zulip does not yet support dividing traffic for a single
+  Zulip realm between multiple application servers. There are two
+  issues: you need to share the memcached/Redis/RabbitMQ instances
+  (these can be moved to a network service shared by multiple servers
+  with a bit of configuration), and the Tornado event system for
+  pushing to browsers currently has no mechanism for multiple
+  frontend servers (or event processes) talking to each other. One
+  can probably get a factor of 10 in a single server's scalability by
+  [supporting multiple Tornado processes on a single server](https://github.com/zulip/zulip/issues/372),
+  which is also likely the first part of any project to support
+  exchanging events amongst multiple servers. The work for changing
+  this is pretty far along, and thus while not generally available
+  yet, we can set it up for users with an enterprise support contract.

 Questions, concerns, and bug reports about this area of Zulip are very
 welcome! This is an area we are hoping to improve.
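
To make "sharded by realm/organization" concrete, here is a small conceptual sketch. It is hypothetical, not Zulip's implementation (Zulip configures its realm-to-Tornado mapping explicitly per installation, and the shard count and ports here are made up): each realm's hostname maps deterministically to one Tornado port, so all of that organization's long-polling connections land on the same process.

```python
# Conceptual sketch only -- not Zulip's code. It illustrates routing every
# client of a given realm to the same Tornado process, which then holds all
# of that realm's open connections.
import hashlib


def tornado_port_for_realm(realm_host: str, num_shards: int = 4, base_port: int = 9800) -> int:
    """Deterministically map a realm's hostname to one of num_shards Tornado ports."""
    digest = hashlib.sha1(realm_host.encode("utf-8")).hexdigest()
    return base_port + int(digest, 16) % num_shards


if __name__ == "__main__":
    for host in ("acme.example.com", "lear.example.com", "dev.example.com"):
        print(host, "->", tornado_port_for_realm(host))
```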