docs: Update some notes about Tornado scalability.

This commit is contained in:
Tim Abbott 2019-01-16 11:25:43 -08:00
parent 5050b595a0
commit f9b60b4803
2 changed files with 31 additions and 22 deletions


@@ -95,16 +95,20 @@ by the Supervisor configuration (which explains how to start the server
processes; see "Supervisor" below) and the nginx configuration (which
explains which HTTP requests get sent to which app server).
-Tornado is an asynchronous server and is meant specifically to hold open
-tens of thousands of long-lived (long-polling or websocket) connections
--- that is to say, routes that maintain a persistent connection from
-every running client. For this reason, it's responsible for event
-(message) delivery, but not much else. We try to avoid any blocking
-calls in Tornado because we don't want to delay delivery to thousands of
-other connections (as this would make Zulip very much not real-time).
-For instance, we avoid doing cache or database queries inside the
-Tornado code paths, since those blocking requests carry a very high
-performance penalty for a single-threaded, asynchronous server.
+Tornado is an asynchronous server and is meant specifically to hold
+open tens of thousands of long-lived (long-polling or websocket)
+connections -- that is to say, routes that maintain a persistent
+connection from every running client. For this reason, it's
+responsible for event (message) delivery, but not much else. We try to
+avoid any blocking calls in Tornado because we don't want to delay
+delivery to thousands of other connections (as this would make Zulip
+very much not real-time). For instance, we avoid doing cache or
+database queries inside the Tornado code paths, since those blocking
+requests carry a very high performance penalty for a single-threaded,
+asynchronous server. (In principle, we could do non-blocking requests
+to those services, but the Django-based database libraries we use in
+most of our codebase don't support that, and in any case, our
+architecture doesn't require Tornado to do that.)
The parts that are activated relatively rarely (e.g. when people type or
click on something) are processed by the Django application server. One

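The penalty described above can be illustrated with a hedged, stdlib-only sketch (illustrative, not Zulip code): ten simulated long-poll connections run on one event loop, once with a blocking call (like a synchronous cache or database query) and once with a non-blocking wait. The blocking version serializes everything; the non-blocking one overlaps.

```python
import asyncio
import time

async def handle_connection(work):
    # Simulates one long-poll connection waiting for its event.
    await work()

async def serve(n_connections, work):
    # Run n simulated connections concurrently and time the whole batch.
    start = time.monotonic()
    await asyncio.gather(*(handle_connection(work) for _ in range(n_connections)))
    return time.monotonic() - start

async def blocking_query():
    # A blocking cache/DB call: stalls the entire single-threaded event loop.
    time.sleep(0.05)

async def nonblocking_wait():
    # Yields control to the loop, so the other connections proceed in parallel.
    await asyncio.sleep(0.05)

blocked = asyncio.run(serve(10, blocking_query))       # roughly 10 * 0.05s
overlapped = asyncio.run(serve(10, nonblocking_wait))  # roughly 0.05s total
print(f"blocking: {blocked:.2f}s  non-blocking: {overlapped:.2f}s")
```

With thousands of connections instead of ten, the blocking version's delay is what would make event delivery visibly non-real-time.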

@@ -436,18 +436,23 @@ running Zulip with larger teams (especially >1000 users).
S3 backend for storing user-uploaded files and avatars and will want
to make sure secrets are available on the hot spare.
-* Zulip does not support dividing traffic for a given Zulip realm
-  between multiple application servers. There are two issues: you
-  need to share the memcached/Redis/RabbitMQ instance (these can be
-  moved to a network service shared by multiple servers with a bit of
-  configuration) and the Tornado event system for pushing to browsers
-  currently has no mechanism for multiple frontend servers (or event
-  processes) talking to each other. One can probably get a factor of
-  10 in a single server's scalability by [supporting multiple tornado
-  processes on a single
-  server](https://github.com/zulip/zulip/issues/372), which is also
-  likely the first part of any project to support exchanging events
-  amongst multiple servers.
+* Zulip 2.0 and later supports running multiple Tornado servers
+  sharded by realm/organization, which is how we scale Zulip Cloud.
+* However, Zulip does not yet support dividing traffic for a single
+  Zulip realm between multiple application servers. There are two
+  issues: you need to share the memcached/Redis/RabbitMQ instance
+  (these can be moved to a network service shared by multiple servers
+  with a bit of configuration), and the Tornado event system for
+  pushing to browsers currently has no mechanism for multiple
+  frontend servers (or event processes) talking to each other. One
+  can probably get a factor of 10 in a single server's scalability by
+  [supporting multiple tornado processes on a single server](https://github.com/zulip/zulip/issues/372),
+  which is also likely the first part of any project to support
+  exchanging events among multiple servers. The work to change this
+  is pretty far along, though, so while it is not generally available
+  yet, we can set it up for users with an enterprise support
+  contract.
Questions, concerns, and bug reports about this area of Zulip are very
welcome! This is an area we are hoping to improve.