Zulip architectural overview
============================

Key Codebases
-------------

The core Zulip application is at
<https://github.com/zulip/zulip> and
is a web application written in Python 3.x and using the Django framework. That
codebase includes server-side code and the web client, as well as Python API
bindings and most of our integrations with other services and applications (see
[the directory structure guide](../overview/directory-structure.html)).

[Zulip Mobile](https://github.com/zulip/zulip-mobile) is the official
mobile Zulip client supporting both iOS and Android, written in
JavaScript with React Native, and
[Zulip Desktop](https://github.com/zulip/zulip-electron) is the
official Zulip desktop client for macOS, Linux, and Windows.

We also maintain several separate repositories for integrations and
other glue code: a
[Hubot adapter](https://github.com/zulip/hubot-zulip); integrations
with [Phabricator](https://github.com/zulip/phabricator-to-zulip),
[Jenkins](https://github.com/zulip/zulip-jenkins-plugin),
[Puppet](https://github.com/matthewbarr/puppet-zulip),
[Redmine](https://github.com/zulip/zulip-redmine-plugin), and
[Trello](https://github.com/zulip/trello-to-zulip);
[node.js API bindings](https://github.com/zulip/zulip-node); and our
[full-text search PostgreSQL extension](https://github.com/zulip/tsearch_extras).

We use [Transifex](https://www.transifex.com/zulip/zulip/) to do
translations.

In this overview, we'll mainly discuss the core Zulip server and web
application.

Usage assumptions and concepts
------------------------------

Zulip is a real-time web-based chat application meant for companies and
similar groups ranging in size from a small team to more than a thousand
users. It features real-time notifications, message persistence and
search, public group conversations (*streams*), private streams,
private one-on-one and group conversations, inline image previews, team
presence/buddy lists, a rich API, Markdown message support, and numerous
integrations with other services. The maintainer team aims to support
users who connect to Zulip using dedicated iOS, Android, Linux, Windows,
and macOS clients, as well as people using modern web browsers or
dedicated Zulip API clients.

A server can host multiple Zulip *realms* (organizations) at the same
domain, each of which is a private chamber with its own users,
streams, customizations, and so on. This means that one person might
be a user of multiple Zulip realms. The administrators of a realm can
choose whether to allow anyone to register an account and join, or
only allow people who have been invited, or restrict registrations to
members of particular groups (using email domain names or corporate
single-sign-on login for verification). For more on security
considerations, see [the security model section](../production/security-model.html).

The Zulip "All messages" screen is like a chronologically ordered inbox;
it displays messages, starting at the oldest message that the user
hasn't viewed yet (for more on that logic, see [the guide to the
pointer and unread counts](../subsystems/pointer.html)). The "All messages" screen displays
the most recent messages in all the streams a user has joined (except
for the streams they've muted), as well as private messages from other
users, in strict chronological order. A user can *narrow* to view only
the messages in a single stream, and can further narrow to focus on a
*topic* (thread) within that stream. Each narrow has its own URL. The
user can quickly see what conversation they're in -- the stream and
topic, or the names of the user(s) they're private messaging with
-- using *the recipient bar* displayed atop each conversation.

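
To make the idea of narrowing concrete, here is a minimal sketch of it as a filter over a chronologically ordered feed. The `Message` class and the `narrow` function are hypothetical names invented for this illustration; they are not taken from the Zulip codebase.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Message:
    # Hypothetical, simplified message record for illustration only.
    stream: str
    topic: str
    content: str


def narrow(
    messages: List[Message], stream: str, topic: Optional[str] = None
) -> List[Message]:
    """Return the messages in `stream`, optionally restricted to one `topic`.

    Input order is preserved, so a chronological feed stays chronological.
    """
    return [
        m
        for m in messages
        if m.stream == stream and (topic is None or m.topic == topic)
    ]


feed = [
    Message("design", "logo", "first draft attached"),
    Message("backend", "deploys", "rolling out now"),
    Message("design", "colors", "too much green?"),
    Message("design", "logo", "looks good"),
]

# Narrow to a stream, then further to a topic within that stream.
print([m.content for m in narrow(feed, "design")])
print([m.content for m in narrow(feed, "design", "logo")])
```

Narrowing to a topic is simply a stricter filter than narrowing to its stream, which is why the second call returns a subset of the first.
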

Zulip's philosophy is to provide sensible defaults but give the user
fine-grained control over their incoming information flow; a user can
mute topics and streams, and can make fine-grained choices to reduce
real-time notifications they find irrelevant.

Components
----------

![architecture-simple](../images/architecture_simple.png)

### Django and Tornado

Zulip is primarily implemented in the
[Django](https://www.djangoproject.com/) Python web framework. We
also make use of [Tornado](http://www.tornadoweb.org) for the
real-time push system.

Django is the main web application server; Tornado runs the
server-to-client real-time push system. The app servers are configured
by the Supervisor configuration (which specifies how to start the server
processes; see "Supervisor" below) and the nginx configuration (which
specifies which HTTP requests get sent to which app server).

Tornado is an asynchronous server and is meant specifically to hold
open tens of thousands of long-lived (long-polling or websocket)
connections -- that is to say, routes that maintain a persistent
connection from every running client. For this reason, it's
responsible for event (message) delivery, but not much else. We try to
avoid any blocking calls in Tornado because we don't want to delay
delivery to thousands of other connections (as this would make Zulip
very much not real-time). For instance, we avoid doing cache or
database queries inside the Tornado code paths, since those blocking
requests carry a very high performance penalty for a single-threaded,
asynchronous server system. (In principle, we could do non-blocking
requests to those services, but the Django-based database libraries we
use in most of our codebase don't support that, and in any case,
our architecture doesn't require Tornado to do that.)

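
The cost of a blocking call in a single-threaded asynchronous server can be seen in a small experiment. The sketch below uses Python's standard `asyncio` event loop as a stand-in for Tornado, and `time.sleep` as a stand-in for a synchronous database or cache query; the handler names are made up for the illustration and do not correspond to Zulip code.

```python
import asyncio
import time


async def deliver_blocking(delay: float) -> None:
    # time.sleep stands in for a synchronous database or cache query:
    # it holds the only event-loop thread, so no other client makes progress.
    time.sleep(delay)


async def deliver_nonblocking(delay: float) -> None:
    # asyncio.sleep yields to the event loop while waiting,
    # so other clients are served in the meantime.
    await asyncio.sleep(delay)


async def serve_all(handler, clients: int, delay: float) -> float:
    """Run `handler` once per client concurrently; return elapsed seconds."""
    start = time.monotonic()
    await asyncio.gather(*(handler(delay) for _ in range(clients)))
    return time.monotonic() - start


def measure(handler, clients: int = 20, delay: float = 0.01) -> float:
    return asyncio.run(serve_all(handler, clients, delay))


print(f"blocking:     {measure(deliver_blocking):.3f}s")
print(f"non-blocking: {measure(deliver_nonblocking):.3f}s")
```

With the blocking handler, the waits add up across all clients; with the non-blocking handler, they overlap. The same effect, multiplied by thousands of open connections, is why blocking work is kept out of the Tornado code paths.
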

The parts that are activated relatively rarely (e.g. when people type or
click on something) are processed by the Django application server. One
exception to this is that Zulip uses websockets through Tornado to
minimize latency on the code path for **sending** messages.

There is detailed documentation on the
[real-time push and event queue system](../subsystems/events-system.html); most of
the code is in `zerver/tornado`.

#### HTML templates, JavaScript, etc.

Zulip's HTML is primarily implemented using two types of HTML
templates: backend templates (powered by the [Jinja2][] template
engine), used for logged-out ("portico") pages and the webapp's base
content, and frontend templates (powered by [Handlebars][]), used for
live-rendering HTML from JavaScript for things like the main message
feed.

For more details on the frontend, see our documentation on
[translation](../translating/translating.html),
[templates](../subsystems/html-templates.html),
[directory structure](../overview/directory-structure.html), and
[the static asset pipeline](../subsystems/front-end-build-process.html).

[Jinja2]: http://jinja.pocoo.org/
[Handlebars]: http://handlebarsjs.com/

### nginx

nginx is the front-end web server to all Zulip traffic; it serves static
assets and proxies to Django and Tornado. It handles HTTP requests
according to the rules laid down in the many config files found in
`zulip/puppet/zulip/files/nginx/`.

`zulip/puppet/zulip/files/nginx/zulip-include-frontend/app` is the most
important of these files. It defines what happens when requests come in
from outside.

- In production, all requests to URLs beginning with `/static/` are
  served from the corresponding files in `/home/zulip/prod-static/`,
  and the production build process (`tools/build-release-tarball`)
  compiles, minifies, and installs the static assets into the
  `prod-static/` tree. In development, files are served directly
  from `/static/` in the git repository.
- Requests to `/json/events`, `/api/v1/events`, and `/sockjs` are
  sent to the Tornado server. These are requests to the real-time push
  system, because the user's web browser sets up a long-lived TCP
  connection with Tornado to serve as [a channel for push
  notifications](https://en.wikipedia.org/wiki/Push_technology#Long_polling).
  nginx gets the hostname for the Tornado server via
  `puppet/zulip/files/nginx/zulip-include-frontend/upstreams`.
- Requests to all other paths are sent to the Django app via the UNIX
  socket `unix:/home/zulip/deployments/uwsgi-socket` (defined in
  `puppet/zulip/files/nginx/zulip-include-frontend/upstreams`).
  `zproject/wsgi.py` provides the WSGI application that uWSGI runs (see
  `django.core.wsgi`).
- By default (i.e. if `LOCAL_UPLOADS_DIR` is set), nginx will serve
  user-uploaded content like avatars, custom emoji, and uploaded
  files. However, one can configure Zulip to store these in a cloud
  storage service like Amazon S3 instead.

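
The routing rules in the list above can be summarized as a small decision function. This is only a sketch of the logic, written in Python rather than nginx configuration: the `route` function is hypothetical, the `/user_uploads/` prefix is an assumption made for the example, and real nginx `location` matching has its own precedence rules that plain prefix checks do not capture.

```python
def route(path: str, local_uploads: bool = True) -> str:
    """Return which backend a request path would be handed to (simplified)."""
    # Static assets are served straight from disk by nginx.
    if path.startswith("/static/"):
        return "static files"

    # Real-time push endpoints go to Tornado.
    tornado_prefixes = ("/json/events", "/api/v1/events", "/sockjs")
    if path.startswith(tornado_prefixes):
        return "tornado"

    # Assumed URL prefix for uploads; only served locally when uploads
    # are stored on this server rather than in a cloud storage service.
    if local_uploads and path.startswith("/user_uploads/"):
        return "local uploads"

    # Everything else is proxied to the Django app through uWSGI.
    return "django"


for example in ("/static/js/app.js", "/json/events", "/json/messages"):
    print(example, "->", route(example))
```

Ordering matters in this sketch: the Tornado check has to come before the Django fallback, because `/json/events` would otherwise be treated like any other `/json/` request.
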

### Supervisor

We use [supervisord](http://supervisord.org/) to start server processes,
restart them automatically if they crash, and direct logging.

The config file is
`zulip/puppet/zulip/templates/supervisor/zulip.conf.template.erb`. This
is where Tornado and Django are set up, as well as a number of background
processes that process event queues. We use event queues for the kinds
of tasks that are best run in the background because they are
expensive (in terms of performance) and don't have to be synchronous,
e.g., sending emails or updating analytics. Also see [the queuing
guide](../subsystems/queuing.html).

### memcached

memcached is used to cache database model
objects. `zerver/lib/cache.py` and `zerver/lib/cache_helpers.py`
manage putting things into memcached, and invalidating the cache when
values change. The memcached configuration is in
`puppet/zulip/files/memcached.conf`. See our
[caching guide](../subsystems/caching.html) to learn how this works in
detail.


### Redis

Redis is used for a few very short-term data stores, such as the
basis of `zerver/lib/rate_limiter.py`, a per-user rate limiting scheme
([example](http://blog.domaintools.com/2013/04/rate-limiting-with-redis/)),
and the [email-to-Zulip
integration](https://zulipchat.com/integrations/doc/email).

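
As a rough illustration of per-user rate limiting, the sketch below keeps a sliding window of recent request times for each user. It is an assumption-laden stand-in rather than the scheme in `zerver/lib/rate_limiter.py`: an in-memory dictionary replaces Redis, and the `RateLimiter` class and its parameters are hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class RateLimiter:
    """Allow at most `max_requests` per user within `window_seconds`."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # In-memory stand-in for the short-term data a Redis server would hold.
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, user: str, now: float) -> bool:
        hits = self._hits[user]
        # Forget requests that have fallen out of the window.
        while hits and hits[0] <= now - self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


limiter = RateLimiter(max_requests=2, window_seconds=10.0)
# The third request inside the window is rejected; once the first
# request has aged out, requests are accepted again.
print([limiter.allow("alice", t) for t in (0.0, 1.0, 2.0, 10.5)])
# prints [True, True, False, True]
```

The data involved only matters for the length of the window, which is why a store without durable persistence is sufficient for this use.
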

Redis is configured in `zulip/puppet/zulip/files/redis` and it's a
pretty standard configuration except for the last line, which turns off
persistence:

    # Zulip-specific configuration: disable saving to disk.
    save ""

People often wonder if we could replace memcached with Redis (or
replace RabbitMQ with Redis, with some loss of functionality).

The answer is likely yes, but it wouldn't improve Zulip.
Operationally, our current setup is likely easier to develop and run
in production than a pure Redis system would be. Meanwhile, the
perceived benefit of using Redis is usually to reduce memory
consumption by running fewer services, and no such benefit would
materialize:

* Our cache uses significant memory, but that memory usage would be
  essentially the same with Redis as it is with memcached.
* All of these services have low minimum memory requirements, and in
  fact our uses of Redis and RabbitMQ do not consume significant
  memory even at scale.
* We would likely need to run multiple Redis services (with different
  configurations) in order to ensure the pure LRU use case (memcached)
  doesn't push out data that we want to persist until expiry
  (Redis-based rate limiting) or until consumed (RabbitMQ-based
  queuing of deferred work).

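
The last point can be demonstrated with a toy least-recently-used store. In this sketch, which uses a hypothetical `LRUStore` class and made-up key names, cache entries and a rate-limiting record share one store with LRU eviction, and ordinary cache traffic pushes the rate-limiting record out before it would have expired.

```python
from collections import OrderedDict
from typing import Optional


class LRUStore:
    """Tiny key-value store that evicts the least recently used key."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data: "OrderedDict[str, str]" = OrderedDict()

    def set(self, key: str, value: str) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used key

    def get(self, key: str) -> Optional[str]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)
        return self._data[key]


shared = LRUStore(capacity=3)
shared.set("ratelimit:alice", "5 requests so far")
for i in range(3):
    shared.set(f"cache:user:{i}", "cached row")

print(shared.get("ratelimit:alice"))  # prints None: the record was evicted
```

Losing a cache entry only costs a database read, but losing the rate-limiting record silently resets the limit, so the two kinds of data need stores with different eviction behavior.
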

### RabbitMQ

RabbitMQ is a queueing system. Its config files live in
`zulip/puppet/zulip/files/rabbitmq`. Initial configuration happens in
`zulip/scripts/setup/configure-rabbitmq`.

We use RabbitMQ for queuing expensive work (e.g. sending emails
triggered by a message, push notifications, some analytics, etc.) that
requires reliable delivery but which we don't want to do on the main
thread. It's also used for communication between the application server
and the Tornado push system.

Two simple wrappers around `pika` (the Python RabbitMQ client) are in
`zulip/zerver/lib/queue.py`. There's an asynchronous client for use in
Tornado and a more general client for use elsewhere. Most of the
processes started by Supervisor are queue processors that continually
pull things out of a RabbitMQ queue and handle them; they are defined
in `zerver/worker/queue_processors.py`.

Also see [the queuing guide](../subsystems/queuing.html).

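
The division of labor between a request handler and a queue processor can be sketched with the standard library alone. Here `queue.Queue` and a background thread stand in for RabbitMQ and a Supervisor-managed worker process; the function names and the event format are hypothetical and are not the `pika` wrappers or the workers in `zerver/worker/queue_processors.py`.

```python
import queue
import threading
from typing import List

email_queue: "queue.Queue[dict]" = queue.Queue()  # stand-in for a RabbitMQ queue
sent: List[str] = []


def send_email(event: dict) -> None:
    # Placeholder for slow work; a real worker would talk to a mail server.
    sent.append(event["to"])


def queue_processor() -> None:
    """Pull events off the queue forever and handle them one at a time."""
    while True:
        event = email_queue.get()
        try:
            send_email(event)
        finally:
            email_queue.task_done()


def handle_request(recipient: str) -> str:
    """Request handler: enqueue the slow work and return immediately."""
    email_queue.put({"to": recipient})
    return "ok"


threading.Thread(target=queue_processor, daemon=True).start()
handle_request("hamlet@example.com")
handle_request("iago@example.com")
email_queue.join()  # wait here only so the demo can show the result
print(sent)
```

The request handler never waits for the email to be sent; it only pays the cost of putting an event on the queue. An in-process queue like this one loses its contents if the process dies, which is one reason the real system uses a separate queueing service for work that needs reliable delivery.
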

### PostgreSQL

PostgreSQL (also known as Postgres) is the database that stores all
persistent data, that is, data that's expected to live beyond a user's
current session.

In production, Postgres is installed with a default configuration. The
directory that would contain configuration files
(`puppet/zulip/files/postgresql`) has only a utility script and a custom
list of stopwords used by a PostgreSQL extension.

In a development environment, configuration of that PostgreSQL
extension is handled by `tools/postgres-init-dev-db` (invoked by
`tools/provision`). That file also manages setting up the
development PostgreSQL user.

`tools/provision` also invokes `tools/do-destroy-rebuild-database`
to create the actual database with its schema.

### Thumbor and thumbnailing

We use Thumbor, a popular open source thumbnailing server, to serve
images (both for inline URL previews and serving uploaded image
files). See [our thumbnailing docs](../subsystems/thumbnailing.html)
for more details on how this works.

### Nagios

Nagios is an optional component used for notifications to the system
administrator, e.g., in case of outages.

`zulip/puppet/zulip/manifests/nagios.pp` installs Nagios plugins from
`puppet/zulip/files/nagios_plugins/`.

This component installs only the Nagios plugins meant to be run
on a Nagios server; most of the Zulip Nagios plugins are meant to be
run on the Zulip servers themselves, and are included with the relevant
component of the Zulip server (e.g.
`puppet/zulip/manifests/postgres_common.pp` installs a few under
`/usr/lib/nagios/plugins/zulip_postgres_common`).

## Glossary

This section gives names for some of the elements in the Zulip UI used
in Zulip development conversations. Contributions to extend this list
are welcome!

* **chevron**: A small downward-facing arrow next to a message's
  timestamp, offering contextual options, e.g., "Reply", "Mute [this
  topic]", or "Link to this conversation". To avoid visual clutter,
  the chevron only appears in the web UI upon hover.

* **huddle**: What the codebase calls a "group private message".

* **message editing**: If the realm admin allows it, then after a user
  posts a message, the user has a few minutes to click "Edit" and
  change the content of their message. If they do, Zulip adds a
  marker such as "(EDITED)" at the top of the message, visible to
  anyone who can see the message.

* **realm**: What the codebase calls an "organization" in the UI.

* **recipient bar**: A visual indication of the context of a message
  or group of messages, displaying the stream and topic or private
  message recipient list, at the top of a group of messages. A
  typical 1-line message to a new recipient shows to the user as
  three lines of content: first the recipient bar, second the
  sender's name and avatar alongside the timestamp (and, on hover,
  the star and the chevron), and third the message content. The
  recipient bar is or contains hyperlinks to help the user narrow.

* **star**: Zulip allows a user to mark any message they can see,
  public or private, as "starred". A user can easily access messages
  they've starred through the "Starred messages" link in the
  left sidebar, or use "is:starred" as a narrow or a search
  constraint. Whether a user has or has not starred a particular
  message is private; other users and realm admins don't know
  whether a message has been starred, or by whom.

* **subject**: What the codebase calls a "topic" in many places.

* **bankruptcy**: When a user has been off Zulip for several days and
  has hundreds of unread messages, they are prompted for whether
  they want to mark all their unread messages as read. This is
  called "declaring bankruptcy" (in reference to the concept in
  finance).