mirror of https://github.com/zulip/zulip.git
docs: Split maintain-secure-upgrade into dedicated docs.
* Moves "Management commands" to a top-level section.
* Moves "Scalability" into a subsection at the bottom of "Requirements".
* Moves "Monitoring" into a subsection at the bottom of "Troubleshooting".
* Replaces "API and your Zulip URL" with a link to the REST API docs.
  This documentation text has been irrelevant for some time.
* Removes maintain-secure-upgrade from the TOC, but the file remains to
  avoid breaking old links from release blog posts and emails.
parent 1c45e4ac11
commit 87a2831b83
@@ -5,7 +5,7 @@
# version e.g. to say that something is likely to have changed.
# For more info see: https://www.sphinx-doc.org/en/master/templating.html
-->
{% if (pagename == "production/email-gateway" or pagename == "production/upgrade-or-modify") and release.endswith('+git') %}
{% if (pagename == "production/management-commands" or pagename == "production/email-gateway" or pagename == "production/upgrade-or-modify") and release.endswith('+git') %}
<!--
# email-gateway.html page doesn't exist in the stable documentation yet.
# This temporary workaround prevents CircleCI failure and should be removed after the next release.
@@ -383,7 +383,7 @@ delete the test import data from your Zulip server before doing a
final import. You can **permanently delete** all data from a Zulip
organization using the following procedure:

* Start a [Zulip management shell](../production/maintain-secure-upgrade.html#manage-py-shell)
* Start a [Zulip management shell](../production/management-commands.html#manage-py-shell)
* In the management shell, run the following commands, replacing `""`
with the subdomain if [you are hosting the organization on a
subdomain](../production/multiple-organizations.md):
@@ -8,9 +8,9 @@ Zulip in Production
requirements
Installing a production server <install>
troubleshooting
management-commands
settings
mobile-push-notifications
maintain-secure-upgrade
upgrade-or-modify
security-model
authentication-methods
@@ -109,7 +109,7 @@ for server administrators. This extremely low-traffic list is for
important announcements, including new releases and security issues.
* Follow [Zulip on Twitter](https://twitter.com/zulip).
* Learn how to [configure your Zulip server settings](settings.md).
* Learn about [maintaining](../production/maintain-secure-upgrade.md)
* Learn about [Backups, export and import](../production/export-and-import.md)
and [upgrading](../production/upgrade-or-modify.md) a production Zulip
server.
@@ -1,150 +1,14 @@
```eval_rst
:orphan:
```

# Maintain, secure, and upgrade

This page covers topics that will help you maintain a healthy, up-to-date, and
secure Zulip installation, including:
This was once a long page covering a bunch of topics; those topics
have since all moved to dedicated pages:

- [Monitoring](#monitoring)
- [Scalability](#scalability)
- [Management commands](#management-commands)

You may also want to read this related content:

- [Security Model](../production/security-model.md)
- [Backups, export and import](../production/export-and-import.md)
- [Upgrade or modify Zulip](../production/upgrade-or-modify.md)

## Monitoring

The complete Nagios configuration (sans secret keys) used to
monitor zulip.com is available under `puppet/zulip_ops` in the
Zulip Git repository (those files are not installed in the release
tarballs).

The Nagios plugins used by that configuration are installed
automatically by the Zulip installation process in subdirectories
under `/usr/lib/nagios/plugins/`. The following is a summary of the
various Nagios plugins included with Zulip and what they check:

Application server and queue worker monitoring:

* `check_send_receive_time` (sends a test message through the system
between two bot users to check that end-to-end message sending works)

* `check_rabbitmq_consumers` and `check_rabbitmq_queues` (checks for
rabbitmq being down or the queue workers being behind)

* `check_queue_worker_errors` (checks for errors reported by the queue
workers)

* `check_worker_memory` (monitors for memory leaks in queue workers)

* `check_email_deliverer_backlog` and `check_email_deliverer_process`
(monitors for whether scheduled outgoing emails are being sent)

Database monitoring:

* `check_postgres_replication_lag` (checks streaming replication is up
to date).

* `check_postgres` (checks the health of the postgres database)

* `check_postgres_backup` (checks backups are up to date; see
[Backups, export and import](../production/export-and-import.md))

* `check_fts_update_log` (monitors for whether full-text search updates
are being processed)

Standard server monitoring:

* `check_website_response.sh` (standard HTTP check)

* `check_debian_packages` (checks apt repository is up to date)

**Note**: While most plugins require no special permissions,
`check_email_deliverer_backlog` requires the `nagios` user to be in
the `zulip` group, in order to access `SECRET_KEY` and thus run
Zulip management commands.

If you're using these plugins, bug reports and pull requests to make
it easier to monitor Zulip and maintain it in production are
encouraged!

## Scalability

This section attempts to address the considerations involved with
running Zulip with larger teams (especially >1000 users).

* For an organization with 100+ users, it's important to have more
than 4GB of RAM on the system. Zulip will install on a system with
2GB of RAM, but with less than 3.5GB of RAM, it will run its
[queue processors](../subsystems/queuing.md) multithreaded to conserve memory;
this creates a significant performance bottleneck.

* [chat.zulip.org](../contributing/chat-zulip-org.md), with thousands of user
accounts and thousands of messages sent every week, has 8GB of RAM,
4 cores, and 80GB of disk. The CPUs are essentially always idle,
but the 8GB of RAM is important.

* We recommend using a [remote postgres
database](postgres.md) for isolation, though it is
not required. In the following, we discuss a relatively simple
configuration with two types of servers: application servers
(running Django, Tornado, RabbitMQ, Redis, Memcached, etc.) and
database servers.

* You can scale to a pretty large installation (O(~1000) concurrently
active users using it to chat all day) with just a single reasonably
large application server (e.g. AWS c3.2xlarge with 8 cores and 16GB
of RAM) sitting mostly idle (<10% CPU used and only 4GB of the 16GB
RAM actively in use). You can probably get away with half that
(e.g. c3.xlarge), but ~8GB of RAM is highly recommended at scale.
Beyond 1000 active users, you will eventually want to increase the
memory cap in `memcached.conf` from the default 512MB to avoid high
rates of memcached misses.

* For the database server, we highly recommend SSD disks, and RAM is
the primary resource limitation. We have not aggressively tested
for the minimum resources required, but 8 cores with 30GB of RAM
(e.g. AWS's m3.2xlarge) should suffice; you may be able to get away
with less, especially on the CPU side. The database load per user is
pretty optimized as long as `memcached` is working correctly. This
has not been tested, but from extrapolating the load profile, it
should be possible to scale a Zulip installation to 10,000s of
active users using a single large database server without doing
anything complicated like sharding the database.

* For reasonably high availability, it's easy to run a hot spare
application server and a hot spare database (using Postgres
streaming replication; see the section on configuring this). Be
sure to read the [backups documentation](../production/export-and-import.md)
if you're hoping to run a
spare application server; in particular you probably want to use the
S3 backend for storing user-uploaded files and avatars and will want
to make sure secrets are available on the hot spare.

* Zulip 2.0 and later supports running multiple Tornado servers
sharded by realm/organization, which is how we scale Zulip Cloud.

* However, Zulip does not yet support dividing traffic for a single
Zulip realm between multiple application servers. There are two
issues: you need to share the memcached/Redis/RabbitMQ instance
(these can be moved to a network service shared by multiple
servers with a bit of configuration) and the Tornado event system
for pushing to browsers currently has no mechanism for multiple
frontend servers (or event processes) talking to each other. One
can probably get a factor of 10 in a single server's scalability by
[supporting multiple Tornado processes on a single server](https://github.com/zulip/zulip/issues/372),
which is also likely the first part of any project to support
exchanging events amongst multiple servers. The work for changing
this is pretty far along, though, and thus while not generally
available yet, we can set it up for users with an enterprise support
contract.

Questions, concerns, and bug reports about this area of Zulip are very
welcome! This is an area we are hoping to improve.

## Sections that have moved

These were once subsections of this page, but have since moved to
dedicated pages; we preserve them here to avoid breaking old links.
### Monitoring
Moved to [Troubleshooting](../production/troubleshooting.html#monitoring).

### Securing your Zulip server
@@ -164,151 +28,14 @@ repository](../production/upgrade-or-modify.html#upgrading-from-a-git-repository
Moved to [Upgrading the operating
system](../production/upgrade-or-modify.html#upgrading-the-operating-system).

## API and your Zulip URL
### Scalability

To use the Zulip API with your Zulip server, you will need to use the
API endpoint of e.g. `https://zulip.example.com/api`. Our Python
API example scripts support this via the
`--site=https://zulip.example.com` argument. The API bindings
support it via putting `site=https://zulip.example.com` in your
.zuliprc.
Moved to [Scalability](../production/requirements.html#scalability).

Every Zulip integration supports this sort of argument (or e.g. a
`ZULIP_SITE` variable in a zuliprc file or the environment), but this
is not yet documented for some of the integrations (the included
integration documentation on `/integrations` will properly document
how to do this for most integrations). We welcome pull requests for
integrations that don't discuss this!
### Management commands

Similarly, you will need to instruct your users to specify the URL
for your Zulip server when using the Zulip desktop and mobile apps.
Moved to [Management commands](../production/management-commands.md).

## Memory leak mitigation
### API and your Zulip URL

As a measure to mitigate the impact of potential memory leaks in one
of the Zulip daemons, the service automatically restarts itself
every Sunday early morning. See `/etc/cron.d/restart-zulip` for the
precise configuration.

## Management commands

Zulip has a large library of [Django management
commands](https://docs.djangoproject.com/en/1.8/ref/django-admin/#django-admin-and-manage-py).
To use them, you will want to be logged in as the `zulip` user and for
the purposes of this documentation, we assume the current working
directory is `/home/zulip/deployments/current`.

Below, we show several useful examples, but there are more than 100
in total. We recommend skimming the usage docs (or if there are none,
the code) of a management command before using it, since they are
generally less polished and more designed for expert use than the rest
of the Zulip system.

### Running management commands

Many management commands require the Zulip realm/organization to
interact with as an argument, which you can specify via numeric or
string ID.

You can see all the organizations on your Zulip server using
`./manage.py list_realms`.

```
zulip@zulip:~$ /home/zulip/deployments/current/manage.py list_realms
id  string_id      name
--  ---------      ----
1   zulipinternal  None
2                  Zulip Community
```

(Note that every Zulip server has a special `zulipinternal` realm containing
system-internal bots like `welcome-bot`; you are unlikely to need to
interact with that realm.)

Unless you are
[hosting multiple organizations on your Zulip server](../production/multiple-organizations.md),
your single Zulip organization on the root domain will have the empty
string (`''`) as its `string_id`. So you can run e.g.:
```
zulip@zulip:~$ /home/zulip/deployments/current/manage.py show_admins -r ''
```

Otherwise, the `string_id` will correspond to the organization's
subdomain. E.g. on `it.zulip.example.com`, use
`/home/zulip/deployments/current/manage.py show_admins -r it`.

### manage.py shell

You can get an IPython shell with full access to code within the Zulip
project using `manage.py shell`. For example, you can do the following
to change a user's email address:

```
$ /home/zulip/deployments/current/manage.py shell
In [1]: user_profile = get_user_profile_by_email("email@example.com")
In [2]: do_change_user_delivery_email(user_profile, "new_email@example.com")
```

#### manage.py dbshell

This will start a postgres shell connected to the Zulip database.

### Grant administrator access

You can make any user a realm administrator on the command line with
the `knight` management command:

```
./manage.py knight username@example.com -f
```

#### Creating API super users with manage.py

If you need to manage the IRC, Jabber, or Zephyr mirrors, you will
need to create API super users. To do this, use `./manage.py knight`
with the `--permission=api_super_user` argument. See the respective
integration scripts for these mirrors (under
[`zulip/integrations/`][integrations-source] in the [Zulip Python API
repo][python-api-repo]) for further detail on these.

[integrations-source]: https://github.com/zulip/python-zulip-api/tree/master/zulip/integrations
[python-api-repo]: https://github.com/zulip/python-zulip-api

#### Exporting users and realms with manage.py export

If you need to do an export of a single user or of an entire realm, we
have tools in `management/` that essentially export Zulip data to the
file system.

`export_single_user.py` exports the message history and realm-public
metadata for a single Zulip user (including that user's *received*
messages as well as their sent messages).

A good overview of the process for exporting a single realm when
moving a realm to a new server (without moving a full database dump)
is in
[management/export.py](https://github.com/zulip/zulip/blob/master/zerver/management/commands/export.py). We
recommend you read the comment there for words of wisdom on speed,
what is and is not exported, what will break upon a move to a new
server, and suggested procedure.

### Other useful manage.py commands

There are dozens of useful management commands under
`zerver/management/commands/`. We detail a few here:

* `manage.py help`: Lists all available management commands.
* `manage.py send_custom_email`: Can be used to send an email to a set
of users. The `--help` documents how to run it from a `manage.py
shell` for use with more complex programmatically computed sets of
users.
* `manage.py send_password_reset_email`: Sends password reset email(s)
to one or more users.
* `manage.py change_user_email`: Change a user's email address.

All of our management commands have internal documentation available
via `manage.py command_name --help`.

## Hosting multiple Zulip organizations

This is explained in detail on [its own page](../production/multiple-organizations.md).
Moved to [REST API](https://chat.zulip.org/api/rest).
@@ -0,0 +1,118 @@
# Management commands

Zulip has a large library of [Django management
commands](https://docs.djangoproject.com/en/1.8/ref/django-admin/#django-admin-and-manage-py).
To use them, you will want to be logged in as the `zulip` user and for
the purposes of this documentation, we assume the current working
directory is `/home/zulip/deployments/current`.

Below, we show several useful examples, but there are more than 100
in total. We recommend skimming the usage docs (or if there are none,
the code) of a management command before using it, since they are
generally less polished and more designed for expert use than the rest
of the Zulip system.

## Running management commands

Many management commands require the Zulip realm/organization to
interact with as an argument, which you can specify via numeric or
string ID.

You can see all the organizations on your Zulip server using
`./manage.py list_realms`.

```
zulip@zulip:~$ /home/zulip/deployments/current/manage.py list_realms
id  string_id      name
--  ---------      ----
1   zulipinternal  None
2                  Zulip Community
```

(Note that every Zulip server has a special `zulipinternal` realm containing
system-internal bots like `welcome-bot`; you are unlikely to need to
interact with that realm.)
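
If you want to use this list from a script, one option is to parse the fixed-width table using the dashed row to find column boundaries, so that the root organization's empty `string_id` comes through as an empty string rather than shifting the name into the wrong column. The following is a minimal Python sketch, not part of Zulip: `parse_list_realms` is a hypothetical helper, and it assumes the output consists of a header row, a dashed row, and one row per realm. The column widths in `SAMPLE` are chosen only for illustration.

```python
import re
from typing import Dict, List, Optional


def parse_list_realms(output: str) -> List[Dict[str, str]]:
    """Parse fixed-width `list_realms`-style output into one dict per row.

    Assumes a header row, then a row of dashes under each column, then data
    rows. Column boundaries come from the dashed row, so a blank cell (such
    as the root organization's empty string_id) becomes "".
    """
    lines = [line for line in output.splitlines() if line.strip()]
    header, underline, *rows = lines
    starts = [match.start() for match in re.finditer(r"-+", underline)]
    # Each column runs up to the next column's start; the last one runs to
    # the end of the line, so multi-word names like "Zulip Community" survive.
    ends: List[Optional[int]] = [*starts[1:], None]
    bounds = list(zip(starts, ends))
    names = [header[start:end].strip() for start, end in bounds]
    return [
        {name: row[start:end].strip() for name, (start, end) in zip(names, bounds)}
        for row in rows
    ]


# Illustrative sample shaped like the table above; the column widths are assumed.
SAMPLE = "\n".join(
    f"{id_:<6}{string_id:<17}{name}"
    for id_, string_id, name in [
        ("id", "string_id", "name"),
        ("--", "---------", "----"),
        ("1", "zulipinternal", "None"),
        ("2", "", "Zulip Community"),
    ]
)

if __name__ == "__main__":
    for realm in parse_list_realms(SAMPLE):
        print(realm)
```

On a server, you could pass it the captured standard output of `manage.py list_realms` instead of `SAMPLE`. If the real output has more columns than this sample, each extra dashed column simply becomes another key.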

Unless you are
[hosting multiple organizations on your Zulip server](../production/multiple-organizations.md),
your single Zulip organization on the root domain will have the empty
string (`''`) as its `string_id`. So you can run e.g.:
```
zulip@zulip:~$ /home/zulip/deployments/current/manage.py show_admins -r ''
```

Otherwise, the `string_id` will correspond to the organization's
subdomain. E.g. on `it.zulip.example.com`, use
`/home/zulip/deployments/current/manage.py show_admins -r it`.
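
The `''` above is shell quoting that produces an empty argument. It is not a literal pair of quote characters. If you call management commands from Python rather than from a shell, pass an empty string as its own argument instead. The following is a minimal sketch that assumes the deployment path used throughout this page. `manage_py_argv` and `run_management_command` are hypothetical helpers, not Zulip APIs.

```python
import shlex
import subprocess
from typing import List

MANAGE_PY = "/home/zulip/deployments/current/manage.py"


def manage_py_argv(command: str, realm_string_id: str, *extra: str) -> List[str]:
    """Build argv for a realm-scoped management command.

    The root-domain organization's empty string_id is passed as a separate
    "" element, which is exactly what `-r ''` produces in a POSIX shell.
    """
    return [MANAGE_PY, command, "-r", realm_string_id, *extra]


def run_management_command(argv: List[str]) -> subprocess.CompletedProcess:
    """Run argv; the calling process should already be the zulip user.

    Raises CalledProcessError if the command exits with a non-zero status.
    """
    return subprocess.run(argv, check=True)


if __name__ == "__main__":
    # The shell form and the list form describe the same argument vector.
    from_shell = shlex.split(f"{MANAGE_PY} show_admins -r ''")
    print(from_shell == manage_py_argv("show_admins", ""))  # True
    print(manage_py_argv("show_admins", "it"))
```

Building the argument list directly avoids a second round of shell quoting, which is where an empty `string_id` is most easily lost.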

## manage.py shell

You can get an IPython shell with full access to code within the Zulip
project using `manage.py shell`. For example, you can do the following
to change a user's email address:

```
$ /home/zulip/deployments/current/manage.py shell
In [1]: user_profile = get_user_profile_by_email("email@example.com")
In [2]: do_change_user_delivery_email(user_profile, "new_email@example.com")
```
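
For repeatable or scripted changes, Django's `shell` command has accepted a `-c`/`--command` option since Django 1.10. This option runs a snippet of Python without an interactive prompt. The sketch below builds such a command for the email change shown above. It assumes that `get_user_profile_by_email` lives in `zerver.models` and that `do_change_user_delivery_email` lives in `zerver.lib.actions`. These module paths have changed across Zulip releases, so verify them against your version. `change_email_argv` and `shell_command_argv` are hypothetical helpers.

```python
import shlex
from typing import List

MANAGE_PY = "/home/zulip/deployments/current/manage.py"

# Assumed import paths; these modules have been reorganized across Zulip
# releases, so check them against the version you are running.
CHANGE_EMAIL_SNIPPET = """\
from zerver.models import get_user_profile_by_email
from zerver.lib.actions import do_change_user_delivery_email
user_profile = get_user_profile_by_email({old!r})
do_change_user_delivery_email(user_profile, {new!r})
"""


def shell_command_argv(code: str) -> List[str]:
    """Build argv that runs `code` through Django's `manage.py shell -c`."""
    return [MANAGE_PY, "shell", "-c", code]


def change_email_argv(old_email: str, new_email: str) -> List[str]:
    """Build argv for the email change shown in the interactive example."""
    # !r renders each address as a quoted Python string literal.
    return shell_command_argv(CHANGE_EMAIL_SNIPPET.format(old=old_email, new=new_email))


if __name__ == "__main__":
    argv = change_email_argv("email@example.com", "new_email@example.com")
    # shlex.quote makes the multi-line snippet safe to paste into a shell.
    print(" ".join(shlex.quote(arg) for arg in argv))
```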

### manage.py dbshell

This will start a postgres shell connected to the Zulip database.

## Grant administrator access

You can make any user a realm administrator on the command line with
the `knight` management command:

```
./manage.py knight username@example.com -f
```

### Creating API super users with manage.py

If you need to manage the IRC, Jabber, or Zephyr mirrors, you will
need to create API super users. To do this, use `./manage.py knight`
with the `--permission=api_super_user` argument. See the respective
integration scripts for these mirrors (under
[`zulip/integrations/`][integrations-source] in the [Zulip Python API
repo][python-api-repo]) for further detail on these.

[integrations-source]: https://github.com/zulip/python-zulip-api/tree/master/zulip/integrations
[python-api-repo]: https://github.com/zulip/python-zulip-api

### Exporting users and realms with manage.py export

If you need to do an export of a single user or of an entire realm, we
have tools in `management/` that essentially export Zulip data to the
file system.

`export_single_user.py` exports the message history and realm-public
metadata for a single Zulip user (including that user's *received*
messages as well as their sent messages).

A good overview of the process for exporting a single realm when
moving a realm to a new server (without moving a full database dump)
is in
[management/export.py](https://github.com/zulip/zulip/blob/master/zerver/management/commands/export.py). We
recommend you read the comment there for words of wisdom on speed,
what is and is not exported, what will break upon a move to a new
server, and suggested procedure.

## Other useful manage.py commands

There are dozens of useful management commands under
`zerver/management/commands/`. We detail a few here:

* `manage.py help`: Lists all available management commands.
* `manage.py send_custom_email`: Can be used to send an email to a set
of users. The `--help` documents how to run it from a `manage.py
shell` for use with more complex programmatically computed sets of
users.
* `manage.py send_password_reset_email`: Sends password reset email(s)
to one or more users.
* `manage.py change_user_email`: Change a user's email address.

All of our management commands have internal documentation available
via `manage.py command_name --help`.
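
If you have a Zulip source checkout and want to browse the commands without running `manage.py help`, you can list the modules in `zerver/management/commands/` the same way Django discovers them. Under that rule, each module that is not a package and whose name does not start with an underscore is a command. The following is a small sketch that uses only the standard library. `find_command_names` is a hypothetical helper. `manage.py help` remains the authoritative list, because it also includes commands from other installed apps.

```python
import pkgutil
import sys
from typing import List


def find_command_names(commands_dir: str) -> List[str]:
    """List command names in a Django `management/commands` directory.

    Follows Django's discovery rule: every module that is not a package and
    whose name does not start with an underscore defines a command.
    """
    return sorted(
        name
        for _, name, is_pkg in pkgutil.iter_modules([commands_dir])
        if not is_pkg and not name.startswith("_")
    )


if __name__ == "__main__":
    # Defaults to the path used on this page, relative to a Zulip checkout.
    directory = sys.argv[1] if len(sys.argv) > 1 else "zerver/management/commands"
    for command_name in find_command_names(directory):
        print(command_name)
```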
@@ -1,4 +1,4 @@
# Requirements
# Requirements and Scalability

To run a Zulip server, you will need:
* A dedicated machine or VM
@@ -60,7 +60,7 @@ https://help.ubuntu.com/community/Repositories/Ubuntu
for organizations with hundreds of users (active or not).

See our
[documentation on scalability](../production/maintain-secure-upgrade.html#scalability)
[documentation on scalability](#scalability)
for advice on hardware requirements for larger organizations.

* Disk space: You'll need at least 10GB of free disk space for a
@@ -132,3 +132,76 @@ Once you have met these requirements, see [full instructions for installing
Zulip in production](../production/install.md).

[trusty-eol]: https://wiki.ubuntu.com/Releases

## Scalability

This section attempts to address the considerations involved with
running Zulip with larger teams (especially >1000 users).

* For an organization with 100+ users, it's important to have more
than 4GB of RAM on the system. Zulip will install on a system with
2GB of RAM, but with less than 3.5GB of RAM, it will run its
[queue processors](../subsystems/queuing.md) multithreaded to conserve memory;
this creates a significant performance bottleneck.

* [chat.zulip.org](../contributing/chat-zulip-org.md), with thousands of user
accounts and thousands of messages sent every week, has 8GB of RAM,
4 cores, and 80GB of disk. The CPUs are essentially always idle,
but the 8GB of RAM is important.

* We recommend using a [remote postgres
database](postgres.md) for isolation, though it is
not required. In the following, we discuss a relatively simple
configuration with two types of servers: application servers
(running Django, Tornado, RabbitMQ, Redis, Memcached, etc.) and
database servers.

* You can scale to a pretty large installation (O(~1000) concurrently
active users using it to chat all day) with just a single reasonably
large application server (e.g. AWS c3.2xlarge with 8 cores and 16GB
of RAM) sitting mostly idle (<10% CPU used and only 4GB of the 16GB
RAM actively in use). You can probably get away with half that
(e.g. c3.xlarge), but ~8GB of RAM is highly recommended at scale.
Beyond 1000 active users, you will eventually want to increase the
memory cap in `memcached.conf` from the default 512MB to avoid high
rates of memcached misses.

* For the database server, we highly recommend SSD disks, and RAM is
the primary resource limitation. We have not aggressively tested
for the minimum resources required, but 8 cores with 30GB of RAM
(e.g. AWS's m3.2xlarge) should suffice; you may be able to get away
with less, especially on the CPU side. The database load per user is
pretty optimized as long as `memcached` is working correctly. This
has not been tested, but from extrapolating the load profile, it
should be possible to scale a Zulip installation to 10,000s of
active users using a single large database server without doing
anything complicated like sharding the database.

* For reasonably high availability, it's easy to run a hot spare
application server and a hot spare database (using Postgres
streaming replication; see the section on configuring this). Be
sure to read the [backups documentation](../production/export-and-import.md)
if you're hoping to run a
spare application server; in particular you probably want to use the
S3 backend for storing user-uploaded files and avatars and will want
to make sure secrets are available on the hot spare.

* Zulip 2.0 and later supports running multiple Tornado servers
sharded by realm/organization, which is how we scale Zulip Cloud.

* However, Zulip does not yet support dividing traffic for a single
Zulip realm between multiple application servers. There are two
issues: you need to share the memcached/Redis/RabbitMQ instance
(these can be moved to a network service shared by multiple
servers with a bit of configuration) and the Tornado event system
for pushing to browsers currently has no mechanism for multiple
frontend servers (or event processes) talking to each other. One
can probably get a factor of 10 in a single server's scalability by
[supporting multiple Tornado processes on a single server](https://github.com/zulip/zulip/issues/372),
which is also likely the first part of any project to support
exchanging events amongst multiple servers. The work for changing
this is pretty far along, though, and thus while not generally
available yet, we can set it up for users with an enterprise support
contract.

Questions, concerns, and bug reports about this area of Zulip are very
welcome! This is an area we are hoping to improve.
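
The RAM and memcached guidance above can be checked with a few lines of Python. This is a minimal sketch under stated assumptions, not Zulip's own logic:

* It reads `MemTotal` from text in the Linux `/proc/meminfo` format.
* It compares that value against the 3.5GB figure quoted above, treating GB as GiB.
* It rewrites a `-m <megabytes>` line of the kind found in a standard `memcached.conf`.

The helper names are hypothetical. The sketch only edits text, so it cannot tell whether another tool (for example, Zulip's puppet configuration) may regenerate that file on your server.

```python
import re
from typing import Optional

GIB = 1024 * 1024 * 1024


def mem_total_gib(meminfo: str) -> Optional[float]:
    """Return MemTotal from /proc/meminfo-formatted text, in GiB (None if absent)."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo, re.MULTILINE)
    if match is None:
        return None
    # /proc/meminfo reports "kB" meaning units of 1024 bytes.
    return int(match.group(1)) * 1024 / GIB


def below_ram_threshold(total_gib: float, threshold_gib: float = 3.5) -> bool:
    """True if RAM is under the 3.5GB figure quoted above (GB treated as GiB)."""
    return total_gib < threshold_gib


def set_memcached_memory(conf: str, megabytes: int) -> str:
    """Set the `-m <megabytes>` line in memcached.conf-style text.

    Replaces an existing `-m` line, or appends one if none is present.
    """
    new_conf, count = re.subn(
        r"^-m[ \t]+\d+[ \t]*$", f"-m {megabytes}", conf, flags=re.MULTILINE
    )
    if count == 0:
        new_conf = conf.rstrip("\n") + f"\n-m {megabytes}\n"
    return new_conf


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        total = mem_total_gib(f.read())
    if total is not None:
        print(f"{total:.1f} GiB total; below 3.5GB threshold: {below_ram_threshold(total)}")
```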
@@ -116,5 +116,5 @@ request; we love even small contributions, and we'd love to make the
Zulip documentation cover everything anyone might want to know about
running Zulip in production.

Next: [Maintaining](../production/maintain-secure-upgrade.md) and
Next: [Backups, export and import](../production/export-and-import.md) and
[upgrading](../production/upgrade-or-modify.md) Zulip in production.
@@ -1,4 +1,4 @@
# Troubleshooting
# Troubleshooting and monitoring

Zulip uses [Supervisor](http://supervisord.org/index.html) to monitor
and control its many Python services. Read the next section, [Using
@@ -129,3 +129,65 @@ problems and how to resolve them:
not have the correct HTTP Host header, Django rejects it and logs the
attempt. For more on this issue, see the [Django release notes on Host header
poisoning](https://www.djangoproject.com/weblog/2013/feb/19/security/#s-issue-host-header-poisoning)

## Monitoring

The complete Nagios configuration (sans secret keys) used to
monitor zulip.com is available under `puppet/zulip_ops` in the
Zulip Git repository (those files are not installed in the release
tarballs).

The Nagios plugins used by that configuration are installed
automatically by the Zulip installation process in subdirectories
under `/usr/lib/nagios/plugins/`. The following is a summary of the
various Nagios plugins included with Zulip and what they check:

Application server and queue worker monitoring:

* `check_send_receive_time` (sends a test message through the system
between two bot users to check that end-to-end message sending works)

* `check_rabbitmq_consumers` and `check_rabbitmq_queues` (checks for
rabbitmq being down or the queue workers being behind)

* `check_queue_worker_errors` (checks for errors reported by the queue
workers)

* `check_worker_memory` (monitors for memory leaks in queue workers)

* `check_email_deliverer_backlog` and `check_email_deliverer_process`
(monitors for whether scheduled outgoing emails are being sent)

Database monitoring:

* `check_postgres_replication_lag` (checks streaming replication is up
to date).

* `check_postgres` (checks the health of the postgres database)

* `check_postgres_backup` (checks backups are up to date; see
[Backups, export and import](../production/export-and-import.md))

* `check_fts_update_log` (monitors for whether full-text search updates
are being processed)

Standard server monitoring:

* `check_website_response.sh` (standard HTTP check)

* `check_debian_packages` (checks apt repository is up to date)

**Note**: While most plugins require no special permissions,
`check_email_deliverer_backlog` requires the `nagios` user to be in
the `zulip` group, in order to access `SECRET_KEY` and thus run
Zulip management commands.

If you're using these plugins, bug reports and pull requests to make
it easier to monitor Zulip and maintain it in production are
encouraged!
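
If you write your own checks to complement these, Nagios expects a plugin to print a one-line status and to exit with a code that reports the result: 0 for OK, 1 for WARNING, 2 for CRITICAL, or 3 for UNKNOWN. The status line may optionally be followed by `|` and performance data. The following sketch illustrates that convention with a hypothetical backlog check. It is not the implementation of `check_email_deliverer_backlog` or any other Zulip plugin. To keep it self-contained, it takes the backlog size as a command-line argument; a real check would query the service itself.

```python
import sys
from typing import List, Tuple

# Exit codes defined by the Nagios plugin conventions.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate_backlog(count: int, warn: int, crit: int) -> Tuple[int, str]:
    """Map a backlog size to a Nagios state and a one-line status message."""
    if count < 0:
        return UNKNOWN, f"UNKNOWN: invalid backlog count {count}"
    if count >= crit:
        state = CRITICAL
    elif count >= warn:
        state = WARNING
    else:
        state = OK
    # Text after "|" is performance data in label=value;warn;crit form.
    return state, f"{STATE_NAMES[state]}: {count} items queued | backlog={count};{warn};{crit}"


def main(argv: List[str]) -> int:
    """Parse COUNT WARN CRIT from argv, print the status line, return the exit code."""
    try:
        count, warn, crit = (int(arg) for arg in argv[1:4])
    except ValueError:
        print("UNKNOWN: usage: check_backlog COUNT WARN CRIT")
        return UNKNOWN
    state, message = evaluate_backlog(count, warn, crit)
    print(message)
    return state


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

For example, running it with the arguments `60 10 50` prints a CRITICAL line and exits with status 2.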

## Memory leak mitigation

As a measure to mitigate the impact of potential memory leaks in one
of the Zulip daemons, the service automatically restarts itself
every Sunday early morning. See `/etc/cron.d/restart-zulip` for the
precise configuration.
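
To see exactly when that restart runs, read the schedule fields of the entry in `/etc/cron.d/restart-zulip`. Files in `/etc/cron.d` use the crontab format with one extra field for the user. The fields appear in this order: minute, hour, day of month, month, day of week, user, and then the command. The sketch below parses that format. The sample entry is hypothetical and is not the contents of the real file. The parser deliberately skips comments, environment assignments, and `@`-style shortcuts such as `@weekly`.

```python
import re
from typing import Dict, Iterator

CRON_D_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week", "user")
ENV_ASSIGNMENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*\s*=")


def parse_cron_d_line(line: str) -> Dict[str, str]:
    """Split one /etc/cron.d schedule line into named fields plus the command."""
    parts = line.split(None, len(CRON_D_FIELDS))
    if len(parts) != len(CRON_D_FIELDS) + 1:
        raise ValueError(f"not a cron.d schedule line: {line!r}")
    entry = dict(zip(CRON_D_FIELDS, parts))
    entry["command"] = parts[-1]
    return entry


def iter_schedule_entries(text: str) -> Iterator[Dict[str, str]]:
    """Yield parsed entries, skipping blanks, comments, env vars, and @shortcuts."""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "@")) or ENV_ASSIGNMENT.match(line):
            continue
        yield parse_cron_d_line(line)


def runs_weekly_on_sunday(entry: Dict[str, str]) -> bool:
    """True if the entry fires only on Sundays (cron accepts 0, 7, or "sun")."""
    return (
        entry["day_of_month"] == "*"
        and entry["month"] == "*"
        and entry["day_of_week"].lower() in {"0", "7", "sun"}
    )


# Hypothetical example entry, not the contents of the real restart-zulip file.
SAMPLE_CRON = """\
SHELL=/bin/bash
# m h dom mon dow user command
0 6 * * 7 zulip /path/to/hypothetical-restart-command
"""

if __name__ == "__main__":
    for entry in iter_schedule_entries(SAMPLE_CRON):
        print(entry["minute"], entry["hour"], entry["day_of_week"], runs_weekly_on_sunday(entry))
```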
@@ -80,7 +80,7 @@ the Zulip documentation on
[making a user an administrator from the terminal][grant-admin-access]
to mark the appropriate users as administrators.

[grant-admin-access]: https://zulip.readthedocs.io/en/latest/production/maintain-secure-upgrade.html#grant-administrator-access
[grant-admin-access]: https://zulip.readthedocs.io/en/latest/production/management-commands.html#grant-administrator-access
[gitter-api-user-data]: https://developer.gitter.im/docs/user-resource

## Caveats