mirror of https://github.com/zulip/zulip.git
Reorganize postgres docs.
This commit is contained in:
parent
52764763c6
commit
a1683b1eaf
222
README.prod.md
222
README.prod.md
|
@ -455,7 +455,7 @@ computed using a hash of avatar_salt and user's email), etc.
|
|||
they do get large on a busy server, and it's definitely
|
||||
lower-priority.
|
||||
|
||||
### Restoration
|
||||
#### Restoration
|
||||
|
||||
To restore from backups, the process is basically the reverse of the above:
|
||||
|
||||
|
@ -487,9 +487,10 @@ that they are up to date using the Nagios plugin at:
|
|||
Contributions to more fully automate this process or make this section
|
||||
of the guide much more explicit and detailed are very welcome!
|
||||
|
||||
### Postgres streaming replication
|
||||
|
||||
Zulip has database configuration for doing with Postgres streaming
|
||||
#### Postgres streaming replication
|
||||
|
||||
Zulip has database configuration for using Postgres streaming
|
||||
replication; you can see the configuration in these files:
|
||||
|
||||
* puppet/zulip_internal/manifests/postgres_slave.pp
|
||||
|
@ -500,17 +501,6 @@ Contribution of a step-by-step guide for setting this up (and moving
|
|||
this configuration to be available in the main `puppet/zulip/` tree)
|
||||
would be very welcome!
|
||||
|
||||
### Using a remote postgres host
|
||||
|
||||
This is a bit annoying to setup, but you can configure Zulip to use a
|
||||
dedicated postgres server by setting the `REMOTE_POSTGRES_HOST`
|
||||
variable in /etc/zulip/settings.py, and configuring Postgres
|
||||
certificate authentication (see
|
||||
http://www.postgresql.org/docs/9.1/static/ssl-tcp.html and
|
||||
http://www.postgresql.org/docs/9.1/static/libpq-ssl.html for
|
||||
documentation on how to set this up and deploy the certificates) to
|
||||
make the DATABASES configuration in `zproject/settings.py` work (or
|
||||
override that configuration).
|
||||
|
||||
### Monitoring Zulip
|
||||
|
||||
|
@ -551,98 +541,14 @@ Contributions on making it easier to monitor Zulip and maintain it in
|
|||
production, e.g. https://github.com/zulip/zulip/issues/371, are very
|
||||
welcome!
|
||||
|
||||
#### Important postgres alerts
|
||||
|
||||
The `autovac_freeze` postgres alert from `check_postgres` is
|
||||
particularly important. This alert indicates that the age (in terms
|
||||
of number of transactions) of the oldest transaction id (XID) is
|
||||
getting close to the `autovacuum_freeze_max_age` setting. When the
|
||||
oldest XID hits that age, Postgres will force a VACUUM operation,
|
||||
which can often lead to sudden downtime until the operation finishes.
|
||||
If it did not do this and the age of the oldest XID reached 2 billion,
|
||||
transaction id wraparound would occur and there would be data loss.
|
||||
To clear the nagios alert, perform a `VACUUM` in each indicated
|
||||
database as a database superuser (`postgres`).
|
||||
|
||||
See
|
||||
http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND
|
||||
for more details on postgres vacuuming.
|
||||
|
||||
#### Debugging postgres database issues
|
||||
|
||||
When debugging postgres issues, in addition to the standard `pg_top`
|
||||
tool, often it can be useful to use this query:
|
||||
|
||||
```
|
||||
SELECT procpid,waiting,query_start,current_query FROM pg_stat_activity ORDER BY procpid;
|
||||
```
|
||||
|
||||
which shows the currently running backends and their activity. This is
|
||||
similar to the pg_top output, with the added advantage of showing the
|
||||
complete query, which can be valuable in debugging.
|
||||
|
||||
To stop a runaway query, you can run `SELECT pg_cancel_backend(pid
|
||||
int)` or `SELECT pg_terminate_backend(pid int)` as the 'postgres'
|
||||
user. The former cancels the backend's current query and the latter
|
||||
terminates the backend process. They are implemented by sending SIGINT
|
||||
and SIGTERM to the processes, respectively. We recommend against
|
||||
sending a Postgres process SIGKILL. Doing so will cause the database
|
||||
to kill all current connections, roll back any pending transactions,
|
||||
and enter recovery mode.
|
||||
|
||||
#### Stopping the Zulip postgres database
|
||||
|
||||
To start or stop postgres manually, use the pg_ctlcluster command:
|
||||
|
||||
```
|
||||
pg_ctlcluster 9.1 [--force] main {start|stop|restart|reload}
|
||||
```
|
||||
|
||||
By default, using stop uses "smart" mode, which waits for all clients
|
||||
to disconnect before shutting down the database. This can take
|
||||
prohibitively long. If you use the --force option with stop,
|
||||
pg_ctlcluster will try to use the "fast" mode for shutting
|
||||
down. "Fast" mode is described by the manpage thusly:
|
||||
|
||||
With the --force option the "fast" mode is used which rolls back all
|
||||
active transactions, disconnects clients immediately and thus shuts
|
||||
down cleanly. If that does not work, shutdown is attempted again in
|
||||
"immediate" mode, which can leave the cluster in an inconsistent state
|
||||
and thus will lead to a recovery run at the next start. If this still
|
||||
does not help, the postmaster process is killed. Exits with 0 on
|
||||
success, with 2 if the server is not running, and with 1 on other
|
||||
failure conditions. This mode should only be used when the machine is
|
||||
about to be shut down.
|
||||
|
||||
Many database parameters can be adjusted while the database is
|
||||
running. Just modify /etc/postgresql/9.1/main/postgresql.conf and
|
||||
issue a reload. The logs will note the change.
|
||||
|
||||
#### Debugging issues starting postgres
|
||||
|
||||
pg_ctlcluster often doesn't give you any information on why the
|
||||
database failed to start. It may tell you to check the logs, but you
|
||||
won't find any information there. pg_ctlcluster runs the following
|
||||
command underneath when it actually goes to start Postgres:
|
||||
|
||||
```
|
||||
/usr/lib/postgresql/9.1/bin/pg_ctl start -D /var/lib/postgresql/9.1/main -s -o '-c config_file="/etc/postgresql/9.1/main/postgresql.conf"'
|
||||
```
|
||||
|
||||
Since pg_ctl doesn't redirect stdout or stderr, running the above can
|
||||
give you better diagnostic information. However, you might want to
|
||||
stop Postgres and restart it using pg_ctlcluster after you've debugged
|
||||
with this approach, since it does bypass some of the work that
|
||||
pg_ctlcluster does.
|
||||
|
||||
### Scalability of Zulip
|
||||
|
||||
This section attempts to address the considerations involved with
|
||||
running Zulip with a large team (>1000 users).
|
||||
|
||||
* We recommend using a remote postgres database (see
|
||||
REMOTE_POSTGRES_HOST docs above) for isolation, though it is not
|
||||
required. In the following, we discuss a relatively simple
|
||||
* We recommend using a [remote postgres
|
||||
database](#postgres-database-details) for isolation, though it is
|
||||
not required. In the following, we discuss a relatively simple
|
||||
configuration with two types of servers: application servers
|
||||
(running Django, Tornado, RabbitMQ, Redis, Memcached, etc.) and
|
||||
database servers.
|
||||
|
@ -947,10 +853,25 @@ hostname/DNS side of the configuration. Suggestions for how to
|
|||
improve this SSO setup documentation are very welcome!
|
||||
|
||||
|
||||
Remote Postgresql database
|
||||
==========================
|
||||
Postgres database details
|
||||
=========================
|
||||
|
||||
If you want to use a remote Postgresql database, you should configure the information about the connection with the server. You need a user called "zulip" in your database server. You can configure these options in /etc/zulip/settings.py
|
||||
#### Remote Postgres database
|
||||
|
||||
This is a bit annoying to setup, but you can configure Zulip to use a
|
||||
dedicated postgres server by setting the `REMOTE_POSTGRES_HOST`
|
||||
variable in /etc/zulip/settings.py, and configuring Postgres
|
||||
certificate authentication (see
|
||||
http://www.postgresql.org/docs/9.1/static/ssl-tcp.html and
|
||||
http://www.postgresql.org/docs/9.1/static/libpq-ssl.html for
|
||||
documentation on how to set this up and deploy the certificates) to
|
||||
make the DATABASES configuration in `zproject/settings.py` work (or
|
||||
override that configuration).
|
||||
|
||||
If you want to use a remote Postgresql database, you should configure
|
||||
the information about the connection with the server. You need a user
|
||||
called "zulip" in your database server. You can configure these
|
||||
options in /etc/zulip/settings.py:
|
||||
|
||||
* REMOTE_POSTGRES_HOST: Name or IP address of the remote host
|
||||
* REMOTE_POSTGRES_SSLMODE: SSL Mode used to connect to the server, different options you can use are:
|
||||
|
@ -967,10 +888,101 @@ Then you should specify the password of the user zulip for the database in /etc/
|
|||
postgres_password = xxxx
|
||||
```
|
||||
|
||||
Finally you can stop your database in the zulip server to save some memory, you can do it with:
|
||||
Finally, you can stop your database on the Zulip server via:
|
||||
|
||||
```
|
||||
sudo service postgresql stop
|
||||
sudo update-rc.d postgresql disable
|
||||
```
|
||||
|
||||
In future versions of this feature, we'd like to implement and
|
||||
document how to the remote postgres database server itself
|
||||
automatically by using the Zulip install script with a different set
|
||||
of puppet manifests than the all-in-one feature; if you're interested
|
||||
in working on this, post to the Zulip development mailing list and we
|
||||
can give you some tips.
|
||||
|
||||
#### Debugging postgres database issues
|
||||
|
||||
When debugging postgres issues, in addition to the standard `pg_top`
|
||||
tool, often it can be useful to use this query:
|
||||
|
||||
```
|
||||
SELECT procpid,waiting,query_start,current_query FROM pg_stat_activity ORDER BY procpid;
|
||||
```
|
||||
|
||||
which shows the currently running backends and their activity. This is
|
||||
similar to the pg_top output, with the added advantage of showing the
|
||||
complete query, which can be valuable in debugging.
|
||||
|
||||
To stop a runaway query, you can run `SELECT pg_cancel_backend(pid
|
||||
int)` or `SELECT pg_terminate_backend(pid int)` as the 'postgres'
|
||||
user. The former cancels the backend's current query and the latter
|
||||
terminates the backend process. They are implemented by sending SIGINT
|
||||
and SIGTERM to the processes, respectively. We recommend against
|
||||
sending a Postgres process SIGKILL. Doing so will cause the database
|
||||
to kill all current connections, roll back any pending transactions,
|
||||
and enter recovery mode.
|
||||
|
||||
#### Stopping the Zulip postgres database
|
||||
|
||||
To start or stop postgres manually, use the pg_ctlcluster command:
|
||||
|
||||
```
|
||||
pg_ctlcluster 9.1 [--force] main {start|stop|restart|reload}
|
||||
```
|
||||
|
||||
By default, using stop uses "smart" mode, which waits for all clients
|
||||
to disconnect before shutting down the database. This can take
|
||||
prohibitively long. If you use the --force option with stop,
|
||||
pg_ctlcluster will try to use the "fast" mode for shutting
|
||||
down. "Fast" mode is described by the manpage thusly:
|
||||
|
||||
With the --force option the "fast" mode is used which rolls back all
|
||||
active transactions, disconnects clients immediately and thus shuts
|
||||
down cleanly. If that does not work, shutdown is attempted again in
|
||||
"immediate" mode, which can leave the cluster in an inconsistent state
|
||||
and thus will lead to a recovery run at the next start. If this still
|
||||
does not help, the postmaster process is killed. Exits with 0 on
|
||||
success, with 2 if the server is not running, and with 1 on other
|
||||
failure conditions. This mode should only be used when the machine is
|
||||
about to be shut down.
|
||||
|
||||
Many database parameters can be adjusted while the database is
|
||||
running. Just modify /etc/postgresql/9.1/main/postgresql.conf and
|
||||
issue a reload. The logs will note the change.
|
||||
|
||||
#### Debugging issues starting postgres
|
||||
|
||||
pg_ctlcluster often doesn't give you any information on why the
|
||||
database failed to start. It may tell you to check the logs, but you
|
||||
won't find any information there. pg_ctlcluster runs the following
|
||||
command underneath when it actually goes to start Postgres:
|
||||
|
||||
```
|
||||
/usr/lib/postgresql/9.1/bin/pg_ctl start -D /var/lib/postgresql/9.1/main -s -o '-c config_file="/etc/postgresql/9.1/main/postgresql.conf"'
|
||||
```
|
||||
|
||||
Since pg_ctl doesn't redirect stdout or stderr, running the above can
|
||||
give you better diagnostic information. However, you might want to
|
||||
stop Postgres and restart it using pg_ctlcluster after you've debugged
|
||||
with this approach, since it does bypass some of the work that
|
||||
pg_ctlcluster does.
|
||||
|
||||
|
||||
#### Postgres Vacuuming alerts
|
||||
|
||||
The `autovac_freeze` postgres alert from `check_postgres` is
|
||||
particularly important. This alert indicates that the age (in terms
|
||||
of number of transactions) of the oldest transaction id (XID) is
|
||||
getting close to the `autovacuum_freeze_max_age` setting. When the
|
||||
oldest XID hits that age, Postgres will force a VACUUM operation,
|
||||
which can often lead to sudden downtime until the operation finishes.
|
||||
If it did not do this and the age of the oldest XID reached 2 billion,
|
||||
transaction id wraparound would occur and there would be data loss.
|
||||
To clear the nagios alert, perform a `VACUUM` in each indicated
|
||||
database as a database superuser (`postgres`).
|
||||
|
||||
See
|
||||
http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND
|
||||
for more details on postgres vacuuming.
|
||||
|
|
Loading…
Reference in New Issue