postgresql: Support replication on PostgreSQL >= 11, document.

PostgreSQL 11 and below used a configuration file named
`recovery.conf` to manage replicas and standbys; support for this was
removed in PostgreSQL 12[1], and the configuration parameters were
moved into the main `postgresql.conf`.

Add `zulip.conf` settings for the primary server hostname and
replication username, so that the complete `postgresql.conf`
configuration on PostgreSQL 14 can continue to be managed, even when
replication is enabled.  For consistency, also begin writing out the
`recovery.conf` for PostgreSQL 11 and below.

In the configuration for PostgreSQL 12 and later, the `wal_level =
hot_standby` setting is removed, as `hot_standby` is equivalent to
`replica`, which is the default value[2].  Similarly, the
`hot_standby = on` setting is removed, as `on` is also the default[3].

Documentation is added for these features, and the commentary on the
"Export and Import" page referencing files under `puppet/zulip_ops/`
is removed, as those files no longer have any replication-specific
configuration.

[1]: https://www.postgresql.org/docs/current/recovery-config.html
[2]: https://www.postgresql.org/docs/12/runtime-config-wal.html#GUC-WAL-LEVEL
[3]: https://www.postgresql.org/docs/12/runtime-config-replication.html#GUC-HOT-STANDBY
Alex Vandiver 2021-11-19 15:33:41 -08:00 committed by Tim Abbott
parent 7d3399a970
commit cb2d0ff32b
8 changed files with 111 additions and 37 deletions

@@ -497,6 +497,40 @@ The key configuration options are, for the `/json/events` and
 with multiple IPs for your Zulip machine; sometimes this happens with
 IPv6 configuration).
 
+## PostgreSQL warm standby
+
+Zulip's configuration allows for [warm standby database
+replicas][warm-standby] as a disaster recovery solution; see the
+linked PostgreSQL documentation for details on this type of
+deployment. Zulip's configuration leverages `wal-g`, our [database
+backup solution][wal-g], and thus requires that it be configured for
+the primary and all secondary warm standby replicas.
+
+The primary should have log-shipping enabled, with:
+
+```ini
+[postgresql]
+replication = yes
+```
+
+Warm spare replicas should have log-shipping enabled, and their
+primary replica and replication username configured:
+
+```ini
+[postgresql]
+replication = yes
+replication_user = replicator
+replication_primary = hostname-of-primary.example.com
+```
+
+The `postgres` user on the replica will need to be able to
+authenticate as the `replicator` user, which may require further
+configuration of `pg_hba.conf` and client certificates on the
+replica.
+
+[warm-standby]: https://www.postgresql.org/docs/current/warm-standby.html
+[wal-g]: ../production/export-and-import.html#backup-details
+
 ## System and deployment configuration
 
 The file `/etc/zulip/zulip.conf` is used to configure properties of
@@ -636,9 +670,23 @@ setting](https://www.postgresql.org/docs/current/runtime-config-query.html#GUC-R
 
 #### `replication`
 
-Set to non-empty to enable replication to enable [streaming
-replication between PostgreSQL
-servers](../production/export-and-import.html#postgresql-streaming-replication).
+Set to non-empty to enable replication to enable [log shipping
+replication between PostgreSQL servers](#postgresql-warm-standby).
+This should be enabled on the primary, as well as any replicas, and
+further requires configuration of
+[wal-g](../production/export-and-import.html#backup-details).
+
+#### `replication_primary`
+
+On the [warm standby replicas](#postgresql-warm-standby), set to the
+hostname of the primary PostgreSQL server that streaming replication
+should be done from.
+
+#### `replication_user`
+
+On the [warm standby replicas](#postgresql-warm-standby), set to the
+username that the host should authenticate to the primary PostgreSQL
+server as, for streaming replication.
 
 #### `ssl_ca_file`
 
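
The three `[postgresql]` settings documented above together determine what role a host plays. As a rough illustration only, the Ruby sketch below classifies a host from the text of a `zulip.conf`-style file. `parse_ini` and `replication_role` are hypothetical helpers written for this note, not part of Zulip, and the rule that a host with only `replication` set is the primary is an assumption drawn from the documentation text in this diff.

```ruby
# Tiny INI reader, for illustration only; this is not how Zulip reads zulip.conf.
def parse_ini(text)
  conf = Hash.new { |hash, key| hash[key] = {} }
  section = nil
  text.each_line do |raw|
    line = raw.strip
    next if line.empty? || line.start_with?('#', ';')

    if (m = line.match(/\A\[(.+)\]\z/))
      section = m[1]
    elsif section && (m = line.match(/\A([^=]+?)\s*=\s*(.*)\z/))
      conf[section][m[1]] = m[2]
    end
  end
  conf
end

# Hypothetical helper: classify a host from its [postgresql] settings,
# using the same non-empty checks that the templates in this commit use.
def replication_role(conf)
  pg = conf.fetch('postgresql', {})
  primary = pg.fetch('replication_primary', '')
  user = pg.fetch('replication_user', '')
  return :warm_standby unless primary.empty? || user.empty?
  return :primary unless pg.fetch('replication', '').empty?

  :none
end

replica_conf = parse_ini(<<~INI)
  [postgresql]
  replication = yes
  replication_user = replicator
  replication_primary = hostname-of-primary.example.com
INI

primary_conf = parse_ini("[postgresql]\nreplication = yes\n")

puts replication_role(replica_conf)
puts replication_role(primary_conf)
puts replication_role(parse_ini(''))
```

A host that sets only one of `replication_primary` or `replication_user` is not treated as a standby here, which matches the `&&` in the template conditionals.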

@@ -48,8 +48,8 @@ service (or back):
 decommissioning a Zulip organization.
 
 - It's possible to set up [PostgreSQL streaming
-replication](#postgresql-streaming-replication) and the [S3 file
-upload
+replication](../production/deployment.html#postgresql-warm-standby)
+and the [S3 file upload
 backend](../production/upload-backends.html#s3-backend-configuration)
 as part of a high availability environment.
 
@@ -229,19 +229,6 @@ confirm that your backups are working. You may also want to monitor
 that they are up to date using the Nagios plugin at:
 `puppet/zulip/files/nagios_plugins/zulip_postgresql_backups/check_postgresql_backup`.
 
-## PostgreSQL streaming replication
-
-Zulip has database configuration for using PostgreSQL streaming
-replication. You can see the configuration in these files:
-
-- `puppet/zulip_ops/manifests/profile/postgresql.pp`
-- `puppet/zulip_ops/files/postgresql/*`
-
-We use this configuration for Zulip Cloud, and it works well in
-production, but it's not fully generic. Contributions to make it a
-supported and documented option for other installations are
-appreciated.
-
 ## Data export
 
 Zulip's powerful data export tool is designed to handle migration of a

@@ -205,9 +205,9 @@ installing Zulip with a dedicated database server.
 single-server installation with 16GB of RAM, 4 cores (essentially
 always idle), and its database was using about 100GB of disk.
 
-- **Disaster recovery:** One can easily run a hot spare application
-server and a hot spare database (using [PostgreSQL streaming
-replication][streaming-replication]). Make sure the hot spare
+- **Disaster recovery:** One can easily run a warm spare application
+server and a warm spare database (using [PostgreSQL warm standby
+replicas][streaming-replication]). Make sure the warm spare
 application server has copies of `/etc/zulip` and you're either
 syncing `LOCAL_UPLOADS_DIR` or using the [S3 file uploads
 backend][s3-uploads].
@@ -233,5 +233,5 @@ impact Zulip's scalability, this [performance and scalability design
 document](../subsystems/performance.md) may also be of interest.
 
 [s3-uploads]: ../production/upload-backends.html#s3-backend-configuration
-[streaming-replication]: ../production/export-and-import.html#postgresql-streaming-replication
+[streaming-replication]: ../production/deployment.html#postgresql-warm-standby
 [contact-support]: https://zulip.com/help/contact-support

@@ -10,9 +10,13 @@ class zulip::profile::postgresql {
 
   $random_page_cost = zulipconf('postgresql', 'random_page_cost', undef)
   $effective_io_concurrency = zulipconf('postgresql', 'effective_io_concurrency', undef)
-  $replication = zulipconf('postgresql', 'replication', undef)
+
   $listen_addresses = zulipconf('postgresql', 'listen_addresses', undef)
 
+  $replication = zulipconf('postgresql', 'replication', undef)
+  $replication_primary = zulipconf('postgresql', 'replication_primary', undef)
+  $replication_user = zulipconf('postgresql', 'replication_user', undef)
+
   $ssl_cert_file = zulipconf('postgresql', 'ssl_cert_file', undef)
   $ssl_key_file = zulipconf('postgresql', 'ssl_key_file', undef)
   $ssl_ca_file = zulipconf('postgresql', 'ssl_ca_file', undef)
@@ -33,6 +37,31 @@ class zulip::profile::postgresql {
     content => template("zulip/postgresql/${zulip::postgresql_common::version}/postgresql.conf.template.erb"),
   }
 
+  if $replication_primary != '' and $replication_user != '' {
+    if $zulip::postgresql_common::version in ['10', '11'] {
+      # PostgreSQL 11 and below used a recovery.conf file for replication
+      file { "${zulip::postgresql_base::postgresql_confdir}/recovery.conf":
+        ensure  => file,
+        require => Package[$zulip::postgresql_base::postgresql],
+        owner   => 'postgres',
+        group   => 'postgres',
+        mode    => '0644',
+        content => template('zulip/postgresql/recovery.conf.template.erb'),
+      }
+    } else {
+      # PostgreSQL 12 and above use the presence of a standby.signal
+      # file to trigger replication
+      file { "${zulip::postgresql_base::postgresql_confdir}/standby.signal":
+        ensure  => file,
+        require => Package[$zulip::postgresql_base::postgresql],
+        owner   => 'postgres',
+        group   => 'postgres',
+        mode    => '0644',
+        content => '',
+      }
+    }
+  }
+
   exec { $zulip::postgresql_base::postgresql_restart:
     require => Package[$zulip::postgresql_base::postgresql],
     refreshonly => true,
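
The version branch in this manifest can be read as a small decision function. The Ruby sketch below restates it outside Puppet. `standby_marker_file` is a hypothetical name used only for this note, and unset settings are modeled as empty strings, an assumption that mirrors the `!= ''` comparisons in the manifest.

```ruby
# Hypothetical helper mirroring the Puppet conditional in this hunk: returns
# the name of the file the manifest would manage to mark a host as a standby,
# or nil when the host is not configured as a replica.
def standby_marker_file(version, replication_primary, replication_user)
  # Both settings must be non-empty before any standby file is written.
  return nil if replication_primary.empty? || replication_user.empty?

  # Versions listed in the manifest use recovery.conf; later ones use standby.signal.
  %w[10 11].include?(version) ? 'recovery.conf' : 'standby.signal'
end

puts standby_marker_file('11', 'hostname-of-primary.example.com', 'replicator')
puts standby_marker_file('14', 'hostname-of-primary.example.com', 'replicator')
p standby_marker_file('14', '', '')
```

The sketch returns only a file name. The directory the file is written to is chosen by the manifest and is not modeled here.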

@@ -787,15 +787,15 @@ effective_io_concurrency = <%= @effective_io_concurrency %>
 listen_addresses = <%= @listen_addresses %>
 <% end -%>
 
-<% if @replication != '' -%>
-# Primary replication settings (ignored on replica)
-wal_level = hot_standby
+<% if @replication != '' || (@replication_primary != '' && @replication_user != '') -%>
+# Replication
 max_wal_senders = 5
 archive_mode = on
 archive_command = '/usr/bin/timeout 10m /usr/local/bin/env-wal-g wal-push %p'
-
-# Replica settings (ignored on primary)
-hot_standby = on
+restore_command = '/usr/local/bin/env-wal-g wal-fetch "%f" "%p"'
+<% if @replication_primary != '' && @replication_user != '' -%>
+primary_conninfo = 'host=<%= @replication_primary %> user=<%= @replication_user %>'
+<% end -%>
 <% end -%>
 
 <% if @ssl_cert_file != '' -%>

@@ -818,15 +818,15 @@ effective_io_concurrency = <%= @effective_io_concurrency %>
 listen_addresses = <%= @listen_addresses %>
 <% end -%>
 
-<% if @replication != '' -%>
-# Primary replication settings (ignored on replica)
-wal_level = hot_standby
+<% if @replication != '' || (@replication_primary != '' && @replication_user != '') -%>
+# Replication
 max_wal_senders = 5
 archive_mode = on
 archive_command = '/usr/bin/timeout 10m /usr/local/bin/env-wal-g wal-push %p'
-
-# Replica settings (ignored on primary)
-hot_standby = on
+restore_command = '/usr/local/bin/env-wal-g wal-fetch "%f" "%p"'
+<% if @replication_primary != '' && @replication_user != '' -%>
+primary_conninfo = 'host=<%= @replication_primary %> user=<%= @replication_user %>'
+<% end -%>
 <% end -%>
 
 <% if @ssl_cert_file != '' -%>

@@ -839,11 +839,15 @@ effective_io_concurrency = <%= @effective_io_concurrency %>
 listen_addresses = <%= @listen_addresses %>
 <% end -%>
 
-<% if @replication != '' -%>
-# Primary replication settings (ignored on replica)
+<% if @replication != '' || (@replication_primary != '' && @replication_user != '') -%>
+# Replication
 max_wal_senders = 5
 archive_mode = on
 archive_command = '/usr/bin/timeout 10m /usr/local/bin/env-wal-g wal-push %p'
+restore_command = '/usr/local/bin/env-wal-g wal-fetch "%f" "%p"'
+<% if @replication_primary != '' && @replication_user != '' -%>
+primary_conninfo = 'host=<%= @replication_primary %> user=<%= @replication_user %>'
+<% end -%>
 <% end -%>
 
 <% if @ssl_cert_file != '' -%>

@@ -0,0 +1,6 @@
+standby_mode = on
+restore_command = '/usr/local/bin/env-wal-g wal-fetch "%f" "%p"'
+recovery_target_timeline = 'latest'
+<% if @replication_primary != '' && @replication_user != '' -%>
+primary_conninfo = 'host=<%= @replication_primary %> user=<%= @replication_user %>'
+<% end -%>
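
To see what the `primary_conninfo` conditional in these templates produces, the fragment can be rendered with Ruby's standard `erb` library outside Puppet. This is a minimal sketch under stated assumptions: `TemplateScope` is a hypothetical stand-in for the scope Puppet provides to a template, unset settings are modeled as empty strings to match the `!= ''` checks, and the hostname and username are the example values from the documentation in this commit.

```ruby
require 'erb'

# Fragment copied from the templates in this commit.
TEMPLATE = <<~'TPL'
  restore_command = '/usr/local/bin/env-wal-g wal-fetch "%f" "%p"'
  <% if @replication_primary != '' && @replication_user != '' -%>
  primary_conninfo = 'host=<%= @replication_primary %> user=<%= @replication_user %>'
  <% end -%>
TPL

# Hypothetical stand-in for Puppet's template scope: settings are exposed
# as instance variables, and unset values are assumed to be empty strings.
class TemplateScope
  def initialize(replication_primary: '', replication_user: '')
    @replication_primary = replication_primary
    @replication_user = replication_user
  end

  def render(template)
    # trim_mode '-' drops the newline after each "-%>" tag.
    ERB.new(template, trim_mode: '-').result(binding)
  end
end

replica = TemplateScope.new(
  replication_primary: 'hostname-of-primary.example.com',
  replication_user: 'replicator'
).render(TEMPLATE)
primary = TemplateScope.new.render(TEMPLATE)

puts replica
puts '---'
puts primary
```

With both settings present the rendered text gains a `primary_conninfo` line; with either one empty, only `restore_command` is emitted.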