diff --git a/docs/production/deployment.md b/docs/production/deployment.md
index 2af164e508..38cba2824b 100644
--- a/docs/production/deployment.md
+++ b/docs/production/deployment.md
@@ -497,6 +497,40 @@ The key configuration options are, for the `/json/events` and
   with multiple IPs for your Zulip machine; sometimes this happens with
   IPv6 configuration).
 
+## PostgreSQL warm standby
+
+Zulip's configuration allows for [warm standby database
+replicas][warm-standby] as a disaster recovery solution; see the
+linked PostgreSQL documentation for details on this type of
+deployment. Zulip's configuration leverages `wal-g`, our [database
+backup solution][wal-g], and thus requires that it be configured for
+the primary and all secondary warm standby replicas.
+
+The primary should have log-shipping enabled, with:
+
+```ini
+[postgresql]
+replication = yes
+```
+
+Warm spare replicas should have log-shipping enabled, and their
+primary replica and replication username configured:
+
+```ini
+[postgresql]
+replication = yes
+replication_user = replicator
+replication_primary = hostname-of-primary.example.com
+```
+
+The `postgres` user on the replica will need to be able to
+authenticate as the `replicator` user, which may require further
+configuration of `pg_hba.conf` and client certificates on the
+replica.
+
+[warm-standby]: https://www.postgresql.org/docs/current/warm-standby.html
+[wal-g]: ../production/export-and-import.html#backup-details
+
 ## System and deployment configuration
 
 The file `/etc/zulip/zulip.conf` is used to configure properties of
@@ -636,9 +670,23 @@ setting](https://www.postgresql.org/docs/current/runtime-config-query.html#GUC-R
 
 #### `replication`
 
-Set to non-empty to enable replication to enable [streaming
-replication between PostgreSQL
-servers](../production/export-and-import.html#postgresql-streaming-replication).
+Set to non-empty to enable [log shipping
+replication between PostgreSQL servers](#postgresql-warm-standby).
+This should be enabled on the primary, as well as any replicas, and
+further requires configuration of
+[wal-g](../production/export-and-import.html#backup-details).
+
+#### `replication_primary`
+
+On the [warm standby replicas](#postgresql-warm-standby), set to the
+hostname of the primary PostgreSQL server that streaming replication
+should be done from.
+
+#### `replication_user`
+
+On the [warm standby replicas](#postgresql-warm-standby), set to the
+username that the host should authenticate to the primary PostgreSQL
+server as, for streaming replication.
 
 #### `ssl_ca_file`
 
diff --git a/docs/production/export-and-import.md b/docs/production/export-and-import.md
index 53191a3c83..9f087a2ec1 100644
--- a/docs/production/export-and-import.md
+++ b/docs/production/export-and-import.md
@@ -48,8 +48,8 @@ service (or back):
   decommissioning a Zulip organization.
 
 - It's possible to set up [PostgreSQL streaming
-  replication](#postgresql-streaming-replication) and the [S3 file
-  upload
+  replication](../production/deployment.html#postgresql-warm-standby)
+  and the [S3 file upload
   backend](../production/upload-backends.html#s3-backend-configuration)
   as part of a high availability environment.
 
@@ -229,19 +229,6 @@ confirm that your backups are working. You may also want to monitor
 that they are up to date using the Nagios plugin at:
 `puppet/zulip/files/nagios_plugins/zulip_postgresql_backups/check_postgresql_backup`.
 
-## PostgreSQL streaming replication
-
-Zulip has database configuration for using PostgreSQL streaming
-replication. You can see the configuration in these files:
-
-- `puppet/zulip_ops/manifests/profile/postgresql.pp`
-- `puppet/zulip_ops/files/postgresql/*`
-
-We use this configuration for Zulip Cloud, and it works well in
-production, but it's not fully generic. Contributions to make it a
-supported and documented option for other installations are
-appreciated.
-
 ## Data export
 
 Zulip's powerful data export tool is designed to handle migration of a
diff --git a/docs/production/requirements.md b/docs/production/requirements.md
index 2abfef45b5..f7a3e63c27 100644
--- a/docs/production/requirements.md
+++ b/docs/production/requirements.md
@@ -205,9 +205,9 @@ installing Zulip with a dedicated database server.
   single-server installation with 16GB of RAM, 4 cores (essentially
   always idle), and its database was using about 100GB of disk.
 
-- **Disaster recovery:** One can easily run a hot spare application
-  server and a hot spare database (using [PostgreSQL streaming
-  replication][streaming-replication]). Make sure the hot spare
+- **Disaster recovery:** One can easily run a warm spare application
+  server and a warm spare database (using [PostgreSQL warm standby
+  replicas][streaming-replication]). Make sure the warm spare
   application server has copies of `/etc/zulip` and you're either
   syncing `LOCAL_UPLOADS_DIR` or using the [S3 file uploads
   backend][s3-uploads].
@@ -233,5 +233,5 @@ impact Zulip's scalability, this [performance and scalability design
 document](../subsystems/performance.md) may also be of interest.
 
 [s3-uploads]: ../production/upload-backends.html#s3-backend-configuration
-[streaming-replication]: ../production/export-and-import.html#postgresql-streaming-replication
+[streaming-replication]: ../production/deployment.html#postgresql-warm-standby
 [contact-support]: https://zulip.com/help/contact-support
diff --git a/puppet/zulip/manifests/profile/postgresql.pp b/puppet/zulip/manifests/profile/postgresql.pp
index 9ed1ef63c8..c12d60a762 100644
--- a/puppet/zulip/manifests/profile/postgresql.pp
+++ b/puppet/zulip/manifests/profile/postgresql.pp
@@ -10,9 +10,13 @@ class zulip::profile::postgresql {
 
   $random_page_cost = zulipconf('postgresql', 'random_page_cost', undef)
   $effective_io_concurrency = zulipconf('postgresql', 'effective_io_concurrency', undef)
-  $replication = zulipconf('postgresql', 'replication', undef)
+
   $listen_addresses = zulipconf('postgresql', 'listen_addresses', undef)
 
+  $replication = zulipconf('postgresql', 'replication', undef)
+  $replication_primary = zulipconf('postgresql', 'replication_primary', undef)
+  $replication_user = zulipconf('postgresql', 'replication_user', undef)
+
   $ssl_cert_file = zulipconf('postgresql', 'ssl_cert_file', undef)
   $ssl_key_file = zulipconf('postgresql', 'ssl_key_file', undef)
   $ssl_ca_file = zulipconf('postgresql', 'ssl_ca_file', undef)
@@ -33,6 +37,31 @@ class zulip::profile::postgresql {
     content => template("zulip/postgresql/${zulip::postgresql_common::version}/postgresql.conf.template.erb"),
   }
 
+  if $replication_primary != '' and $replication_user != '' {
+    if $zulip::postgresql_common::version in ['10', '11'] {
+      # PostgreSQL 11 and below used a recovery.conf file for replication
+      file { "${zulip::postgresql_base::postgresql_confdir}/recovery.conf":
+        ensure  => file,
+        require => Package[$zulip::postgresql_base::postgresql],
+        owner   => 'postgres',
+        group   => 'postgres',
+        mode    => '0644',
+        content => template('zulip/postgresql/recovery.conf.template.erb'),
+      }
+    } else {
+      # PostgreSQL 12 and above use the presence of a standby.signal
+      # file to trigger replication
+      file { "${zulip::postgresql_base::postgresql_confdir}/standby.signal":
+        ensure  => file,
+        require => Package[$zulip::postgresql_base::postgresql],
+        owner   => 'postgres',
+        group   => 'postgres',
+        mode    => '0644',
+        content => '',
+      }
+    }
+  }
+
   exec { $zulip::postgresql_base::postgresql_restart:
     require     => Package[$zulip::postgresql_base::postgresql],
     refreshonly => true,
diff --git a/puppet/zulip/templates/postgresql/12/postgresql.conf.template.erb b/puppet/zulip/templates/postgresql/12/postgresql.conf.template.erb
index a9db00d5a3..be3a659ac8 100644
--- a/puppet/zulip/templates/postgresql/12/postgresql.conf.template.erb
+++ b/puppet/zulip/templates/postgresql/12/postgresql.conf.template.erb
@@ -787,15 +787,15 @@ effective_io_concurrency = <%= @effective_io_concurrency %>
 listen_addresses = <%= @listen_addresses %>
 <% end -%>
 
-<% if @replication != '' -%>
-# Primary replication settings (ignored on replica)
-wal_level = hot_standby
+<% if @replication != '' || (@replication_primary != '' && @replication_user != '') -%>
+# Replication
 max_wal_senders = 5
 archive_mode = on
 archive_command = '/usr/bin/timeout 10m /usr/local/bin/env-wal-g wal-push %p'
-
-# Replica settings (ignored on primary)
-hot_standby = on
+restore_command = '/usr/local/bin/env-wal-g wal-fetch "%f" "%p"'
+<% if @replication_primary != '' && @replication_user != '' -%>
+primary_conninfo = 'host=<%= @replication_primary %> user=<%= @replication_user %>'
+<% end -%>
 <% end -%>
 
 <% if @ssl_cert_file != '' -%>
diff --git a/puppet/zulip/templates/postgresql/13/postgresql.conf.template.erb b/puppet/zulip/templates/postgresql/13/postgresql.conf.template.erb
index 1b3aef7b36..80dd31a241 100644
--- a/puppet/zulip/templates/postgresql/13/postgresql.conf.template.erb
+++ b/puppet/zulip/templates/postgresql/13/postgresql.conf.template.erb
@@ -818,15 +818,15 @@ effective_io_concurrency = <%= @effective_io_concurrency %>
 listen_addresses = <%= @listen_addresses %>
 <% end -%>
 
-<% if @replication != '' -%>
-# Primary replication settings (ignored on replica)
-wal_level = hot_standby
+<% if @replication != '' || (@replication_primary != '' && @replication_user != '') -%>
+# Replication
 max_wal_senders = 5
 archive_mode = on
 archive_command = '/usr/bin/timeout 10m /usr/local/bin/env-wal-g wal-push %p'
-
-# Replica settings (ignored on primary)
-hot_standby = on
+restore_command = '/usr/local/bin/env-wal-g wal-fetch "%f" "%p"'
+<% if @replication_primary != '' && @replication_user != '' -%>
+primary_conninfo = 'host=<%= @replication_primary %> user=<%= @replication_user %>'
+<% end -%>
 <% end -%>
 
 <% if @ssl_cert_file != '' -%>
diff --git a/puppet/zulip/templates/postgresql/14/postgresql.conf.template.erb b/puppet/zulip/templates/postgresql/14/postgresql.conf.template.erb
index 9df238e2f5..030624fe65 100644
--- a/puppet/zulip/templates/postgresql/14/postgresql.conf.template.erb
+++ b/puppet/zulip/templates/postgresql/14/postgresql.conf.template.erb
@@ -839,11 +839,15 @@ effective_io_concurrency = <%= @effective_io_concurrency %>
 listen_addresses = <%= @listen_addresses %>
 <% end -%>
 
-<% if @replication != '' -%>
-# Primary replication settings (ignored on replica)
+<% if @replication != '' || (@replication_primary != '' && @replication_user != '') -%>
+# Replication
 max_wal_senders = 5
 archive_mode = on
 archive_command = '/usr/bin/timeout 10m /usr/local/bin/env-wal-g wal-push %p'
+restore_command = '/usr/local/bin/env-wal-g wal-fetch "%f" "%p"'
+<% if @replication_primary != '' && @replication_user != '' -%>
+primary_conninfo = 'host=<%= @replication_primary %> user=<%= @replication_user %>'
+<% end -%>
 <% end -%>
 
 <% if @ssl_cert_file != '' -%>
diff --git a/puppet/zulip/templates/postgresql/recovery.conf.template.erb b/puppet/zulip/templates/postgresql/recovery.conf.template.erb
new file mode 100644
index 0000000000..e23549b0bf
--- /dev/null
+++ b/puppet/zulip/templates/postgresql/recovery.conf.template.erb
@@ -0,0 +1,6 @@
+standby_mode = on
+restore_command = '/usr/local/bin/env-wal-g wal-fetch "%f" "%p"'
+recovery_target_timeline = 'latest'
+<% if @replication_primary != '' && @replication_user != '' -%>
+primary_conninfo = 'host=<%= @replication_primary %> user=<%= @replication_user %>'
+<% end -%>
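
Reviewer aid, not part of the patch: the sketch below renders the new replication stanza from the `postgresql.conf.template.erb` hunks with Ruby's standard `erb` library, to show which settings a primary and a warm standby would each receive. The `ReplicationSettings` class and the `REPLICATION_FRAGMENT` constant are hypothetical names introduced only for this illustration, and unset `zulip.conf` values are assumed to reach the template as empty strings; how Puppet actually passes `undef` into the template is not something this sketch verifies.

```ruby
require 'erb'

# Replication stanza copied from the postgresql.conf.template.erb hunks above.
REPLICATION_FRAGMENT = <<~'TEMPLATE'
  <% if @replication != '' || (@replication_primary != '' && @replication_user != '') -%>
  # Replication
  max_wal_senders = 5
  archive_mode = on
  archive_command = '/usr/bin/timeout 10m /usr/local/bin/env-wal-g wal-push %p'
  restore_command = '/usr/local/bin/env-wal-g wal-fetch "%f" "%p"'
  <% if @replication_primary != '' && @replication_user != '' -%>
  primary_conninfo = 'host=<%= @replication_primary %> user=<%= @replication_user %>'
  <% end -%>
  <% end -%>
TEMPLATE

# Hypothetical stand-in for the Puppet scope that supplies the template's
# instance variables. Assumption: unset values are modeled as ''.
class ReplicationSettings
  def initialize(replication: '', replication_primary: '', replication_user: '')
    @replication = replication
    @replication_primary = replication_primary
    @replication_user = replication_user
  end

  # Render the fragment; trim_mode '-' makes "-%>" swallow the trailing newline.
  def render
    ERB.new(REPLICATION_FRAGMENT, trim_mode: '-').result(binding)
  end
end

# Values taken from the warm standby example in docs/production/deployment.md.
standby = ReplicationSettings.new(
  replication: 'yes',
  replication_user: 'replicator',
  replication_primary: 'hostname-of-primary.example.com'
)
puts standby.render
```

Under these assumptions, a primary configured with only `replication = yes` gets the archive and restore commands but no `primary_conninfo` line, while a standby additionally gets `primary_conninfo` pointing at the configured primary.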