zulip/scripts/setup/upgrade-postgres

#!/usr/bin/env bash
set -eo pipefail

if [ "$EUID" -ne 0 ]; then
    echo "Error: This script must be run as root" >&2
    exit 1
fi

UPGRADE_TO=${1:-12}
UPGRADE_FROM=$(crudini --get /etc/zulip/zulip.conf postgresql version)
ZULIP_PATH="$(dirname "$0")/../.."

if [ "$UPGRADE_TO" = "$UPGRADE_FROM" ]; then
    echo "Already running PostgreSQL $UPGRADE_TO!"
    exit 1
fi

set -x

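# Install the new PostgreSQL version.  The Debian/Ubuntu packaging may
# create an empty "main" cluster for it on install; drop that cluster
# so that pg_upgradecluster can create the target cluster itself.
"$ZULIP_PATH"/scripts/lib/setup-apt-repo
apt-get install -y "postgresql-$UPGRADE_TO"
if pg_lsclusters -h | grep -qE "^$UPGRADE_TO\s+main\b"; then
    pg_dropcluster "$UPGRADE_TO" main --stop
fi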

(
    # Two-stage application of Puppet; we apply the bare-bones
    # PostgreSQL configuration first, so that FTS will be configured
    # prior to the pg_upgradecluster.
    TEMP_CONF_DIR=$(mktemp -d)
    # Clean up the temporary copy when this subshell exits, even on failure
    trap 'rm -rf "$TEMP_CONF_DIR"' EXIT
    cp /etc/zulip/zulip.conf "$TEMP_CONF_DIR"
    ZULIP_CONF="${TEMP_CONF_DIR}/zulip.conf"
    crudini --set "$ZULIP_CONF" postgresql version "$UPGRADE_TO"
    crudini --set "$ZULIP_CONF" machine puppet_classes zulip::profile::base,zulip::postgresql_base
    touch "/usr/share/postgresql/$UPGRADE_TO/pgroonga_setup.sql.applied"

    "$ZULIP_PATH"/scripts/zulip-puppet-apply -f --config "$ZULIP_CONF"
)

# Capture the output so we can find the path to the post-upgrade scripts
UPGRADE_LOG=$(mktemp "/var/log/zulip/postgres-upgrade-$UPGRADE_FROM-$UPGRADE_TO.XXXXXXXXX.log")
pg_upgradecluster -v "$UPGRADE_TO" "$UPGRADE_FROM" main --method=upgrade --link | tee "$UPGRADE_LOG"
SCRIPTS_PATH=$(grep -o "/var/log/postgresql/pg_upgradecluster-$UPGRADE_FROM-$UPGRADE_TO-main.*" "$UPGRADE_LOG" || true)

# If the upgrade completed successfully, lock in the new version in
# our configuration immediately
crudini --set /etc/zulip/zulip.conf postgresql version "$UPGRADE_TO"

# Update the statistics
[ -n "$SCRIPTS_PATH" ] && su postgres -c "$SCRIPTS_PATH/analyze_new_cluster.sh"

# Start the database up cleanly
"$ZULIP_PATH"/scripts/zulip-puppet-apply -f

# Drop the old data, binaries, and scripts
pg_dropcluster "$UPGRADE_FROM" main
apt-get remove -y "postgresql-$UPGRADE_FROM"
if [ -n "$SCRIPTS_PATH" ]; then
    su postgres -c "$SCRIPTS_PATH/delete_old_cluster.sh"
    rm -rf "$SCRIPTS_PATH"
else
    set +x
    echo
    echo
    echo ">>>>> pg_upgradecluster succeeded, but post-upgrade scripts path could not"
    echo "      be parsed out!  Please read the pg_upgradecluster output to understand"
    echo "      the current status of your cluster:"
    echo "          $UPGRADE_LOG"
    echo "      and report this bug with the Postgres $UPGRADE_FROM -> $UPGRADE_TO upgrade to:"
    echo "          https://github.com/zulip/zulip/issues"
    echo
    echo
fi
upgrade: Add a tool to upgrade PostgreSQL. (2020-06-26 23:32:36 +02:00)
This is based on the existing steps in the documentation, with additional changes now that the PostgreSQL version is stored in `/etc/zulip/zulip.conf`.
Lines: `#!/usr/bin/env bash`, and everything from the root check (`if [ "$EUID" -ne 0 ]; then`) through the `pg_dropcluster "$UPGRADE_TO" main --stop` block.

upgrade-postgres: Catch failed pg_upgradecluster exit code. (2020-10-14 23:34:34 +02:00)
Because the command is part of a pipe sequence, the exitcode defaults to the last in the sequence, which is not the most important one here. Set pipefail, which sets the exit status to the exit code of the last program in the sequence to exit non-zero, or 0 if all succeeded. This prevents the upgrade from barreling onward and setting `postgres.version` improperly if the database upgrade step failed.
Lines: `set -eo pipefail`.
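
The effect of `pipefail` described in the `Catch failed pg_upgradecluster exit code` commit can be seen in isolation. The following is a minimal sketch, not part of the script: `failing_step` is a hypothetical stand-in for a failing `pg_upgradecluster`, and `cat` stands in for `tee`.

```shell
#!/usr/bin/env bash
# Illustration only: how `pipefail` changes the exit status of a pipeline.

# Hypothetical stand-in for a failing upgrade step: prints a line, exits 3.
failing_step() {
    echo "upgrade failed"
    return 3
}

# Without pipefail, the pipeline reports the status of its last command (cat).
set +o pipefail
status_without=0
failing_step | cat >/dev/null || status_without=$?

# With pipefail, it reports the last command that exited non-zero (failing_step).
set -o pipefail
status_with=0
failing_step | cat >/dev/null || status_with=$?

echo "without pipefail: $status_without"
echo "with pipefail:    $status_with"
```

Combined with `set -e`, the non-zero pipeline status is what stops the script before it records the new version in `zulip.conf`.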

puppet: Apply basic PostgreSQL configuration before pg_upgradecluster. (2020-07-07 01:39:37 +02:00)
Running `pg_upgradecluster` runs the `CREATE TEXT SEARCH DICTIONARY` and `CREATE TEXT SEARCH CONFIGURATION` from `zerver/migrations/0001_initial.py` on the new PostgreSQL cluster; this requires that the stopwords file and dictionary exist _prior_ to `pg_upgradecluster` being run.

This causes a minor dependency conflict -- we do not wish to duplicate the functionality from `zulip::postgres_appdb_base` which configures those files, but installing all of `zulip::postgres_appdb_tuned` will attempt to restart PostgreSQL -- which has not configured the cluster for the new version yet.

In order to split out configuration of the prerequisites for the application database, and the steps required to run it, we need to be able to apply only part of the puppet configuration. Use the newly-added `--config` argument to provide a more limited `zulip.conf` which only applies `zulip::postgres_appdb_base` to the new version of Postgres, creating the required tsearch data files.

This also preserves the property that a failure at any point prior to the `pg_upgradecluster` is easily recoverable, by re-running `zulip-puppet-apply`.
Lines: the subshell from `(` to `)`, except the lines listed under the next two commits.

docs: Fix more capitalization issues. (2020-10-23 02:43:28 +02:00)
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Lines: `# Two-stage application of Puppet; we apply the bare-bones` and `# PostgreSQL configuration first, so that FTS will be configured`.

puppet: Rename postgres_appdb to postgresql. (2020-10-20 04:10:17 +02:00)
There is only one PostgreSQL database; the "appdb" is irrelevant. Also use "postgresql," as it is the name of the software, whereas "postgres" is the name of the binary and the colloquial name. This is minor cleanup, but enabled by the other renames in the previous commit.
Lines: `crudini --set "$ZULIP_CONF" machine puppet_classes zulip::profile::base,zulip::postgresql_base`.
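
The pattern behind the limited `zulip.conf` (copy the configuration to a temporary directory, change the copy inside a subshell, and leave the real file alone) can be sketched on its own. This is an illustration under stated assumptions: `preview_with_version` is a hypothetical helper, `sed` stands in for `crudini --set` so that the sketch has no dependencies, and the result is printed rather than passed to `zulip-puppet-apply --config`.

```shell
#!/usr/bin/env bash
# Sketch: edit a throwaway copy of a config file inside a subshell.
set -eo pipefail

# Hypothetical helper. Arguments: path to the config file, new version string.
# Prints the edited copy; the original file is never modified.
preview_with_version() {
    local conf="$1" new_version="$2"
    (
        tmp_dir=$(mktemp -d)
        # Remove the temporary directory when the subshell exits, even on error.
        trap 'rm -rf "$tmp_dir"' EXIT
        tmp_conf="$tmp_dir/zulip.conf"
        # sed stands in for `crudini --set "$tmp_conf" postgresql version ...`.
        sed "s/^version = .*/version = $new_version/" "$conf" >"$tmp_conf"
        cat "$tmp_conf"
    )
}

# Demo on an invented two-line config file.
demo_conf=$(mktemp)
printf '[postgresql]\nversion = 11\n' >"$demo_conf"
preview_with_version "$demo_conf" 12
```

Because only the copy changes, a failure before the real upgrade leaves `/etc/zulip/zulip.conf` describing the cluster that is still running.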

upgrade: Use the in-place pg_upgrade, not a full dump/restore. (2020-07-10 01:24:46 +02:00)
pg_upgradecluster has two possibilities for `--method`: `dump` and `upgrade`. The former is the default, and does a `pg_dump` of all of the databases in the old cluster and feeds them into the new cluster. This is a sure-fire way of getting the same information in both databases, but may be extremely slow on large databases, and is guaranteed to fail on servers whose databases take up >50% of their disk.

The `--method=upgrade` method, by contrast, uses pg_upgrade to copy the raw database data files over to the new cluster, and then fiddles with their internal structure as needed by the upgrade to let them be correct for the new version[1]. This is slightly faster than the dump/load method, since it skips the serialization step, but still requires that there be enough space on disk for both old and new versions at once. `pg_upgrade` is currently supported for all versions of PostgreSQL from 8.4 to 12.

Using `pg_upgrade` incurs slightly more risk, but since it is widely used by now, using it in the relatively-controlled Zulip server environment is reasonable. The expected worst failure is failure to upgrade, not corruption or data loss.

Additionally passing `--link` uses hardlinks to link the data files into both the old and new directories simultaneously. This addresses both the runtime of the operation and the disk space usage. The only potential downside to this is that as soon as writes have occurred on the upgraded cluster, the old cluster can no longer be started. Since this tooling intends to remove the old cluster immediately after the upgrade completes successfully, this is not a significant drawback.

Switch to using `--method=upgrade --link`. This technique spits out two shell scripts which are expected to be run after completion of the upgrade; one re-analyzes the statistics, the other does an `rm -rf` of the data where it is still hardlinked in the old cluster.

Extract the location of these scripts from parsing the `pg_upgradecluster` output; since the path is not static, we must rely on it being relatively easy to parse. The risk of the path changing is lower, and has more obvious failure modes, than inserting the current contents of these upgrade steps into the overall `upgrade-postgres`.

[1] https://www.postgresql.org/docs/12/pgupgrade.html
Lines: `# Capture the output so we know where the path to the post-upgrade scripts is`, the `UPGRADE_LOG=$(mktemp ...)` line, and the `SCRIPTS_PATH=$(grep -o ... || true)` line.

upgrade-postgres: Pass the requested postgres explicitly. (2020-10-01 21:56:59 +02:00)
Lines: `pg_upgradecluster -v "$UPGRADE_TO" "$UPGRADE_FROM" main --method=upgrade --link | tee "$UPGRADE_LOG"`.
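
The hardlink behavior that the `--link` discussion relies on can be tried on a single file. This is a small sketch, not something the script does itself; the directory and file names are invented, and it shows only the filesystem mechanism, not `pg_upgrade`.

```shell
#!/usr/bin/env bash
# Sketch: a hard link gives one file two names without copying its data.
set -eo pipefail

work_dir=$(mktemp -d)
mkdir "$work_dir/old" "$work_dir/new"
echo "table data" >"$work_dir/old/relation"

# Link the file into the "new" directory instead of copying it.
ln "$work_dir/old/relation" "$work_dir/new/relation"

# `-ef` is true when both names refer to the same device and inode.
if [ "$work_dir/old/relation" -ef "$work_dir/new/relation" ]; then
    same_inode=yes
else
    same_inode=no
fi
echo "same inode: $same_inode"

# Removing the old name leaves the data reachable through the new one.
rm "$work_dir/old/relation"
remaining=$(cat "$work_dir/new/relation")
echo "remaining: $remaining"

rm -rf "$work_dir"
```

This is why deleting the old cluster's directory after a linked upgrade frees the old names without touching the data the new cluster uses.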
upgrade: Add additional comments. (2020-07-10 01:41:18 +02:00)
Lines: the comments `# If the upgrade completed successfully, lock in the new version in` and `# our configuration immediately`, `# Start the database up cleanly`, and `# Drop the old data, binaries, and scripts`.

The remaining lines were last touched by commits whose messages appear above:
- upgrade: Add a tool to upgrade PostgreSQL. (2020-06-26 23:32:36 +02:00): `crudini --set /etc/zulip/zulip.conf postgresql version "$UPGRADE_TO"`, `"$ZULIP_PATH"/scripts/zulip-puppet-apply -f`, `pg_dropcluster "$UPGRADE_FROM" main`, and `apt remove -y "postgresql-$UPGRADE_FROM"`.
- upgrade: Use the in-place pg_upgrade, not a full dump/restore. (2020-07-10 01:24:46 +02:00): `# Update the statistics`, the `analyze_new_cluster.sh` line, and the final `if [ -n "$SCRIPTS_PATH" ]; then` ... `fi` block, including the fallback message.
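
The path extraction and its fallback can be exercised without running an upgrade. The sketch below reuses the script's `grep -o ... || true` idiom; `extract_scripts_path` is a hypothetical helper, and the sample log line is invented, so real `pg_upgradecluster` output may look different.

```shell
#!/usr/bin/env bash
# Sketch: pull a directory path out of captured command output.
set -eo pipefail

# Hypothetical helper. Arguments: log file, old version, new version.
# Prints the matching path, or nothing if the log contains no match.
extract_scripts_path() {
    local log="$1" from="$2" to="$3"
    # `|| true` keeps `set -e` from aborting when grep finds nothing.
    grep -o "/var/log/postgresql/pg_upgradecluster-$from-$to-main.*" "$log" || true
}

# Invented sample output; the path is placed at the end of its line
# because `.*` matches through to the end of the line.
demo_log=$(mktemp)
printf 'some output\nlogs in /var/log/postgresql/pg_upgradecluster-11-12-main.AbCd\n' >"$demo_log"

found=$(extract_scripts_path "$demo_log" 11 12)
missing=$(extract_scripts_path "$demo_log" 9.6 10)

if [ -n "$found" ]; then
    echo "scripts path: $found"
fi
if [ -z "$missing" ]; then
    echo "no path found for 9.6 -> 10; inspect the log by hand"
fi

rm -f "$demo_log"
```

An empty result is treated as "could not parse" rather than as an error, which is the case the script's final `else` branch reports to the administrator.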