The `pg_upgrade` tool uses `pg_dump` as an internal step, and verifies
that the version of `pg_upgrade` is the same exactly the same as the
version of the PostgreSQL server it is upgrading to. A mismatch (even
in packaging versions) leads to it aborting:
```
/usr/lib/postgresql/14/bin/pg_upgrade -b /usr/lib/postgresql/13/bin -B /usr/lib/postgresql/14/bin -p 5432 -P 5435 -d /etc/postgresql/13/main -D /etc/postgresql/14/main --link
Finding the real data directory for the source cluster ok
Finding the real data directory for the target cluster ok
check for "/usr/lib/postgresql/14/bin/pg_dump" failed: incorrect version: found "pg_dump (PostgreSQL) 14.6 (Ubuntu 14.6-0ubuntu0.22.04.1)", expected "pg_dump (PostgreSQL) 14.6 (Ubuntu 14.6-1.pgdg22.04+1)"
Failure, exiting
```
Explicitly upgrade `postgresql-client` at the same time we upgrade
`postgresql` itself, so their versions match.
Fixes: #24192
If a previous attempt at an upgrade failed for some reason, the new
PostgreSQL may be installed, and the conversion will succeed, but the
new PostgreSQL daemon will not be running (Puppet does not force it to
start). This causes the upgrade to fail when analyzing statistics,
since the daemon isn't running.
Explicitly start the new PostgreSQL; this does nothing in most cases,
but will provider better resiliency when recovering from previous
partial upgrades.
Some terminals (e.g. ssh from OS X) set an invalid locale, which
causes the `pg_upgradecluster` call late in the upgrade to fail.
Force a known locale, for consistency. This mirrors the settings in
upgrade-zulip-stage-2, set in 11ab545f3b, and its subsequent
cleanups in 64c608a51a, ee0f4ca330, and eda9ce2364.
The `analyze_new_cluster.sh` script output by `pg_upgrade` just runs
`vacuumdb --all --analyze-in-stages`, which runs three passes over the
database, getting better stats each time. Each of these passes is
independent; the third pass does not require the first two.
`--analyze-in-stages` is only provided to get "something" into the
database, on the theory that it could then be started and used. Since
we wait for all three passes to complete before starting the database,
the first two passes add no value.
Additionally, PosttgreSQL 14 and up stop writing the
`analyze_new_cluster.sh` script as part of `pg_upgrade`, suggesting
the equivalent `vacuumdb --all --analyze-in-stages` call instead.
Switch to explicitly call `vacuumdb --all --analyze-only`, since we do
not gain any benefit from `--analyze-in-stages`. We also enable
parallelism, with `--jobs 10`, in order to analyze up to 10 tables in
parallel. This may increase load, but will accelerate the upgrade
process.
We previously used `zulip-puppet-apply` with a custom config file,
with an updated PostgreSQL version but more limited set of
`puppet_classes`, to pre-create the basic settings for the new cluster
before running `pg_upgradecluster`.
Unfortunately, the supervisor config uses `purge => true` to remove
all SUPERVISOR configuration files that are not included in the puppet
configuration; this leads to it removing all other supervisor
processes during the upgrade, only to add them back and start them
during the second `zulip-puppet-apply`.
It also leads to `process-fts-updates` not being started after the
upgrade completes; this is the one supervisor config file which was
not removed and re-added, and thus the one that is not re-started due
to having been re-added. This was not detected in CI because CI added
a `start-server` command which was not in the upgrade documentation.
Set a custom facter fact that prevents the `purge` behaviour of the
supervisor configuration. We want to preserve that behaviour in
general, and using `zulip-puppet-apply` continues to be the best way
to pre-set-up the PostgreSQL configuration -- but we wish to avoid
that behaviour when we know we are applying a subset of the puppet
classes.
Since supervisor configs are no longer removed and re-added, this
requires an explicit start-server step in the instructions after the
upgrades complete. This brings the documentation into alignment with
what CI is testing.