If a previous attempt at an upgrade failed for some reason, the new
PostgreSQL may be installed, and the conversion will succeed, but the
new PostgreSQL daemon will not be running (Puppet does not force it to
start). This causes the upgrade to fail when analyzing statistics,
since the daemon isn't running.
Explicitly start the new PostgreSQL; this does nothing in most cases,
but will provider better resiliency when recovering from previous
partial upgrades.
Some terminals (e.g. ssh from OS X) set an invalid locale, which
causes the `pg_upgradecluster` call late in the upgrade to fail.
Force a known locale, for consistency. This mirrors the settings in
upgrade-zulip-stage-2, set in 11ab545f3b, and its subsequent
cleanups in 64c608a51a, ee0f4ca330, and eda9ce2364.
This implements get_mandatory_secret that ensures SHARED_SECRET is
set when we hit zerver.decorator.authenticate_notify. To avoid getting
ZulipSettingsError when setting up the secrets, we set an environment
variable DISABLE_MANDATORY_SECRET_CHECK to skip the check and default
its value to an empty string.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
As a consequence:
• Bump minimum supported Python version to 3.8.
• Move Vagrant environment to Ubuntu 20.04, which has Python 3.8.
• Move CI frontend tests to Ubuntu 20.04.
• Move production build test to Ubuntu 20.04.
• Move 3.4 upgrade test to Ubuntu 20.04.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
stop-server and restart-server address all services which talk to the
database, and are thus more correct than restarting or stopping
everything in supervisor.
This is possible now that the previous commit ensures that the zulip
user can read the zulip installation directory during
`create-database`; previously, that directory was still owned by root
when `create-database` was run, whereas now it is in
`~zulip/deployments/`.
rabbitmqctl ping only checks that the Erlang process is registered
with epmd. There’s a window after that where the rabbit app is still
starting inside it.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
This was not needed for OpenSSL ≥ 1.1.1 (all our supported platforms),
and breaks with OpenSSL ≥ 3.0.0 (Ubuntu 22.04). It was removed from
the upstream configuration file too: https://bugs.debian.org/990228.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
The RabbitMQ docs state ([1]):
RabbitMQ nodes and CLI tools (e.g. rabbitmqctl) use a cookie to
determine whether they are allowed to communicate with each
other. [...] The cookie is just a string of alphanumeric
characters up to 255 characters in size. It is usually stored in a
local file.
...and goes on to state (emphasis ours):
If the file does not exist, Erlang VM will try to create one with
a randomly generated value when the RabbitMQ server starts
up. Using such generated cookie files are **appropriate in
development environments only.**
The auto-generated cookie does not use cryptographic sources of
randomness, and generates 20 characters of `[A-Z]`. Because of a
semi-predictable seed, the entropy of this password is thus less than
the idealized 26^20 = 94 bits of entropy; in actuality, it is 36 bits
of entropy, or potentially as low as 20 if the performance of the
server is known.
These sizes are well within the scope of remote brute-force attacks.
On provision, install, and upgrade, replace the default insecure
20-character Erlang cookie with a cryptographically secure
255-character string (the max length allowed).
[1] https://www.rabbitmq.com/clustering.html#erlang-cookie
This reverts commit 889547ff5e. It is
unused in the Docker container, as the configurtaion of the `zulip`
user in the rabbitmq node is done via environment variables. The
Zulip host in that context does not have `rabbitmqctl` installed, and
would have needed to know the Erlang cookie to be able to run these
commands.
As a consequence:
• Bump minimum supported Python version to 3.7.
• Move Vagrant environment to Debian 10, which has Python 3.7.
• Move CI frontend tests to Debian 10.
• Move production build test to Debian 10.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
This catches nine functional indexes that the previous query didn’t:
upper_preregistration_email_idx
upper_stream_name_idx
upper_subject_idx
upper_userprofile_email_idx
zerver_message_recipient_upper_subject
zerver_mutedtopic_stream_topic
zerver_stream_realm_id_name_uniq
zerver_userprofile_realm_id_delivery_email_uniq
zerver_userprofile_realm_id_email_uniq
Signed-off-by: Anders Kaseorg <anders@zulip.com>
If nginx was already installed, and we're using the webroot method of
initializing certbot, nginx needs to be reloaded. Hooks in
`/etc/letsencrypt/renewal-hooks/deploy/` do not run during initial
`certbot certonly`, so an explicit reload is required.
The certbot package installs its own systemd timer (and cron job,
which disabled itself if systemd is enabled) which updates
certificates. This process races with the cron job which Zulip
installs -- the only difference being that Zulip respects the
`certbot.auto_renew` setting, and that it passes the deploy hook.
This means that occasionally nginx would not be reloaded, when the
systemd timer caught the expiration first.
Remove the custom cron job and `certbot-maybe-renew` script, and
reconfigure certbot to always reload nginx after deploying, using
certbot directory hooks.
Since `certbot.auto_renew` can't have an effect, remove the setting.
In turn, this removes the need for `--no-zulip-conf` to
`setup-certbot`. `--deploy-hook` is similarly removed, as running
deploy hooks to restart nginx is now the default; pass
`--no-directory-hooks` in standalone mode to not attempt to reload
nginx. The other property of `--deploy-hook`, of skipping symlinking
into place, is given its own flog.
The `analyze_new_cluster.sh` script output by `pg_upgrade` just runs
`vacuumdb --all --analyze-in-stages`, which runs three passes over the
database, getting better stats each time. Each of these passes is
independent; the third pass does not require the first two.
`--analyze-in-stages` is only provided to get "something" into the
database, on the theory that it could then be started and used. Since
we wait for all three passes to complete before starting the database,
the first two passes add no value.
Additionally, PosttgreSQL 14 and up stop writing the
`analyze_new_cluster.sh` script as part of `pg_upgrade`, suggesting
the equivalent `vacuumdb --all --analyze-in-stages` call instead.
Switch to explicitly call `vacuumdb --all --analyze-only`, since we do
not gain any benefit from `--analyze-in-stages`. We also enable
parallelism, with `--jobs 10`, in order to analyze up to 10 tables in
parallel. This may increase load, but will accelerate the upgrade
process.
We previously used `zulip-puppet-apply` with a custom config file,
with an updated PostgreSQL version but more limited set of
`puppet_classes`, to pre-create the basic settings for the new cluster
before running `pg_upgradecluster`.
Unfortunately, the supervisor config uses `purge => true` to remove
all SUPERVISOR configuration files that are not included in the puppet
configuration; this leads to it removing all other supervisor
processes during the upgrade, only to add them back and start them
during the second `zulip-puppet-apply`.
It also leads to `process-fts-updates` not being started after the
upgrade completes; this is the one supervisor config file which was
not removed and re-added, and thus the one that is not re-started due
to having been re-added. This was not detected in CI because CI added
a `start-server` command which was not in the upgrade documentation.
Set a custom facter fact that prevents the `purge` behaviour of the
supervisor configuration. We want to preserve that behaviour in
general, and using `zulip-puppet-apply` continues to be the best way
to pre-set-up the PostgreSQL configuration -- but we wish to avoid
that behaviour when we know we are applying a subset of the puppet
classes.
Since supervisor configs are no longer removed and re-added, this
requires an explicit start-server step in the instructions after the
upgrades complete. This brings the documentation into alignment with
what CI is testing.