No plugins are installed inside the /usr/local/munin/lib this creates
in munin-node, nor are they symlinked into /etc/munin/plugins, so
non-default plugins are added by this.
The one complexity is that hosts_fullstack are treated differently, as
they are not currently found in the manual `hosts` list, and as such
do not get munin monitoring.
check_memcached does not support memcached authentication even in its
latest release (it’s in a TODO item comment, and that’s it), and was
never particularly useful.
When supervisor is first installed, it is started automatically, and
creates the socket, owned by root. Subsequent reconfiguration in
puppet only calls `reread + update`, which is insufficient to apply
the `chown = zulip:zulip` line in `supervisord.conf`, leaving the
socket owned by `root` and the last part of the installation unable to
restart `supervisor` services as the `zulip` user. The `chown` line
in `scripts/lib/install` exists to paper over this.
Add a separate exec target for changes to `supervisord.conf` itself,
which restarts the full service. This leaves the default `restart`
action on the service for the lightweight `reread + update` action,
which is more common.
We use `systemctl` only on redhat-esque builds, because CI runs
Ubuntu, but init is not systemd in that context. `systemctl reload`
is sufficient to re-apply the socket ownership, but a full `restart`
and not `reload` is necessary under `/etc/init.d/supervisor`.
wal-g has a slihghtly different format than wal-e in its `backup-list`
output; it only contains three columns:
- `name`
- `last_modified`,
- `wal_segment_backup_start`
..rather than wal-e's plethora, most of which were blank:
- `name`
- `last_modified`
- `expanded_size_bytes`
- `wal_segment_backup_start`
- `wal_segment_offset_backup_start`
- `wal_segment_backup_stop`
- `wal_segment_offset_backup_stop`
Remove one argument from the split.
In Bionic, nagios-plugins-basic is a transitional package which
depends on monitoring-plugins-basic. In Focal, it is a virtual
package, which means that every time puppet runs, it tries to
re-install the nagios-plugins-basic package.
Switch all instances to referring to `$zulip::common::nagios_plugins`,
and repoint that to monitoring-plugins-basic.
Frontend hosts in multiple-host configurations (including docker
hosts) need a `psql` binary installed. ca9d27175b switched to not
setting `postgresql.version` in `zulip.conf`, which in turn means that
`$zulip::base::postgres_version` is unset. This, in turn, led to the
frontend hosts installing `postgresql-client-`, whose trailing dash
causes apt to _uninstall_ that package.
Unconditionally install `postgresql-client` with no explicit version
attached. This is a metapackage which depends on the latest client
package, which currently means it will install `postgresql-client-12`.
On single-host installs which have configured `postgresql.version` in
`zulip.conf` to be a lower version, this will result in
`postgresql-client-12` existing alongside another
version (e.g. `postgresql-client-10`); `psql` will give the most
recent. This is acceptable because the semantic meaning of the
postgresql version in `zulip.conf` is about the database engine
itself, not the command-line client.
Support for Xenial and Stretch was removed (5154ddafca, 0f4b1076ad,
8944e0ad53, 79acd5ae40, 1219a2e854), but not all codepaths were
updated to remove their conditionals on it.
Remove all code predicated on Xenial or Stretch. debathena support
was migrated to Bionic, since that appears to be the current state of
existing debathena servers.
This prevents memcached from automatically appending the hostname to
the username, which was a source of problems on servers where the
hostname was changed.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
We would prefer to use the postgres packages from Postgres themselves,
if available. However, this requires ensures that, for existing
installs, we preserve the same version of postgres as their base
distribution installed.
Move the version-determination logic from being computed at puppet
interpolation time, to being computed at install time and pinned into
zulip.conf.
Since 9e8f1aacb3, zulip_ops machines
might have two Package declarations for `certbot`, which doesn't work
in puppet.
The fix is, as usual, to use our `zulip::safepackage` wrapper instead.
The style guide for Zulip is to always use "primary" and "replica"
when describing database replication. Adjust a few comments under
`puppet/` that do not adhere to this.
Unfortunately, some references still remain to the insensitive and
inaccurate "master" / "slave" terminology. However, these are only in
files which we are attempting to preserve as close to the upstream
versions they are derived from (e.g. postgresql.conf,
postfix/master.cf).
65774e1c4f switched from using the bundled check_postgres.pl to using
the version from packages; the file itself remained, however.
Remove it, and clean up references to it.
Fixes#15389.
Instead of SSH'ing around to them, run directly on the database hosts.
This means that the replicas do not know how many bytes behind they
are in _receiving_ the wall logs; thus, the monitoring also extends to
the primary database, which knows that information for each replica.
This also allows for detecting when there are too few active replicas.
Use read-only types (List ↦ Sequence, Dict ↦ Mapping, Set ↦
AbstractSet) to guard against accidental mutation of the default
value.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
All differences between the primary and replica roles having been
merged, fold the `postgres_common`, `postgres_master`, and
`postgres_slave` roles into just `postgres_appdb`.
These values differed between the primary and secondary database
hosts, for unclear reasons. The differences date back to their
introduction in 387f63deaa. As the comment in the replica
confguration notes, settings of `vm.dirty_ratio = 10` and
`vm.dirty_background_ratio = 5` matched the kernel defaults for
"newer" kernels; however, kernel 2.6.30 bumped those to 20 and 10,
respectively[1], as a fix for underlying logic now being more correct.
Remove these overrides; they should at very least be consistent across
roles, and the previous values look to be an attempt to tune for a
very much older version of the Linux kernel, which was using an
different, buggier, algorithm under the hood.
[1] 1b5e62b42b
This file controls streaming replication, and recovery using wal-g on
the secondary. The `primary_conninfo` data needs to change on short
notice when database failover happens, in a way that is not suitable
for being controlled by puppet.
PostgreSQL 12, in fact, removes the use of the `recovery.conf` file[1];
the `primary_conninfo` and `restore_command` information goes into the
main `postgresql.conf` file, and the standby status is controlled by
the presence of absence of an empty `standby.signal` file.
Remove the puppet control of the `recovery.conf` file.
[1] https://pgstef.github.io/2018/11/26/postgresql12_preview_recovery_conf_disappears.html
Since the nagios authentication is stored _in the database_, it is
unnecessary to run if the database is simply a replica of the
production database. The only case in which this statement would have
an effect is if the postgres node contains a _different_ (or empty)
database, which `setup_disks` now effectively prevents.
Remove the unnecessary step.
481613a344 updated the `setup_disks` script to no longer reference
`mdadm`, since we no longer set up RAID on servers.
Update the puppet that would call it to remove the `mdadm` dependency,
and run only if the state is not what it produces -- namely, a symlink
for `/var/lib/postgresql`, which must point to an existent
`/srv/postgresql` directory.