The `needrestart` tool added in 22.04 is useful in terms of listing
which services may need to be restarted to pick up updated libraries.
However, it prompts about the current state of services needing
restart for *every* subsequent `apt-get upgrade`, and defaulting core
services to restarting requires carefully manually excluding them
every time, at risk of causing an unscheduled outage.
Build a list of default-off services based on the list in
unattended-upgrades.
This check loads Django, and as such must be run as the zulip user.
Repeat the same pattern used elsewhere in nagios, of writing a state
file, which is read by `check_cron_file`.
Replication checks should only run on primary and replicas, not
standalone hosts; while `autovac_freeze` currently only runs on
primary hosts, it functions identically on replicas, and is fine to
run there.
Make `autovac_freeze` run on all `postgresql` hosts, and make
standalone hosts no longer `postgres_primary`, so they do not fail the
replication tests.
These style of checks just look for matching process names using
`check_remote_arg_string`, which dates to 8edbd64bb8. These were
added because the original two (`missedmessage_emails` and
`slow_queries`) did not create consumers, instead polling for events.
Switch these to checking the queue consumer counts that the
`check-rabbitmq-consumers` check is already writing out. Since the
`missedmessage_emails` was _already_ checked via the consumer check, a
duplicate is not added.
Even the `pageable_servers` group did not page for high load -- in
part because what was "high" depends on the servers. Set slightly
better limits based on server role.
`zmirror` itself was `zmirror_main` + `zmirrorp` but was unused; we
consistently just use the term `zmirror` for the non-personals server,
so use it as the hostgroup name.
The Redis nagios checks themselves are done against `redis` +
`frontends` groups, so there is no need to misleadingly place
`frontends` in the `redis` hostgroup.
5abf4dee92 made this distinction, then multitornado_frontends was
never used; the singletornado_frontends alerting worked even for the
multiple-Tornado instances.
Remove the useless and misleading distinction.
Our current EC2 systems don’t have an interface named ‘eth0’, and if
they did, this script would do nothing but crash with ImportError
because we have never installed boto.utils for Python 3.
(The message of commit 2a4d851a7c made
an effort to document for future researchers why this script should
not have been blindly converted to Python 3. However, commit
2dc6d09c2a (#14278) was evidently
unresearched and untested.)
Signed-off-by: Anders Kaseorg <anders@zulip.com>