zulip/puppet/zulip_ops/files/nagios3/conf.d
Alex Vandiver e94b6afb00 nagios: Remove broken check_email_deliverer_* checks and related code.
These checks suffer from a couple notable problems:
 - They are only enabled on staging hosts -- where they should never
   be run.  Since ef6d0ec5ca, these supervisor processes are only
   run on one host, and never on the staging host.
 - They run as the `nagios` user, which does not have appropriate
   permissions, and thus the checks always fail.  Specifically,
   `nagios` does not have permissions to run `supervisorctl`, since
   the socket is owned by the `zulip` user, and mode 0700; and the
   `nagios` user does not have permission to access Zulip secrets to
   run `./manage.py print_email_delivery_backlog`.

Rather than rewrite these checks to run on a cron as zulip, and check
those file contents as the nagios user, drop these checks -- they can
be rewritten at a later point, or replaced with Prometheus alerting,
and currently serve only to cause always-failing Nagios checks, which
normalizes alert failures.

Leave the files installed if they currently exist, rather than
cluttering puppet with `ensure => absent`; they do no harm if they are
left installed.
2021-08-03 16:07:13 -07:00
..
generic-host_nagios2.cfg
generic-service_nagios2.cfg
hostgroups.cfg puppet: Add a nagios alert configuration for smokescreen. 2021-03-18 10:11:15 -07:00
services.cfg nagios: Remove broken check_email_deliverer_* checks and related code. 2021-08-03 16:07:13 -07:00
timeperiods_nagios2.cfg