Commit Graph

428 Commits

Author SHA1 Message Date
Alex Vandiver 1e81775fa0 nagios: Drop unhelpful hostgroup comment. 2022-06-22 12:07:38 -07:00
Alex Vandiver 7b584401ac nagios: Reformat hostgroups. 2022-06-22 12:07:38 -07:00
Alex Vandiver 93bcb86345 nagios: Reorder service checks. 2022-06-22 12:07:38 -07:00
Alex Vandiver eaaa2fbff8 nagios: Use canonical "hostgroup_name" consistently. 2022-06-22 12:07:38 -07:00
Alex Vandiver e8996b53a5 nagios: Remove unused has_swap hostgroup. 2022-06-22 12:07:38 -07:00
Alex Vandiver 33472ee9ff nagios: Remove unused stats host set. 2022-06-22 12:07:38 -07:00
Alex Vandiver bc4f4b4862 nagios: Make the pageable/not/flaky tri-state clearer. 2022-06-22 12:07:38 -07:00
Alex Vandiver c74f195fba nagios: Split AWS and non-AWS hosts, for ntp checks.
The non-AWS hosts cannot use the AWS ntp server for their check.
2022-06-22 12:07:38 -07:00
Alex Vandiver 872efdee58 nagios: Fold single- and multitornado_frontends back into frontends.
5abf4dee92 made this distinction, then multitornado_frontends was
never used; the singletornado_frontends alerting worked even for the
multiple-Tornado instances.

Remove the useless and misleading distinction.
2022-06-22 12:07:38 -07:00
Alex Vandiver 3741c1c034 puppet: Switch to checking time against the AWS timeserver.
Since this is what chrony is sync'ing to, it lessens the chance of
spurious firings of this alert.

See https://aws.amazon.com/blogs/aws/keeping-time-with-amazon-time-sync-service/
2022-05-31 22:57:32 -07:00
Alex Vandiver 7f6a77da31 puppet: Add a redis exporter. 2022-05-03 17:13:44 -07:00
Anders Kaseorg e9ba9b0e0d zulip-ec2-configure-interfaces: Remove.
Our current EC2 systems don’t have an interface named ‘eth0’, and if
they did, this script would do nothing but crash with ImportError
because we have never installed boto.utils for Python 3.

(The message of commit 2a4d851a7c made
an effort to document for future researchers why this script should
not have been blindly converted to Python 3.  However, commit
2dc6d09c2a (#14278) was evidently
unresearched and untested.)

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-05-03 02:25:59 -07:00
Anders Kaseorg 646a4d19a3 puppet: Remove quotes for enumerable values.
https://puppet.com/docs/puppet/7/style_guide.html#style_guide_module_design-quoting
“If a string is a value from an enumerable set of options, such as
present and absent, it SHOULD NOT be enclosed in quotes at all.”

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-04-29 22:06:46 -07:00
Alex Vandiver 35db1ee435 puppet: Only include "app_service" section if there are apps.
This works around gravitational/teleport#12256, but also produces config
files that are slightly cleaner.
2022-04-26 16:36:13 -07:00
Alex Vandiver f6d27562fa puppet: Configure chrony to use AWS-local NTP sources.
This prevents hosts from spewing traffic to random hosts across the
Internet.
2022-03-25 17:07:53 -07:00
Alex Vandiver 1bd5723cd2 puppet: Add a prometheus monitor for tornado processes. 2022-03-20 16:12:11 -07:00
Alex Vandiver 6b91652d9a puppet: Open the grok_exporter port.
The complete grok_exporter configuration is not ready to be committed,
but this at least prepares the way for it.
2022-03-20 16:12:11 -07:00
Alex Vandiver 6558655fc6 puppet: Add rabbitmq prometheus plugin, and open the firewall. 2022-03-20 16:12:11 -07:00
Alex Vandiver bdd2f35d05 puppet: Switch czo to using zulip_ops::app_frontend_monitoring.
This was clearly intended in f61ac4a28d
but never executed.
2022-03-20 16:12:11 -07:00
Alex Vandiver 17699bea44 puppet: postgresql_backups is auto-included if s3_backups_bucket is set.
Since 6496d43148.
2022-03-20 16:12:11 -07:00
Alex Vandiver bedc7c2986 puppet: Smokescreen is now auto-included in standalone.
Since c33562f0a8.
2022-03-20 16:12:11 -07:00
Anders Kaseorg b3260bd610 docs: Use Debian and Ubuntu version numbers over development codenames.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-02-23 12:04:24 -08:00
Alex Vandiver 788daa953b puppet: Factor out $::architecture case statement for golang. 2022-02-15 12:04:37 -08:00
Anders Kaseorg f6a701090c setup-apt-repos: Don’t install lsb_release.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-02-14 16:38:53 -08:00
Alex Vandiver e032b38661 puppet: Fix typo in uwsgi exporter dependency. 2022-02-08 15:17:17 -08:00
Alex Vandiver 3bbe5c1110 puppet: Put comments on iptables lines.
In addition to documenting the rules.v4 and rules.v6 files slightly,
these comments show up in `iptables -L`:

```
root@hostname:~# iptables -L INPUT
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     all  --  anywhere             anywhere
LOGDROP    all  --  anywhere             localhost/8
ACCEPT     all  --  anywhere             anywhere             state RELATED,ESTABLISHED
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:ssh /* ssh */
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:3000 /* grafana */
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:9100 /* node_exporter */
LOGDROP    all  --  anywhere             anywhere
```
2022-01-21 16:46:14 -08:00
Alex Vandiver 6bc5849ea8 puppet: Remove now-unused debathena apt repository. 2022-01-18 14:13:28 -08:00
Alex Vandiver b3f07cc98d puppet: Replace debathena zephyr package with equivalent puppet file. 2022-01-18 14:13:28 -08:00
Alex Vandiver a6d7539571 puppet: Replace debathena krb5 package with equivalent puppet file. 2022-01-18 14:13:28 -08:00
Alex Vandiver 75224ea5de puppet: python-dev is now purely virtual; install python2.7-dev. 2022-01-18 14:13:28 -08:00
Alex Vandiver fc1adef28a puppet: Fix server_name of internal staging server. 2022-01-18 12:36:56 -08:00
Alex Vandiver 7e630b81f8 puppet: Switch to using snakeoil certs for staging.
This parallels ba3b88c81b, but for the
staging host.
2022-01-18 12:36:56 -08:00
Alex Vandiver 0b8a6a51b8 puppet: Remove all parts of AWS kernels.
Otherwise, we just uninstall the meta-package, and still restart into
the installed AWS kernel.
2022-01-12 15:52:19 -08:00
Alex Vandiver 4d7e6b26df puppet: Provide more attributes to teleport on ssh nodes. 2022-01-12 14:15:45 -08:00
Alex Vandiver 339e70671c puppet: Switch Grafana to Grafana 8 Unified Alerting. 2022-01-11 14:27:11 -08:00
Alex Vandiver 6a7eecee9a puppet: Increase load paging thresholds. 2022-01-11 09:38:31 -08:00
Alex Vandiver 1e80b844f4 puppet: Disable apparmor profile for msmtp.
As the nagios user, we want to read the msmtp configuration from
~nagios, which apparmor's profile does not allow msmtp to do.
2022-01-11 09:38:31 -08:00
Alex Vandiver 3c95ad82c6 puppet: Upgrade to nagios4.
This updates the puppeted nagios configuration file for the Nagios4
defaults.
2022-01-11 09:38:31 -08:00
Alex Vandiver 4a95967a33 puppet: Gather uwsgi stats from chat.zulip.org. 2022-01-03 21:26:57 -08:00
Alex Vandiver 8a5be972d2 puppet: Add a uwsgi exporter for monitoring.
This allows investigation of how many workers are busy, and to track
"harikari" terminations.
2022-01-03 15:25:58 -08:00
Anders Kaseorg 82748d45d8 install-yarn: Use test -ef in case /srv is a symlink.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-12-30 13:42:07 -08:00
Alex Vandiver c094867a74 puppet: Add aarch64 build hashes to external dependencies.
wal-g does not ship aarch64 binaries, currently; the compilation
process([1]) is somewhat complicated, so we defer the decision about
how to support wal-g for aarch64 until a later date.

[1]: https://github.com/wal-g/wal-g/blob/master/docs/PostgreSQL.md#installing
2021-12-29 16:35:15 -08:00
Alex Vandiver f166f9f7d6 puppet: Centralize versions and sha256 hashes of external dependencies.
This will make it easier to update versions of these dependencies.
2021-12-29 16:35:15 -08:00
Alex Vandiver 57662689a9 puppet: Provide a constant homedir for grafana user.
The homedir of a user cannot be changed if any processes are running
as them, so having it change over time as upgrades happen will break
puppet application, as the old grafana process under supervisor will
effectively lock changes to the user's homedir.

Unfortunately, that means that this change will thus fail to
puppet-apply unless `supervisorctl stop grafana` is run first, but
there's no way around that.
2021-12-29 16:35:15 -08:00
Alex Vandiver 6e55e52694 puppet: Pull out grafana $data_dir. 2021-12-29 16:35:15 -08:00
Alex Vandiver 1e4e6a09af puppet: Stop making resources for external binaries and directories.
In the event that extracting doesn't produce the binary we expected it
to, all this will do is create an _empty_ file where we expect the
binary to be.  This will likely muddle debugging.

Since the only reason the resourfce was made in the first place was to
make dependencies clear, switch to depending on the External_Dep
itself, when such a dependency is needed.
2021-12-29 16:35:15 -08:00
Alex Vandiver 3c163a7d5e puppet: Move slash out of $dir by convention. 2021-12-29 16:35:15 -08:00
Alex Vandiver bb5a2c8138 puppet: Move prometheus to external_dep. 2021-12-29 16:35:15 -08:00
Alex Vandiver 2d6c096904 puppet: Move node_exporter to external_dep. 2021-12-29 16:35:15 -08:00
Alex Vandiver e4b23daad7 puppet: Upgrade to Grafana 8.3.2, for CVE-2021-43813. 2021-12-10 14:00:11 -08:00