zulip

Commit Graph

Author	SHA1	Message	Date
Alex Vandiver	93bcb86345	nagios: Reorder service checks.	2022-06-22 12:07:38 -07:00
Alex Vandiver	33472ee9ff	nagios: Remove unused stats host set.	2022-06-22 12:07:38 -07:00
Alex Vandiver	bc4f4b4862	nagios: Make the pageable/not/flaky tri-state clearer.	2022-06-22 12:07:38 -07:00
Alex Vandiver	c74f195fba	nagios: Split AWS and non-AWS hosts, for ntp checks. The non-AWS hosts cannot use the AWS ntp server for their check.	2022-06-22 12:07:38 -07:00
Alex Vandiver	872efdee58	nagios: Fold single- and multitornado_frontends back into frontends. `5abf4dee92` made this distinction, then multitornado_frontends was never used; the singletornado_frontends alerting worked even for the multiple-Tornado instances. Remove the useless and misleading distinction.	2022-06-22 12:07:38 -07:00
Alex Vandiver	7f6a77da31	puppet: Add a redis exporter.	2022-05-03 17:13:44 -07:00
Alex Vandiver	1bd5723cd2	puppet: Add a prometheus monitor for tornado processes.	2022-03-20 16:12:11 -07:00
Anders Kaseorg	b3260bd610	docs: Use Debian and Ubuntu version numbers over development codenames. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2022-02-23 12:04:24 -08:00
Alex Vandiver	3c95ad82c6	puppet: Upgrade to nagios4. This updates the puppeted nagios configuration file for the Nagios4 defaults.	2022-01-11 09:38:31 -08:00
Alex Vandiver	8a5be972d2	puppet: Add a uwsgi exporter for monitoring. This allows investigation of how many workers are busy, and to track "harakiri" terminations.	2022-01-03 15:25:58 -08:00
Alex Vandiver	bb5a2c8138	puppet: Move prometheus to external_dep.	2021-12-29 16:35:15 -08:00
Alex Vandiver	2d6c096904	puppet: Move node_exporter to external_dep.	2021-12-29 16:35:15 -08:00
Alex Vandiver	291f688678	puppet: Use zulip::external_dep for grafana, template config. Templating the config ensures that the service is restarted when it is upgraded.	2021-12-08 20:58:10 -08:00
Anders Kaseorg	93f62b999e	nagios: Replace check_website_response with standard check_http plugin. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-07-09 16:47:03 -07:00
Alex Vandiver	dd90083ed7	puppet: Provide FQDN of self as URI, so the certificate validates. Failure to do this results in: ``` psql: error: failed to connect to `host=localhost user=zulip database=zulip`: failed to write startup message (x509: certificate is valid for [redacted], not localhost) ```	2021-06-14 00:14:48 -07:00
Alex Vandiver	d905eb6131	puppet: Add a database teleport server. Host-based md5 auth for 127.0.0.1 must be removed from `pg_hba.conf`, otherwise password authentication is preferred over certificate-based authentication for localhost.	2021-06-08 22:21:21 -07:00
Alex Vandiver	a2b1009ed5	puppet: Turn on "authentication" which defaults to user with all rights. Nagios refuses to allow any modifications with use_authentication off; re-enabled "authentication" but set a default user, which (by way of the `*` permissions in `359f37389a`) is allowed to take all actions.	2021-06-08 15:19:28 -07:00
Alex Vandiver	61b6fc865c	puppet: Add a label to teleport applications, to allow RBAC. Roles can only grant or deny access based on labels; set one based on the application name.	2021-06-08 15:19:04 -07:00
Alex Vandiver	4aff5b1d22	puppet: Allow access to `/` in nagios. This was a regression in `51b985b40d`.	2021-06-07 22:40:58 -07:00
Alex Vandiver	359f37389a	puppet: Remove in-nagios auth restrictions. `51b985b40d` made nagios only accessible from localhost, or as proxied via teleport. Remove the HTTP-level auth requirements.	2021-06-07 16:17:45 -07:00
Alex Vandiver	51b985b40d	puppet: Move nagios to behind teleport. This makes the server only accessible via localhost, by way of the Teleport application service.	2021-06-02 18:38:38 -07:00
Alex Vandiver	c9141785fd	puppet: Use concat fragments to place port allows next to services. This means that services will only open their ports if they are actually run, without having to clutter rules.v4 with a lot of `if` statements. This does not go as far as using `puppetlabs/firewall`[1] because that would represent an additional DSL to learn; raw iptables sections can easily be inserted into the generated iptables file via `concat::fragment` (either inline, or as a separate file), but config can be centralized next to the appropriate service. [1] https://forge.puppet.com/modules/puppetlabs/firewall	2021-05-27 21:14:48 -07:00
Alex Vandiver	9ea86c861b	puppet: Add a nagios alert configuration for smokescreen. This verifies that the proxy is working by accessing a highly-available website through it. Since failure of this equates to failures of Sentry notifications and Android mobile push notifications, this is a paging service.	2021-03-18 10:11:15 -07:00
Alex Vandiver	a215c83c2d	puppet: Switch to more explicit variable rather than reuse a nagios one. Redis is not nagios, and this only leads to confusion as to why there is a nagios domain setting on frontend servers; it also leaves the `redis0` part of the name buried in the template. Switch to an explicit variable for the redis hostname.	2021-03-10 11:44:54 -08:00
Alex Vandiver	d938dd9d4a	puppet: Document smokescreen installation, and move to puppet/zulip/. This is more broadly useful than for just Kandra; provide documentation and means to install Smokescreen for stand-alone servers, and motivate its use somewhat more.	2021-03-02 17:16:38 -08:00
Alex Vandiver	32149c6a1c	puppet: Add ksplice uptrack for kernel hotpatches.	2021-02-25 18:05:47 -08:00
Alex Vandiver	0b736ef4cf	puppet: Remove puppet_ops configuration for separate loadbalancer host.	2021-02-22 16:05:13 -08:00
Alex Vandiver	e30b524896	iptables: Limit smokescreen port 4750, add camo port. Limit incoming connections to port 4750 to only the smokescreen host, and also allow access to the Camo server on that host, on port 9292.	2021-02-17 13:52:38 -08:00
Alex Vandiver	29f60bad20	smokescreen: Put the version into the supervisorctl command. This makes it reload correctly if the version is changed.	2021-02-16 08:12:31 -08:00
Alex Vandiver	45f6c79c4a	puppet: Rename postgres_ variables to postgresql_.	2020-10-28 11:51:52 -07:00
Alex Vandiver	e124324050	puppet: Rename postgres_appdb in nagios to postgresql.	2020-10-28 11:51:52 -07:00
Alex Vandiver	78b92a51cc	puppet: Allow access to smokescreen port via iptables.	2020-10-15 15:18:35 -07:00
Alex Vandiver	0d5356969e	puppet: Reformat ipv4 iptables rules comments.	2020-10-15 15:18:35 -07:00
Alex Vandiver	24383a5082	puppet: Rename hosts_domain so hosts_prefix can be grepped for.	2020-07-10 00:14:09 -07:00
Alex Vandiver	f8fc3a16eb	puppet: Use "primary" / "replica" consistently in comments. The style guide for Zulip is to always use "primary" and "replica" when describing database replication. Adjust a few comments under `puppet/` that do not adhere to this. Unfortunately, some references still remain to the insensitive and inaccurate "master" / "slave" terminology. However, these are only in files which we are attempting to preserve as close to the upstream versions they are derived from (e.g. postgresql.conf, postfix/master.cf).	2020-06-15 16:18:07 -07:00
Alex Vandiver	7d4a370a57	puppet: Move monitoring of pg replication to the pg hosts. Instead of SSH'ing around to them, run directly on the database hosts. This means that the replicas do not know how many bytes behind they are in _receiving_ the WAL logs; thus, the monitoring also extends to the primary database, which knows that information for each replica. This also allows for detecting when there are too few active replicas.	2020-06-15 16:18:07 -07:00
Alex Vandiver	8b1d49dbc7	puppet: Rename "wiki" realm to "monitoring". This is vestigial. It requires manually altering the `htdigest` file (not stored in this repo) to change the digest realm from `wiki` to `monitoring`, and will re-prompt users for their passwords if the browsers currently store them.	2020-05-30 12:26:21 -07:00
Tim Abbott	cfbb617f5c	puppet: Update nagios configuration for checking local disk.	2020-04-16 17:48:36 -07:00
Stefan Weil	d2fa058cc1	text: Fix some typos (most of them found and fixed by codespell). Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-03-27 17:25:56 -07:00
Anders Kaseorg	becef760bf	cleanup: Delete leading newlines. Previous cleanups (mostly the removals of Python __future__ imports) were done in a way that introduced leading newlines. Delete leading newlines from all files, except static/assets/zulip-emoji/NOTICE, which is a verbatim copy of the Apache 2.0 license. Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2019-08-06 23:29:11 -07:00
Tim Abbott	5abf4dee92	nagios: Add new host groups for Tornado processes. We also move all the existing Tornado monitoring rules to the singletornado_frontends rule.	2018-11-06 16:33:18 -08:00
Tim Abbott	4e8487c886	nagios: Bump maximum processes limits. These seemed to be flapping for no good reason.	2018-05-02 11:12:47 -07:00
Tim Abbott	9ed2a94b8c	nagios: Add configuration designed for full-stack servers. This doesn't yet pass all Nagios checks correctly, and still has a few flaws: * The ideal setup code for the `nagios` user in the database isn't included. * Some of the other details are a bit off; we need to split some host roles. But it's better than nothing, and we can iterate from here.	2018-01-24 14:16:03 -08:00
Tim Abbott	f2055397c1	nagios: Update apache configuration to be generated. Since this is basically just stock Apache configuration for Nagios with a hostname put in, we can just fetch the hostname from our configuration.	2017-10-05 21:51:29 -07:00
Tim Abbott	e6e7bcf6e1	nagios: Move camo_check_url into configuration.	2017-10-05 21:09:24 -07:00
Tim Abbott	13a36d9af3	puppet: Make old redis_tunnel configuration usable. This old puppet configuration was never really used, and regardless hardcoded an ancient zulip.net hostname. We fix this to use the zulipconf system to get the host domain (though not, at present, the hostname).	2017-10-05 20:40:22 -07:00
Tim Abbott	96c3014da0	nagios: Automate configuration of outgoing email with msmtp. Now we no longer need to check in a bunch of hostnames in order to configure Nagios.	2017-10-05 20:29:47 -07:00
Tim Abbott	ba7be4102e	puppet: Update munin tunnels configuration to use zulipconf. This eliminates another old hardcoding of zulip.net.	2017-10-05 20:14:43 -07:00
Tim Abbott	886a8853ac	nagios: Move server-specific config into hostgroups. These new hostgroups exist so we can eliminate explicit references to individual hosts in services.cfg.	2017-10-05 20:06:48 -07:00
Tim Abbott	b6ce9583a9	nagios: Fetch list of hosts from zulip.conf. This makes this much more configurable and much less hardcoded.	2017-10-05 20:06:30 -07:00


61 Commits