zulip

Commit Graph

Author	SHA1	Message	Date
Alex Vandiver	3bbe5c1110	puppet: Put comments on iptables lines. In addition to documenting the rules.v4 and rules.v6 files slightly, these comments show up in `iptables -L`: ``` root@hostname:~# iptables -L INPUT Chain INPUT (policy ACCEPT) target prot opt source destination ACCEPT all -- anywhere anywhere LOGDROP all -- anywhere localhost/8 ACCEPT all -- anywhere anywhere state RELATED,ESTABLISHED ACCEPT tcp -- anywhere anywhere tcp dpt:ssh /* ssh */ ACCEPT tcp -- anywhere anywhere tcp dpt:3000 /* grafana */ ACCEPT tcp -- anywhere anywhere tcp dpt:9100 /* node_exporter */ LOGDROP all -- anywhere anywhere ```	2022-01-21 16:46:14 -08:00
Alex Vandiver	6bc5849ea8	puppet: Remove now-unused debathena apt repository.	2022-01-18 14:13:28 -08:00
Alex Vandiver	b3f07cc98d	puppet: Replace debathena zephyr package with equivalent puppet file.	2022-01-18 14:13:28 -08:00
Alex Vandiver	a6d7539571	puppet: Replace debathena krb5 package with equivalent puppet file.	2022-01-18 14:13:28 -08:00
Alex Vandiver	75224ea5de	puppet: python-dev is now purely virtual; install python2.7-dev.	2022-01-18 14:13:28 -08:00
Alex Vandiver	fc1adef28a	puppet: Fix server_name of internal staging server.	2022-01-18 12:36:56 -08:00
Alex Vandiver	7e630b81f8	puppet: Switch to using snakeoil certs for staging. This parallels `ba3b88c81b`, but for the staging host.	2022-01-18 12:36:56 -08:00
Alex Vandiver	0b8a6a51b8	puppet: Remove all parts of AWS kernels. Otherwise, we just uninstall the meta-package, and still restart into the installed AWS kernel.	2022-01-12 15:52:19 -08:00
Alex Vandiver	4d7e6b26df	puppet: Provide more attributes to teleport on ssh nodes.	2022-01-12 14:15:45 -08:00
Alex Vandiver	339e70671c	puppet: Switch Grafana to Grafana 8 Unified Alerting.	2022-01-11 14:27:11 -08:00
Alex Vandiver	6a7eecee9a	puppet: Increase load paging thresholds.	2022-01-11 09:38:31 -08:00
Alex Vandiver	1e80b844f4	puppet: Disable apparmor profile for msmtp. As the nagios user, we want to read the msmtp configuration from ~nagios, which apparmor's profile does not allow msmtp to do.	2022-01-11 09:38:31 -08:00
Alex Vandiver	3c95ad82c6	puppet: Upgrade to nagios4. This updates the puppeted nagios configuration file for the Nagios4 defaults.	2022-01-11 09:38:31 -08:00
Alex Vandiver	4a95967a33	puppet: Gather uwsgi stats from chat.zulip.org.	2022-01-03 21:26:57 -08:00
Alex Vandiver	8a5be972d2	puppet: Add a uwsgi exporter for monitoring. This allows investigation of how many workers are busy, and to track "harakiri" terminations.	2022-01-03 15:25:58 -08:00
Anders Kaseorg	82748d45d8	install-yarn: Use test -ef in case /srv is a symlink. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-12-30 13:42:07 -08:00
Alex Vandiver	c094867a74	puppet: Add aarch64 build hashes to external dependencies. wal-g does not ship aarch64 binaries, currently; the compilation process([1]) is somewhat complicated, so we defer the decision about how to support wal-g for aarch64 until a later date. [1]: https://github.com/wal-g/wal-g/blob/master/docs/PostgreSQL.md#installing	2021-12-29 16:35:15 -08:00
Alex Vandiver	f166f9f7d6	puppet: Centralize versions and sha256 hashes of external dependencies. This will make it easier to update versions of these dependencies.	2021-12-29 16:35:15 -08:00
Alex Vandiver	57662689a9	puppet: Provide a constant homedir for grafana user. The homedir of a user cannot be changed if any processes are running as them, so having it change over time as upgrades happen will break puppet application, as the old grafana process under supervisor will effectively lock changes to the user's homedir. Unfortunately, that means that this change will thus fail to puppet-apply unless `supervisorctl stop grafana` is run first, but there's no way around that.	2021-12-29 16:35:15 -08:00
Alex Vandiver	6e55e52694	puppet: Pull out grafana $data_dir.	2021-12-29 16:35:15 -08:00
Alex Vandiver	1e4e6a09af	puppet: Stop making resources for external binaries and directories. In the event that extracting doesn't produce the binary we expected it to, all this will do is create an _empty_ file where we expect the binary to be. This will likely muddle debugging. Since the only reason the resource was made in the first place was to make dependencies clear, switch to depending on the External_Dep itself, when such a dependency is needed.	2021-12-29 16:35:15 -08:00
Alex Vandiver	3c163a7d5e	puppet: Move slash out of $dir by convention.	2021-12-29 16:35:15 -08:00
Alex Vandiver	bb5a2c8138	puppet: Move prometheus to external_dep.	2021-12-29 16:35:15 -08:00
Alex Vandiver	2d6c096904	puppet: Move node_exporter to external_dep.	2021-12-29 16:35:15 -08:00
Alex Vandiver	e4b23daad7	puppet: Upgrade to Grafana 8.3.2, for CVE-2021-43813.	2021-12-10 14:00:11 -08:00
Alex Vandiver	053682964e	puppet: Only fetch from running hosts in Grafana ec2 discovery.	2021-12-09 08:12:03 -08:00
Alex Vandiver	291f688678	puppet: Use zulip::external_dep for grafana, template config. Templating the config ensures that the service is restarted when it is upgraded.	2021-12-08 20:58:10 -08:00
Alex Vandiver	3eae429ab4	puppet: Upgrade Grafana to 8.3.1, for CVE-2021-43798.	2021-12-08 20:58:10 -08:00
Alex Vandiver	7db146d0a9	puppet: Do not assume amd64 architecture.	2021-12-06 11:08:50 -08:00
Alex Vandiver	fb2d05f9e3	puppet: Remove unused 'builder' files. These are leftover detritus from the "builder" host, which was removed in `4c9a283542`.	2021-12-06 10:21:50 -08:00
Alex Vandiver	c514feaa22	puppet: Default go-camo to listening on localhost for standalone deploys. The default in the previous commit, inherited from camo, was to bind to 0.0.0.0:9292. In standalone deployments, camo is deployed on the same host as the nginx reverse proxy, and as such there is no need to open it up to other IPs. Make `zulip::camo` take an optional parameter, which allows overriding it in puppet, but skips a `zulip.conf` setting for it, since it is unlikely to be adjusted by most users.	2021-11-19 15:58:26 -08:00
Alex Vandiver	b982222e03	camo: Replace with go-camo implementation. The upstream of the `camo` repository[1] has been unmaintained for several years, and is now archived by the owner. Additionally, it has a number of limitations: - It is installed as a sysinit service, which does not run under Docker - It does not prevent access to internal IPs, like 127.0.0.1 - It does not respect standard `HTTP_proxy` environment variables, making it unable to use Smokescreen to prevent the prior flaw - It occasionally just crashes, and thus must have a cron job to restart it. Swap camo out for the drop-in replacement go-camo[2], which has the same external API, requiring no changes to Django code, but is more maintained. Additionally, it resolves all of the above complaints. go-camo is not configured to use Smokescreen as a proxy, because its own private-IP filtering prevents using a proxy which lies within that IP space. It is also unclear if the addition of Smokescreen would provide any additional protection over the existing IP address restrictions in go-camo. go-camo has a subset of the security headers that our nginx reverse proxy sets, and which camo set; provide the missing headers with `-H` to ensure that go-camo, if exposed from behind some other non-nginx load-balancer, still provides the necessary security headers. Fixes #18351 by moving to supervisor. Fixes zulip/docker-zulip#298 also by moving to supervisor. [1] https://github.com/atmos/camo [2] https://github.com/cactus/go-camo	2021-11-19 15:58:26 -08:00
Alex Vandiver	1806e0f45e	puppet: Remove zulip.org configuration.	2021-08-26 17:21:31 -07:00
Alex Vandiver	27881babab	puppet: Increase prometheus storage, from the default 15d.	2021-08-24 23:40:43 -07:00
Alex Vandiver	e46e862f2b	puppet: Add a bare-bones zulipbot profile. This sets up the firewalls appropriate for zulipbot, but does not automate any of the configuration of zulipbot itself.	2021-08-24 16:05:58 -07:00
Alex Vandiver	5857dcd9b4	puppet: Configure ip6tables in parallel to ipv4. Previously, IPv6 firewalls were left at the default all-open. Configure IPv6 equivalently to IPv4.	2021-08-24 16:05:46 -07:00
Alex Vandiver	845509a9ec	puppet: Be explicit that existing iptables are only ipv4.	2021-08-24 16:05:46 -07:00
Alex Vandiver	4dd289cb9d	puppet: Enable prometheus monitoring of supervisord. To be able to read the UNIX socket, this requires running node_exporter as zulip, not as prometheus.	2021-08-03 21:47:02 -07:00
Alex Vandiver	aa940bce72	puppet: Disable hwmon collector, which does nothing on cloud hosts.	2021-08-03 21:47:02 -07:00
Alex Vandiver	e94b6afb00	nagios: Remove broken check_email_deliverer_* checks and related code. These checks suffer from a couple notable problems: - They are only enabled on staging hosts -- where they should never be run. Since `ef6d0ec5ca`, these supervisor processes are only run on one host, and never on the staging host. - They run as the `nagios` user, which does not have appropriate permissions, and thus the checks always fail. Specifically, `nagios` does not have permissions to run `supervisorctl`, since the socket is owned by the `zulip` user, and mode 0700; and the `nagios` user does not have permission to access Zulip secrets to run `./manage.py print_email_delivery_backlog`. Rather than rewrite these checks to run on a cron as zulip, and check those file contents as the nagios user, drop these checks -- they can be rewritten at a later point, or replaced with Prometheus alerting, and currently serve only to cause always-failing Nagios checks, which normalizes alert failures. Leave the files installed if they currently exist, rather than cluttering puppet with `ensure => absent`; they do no harm if they are left installed.	2021-08-03 16:07:13 -07:00
Alex Vandiver	e6bae4f1dd	puppet: Remove zulip::nagios class. `93f62b999e` removed the last file in puppet/zulip/files/nagios_plugins/zulip_nagios_server, which means the singular rule in zulip::nagios no longer applies cleanly. Remove the `zulip::nagios` class, as it is no longer needed.	2021-07-09 17:29:41 -07:00
Anders Kaseorg	93f62b999e	nagios: Replace check_website_response with standard check_http plugin. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-07-09 16:47:03 -07:00
Vishnu KS	e0f5fadb79	billing: Downgrade small realms that are behind on payments. An organization with at most 5 users that is behind on payments isn't worth spending time on investigating the situation. For larger organizations, we likely want somewhat different logic that at least does not void invoices.	2021-07-02 13:19:12 -07:00
Alex Vandiver	6c72698df2	puppet: Move zulip_ops supervisor config into /etc/supervisor/conf.d/zulip/. This is similar cleanup to `3ab9b31d2f`, but only affects zulip_ops services; it serves to ensure that any of these services which are no longer enabled are automatically removed from supervisor. Note that this will cause a supervisor restart on all affected hosts, which will restart all supervisor services.	2021-06-14 17:12:59 -07:00
Alex Vandiver	dd90083ed7	puppet: Provide FQDN of self as URI, so the certificate validates. Failure to do this results in: ``` psql: error: failed to connect to `host=localhost user=zulip database=zulip`: failed to write startup message (x509: certificate is valid for [redacted], not localhost) ```	2021-06-14 00:14:48 -07:00
Alex Vandiver	c90ff80084	puppet: Bump grafana version to 8.0.1. Most notably, this fixes an annoying bug with CloudWatch metrics being repeated in graphs.	2021-06-10 15:49:08 -07:00
Alex Vandiver	d905eb6131	puppet: Add a database teleport server. Host-based md5 auth for 127.0.0.1 must be removed from `pg_hba.conf`, otherwise password authentication is preferred over certificate-based authentication for localhost.	2021-06-08 22:21:21 -07:00
Alex Vandiver	100a899d5d	puppet: Add grafana server.	2021-06-08 22:21:00 -07:00
Alex Vandiver	459f37f041	puppet: Add prometheus server.	2021-06-08 22:21:00 -07:00
Alex Vandiver	19fb58e845	puppet: Add prometheus node exporter.	2021-06-08 22:21:00 -07:00
Alex Vandiver	a2b1009ed5	puppet: Turn on "authentication" which defaults to user with all rights. Nagios refuses to allow any modifications with use_authentication off; re-enable "authentication" but set a default user, which (by way of the `*` permissions in `359f37389a`) is allowed to take all actions.	2021-06-08 15:19:28 -07:00
Alex Vandiver	61b6fc865c	puppet: Add a label to teleport applications, to allow RBAC. Roles can only grant or deny access based on labels; set one based on the application name.	2021-06-08 15:19:04 -07:00
Alex Vandiver	4aff5b1d22	puppet: Allow access to `/` in nagios. This was a regression in `51b985b40d`.	2021-06-07 22:40:58 -07:00
Alex Vandiver	54768c2210	puppet: Remove now-unused basic auth support files. `51b985b40d` made these unnecessary.	2021-06-07 16:17:45 -07:00
Alex Vandiver	359f37389a	puppet: Remove in-nagios auth restrictions. `51b985b40d` made nagios only accessible from localhost, or as proxied via teleport. Remove the HTTP-level auth requirements.	2021-06-07 16:17:45 -07:00
Alex Vandiver	2352fac6b5	puppet: Fix indentation.	2021-06-02 18:38:38 -07:00
Alex Vandiver	51b985b40d	puppet: Move nagios to behind teleport. This makes the server only accessible via localhost, by way of the Teleport application service.	2021-06-02 18:38:38 -07:00
Alex Vandiver	4f51d32676	puppet: Add a teleport application server. This requires switching to a reverse tunnel for the auth connection, with the side effect that the `zulip_ops::teleport::node` manifest can be applied on servers anywhere on the Internet; they do not need to have any publicly-available open ports.	2021-06-02 18:38:38 -07:00
Alex Vandiver	c59421682f	puppet: Add a teleport node on every host. Teleport nodes[1] are the equivalent to SSH servers. In addition to this config, joining the teleport cluster will require presenting a one-time "join token" from the proxy server[2], which may either be short-lived or static. [1] https://goteleport.com/docs/architecture/nodes/ [2] https://goteleport.com/docs/admin-guide/#adding-nodes-to-the-cluster	2021-06-02 18:38:38 -07:00
Alex Vandiver	1cdf14d195	puppet: Add a teleport server. See https://goteleport.com/docs/architecture/overview/ for the general architecture of a Teleport cluster. This commit adds a Teleport auth[1] and proxy[2] server. The auth server serves as a CA for granting time-bounded access to users and authenticating nodes on the cluster; the proxy provides access and a management UI. [1] https://goteleport.com/docs/architecture/authentication/ [2] https://goteleport.com/docs/architecture/proxy/	2021-06-02 18:38:38 -07:00
Alex Vandiver	3ebd627c50	puppet: Fix "import" -> "include" in chat_zulip_org.	2021-06-02 11:02:34 -07:00
Alex Vandiver	2130fc0645	puppet: Add an explicit class for czo.	2021-06-01 22:18:50 -07:00
Alex Vandiver	c9141785fd	puppet: Use concat fragments to place port allows next to services. This means that services will only open their ports if they are actually run, without having to clutter rules.v4 with a lot of `if` statements. This does not go as far as using `puppetlabs/firewall`[1] because that would represent an additional DSL to learn; raw IPtables sections can easily be inserted into the generated iptables file via `concat::fragment` (either inline, or as a separate file), but config can be centralized next to the appropriate service. [1] https://forge.puppet.com/modules/puppetlabs/firewall	2021-05-27 21:14:48 -07:00
Alex Vandiver	4f79b53825	puppet: Factor out firewall config.	2021-05-27 21:14:48 -07:00
Alex Vandiver	f3eea72c2a	setup: Merge multiple setup-apt-repo scripts into one. This moves the `.asc` files into subdirectories, and writes out the corresponding `.list` files into them. It moves from templates to written-out `.list` files for clarity and ease of implementation (Debian and Ubuntu need different templates for `zulip`), and as a way of making explicit which releases are supported for each list. For the special case of the PGroonga signing key, we source an additional file within the directory. This simplifies the process for adding another class of `.list` file.	2021-05-26 14:42:29 -07:00
Alex Vandiver	4f017614c5	nagios: Replace check_fts_update_log with a process_fts_updates flag. This avoids having to duplicate the connection logic from process_fts_updates. Co-authored-by: Adam Birds <adam.birds@adbwebdesigns.co.uk>	2021-05-25 13:56:05 -07:00
Alex Vandiver	116e41f1da	puppet: Move files out and back when mounting /srv. Specifically, this affects /srv/zulip-aws-tools.	2021-05-23 13:29:23 -07:00
Alex Vandiver	ea98549e88	puppet: Always install linux-image-virtual, for ksplice support.	2021-05-23 13:29:23 -07:00
Alex Vandiver	0b1dd27841	puppet: AWS mounts its extra disks with inconsistent names. It is now /dev/nvme1n1, not /dev/nvme0n1; but it always has a consistent major/minor node. Source the file that defines these.	2021-05-23 13:29:23 -07:00
Alex Vandiver	033a96aa5d	puppet: Fix check_ssl_certificate check to check named host, not self.	2021-05-17 18:38:30 -07:00
Alex Vandiver	feb7870db7	puppet: Adjust thresholds on autovac_freeze. These thresholds are in relationship to the `autovacuum_freeze_max_age`, not the XID wraparound, which happens at 2^31-1. As such, it is perfectly normal that they hit 100%, and then autovacuum kicks in and brings it back down. The unusual condition is that PostgreSQL pushes past the point where an autovacuum would be triggered -- therein lies the XID wraparound danger. With the `autovacuum_freeze_max_age` set to 2000000000 in `postgresql.conf`, XID wraparound happens at 107.3%. Set the warning and error thresholds to below this, but above 100% so this does not trigger constantly.	2021-05-11 17:11:47 -07:00
Anders Kaseorg	544bbd5398	docs: Fix capitalization mistakes. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-05-10 09:57:26 -07:00
Anders Kaseorg	9d57fa9759	puppet: Use pgrep -x to avoid accidental matches. Matching the full process name (-x without -f) or full command line (-xf) is less prone to mistakes like matching a random substring of some other command line or pgrep matching itself. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-05-07 08:54:41 -07:00
Alex Vandiver	6ee74b3433	puppet: Check health of APT repository.	2021-03-23 19:27:42 -07:00
Alex Vandiver	c01345d20c	puppet: Add nagios check for long-lived certs that do not auto-renew.	2021-03-23 19:27:27 -07:00
Alex Vandiver	9ea86c861b	puppet: Add a nagios alert configuration for smokescreen. This verifies that the proxy is working by accessing a highly-available website through it. Since failure of this equates to failures of Sentry notifications and Android mobile push notifications, this is a paging service.	2021-03-18 10:11:15 -07:00
Anders Kaseorg	129ea6dd11	nginx: Consistently listen on IPv6 and with HTTP/2. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-03-17 17:46:32 -07:00
Alex Vandiver	06c07109e4	puppet: Add missing semicolons left off in `ba3b88c81b`.	2021-03-12 15:48:53 -08:00
Alex Vandiver	ba3b88c81b	puppet: Explicitly use the snakeoil certificates for nginx. In production, the `wildcard-zulipchat.com.combined-chain.crt` file is just a symlink to the snakeoil certificates; but we do not puppet that symlink, which makes new hosts fail to start cleanly. Instead, point explicitly to the snakeoil certificate, and explain why.	2021-03-12 13:31:54 -08:00
Alex Vandiver	306bf930f5	puppet: Add a warning if ksplice is enabled but has no key set.	2021-03-10 17:57:20 -08:00
Alex Vandiver	a215c83c2d	puppet: Switch to more explicit variable rather than reuse a nagios one. Redis is not nagios, and this only leads to confusion as to why there is a nagios domain setting on frontend servers; it also leaves the `redis0` part of the name buried in the template. Switch to an explicit variable for the redis hostname.	2021-03-10 11:44:54 -08:00
Alex Vandiver	a5b29398fc	puppet: Only install ksplice uptrack if there is an access key.	2021-03-10 11:44:11 -08:00
Alex Vandiver	d938dd9d4a	puppet: Document smokescreen installation, and move to puppet/zulip/. This is more broadly useful than for just Kandra; provide documentation and means to install Smokescreen for stand-alone servers, and motivate its use somewhat more.	2021-03-02 17:16:38 -08:00
Alex Vandiver	2f5eae5c68	puppet: Minor formatting.	2021-02-28 17:03:29 -08:00
Alex Vandiver	a759d26a32	puppet: Make ksplice config not world-readable, use 'adm' group. This matches the configuration that ksplice itself creates the file and directory with.	2021-02-28 17:03:29 -08:00
Tim Abbott	957c16aa77	nagios: Tweak prod load monitoring parameters. Ultimately this monitoring isn't that helpful, but we're mainly interested in when it spikes to very high numbers.	2021-02-26 08:39:52 -08:00
Alex Vandiver	32149c6a1c	puppet: Add ksplice uptrack for kernel hotpatches.	2021-02-25 18:05:47 -08:00
Alex Vandiver	173d2dec3d	puppet: Check in defensive restart-camo cron job. This was found on lb1; add it to the camo install on smokescreen.	2021-02-24 16:42:21 -08:00
Alex Vandiver	0b736ef4cf	puppet: Remove puppet_ops configuration for separate loadbalancer host.	2021-02-22 16:05:13 -08:00
Alex Vandiver	e30b524896	iptables: Limit smokescreen port 4750, add camo port. Limit incoming connections to port 4750 to only the smokescreen host, and also allow access to the Camo server on that host, on port 9292.	2021-02-17 13:52:38 -08:00
Alex Vandiver	a88af1b5a2	camo: Install on smokescreen host.	2021-02-16 08:12:31 -08:00
Alex Vandiver	29f60bad20	smokescreen: Put the version into the supervisorctl command. This makes it reload correctly if the version is changed.	2021-02-16 08:12:31 -08:00
Anders Kaseorg	6e4c3e41dc	python: Normalize quotes with Black. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-02-12 13:11:19 -08:00
Anders Kaseorg	11741543da	python: Reformat with Black, except quotes. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-02-12 13:11:19 -08:00
Alex Vandiver	559cdf7317	puppet: Set APT::Periodic::Unattended-Upgrade in apt config. This is required for unattended upgrades to actually run regularly. In some distributions, it may be found in 20auto-upgrades, but placing it here makes it more discoverable.	2021-02-12 08:59:19 -08:00
Tim Abbott	fd8504e06b	munin: Update to use NAGIOS_BOT_HOST. We haven't actively used this plugin in years, and so it was never converted from the 2014-era monitoring to detect the hostname. This seems worth fixing since we may want to migrate this logic to a more modern monitoring system, and it's helpful to have it correct.	2021-01-27 12:07:09 -08:00
Alex Vandiver	c2526844e9	worker: Remove SignupWorker and friends. ZULIP_FRIENDS_LIST_ID and MAILCHIMP_API_KEY are not currently used in production. This removes the unused 'signups' queue and worker.	2021-01-17 11:16:35 -08:00
Alex Vandiver	90ca06d873	puppet: Allow unattended upgrades of -updates in addition to -security. This ensures that software will be fully up-to-date, not just with security patches.	2020-11-13 16:45:05 -08:00
Tim Abbott	494a685827	puppet: Fix typo in name of missedmessage_emails consumer. This has been present since this check was introduced in `45c9c3cc30`.	2020-10-29 12:28:54 -07:00
Tim Abbott	ab3cb2b3bf	puppet: Fix internal redis puppet configuration. The inherits rule is required for overriding existing configuration files; while the `::profile` piece was missed in the recent ::profile migration.	2020-10-29 11:53:43 -07:00
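
The 107.3% figure quoted in commit `feb7870db7` (autovac_freeze thresholds) can be checked with a short calculation. This is only an illustrative sketch: the constant and function names are made up for this example and are not taken from the Zulip repository. The two input values are the ones stated in the commit message (wraparound at 2^31-1, `autovacuum_freeze_max_age` of 2000000000).

```python
# Illustrative check of the wraparound percentage from commit feb7870db7.
# All names here are hypothetical; only the two numbers come from the commit message.

XID_WRAPAROUND = 2**31 - 1                 # XID wraparound point stated in the commit
AUTOVACUUM_FREEZE_MAX_AGE = 2_000_000_000  # postgresql.conf value stated in the commit


def wraparound_percent(freeze_max_age: int = AUTOVACUUM_FREEZE_MAX_AGE) -> float:
    """Return the XID wraparound point as a percentage of freeze_max_age."""
    return 100 * XID_WRAPAROUND / freeze_max_age


if __name__ == "__main__":
    # Prints the percentage at which wraparound would occur.
    print(f"{wraparound_percent():.2f}%")
```

The result is about 107.37%, which the commit message quotes as 107.3%. Thresholds above 100% but below this value therefore flag only the case where PostgreSQL has gone past the point at which autovacuum should have been triggered.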

453 Commits