Commit Graph

265 Commits

Author SHA1 Message Date
Alex Vandiver 1bddf41731 puppet: Factor out creation of basic user dotfiles. 2024-01-31 16:41:04 -08:00
Alex Vandiver 69ef808d7b puppet: Use IAM Roles Anywhere to get AWS credentials outside EC2. 2024-01-31 16:41:04 -08:00
Alex Vandiver 16305761ac puppet: Use IAM join method, when possible. 2024-01-31 16:41:04 -08:00
Alex Vandiver dbb60dbeb9 puppet: Factor out $is_ec2, clarify comments. 2024-01-31 16:41:04 -08:00
Alex Vandiver 6902d5db47 install-aws-cli: Also install and keep up to date using Puppet.
We previously only did this install on the developer machine and on
initial boot.  Also run it from puppet to make sure we keep the binary
up-to-date.
2024-01-31 16:41:04 -08:00
Alex Vandiver d02354be6c puppet: statuspage-pusher uses zulip.conf for page_id.
This was changed midway through the implementation, from reading it
from `zulip-secrets.conf`, and a couple locations still reference the
secrets path.
2024-01-25 15:37:03 -08:00
Alex Vandiver cd565058cd puppet: Add vector pipelines for other Akamai SQS queues. 2024-01-25 15:36:40 -08:00
Alex Vandiver 147fe19c1f puppet: Fix grafana tarball path.
Grafana 10.2.1 and up package their tarball with a `grafana-v10.2.1`
and not `grafana-10.2.1` as previously.
2024-01-25 13:03:05 -08:00
Tim Abbott 004563b380 puppet: Fix bugs in sysctl configuration. 2024-01-23 09:32:15 -08:00
Alex Vandiver d18de3e0a4 puppet: Add a knob to adjust conntrack max size. 2024-01-10 09:07:00 -08:00
Alex Vandiver 588aec96f9 puppet: Factor out a sysctl operator. 2024-01-10 09:07:00 -08:00
Alex Vandiver 4da87524ff
nagios: Remove provisioning of zulip contact alias.
fcf096c52e removed the callsite which would have notified this
contact. Note that the source config file was presumably installed via the
python-zulip-api package.
2024-01-09 16:01:07 -08:00
Alex Vandiver b000328ba5 puppet: Adjust uptrack permissions and ownership to match package's.
This reverts a759d26a327cd4337d68eaa1d45d6a69edc9161c; apparently the
package has switched back.
2024-01-09 12:31:02 -08:00
Alex Vandiver c47ee4a296 zulip_ops: Configure stats to be pushed to status.zulip.com. 2023-11-16 16:21:12 -05:00
Alex Vandiver 5e49804004 puppet_ops: Include Akamai log parser on prometheus server. 2023-11-13 14:35:39 -05:00
Alex Vandiver 5591d6f65c zulip_ops: Add configuration for Vector Akamai stats.
Akamai writes access logs to S3; we use an SQS events queue, combined
with Vector, to transform those into Prometheus statistics.
2023-11-13 09:53:20 -08:00
Anders Kaseorg 835ee69c80 docs: Fix grammar errors found by mwic.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-10-09 13:24:09 -07:00
Alex Vandiver 528d0ebcf0 puppet: Serve /etc/zulip/well-known/ in nginx as /.well-known/. 2023-10-04 15:56:42 -07:00
Alex Vandiver 5308fbdeac puppet: Add postgresql-client depenencies to monitoring.
The `unless` step errors out if /usr/bin/psql does not exist at
first evaluation time -- protect that with a `test -f` check, and
protect the actual `createuser` with a dependency on `postgresql-client`.
To work around `Zulip::Safepackage` not actually being safe to
instantiate more than once, we move the instantiation of
`Package[postgresql-client]` into a class which can be safely
included one or more times.
2023-09-22 11:45:00 -07:00
Alex Vandiver ccbd834a86 postgres_exporter: Rebase the per-index stats branch.
The branch from the PR is somewhat stale, and is missing important bugfixes.
2023-09-11 17:59:54 -07:00
Alex Vandiver 0c88cfca63 postgres_exporter: Build from source for per-index stats.
This builds prometheus-community/postgres_exporter#843 to track
per-index statistics.
2023-09-11 11:59:39 -07:00
Alex Vandiver c5cace3600 puppet: Fix includes for new name of zulip_ops::prometheus::tornado.
This fixes the `include` name for the file renamed in 740a494ba4.
2023-08-09 02:32:28 +00:00
Alex Vandiver 740a494ba4 puppet: Rename and generalize Tornado process exporter.
Exporting stats about all of the various Zulip processes is useful for
tracking memory leaks, etc.
2023-08-06 13:41:10 -07:00
Alex Vandiver 8743602648 puppet: Allow access to smokescreen metrics on CZO. 2023-07-19 16:20:39 -07:00
Alex Vandiver 9799a03d79 puppet: Expose Smokescreen prometheus metrics on :9810. 2023-07-13 11:47:34 -07:00
Alex Vandiver 3aba2789d3 prometheus: Add an exporter for wal-g backup properties.
Since backups may now taken on arbitrary hosts, we need a blackbox
monitor that _some_ backup was produced.

Add a Prometheus exporter which calls `wal-g backup-list` and reports
statistics about the backups.

This could be extended to include `wal-g wal-verify`, but that
requires a connection to the PostgreSQL server.
2023-04-26 15:41:39 -07:00
Alex Vandiver cace8858f9 puppet: Move logrotate config into app_frontend_base.
7c023042cf moved the logrotate configuration to being a templated
file, from a static file, but missed that the static file was still
referenced from `zulip_ops::app_frontend`; it only updated
`zulip::profile::app_frontend`.  This caused errors in applying puppet
on any `zulip_ops::app_frontend` host.

Prior to 7c023042cf, the Puppet role was identical between those two
classes; deduplicate the rule by moving the updated template
definition into `zulip::app_frontend_base` which is common to those
two classes and not used in any other classes.
2023-04-19 09:34:37 -07:00
Alex Vandiver d0fc3f1c2e puppet: Add prod hooks to push zulip-cloud-current and notify CZO. 2023-04-12 11:36:33 -07:00
Tim Abbott 561daee2a1 puppet: Update declared zmirror dependencies.
Following zulip/python-zulip-api/pull/758/, we're no longer using
python-zephyr, and don't need to build it from source. Additionally,
we no longer need to build a forked Zephyr package, since ZLoadSession
and ZDumpSession were merged in
e6a545e759.
2023-04-06 09:45:06 -07:00
Alex Vandiver 6975417acf puppet: Create zmirror supervisor subdirectory.
To not change the `supervisor.conf` file, which requires a restart of
supervisor (and thus all services running under it, which is extremely
disruptive) we carefully leave the contents unchanged for most
installs, and append a new piece to the file, only for the zmirror
configuration, using `concat`.
2023-04-06 09:45:06 -07:00
Alex Vandiver 8a771c7ac0 hooks: Add a hook to send a Zulip before/after the deploy. 2023-04-05 18:51:55 -04:00
Alex Vandiver 89e366771a prometheus: Add a postgres exporter. 2023-03-30 16:16:18 -07:00
Alex Vandiver c2beb64a79 prometheus: Consistently import the base class and supervisor, if needed. 2023-03-30 16:16:18 -07:00
Alex Vandiver f2a20b56bc puppet: Enable sentry hooks for production and staging. 2023-03-17 08:10:31 -07:00
Alex Vandiver 1a65315566 puppet: Switch teleport to running under systemd, not supervisord.
There is no reason that the base node access method should be run
under supervisor, which exists primarily to give access to the `zulip`
user to restart its managed services.  This access is unnecessary for
Teleport, and also causes unwanted restarts of Teleport services when
the `supervisor` base configuration changes.  Additionally,
supervisor does not support the in-place upgrade process that Teleport
uses, as it replaces its core process with a new one.

Switch to installing a systemd configuration file (as generated by
`teleport install systemd`) for each part of Teleport, customized to
pass a `--config` path.  As such, we explicitly disable the `teleport`
service provided by the package.

The supervisor process is shut down by dint of no longer installing
the file, which purges it from the managed directory, and reloads
Supervisor to pick up the removed service.
2023-03-15 17:23:42 -04:00
Alex Vandiver 044ccdb334 chat.zulip.org: Enable Sentry hook. 2023-02-14 17:20:35 -05:00
Alex Vandiver e8123dfeea puppet: Match the `x` bits on directories to what puppet actually does.
Puppet _always_ sets the `+x` bit on directories if they have the `r`
bit set for that slot[^1]:

> When specifying numeric permissions for directories, Puppet sets the
> search permission wherever the read permission is set.

As such, for instance, `0640` is actually applied as `0750`.

Fix what we "want" to match what puppet is applying, by adding the `x`
bit.  In none of these cases did we actually intend the directory to
not be executable.

[1] https://www.puppet.com/docs/puppet/5.5/types/file.html#file-attribute-mode
2023-01-26 15:06:01 -08:00
Alex Vandiver d0de66b273 puppet: Remove "ensure => absent" rules which have all been applied. 2023-01-24 13:05:24 -08:00
Alex Vandiver 42f84a8cc7 puppet: Use existing autossh tunnels as OpenSSH "master" sockets.
A number of autossh connections are already left open for
port-forwarding Munin ports; autossh starts the connections and
ensures that they are automatically restarted if they are severed.

However, this represents a missed opportunity.  Nagios's monitoring
uses a large number of SSH connections to the remote hosts to run
commands on them; each of these connections requires doing a complete
SSH handshake and authentication, which can have non-trivial network
latency, particularly for hosts which may be located far away, in a
network topology sense (up to 1s for a no-op command!).

Use OpenSSH's ability to multiplex multiple connections over a single
socket, to reuse the already-established connection.  We leave an
explicit `ControlMaster no` in the general configuration, and not
`auto`, as we do not wish any of the short-lived Nagios connections to
get promoted to being a control socket if the autossh is not running
for some reason.

We enable protocol-level keepalives, to give a better chance of the
socket being kept open.
2022-11-01 22:24:40 -07:00
Alex Vandiver 9bd88a93e2 puppet: Tell needrestart to not default to restarting core services.
The `needrestart` tool added in 22.04 is useful in terms of listing
which services may need to be restarted to pick up updated libraries.
However, it prompts about the current state of services needing
restart for *every* subsequent `apt-get upgrade`, and defaulting core
services to restarting requires carefully manually excluding them
every time, at risk of causing an unscheduled outage.

Build a list of default-off services based on the list in
unattended-upgrades.
2022-07-19 17:51:18 -07:00
Alex Vandiver 8bc26aab08 nagios: Switch check_user_zephyr_mirror_liveness to run via cron.
This check loads Django, and as such must be run as the zulip user.
Repeat the same pattern used elsewhere in nagios, of writing a state
file, which is read by `check_cron_file`.
2022-06-22 12:07:38 -07:00
Alex Vandiver 775a084d0f nagios: Add a catchall "other" set. 2022-06-22 12:07:38 -07:00
Alex Vandiver 33472ee9ff nagios: Remove unused stats host set. 2022-06-22 12:07:38 -07:00
Alex Vandiver 7f6a77da31 puppet: Add a redis exporter. 2022-05-03 17:13:44 -07:00
Anders Kaseorg e9ba9b0e0d zulip-ec2-configure-interfaces: Remove.
Our current EC2 systems don’t have an interface named ‘eth0’, and if
they did, this script would do nothing but crash with ImportError
because we have never installed boto.utils for Python 3.

(The message of commit 2a4d851a7c made
an effort to document for future researchers why this script should
not have been blindly converted to Python 3.  However, commit
2dc6d09c2a (#14278) was evidently
unresearched and untested.)

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-05-03 02:25:59 -07:00
Anders Kaseorg 646a4d19a3 puppet: Remove quotes for enumerable values.
https://puppet.com/docs/puppet/7/style_guide.html#style_guide_module_design-quoting
“If a string is a value from an enumerable set of options, such as
present and absent, it SHOULD NOT be enclosed in quotes at all.”

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-04-29 22:06:46 -07:00
Alex Vandiver 35db1ee435 puppet: Only include "app_service" section if there are apps.
This works around gravitational/teleport#12256, but also produces config
files that are slightly cleaner.
2022-04-26 16:36:13 -07:00
Alex Vandiver f6d27562fa puppet: Configure chrony to use AWS-local NTP sources.
This prevents hosts from spewing traffic to random hosts across the
Internet.
2022-03-25 17:07:53 -07:00
Alex Vandiver 1bd5723cd2 puppet: Add a prometheus monitor for tornado processes. 2022-03-20 16:12:11 -07:00
Alex Vandiver 6b91652d9a puppet: Open the grok_exporter port.
The complete grok_exporter configuration is not ready to be committed,
but this at least prepares the way for it.
2022-03-20 16:12:11 -07:00