Commit Graph

490 Commits

Author SHA1 Message Date
Alex Vandiver 4a95967a33 puppet: Gather uwsgi stats from chat.zulip.org. 2022-01-03 21:26:57 -08:00
Alex Vandiver 8a5be972d2 puppet: Add a uwsgi exporter for monitoring.
This allows investigation of how many workers are busy, and to track
"harikari" terminations.
2022-01-03 15:25:58 -08:00
Anders Kaseorg 82748d45d8 install-yarn: Use test -ef in case /srv is a symlink.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-12-30 13:42:07 -08:00
Alex Vandiver c094867a74 puppet: Add aarch64 build hashes to external dependencies.
wal-g does not ship aarch64 binaries, currently; the compilation
process([1]) is somewhat complicated, so we defer the decision about
how to support wal-g for aarch64 until a later date.

[1]: https://github.com/wal-g/wal-g/blob/master/docs/PostgreSQL.md#installing
2021-12-29 16:35:15 -08:00
Alex Vandiver f166f9f7d6 puppet: Centralize versions and sha256 hashes of external dependencies.
This will make it easier to update versions of these dependencies.
2021-12-29 16:35:15 -08:00
Alex Vandiver 57662689a9 puppet: Provide a constant homedir for grafana user.
The homedir of a user cannot be changed if any processes are running
as them, so having it change over time as upgrades happen will break
puppet application, as the old grafana process under supervisor will
effectively lock changes to the user's homedir.

Unfortunately, that means that this change will thus fail to
puppet-apply unless `supervisorctl stop grafana` is run first, but
there's no way around that.
2021-12-29 16:35:15 -08:00
Alex Vandiver 6e55e52694 puppet: Pull out grafana $data_dir. 2021-12-29 16:35:15 -08:00
Alex Vandiver 1e4e6a09af puppet: Stop making resources for external binaries and directories.
In the event that extracting doesn't produce the binary we expected it
to, all this will do is create an _empty_ file where we expect the
binary to be.  This will likely muddle debugging.

Since the only reason the resourfce was made in the first place was to
make dependencies clear, switch to depending on the External_Dep
itself, when such a dependency is needed.
2021-12-29 16:35:15 -08:00
Alex Vandiver 3c163a7d5e puppet: Move slash out of $dir by convention. 2021-12-29 16:35:15 -08:00
Alex Vandiver bb5a2c8138 puppet: Move prometheus to external_dep. 2021-12-29 16:35:15 -08:00
Alex Vandiver 2d6c096904 puppet: Move node_exporter to external_dep. 2021-12-29 16:35:15 -08:00
Alex Vandiver e4b23daad7 puppet: Upgrade to Grafana 8.3.2, for CVE-2021-43813. 2021-12-10 14:00:11 -08:00
Alex Vandiver 053682964e puppet: Only fetch from running hosts in Grafana ec2 discovery. 2021-12-09 08:12:03 -08:00
Alex Vandiver 291f688678 puppet: Use zulip::external_dep for grafana, template config.
Templating the config ensures that the service is restarted when it is
upgraded.
2021-12-08 20:58:10 -08:00
Alex Vandiver 3eae429ab4 puppet: Upgrade Grafana to 8.3.1, for CVE-2021-43798. 2021-12-08 20:58:10 -08:00
Alex Vandiver 7db146d0a9 puppet: Do not assume amd64 architecture. 2021-12-06 11:08:50 -08:00
Alex Vandiver fb2d05f9e3 puppet: Remove unused 'builder' files.
These are leftover detritus from the "builder" host, which was removed
in 4c9a283542.
2021-12-06 10:21:50 -08:00
Alex Vandiver c514feaa22 puppet: Default go-camo to listening on localhost for standalone deploys.
The default in the previous commit, inherited from camo, was to bind
to 0.0.0.0:9292.  In standalone deployments, camo is deployed on the
same host as the nginx reverse proxy, and as such there is no need to
open it up to other IPs.

Make `zulip::camo` take an optional parameter, which allows overriding
it in puppet, but skips a `zulip.conf` setting for it, since it is
unlikely to be adjust by most users.
2021-11-19 15:58:26 -08:00
Alex Vandiver b982222e03 camo: Replace with go-camo implementation.
The upstream of the `camo` repository[1] has been unmaintained for
several years, and is now archived by the owner.  Additionally, it has
a number of limitations:
 - It is installed as a sysinit service, which does not run under
   Docker
 - It does not prevent access to internal IPs, like 127.0.0.1
 - It does not respect standard `HTTP_proxy` environment variables,
   making it unable to use Smokescreen to prevent the prior flaw
 - It occasionally just crashes, and thus must have a cron job to
   restart it.

Swap camo out for the drop-in replacement go-camo[2], which has the
same external API, requiring not changes to Django code, but is more
maintained.  Additionally, it resolves all of the above complaints.

go-camo is not configured to use Smokescreen as a proxy, because its
own private-IP filtering prevents using a proxy which lies within that
IP space.  It is also unclear if the addition of Smokescreen would
provide any additional protection over the existing IP address
restrictions in go-camo.

go-camo has a subset of the security headers that our nginx reverse
proxy sets, and which camo set; provide the missing headers with `-H`
to ensure that go-camo, if exposed from behind some other non-nginx
load-balancer, still provides the necessary security headers.

Fixes #18351 by moving to supervisor.
Fixes zulip/docker-zulip#298 also by moving to supervisor.

[1] https://github.com/atmos/camo
[2] https://github.com/cactus/go-camo
2021-11-19 15:58:26 -08:00
Alex Vandiver 1806e0f45e puppet: Remove zulip.org configuration. 2021-08-26 17:21:31 -07:00
Alex Vandiver 27881babab puppet: Increase prometheus storage, from the default 15d. 2021-08-24 23:40:43 -07:00
Alex Vandiver e46e862f2b puppet: Add a bare-bones zulipbot profile.
This sets up the firewalls appropriate for zulipbot, but does not
automate any of the configuration of zulipbot itself.
2021-08-24 16:05:58 -07:00
Alex Vandiver 5857dcd9b4 puppet: Configure ip6tables in parallel to ipv4.
Previously, IPv6 firewalls were left at the default all-open.

Configure IPv6 equivalently to IPv4.
2021-08-24 16:05:46 -07:00
Alex Vandiver 845509a9ec puppet: Be explicit that existing iptables are only ipv4. 2021-08-24 16:05:46 -07:00
Alex Vandiver 4dd289cb9d puppet: Enable prometheus monitoring of supervisord.
To be able to read the UNIX socket, this requires running
node_exporter as zulip, not as prometheus.
2021-08-03 21:47:02 -07:00
Alex Vandiver aa940bce72 puppet: Disable hwmon collector, which does nothing on cloud hosts. 2021-08-03 21:47:02 -07:00
Alex Vandiver e94b6afb00 nagios: Remove broken check_email_deliverer_* checks and related code.
These checks suffer from a couple notable problems:
 - They are only enabled on staging hosts -- where they should never
   be run.  Since ef6d0ec5ca, these supervisor processes are only
   run on one host, and never on the staging host.
 - They run as the `nagios` user, which does not have appropriate
   permissions, and thus the checks always fail.  Specifically,
   `nagios` does not have permissions to run `supervisorctl`, since
   the socket is owned by the `zulip` user, and mode 0700; and the
   `nagios` user does not have permission to access Zulip secrets to
   run `./manage.py print_email_delivery_backlog`.

Rather than rewrite these checks to run on a cron as zulip, and check
those file contents as the nagios user, drop these checks -- they can
be rewritten at a later point, or replaced with Prometheus alerting,
and currently serve only to cause always-failing Nagios checks, which
normalizes alert failures.

Leave the files installed if they currently exist, rather than
cluttering puppet with `ensure => absent`; they do no harm if they are
left installed.
2021-08-03 16:07:13 -07:00
Alex Vandiver e6bae4f1dd puppet: Remove zulip::nagios class.
93f62b999e removed the last file in
puppet/zulip/files/nagios_plugins/zulip_nagios_server, which means the
singular rule in zulip::nagios no longer applies cleanly.

Remove the `zulip::nagios` class, as it is no longer needed.
2021-07-09 17:29:41 -07:00
Anders Kaseorg 93f62b999e nagios: Replace check_website_response with standard check_http plugin.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-07-09 16:47:03 -07:00
Vishnu KS e0f5fadb79 billing: Downgrade small realms that are behind on payments.
An organization with at most 5 users that is behind on payments isn't
worth spending time on investigating the situation.

For larger organizations, we likely want somewhat different logic that
at least does not void invoices.
2021-07-02 13:19:12 -07:00
Alex Vandiver 6c72698df2 puppet: Move zulip_ops supervisor config into /etc/supervisor/conf.d/zulip/.
This is similar cleanup to 3ab9b31d2f, but only affects zulip_ops
services; it serves to ensure that any of these services which are no
longer enabled are automatically removed from supervisor.

Note that this will cause a supervisor restart on all affected hosts,
which will restart all supervisor services.
2021-06-14 17:12:59 -07:00
Alex Vandiver dd90083ed7 puppet: Provide FQDN of self as URI, so the certificate validates.
Failure to do this results in:
```
psql: error: failed to connect to `host=localhost user=zulip database=zulip`: failed to write startup message (x509: certificate is valid for [redacted], not localhost)
```
2021-06-14 00:14:48 -07:00
Alex Vandiver c90ff80084 puppet: Bump grafana version to 8.0.1.
Most notably, this fixes an annoying bug with CloudWatch metrics being
repeated in graphs.
2021-06-10 15:49:08 -07:00
Alex Vandiver d905eb6131 puppet: Add a database teleport server.
Host-based md5 auth for 127.0.0.1 must be removed from `pg_hba.conf`,
otherwise password authentication is preferred over certificate-based
authentication for localhost.
2021-06-08 22:21:21 -07:00
Alex Vandiver 100a899d5d puppet: Add grafana server. 2021-06-08 22:21:00 -07:00
Alex Vandiver 459f37f041 puppet: Add prometheus server. 2021-06-08 22:21:00 -07:00
Alex Vandiver 19fb58e845 puppet: Add prometheus node exporter. 2021-06-08 22:21:00 -07:00
Alex Vandiver a2b1009ed5 puppet: Turn on "authentication" which defaults to user with all rights.
Nagios refuses to allow any modifications with use_authentication off;
re-enabled "authentication" but set a default user, which (by way of
the `*` permissions in 359f37389a) is allowed to take all actions.
2021-06-08 15:19:28 -07:00
Alex Vandiver 61b6fc865c puppet: Add a label to teleport applications, to allow RBAC.
Roles can only grant or deny access based on labels; set one based on
the application name.
2021-06-08 15:19:04 -07:00
Alex Vandiver 4aff5b1d22 puppet: Allow access to `/` in nagios.
This was a regression in 51b985b40d.
2021-06-07 22:40:58 -07:00
Alex Vandiver 54768c2210 puppet: Remove now-unused basic auth support files.
51b985b40d made these unnecessary.
2021-06-07 16:17:45 -07:00
Alex Vandiver 359f37389a puppet: Remove in-nagios auth restrictions.
51b985b40d made nagios only accessible from localhost, or as proxied
via teleport.  Remove the HTTP-level auth requirements.
2021-06-07 16:17:45 -07:00
Alex Vandiver 2352fac6b5 puppet: Fix indentation. 2021-06-02 18:38:38 -07:00
Alex Vandiver 51b985b40d puppet: Move nagios to behind teleport.
This makes the server only accessible via localhost, by way of the
Teleport application service.
2021-06-02 18:38:38 -07:00
Alex Vandiver 4f51d32676 puppet: Add a teleport application server.
This requires switching to a reverse tunnel for the auth connection,
with the side effect that the `zulip_ops::teleport::node` manifest can
be applied on servers anywhere in the Internet; they do not need to
have any publicly-available open ports.
2021-06-02 18:38:38 -07:00
Alex Vandiver c59421682f puppet: Add a teleport node on every host.
Teleport nodes[1] are the equivalent to SSH servers.  In addition to
this config, joining the teleport cluster will require presenting a
one-time "join token" from the proxy server[2], which may either be
short-lived or static.

[1] https://goteleport.com/docs/architecture/nodes/
[2] https://goteleport.com/docs/admin-guide/#adding-nodes-to-the-cluster
2021-06-02 18:38:38 -07:00
Alex Vandiver 1cdf14d195 puppet: Add a teleport server.
See https://goteleport.com/docs/architecture/overview/ for the general
architecture of a Teleport cluster.  This commit adds a Teleport auth[1]
and proxy[2] server.  The auth server serves as a CA for granting
time-bounded access to users and authenticating nodes on the cluster;
the proxy provides access and a management UI.

[1] https://goteleport.com/docs/architecture/authentication/
[2] https://goteleport.com/docs/architecture/proxy/
2021-06-02 18:38:38 -07:00
Alex Vandiver 3ebd627c50 puppet: Fix "import" -> "include" in chat_zulip_org. 2021-06-02 11:02:34 -07:00
Alex Vandiver 2130fc0645 puppet: Add an explicit class for czo. 2021-06-01 22:18:50 -07:00
Alex Vandiver c9141785fd puppet: Use concat fragments to place port allows next to services.
This means that services will only open their ports if they are
actually run, without having to clutter rules.v4 with a log of `if`
statements.

This does not go as far as using `puppetlabs/firewall`[1] because that
would represent an additional DSL to learn; raw IPtables sections can
easily be inserted into the generated iptables file via
`concat::fragment` (either inline, or as a separate file), but config
can be centralized next to the appropriate service.

[1] https://forge.puppet.com/modules/puppetlabs/firewall
2021-05-27 21:14:48 -07:00
Alex Vandiver 4f79b53825 puppet: Factor out firewall config. 2021-05-27 21:14:48 -07:00
Alex Vandiver f3eea72c2a setup: Merge multiple setup-apt-repo scripts into one.
This moves the `.asc` files into subdirectories, and writes out the
according `.list` files into them.  It moves from templates to
written-out `.list` files for clarity and ease of
implementation (Debian and Ubuntu need different templates for
`zulip`), and as a way of making explicit which releases are supported
for each list.  For the special-case of the PGroonga signing key, we
source an additional file within the directory.

This simplifies the process for adding another class of `.list` file.
2021-05-26 14:42:29 -07:00
Alex Vandiver 4f017614c5 nagios: Replace check_fts_update_log with a process_fts_updates flag.
This avoids having to duplicate the connection logic from
process_fts_updates.

Co-authored-by: Adam Birds <adam.birds@adbwebdesigns.co.uk>
2021-05-25 13:56:05 -07:00
Alex Vandiver 116e41f1da puppet: Move files out and back when mounting /srv.
Specifically, this affects /srv/zulip-aws-tools.
2021-05-23 13:29:23 -07:00
Alex Vandiver ea98549e88 puppet: Always install linux-image-virtual, for ksplice support. 2021-05-23 13:29:23 -07:00
Alex Vandiver 0b1dd27841 puppet: AWS mounts its extra disks with inconsistent names.
It is now /dev/nvme1n1, not /dev/nvme0n1; but it always has a
consistent major/minor node.  Source the file that defines these.
2021-05-23 13:29:23 -07:00
Alex Vandiver 033a96aa5d puppet: Fix check_ssl_certificate check to check named host, not self. 2021-05-17 18:38:30 -07:00
Alex Vandiver feb7870db7 puppet: Adjust thresholds on autovac_freeze.
These thresholds are in relationship to the
`autovacuum_freeze_max_age`, *not* the XID wraparound, which happens
at 2^31-1.  As such, it is *perfectly normal* that they hit 100%, and
then autovacuum kicks in and brings it back down.  The unusual
condition is that PostgreSQL pushes past the point where an autovacuum
would be triggered -- therein lies the XID wraparound danger.

With the `autovacuum_freeze_max_age` set to 2000000000 in
`postgresql.conf`, XID wraparound happens at 107.3%.  Set the warning
and error thresholds to below this, but above 100% so this does not
trigger constantly.
2021-05-11 17:11:47 -07:00
Anders Kaseorg 544bbd5398 docs: Fix capitalization mistakes.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-05-10 09:57:26 -07:00
Anders Kaseorg 9d57fa9759 puppet: Use pgrep -x to avoid accidental matches.
Matching the full process name (-x without -f) or full command
line (-xf) is less prone to mistakes like matching a random substring
of some other command line or pgrep matching itself.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-05-07 08:54:41 -07:00
Alex Vandiver 6ee74b3433 puppet: Check health of APT repository. 2021-03-23 19:27:42 -07:00
Alex Vandiver c01345d20c puppet: Add nagios check for long-lived certs that do not auto-renew. 2021-03-23 19:27:27 -07:00
Alex Vandiver 9ea86c861b puppet: Add a nagios alert configuration for smokescreen.
This verifies that the proxy is working by accessing a
highly-available website through it.  Since failure of this equates to
failures of Sentry notifications and Android mobile push
notifications, this is a paging service.
2021-03-18 10:11:15 -07:00
Anders Kaseorg 129ea6dd11 nginx: Consistently listen on IPv6 and with HTTP/2.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-03-17 17:46:32 -07:00
Alex Vandiver 06c07109e4 puppet: Add missing semicolons left off in ba3b88c81b. 2021-03-12 15:48:53 -08:00
Alex Vandiver ba3b88c81b puppet: Explicitly use the snakeoil certificates for nginx.
In production, the `wildcard-zulipchat.com.combined-chain.crt` file is
just a symlink to the snakeoil certificates; but we do not puppet that
symlink, which makes new hosts fail to start cleanly.  Instead, point
explicitly to the snakeoil certificate, and explain why.
2021-03-12 13:31:54 -08:00
Alex Vandiver 306bf930f5 puppet: Add a warning if ksplice is enabled but has no key set. 2021-03-10 17:57:20 -08:00
Alex Vandiver a215c83c2d puppet: Switch to more explicit variable rather than reuse a nagios one.
Redis is not nagios, and this only leads to confusion as to why there
is a nagios domain setting on frontend servers; it also leaves the
`redis0` part of the name buried in the template.

Switch to an explicit variable for the redis hostname.
2021-03-10 11:44:54 -08:00
Alex Vandiver a5b29398fc puppet: Only install ksplice uptrack if there is an access key. 2021-03-10 11:44:11 -08:00
Alex Vandiver d938dd9d4a puppet: Document smokescreen installation, and move to puppet/zulip/.
This is more broadly useful than for just Kandra; provide
documentation and means to install Smokescreen for stand-alone
servers, and motivate its use somewhat more.
2021-03-02 17:16:38 -08:00
Alex Vandiver 2f5eae5c68 puppet: Minor formatting. 2021-02-28 17:03:29 -08:00
Alex Vandiver a759d26a32 puppet: Make ksplice config not world-readable, use 'adm' group.
This matches the configuration that ksplice itself creates the file
and directory with.
2021-02-28 17:03:29 -08:00
Tim Abbott 957c16aa77 nagios: Tweak prod load monitoring parameters.
Ultimately this monitoring isn't that helpful, but we're mainly
interested in when it spikes to very high numbers.
2021-02-26 08:39:52 -08:00
Alex Vandiver 32149c6a1c puppet: Add ksplice uptrack for kernel hotpatches. 2021-02-25 18:05:47 -08:00
Alex Vandiver 173d2dec3d puppet: Check in defensive restart-camo cron job.
This was found on lb1; add it to the camo install on smokescreen.
2021-02-24 16:42:21 -08:00
Alex Vandiver 0b736ef4cf puppet: Remove puppet_ops configuration for separate loadbalancer host. 2021-02-22 16:05:13 -08:00
Alex Vandiver e30b524896 iptables: Limit smokescreen port 4750, add camo port.
Limit incoming connections to port 4750 to only the smokescreen host,
and also allow access to the Camo server on that host, on port 9292.
2021-02-17 13:52:38 -08:00
Alex Vandiver a88af1b5a2 camo: Install on smokescreen host. 2021-02-16 08:12:31 -08:00
Alex Vandiver 29f60bad20 smokescreen: Put the version into the supervisorctl command.
This makes it reload correctly if the version is changed.
2021-02-16 08:12:31 -08:00
Anders Kaseorg 6e4c3e41dc python: Normalize quotes with Black.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-02-12 13:11:19 -08:00
Anders Kaseorg 11741543da python: Reformat with Black, except quotes.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-02-12 13:11:19 -08:00
Alex Vandiver 559cdf7317 puppet: Set APT::Periodic::Unattended-Upgrade in apt config.
This is required for unattended upgrades to actually run regularly.
In some distributions, it may be found in 20auto-upgrades, but placing
it here makes it more discoverable.
2021-02-12 08:59:19 -08:00
Tim Abbott fd8504e06b munin: Update to use NAGIOS_BOT_HOST.
We haven't actively used this plugin in years, and so it was never
converted from the 2014-era monitoring to detect the hostname.

This seems worth fixing since we may want to migrate this logic to a
more modern monitoring system, and it's helpful to have it correct.
2021-01-27 12:07:09 -08:00
Alex Vandiver c2526844e9 worker: Remove SignupWorker and friends.
ZULIP_FRIENDS_LIST_ID and MAILCHIMP_API_KEY are not currently used in
production.

This removes the unused 'signups' queue and worker.
2021-01-17 11:16:35 -08:00
Alex Vandiver 90ca06d873 puppet: Allow unattended upgrades of -updates in addition to -security.
This ensures that software will be fully up-to-date, not just with
security patches.
2020-11-13 16:45:05 -08:00
Tim Abbott 494a685827 puppet: Fix typo in name of missedmessage_emails consumer.
This has been present since this check was introduced in
45c9c3cc30.
2020-10-29 12:28:54 -07:00
Tim Abbott ab3cb2b3bf puppet: Fix internal redis puppet configuration.
The inherits rule is required for overriding existing configuration
files; while the `::profile` piece was missed in the recent ::profile
migration.
2020-10-29 11:53:43 -07:00
Alex Vandiver b9797770d3 provision: Rename backup directory to postgresql. 2020-10-28 11:57:03 -07:00
Alex Vandiver 1f7132f50d docs: Standardize on PostgreSQL, not Postgres. 2020-10-28 11:55:16 -07:00
Alex Vandiver eaa99359b1 puppet: Rename to check_postgresql_replication_lag. 2020-10-28 11:51:52 -07:00
Alex Vandiver 53e59a0a13 puppet: Rename check_postgres_backup to check_postgresql_backup. 2020-10-28 11:51:52 -07:00
Alex Vandiver 45f6c79c4a puppet: Rename postgres_ variables to postgresql_. 2020-10-28 11:51:52 -07:00
Alex Vandiver e124324050 puppet: Rename postgres_appdb in nagios to postgresql. 2020-10-28 11:51:52 -07:00
Alex Vandiver a155430eb5 docs: Document all zulip.conf settings.
This provides a single reference point for all zulip.conf settings;
these mostly link out to the more complete documentation about each
setting, elsewhere.

Fixes #12490.
2020-10-27 13:31:57 -07:00
Alex Vandiver d24c571bab puppet: Automatically back up the database if we have the secrets.
This avoids folks having to manually add to the puppet_classes.
2020-10-27 13:29:19 -07:00
Alex Vandiver e7798d2797 puppet: Move zulip_ops::profile::postgres_appdb to postgresql. 2020-10-27 13:29:19 -07:00
Alex Vandiver 9f25389bff puppet: Move top-level zulip_ops deployments to zulip_ops::profile. 2020-10-27 13:29:19 -07:00
Alex Vandiver 5365af544a puppet: Rename zulip::profile::rabbit to ::rabbitmq. 2020-10-27 13:29:19 -07:00
Alex Vandiver 188af57296 puppet: Rename postgres_appdb to postgresql.
There is only one PostgreSQL database; the "appdb" is irrelevant.
Also use "postgresql," as it is the name of the software, whereas
"postgres" the name of the binary and colloquial name.  This is minor
cleanup, but enabled by the other renames in the previous commit.
2020-10-27 13:29:19 -07:00
Alex Vandiver c2185a81d6 puppet: Move top-level zulip deployments into "profile" directory.
This moves the puppet configuration closer to the "roles and profiles
method"[1] which is suggested for organizing puppet classes.  Notably,
here it makes clear which classes are meant to be able to stand alone
as deployments.

Shims are left behind at the previous names, for compatibility with
existing `zulip.conf` files when upgrading.

[1] https://puppet.com/docs/pe/2019.8/the_roles_and_profiles_method
2020-10-27 13:29:19 -07:00
Alex Vandiver 27cfb14d92 puppet: Only include zulip::base for top-level deploys.
This also removes direct includes of `zulip::common`, making
`zulip::base` gatekeep the inclusion of it.  This helps enforce that
any top-level deploy only needs include a single class, and that any
configuration which is not meant to be deployed by itself will not
apply, due to lack of `zulip::common` include.

The following commit will better differentiate these top-level deploys
by moving them into a subdirectory.
2020-10-27 13:29:19 -07:00
Alex Vandiver c296b5d819 puppet: Allow unattended-upgrades for all but servers.
Restarting servers is what can cause service interruptions, and
increase risk.  Add all of the servers that we use to the list of
ignored packages, and uncomment the default allowed-origins in order
to enable unattended upgrades.
2020-10-23 16:46:06 -07:00
Anders Kaseorg 72d6ff3c3b docs: Fix more capitalization issues.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-23 11:46:55 -07:00
Alex Vandiver a7d1fd9ffb puppet: Remove non-working apt::source.
d2aa81858c replaced the `apt::source` to set up debathena with
`Exec['setup-apt-repo-debathena']`, but mistakenly left the
`apt::source` in place in `zmirror` (but not `zmirror_personals`).
The `apt::source` resource type was later removed in c9d54f7854,
making the manifest to apply on `zmirror`.

Remove the broken and unnecessary `apt::source` resource.
2020-10-23 11:31:20 -07:00
Alex Vandiver 48e06c25ba puppet: Switch nagios SSH checks to id_ed25519 key.
The ssh-rsa algorithm was deprecated[1] in OpenSSH 8.2 (2020-02-14) and
will be removed in a future release.

[1] https://www.openssh.com/txt/release-8.4
2020-10-22 16:42:30 -07:00
Alex Vandiver 0ea20bd7d8 puppet: Move postgres_version into postgres_common.
This property is not related to the base zulip install; move it to
zulip::postgres_common, which is already used as a namespace for
various postgres variables.
2020-10-22 11:32:25 -07:00
Alex Vandiver ca971ebc59 puppet: Remove empty zulip_ops class. 2020-10-22 11:30:53 -07:00
Alex Vandiver 16af05758d puppet: Move zulip_org into zulip_ops.
This class is not of general interest.
2020-10-22 11:30:53 -07:00
Alex Vandiver ad566c491d puppet: Drop now-unused zulip_ops:::git class. 2020-10-22 11:30:53 -07:00
Alex Vandiver 50e9e2ed20 puppet: Make zulip::base include zulip::apt_repository.
There was likely more dependency complexity prior to 97766102df, but
there is now no reason to require that consumers explicitly include
zulip::apt_repository.
2020-10-22 11:30:53 -07:00
Alex Vandiver 2dc6d26ec6 puppet: Fix included monitoring class name. 2020-10-19 22:30:20 -07:00
Alex Vandiver 7a1132d605 puppet: Switch golang and smokescreen to use /srv.
/srv and /opt have very similar usages; but we should be internally
consistent.

Move these two (the only usages of /opt) to match the rest in /srv.
2020-10-16 13:00:06 -07:00
Alex Vandiver 78b92a51cc puppet: Allow access to smokescreen port via iptables. 2020-10-15 15:18:35 -07:00
Alex Vandiver 0d5356969e puppet: Reformat ipv4 iptables rules comments. 2020-10-15 15:18:35 -07:00
Alex Vandiver fffea9612b puppet: Add an outgoing HTTP/HTTPS proxy server.
Use https://github.com/stripe/smokescreen to provide a server for an
outgoing proxy, run under supervisor.  This will allow centralized
blocking of internal metadata IPs, localhost, and so forth, as well as
providing default request timeouts (10s by default).
2020-10-15 15:18:35 -07:00
Anders Kaseorg dfaea9df65 shfmt: Reformat shell scripts with shfmt.
https://github.com/mvdan/sh

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-15 15:16:00 -07:00
Alex Vandiver f61ac4a28d puppet: Move frontend monitoring into its own file.
This allows it to be pulled in for deploys like czo, which don't use
the full `zulip_ops::app_frontend`, but we wish to monitor.
2020-10-13 17:37:32 -07:00
Tim Abbott 7c2c82b190 nginx: Update nginx configuration for fhir/hl7 organization.
We should eventually add templating for the set of hosts here, but
it's worth merging this change to remove the deleted hostname and
replace it with the current one.
2020-10-13 16:50:26 -07:00
Anders Kaseorg 723d285e46 nginx: Redirect {www.,}zulipchat.com, www.zulip.com to zulip.com.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-13 16:49:23 -07:00
Alex Vandiver c8df9a150e puppet: Drop all log2zulip configuration.
Disabled on webservers in 047817b6b0, it has since lingered in
configuration, as well as running (to no effect) every minute on the
loadbalancer.

Remove the vestiges of its configuration.
2020-10-13 11:00:50 -07:00
Alex Vandiver b431b1b021 puppet: Remove misleading motd.
This banner shows on lb1, advertising itself as lb0.  There is no
compelling reason for a custom motd, especially one which needs to
be reconfigured for each host.
2020-10-13 11:00:36 -07:00
Alex Vandiver 45c9c3cc30 queue: Monitor user_activity queue, now that it has a consumer.
Since this was using repead individual get() calls previously, it
could not be monitored for having a consumer.  Add it in, by marking
it of queue type "consumer" (the default), and adding Nagios lines for
it.

Also adjust missedmessage_emails to be monitored; it stopped using
LoopQueueProcessingWorker in 5cec566cb9, but was never added back
into the set of monitored consumers.
2020-10-11 14:19:42 -07:00
Alex Vandiver 4fd7df4e8c puppet: Remove absent of check-apns-tokens.
This was marked as ensure absent in d02101a401, in v1.7.0 in 2017.
2020-09-29 18:17:08 -07:00
Alex Vandiver 872a349508 puppet: Remove absent of log2zulip.
This was marked as ensure absent in 047817b6b0, in v2.0.0 in 2018.
2020-09-29 18:17:08 -07:00
Alex Vandiver 57d88eedd8 puppet: Only install rabbitmq cron jobs via zulip_ops.
The rabbitmq cron jobs exist in order to call rabbitmqctl as root and
write the output to files that nagios can consume, since nagios is not
allowed to run rabbitmqctl.

In systems which do not have nagios configured, these every-minute
cron jobs add non-insignificant load, to no effect.  Move their
installation into `zulip_ops`.  In doing so, also combine the cron.d
files into a single file; this allows us to `ensure => absent` the old
filenames, removing them from existing systems.  Leave the resulting
combined cron.d file in `zulip`, since it is still of general utility
and note.
2020-09-29 17:44:44 -07:00
Anders Kaseorg ab120a03bc python: Replace unnecessary intermediate lists with generators.
Mostly suggested by the flake8-comprehension plugin.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-09-02 11:15:41 -07:00
Tim Abbott 5a1243db3c puppet: Use correct scope for zulip_ops::munin_plugin. 2020-07-15 21:49:45 -07:00
Alex Vandiver 48c3c33d10 puppet: Fully-qualify the munin-plugin name 2020-07-14 17:58:51 -07:00
Alex Vandiver 31d80a77d4 puppet: Update nagios check_postgres_replication_lag to be on DB hosts
7d4a370a57 attempted to move the replication check to on the
PostgreSQL hosts.  While it updated the _check_ to assume it was
running and talking to a local PostgreSQL instance, the configuration
and installation for the check were not updated.  As such, the check
ran on the nagios host for each DB host, and produced no output.

Start distributing the check to all apopdb hosts, and configure nagios
to use the SSH tunnel to get there.
2020-07-14 16:27:18 -07:00
Alex Vandiver 6c27f07c1d puppet: Move PostgreSQL backups to their own class.
wal-g was used in `puppet/zulip` by env-wal-g, but only installed in
`puppet/zulip_ops`.

Merge all of the dependencies of doing backups using wal-g (wal-g
installation, the pg_backup_and_purge job, the nagios plugin that
verifies it happens) into a common base class in `puppet/zulip`, since
it is generally useful.
2020-07-14 00:40:25 -07:00
Anders Kaseorg 15483c09cb puppet: Add missing trailing commas.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-07-13 15:36:06 -07:00
Alex Vandiver 3691a94efe puppet: Configure munin and nagios under apache with puppet.
This swaps in the actually-in-use munin configuiration file;
otherwise, it is an implementation of the configuration as it exists
on the machine.
2020-07-13 13:23:11 -07:00
Alex Vandiver 4e42164b4a munin: Add plugins to prod hosts. 2020-07-13 13:23:11 -07:00
Alex Vandiver 2a14212b27 munin: Add a helper resource definition for munin plugins. 2020-07-13 12:49:28 -07:00
Alex Vandiver 7c7b5fcd6f munin: Deal with spaces in the channel names. 2020-07-13 12:49:28 -07:00
Alex Vandiver eda2c4b8e2 puppet: Split munin-node from munin-server.
No plugins are installed inside the /usr/local/munin/lib this creates
in munin-node, nor are they symlinked into /etc/munin/plugins, so
non-default plugins are added by this.
2020-07-13 12:49:28 -07:00
Alex Vandiver ddc7bb5a45 munin: Fix the path to check_send_receive_time. 2020-07-13 12:49:28 -07:00
Alex Vandiver 8be544e7eb munin: Rename monitoring plugin to use zulip name, not humbug. 2020-07-13 12:49:28 -07:00
Alex Vandiver 8cff27f67d puppet: Pull hosts from zulip.conf, not hardcoded list.
The one complexity is that hosts_fullstack are treated differently, as
they are not currently found in the manual `hosts` list, and as such
do not get munin monitoring.
2020-07-10 00:14:09 -07:00
Alex Vandiver 24383a5082 puppet: Rename hosts_domain so hosts_prefix can be grepped for. 2020-07-10 00:14:09 -07:00
Alex Vandiver a4e7c7a27e nagios: Remove check_memcached.
check_memcached does not support memcached authentication even in its
latest release (it’s in a TODO item comment, and that’s it), and was
never particularly useful.
2020-07-10 00:12:48 -07:00
Anders Kaseorg 9900298315 zthumbor: Remove Python 2 residue.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-07-06 18:44:58 -07:00
Alex Vandiver a21a086f5c puppet: nagios-plugins-basic is replaced by monitoring-plugins-basic.
In Bionic, nagios-plugins-basic is a transitional package which
depends on monitoring-plugins-basic.  In Focal, it is a virtual
package, which means that every time puppet runs, it tries to
re-install the nagios-plugins-basic package.

Switch all instances to referring to `$zulip::common::nagios_plugins`,
and repoint that to monitoring-plugins-basic.
2020-06-29 14:58:01 -07:00
Alex Vandiver 876ee4a8ed installer: Remove code specific to stretch or xenial.
Support for Xenial and Stretch was removed (5154ddafca, 0f4b1076ad,
8944e0ad53, 79acd5ae40, 1219a2e854), but not all codepaths were
updated to remove their conditionals on it.

Remove all code predicated on Xenial or Stretch.  debathena support
was migrated to Bionic, since that appears to be the current state of
existing debathena servers.
2020-06-24 12:57:38 -07:00
Alex Vandiver 7250d41bf7 puppet: Fix the path to install-wall-g 2020-06-17 15:23:18 -07:00
Tim Abbott 26396c5e25 puppet: Fix exceptions with multiple certbot declarations.
Since 9e8f1aacb3, zulip_ops machines
might have two Package declarations for `certbot`, which doesn't work
in puppet.

The fix is, as usual, to use our `zulip::safepackage` wrapper instead.
2020-06-15 18:21:33 -07:00
Alex Vandiver f8fc3a16eb puppet: Use "primary" / "replica" consistently in comments.
The style guide for Zulip is to always use "primary" and "replica"
when describing database replication.  Adjust a few comments under
`puppet/` that do not adhere to this.

Unfortunately, some references still remain to the insensitive and
inaccurate "master" / "slave" terminology.  However, these are only in
files which we are attempting to preserve as close to the upstream
versions they are derived from (e.g. postgresql.conf,
postfix/master.cf).
2020-06-15 16:18:07 -07:00
Alex Vandiver 5f433d6eeb puppet: Remove vestigial check_postgres.pl.
65774e1c4f switched from using the bundled check_postgres.pl to using
the version from packages; the file itself remained, however.

Remove it, and clean up references to it.

Fixes #15389.
2020-06-15 16:18:07 -07:00
Alex Vandiver 7d4a370a57 puppet: Move monitoring of pg replication to the pg hosts.
Instead of SSH'ing around to them, run directly on the database hosts.
This means that the replicas do not know how many bytes behind they
are in _receiving_ the wall logs; thus, the monitoring also extends to
the primary database, which knows that information for each replica.
This also allows for detecting when there are too few active replicas.
2020-06-15 16:18:07 -07:00
Anders Kaseorg 74c17bf94a python: Convert more percent formatting to Python 3.6 f-strings.
Generated by pyupgrade --py36-plus.

Now including %d, %i, %u, and multi-line strings.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-14 23:27:22 -07:00