Commit Graph

1272 Commits

Author SHA1 Message Date
Alex Vandiver 20eab264cf puppet: Remove dependency on scripts.lib.zulip_tools.
ab130ceb35 added a dependency on scripts.lib.zulip_tools; however,
check_postgresql_replication_lag is run on hosts which do not have a
zulip tree installed.

Inline the simple functions that were imported.
2021-12-14 14:48:53 -08:00
Alex Vandiver 71b56f7c1c puppet: process_fts_updates connects as nagios (or provided username).
It should not use the configured zulip username, but should instead
pull from the login user (likely `nagios`), or an explicit alternate
provided PostgreSQL username.  Failure to do so results in Nagios
failures because the `nagios` login does not have permissions to
authenticated the `zulip` PostgreSQL user.

This requires CI changes, as the install tests install as the `zulip`
login username, which allowed Nagios tests to pass previously; with
the custom database and username, however, they must be passed to
process_fts_updates explicitly when validating the install.
2021-12-14 14:48:53 -08:00
Alex Vandiver 9d67e37166 puppet: Nagios connects as itself, in check_postgresql_replication_lag. 2021-12-14 14:48:53 -08:00
Alex Vandiver 850bc4cc81 puppet: Create directory for redis PID file.
The Redis configuration, and the systemd file for it, assumes there
will be a pid file written to `/var/run/redis/redis.pid`, but
`/var/run/redis` is not created during installation.

Create `/run/redis`; as `/var/run` is a symlink to `/run` on systemd
systems, this is equivalent to `/var/run/redis`.
2021-12-13 12:42:15 -08:00
Alex Vandiver a6c2079502 puppet: Create memcached PID file that systemd config file specifies.
The systemd config file installed by the `memcached` package assumes
there will be a PID written to `/run/memcached/memcached.pid`.  Since we
override `memcached.conf`, we have omitted the line that writes out the
PID to this file.

Systemd is smart enough to not _need_ the PID file to start up the
service correctly, but match the configuration.  We create the
directory since the package does not do so.  It is created as
`/run/memcached` and not `/var/run/memcached` because `/var/run` is a
symlink to `/run`.
2021-12-13 12:42:15 -08:00
Alex Vandiver e4b23daad7 puppet: Upgrade to Grafana 8.3.2, for CVE-2021-43813. 2021-12-10 14:00:11 -08:00
Alex Vandiver 01e8f752a8 puppet: Use certbot package timer, not our own cron job.
The certbot package installs its own systemd timer (and cron job,
which disabled itself if systemd is enabled) which updates
certificates.  This process races with the cron job which Zulip
installs -- the only difference being that Zulip respects the
`certbot.auto_renew` setting, and that it passes the deploy hook.
This means that occasionally nginx would not be reloaded, when the
systemd timer caught the expiration first.

Remove the custom cron job and `certbot-maybe-renew` script, and
reconfigure certbot to always reload nginx after deploying, using
certbot directory hooks.

Since `certbot.auto_renew` can't have an effect, remove the setting.
In turn, this removes the need for `--no-zulip-conf` to
`setup-certbot`.  `--deploy-hook` is similarly removed, as running
deploy hooks to restart nginx is now the default; pass
`--no-directory-hooks` in standalone mode to not attempt to reload
nginx.  The other property of `--deploy-hook`, of skipping symlinking
into place, is given its own flog.
2021-12-09 13:47:33 -08:00
Alex Vandiver 053682964e puppet: Only fetch from running hosts in Grafana ec2 discovery. 2021-12-09 08:12:03 -08:00
Alex Vandiver 291f688678 puppet: Use zulip::external_dep for grafana, template config.
Templating the config ensures that the service is restarted when it is
upgraded.
2021-12-08 20:58:10 -08:00
Alex Vandiver 3eae429ab4 puppet: Upgrade Grafana to 8.3.1, for CVE-2021-43798. 2021-12-08 20:58:10 -08:00
Alex Vandiver 7db146d0a9 puppet: Do not assume amd64 architecture. 2021-12-06 11:08:50 -08:00
Alex Vandiver fb2d05f9e3 puppet: Remove unused 'builder' files.
These are leftover detritus from the "builder" host, which was removed
in 4c9a283542.
2021-12-06 10:21:50 -08:00
Alex Vandiver cb2d0ff32b postgresql: Support replication on PostgreSQL >= 11, document.
PostgreSQL 11 and below used a configuration file names
`recovery.conf` to manage replicas and standbys; support for this was
removed in PostgreSQL 12[1], and the configuration parameters were
moved into the main `postgresql.conf`.

Add `zulip.conf` settings for the primary server hostname and
replication username, so that the complete `postgresql.conf`
configuration on PostgreSQL 14 can continue to be managed, even when
replication is enabled.  For consistency, also begin writing out the
`recovery.conf` for PostgreSQL 11 and below.

In PostgreSQL 12 configuration and later, the `wal_level =
hot_standby` setting is removed, as `hot_standby` is equivalent to
`replica`, which is the default value[2].  Similarly, the
`hot_standby = on` setting is also the default[3].

Documentation is added for these features, and the commentary on the
"Export and Import" page referencing files under `puppet/zulip_ops/`
is removed, as those files no longer have any replication-specific
configuration.

[1]: https://www.postgresql.org/docs/current/recovery-config.html
[2]: https://www.postgresql.org/docs/12/runtime-config-wal.html#GUC-WAL-LEVEL
[3]: https://www.postgresql.org/docs/12/runtime-config-replication.html#GUC-HOT-STANDBY
2021-12-03 16:32:41 -08:00
Alex Vandiver 7d3399a970 puppet: Drop configuration files for unsupported PostgreSQL versions.
These are both unsupported by PostgreSQL itself, as well as by Zulip;
the removal of Ubuntu Xenial and Debian Stretch support in Zulip 3.0
removed the requirement for PostgreSQL 9.6, and the previous versions
date back yet farther.
2021-12-03 16:32:41 -08:00
Alex Vandiver 6436c4087d puppet: Tidy old wal-g binaries. 2021-12-03 16:17:50 -08:00
Alex Vandiver 53cc9538f7 puppet: Factor out wal-g binary path. 2021-12-03 16:17:50 -08:00
Alex Vandiver 338483792b puppet: Upgrade wal-g release to 1.1.1. 2021-12-03 16:17:50 -08:00
Anders Kaseorg 325b4bac7e env-wal-g: Quote $s3_backups_bucket.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-12-03 14:33:53 -08:00
Alex Vandiver f31bf3f06c puppet: Install camo on Docker.
Now that go-camo runs within supervisor, it can be run in Docker
simply.

Fixes #20101.
Fixes zulip/docker-zulip#179.
2021-12-02 09:25:00 -08:00
Alex Vandiver 358a7fb0c6 puppet: Read camo secret at startup time, not at puppet-apply time.
Writing the secret to the supervisor configuration file makes changes
to the secret requires a zulip-puppet-apply to take hold.  The Docker
image is constructed to avoid having to run zulip-puppet-apply on
startup, and indeed cannot run zulip-puppet-apply after having
configured secrets, as it has replaced the zulip.conf file with a
symlink, for example.  This means that camo gets the static secret
that was built into the image, and not the one regenerated on first
startup.

Read the camo secret at process startup time.  Because this pattern is
likely common with "12-factor" applications which can read from
environment variables, write a generic tool to map secrets to
environment variables before exec'ing a binary, and use that for Camo.
2021-12-02 09:25:00 -08:00
Alex Vandiver 86cf3be39f puppet: Fix pgroonga init for custom database names and users. 2021-11-20 07:13:50 -08:00
Alex Vandiver c514feaa22 puppet: Default go-camo to listening on localhost for standalone deploys.
The default in the previous commit, inherited from camo, was to bind
to 0.0.0.0:9292.  In standalone deployments, camo is deployed on the
same host as the nginx reverse proxy, and as such there is no need to
open it up to other IPs.

Make `zulip::camo` take an optional parameter, which allows overriding
it in puppet, but skips a `zulip.conf` setting for it, since it is
unlikely to be adjust by most users.
2021-11-19 15:58:26 -08:00
Alex Vandiver b982222e03 camo: Replace with go-camo implementation.
The upstream of the `camo` repository[1] has been unmaintained for
several years, and is now archived by the owner.  Additionally, it has
a number of limitations:
 - It is installed as a sysinit service, which does not run under
   Docker
 - It does not prevent access to internal IPs, like 127.0.0.1
 - It does not respect standard `HTTP_proxy` environment variables,
   making it unable to use Smokescreen to prevent the prior flaw
 - It occasionally just crashes, and thus must have a cron job to
   restart it.

Swap camo out for the drop-in replacement go-camo[2], which has the
same external API, requiring not changes to Django code, but is more
maintained.  Additionally, it resolves all of the above complaints.

go-camo is not configured to use Smokescreen as a proxy, because its
own private-IP filtering prevents using a proxy which lies within that
IP space.  It is also unclear if the addition of Smokescreen would
provide any additional protection over the existing IP address
restrictions in go-camo.

go-camo has a subset of the security headers that our nginx reverse
proxy sets, and which camo set; provide the missing headers with `-H`
to ensure that go-camo, if exposed from behind some other non-nginx
load-balancer, still provides the necessary security headers.

Fixes #18351 by moving to supervisor.
Fixes zulip/docker-zulip#298 also by moving to supervisor.

[1] https://github.com/atmos/camo
[2] https://github.com/cactus/go-camo
2021-11-19 15:58:26 -08:00
Alex Vandiver c33562f0a8 puppet: Default to installing smokescreen on application frontends.
This is an additional security hardening step, to make Zulip default
to preventing SSRF attacks.  The overhead of running Smokescreen is
minimal, and there is no reason to force deployments to take
additional steps in order to secure themselves against SSRF attacks.

Deployments which already have a different external proxy configured
will not gain a local Smokescreen installation, and running without
Smokescreen is supported by explicitly unsetting the `host` or `port`
values in `/etc/zulip/zulip.conf`.
2021-11-19 15:29:28 -08:00
Alex Vandiver 44f1ea6bae puppet: Split smokescreen into a non-profile version.
In a subsequent commit, we intend to include it from
`zulip::app_frontend_base`, which is a layering violation if it only
exists in the form of a profile.
2021-11-19 15:29:28 -08:00
Alex Vandiver c2ed3c22b5 puppet: Remove unused smokescreen symlink. 2021-11-19 15:29:28 -08:00
Alex Vandiver 47e16a5d41 puppet: Tidy old smokescreen binaries. 2021-11-19 15:29:28 -08:00
Alex Vandiver 239ac8413e puppet: Embed golang version into binary path, to rebuild on new golang.
This will cause the output binary path to be sensitive to golang
version, causing it to be rebuilt on new golang, and an updated
supervisor config file written out, and thus supervisor also
restarted.
2021-11-19 15:29:28 -08:00
Alex Vandiver 216eeba2dd puppet: Factor out smokescreen binary path. 2021-11-19 15:29:28 -08:00
Alex Vandiver 3a7cef6582 puppet: Switch smokescreen to using zulip::external_dep, so it tidies. 2021-11-19 15:29:28 -08:00
Alex Vandiver ea08111d60 puppet: Move /srv/smokescreen-src to /srv/zulip-smokescreen-src.
As with the previous commit for `/srv/golang`, we have the custom of
namespacing things under `/srv` with `zulip-` to help ensure that we
play nice with anything else that happens to be on the host.
2021-11-19 15:29:28 -08:00
Anders Kaseorg c64e1adb19 puppet: Upgrade Smokescreen v0.0.2-59-gbfca45c to v0.0.2-63-gdc40301.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-11-19 15:29:28 -08:00
Alex Vandiver bb9d2df1ae puppet: Extract an external-tarball-dependency manifest. 2021-11-19 15:29:28 -08:00
Alex Vandiver 3c8d7e2598 puppet: Tidy old golang directories.
This relies on behavior which is only in Puppet 5.5.1 and above, which
means it must be skipped on Ubuntu 18.04.
2021-11-19 15:29:28 -08:00
Alex Vandiver 2fc4acdf81 puppet: Move /srv/golang to /srv/zulip-golang.
We have the custom of namespacing things under `/srv` with `zulip-`
to help ensure that we play nice with anything else that happens
to be on the host.
2021-11-19 15:29:28 -08:00
Alex Vandiver 00a4abb642 puppet: Switch dependency to the golang binary we need. 2021-11-19 15:29:28 -08:00
Alex Vandiver 2d5f813094 puppet: Stop making a /srv/golang symlink.
Nothing needs this extra directory.
2021-11-19 15:29:28 -08:00
Alex Vandiver 93af6c7f06 puppet: Factor out golang variables. 2021-11-19 15:29:28 -08:00
Alex Vandiver 21be36f15f puppet: Shorten golang version variable name. 2021-11-19 15:29:28 -08:00
Alex Vandiver 6b9e74adee puppet: Upgrade golang from 1.16.4 to 1.17.3. 2021-11-19 15:29:28 -08:00
Alex Vandiver 514801c509 puppet: Split out golang toolchain into its own manifest. 2021-11-19 15:29:28 -08:00
Alex Vandiver 610a0b2d59 nagios: `pg_is_in_recovery()` is better to know replica/primary status.
It is possible to be in recovery, and downloading WAL logs from
archives, and not yet be replicating.  If one only checks the
streaming log status, it reports as "no replicas" which is technically
accurate but not a useful summation of the state of the replica.
2021-11-17 13:38:26 -08:00
Alex Vandiver 83091cbc96 puppet: Swap the one use of the `cron` resource for an /etc/cron.d file.
The `cron` resource places its contents in the user's crontab, which
makes it unlike every other cron job that Zulip installs.

Switch to using `/etc/cron.d` files, like all other cron jobs.
2021-11-16 16:17:32 -08:00
Alex Vandiver 90e1a0400e puppet: Add a few more inter-resource dependencies.
None of these are important; they just express semantic dependencies.
2021-11-16 16:17:32 -08:00
Alex Vandiver 49ad188449 rate_limit: Add a flag to lump all TOR exit node IPs together.
TOR users are legitimate users of the system; however, that system can
also be used for abuse -- specifically, by evading IP-based
rate-limiting.

For the purposes of IP-based rate-limiting, add a
RATE_LIMIT_TOR_TOGETHER flag, defaulting to false, which lumps all
requests from TOR exit nodes into the same bucket.  This may allow a
TOR user to deny other TOR users access to the find-my-account and
new-realm endpoints, but this is a low cost for cutting off a
significant potential abuse vector.

If enabled, the list of TOR exit nodes is fetched from their public
endpoint once per hour, via a cron job, and cached on disk.  Django
processes load this data from disk, and cache it in memcached.
Requests are spared from the burden of checking disk on failure via a
circuitbreaker, which trips of there are two failures in a row, and
only begins trying again after 10 minutes.
2021-11-16 11:42:00 -08:00
Alex Vandiver 01c007ceaf puppet: Remove an out-of-date comment.
Comment was missed in 9d57fa9759.
2021-11-09 21:52:17 -08:00
Alex Vandiver 7af2fa2e92 puppet: Use sysv status command, not supervisorctl status.
Since Supervisor 4, which is installed on Ubuntu 20.04 and Debian 11,
`supervisorctl status` returns exit code 3 if any of the
supervisor-controlled processes are not running.

Using `supervisorctl status` as the Puppet `status` command for
Supervisor leads to unnecessarily trying to "start" a Supervisor
process which is already started, but happens to have one or more of
its managed processes stopped.  This is an unnecessary no-op in
production environments, but in docker-init enviroments, such as in
CI, attempting to start the process a second time is an error.

Switch to checking if supervisor is running by way of sysv init.  This
fixes the potential error in CI, as well as eliminates unnecessary
"starts" of supervisor when it was already running -- a situation
which made zulip-puppet-apply not idempotent:

```
root@alexmv-prod:~# supervisorctl status
process-fts-updates                                             STOPPED   Nov 10 12:33 AM
smokescreen                                                     RUNNING   pid 1287280, uptime 0:35:32
zulip-django                                                    STOPPED   Nov 10 12:33 AM
zulip-tornado                                                   STOPPED   Nov 10 12:33 AM
[...]

root@alexmv-prod:~# ~zulip/deployments/current/scripts/zulip-puppet-apply --force
Notice: Compiled catalog for alexmv-prod.zulipdev.org in environment production in 2.32 seconds
Notice: /Stage[main]/Zulip::Supervisor/Service[supervisor]/ensure: ensure changed 'stopped' to 'running'
Notice: Applied catalog in 0.91 seconds

root@alexmv-prod:~# ~zulip/deployments/current/scripts/zulip-puppet-apply --force
Notice: Compiled catalog for alexmv-prod.zulipdev.org in environment production in 2.35 seconds
Notice: /Stage[main]/Zulip::Supervisor/Service[supervisor]/ensure: ensure changed 'stopped' to 'running'
Notice: Applied catalog in 0.92 seconds
```
2021-11-09 21:52:17 -08:00
Alex Vandiver 8a1bb43b23 puppet: Adjust for templated paths and settings, set C.UTF-8 locale. 2021-11-08 18:21:46 -08:00
Alex Vandiver d3e9a71d42 puppet: Check in upstream PostgreSQL 14 configuration file.
Note that one `<%u%%d>` has to be escaped as `<%%u%%d>`.
2021-11-08 18:21:46 -08:00
Adam Benesh c881430f4c puppet: Add WSGIApplicationGroup config to Apache SSO example.
Zulip apparently is now affected by a bad interaction between Apache's
WSGI using Python subinterpreters and C extension modules like `re2`
that are not designed for it.

The solution is apparently to set WSGIApplicationGroup to %{GLOBAL},
which disables Apache's use of Python subinterpreters.

See https://serverfault.com/questions/514242/non-responsive-apache-mod-wsgi-after-installing-scipy/514251#514251 for background.

Fixes #19924.
2021-10-08 15:07:23 -07:00