Commit Graph

1265 Commits

Author SHA1 Message Date
Alex Vandiver 053682964e puppet: Only fetch from running hosts in Grafana ec2 discovery. 2021-12-09 08:12:03 -08:00
Alex Vandiver 291f688678 puppet: Use zulip::external_dep for grafana, template config.
Templating the config ensures that the service is restarted when it is
upgraded.
2021-12-08 20:58:10 -08:00
Alex Vandiver 3eae429ab4 puppet: Upgrade Grafana to 8.3.1, for CVE-2021-43798. 2021-12-08 20:58:10 -08:00
Alex Vandiver 7db146d0a9 puppet: Do not assume amd64 architecture. 2021-12-06 11:08:50 -08:00
Alex Vandiver fb2d05f9e3 puppet: Remove unused 'builder' files.
These are leftover detritus from the "builder" host, which was removed
in 4c9a283542.
2021-12-06 10:21:50 -08:00
Alex Vandiver cb2d0ff32b postgresql: Support replication on PostgreSQL >= 11, document.
PostgreSQL 11 and below used a configuration file names
`recovery.conf` to manage replicas and standbys; support for this was
removed in PostgreSQL 12[1], and the configuration parameters were
moved into the main `postgresql.conf`.

Add `zulip.conf` settings for the primary server hostname and
replication username, so that the complete `postgresql.conf`
configuration on PostgreSQL 14 can continue to be managed, even when
replication is enabled.  For consistency, also begin writing out the
`recovery.conf` for PostgreSQL 11 and below.

In PostgreSQL 12 configuration and later, the `wal_level =
hot_standby` setting is removed, as `hot_standby` is equivalent to
`replica`, which is the default value[2].  Similarly, the
`hot_standby = on` setting is also the default[3].

Documentation is added for these features, and the commentary on the
"Export and Import" page referencing files under `puppet/zulip_ops/`
is removed, as those files no longer have any replication-specific
configuration.

[1]: https://www.postgresql.org/docs/current/recovery-config.html
[2]: https://www.postgresql.org/docs/12/runtime-config-wal.html#GUC-WAL-LEVEL
[3]: https://www.postgresql.org/docs/12/runtime-config-replication.html#GUC-HOT-STANDBY
2021-12-03 16:32:41 -08:00
Alex Vandiver 7d3399a970 puppet: Drop configuration files for unsupported PostgreSQL versions.
These are both unsupported by PostgreSQL itself, as well as by Zulip;
the removal of Ubuntu Xenial and Debian Stretch support in Zulip 3.0
removed the requirement for PostgreSQL 9.6, and the previous versions
date back yet farther.
2021-12-03 16:32:41 -08:00
Alex Vandiver 6436c4087d puppet: Tidy old wal-g binaries. 2021-12-03 16:17:50 -08:00
Alex Vandiver 53cc9538f7 puppet: Factor out wal-g binary path. 2021-12-03 16:17:50 -08:00
Alex Vandiver 338483792b puppet: Upgrade wal-g release to 1.1.1. 2021-12-03 16:17:50 -08:00
Anders Kaseorg 325b4bac7e env-wal-g: Quote $s3_backups_bucket.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-12-03 14:33:53 -08:00
Alex Vandiver f31bf3f06c puppet: Install camo on Docker.
Now that go-camo runs within supervisor, it can be run in Docker
simply.

Fixes #20101.
Fixes zulip/docker-zulip#179.
2021-12-02 09:25:00 -08:00
Alex Vandiver 358a7fb0c6 puppet: Read camo secret at startup time, not at puppet-apply time.
Writing the secret to the supervisor configuration file makes changes
to the secret requires a zulip-puppet-apply to take hold.  The Docker
image is constructed to avoid having to run zulip-puppet-apply on
startup, and indeed cannot run zulip-puppet-apply after having
configured secrets, as it has replaced the zulip.conf file with a
symlink, for example.  This means that camo gets the static secret
that was built into the image, and not the one regenerated on first
startup.

Read the camo secret at process startup time.  Because this pattern is
likely common with "12-factor" applications which can read from
environment variables, write a generic tool to map secrets to
environment variables before exec'ing a binary, and use that for Camo.
2021-12-02 09:25:00 -08:00
Alex Vandiver 86cf3be39f puppet: Fix pgroonga init for custom database names and users. 2021-11-20 07:13:50 -08:00
Alex Vandiver c514feaa22 puppet: Default go-camo to listening on localhost for standalone deploys.
The default in the previous commit, inherited from camo, was to bind
to 0.0.0.0:9292.  In standalone deployments, camo is deployed on the
same host as the nginx reverse proxy, and as such there is no need to
open it up to other IPs.

Make `zulip::camo` take an optional parameter, which allows overriding
it in puppet, but skips a `zulip.conf` setting for it, since it is
unlikely to be adjust by most users.
2021-11-19 15:58:26 -08:00
Alex Vandiver b982222e03 camo: Replace with go-camo implementation.
The upstream of the `camo` repository[1] has been unmaintained for
several years, and is now archived by the owner.  Additionally, it has
a number of limitations:
 - It is installed as a sysinit service, which does not run under
   Docker
 - It does not prevent access to internal IPs, like 127.0.0.1
 - It does not respect standard `HTTP_proxy` environment variables,
   making it unable to use Smokescreen to prevent the prior flaw
 - It occasionally just crashes, and thus must have a cron job to
   restart it.

Swap camo out for the drop-in replacement go-camo[2], which has the
same external API, requiring not changes to Django code, but is more
maintained.  Additionally, it resolves all of the above complaints.

go-camo is not configured to use Smokescreen as a proxy, because its
own private-IP filtering prevents using a proxy which lies within that
IP space.  It is also unclear if the addition of Smokescreen would
provide any additional protection over the existing IP address
restrictions in go-camo.

go-camo has a subset of the security headers that our nginx reverse
proxy sets, and which camo set; provide the missing headers with `-H`
to ensure that go-camo, if exposed from behind some other non-nginx
load-balancer, still provides the necessary security headers.

Fixes #18351 by moving to supervisor.
Fixes zulip/docker-zulip#298 also by moving to supervisor.

[1] https://github.com/atmos/camo
[2] https://github.com/cactus/go-camo
2021-11-19 15:58:26 -08:00
Alex Vandiver c33562f0a8 puppet: Default to installing smokescreen on application frontends.
This is an additional security hardening step, to make Zulip default
to preventing SSRF attacks.  The overhead of running Smokescreen is
minimal, and there is no reason to force deployments to take
additional steps in order to secure themselves against SSRF attacks.

Deployments which already have a different external proxy configured
will not gain a local Smokescreen installation, and running without
Smokescreen is supported by explicitly unsetting the `host` or `port`
values in `/etc/zulip/zulip.conf`.
2021-11-19 15:29:28 -08:00
Alex Vandiver 44f1ea6bae puppet: Split smokescreen into a non-profile version.
In a subsequent commit, we intend to include it from
`zulip::app_frontend_base`, which is a layering violation if it only
exists in the form of a profile.
2021-11-19 15:29:28 -08:00
Alex Vandiver c2ed3c22b5 puppet: Remove unused smokescreen symlink. 2021-11-19 15:29:28 -08:00
Alex Vandiver 47e16a5d41 puppet: Tidy old smokescreen binaries. 2021-11-19 15:29:28 -08:00
Alex Vandiver 239ac8413e puppet: Embed golang version into binary path, to rebuild on new golang.
This will cause the output binary path to be sensitive to golang
version, causing it to be rebuilt on new golang, and an updated
supervisor config file written out, and thus supervisor also
restarted.
2021-11-19 15:29:28 -08:00
Alex Vandiver 216eeba2dd puppet: Factor out smokescreen binary path. 2021-11-19 15:29:28 -08:00
Alex Vandiver 3a7cef6582 puppet: Switch smokescreen to using zulip::external_dep, so it tidies. 2021-11-19 15:29:28 -08:00
Alex Vandiver ea08111d60 puppet: Move /srv/smokescreen-src to /srv/zulip-smokescreen-src.
As with the previous commit for `/srv/golang`, we have the custom of
namespacing things under `/srv` with `zulip-` to help ensure that we
play nice with anything else that happens to be on the host.
2021-11-19 15:29:28 -08:00
Anders Kaseorg c64e1adb19 puppet: Upgrade Smokescreen v0.0.2-59-gbfca45c to v0.0.2-63-gdc40301.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-11-19 15:29:28 -08:00
Alex Vandiver bb9d2df1ae puppet: Extract an external-tarball-dependency manifest. 2021-11-19 15:29:28 -08:00
Alex Vandiver 3c8d7e2598 puppet: Tidy old golang directories.
This relies on behavior which is only in Puppet 5.5.1 and above, which
means it must be skipped on Ubuntu 18.04.
2021-11-19 15:29:28 -08:00
Alex Vandiver 2fc4acdf81 puppet: Move /srv/golang to /srv/zulip-golang.
We have the custom of namespacing things under `/srv` with `zulip-`
to help ensure that we play nice with anything else that happens
to be on the host.
2021-11-19 15:29:28 -08:00
Alex Vandiver 00a4abb642 puppet: Switch dependency to the golang binary we need. 2021-11-19 15:29:28 -08:00
Alex Vandiver 2d5f813094 puppet: Stop making a /srv/golang symlink.
Nothing needs this extra directory.
2021-11-19 15:29:28 -08:00
Alex Vandiver 93af6c7f06 puppet: Factor out golang variables. 2021-11-19 15:29:28 -08:00
Alex Vandiver 21be36f15f puppet: Shorten golang version variable name. 2021-11-19 15:29:28 -08:00
Alex Vandiver 6b9e74adee puppet: Upgrade golang from 1.16.4 to 1.17.3. 2021-11-19 15:29:28 -08:00
Alex Vandiver 514801c509 puppet: Split out golang toolchain into its own manifest. 2021-11-19 15:29:28 -08:00
Alex Vandiver 610a0b2d59 nagios: `pg_is_in_recovery()` is better to know replica/primary status.
It is possible to be in recovery, and downloading WAL logs from
archives, and not yet be replicating.  If one only checks the
streaming log status, it reports as "no replicas" which is technically
accurate but not a useful summation of the state of the replica.
2021-11-17 13:38:26 -08:00
Alex Vandiver 83091cbc96 puppet: Swap the one use of the `cron` resource for an /etc/cron.d file.
The `cron` resource places its contents in the user's crontab, which
makes it unlike every other cron job that Zulip installs.

Switch to using `/etc/cron.d` files, like all other cron jobs.
2021-11-16 16:17:32 -08:00
Alex Vandiver 90e1a0400e puppet: Add a few more inter-resource dependencies.
None of these are important; they just express semantic dependencies.
2021-11-16 16:17:32 -08:00
Alex Vandiver 49ad188449 rate_limit: Add a flag to lump all TOR exit node IPs together.
TOR users are legitimate users of the system; however, that system can
also be used for abuse -- specifically, by evading IP-based
rate-limiting.

For the purposes of IP-based rate-limiting, add a
RATE_LIMIT_TOR_TOGETHER flag, defaulting to false, which lumps all
requests from TOR exit nodes into the same bucket.  This may allow a
TOR user to deny other TOR users access to the find-my-account and
new-realm endpoints, but this is a low cost for cutting off a
significant potential abuse vector.

If enabled, the list of TOR exit nodes is fetched from their public
endpoint once per hour, via a cron job, and cached on disk.  Django
processes load this data from disk, and cache it in memcached.
Requests are spared from the burden of checking disk on failure via a
circuitbreaker, which trips of there are two failures in a row, and
only begins trying again after 10 minutes.
2021-11-16 11:42:00 -08:00
Alex Vandiver 01c007ceaf puppet: Remove an out-of-date comment.
Comment was missed in 9d57fa9759.
2021-11-09 21:52:17 -08:00
Alex Vandiver 7af2fa2e92 puppet: Use sysv status command, not supervisorctl status.
Since Supervisor 4, which is installed on Ubuntu 20.04 and Debian 11,
`supervisorctl status` returns exit code 3 if any of the
supervisor-controlled processes are not running.

Using `supervisorctl status` as the Puppet `status` command for
Supervisor leads to unnecessarily trying to "start" a Supervisor
process which is already started, but happens to have one or more of
its managed processes stopped.  This is an unnecessary no-op in
production environments, but in docker-init enviroments, such as in
CI, attempting to start the process a second time is an error.

Switch to checking if supervisor is running by way of sysv init.  This
fixes the potential error in CI, as well as eliminates unnecessary
"starts" of supervisor when it was already running -- a situation
which made zulip-puppet-apply not idempotent:

```
root@alexmv-prod:~# supervisorctl status
process-fts-updates                                             STOPPED   Nov 10 12:33 AM
smokescreen                                                     RUNNING   pid 1287280, uptime 0:35:32
zulip-django                                                    STOPPED   Nov 10 12:33 AM
zulip-tornado                                                   STOPPED   Nov 10 12:33 AM
[...]

root@alexmv-prod:~# ~zulip/deployments/current/scripts/zulip-puppet-apply --force
Notice: Compiled catalog for alexmv-prod.zulipdev.org in environment production in 2.32 seconds
Notice: /Stage[main]/Zulip::Supervisor/Service[supervisor]/ensure: ensure changed 'stopped' to 'running'
Notice: Applied catalog in 0.91 seconds

root@alexmv-prod:~# ~zulip/deployments/current/scripts/zulip-puppet-apply --force
Notice: Compiled catalog for alexmv-prod.zulipdev.org in environment production in 2.35 seconds
Notice: /Stage[main]/Zulip::Supervisor/Service[supervisor]/ensure: ensure changed 'stopped' to 'running'
Notice: Applied catalog in 0.92 seconds
```
2021-11-09 21:52:17 -08:00
Alex Vandiver 8a1bb43b23 puppet: Adjust for templated paths and settings, set C.UTF-8 locale. 2021-11-08 18:21:46 -08:00
Alex Vandiver d3e9a71d42 puppet: Check in upstream PostgreSQL 14 configuration file.
Note that one `<%u%%d>` has to be escaped as `<%%u%%d>`.
2021-11-08 18:21:46 -08:00
Adam Benesh c881430f4c puppet: Add WSGIApplicationGroup config to Apache SSO example.
Zulip apparently is now affected by a bad interaction between Apache's
WSGI using Python subinterpreters and C extension modules like `re2`
that are not designed for it.

The solution is apparently to set WSGIApplicationGroup to %{GLOBAL},
which disables Apache's use of Python subinterpreters.

See https://serverfault.com/questions/514242/non-responsive-apache-mod-wsgi-after-installing-scipy/514251#514251 for background.

Fixes #19924.
2021-10-08 15:07:23 -07:00
Tim Abbott 33b5fa633a process_fts_updates: Fix docker-zulip support.
In the series of migrations to this tool's configuration to support
specifying an arbitrary database name
(e.g. c17f502bb0), we broke support for
running process_fts_updates on the application server, connected to a
remote database server. That workflow is used by docker-zulip and
presumably other settings like Amazon RDS.

The fix is to import the Zulip virtualenv (if available) when running
on an application server.  This is better than just supporting this
case, since both docker-zulip and an Amazon RDS database are setting
where it would be inconvenient to run process-fts-updates directly on
the database server. (In the former case, because we want to avoid
having a strong version dependency on the postgres container).

Details are available in this conversation:
https://chat.zulip.org/#narrow/stream/49-development-help/topic/Logic.20in.20process_fts_updates.20seems.20to.20be.20broken/near/1251894

Thanks to Erik Tews for reporting and help in debugging this issue.
2021-09-27 18:17:33 -05:00
Alex Vandiver 1806e0f45e puppet: Remove zulip.org configuration. 2021-08-26 17:21:31 -07:00
Alex Vandiver 27881babab puppet: Increase prometheus storage, from the default 15d. 2021-08-24 23:40:43 -07:00
Alex Vandiver faf71eea41 upgrade-postgresql: Do not remove other supervisor configs.
We previously used `zulip-puppet-apply` with a custom config file,
with an updated PostgreSQL version but more limited set of
`puppet_classes`, to pre-create the basic settings for the new cluster
before running `pg_upgradecluster`.

Unfortunately, the supervisor config uses `purge => true` to remove
all SUPERVISOR configuration files that are not included in the puppet
configuration; this leads to it removing all other supervisor
processes during the upgrade, only to add them back and start them
during the second `zulip-puppet-apply`.

It also leads to `process-fts-updates` not being started after the
upgrade completes; this is the one supervisor config file which was
not removed and re-added, and thus the one that is not re-started due
to having been re-added.  This was not detected in CI because CI added
a `start-server` command which was not in the upgrade documentation.

Set a custom facter fact that prevents the `purge` behaviour of the
supervisor configuration.  We want to preserve that behaviour in
general, and using `zulip-puppet-apply` continues to be the best way
to pre-set-up the PostgreSQL configuration -- but we wish to avoid
that behaviour when we know we are applying a subset of the puppet
classes.

Since supervisor configs are no longer removed and re-added, this
requires an explicit start-server step in the instructions after the
upgrades complete.  This brings the documentation into alignment with
what CI is testing.
2021-08-24 19:00:58 -07:00
Alex Vandiver e46e862f2b puppet: Add a bare-bones zulipbot profile.
This sets up the firewalls appropriate for zulipbot, but does not
automate any of the configuration of zulipbot itself.
2021-08-24 16:05:58 -07:00
Alex Vandiver 5857dcd9b4 puppet: Configure ip6tables in parallel to ipv4.
Previously, IPv6 firewalls were left at the default all-open.

Configure IPv6 equivalently to IPv4.
2021-08-24 16:05:46 -07:00
Alex Vandiver 845509a9ec puppet: Be explicit that existing iptables are only ipv4. 2021-08-24 16:05:46 -07:00