Commit Graph

1342 Commits

Author SHA1 Message Date
Alex Vandiver 6f5ae8d13d puppet: wal-g backups are required for replication.
Previously, it was possible to configure `wal-g` backups without
replication enabled; this resulted in only daily backups, not
streaming backups.  It was also possible to enable replication without
configuring the `wal-g` backups bucket; this simply failed to work.

Make `wal-g` backups always streaming, and warn loudly if replication
is enabled but `wal-g` is not configured.
2022-03-11 10:09:35 -08:00
Alex Vandiver 6496d43148 puppet: Only s3_backups_bucket is required for backups.
`s3_backups_key` / `s3_backups_secret_key` are optional, as the
permissions could come from the EC2 instance's role.
2022-03-11 10:09:35 -08:00
Alex Vandiver 19beed2709 puppet: Default s3_region to the current ec2 region. 2022-03-11 10:09:35 -08:00
Anders Kaseorg b3260bd610 docs: Use Debian and Ubuntu version numbers over development codenames.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-02-23 12:04:24 -08:00
Anders Kaseorg 1629d6bfb3 python: Reformat with Black 22 (stable).
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-02-18 18:03:13 -08:00
Alex Vandiver c656d933fa puppet: Switch from $::memorysize_mb to non-legacy $::memory. 2022-02-15 12:04:37 -08:00
Alex Vandiver f2f4462e71 puppet: Switch from $::fqdn to non-legacy $::networking. 2022-02-15 12:04:37 -08:00
Alex Vandiver bb4c0799cc puppet: Switch to the canonical case for $::os['family'].
The == operator in Puppet is case-insensitive for ASCII
characters[1], which is potentially surprising.  Switch to the
canonical case that `$::os['family']` returns.

[1] https://puppet.com/docs/puppet/5.5/lang_expressions.html#string-encoding-and-comparisons
2022-02-15 12:04:37 -08:00
Alex Vandiver d4eefbbeea puppet: Switch from $::osfamily to non-legacy $::os. 2022-02-15 12:04:37 -08:00
Alex Vandiver a787ebe0e2 puppet: Switch from $::architecture to non-legacy $::os. 2022-02-15 12:04:37 -08:00
Alex Vandiver d7e8733705 puppet: Use goarch for wal-g.
wal-g does not currently provide pre-built binaries for
arm64/aarch64 (see #21070) but if they begin to, it will likely be
with the goarch names.
2022-02-15 12:04:37 -08:00
Alex Vandiver abdbe4ca83 puppet: Use goarch for go-camo. 2022-02-15 12:04:37 -08:00
Alex Vandiver be2f2a5bde puppet: Use goarch for golang.
Fixes: #21051.
2022-02-15 12:04:37 -08:00
Alex Vandiver 788daa953b puppet: Factor out $::architecture case statement for golang. 2022-02-15 12:04:37 -08:00
Anders Kaseorg f6a701090c setup-apt-repos: Don’t install lsb_release.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-02-14 16:38:53 -08:00
Anders Kaseorg 45f4db9702 puppet: Remove unused $release_name.
It would confuse a future Debian 15.10 release with Ubuntu 15.10, it
relies on the legacy fact $::operatingsystemrelease, the modern fact
$::os provides this information without extra logic, and it’s unused
as of commit 03bffd3938.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-02-14 16:38:53 -08:00
Alex Vandiver 291c5e87b6 puppet: Upgrade prometheus to 2.33.1. 2022-02-09 20:32:24 -08:00
Alex Vandiver 2d538c2356 puppet: Upgrade grafana to 8.3.6. 2022-02-09 20:32:24 -08:00
Alex Vandiver f2e66c0b20 puppet: Upgrade go-camo to 2.4.0. 2022-02-09 20:32:24 -08:00
Alex Vandiver 51a516384d puppet: Upgrade golang to 1.17.6. 2022-02-09 20:32:24 -08:00
Alex Vandiver 48263a01dd puppet: Upgrade puppet libraries. 2022-02-09 20:32:24 -08:00
Alex Vandiver e032b38661 puppet: Fix typo in uwsgi exporter dependency. 2022-02-08 15:17:17 -08:00
Alex Vandiver b3900bec7e puppet: Upgrade Grafana to 8.3.5.
https://grafana.com/docs/grafana/latest/release-notes/release-notes-8-3-5/
2022-02-08 11:13:40 -08:00
Alex Vandiver a46f6df91e CVE-2021-43799: Write rabbitmq configuration before starting.
Zulip writes a `rabbitmq.config` configuration file which locks down
RabbitMQ to listen only on localhost:5672, as well as the RabbitMQ
distribution port, on localhost:25672.

The "distribution port" is part of Erlang's clustering configuration;
while it is documented that the protocol is fundamentally
insecure ([1], [2]) and can result in remote arbitrary execution of
code, by default the RabbitMQ configuration on Debian and Ubuntu
leaves it publicly accessible, with weak credentials.

The configuration file that Zulip writes, while effective, is only
written _after_ the package has been installed and the service
started, which leaves the port exposed until RabbitMQ or system
restart.

Ensure that rabbitmq's `/etc/rabbitmq/rabbitmq.config` is written
before rabbitmq is installed or starts, and that changes to that file
trigger a restart of the service, such that the ports are only ever
bound to localhost.  This does not mitigate existing installs, since
it does not force a rabbitmq restart.

[1] https://www.erlang.org/doc/apps/erts/erl_dist_protocol.html
[2] https://www.erlang.org/doc/reference_manual/distributed.html#distributed-erlang-system
2022-01-25 01:48:05 +00:00
Alex Vandiver 43d63bd5a1 puppet: Always set the RabbitMQ nodename to zulip@localhost.
This is required in order to lock down the RabbitMQ port to only
listen on localhost.  If the nodename is `rabbit@hostname`, in most
circumstances the hostname will resolve to an external IP, which the
rabbitmq port will not be bound to.

Installs which used `rabbit@hostname`, due to RabbitMQ having been
installed before Zulip, would not have functioned if the host or
RabbitMQ service was restarted, as the localhost restrictions in the
RabbitMQ configuration would have made rabbitmqctl (and Zulip cron
jobs that call it) unable to find the rabbitmq server.

The previous commit ensures that configure-rabbitmq is re-run after
the nodename has changed.  However, rabbitmq needs to be stopped
before `rabbitmq-env.conf` is changed; we use an `onlyif` on an `exec`
to print the warning about the node change, and let the subsequent
config change and notify of the service and configure-rabbitmq to
complete the re-configuration.
2022-01-25 01:48:02 +00:00
Alex Vandiver 3bfcfeac24 puppet: Run configure-rabbitmq on nodename change.
`/etc/rabbitmq/rabbitmq-env.conf` sets the nodename; anytime the
nodename changes, the backing database changes, and this requires
re-creating the rabbitmq users and permissions.

Trigger this in puppet by running configure-rabbitmq after the file
changes.
2022-01-25 01:46:51 +00:00
Alex Vandiver 694c4dfe8f puppet: Admit we leave epmd port 4369 open on all interfaces.
The Erlang `epmd` daemon listens on port 4369, and provides
information (without authentication) about which Erlang processes are
listening on what ports.  This information is not itself a
vulnerability, but may provide information for remote attackers about
what local Erlang services (such as `rabbitmq-server`) are running,
and where.

`epmd` supports an `ERL_EPMD_ADDRESS` environment variable to limit
which interfaces it binds on.  While this environment variable is set
in `/etc/default/rabbitmq-server`, Zulip unfortunately attempts to
start `epmd` using an explicit `exec` block, which ignores those
settings.

Regardless, this lack of `ERL_EPMD_ADDRESS` variable only controls
`epmd`'s startup upon first installation.  Upon reboot, there are two
ways in which `epmd` might be started, neither of which respect
`ERL_EPMD_ADDRESS`:

 - On Focal, an `epmd` service exists and is activated, which uses
   systemd's configuration to choose which interfaces to bind on, and
   thus `ERL_EPMD_ADDRESS` is irrelevant.

 - On Bionic (and Focal, due to a broken dependency from
   `rabbitmq-server` to `epmd@` instead of `epmd`, which may lead to
   the explicit `epmd` service losing a race), `epmd` is started by
   `rabbitmq-server` when it does not detect a running instance.
   Unfortunately, only `/etc/init.d/rabbitmq-server` would respects
   `/etc/default/rabbitmq-server` -- and it defers the actual startup
   to using systemd, which does not pass the environment variable
   down.  Thus, `ERL_EPMD_ADDRESS` is also irrelevant here.

We unfortunately cannot limit `epmd` to only listening on localhost,
due to a number of overlapping bugs and limitations:

 - Manually starting `epmd` with `-address 127.0.0.1` silently fails
   to start on hosts with IPv6 disabled, due to an Erlang bug ([1],
   [2]).

 - The dependencies of the systemd `rabbitmq-server` service can be
   fixed to include the `epmd` service, and systemd can be made to
   bind to `127.0.0.1:4369` and pass that socket to `epmd`, bypassing
   the above bug.  However, the startup of this service is not
   guaranteed, because it races with other sources of `epmd` (see
   below).

 - Any process that runs `rabbitmqctl` results in `epmd` being started
   if one is not currently running; these instances do not respect any
   environment variables as to which addresses to bind on.  This is
   also triggered by `service rabbitmq-server status`, as well as
   various Zulip cron jobs which inspect the rabbitmq queues.  As
   such, it is difficult-to-impossible to ensure that some other
   `epmd` process will not win the race and open the port on all
   interfaces.

Since the only known exposure from leaving port 4369 open is
information that rabbitmq is running on the host, and the complexity
of adjusting this to only bind on localhost is high, we remove the
setting which does not address the problem, and document that the port
is left open, and should be protected via system-level or
network-level firewalls.

[1]: https://bugs.launchpad.net/ubuntu/+source/erlang/+bug/1374109
[2]: https://github.com/erlang/otp/issues/4820
2022-01-25 01:46:51 +00:00
Alex Vandiver 2713e90eaf puppet: Remove rabbitmq_mochiweb configuration.
mochiweb was renamed to web_dispatch in RabbitMQ 3.8.0, and the plugin
is not enabled.  Nor does this control the management interface, which
would listen on port 15672.
2022-01-25 01:46:51 +00:00
Alex Vandiver a3adaf4aa3 puppet: Fix standalone certbot configurations.
This addresses the problems mentioned in the previous commit, but for
existing installations which have `authenticator = standalone` in
their configurations.

This reconfigures all hostnames in certbot to use the webroot
authenticator, and attempts to force-renew their certificates.
Force-renewal is necessary because certbot contains no way to merely
update the configuration.  Let's Encrypt allows for multiple extra
renewals per week, so this is a reasonable cost.

Because the certbot configuration is `configobj`, and not
`configparser`, we have no way to easily parse to determine if webroot
is in use; additionally, `certbot certificates` does not provide this
information.  We use `grep`, on the assumption that this will catch
nearly all cases.

It is possible that this will find `authenticator = standalone`
certificates which are managed by Certbot, but not Zulip certificates.
These certificates would also fail to renew while Zulip is running, so
switching them to use the Zulip webroot would still be an improvement.

Fixes #20593.
2022-01-24 12:13:44 -08:00
Anders Kaseorg 97e4e9886c python: Replace universal_newlines with text.
This is supported in Python ≥ 3.7.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-01-23 22:16:01 -08:00
Anders Kaseorg a58a71ef43 Remove Ubuntu 18.04 support.
As a consequence:

• Bump minimum supported Python version to 3.7.
• Move Vagrant environment to Debian 10, which has Python 3.7.
• Move CI frontend tests to Debian 10.
• Move production build test to Debian 10.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-01-21 17:26:14 -08:00
Alex Vandiver 3bbe5c1110 puppet: Put comments on iptables lines.
In addition to documenting the rules.v4 and rules.v6 files slightly,
these comments show up in `iptables -L`:

```
root@hostname:~# iptables -L INPUT
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     all  --  anywhere             anywhere
LOGDROP    all  --  anywhere             localhost/8
ACCEPT     all  --  anywhere             anywhere             state RELATED,ESTABLISHED
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:ssh /* ssh */
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:3000 /* grafana */
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:9100 /* node_exporter */
LOGDROP    all  --  anywhere             anywhere
```
2022-01-21 16:46:14 -08:00
Alex Vandiver 6bc5849ea8 puppet: Remove now-unused debathena apt repository. 2022-01-18 14:13:28 -08:00
Alex Vandiver b3f07cc98d puppet: Replace debathena zephyr package with equivalent puppet file. 2022-01-18 14:13:28 -08:00
Alex Vandiver a6d7539571 puppet: Replace debathena krb5 package with equivalent puppet file. 2022-01-18 14:13:28 -08:00
Alex Vandiver 75224ea5de puppet: python-dev is now purely virtual; install python2.7-dev. 2022-01-18 14:13:28 -08:00
Alex Vandiver fc1adef28a puppet: Fix server_name of internal staging server. 2022-01-18 12:36:56 -08:00
Alex Vandiver 7e630b81f8 puppet: Switch to using snakeoil certs for staging.
This parallels ba3b88c81b, but for the
staging host.
2022-01-18 12:36:56 -08:00
Alex Vandiver fb4d9764fa puppet: Bump Grafana version, for 8.3.4. security release. 2022-01-18 12:33:02 -08:00
Alex Vandiver 434bda01c7 puppet: Enable camo prometheus metrics.
Doing so requires protecting /metrics from direct access when proxied
through nginx.  If camo is placed on a separate host, the equivalent
/metrics URL may need to be protected.

See https://github.com/cactus/go-camo#metrics for details on the
statistics so reported.  Note that 5xx responses are _expected_ from
go-camo's statistics, as it returns 502 status code when the remote
server responds with 500/502/503/504, or 504 when the remote host
times out.
2022-01-13 14:19:18 -08:00
Alex Vandiver 0b8a6a51b8 puppet: Remove all parts of AWS kernels.
Otherwise, we just uninstall the meta-package, and still restart into
the installed AWS kernel.
2022-01-12 15:52:19 -08:00
Alex Vandiver 4d7e6b26df puppet: Provide more attributes to teleport on ssh nodes. 2022-01-12 14:15:45 -08:00
Alex Vandiver 339e70671c puppet: Switch Grafana to Grafana 8 Unified Alerting. 2022-01-11 14:27:11 -08:00
Alex Vandiver 6a7eecee9a puppet: Increase load paging thresholds. 2022-01-11 09:38:31 -08:00
Alex Vandiver 1e80b844f4 puppet: Disable apparmor profile for msmtp.
As the nagios user, we want to read the msmtp configuration from
~nagios, which apparmor's profile does not allow msmtp to do.
2022-01-11 09:38:31 -08:00
Alex Vandiver 3c95ad82c6 puppet: Upgrade to nagios4.
This updates the puppeted nagios configuration file for the Nagios4
defaults.
2022-01-11 09:38:31 -08:00
Alex Vandiver d328d3dd4d puppet: Allow routing camo requests through an outgoing proxy.
Because Camo includes logic to deny access to private subnets, routing
its requests through Smokescreen is generally not necessary.  However,
it may be necessary if Zulip has configured a non-Smokescreen exit
proxy.

Default Camo to using the proxy only if it is not Smokescreen, with a
new `proxy.enable_for_camo` setting to override this behaviour if need
be.  Note that that setting is in `zulip.conf` on the host with Camo
installed -- not the Zulip frontend host, if they are different.

Fixes: #20550.
2022-01-07 12:08:10 -08:00
Alex Vandiver 2c5fc1827c puppet: Standardize what values are bools, and what true is.
For `no_serve_uploads`, `http_only`, which previously specified
"non-empty" to enable, this tightens what values are true.  For
`pgroonga` and `queue_workers_multiprocess`, this broadens the
possible values from `enabled`, and `true` respectively.
2022-01-07 12:08:10 -08:00
Alex Vandiver 1e672e4d82 puppet: Remove unused $no_serve_uploads in app_frontend. 2022-01-07 12:08:10 -08:00
Alex Vandiver 6218ed91c2 puppet: Use lazy-apps and uwsgi control sockets for rolling reloads.
Restarting the uwsgi processes by way of supervisor opens a window
during which nginx 502's all responses.  uwsgi has a configuration
called "chain reloading" which allows for rolling restart of the uwsgi
processes, such that only one process at once in unavailable; see
uwsgi documentation ([1]).

The tradeoff is that this requires that the uwsgi processes load the
libraries after forking, rather than before ("lazy apps"); in theory
this can lead to larger memory footprints, since they are not shared.
In practice, as Django defers much of the loading, this is not as much
of an issue.  In a very basic test of memory consumption (measured by
total memory - free - caches - buffers; 6 uwsgi workers), both
immediately after restarting Django, and after requesting `/` 60 times
with 6 concurrent requests:

                      |  Non-lazy  |  Lazy app  | Difference
    ------------------+------------+------------+-------------
    Fresh             |  2,827,216 |  2,870,480 |   +43,264
    After 60 requests |  3,332,284 |  3,409,608 |   +77,324
    ..................|............|............|.............
    Difference        |   +505,068 |   +539,128 |   +34,060

That is, "lazy app" loading increased the footprint pre-requests by
43MB, and after 60 requests grew the memory footprint by 539MB, as
opposed to non-lazy loading, which grew it by 505MB.  Using wsgi "lazy
app" loading does increase the memory footprint, but not by a large
percentage.

The other effect is that processes may be served by either old or new
code during the restart window.  This may cause transient failures
when new frontend code talks to old backend code.

Enable chain-reloading during graceful, puppetless restarts, but only
if enabled via a zulip.conf configuration flag.

Fixes #2559.

[1]: https://uwsgi-docs.readthedocs.io/en/latest/articles/TheArtOfGracefulReloading.html#chain-reloading-lazy-apps
2022-01-05 14:48:52 -08:00