Commit Graph

1728 Commits

Author SHA1 Message Date
Anders Kaseorg f6a701090c setup-apt-repos: Don’t install lsb_release.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-02-14 16:38:53 -08:00
Anders Kaseorg 45f4db9702 puppet: Remove unused $release_name.
It would confuse a future Debian 15.10 release with Ubuntu 15.10, it
relies on the legacy fact $::operatingsystemrelease, the modern fact
$::os provides this information without extra logic, and it’s unused
as of commit 03bffd3938.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-02-14 16:38:53 -08:00
Alex Vandiver 291c5e87b6 puppet: Upgrade prometheus to 2.33.1. 2022-02-09 20:32:24 -08:00
Alex Vandiver 2d538c2356 puppet: Upgrade grafana to 8.3.6. 2022-02-09 20:32:24 -08:00
Alex Vandiver f2e66c0b20 puppet: Upgrade go-camo to 2.4.0. 2022-02-09 20:32:24 -08:00
Alex Vandiver 51a516384d puppet: Upgrade golang to 1.17.6. 2022-02-09 20:32:24 -08:00
Alex Vandiver 48263a01dd puppet: Upgrade puppet libraries. 2022-02-09 20:32:24 -08:00
Alex Vandiver e032b38661 puppet: Fix typo in uwsgi exporter dependency. 2022-02-08 15:17:17 -08:00
Alex Vandiver b3900bec7e puppet: Upgrade Grafana to 8.3.5.
https://grafana.com/docs/grafana/latest/release-notes/release-notes-8-3-5/
2022-02-08 11:13:40 -08:00
Alex Vandiver a46f6df91e CVE-2021-43799: Write rabbitmq configuration before starting.
Zulip writes a `rabbitmq.config` configuration file which locks down
RabbitMQ to listen only on localhost:5672, as well as the RabbitMQ
distribution port, on localhost:25672.

The "distribution port" is part of Erlang's clustering configuration;
while it is documented that the protocol is fundamentally
insecure ([1], [2]) and can result in remote arbitrary execution of
code, by default the RabbitMQ configuration on Debian and Ubuntu
leaves it publicly accessible, with weak credentials.

The configuration file that Zulip writes, while effective, is only
written _after_ the package has been installed and the service
started, which leaves the port exposed until RabbitMQ or system
restart.

Ensure that rabbitmq's `/etc/rabbitmq/rabbitmq.config` is written
before rabbitmq is installed or starts, and that changes to that file
trigger a restart of the service, such that the ports are only ever
bound to localhost.  This does not mitigate existing installs, since
it does not force a rabbitmq restart.

[1] https://www.erlang.org/doc/apps/erts/erl_dist_protocol.html
[2] https://www.erlang.org/doc/reference_manual/distributed.html#distributed-erlang-system
2022-01-25 01:48:05 +00:00
Alex Vandiver 43d63bd5a1 puppet: Always set the RabbitMQ nodename to zulip@localhost.
This is required in order to lock down the RabbitMQ port to only
listen on localhost.  If the nodename is `rabbit@hostname`, in most
circumstances the hostname will resolve to an external IP, which the
rabbitmq port will not be bound to.

Installs which used `rabbit@hostname`, due to RabbitMQ having been
installed before Zulip, would not have functioned if the host or
RabbitMQ service was restarted, as the localhost restrictions in the
RabbitMQ configuration would have made rabbitmqctl (and Zulip cron
jobs that call it) unable to find the rabbitmq server.

The previous commit ensures that configure-rabbitmq is re-run after
the nodename has changed.  However, rabbitmq needs to be stopped
before `rabbitmq-env.conf` is changed; we use an `onlyif` on an `exec`
to print the warning about the node change, and let the subsequent
config change and notify of the service and configure-rabbitmq to
complete the re-configuration.
2022-01-25 01:48:02 +00:00
Alex Vandiver 3bfcfeac24 puppet: Run configure-rabbitmq on nodename change.
`/etc/rabbitmq/rabbitmq-env.conf` sets the nodename; anytime the
nodename changes, the backing database changes, and this requires
re-creating the rabbitmq users and permissions.

Trigger this in puppet by running configure-rabbitmq after the file
changes.
2022-01-25 01:46:51 +00:00
Alex Vandiver 694c4dfe8f puppet: Admit we leave epmd port 4369 open on all interfaces.
The Erlang `epmd` daemon listens on port 4369, and provides
information (without authentication) about which Erlang processes are
listening on what ports.  This information is not itself a
vulnerability, but may provide information for remote attackers about
what local Erlang services (such as `rabbitmq-server`) are running,
and where.

`epmd` supports an `ERL_EPMD_ADDRESS` environment variable to limit
which interfaces it binds on.  While this environment variable is set
in `/etc/default/rabbitmq-server`, Zulip unfortunately attempts to
start `epmd` using an explicit `exec` block, which ignores those
settings.

Regardless, this lack of `ERL_EPMD_ADDRESS` variable only controls
`epmd`'s startup upon first installation.  Upon reboot, there are two
ways in which `epmd` might be started, neither of which respect
`ERL_EPMD_ADDRESS`:

 - On Focal, an `epmd` service exists and is activated, which uses
   systemd's configuration to choose which interfaces to bind on, and
   thus `ERL_EPMD_ADDRESS` is irrelevant.

 - On Bionic (and Focal, due to a broken dependency from
   `rabbitmq-server` to `epmd@` instead of `epmd`, which may lead to
   the explicit `epmd` service losing a race), `epmd` is started by
   `rabbitmq-server` when it does not detect a running instance.
   Unfortunately, only `/etc/init.d/rabbitmq-server` would respects
   `/etc/default/rabbitmq-server` -- and it defers the actual startup
   to using systemd, which does not pass the environment variable
   down.  Thus, `ERL_EPMD_ADDRESS` is also irrelevant here.

We unfortunately cannot limit `epmd` to only listening on localhost,
due to a number of overlapping bugs and limitations:

 - Manually starting `epmd` with `-address 127.0.0.1` silently fails
   to start on hosts with IPv6 disabled, due to an Erlang bug ([1],
   [2]).

 - The dependencies of the systemd `rabbitmq-server` service can be
   fixed to include the `epmd` service, and systemd can be made to
   bind to `127.0.0.1:4369` and pass that socket to `epmd`, bypassing
   the above bug.  However, the startup of this service is not
   guaranteed, because it races with other sources of `epmd` (see
   below).

 - Any process that runs `rabbitmqctl` results in `epmd` being started
   if one is not currently running; these instances do not respect any
   environment variables as to which addresses to bind on.  This is
   also triggered by `service rabbitmq-server status`, as well as
   various Zulip cron jobs which inspect the rabbitmq queues.  As
   such, it is difficult-to-impossible to ensure that some other
   `epmd` process will not win the race and open the port on all
   interfaces.

Since the only known exposure from leaving port 4369 open is
information that rabbitmq is running on the host, and the complexity
of adjusting this to only bind on localhost is high, we remove the
setting which does not address the problem, and document that the port
is left open, and should be protected via system-level or
network-level firewalls.

[1]: https://bugs.launchpad.net/ubuntu/+source/erlang/+bug/1374109
[2]: https://github.com/erlang/otp/issues/4820
2022-01-25 01:46:51 +00:00
Alex Vandiver 2713e90eaf puppet: Remove rabbitmq_mochiweb configuration.
mochiweb was renamed to web_dispatch in RabbitMQ 3.8.0, and the plugin
is not enabled.  Nor does this control the management interface, which
would listen on port 15672.
2022-01-25 01:46:51 +00:00
Alex Vandiver a3adaf4aa3 puppet: Fix standalone certbot configurations.
This addresses the problems mentioned in the previous commit, but for
existing installations which have `authenticator = standalone` in
their configurations.

This reconfigures all hostnames in certbot to use the webroot
authenticator, and attempts to force-renew their certificates.
Force-renewal is necessary because certbot contains no way to merely
update the configuration.  Let's Encrypt allows for multiple extra
renewals per week, so this is a reasonable cost.

Because the certbot configuration is `configobj`, and not
`configparser`, we have no way to easily parse to determine if webroot
is in use; additionally, `certbot certificates` does not provide this
information.  We use `grep`, on the assumption that this will catch
nearly all cases.

It is possible that this will find `authenticator = standalone`
certificates which are managed by Certbot, but not Zulip certificates.
These certificates would also fail to renew while Zulip is running, so
switching them to use the Zulip webroot would still be an improvement.

Fixes #20593.
2022-01-24 12:13:44 -08:00
Anders Kaseorg 97e4e9886c python: Replace universal_newlines with text.
This is supported in Python ≥ 3.7.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-01-23 22:16:01 -08:00
Anders Kaseorg a58a71ef43 Remove Ubuntu 18.04 support.
As a consequence:

• Bump minimum supported Python version to 3.7.
• Move Vagrant environment to Debian 10, which has Python 3.7.
• Move CI frontend tests to Debian 10.
• Move production build test to Debian 10.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-01-21 17:26:14 -08:00
Alex Vandiver 3bbe5c1110 puppet: Put comments on iptables lines.
In addition to documenting the rules.v4 and rules.v6 files slightly,
these comments show up in `iptables -L`:

```
root@hostname:~# iptables -L INPUT
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     all  --  anywhere             anywhere
LOGDROP    all  --  anywhere             localhost/8
ACCEPT     all  --  anywhere             anywhere             state RELATED,ESTABLISHED
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:ssh /* ssh */
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:3000 /* grafana */
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:9100 /* node_exporter */
LOGDROP    all  --  anywhere             anywhere
```
2022-01-21 16:46:14 -08:00
Alex Vandiver 6bc5849ea8 puppet: Remove now-unused debathena apt repository. 2022-01-18 14:13:28 -08:00
Alex Vandiver b3f07cc98d puppet: Replace debathena zephyr package with equivalent puppet file. 2022-01-18 14:13:28 -08:00
Alex Vandiver a6d7539571 puppet: Replace debathena krb5 package with equivalent puppet file. 2022-01-18 14:13:28 -08:00
Alex Vandiver 75224ea5de puppet: python-dev is now purely virtual; install python2.7-dev. 2022-01-18 14:13:28 -08:00
Alex Vandiver fc1adef28a puppet: Fix server_name of internal staging server. 2022-01-18 12:36:56 -08:00
Alex Vandiver 7e630b81f8 puppet: Switch to using snakeoil certs for staging.
This parallels ba3b88c81b, but for the
staging host.
2022-01-18 12:36:56 -08:00
Alex Vandiver fb4d9764fa puppet: Bump Grafana version, for 8.3.4. security release. 2022-01-18 12:33:02 -08:00
Alex Vandiver 434bda01c7 puppet: Enable camo prometheus metrics.
Doing so requires protecting /metrics from direct access when proxied
through nginx.  If camo is placed on a separate host, the equivalent
/metrics URL may need to be protected.

See https://github.com/cactus/go-camo#metrics for details on the
statistics so reported.  Note that 5xx responses are _expected_ from
go-camo's statistics, as it returns 502 status code when the remote
server responds with 500/502/503/504, or 504 when the remote host
times out.
2022-01-13 14:19:18 -08:00
Alex Vandiver 0b8a6a51b8 puppet: Remove all parts of AWS kernels.
Otherwise, we just uninstall the meta-package, and still restart into
the installed AWS kernel.
2022-01-12 15:52:19 -08:00
Alex Vandiver 4d7e6b26df puppet: Provide more attributes to teleport on ssh nodes. 2022-01-12 14:15:45 -08:00
Alex Vandiver 339e70671c puppet: Switch Grafana to Grafana 8 Unified Alerting. 2022-01-11 14:27:11 -08:00
Alex Vandiver 6a7eecee9a puppet: Increase load paging thresholds. 2022-01-11 09:38:31 -08:00
Alex Vandiver 1e80b844f4 puppet: Disable apparmor profile for msmtp.
As the nagios user, we want to read the msmtp configuration from
~nagios, which apparmor's profile does not allow msmtp to do.
2022-01-11 09:38:31 -08:00
Alex Vandiver 3c95ad82c6 puppet: Upgrade to nagios4.
This updates the puppeted nagios configuration file for the Nagios4
defaults.
2022-01-11 09:38:31 -08:00
Alex Vandiver d328d3dd4d puppet: Allow routing camo requests through an outgoing proxy.
Because Camo includes logic to deny access to private subnets, routing
its requests through Smokescreen is generally not necessary.  However,
it may be necessary if Zulip has configured a non-Smokescreen exit
proxy.

Default Camo to using the proxy only if it is not Smokescreen, with a
new `proxy.enable_for_camo` setting to override this behaviour if need
be.  Note that that setting is in `zulip.conf` on the host with Camo
installed -- not the Zulip frontend host, if they are different.

Fixes: #20550.
2022-01-07 12:08:10 -08:00
Alex Vandiver 2c5fc1827c puppet: Standardize what values are bools, and what true is.
For `no_serve_uploads`, `http_only`, which previously specified
"non-empty" to enable, this tightens what values are true.  For
`pgroonga` and `queue_workers_multiprocess`, this broadens the
possible values from `enabled`, and `true` respectively.
2022-01-07 12:08:10 -08:00
Alex Vandiver 1e672e4d82 puppet: Remove unused $no_serve_uploads in app_frontend. 2022-01-07 12:08:10 -08:00
Alex Vandiver 6218ed91c2 puppet: Use lazy-apps and uwsgi control sockets for rolling reloads.
Restarting the uwsgi processes by way of supervisor opens a window
during which nginx 502's all responses.  uwsgi has a configuration
called "chain reloading" which allows for rolling restart of the uwsgi
processes, such that only one process at once in unavailable; see
uwsgi documentation ([1]).

The tradeoff is that this requires that the uwsgi processes load the
libraries after forking, rather than before ("lazy apps"); in theory
this can lead to larger memory footprints, since they are not shared.
In practice, as Django defers much of the loading, this is not as much
of an issue.  In a very basic test of memory consumption (measured by
total memory - free - caches - buffers; 6 uwsgi workers), both
immediately after restarting Django, and after requesting `/` 60 times
with 6 concurrent requests:

                      |  Non-lazy  |  Lazy app  | Difference
    ------------------+------------+------------+-------------
    Fresh             |  2,827,216 |  2,870,480 |   +43,264
    After 60 requests |  3,332,284 |  3,409,608 |   +77,324
    ..................|............|............|.............
    Difference        |   +505,068 |   +539,128 |   +34,060

That is, "lazy app" loading increased the footprint pre-requests by
43MB, and after 60 requests grew the memory footprint by 539MB, as
opposed to non-lazy loading, which grew it by 505MB.  Using wsgi "lazy
app" loading does increase the memory footprint, but not by a large
percentage.

The other effect is that processes may be served by either old or new
code during the restart window.  This may cause transient failures
when new frontend code talks to old backend code.

Enable chain-reloading during graceful, puppetless restarts, but only
if enabled via a zulip.conf configuration flag.

Fixes #2559.

[1]: https://uwsgi-docs.readthedocs.io/en/latest/articles/TheArtOfGracefulReloading.html#chain-reloading-lazy-apps
2022-01-05 14:48:52 -08:00
Alex Vandiver 4a95967a33 puppet: Gather uwsgi stats from chat.zulip.org. 2022-01-03 21:26:57 -08:00
Alex Vandiver 8a5be972d2 puppet: Add a uwsgi exporter for monitoring.
This allows investigation of how many workers are busy, and to track
"harikari" terminations.
2022-01-03 15:25:58 -08:00
Alex Vandiver d6c40d24d4 puppet: Manage current smokescreen binary so it is not tidied.
Fix another tidy error caused by 1e4e6a09af23; as also noted in
f9a39b6703, these resources are necessary such that tidy does not
cleanup of smokescreen, and then force a recompilation of it again.
2022-01-03 15:24:42 -08:00
Alex Vandiver f9a39b6703 puppet: Manage extracted resources again.
1e4e6a09af removed the resources for the unpacked directory, on the
argument that they were unnecessary.  However, the directory (or file,
see below) that is unpacked must be managed, or it will be tidied on
the next puppet apply.

Add back the resource for `$dir`, but mark it `ensure => present`, to
support tarballs which only unpack to a single file (e.g. wal-g).
2022-01-02 12:11:53 -08:00
Alex Vandiver 54b6a83412 puppet: Fix typo in cron job name. 2021-12-31 17:39:53 -08:00
Alex Vandiver 941800cf12 puppet: Upgrade external dependencies. 2021-12-31 11:14:40 -08:00
Alex Vandiver 6f693d10d9 puppet: Fix version of node_exporter.
This was a copy/paste but introduced in f166f9f7d6.
2021-12-30 23:33:34 +00:00
Anders Kaseorg 82748d45d8 install-yarn: Use test -ef in case /srv is a symlink.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-12-30 13:42:07 -08:00
Alex Vandiver c094867a74 puppet: Add aarch64 build hashes to external dependencies.
wal-g does not ship aarch64 binaries, currently; the compilation
process([1]) is somewhat complicated, so we defer the decision about
how to support wal-g for aarch64 until a later date.

[1]: https://github.com/wal-g/wal-g/blob/master/docs/PostgreSQL.md#installing
2021-12-29 16:35:15 -08:00
Alex Vandiver f166f9f7d6 puppet: Centralize versions and sha256 hashes of external dependencies.
This will make it easier to update versions of these dependencies.
2021-12-29 16:35:15 -08:00
Alex Vandiver 57662689a9 puppet: Provide a constant homedir for grafana user.
The homedir of a user cannot be changed if any processes are running
as them, so having it change over time as upgrades happen will break
puppet application, as the old grafana process under supervisor will
effectively lock changes to the user's homedir.

Unfortunately, that means that this change will thus fail to
puppet-apply unless `supervisorctl stop grafana` is run first, but
there's no way around that.
2021-12-29 16:35:15 -08:00
Alex Vandiver 6e55e52694 puppet: Pull out grafana $data_dir. 2021-12-29 16:35:15 -08:00
Alex Vandiver 51d3862c7e puppet: Move wal-g to external_dep, in /srv/zulip-wal-g-*. 2021-12-29 16:35:15 -08:00
Alex Vandiver 1e4e6a09af puppet: Stop making resources for external binaries and directories.
In the event that extracting doesn't produce the binary we expected it
to, all this will do is create an _empty_ file where we expect the
binary to be.  This will likely muddle debugging.

Since the only reason the resourfce was made in the first place was to
make dependencies clear, switch to depending on the External_Dep
itself, when such a dependency is needed.
2021-12-29 16:35:15 -08:00
Alex Vandiver 3c163a7d5e puppet: Move slash out of $dir by convention. 2021-12-29 16:35:15 -08:00
Alex Vandiver bb5a2c8138 puppet: Move prometheus to external_dep. 2021-12-29 16:35:15 -08:00
Alex Vandiver 2d6c096904 puppet: Move node_exporter to external_dep. 2021-12-29 16:35:15 -08:00
Alex Vandiver d2a78bac7e puppet: Adjust wal-g release version and SHA256.
wal-g apparently removed the 1.1.1 release; replace it with the
equivalent rc.
2021-12-29 16:35:15 -08:00
Alex Vandiver 7a9074ecfd puppet: Use shorter local variable for supervisor conf.d dir. 2021-12-28 09:24:01 -08:00
Alex Vandiver 670fad0cc4 puppet: Drop now-unnecessary supervisor file removals. 2021-12-28 09:24:01 -08:00
Alex Vandiver 20eab264cf puppet: Remove dependency on scripts.lib.zulip_tools.
ab130ceb35 added a dependency on scripts.lib.zulip_tools; however,
check_postgresql_replication_lag is run on hosts which do not have a
zulip tree installed.

Inline the simple functions that were imported.
2021-12-14 14:48:53 -08:00
Alex Vandiver 71b56f7c1c puppet: process_fts_updates connects as nagios (or provided username).
It should not use the configured zulip username, but should instead
pull from the login user (likely `nagios`), or an explicit alternate
provided PostgreSQL username.  Failure to do so results in Nagios
failures because the `nagios` login does not have permissions to
authenticated the `zulip` PostgreSQL user.

This requires CI changes, as the install tests install as the `zulip`
login username, which allowed Nagios tests to pass previously; with
the custom database and username, however, they must be passed to
process_fts_updates explicitly when validating the install.
2021-12-14 14:48:53 -08:00
Alex Vandiver 9d67e37166 puppet: Nagios connects as itself, in check_postgresql_replication_lag. 2021-12-14 14:48:53 -08:00
Alex Vandiver 850bc4cc81 puppet: Create directory for redis PID file.
The Redis configuration, and the systemd file for it, assumes there
will be a pid file written to `/var/run/redis/redis.pid`, but
`/var/run/redis` is not created during installation.

Create `/run/redis`; as `/var/run` is a symlink to `/run` on systemd
systems, this is equivalent to `/var/run/redis`.
2021-12-13 12:42:15 -08:00
Alex Vandiver a6c2079502 puppet: Create memcached PID file that systemd config file specifies.
The systemd config file installed by the `memcached` package assumes
there will be a PID written to `/run/memcached/memcached.pid`.  Since we
override `memcached.conf`, we have omitted the line that writes out the
PID to this file.

Systemd is smart enough to not _need_ the PID file to start up the
service correctly, but match the configuration.  We create the
directory since the package does not do so.  It is created as
`/run/memcached` and not `/var/run/memcached` because `/var/run` is a
symlink to `/run`.
2021-12-13 12:42:15 -08:00
Alex Vandiver e4b23daad7 puppet: Upgrade to Grafana 8.3.2, for CVE-2021-43813. 2021-12-10 14:00:11 -08:00
Alex Vandiver 01e8f752a8 puppet: Use certbot package timer, not our own cron job.
The certbot package installs its own systemd timer (and cron job,
which disabled itself if systemd is enabled) which updates
certificates.  This process races with the cron job which Zulip
installs -- the only difference being that Zulip respects the
`certbot.auto_renew` setting, and that it passes the deploy hook.
This means that occasionally nginx would not be reloaded, when the
systemd timer caught the expiration first.

Remove the custom cron job and `certbot-maybe-renew` script, and
reconfigure certbot to always reload nginx after deploying, using
certbot directory hooks.

Since `certbot.auto_renew` can't have an effect, remove the setting.
In turn, this removes the need for `--no-zulip-conf` to
`setup-certbot`.  `--deploy-hook` is similarly removed, as running
deploy hooks to restart nginx is now the default; pass
`--no-directory-hooks` in standalone mode to not attempt to reload
nginx.  The other property of `--deploy-hook`, of skipping symlinking
into place, is given its own flog.
2021-12-09 13:47:33 -08:00
Alex Vandiver 053682964e puppet: Only fetch from running hosts in Grafana ec2 discovery. 2021-12-09 08:12:03 -08:00
Alex Vandiver 291f688678 puppet: Use zulip::external_dep for grafana, template config.
Templating the config ensures that the service is restarted when it is
upgraded.
2021-12-08 20:58:10 -08:00
Alex Vandiver 3eae429ab4 puppet: Upgrade Grafana to 8.3.1, for CVE-2021-43798. 2021-12-08 20:58:10 -08:00
Alex Vandiver 7db146d0a9 puppet: Do not assume amd64 architecture. 2021-12-06 11:08:50 -08:00
Alex Vandiver fb2d05f9e3 puppet: Remove unused 'builder' files.
These are leftover detritus from the "builder" host, which was removed
in 4c9a283542.
2021-12-06 10:21:50 -08:00
Alex Vandiver cb2d0ff32b postgresql: Support replication on PostgreSQL >= 11, document.
PostgreSQL 11 and below used a configuration file names
`recovery.conf` to manage replicas and standbys; support for this was
removed in PostgreSQL 12[1], and the configuration parameters were
moved into the main `postgresql.conf`.

Add `zulip.conf` settings for the primary server hostname and
replication username, so that the complete `postgresql.conf`
configuration on PostgreSQL 14 can continue to be managed, even when
replication is enabled.  For consistency, also begin writing out the
`recovery.conf` for PostgreSQL 11 and below.

In PostgreSQL 12 configuration and later, the `wal_level =
hot_standby` setting is removed, as `hot_standby` is equivalent to
`replica`, which is the default value[2].  Similarly, the
`hot_standby = on` setting is also the default[3].

Documentation is added for these features, and the commentary on the
"Export and Import" page referencing files under `puppet/zulip_ops/`
is removed, as those files no longer have any replication-specific
configuration.

[1]: https://www.postgresql.org/docs/current/recovery-config.html
[2]: https://www.postgresql.org/docs/12/runtime-config-wal.html#GUC-WAL-LEVEL
[3]: https://www.postgresql.org/docs/12/runtime-config-replication.html#GUC-HOT-STANDBY
2021-12-03 16:32:41 -08:00
Alex Vandiver 7d3399a970 puppet: Drop configuration files for unsupported PostgreSQL versions.
These are both unsupported by PostgreSQL itself, as well as by Zulip;
the removal of Ubuntu Xenial and Debian Stretch support in Zulip 3.0
removed the requirement for PostgreSQL 9.6, and the previous versions
date back yet farther.
2021-12-03 16:32:41 -08:00
Alex Vandiver 6436c4087d puppet: Tidy old wal-g binaries. 2021-12-03 16:17:50 -08:00
Alex Vandiver 53cc9538f7 puppet: Factor out wal-g binary path. 2021-12-03 16:17:50 -08:00
Alex Vandiver 338483792b puppet: Upgrade wal-g release to 1.1.1. 2021-12-03 16:17:50 -08:00
Anders Kaseorg 325b4bac7e env-wal-g: Quote $s3_backups_bucket.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-12-03 14:33:53 -08:00
Alex Vandiver f31bf3f06c puppet: Install camo on Docker.
Now that go-camo runs within supervisor, it can be run in Docker
simply.

Fixes #20101.
Fixes zulip/docker-zulip#179.
2021-12-02 09:25:00 -08:00
Alex Vandiver 358a7fb0c6 puppet: Read camo secret at startup time, not at puppet-apply time.
Writing the secret to the supervisor configuration file makes changes
to the secret requires a zulip-puppet-apply to take hold.  The Docker
image is constructed to avoid having to run zulip-puppet-apply on
startup, and indeed cannot run zulip-puppet-apply after having
configured secrets, as it has replaced the zulip.conf file with a
symlink, for example.  This means that camo gets the static secret
that was built into the image, and not the one regenerated on first
startup.

Read the camo secret at process startup time.  Because this pattern is
likely common with "12-factor" applications which can read from
environment variables, write a generic tool to map secrets to
environment variables before exec'ing a binary, and use that for Camo.
2021-12-02 09:25:00 -08:00
Alex Vandiver 86cf3be39f puppet: Fix pgroonga init for custom database names and users. 2021-11-20 07:13:50 -08:00
Alex Vandiver c514feaa22 puppet: Default go-camo to listening on localhost for standalone deploys.
The default in the previous commit, inherited from camo, was to bind
to 0.0.0.0:9292.  In standalone deployments, camo is deployed on the
same host as the nginx reverse proxy, and as such there is no need to
open it up to other IPs.

Make `zulip::camo` take an optional parameter, which allows overriding
it in puppet, but skips a `zulip.conf` setting for it, since it is
unlikely to be adjust by most users.
2021-11-19 15:58:26 -08:00
Alex Vandiver b982222e03 camo: Replace with go-camo implementation.
The upstream of the `camo` repository[1] has been unmaintained for
several years, and is now archived by the owner.  Additionally, it has
a number of limitations:
 - It is installed as a sysinit service, which does not run under
   Docker
 - It does not prevent access to internal IPs, like 127.0.0.1
 - It does not respect standard `HTTP_proxy` environment variables,
   making it unable to use Smokescreen to prevent the prior flaw
 - It occasionally just crashes, and thus must have a cron job to
   restart it.

Swap camo out for the drop-in replacement go-camo[2], which has the
same external API, requiring not changes to Django code, but is more
maintained.  Additionally, it resolves all of the above complaints.

go-camo is not configured to use Smokescreen as a proxy, because its
own private-IP filtering prevents using a proxy which lies within that
IP space.  It is also unclear if the addition of Smokescreen would
provide any additional protection over the existing IP address
restrictions in go-camo.

go-camo has a subset of the security headers that our nginx reverse
proxy sets, and which camo set; provide the missing headers with `-H`
to ensure that go-camo, if exposed from behind some other non-nginx
load-balancer, still provides the necessary security headers.

Fixes #18351 by moving to supervisor.
Fixes zulip/docker-zulip#298 also by moving to supervisor.

[1] https://github.com/atmos/camo
[2] https://github.com/cactus/go-camo
2021-11-19 15:58:26 -08:00
Alex Vandiver c33562f0a8 puppet: Default to installing smokescreen on application frontends.
This is an additional security hardening step, to make Zulip default
to preventing SSRF attacks.  The overhead of running Smokescreen is
minimal, and there is no reason to force deployments to take
additional steps in order to secure themselves against SSRF attacks.

Deployments which already have a different external proxy configured
will not gain a local Smokescreen installation, and running without
Smokescreen is supported by explicitly unsetting the `host` or `port`
values in `/etc/zulip/zulip.conf`.
2021-11-19 15:29:28 -08:00
Alex Vandiver 44f1ea6bae puppet: Split smokescreen into a non-profile version.
In a subsequent commit, we intend to include it from
`zulip::app_frontend_base`, which is a layering violation if it only
exists in the form of a profile.
2021-11-19 15:29:28 -08:00
Alex Vandiver c2ed3c22b5 puppet: Remove unused smokescreen symlink. 2021-11-19 15:29:28 -08:00
Alex Vandiver 47e16a5d41 puppet: Tidy old smokescreen binaries. 2021-11-19 15:29:28 -08:00
Alex Vandiver 239ac8413e puppet: Embed golang version into binary path, to rebuild on new golang.
This will cause the output binary path to be sensitive to golang
version, causing it to be rebuilt on new golang, and an updated
supervisor config file written out, and thus supervisor also
restarted.
2021-11-19 15:29:28 -08:00
Alex Vandiver 216eeba2dd puppet: Factor out smokescreen binary path. 2021-11-19 15:29:28 -08:00
Alex Vandiver 3a7cef6582 puppet: Switch smokescreen to using zulip::external_dep, so it tidies. 2021-11-19 15:29:28 -08:00
Alex Vandiver ea08111d60 puppet: Move /srv/smokescreen-src to /srv/zulip-smokescreen-src.
As with the previous commit for `/srv/golang`, we have the custom of
namespacing things under `/srv` with `zulip-` to help ensure that we
play nice with anything else that happens to be on the host.
2021-11-19 15:29:28 -08:00
Anders Kaseorg c64e1adb19 puppet: Upgrade Smokescreen v0.0.2-59-gbfca45c to v0.0.2-63-gdc40301.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-11-19 15:29:28 -08:00
Alex Vandiver bb9d2df1ae puppet: Extract an external-tarball-dependency manifest. 2021-11-19 15:29:28 -08:00
Alex Vandiver 3c8d7e2598 puppet: Tidy old golang directories.
This relies on behavior which is only in Puppet 5.5.1 and above, which
means it must be skipped on Ubuntu 18.04.
2021-11-19 15:29:28 -08:00
Alex Vandiver 2fc4acdf81 puppet: Move /srv/golang to /srv/zulip-golang.
We have the custom of namespacing things under `/srv` with `zulip-`
to help ensure that we play nice with anything else that happens
to be on the host.
2021-11-19 15:29:28 -08:00
Alex Vandiver 00a4abb642 puppet: Switch dependency to the golang binary we need. 2021-11-19 15:29:28 -08:00
Alex Vandiver 2d5f813094 puppet: Stop making a /srv/golang symlink.
Nothing needs this extra directory.
2021-11-19 15:29:28 -08:00
Alex Vandiver 93af6c7f06 puppet: Factor out golang variables. 2021-11-19 15:29:28 -08:00
Alex Vandiver 21be36f15f puppet: Shorten golang version variable name. 2021-11-19 15:29:28 -08:00
Alex Vandiver 6b9e74adee puppet: Upgrade golang from 1.16.4 to 1.17.3. 2021-11-19 15:29:28 -08:00
Alex Vandiver 514801c509 puppet: Split out golang toolchain into its own manifest. 2021-11-19 15:29:28 -08:00
Alex Vandiver 610a0b2d59 nagios: `pg_is_in_recovery()` is better to know replica/primary status.
It is possible to be in recovery, and downloading WAL logs from
archives, and not yet be replicating.  If one only checks the
streaming log status, it reports as "no replicas" which is technically
accurate but not a useful summation of the state of the replica.
2021-11-17 13:38:26 -08:00
Alex Vandiver 83091cbc96 puppet: Swap the one use of the `cron` resource for an /etc/cron.d file.
The `cron` resource places its contents in the user's crontab, which
makes it unlike every other cron job that Zulip installs.

Switch to using `/etc/cron.d` files, like all other cron jobs.
2021-11-16 16:17:32 -08:00
Alex Vandiver 90e1a0400e puppet: Add a few more inter-resource dependencies.
None of these are important; they just express semantic dependencies.
2021-11-16 16:17:32 -08:00
Alex Vandiver 49ad188449 rate_limit: Add a flag to lump all TOR exit node IPs together.
TOR users are legitimate users of the system; however, that system can
also be used for abuse -- specifically, by evading IP-based
rate-limiting.

For the purposes of IP-based rate-limiting, add a
RATE_LIMIT_TOR_TOGETHER flag, defaulting to false, which lumps all
requests from TOR exit nodes into the same bucket.  This may allow a
TOR user to deny other TOR users access to the find-my-account and
new-realm endpoints, but this is a low cost for cutting off a
significant potential abuse vector.

If enabled, the list of TOR exit nodes is fetched from their public
endpoint once per hour, via a cron job, and cached on disk.  Django
processes load this data from disk, and cache it in memcached.
Requests are spared from the burden of checking disk on failure via a
circuitbreaker, which trips of there are two failures in a row, and
only begins trying again after 10 minutes.
2021-11-16 11:42:00 -08:00
Alex Vandiver 01c007ceaf puppet: Remove an out-of-date comment.
Comment was missed in 9d57fa9759.
2021-11-09 21:52:17 -08:00
Alex Vandiver 7af2fa2e92 puppet: Use sysv status command, not supervisorctl status.
Since Supervisor 4, which is installed on Ubuntu 20.04 and Debian 11,
`supervisorctl status` returns exit code 3 if any of the
supervisor-controlled processes are not running.

Using `supervisorctl status` as the Puppet `status` command for
Supervisor leads to unnecessarily trying to "start" a Supervisor
process which is already started, but happens to have one or more of
its managed processes stopped.  This is an unnecessary no-op in
production environments, but in docker-init enviroments, such as in
CI, attempting to start the process a second time is an error.

Switch to checking if supervisor is running by way of sysv init.  This
fixes the potential error in CI, as well as eliminates unnecessary
"starts" of supervisor when it was already running -- a situation
which made zulip-puppet-apply not idempotent:

```
root@alexmv-prod:~# supervisorctl status
process-fts-updates                                             STOPPED   Nov 10 12:33 AM
smokescreen                                                     RUNNING   pid 1287280, uptime 0:35:32
zulip-django                                                    STOPPED   Nov 10 12:33 AM
zulip-tornado                                                   STOPPED   Nov 10 12:33 AM
[...]

root@alexmv-prod:~# ~zulip/deployments/current/scripts/zulip-puppet-apply --force
Notice: Compiled catalog for alexmv-prod.zulipdev.org in environment production in 2.32 seconds
Notice: /Stage[main]/Zulip::Supervisor/Service[supervisor]/ensure: ensure changed 'stopped' to 'running'
Notice: Applied catalog in 0.91 seconds

root@alexmv-prod:~# ~zulip/deployments/current/scripts/zulip-puppet-apply --force
Notice: Compiled catalog for alexmv-prod.zulipdev.org in environment production in 2.35 seconds
Notice: /Stage[main]/Zulip::Supervisor/Service[supervisor]/ensure: ensure changed 'stopped' to 'running'
Notice: Applied catalog in 0.92 seconds
```
2021-11-09 21:52:17 -08:00
Alex Vandiver 8a1bb43b23 puppet: Adjust for templated paths and settings, set C.UTF-8 locale. 2021-11-08 18:21:46 -08:00
Alex Vandiver d3e9a71d42 puppet: Check in upstream PostgreSQL 14 configuration file.
Note that one `<%u%%d>` has to be escaped as `<%%u%%d>`.
2021-11-08 18:21:46 -08:00
Adam Benesh c881430f4c puppet: Add WSGIApplicationGroup config to Apache SSO example.
Zulip apparently is now affected by a bad interaction between Apache's
WSGI using Python subinterpreters and C extension modules like `re2`
that are not designed for it.

The solution is apparently to set WSGIApplicationGroup to %{GLOBAL},
which disables Apache's use of Python subinterpreters.

See https://serverfault.com/questions/514242/non-responsive-apache-mod-wsgi-after-installing-scipy/514251#514251 for background.

Fixes #19924.
2021-10-08 15:07:23 -07:00
Tim Abbott 33b5fa633a process_fts_updates: Fix docker-zulip support.
In the series of migrations to this tool's configuration to support
specifying an arbitrary database name
(e.g. c17f502bb0), we broke support for
running process_fts_updates on the application server, connected to a
remote database server. That workflow is used by docker-zulip and
presumably other settings like Amazon RDS.

The fix is to import the Zulip virtualenv (if available) when running
on an application server.  This is better than just supporting this
case, since both docker-zulip and an Amazon RDS database are setting
where it would be inconvenient to run process-fts-updates directly on
the database server. (In the former case, because we want to avoid
having a strong version dependency on the postgres container).

Details are available in this conversation:
https://chat.zulip.org/#narrow/stream/49-development-help/topic/Logic.20in.20process_fts_updates.20seems.20to.20be.20broken/near/1251894

Thanks to Erik Tews for reporting and help in debugging this issue.
2021-09-27 18:17:33 -05:00
Alex Vandiver 1806e0f45e puppet: Remove zulip.org configuration. 2021-08-26 17:21:31 -07:00
Alex Vandiver 27881babab puppet: Increase prometheus storage, from the default 15d. 2021-08-24 23:40:43 -07:00
Alex Vandiver faf71eea41 upgrade-postgresql: Do not remove other supervisor configs.
We previously used `zulip-puppet-apply` with a custom config file,
with an updated PostgreSQL version but more limited set of
`puppet_classes`, to pre-create the basic settings for the new cluster
before running `pg_upgradecluster`.

Unfortunately, the supervisor config uses `purge => true` to remove
all SUPERVISOR configuration files that are not included in the puppet
configuration; this leads to it removing all other supervisor
processes during the upgrade, only to add them back and start them
during the second `zulip-puppet-apply`.

It also leads to `process-fts-updates` not being started after the
upgrade completes; this is the one supervisor config file which was
not removed and re-added, and thus the one that is not re-started due
to having been re-added.  This was not detected in CI because CI added
a `start-server` command which was not in the upgrade documentation.

Set a custom facter fact that prevents the `purge` behaviour of the
supervisor configuration.  We want to preserve that behaviour in
general, and using `zulip-puppet-apply` continues to be the best way
to pre-set-up the PostgreSQL configuration -- but we wish to avoid
that behaviour when we know we are applying a subset of the puppet
classes.

Since supervisor configs are no longer removed and re-added, this
requires an explicit start-server step in the instructions after the
upgrades complete.  This brings the documentation into alignment with
what CI is testing.
2021-08-24 19:00:58 -07:00
Alex Vandiver e46e862f2b puppet: Add a bare-bones zulipbot profile.
This sets up the firewalls appropriate for zulipbot, but does not
automate any of the configuration of zulipbot itself.
2021-08-24 16:05:58 -07:00
Alex Vandiver 5857dcd9b4 puppet: Configure ip6tables in parallel to ipv4.
Previously, IPv6 firewalls were left at the default all-open.

Configure IPv6 equivalently to IPv4.
2021-08-24 16:05:46 -07:00
Alex Vandiver 845509a9ec puppet: Be explicit that existing iptables are only ipv4. 2021-08-24 16:05:46 -07:00
Anders Kaseorg 09564e95ac mypy: Add types-psycopg2.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-08-09 20:32:19 -07:00
Alex Vandiver 4dd289cb9d puppet: Enable prometheus monitoring of supervisord.
To be able to read the UNIX socket, this requires running
node_exporter as zulip, not as prometheus.
2021-08-03 21:47:02 -07:00
Alex Vandiver aa940bce72 puppet: Disable hwmon collector, which does nothing on cloud hosts. 2021-08-03 21:47:02 -07:00
Alex Vandiver 23a355df0f puppet: Move backup time earlier, from 10am to 7pm America/Los_Angeles.
This is less likely to overlap with common evening deploy times.
2021-08-03 18:32:45 -05:00
Alex Vandiver e94b6afb00 nagios: Remove broken check_email_deliverer_* checks and related code.
These checks suffer from a couple notable problems:
 - They are only enabled on staging hosts -- where they should never
   be run.  Since ef6d0ec5ca, these supervisor processes are only
   run on one host, and never on the staging host.
 - They run as the `nagios` user, which does not have appropriate
   permissions, and thus the checks always fail.  Specifically,
   `nagios` does not have permissions to run `supervisorctl`, since
   the socket is owned by the `zulip` user, and mode 0700; and the
   `nagios` user does not have permission to access Zulip secrets to
   run `./manage.py print_email_delivery_backlog`.

Rather than rewrite these checks to run on a cron as zulip, and check
those file contents as the nagios user, drop these checks -- they can
be rewritten at a later point, or replaced with Prometheus alerting,
and currently serve only to cause always-failing Nagios checks, which
normalizes alert failures.

Leave the files installed if they currently exist, rather than
cluttering puppet with `ensure => absent`; they do no harm if they are
left installed.
2021-08-03 16:07:13 -07:00
Mateusz Mandera 57f14b247e bots: Specify realm for nagios bots messages in check_send_receive_time. 2021-07-26 15:33:13 -07:00
Alex Vandiver befe204be4 puppet: Run the supervisor-restart step only after it is started.
In an initial install, the following is a potential rule ordering:
```
Notice: /Stage[main]/Zulip::Supervisor/File[/etc/supervisor/conf.d/zulip]/ensure: created
Notice: /Stage[main]/Zulip::Supervisor/File[/etc/supervisor/supervisord.conf]/content: content changed '{md5}99dc7e8a1178ede9ae9794aaecbca436' to '{md5}7ef9771d2c476c246a3ebd95fab784cb'
Notice: /Stage[main]/Zulip::Supervisor/Exec[supervisor-restart]: Triggered 'refresh' from 1 event
[...]
Notice: /Stage[main]/Zulip::App_frontend_base/File[/etc/supervisor/conf.d/zulip/zulip.conf]/ensure: defined content as '{md5}d98ac8a974d44efb1d1bb2ef8b9c3dee'
[...]
Notice: /Stage[main]/Zulip::App_frontend_once/File[/etc/supervisor/conf.d/zulip/zulip-once.conf]/ensure: defined content as '{md5}53f56ae4b95413bfd7a117e3113082dc'
[...]
Notice: /Stage[main]/Zulip::Process_fts_updates/File[/etc/supervisor/conf.d/zulip/zulip_db.conf]/ensure: defined content as '{md5}96092d7f27d76f48178a53b51f80b0f0'
Notice: /Stage[main]/Zulip::Supervisor/Service[supervisor]/ensure: ensure changed 'stopped' to 'running'
```

The last line is misleading -- supervisor was already started by the
`supervisor-restart` process on the third line.  As can be shown with
`zulip-puppet-apply --debug`, the last line just installs supervisor
to run on startup, using `systemctl`:
```
Debug: Executing: 'supervisorctl status'
Debug: Executing: '/usr/bin/systemctl unmask supervisor'
Debug: Executing: '/usr/bin/systemctl start supervisor'
```

This means the list of processes started by supervisor depends
entirely on which configuration files were successfully written out by
puppet before the initial `supervisor-restart` ran.  Since
`zulip_db.conf` is written later than the rest, the initial install
often fails to start the `process-fts-updates` process.  In this
state, an explicit `supervisorctl restart` or `supervisorctl reread &&
supervisorctl update` is required for the service to be found and
started.

Reorder the `supervisor-restart` exec to only run after the service is
started.  Because all supervisor configuration files have a `notify`
of the service, this forces the ordering of:

```
(package) -> (config files) -> (service) -> (optional restart)
```

On first startup, this will start and them immediately restart
supervisor, which is unfortunate but unavoidable -- and not terribly
relevant, since the database will not have been created yet, and thus
most processes will be in a restart loop for failing to connect to it.
2021-07-22 14:09:01 -07:00
Alex Vandiver ee7c849f8a puppet: Work around sysvinit supervisor init bug.
The sysvinit script for supervisor has a long-standing bug where
`/etc/init.d/supervisor restart` stops but does not then start the
supervisor process.

Work around this by making restart then try to start, and return if it
is currently running.
2021-07-22 14:09:01 -07:00
Alex Vandiver 7e65421b1f puppet: Ensure psycopg2 is installed before running process_fts_updates.
Not having the package installed will cause startup failures in
`process_fts_updates`; ensure that we've installed the package before
we potentially start the service.
2021-07-14 17:24:52 -07:00
Alex Vandiver 528e5adaab smokescreen: Default to only listening on 127.0.0.1.
This prevents Smokescreen from acting as an open proxy.

Fixes #19214.
2021-07-14 15:40:26 -07:00
Alex Vandiver e6bae4f1dd puppet: Remove zulip::nagios class.
93f62b999e removed the last file in
puppet/zulip/files/nagios_plugins/zulip_nagios_server, which means the
singular rule in zulip::nagios no longer applies cleanly.

Remove the `zulip::nagios` class, as it is no longer needed.
2021-07-09 17:29:41 -07:00
Anders Kaseorg 93f62b999e nagios: Replace check_website_response with standard check_http plugin.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-07-09 16:47:03 -07:00
Vishnu KS e0f5fadb79 billing: Downgrade small realms that are behind on payments.
An organization with at most 5 users that is behind on payments isn't
worth spending time on investigating the situation.

For larger organizations, we likely want somewhat different logic that
at least does not void invoices.
2021-07-02 13:19:12 -07:00
Anders Kaseorg 91bfebca7d install: Replace wget with curl.
curl uses Happy Eyeballs to avoid long timeouts on systems with broken
IPv6.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-06-25 09:05:07 -07:00
Anders Kaseorg 3b60b25446 ci: Remove bullseye hack.
base-files 11.1 marked bullseye as Debian 11 in /etc/os-release.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-06-24 14:35:51 -07:00
Alex Vandiver d51272cc3d puppet: Remove zulip_deliver_scheduled_* from zulip-workers:*.
Staging and other hosts that are `zulip::app_frontend_base` but not
`zulip::app_frontend_once` do not have a
/etc/supervisor/conf.d/zulip/zulip-once.conf and as such do not have
`zulip_deliver_scheduled_emails` or `zulip_deliver_scheduled_messages`
and thus supervisor will fail to reload.

Making the contents of `zulip-workers` contingent on if the server is
_also_ a `-once` server is complicated, and would involve using Concat
fragments, which severely limit readability.

Instead, expel those two from `zulip-workers`; this is somewhat
reasonable, since they are use an entirely different codepath from
zulip_events_*, using the database rather than RabbitMQ for their
queuing.
2021-06-14 17:12:59 -07:00
Alex Vandiver 6c72698df2 puppet: Move zulip_ops supervisor config into /etc/supervisor/conf.d/zulip/.
This is similar cleanup to 3ab9b31d2f, but only affects zulip_ops
services; it serves to ensure that any of these services which are no
longer enabled are automatically removed from supervisor.

Note that this will cause a supervisor restart on all affected hosts,
which will restart all supervisor services.
2021-06-14 17:12:59 -07:00
Alex Vandiver df09607202 puppet: Switch to $zulip::common::supervisor_conf_dir variable. 2021-06-14 17:12:59 -07:00
Alex Vandiver 391f78a9c1 puppet: Move supervisor-not-in-/etc/supervisor/conf.d/ to common place. 2021-06-14 17:12:59 -07:00
Alex Vandiver dd90083ed7 puppet: Provide FQDN of self as URI, so the certificate validates.
Failure to do this results in:
```
psql: error: failed to connect to `host=localhost user=zulip database=zulip`: failed to write startup message (x509: certificate is valid for [redacted], not localhost)
```
2021-06-14 00:14:48 -07:00
Alex Vandiver c90ff80084 puppet: Bump grafana version to 8.0.1.
Most notably, this fixes an annoying bug with CloudWatch metrics being
repeated in graphs.
2021-06-10 15:49:08 -07:00
Alex Vandiver d905eb6131 puppet: Add a database teleport server.
Host-based md5 auth for 127.0.0.1 must be removed from `pg_hba.conf`,
otherwise password authentication is preferred over certificate-based
authentication for localhost.
2021-06-08 22:21:21 -07:00
Alex Vandiver 100a899d5d puppet: Add grafana server. 2021-06-08 22:21:00 -07:00
Alex Vandiver 459f37f041 puppet: Add prometheus server. 2021-06-08 22:21:00 -07:00
Alex Vandiver 19fb58e845 puppet: Add prometheus node exporter. 2021-06-08 22:21:00 -07:00
Alex Vandiver a2b1009ed5 puppet: Turn on "authentication" which defaults to user with all rights.
Nagios refuses to allow any modifications with use_authentication off;
re-enabled "authentication" but set a default user, which (by way of
the `*` permissions in 359f37389a) is allowed to take all actions.
2021-06-08 15:19:28 -07:00
Alex Vandiver 61b6fc865c puppet: Add a label to teleport applications, to allow RBAC.
Roles can only grant or deny access based on labels; set one based on
the application name.
2021-06-08 15:19:04 -07:00
Alex Vandiver 4aff5b1d22 puppet: Allow access to `/` in nagios.
This was a regression in 51b985b40d.
2021-06-07 22:40:58 -07:00
Alex Vandiver 54768c2210 puppet: Remove now-unused basic auth support files.
51b985b40d made these unnecessary.
2021-06-07 16:17:45 -07:00
Alex Vandiver 359f37389a puppet: Remove in-nagios auth restrictions.
51b985b40d made nagios only accessible from localhost, or as proxied
via teleport.  Remove the HTTP-level auth requirements.
2021-06-07 16:17:45 -07:00
Alex Vandiver 2352fac6b5 puppet: Fix indentation. 2021-06-02 18:38:38 -07:00
Alex Vandiver 51b985b40d puppet: Move nagios to behind teleport.
This makes the server only accessible via localhost, by way of the
Teleport application service.
2021-06-02 18:38:38 -07:00
Alex Vandiver 4f51d32676 puppet: Add a teleport application server.
This requires switching to a reverse tunnel for the auth connection,
with the side effect that the `zulip_ops::teleport::node` manifest can
be applied on servers anywhere in the Internet; they do not need to
have any publicly-available open ports.
2021-06-02 18:38:38 -07:00
Alex Vandiver c59421682f puppet: Add a teleport node on every host.
Teleport nodes[1] are the equivalent to SSH servers.  In addition to
this config, joining the teleport cluster will require presenting a
one-time "join token" from the proxy server[2], which may either be
short-lived or static.

[1] https://goteleport.com/docs/architecture/nodes/
[2] https://goteleport.com/docs/admin-guide/#adding-nodes-to-the-cluster
2021-06-02 18:38:38 -07:00
Alex Vandiver 1cdf14d195 puppet: Add a teleport server.
See https://goteleport.com/docs/architecture/overview/ for the general
architecture of a Teleport cluster.  This commit adds a Teleport auth[1]
and proxy[2] server.  The auth server serves as a CA for granting
time-bounded access to users and authenticating nodes on the cluster;
the proxy provides access and a management UI.

[1] https://goteleport.com/docs/architecture/authentication/
[2] https://goteleport.com/docs/architecture/proxy/
2021-06-02 18:38:38 -07:00
Alex Vandiver 3ebd627c50 puppet: Fix "import" -> "include" in chat_zulip_org. 2021-06-02 11:02:34 -07:00
Alex Vandiver 2130fc0645 puppet: Add an explicit class for czo. 2021-06-01 22:18:50 -07:00
Alex Vandiver c9141785fd puppet: Use concat fragments to place port allows next to services.
This means that services will only open their ports if they are
actually run, without having to clutter rules.v4 with a log of `if`
statements.

This does not go as far as using `puppetlabs/firewall`[1] because that
would represent an additional DSL to learn; raw IPtables sections can
easily be inserted into the generated iptables file via
`concat::fragment` (either inline, or as a separate file), but config
can be centralized next to the appropriate service.

[1] https://forge.puppet.com/modules/puppetlabs/firewall
2021-05-27 21:14:48 -07:00
Alex Vandiver 4f79b53825 puppet: Factor out firewall config. 2021-05-27 21:14:48 -07:00
Alex Vandiver 87a109e3e0 puppet: Pull in pinned puppet modules.
Using puppet modules from the puppet forge judiciously will allow us
to simplify the configuration somewhat; this specifically pulls in the
stdlib module, which we were already using parts of.
2021-05-27 21:14:48 -07:00
Alex Vandiver f3eea72c2a setup: Merge multiple setup-apt-repo scripts into one.
This moves the `.asc` files into subdirectories, and writes out the
according `.list` files into them.  It moves from templates to
written-out `.list` files for clarity and ease of
implementation (Debian and Ubuntu need different templates for
`zulip`), and as a way of making explicit which releases are supported
for each list.  For the special-case of the PGroonga signing key, we
source an additional file within the directory.

This simplifies the process for adding another class of `.list` file.
2021-05-26 14:42:29 -07:00
Alex Vandiver 4f017614c5 nagios: Replace check_fts_update_log with a process_fts_updates flag.
This avoids having to duplicate the connection logic from
process_fts_updates.

Co-authored-by: Adam Birds <adam.birds@adbwebdesigns.co.uk>
2021-05-25 13:56:05 -07:00
Alex Vandiver ab130ceb35 nagios: Support arbitrary database user and dbname in replication check.
Co-authored-by: Adam Birds <adam.birds@adbwebdesigns.co.uk>
2021-05-25 13:56:05 -07:00
Alex Vandiver c17f502bb0 process_fts_updates: Support arbitrary database user and dbname.
Co-authored-by: Adam Birds <adam.birds@adbwebdesigns.co.uk>
2021-05-25 13:56:05 -07:00
Alex Vandiver 02fc0d3e1d db: Drop None and empty-string checking in arguments.
psycopg2 treats None and "" the same as not-provided:

```
assert connect(user="zulip", dbname="zulip")
assert connect(user="zulip", dbname="zulip", host="")
assert connect(user="zulip", dbname="zulip", host=None)
with Raises("no password supplied"):
    connect(user="zulip", dbname="zulip", host="localhost")

assert connect(user="zulip", dbname="zulip", port="")
assert connect(user="zulip", dbname="zulip", port=None)
assert connect(user="zulip", dbname="zulip", port=5432)
with Raises("could not connect to server"):
    connect(user="zulip", dbname="zulip", port=5000)

assert connect(dbname="zulip", host="localhost", password="right-password")
with Raises("no password supplied"):
    connect(dbname="zulip", host="localhost", password="")
with Raises("no password supplied"):
    connect(dbname="zulip", host="localhost", password=None)
with Raises("password authentication failed"):
    connect(dbname="zulip", host="localhost", password="wrong")
```

Co-authored-by: Adam Birds <adam.birds@adbwebdesigns.co.uk>
2021-05-25 13:46:58 -07:00
Alex Vandiver 9c652eb16b db: Use the pre-computed values from settings.
Rather than duplicate logic from `computed_settings`, use the values
that were computed therein.

Co-authored-by: Adam Birds <adam.birds@adbwebdesigns.co.uk>
2021-05-25 13:46:58 -07:00
Alex Vandiver 94d7c29d92 db: Use the same codepath for cases (2) and (3).
Using the second branch _only_ for case (3), of a PostgreSQL server on
a different host, leaves it untested in CI.  It also brings in an
unnecessary Django dependency.

Co-authored-by: Adam Birds <adam.birds@adbwebdesigns.co.uk>
2021-05-25 13:46:58 -07:00
Alex Vandiver add6971ad9 db: Make USING_PGROONGA logic clearer.
We only need to read the `zulip.conf` file to determine if we're using
PGROONGA if we are on the PostgreSQL machine, with no access to
Django.

Co-authored-by: Adam Birds <adam.birds@adbwebdesigns.co.uk>
2021-05-25 13:46:58 -07:00
Alex Vandiver 75bf19c9d9 db: Combine the `if "host" in pg_args` stanza with earlier clause.
The only way in which "host" could be set is in cases (1) or (2), when
it was potentially read from Django's settings.  In case (3), we
already know we are on the same host as the PostgreSQL server.

This unifies the two separated checks, which are actually the same
check.

Co-authored-by: Adam Birds <adam.birds@adbwebdesigns.co.uk>
2021-05-25 13:46:58 -07:00
Alex Vandiver 67fc8e84ea db: Clarify the 3 different cases that process_fts_updates must support.
Co-authored-by: Adam Birds <adam.birds@adbwebdesigns.co.uk>
2021-05-25 13:46:58 -07:00
Alex Vandiver 116e41f1da puppet: Move files out and back when mounting /srv.
Specifically, this affects /srv/zulip-aws-tools.
2021-05-23 13:29:23 -07:00
Alex Vandiver ea98549e88 puppet: Always install linux-image-virtual, for ksplice support. 2021-05-23 13:29:23 -07:00
Alex Vandiver 0b1dd27841 puppet: AWS mounts its extra disks with inconsistent names.
It is now /dev/nvme1n1, not /dev/nvme0n1; but it always has a
consistent major/minor node.  Source the file that defines these.
2021-05-23 13:29:23 -07:00
Alex Vandiver 82797dd53c settings: Standardize the name of the deliver_scheduled_messages logs.
This makes it match its command name, and other logfile name.
2021-05-18 12:39:28 -07:00
Alex Vandiver 343a1396af puppet: Rename logfile for deliver_scheduled_messages to be consistent. 2021-05-18 12:39:28 -07:00
Alex Vandiver ef6d0ec5ca puppet: Only run deliver_scheduled_messages and _emails on one server.
`deliver_scheduled_emails` and `deliver_scheduled_messages` use the
`ScheduledEmail` and `ScheduledMessage` tables as a queue,
effectively, pulling values off of them.  As noted in their comments,
this is not safe to run on multiple hosts at once.  As such, split out
the supervisor files for them.
2021-05-18 12:39:28 -07:00
Alex Vandiver 033a96aa5d puppet: Fix check_ssl_certificate check to check named host, not self. 2021-05-17 18:38:30 -07:00
Alex Vandiver a2b7a5ef4b puppet: Clarify 20m keepalive time from the LB is a max; it can be less. 2021-05-17 14:56:51 -07:00
Alex Vandiver 66a232e303 smokescreen: Bump version of Go and Smokescreen.
Move version pins to the latest versions of Go and Smokescreen.
2021-05-12 10:08:42 -10:00
Alex Vandiver feb7870db7 puppet: Adjust thresholds on autovac_freeze.
These thresholds are in relationship to the
`autovacuum_freeze_max_age`, *not* the XID wraparound, which happens
at 2^31-1.  As such, it is *perfectly normal* that they hit 100%, and
then autovacuum kicks in and brings it back down.  The unusual
condition is that PostgreSQL pushes past the point where an autovacuum
would be triggered -- therein lies the XID wraparound danger.

With the `autovacuum_freeze_max_age` set to 2000000000 in
`postgresql.conf`, XID wraparound happens at 107.3%.  Set the warning
and error thresholds to below this, but above 100% so this does not
trigger constantly.
2021-05-11 17:11:47 -07:00
Alex Vandiver 0f1611286d management: Rename the deliver_email command to deliver_scheduled_email.
This makes it parallel with deliver_scheduled_messages, and clarifies
that it is not used for simply sending outgoing emails (e.g. the
`email_senders` queue).

This also renames the supervisor job to match.
2021-05-11 13:07:29 -07:00
Anders Kaseorg 544bbd5398 docs: Fix capitalization mistakes.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-05-10 09:57:26 -07:00
Tim Abbott ad0be6cea1 puppet: Remove thumbor.conf nginx configuration.
This was missing in 405bc8dabf.
2021-05-07 16:57:29 -07:00
Anders Kaseorg 9d57fa9759 puppet: Use pgrep -x to avoid accidental matches.
Matching the full process name (-x without -f) or full command
line (-xf) is less prone to mistakes like matching a random substring
of some other command line or pgrep matching itself.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-05-07 08:54:41 -07:00
Anders Kaseorg 405bc8dabf requirements: Remove Thumbor.
Thumbor and tc-aws have been dragging their feet on Python 3 support
for years, and even the alphas and unofficial forks we’ve been running
don’t seem to be maintained anymore.  Depending on these projects is
no longer viable for us.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-05-06 20:07:32 -07:00
Alex Vandiver eda9ce2364 locale: Use `C.UTF-8` rather than `en_US.UTF-8`.
The `en_US.UTF-8` locale may not be configured or generated on all
installs; it also requires that the `locales` package be installed.
If users generate the `en_US.UTF-8` locale without adding it to the
permanent set of system locales, the generated `en_US.UTF-8` stops
working when the `locales` package is updated.

Switch to using `C.UTF-8` in all cases, which is guaranteed to be
installed.

Fixes #15819.
2021-05-04 08:51:46 -07:00
Alex Vandiver ddb9d16132 puppet: Install procps, for pgrep.
In puppet, we use pgrep in the collection stage, to see if rabbitmq is
running.  Sufficiently bare-bones systems will not have
`procps` (which provides `pgrep`) installed yet, which makes the
install abort when running `puppet` for the first time.

Just installing the `procps` package in Puppet is insufficient,
because the check in the `unless` block runs when Puppet is
determining which resources it needs to instantiate, and in what
order; any package installation has yet to happen.  As
`erlang-base` (which provides `epmd`) happens to have a dependency of
`procps`, any system without `pgrep` will also not have `epmd`
installed or running.  Regardless, it is safe to run `epmd -daemon`
even if one is already running, as the comment above notes.
2021-05-03 14:48:52 -07:00
Alex Vandiver 3577c6dbd4 puppet: `pgrep -f something` can match itself.
Using `pgrep -f epmd` to determine if `empd` is running is a race
condition with itself, since the pgrep is attempting to match the
"full process name" and its own full process name contains "epmd".
This leads to epmd not being started when it should be, which in turn
leads to rabbitmq-server failing to start.

Use the standard trick for this, namely a one-character character
class, to prevent self-matching.
2021-05-03 14:48:52 -07:00
Jennifer Hwang c9f5946239 puppet: Add override for queue_workers_multiprocess.
With tweaks to the documentation by tabbott.

This uses the following configuration option:

[application_server]
queue_workers_multiprocess = false
2021-04-20 14:37:15 -07:00
Tim Abbott bb676f1143 smokescreen: Move supervisor configuration to managed directory.
We've established the conf.d/zulip directory as the recommended path
for Zulip-managed configuration files, so this belongs there.
2021-04-16 14:05:42 -07:00
Gaurav Pandey 303e7b9701 ci: Add Debian bullseye to production test suite. 2021-04-15 21:38:31 -07:00
Gaurav Pandey feb720b463 install: Add beta support for debian bullseye for production.
This won't work on a real bullseye system until Bullseye actually
officially releases.

Fixes part of #17863.
2021-04-15 21:38:31 -07:00
Alex Vandiver 9de35d98d3 puppet: Ensure a snakeoil certificate, for Postfix and PostgreSQL.
We use the snakeoil TLS certificate for PostgreSQL and Postfix; some
VMs install the `ssl-cert` package but (reasonably) don't build the
snakeoil certs into the image.

Build them as needed.

Fixes #14955.
2021-04-15 21:37:55 -07:00
Anders Kaseorg b01d43f339 mypy: Fix strict_equality violations.
puppet/zulip/files/nagios_plugins/zulip_postgresql/check_postgresql_replication_lag:98: error: Non-overlapping equality check (left operand type: "List[List[str]]", right operand type: "Literal[0]")  [comparison-overlap]
zerver/tests/test_realm.py:650: error: Non-overlapping container check (element type: "Dict[str, Any]", container item type: "str")  [comparison-overlap]

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-04-13 09:18:18 -07:00
Alex Vandiver 93f3b41811 puppet: Also move avatars to the same nginx include file. 2021-04-09 08:28:42 -07:00
Alex Vandiver aae8f454ce puppet: Simplify uploads handling.
`uploads-route.noserve` and `uploads-route.internal` contained
identical location blocks for `/upload`, since differentiation was
necessary for Trusty until 33c941407b72; move the now-common sections
into `app`.

This the only differences between internal and S3 serving as a single
block which should be included or not based on config; move it to a
file which may or may not be placed in `app.d/`.
2021-04-09 08:28:42 -07:00
Alex Vandiver fb26c6b7ca puppet: Move uwsgi_pass setting into uwsgi_params.
We only ever call `uwsgi_pass django` in association with `include
uwsgi_params`; refactor it in.
2021-04-09 08:28:42 -07:00
Alex Vandiver 9cf9d5f2cf puppet: Move HTTP_X_REAL_IP setting into uwsgi_params.
This effectively also adds it to serving `/user_uploads`, where its
lack would cause failures to list the actual IP address.
2021-04-09 08:28:42 -07:00
Alex Vandiver 795517bd52 puppet: Only set X-Real-IP once.
07779ea879 added an additional `proxy_set_header` of `X-Real-IP` to
`puppet/zulip/files/nginx/zulip-include-common/proxy`; as noted in
that commit, Tornado longpoll proxies already included such a line.

Unfortunately, this equates to setting that header _twice_ for Tornado
ports, like so:

```
X-Real-Ip: 198.199.116.58
X-Real-Ip: 198.199.116.58
```

...which is represented, once parsed by Django, as an IP of
`198.199.116.58, 198.199.116.58`.  For IPv4, this odd "IP address" has
no problems, and appears in the access logs accordingly; for IPv6
addresses, however, its length is such that it overflows a call to
`getaddrinfo` when attempting to determine the validity of the IP.

Remove the now-duplicated inclusion of the header.
2021-04-09 08:28:42 -07:00
Alex Vandiver 07779ea879 middleware: Do not trust X-Forwarded-For; use X-Real-Ip, set from nginx.
The `X-Forwarded-For` header is a list of proxies' IP addresses; each
proxy appends the remote address of the host it received its request
from to the list, as it passes the request down.  A naïve parsing, as
SetRemoteAddrFromForwardedFor did, would thus interpret the first
address in the list as the client's IP.

However, clients can pass in arbitrary `X-Forwarded-For` headers,
which would allow them to spoof their IP address.  `nginx`'s behavior
is to treat the addresses as untrusted unless they match an allowlist
of known proxies.  By setting `real_ip_recursive on`, it also allows
this behavior to be applied repeatedly, moving from right to left down
the `X-Forwarded-For` list, stopping at the right-most that is
untrusted.

Rather than re-implement this logic in Django, pass the first
untrusted value that `nginx` computer down into Django via `X-Real-Ip`
header.  This allows consistent IP addresses in logs between `nginx`
and Django.

Proxied calls into Tornado (which don't use UWSGI) already passed this
header, as Tornado logging respects it.
2021-03-31 14:19:38 -07:00
Anders Kaseorg 29e4c71ec4 puppet: Reformat custom Ruby modules with Rufo.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-03-24 12:12:04 -07:00
Alex Vandiver 6ee74b3433 puppet: Check health of APT repository. 2021-03-23 19:27:42 -07:00
Alex Vandiver c01345d20c puppet: Add nagios check for long-lived certs that do not auto-renew. 2021-03-23 19:27:27 -07:00
Alex Vandiver 9ea86c861b puppet: Add a nagios alert configuration for smokescreen.
This verifies that the proxy is working by accessing a
highly-available website through it.  Since failure of this equates to
failures of Sentry notifications and Android mobile push
notifications, this is a paging service.
2021-03-18 10:11:15 -07:00
Anders Kaseorg 129ea6dd11 nginx: Consistently listen on IPv6 and with HTTP/2.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-03-17 17:46:32 -07:00
Alex Vandiver 15c58cce5a puppet: Create new nginx logfiles as the zulip user, not as www-data.
All of `/var/log/nginx/` is chown'd to `zulip` and the nginx processes
themselves run as `nginx`, and would thus (on their own) create new
logfiles as `zulip`.  Having `logrotate` create them as the package
default of `www-data` means that they are momentarily unreadable by
the `zulip` user just after rotation, which can cause problems with
logtail scripts.

Commit the standard `nginx` logrotate configuration, but with the
`zulip` user instead of the `www-data` user.
2021-03-16 14:45:13 -07:00
Alex Vandiver 3314fefaec puppet: Do not require a venv for zulip-puppet-apply.
0663b23d54 changed zulip-puppet-apply to
use the venv, because it began using `yaml` to parse the output of
puppet to determine if changes would happen.

However, not every install ends with a venv; notably, non-frontend
servers do not have one.  Attempting to run zulip-puppet-apply on them
hence now fails.

Remove this dependency on the venv, by installing a system
python3-yaml package -- though in reality, this package is already an
indirect dependency of the system.  Especially since pyyaml is quite
stable, we're not using it in any interesting way, and it does not
actually add to the dependencies, it is preferable to parsing the YAML
by hand in this instance.
2021-03-14 17:50:57 -07:00
Alex Vandiver 52f155873f puppet: Ensure that all `scripts/lib/install` packages are installed.
These have all been required packages for some time, but this helps
keep the install-time list more clearly a subset of the upgrade-time
list.
2021-03-14 17:50:57 -07:00
Alex Vandiver 06c07109e4 puppet: Add missing semicolons left off in ba3b88c81b. 2021-03-12 15:48:53 -08:00
Alex Vandiver 024282b51e Revert "puppet: Use rabbitmq as the user for its config files."
This reverts commit 211232978f.  The
`rabbitmq` user does not exist yet on first install, and the goal is
to create the `rabbitmq-env.conf` file before the package is
installed.
2021-03-12 15:37:19 -08:00
Alex Vandiver ba3b88c81b puppet: Explicitly use the snakeoil certificates for nginx.
In production, the `wildcard-zulipchat.com.combined-chain.crt` file is
just a symlink to the snakeoil certificates; but we do not puppet that
symlink, which makes new hosts fail to start cleanly.  Instead, point
explicitly to the snakeoil certificate, and explain why.
2021-03-12 13:31:54 -08:00
Alex Vandiver 211232978f puppet: Use rabbitmq as the user for its config files.
This matches the initial ownership by the `rabbitmq-server` package.
2021-03-12 13:31:03 -08:00
Alex Vandiver ef188af82d puppet: Use two location blocks, instead of nesting them.
Directives in `location` blocks may or may not inherit from
surrounding `location` blocks; specifically, `add_header` directives
do not[1]:

> There could be several add_header directives. These directives are
> inherited from the previous configuration level if and only if there
> are no add_header directives defined on the current level.

In order to maintain the same headers (including, critically,
`Access-Control-Allow-Origin`) as the surrounding block, all
`add_header` directives must thus be repeated (which includes the
`include`).

For clarity, un-nest and repeat the entire `location` block as was
used for `/static/`, but with the additional `add_header`.  This is
preferred to the of an `if $request_uri` statement to add the header,
as those can have unexpected or undefined results[2].

[1] http://nginx.org/en/docs/http/ngx_http_headers_module.html#add_header
[2] https://www.nginx.com/resources/wiki/start/topics/depth/ifisevil/
2021-03-11 21:09:15 -08:00
Alex Vandiver 306bf930f5 puppet: Add a warning if ksplice is enabled but has no key set. 2021-03-10 17:57:20 -08:00
Alex Vandiver a215c83c2d puppet: Switch to more explicit variable rather than reuse a nagios one.
Redis is not nagios, and this only leads to confusion as to why there
is a nagios domain setting on frontend servers; it also leaves the
`redis0` part of the name buried in the template.

Switch to an explicit variable for the redis hostname.
2021-03-10 11:44:54 -08:00
Alex Vandiver a5b29398fc puppet: Only install ksplice uptrack if there is an access key. 2021-03-10 11:44:11 -08:00
Alex Vandiver 189e86e18e puppet: Set aggressive caching headers on immutable webpack files.
A partial fix for #3470.
2021-03-07 22:00:32 -08:00
Alex Vandiver e63f170027 puppet: Add access time and host to nginx access logs.
2e20ab1658 attempted to add this; but
there are multiple locations that access logs are set, and the most
specific wins.
2021-03-04 18:06:47 -08:00
Alex Vandiver 8961885b0f puppet: Add smokescreen to logrotate. 2021-03-02 17:16:38 -08:00
Alex Vandiver d938dd9d4a puppet: Document smokescreen installation, and move to puppet/zulip/.
This is more broadly useful than for just Kandra; provide
documentation and means to install Smokescreen for stand-alone
servers, and motivate its use somewhat more.
2021-03-02 17:16:38 -08:00
Alex Vandiver 2f5eae5c68 puppet: Minor formatting. 2021-02-28 17:03:29 -08:00
Alex Vandiver a759d26a32 puppet: Make ksplice config not world-readable, use 'adm' group.
This matches the configuration that ksplice itself creates the file
and directory with.
2021-02-28 17:03:29 -08:00
Tim Abbott 957c16aa77 nagios: Tweak prod load monitoring parameters.
Ultimately this monitoring isn't that helpful, but we're mainly
interested in when it spikes to very high numbers.
2021-02-26 08:39:52 -08:00
Alex Vandiver 32149c6a1c puppet: Add ksplice uptrack for kernel hotpatches. 2021-02-25 18:05:47 -08:00
Alex Vandiver 173d2dec3d puppet: Check in defensive restart-camo cron job.
This was found on lb1; add it to the camo install on smokescreen.
2021-02-24 16:42:21 -08:00
Alex Vandiver d15e6990e5 puppet: Only execute setup-apt-repo if necessary.
This means that in steady-state, `zulip-puppet-apply` is expected to
produce no changes or commands to execute.  The verification step of
`setup-apt-repo` is quite fast, so this cleans up the output for very
little cost.
2021-02-23 18:16:02 -08:00
Alex Vandiver 0b736ef4cf puppet: Remove puppet_ops configuration for separate loadbalancer host. 2021-02-22 16:05:13 -08:00
Alex Vandiver e30b524896 iptables: Limit smokescreen port 4750, add camo port.
Limit incoming connections to port 4750 to only the smokescreen host,
and also allow access to the Camo server on that host, on port 9292.
2021-02-17 13:52:38 -08:00
Alex Vandiver 1caff01463 puppet: Configure nginx for long keep-alives when behind a loadbalancer.
These optimizations only makes sense when all connections at a TCP
level are coming from the same host or set of hosts; as such, they
are only enabled if `loadbalancer.ips` is set in the `zulip.conf`.
2021-02-17 10:25:33 -08:00
Alex Vandiver a88af1b5a2 camo: Install on smokescreen host. 2021-02-16 08:12:31 -08:00
Alex Vandiver 29f60bad20 smokescreen: Put the version into the supervisorctl command.
This makes it reload correctly if the version is changed.
2021-02-16 08:12:31 -08:00
Anders Kaseorg 6e4c3e41dc python: Normalize quotes with Black.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-02-12 13:11:19 -08:00
Anders Kaseorg 11741543da python: Reformat with Black, except quotes.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-02-12 13:11:19 -08:00
Anders Kaseorg 5028c081cb python: Merge concatenated string literals that Black would uglify.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-02-12 13:11:19 -08:00
Alex Vandiver 559cdf7317 puppet: Set APT::Periodic::Unattended-Upgrade in apt config.
This is required for unattended upgrades to actually run regularly.
In some distributions, it may be found in 20auto-upgrades, but placing
it here makes it more discoverable.
2021-02-12 08:59:19 -08:00
Ganesh Pawar 65e23dd713 puppet: Add Zulip specific postgresql configuration for 13.
Based on the work done in a03e4784c7.
2021-02-05 09:30:34 -08:00
Ganesh Pawar 90a3dc8a91 puppet: Add upstream version of postgresql 13 config.
This is a prep commit to add provision support for Ubuntu 20.10 Groovy.
2021-02-05 09:30:34 -08:00
Tim Abbott fd8504e06b munin: Update to use NAGIOS_BOT_HOST.
We haven't actively used this plugin in years, and so it was never
converted from the 2014-era monitoring to detect the hostname.

This seems worth fixing since we may want to migrate this logic to a
more modern monitoring system, and it's helpful to have it correct.
2021-01-27 12:07:09 -08:00
Alex Vandiver ab035f76de puppet: Be more restrictive about mm addresses.
These will always have only 32 characters after the `mm`.
2021-01-26 10:13:58 -08:00
Alex Vandiver a53092687e puppet: Only match incoming gateway address on our mail domain.
79931051bd allows outgoing emails from
localhost, but outgoing recipients are still subjected to virtualmaps.
This caused all outgoing email from Zulip with destination addresses
containing `.`, `+`, or starting with `mm`, to be redirected back
through the email gateway.

Bracket the virualmap addresses used for local delivery to the mail
gateway with a restriction on the domain matching the
`postfix.mailname` configuration, regex-escaped, so those only apply
to email destined for that domain.

The hostname is _not_ moved from `mydestination` to
`virtual_alias_domains`, as that would preclude delivery to
actually-local addresses, like `postmaster@`.
2021-01-26 10:13:58 -08:00
Alex Vandiver c2526844e9 worker: Remove SignupWorker and friends.
ZULIP_FRIENDS_LIST_ID and MAILCHIMP_API_KEY are not currently used in
production.

This removes the unused 'signups' queue and worker.
2021-01-17 11:16:35 -08:00
Tim Abbott 4ee58f408b process_fts_updates: Make normal development startup silent.
We run this tool at DEBUG log level in production, so we will still
see the notice on startup there; this avoids a spammy line in the
development environment output..
2020-12-20 12:19:49 -08:00
Sutou Kouhei 0d3f9fc855 install: Use PGroonga packages built for PostgreSQL packages by PGDG
Because we always use PostgreSQL packages by PGDG since Zulip 3.0.

Fixes #16058.
2020-12-18 15:38:21 -08:00
Alex Vandiver 4868a4fe48 puppet: Set a long timeout on wal-g wal-push, to prevent stalls.
`wal-g wal-push` has a known bug with occasionally hanging after file
upload to S3[1]; set a rather long timeout on the upload process, so
that we don't simply stall forever when archiving WAL segments.

[1] https://github.com/wal-g/wal-g/issues/656
2020-11-20 11:32:36 -08:00
Sourabh Rana 419f163906 nginx: Increase file upload size from 25mb to 80mb. 2020-11-19 00:49:49 -08:00
Alex Vandiver 90ca06d873 puppet: Allow unattended upgrades of -updates in addition to -security.
This ensures that software will be fully up-to-date, not just with
security patches.
2020-11-13 16:45:05 -08:00
Alex Vandiver 2e20ab1658 puppet: Log the "Host" header and total response time.
Logging `Host` is useful for determining access patterns to realms,
especially if ROOT_DOMAIN_LANDING_PAGE is set.  Total response time is
useful in debugging access and performance patterns.
2020-11-13 16:42:32 -08:00
Tim Abbott 494a685827 puppet: Fix typo in name of missedmessage_emails consumer.
This has been present since this check was introduced in
45c9c3cc30.
2020-10-29 12:28:54 -07:00
Tim Abbott ab3cb2b3bf puppet: Fix internal redis puppet configuration.
The inherits rule is required for overriding existing configuration
files; while the `::profile` piece was missed in the recent ::profile
migration.
2020-10-29 11:53:43 -07:00
Alex Vandiver 6b9d7000b5 puppet: Set proxy environment variables.
These are respected by `urllib`, and thus also `requests`.  We set
`HTTP_proxy`, not `HTTP_PROXY`, because the latter is ignored in
situations which might be running under CGI -- in such cases it may be
coming from the `Proxy:` header in the request.
2020-10-28 12:17:35 -07:00
Alex Vandiver 8b0f32ee07 puppet: Move environment-setting into configuration, not command. 2020-10-28 12:13:04 -07:00
Alex Vandiver b9797770d3 provision: Rename backup directory to postgresql. 2020-10-28 11:57:03 -07:00
Alex Vandiver 1f7132f50d docs: Standardize on PostgreSQL, not Postgres. 2020-10-28 11:55:16 -07:00
Alex Vandiver eaa99359b1 puppet: Rename to check_postgresql_replication_lag. 2020-10-28 11:51:52 -07:00
Alex Vandiver 53e59a0a13 puppet: Rename check_postgres_backup to check_postgresql_backup. 2020-10-28 11:51:52 -07:00
Alex Vandiver 45f6c79c4a puppet: Rename postgres_ variables to postgresql_. 2020-10-28 11:51:52 -07:00
Alex Vandiver e124324050 puppet: Rename postgres_appdb in nagios to postgresql. 2020-10-28 11:51:52 -07:00
Alex Vandiver a155430eb5 docs: Document all zulip.conf settings.
This provides a single reference point for all zulip.conf settings;
these mostly link out to the more complete documentation about each
setting, elsewhere.

Fixes #12490.
2020-10-27 13:31:57 -07:00
Alex Vandiver e81bc19e45 puppet: Remove shims for old classes, except dockervoyager.
The upgrade mechanism in the previous commit negates the need for
them -- with the exception of dockervoyager.
2020-10-27 13:29:19 -07:00
Alex Vandiver d24c571bab puppet: Automatically back up the database if we have the secrets.
This avoids folks having to manually add to the puppet_classes.
2020-10-27 13:29:19 -07:00
Alex Vandiver e7798d2797 puppet: Move zulip_ops::profile::postgres_appdb to postgresql. 2020-10-27 13:29:19 -07:00
Alex Vandiver 9f25389bff puppet: Move top-level zulip_ops deployments to zulip_ops::profile. 2020-10-27 13:29:19 -07:00
Alex Vandiver 5365af544a puppet: Rename zulip::profile::rabbit to ::rabbitmq. 2020-10-27 13:29:19 -07:00
Alex Vandiver 188af57296 puppet: Rename postgres_appdb to postgresql.
There is only one PostgreSQL database; the "appdb" is irrelevant.
Also use "postgresql," as it is the name of the software, whereas
"postgres" the name of the binary and colloquial name.  This is minor
cleanup, but enabled by the other renames in the previous commit.
2020-10-27 13:29:19 -07:00
Alex Vandiver 91cb0988e1 puppet: Generalize docker detection.
This also has the benefit of detecting zulip::dockervoyager as well as
zulip::profile::docker.
2020-10-27 13:29:19 -07:00
Alex Vandiver 0f25acc7b3 puppet: Rename "voyager"/"dockervoyager" to "standalone"/"docker".
The "voyager" name is non-intuitive and not significant.
`zulip::voyager` and `zulip::dockervoyager` stubs are kept for
back-compatibility with existing `zulip.conf` files.
2020-10-27 13:29:19 -07:00
Alex Vandiver c2185a81d6 puppet: Move top-level zulip deployments into "profile" directory.
This moves the puppet configuration closer to the "roles and profiles
method"[1] which is suggested for organizing puppet classes.  Notably,
here it makes clear which classes are meant to be able to stand alone
as deployments.

Shims are left behind at the previous names, for compatibility with
existing `zulip.conf` files when upgrading.

[1] https://puppet.com/docs/pe/2019.8/the_roles_and_profiles_method
2020-10-27 13:29:19 -07:00
Alex Vandiver 27cfb14d92 puppet: Only include zulip::base for top-level deploys.
This also removes direct includes of `zulip::common`, making
`zulip::base` gatekeep the inclusion of it.  This helps enforce that
any top-level deploy only needs include a single class, and that any
configuration which is not meant to be deployed by itself will not
apply, due to lack of `zulip::common` include.

The following commit will better differentiate these top-level deploys
by moving them into a subdirectory.
2020-10-27 13:29:19 -07:00
Alex Vandiver 34e8c2c61e puppet: Move total_memory_mb from zulip::base into zulip::common.
This makes `zulip::common` used only for variable-setting, and
`zulip::base` used only for resource creation.
2020-10-27 13:29:19 -07:00
Alex Vandiver 7bb888c2ec puppet: Template supervisor.conf for redhat paths. 2020-10-27 13:29:19 -07:00
Alex Vandiver 3ab9b31d2f puppet: Purge all un-managed supervisor configuration files.
Relying on `defined(Class['...'])` makes the class sensitive to
resource evaluation ordering, and thus brittle.  It is also only
functional for a single service (thumbor).

Generalize by using `purge => true` for the directory to automatically
remove all un-managed files.  This is more general than the previous
form, and may result in additional not-managed services being removed.
2020-10-27 13:29:19 -07:00
Alex Vandiver 1d54630b4e log: Rename email-deliverer.log to match other files. 2020-10-25 14:56:37 -07:00
Alex Vandiver 93d661d119 puppet: Configure logrotate for all logger files.
This adds log rotation to all /var/log/zulip files.
2020-10-25 14:56:37 -07:00
Alex Vandiver c296b5d819 puppet: Allow unattended-upgrades for all but servers.
Restarting servers is what can cause service interruptions, and
increase risk.  Add all of the servers that we use to the list of
ignored packages, and uncomment the default allowed-origins in order
to enable unattended upgrades.
2020-10-23 16:46:06 -07:00
Anders Kaseorg 72d6ff3c3b docs: Fix more capitalization issues.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-23 11:46:55 -07:00
Alex Vandiver a7d1fd9ffb puppet: Remove non-working apt::source.
d2aa81858c replaced the `apt::source` to set up debathena with
`Exec['setup-apt-repo-debathena']`, but mistakenly left the
`apt::source` in place in `zmirror` (but not `zmirror_personals`).
The `apt::source` resource type was later removed in c9d54f7854,
making the manifest to apply on `zmirror`.

Remove the broken and unnecessary `apt::source` resource.
2020-10-23 11:31:20 -07:00
Alex Vandiver 48e06c25ba puppet: Switch nagios SSH checks to id_ed25519 key.
The ssh-rsa algorithm was deprecated[1] in OpenSSH 8.2 (2020-02-14) and
will be removed in a future release.

[1] https://www.openssh.com/txt/release-8.4
2020-10-22 16:42:30 -07:00
Alex Vandiver 0ea20bd7d8 puppet: Move postgres_version into postgres_common.
This property is not related to the base zulip install; move it to
zulip::postgres_common, which is already used as a namespace for
various postgres variables.
2020-10-22 11:32:25 -07:00
Alex Vandiver 25e995b677 puppet: Move normal_queues to the one place that uses it. 2020-10-22 11:32:25 -07:00
Alex Vandiver 423b5c2be2 puppet: Move queue error and stats directories to just the app host. 2020-10-22 11:31:05 -07:00
Alex Vandiver 4d4c21499a puppet: Move supervisor dependency into process_fts_updates.
PostgreSQL itself has no dependency on supervisor; rather, the FTS
updates do.
2020-10-22 11:30:53 -07:00
Alex Vandiver ca971ebc59 puppet: Remove empty zulip_ops class. 2020-10-22 11:30:53 -07:00
Alex Vandiver 16af05758d puppet: Move zulip_org into zulip_ops.
This class is not of general interest.
2020-10-22 11:30:53 -07:00
Alex Vandiver ad566c491d puppet: Drop now-unused zulip_ops:::git class. 2020-10-22 11:30:53 -07:00
Alex Vandiver 50e9e2ed20 puppet: Make zulip::base include zulip::apt_repository.
There was likely more dependency complexity prior to 97766102df, but
there is now no reason to require that consumers explicitly include
zulip::apt_repository.
2020-10-22 11:30:53 -07:00
Alex Vandiver 2dc6d26ec6 puppet: Fix included monitoring class name. 2020-10-19 22:30:20 -07:00
Alex Vandiver 7a1132d605 puppet: Switch golang and smokescreen to use /srv.
/srv and /opt have very similar usages; but we should be internally
consistent.

Move these two (the only usages of /opt) to match the rest in /srv.
2020-10-16 13:00:06 -07:00
Alex Vandiver 78b92a51cc puppet: Allow access to smokescreen port via iptables. 2020-10-15 15:18:35 -07:00
Alex Vandiver 0d5356969e puppet: Reformat ipv4 iptables rules comments. 2020-10-15 15:18:35 -07:00
Alex Vandiver fffea9612b puppet: Add an outgoing HTTP/HTTPS proxy server.
Use https://github.com/stripe/smokescreen to provide a server for an
outgoing proxy, run under supervisor.  This will allow centralized
blocking of internal metadata IPs, localhost, and so forth, as well as
providing default request timeouts (10s by default).
2020-10-15 15:18:35 -07:00
Anders Kaseorg dfaea9df65 shfmt: Reformat shell scripts with shfmt.
https://github.com/mvdan/sh

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-15 15:16:00 -07:00
Alex Vandiver f61ac4a28d puppet: Move frontend monitoring into its own file.
This allows it to be pulled in for deploys like czo, which don't use
the full `zulip_ops::app_frontend`, but we wish to monitor.
2020-10-13 17:37:32 -07:00
Tim Abbott 7c2c82b190 nginx: Update nginx configuration for fhir/hl7 organization.
We should eventually add templating for the set of hosts here, but
it's worth merging this change to remove the deleted hostname and
replace it with the current one.
2020-10-13 16:50:26 -07:00
Anders Kaseorg 723d285e46 nginx: Redirect {www.,}zulipchat.com, www.zulip.com to zulip.com.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-13 16:49:23 -07:00
Alex Vandiver c8df9a150e puppet: Drop all log2zulip configuration.
Disabled on webservers in 047817b6b0, it has since lingered in
configuration, as well as running (to no effect) every minute on the
loadbalancer.

Remove the vestiges of its configuration.
2020-10-13 11:00:50 -07:00
Alex Vandiver b431b1b021 puppet: Remove misleading motd.
This banner shows on lb1, advertising itself as lb0.  There is no
compelling reason for a custom motd, especially one which needs to
be reconfigured for each host.
2020-10-13 11:00:36 -07:00
Alex Vandiver 45c9c3cc30 queue: Monitor user_activity queue, now that it has a consumer.
Since this was using repead individual get() calls previously, it
could not be monitored for having a consumer.  Add it in, by marking
it of queue type "consumer" (the default), and adding Nagios lines for
it.

Also adjust missedmessage_emails to be monitored; it stopped using
LoopQueueProcessingWorker in 5cec566cb9, but was never added back
into the set of monitored consumers.
2020-10-11 14:19:42 -07:00
Alex Vandiver 4fd7df4e8c puppet: Remove absent of check-apns-tokens.
This was marked as ensure absent in d02101a401, in v1.7.0 in 2017.
2020-09-29 18:17:08 -07:00
Alex Vandiver 872a349508 puppet: Remove absent of log2zulip.
This was marked as ensure absent in 047817b6b0, in v2.0.0 in 2018.
2020-09-29 18:17:08 -07:00
Alex Vandiver 0137772fdb puppet: Remove absent of calculate-first-visible-message-id.
This was marked as ensure absent in dc7d44a245, in v1.9.0 in 2018.
2020-09-29 18:17:08 -07:00
Alex Vandiver 966c8dc23d puppet: Remove absent of email-mirror cron job.
This was marked as ensure absent in 24f8492236, in v1.3.0 in 2014.
2020-09-29 18:17:08 -07:00
Alex Vandiver 430d3b8554 puppet: Remove absent of libapache2-mod-wsgi.
This was marked as ensure absent in 89b97e7480, in v1.7.0 in 2017,
though it did not take effect until 6e55aa2ce6, in v1.9.0 in 2018.
2020-09-29 18:17:08 -07:00
Alex Vandiver 12085552d5 puppet: Tidy indentation. 2020-09-29 17:44:44 -07:00
Alex Vandiver 57d88eedd8 puppet: Only install rabbitmq cron jobs via zulip_ops.
The rabbitmq cron jobs exist in order to call rabbitmqctl as root and
write the output to files that nagios can consume, since nagios is not
allowed to run rabbitmqctl.

In systems which do not have nagios configured, these every-minute
cron jobs add non-insignificant load, to no effect.  Move their
installation into `zulip_ops`.  In doing so, also combine the cron.d
files into a single file; this allows us to `ensure => absent` the old
filenames, removing them from existing systems.  Leave the resulting
combined cron.d file in `zulip`, since it is still of general utility
and note.
2020-09-29 17:44:44 -07:00
Alex Vandiver 79931051bd puppet: Permit outgoing mail from postfix.
The configuration change made in 1c17583ad5 only allowed delivery to
those specific Zulip addresses.  However, they also prevent the
mailserver from being used as an outgoing email relay from Zulip,
since all mail that passed through the mailserver (from any
originator) was required to have a `RCPT TO` that matched those
regexes.

Allow mail originating from `mynetworks` to have an arbitrary
addresses in `RCPT TO`.
2020-09-25 15:09:27 -07:00
Alex Vandiver 36ea307fbf puppet: Depend other changes on sharding.py validation.
Use the validation of the tornado sharding config that
`stage_updated_sharding` does, by depending on it.  This ensures that
we don't write out a supervisor or nginx config based on a
bad (e.g. non-sequential) list of tornado ports.
2020-09-25 10:52:40 -07:00
Alex Vandiver c0e240277b tornado: Remove fingerprinting, write out .tmp files always.
Fingerprinting the config is somewhat brittle -- it requires either
custom bootstrapping for old (fingerprint-less) configs, and may have
false-positives.

Since generating the config is lightweight, do so into the .tmp files,
and compare the output to the originals to determine if there are
changes to apply.

In order to both surface errors, as well as notify the user in case a
restart is necessary, we must run it twice.  The `onlyif`
functionality cannot show configuration errors to the user, only
determine if the command runs or not.  We thus run the command once,
judging errors as "interesting" enough to run the actual command,
whose failure will be verbose in Puppet and halt any steps that depend
on it.

Removing the `onlyif` would result in `stage_updated_sharding` showing
up in the output of every Puppet run, which obscures the important
messages it displays when an update to sharding is necessary.
Removing the `command` (e.g. making it an `echo`) would result in
removing the ability to report configuration errors.  We thus have no
choice but to run it twice; this is thankfully low-overhead.
2020-09-25 10:52:40 -07:00
Alex Vandiver 2a12fedcf1 tornado: Remove explicit tornado_processes setting; compute it.
We can compute the intended number of processes from the sharding
configuration.  In doing so, also validate that all of the ports are
contiguous.

This removes a discrepancy between `scripts/lib/sharding.py` and other
parts of the codebase about if merely having a `[tornado_sharding]`
section is sufficient to enable sharding.  Having behaviour which
changes merely based on if an empty section exists is surprising.

This does require that a (presumably empty) `9800` configuration line
exist, but making that default explicit is useful.

After this commit, configuring sharding can be done by adding to
`zulip.conf`:

```
[tornado_sharding]
9800 =              # default
9801 = other_realm
```

Followed by running `./scripts/refresh-sharding-and-restart`.
2020-09-18 15:13:40 -07:00
Alex Vandiver f638518722 tornado: Move default production port to 9800.
In development and test, we keep the Tornado port at 9993 and 9983,
respectively; this allows tests to run while a dev instance is
running.

In production, moving to port 9800 consistently removes an odd edge
case, when just one worker is on an entirely different port than if
two workers are used.
2020-09-18 15:13:40 -07:00
Alex Vandiver ff94254598 tornado: Log to files by port number.
Without an explicit port number, the `stdout_logfile` values for each
port are identical.  Supervisor apparently decides that it will
de-conflict this by appending an arbitrary number to the end:

```
/var/log/zulip/tornado.log
/var/log/zulip/tornado.log.1
/var/log/zulip/tornado.log.10
/var/log/zulip/tornado.log.2
/var/log/zulip/tornado.log.3
/var/log/zulip/tornado.log.7
/var/log/zulip/tornado.log.8
/var/log/zulip/tornado.log.9
```

This is quite confusing, since most other files in `/var/log/zulip/`
use `.1` to mean logrotate was used.  Also note that these are not all
sequential -- 4, 5, and 6 are mysteriously missing, though they were
used in previous restarts.  This can make it extremely hard to debug
logs from a particular Tornado shard.

Give the logfiles a consistent name, and set them up to logrotate.
2020-09-14 22:17:51 -07:00
Alex Vandiver efdaa58c24 supervisor: Use more specific process_name than "port-9800".
Making this include "zulip-tornado" makes it clearer in supervisor
logs.  Without this, one only sees:
```
2020-09-14 03:43:13,788 INFO waiting for port-9807 to stop
2020-09-14 03:43:14,466 INFO stopped: port-9807 (exit status 1)
2020-09-14 03:43:14,469 INFO spawned: 'port-9807' with pid 24289
2020-09-14 03:43:15,470 INFO success: port-9807 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
```
2020-09-14 22:17:51 -07:00
Alex Vandiver e9d0bdea65 puppet: Coerce uwsgi_listen_backlog_limit into an int before doing math. 2020-09-14 21:22:13 -07:00
Alex Vandiver 8adf530400 puppet: Generate sharding in puppet, then refresh-sharding-and-restart.
This supports running puppet to pick up new sharding changes, which
will warn of the need to finalize them via
`refresh-sharding-and-restart`, or simply running that directly.
2020-09-14 16:27:15 -07:00
Alex Vandiver 0de356c2df puppet: Move generation of tornado nginx upstreams into tornado_sharding.
This puts the creation of the upstreams referenced by
`nginx_sharding.conf` adjacent to their use.
2020-09-14 16:27:15 -07:00
Alex Vandiver bf029d99f1 sharding: Also mark sharding.json 644 for consistency.
There is no reason to limit this to 640; mark it 644 for consistency
with the other file.
2020-09-14 16:27:15 -07:00
Alex Vandiver 1c17583ad5 puppet: Restrict postfix incoming addresses to postmaster and zulip.
This removes the possibility of local user enumeration via RCPT TO.
2020-09-11 18:49:22 -07:00
Alex Vandiver 482c964dd3 puppet: Logrotate for webhook exceptions. 2020-09-10 17:47:21 -07:00
Alex Vandiver e38051736d puppet: Wrap and sort logrotate config. 2020-09-10 17:47:21 -07:00
Anders Kaseorg 75c59a820d python: Convert subprocess.Popen.communicate to run or check_output.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-09-03 17:42:35 -07:00
Anders Kaseorg fbfd4b399d python: Elide action="store" for argparse arguments.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-09-03 16:17:14 -07:00
Anders Kaseorg 1f2ac1962f python: Elide default=None for argparse arguments.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-09-03 16:17:14 -07:00
Anders Kaseorg d751e0cece puppet: Don’t install netcat.
It’s been unused since commit 0af22dad18
(#13239).

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-09-03 10:33:47 -07:00
Anders Kaseorg ab120a03bc python: Replace unnecessary intermediate lists with generators.
Mostly suggested by the flake8-comprehension plugin.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-09-02 11:15:41 -07:00
Anders Kaseorg a5dbab8fb0 python: Remove redundant dest for argparse arguments.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-09-02 11:04:10 -07:00
Anders Kaseorg dbdf67301b memcached: Switch from pylibmc to python-binary-memcached.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-08-06 12:51:14 -07:00
Casper Kvan Clausen ed7a6d5e4d puppet: Support nginx_listen_port with http_only 2020-08-03 12:58:12 -07:00
Alex Vandiver cd530d627b uwsgi: Stop generating IOError and SIGPIPE on client close.
Clients that close their socket to nginx suddenly also cause nginx to close
its connection to uwsgi.  When uwsgi finishes computing the response,
it thus tries to write to a closed socket, and generates either
IOError or SIGPIPE failures.

Since these are caused by the _client_ closing the connection
suddenly, they are not actionable by the server.  At particularly high
volumes, this could represent some sort of server-side failure;
however, this is better detected by examining status codes at the
loadbalancer.  nginx uses the error code 499 for this occurrence:
https://httpstatuses.com/499

Stop uwsgi from generating this family of exception entirely, using
configuration for uwsgi[1]; it documents these errors as "(annoying),"
hinting at their general utility."

[1] https://uwsgi-docs.readthedocs.io/en/latest/Options.html#ignore-sigpipe
2020-07-31 10:40:09 -07:00
Alex Vandiver ceb909dbc5 puppet: Increase backlogged socket count based on uwsgi backlog.
Increasing the uwsgi listen backlog is intended to allow it to handle
higher connection rates during server restart, when many clients may
be trying to connect.  The kernel, in turn, needs to have a
proportionally increased somaxconn soas to not refuse the connection.

Set somaxconn to 2x the uwsgi backlog, but no lower than the
default (128).
2020-07-28 21:16:26 -07:00
Alex Vandiver 38d01cd4db puppet: Generalize install-wal-g to be arbitrary tarballs. 2020-07-24 17:24:57 -07:00
Tim Abbott 5a1243db3c puppet: Use correct scope for zulip_ops::munin_plugin. 2020-07-15 21:49:45 -07:00
Alex Vandiver 48c3c33d10 puppet: Fully-qualify the munin-plugin name 2020-07-14 17:58:51 -07:00
Alex Vandiver c68333040b
puppet: Revert PostgreSQL setting of recovery_target_timeline.
Prior to PostgreSQL 12, the `recovery_target_timeline` setting is only
valid in a `recovery.conf` file, as that file has its own
configuration parser.  As such, including it in `postgresql.conf`
results in an error, and PostgreSQL will fail to start.

Remove the setting, reverting bff3b540b1.  This fixes PostgreSQL 9.5,
9.6, 10, and 11; while the setting is not an error in a PostgreSQL 12
configuration file, it is unnecessary since `latest` is the default.
2020-07-14 16:28:20 -07:00
Alex Vandiver 31d80a77d4 puppet: Update nagios check_postgres_replication_lag to be on DB hosts
7d4a370a57 attempted to move the replication check to on the
PostgreSQL hosts.  While it updated the _check_ to assume it was
running and talking to a local PostgreSQL instance, the configuration
and installation for the check were not updated.  As such, the check
ran on the nagios host for each DB host, and produced no output.

Start distributing the check to all apopdb hosts, and configure nagios
to use the SSH tunnel to get there.
2020-07-14 16:27:18 -07:00
Alex Vandiver 2174db27db puppet: Put the dependencies on pg_backup_and_purge itself, and ensure them. 2020-07-14 00:40:25 -07:00
Alex Vandiver 6c27f07c1d puppet: Move PostgreSQL backups to their own class.
wal-g was used in `puppet/zulip` by env-wal-g, but only installed in
`puppet/zulip_ops`.

Merge all of the dependencies of doing backups using wal-g (wal-g
installation, the pg_backup_and_purge job, the nagios plugin that
verifies it happens) into a common base class in `puppet/zulip`, since
it is generally useful.
2020-07-14 00:40:25 -07:00
Anders Kaseorg 15483c09cb puppet: Add missing trailing commas.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-07-13 15:36:06 -07:00
Alex Vandiver 3691a94efe puppet: Configure munin and nagios under apache with puppet.
This swaps in the actually-in-use munin configuiration file;
otherwise, it is an implementation of the configuration as it exists
on the machine.
2020-07-13 13:23:11 -07:00
Alex Vandiver 4e42164b4a munin: Add plugins to prod hosts. 2020-07-13 13:23:11 -07:00
Alex Vandiver 2a14212b27 munin: Add a helper resource definition for munin plugins. 2020-07-13 12:49:28 -07:00
Alex Vandiver 7c7b5fcd6f munin: Deal with spaces in the channel names. 2020-07-13 12:49:28 -07:00
Alex Vandiver eda2c4b8e2 puppet: Split munin-node from munin-server.
No plugins are installed inside the /usr/local/munin/lib this creates
in munin-node, nor are they symlinked into /etc/munin/plugins, so
non-default plugins are added by this.
2020-07-13 12:49:28 -07:00
Alex Vandiver ddc7bb5a45 munin: Fix the path to check_send_receive_time. 2020-07-13 12:49:28 -07:00
Alex Vandiver 8be544e7eb munin: Rename monitoring plugin to use zulip name, not humbug. 2020-07-13 12:49:28 -07:00
Alex Vandiver 1b3560af94 nagios: Stop assuming /api is where zulip client is.
The api/ directory was removed in f9ba3cb60c; as that commit notes, we
use the python-zulip-api module for that, added in 938597c5da.
2020-07-13 12:49:28 -07:00
Mateusz Mandera 57d3ef42b8 puppet: Don't run thumbor services in production.
Fixes #15649.
Currently, no production services use thumbor; so, it makes sense
to not run them in production systems.
2020-07-10 14:22:17 -07:00
Alex Vandiver f0f29584aa puppet: Add an arity count ("at least two") to zulipconf function. 2020-07-10 00:14:09 -07:00
Alex Vandiver 8cff27f67d puppet: Pull hosts from zulip.conf, not hardcoded list.
The one complexity is that hosts_fullstack are treated differently, as
they are not currently found in the manual `hosts` list, and as such
do not get munin monitoring.
2020-07-10 00:14:09 -07:00
Alex Vandiver 24383a5082 puppet: Rename hosts_domain so hosts_prefix can be grepped for. 2020-07-10 00:14:09 -07:00
Alex Vandiver a4e7c7a27e nagios: Remove check_memcached.
check_memcached does not support memcached authentication even in its
latest release (it’s in a TODO item comment, and that’s it), and was
never particularly useful.
2020-07-10 00:12:48 -07:00
Anders Kaseorg ebf7f4d0f6 zthumbor: Rename thumbor.conf to thumbor_settings.py.
So we can apply all our lint checks to it.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-07-06 18:44:58 -07:00
Anders Kaseorg 9900298315 zthumbor: Remove Python 2 residue.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-07-06 18:44:58 -07:00
Alex Vandiver 17002f2a0e puppet: Allow passing an alternate config path to zulip-puppet-apply.
When temporary configuration changes are desired, this lets one set up
an alternate `zulip.conf` to apply while leaving the true one in place.
2020-07-06 18:30:16 -07:00
Alex Vandiver 64b44a12f5 puppet: Add an exec rule to reload the whole supervisor config.
When supervisor is first installed, it is started automatically, and
creates the socket, owned by root.  Subsequent reconfiguration in
puppet only calls `reread + update`, which is insufficient to apply
the `chown = zulip:zulip` line in `supervisord.conf`, leaving the
socket owned by `root` and the last part of the installation unable to
restart `supervisor` services as the `zulip` user.  The `chown` line
in `scripts/lib/install` exists to paper over this.

Add a separate exec target for changes to `supervisord.conf` itself,
which restarts the full service.  This leaves the default `restart`
action on the service for the lightweight `reread + update` action,
which is more common.

We use `systemctl` only on redhat-esque builds, because CI runs
Ubuntu, but init is not systemd in that context.  `systemctl reload`
is sufficient to re-apply the socket ownership, but a full `restart`
and not `reload` is necessary under `/etc/init.d/supervisor`.
2020-07-01 10:40:54 -07:00
Alex Vandiver dd91f8edba puppet: Move supervisor start command into zulip::common.
Move this command alongside the rest of the distro-dependent
supervisor paths.
2020-07-01 10:40:53 -07:00
Alex Vandiver a5d63cfedf wal-g: Update pg_backup_and_purge for wal-g format.
wal-g has a slihghtly different format than wal-e in its `backup-list`
output; it only contains three columns:
 - `name`
 - `last_modified`,
 - `wal_segment_backup_start`

..rather than wal-e's plethora, most of which were blank:
 - `name`
 - `last_modified`
 - `expanded_size_bytes`
 - `wal_segment_backup_start`
 - `wal_segment_offset_backup_start`
 - `wal_segment_backup_stop`
 - `wal_segment_offset_backup_stop`

Remove one argument from the split.
2020-06-29 17:17:26 -07:00
Alex Vandiver a21a086f5c puppet: nagios-plugins-basic is replaced by monitoring-plugins-basic.
In Bionic, nagios-plugins-basic is a transitional package which
depends on monitoring-plugins-basic.  In Focal, it is a virtual
package, which means that every time puppet runs, it tries to
re-install the nagios-plugins-basic package.

Switch all instances to referring to `$zulip::common::nagios_plugins`,
and repoint that to monitoring-plugins-basic.
2020-06-29 14:58:01 -07:00
Alex Vandiver 6fdcb4aa17 puppet: Move supervisor conf file path into zulip::common.
Move this config file alongside the rest of the distro-dependent
paths.
2020-06-29 13:41:05 -07:00
Alex Vandiver 93401448b9 puppet: Explain value of reload && update trick for supervisor.
While the stock reload works just fine, it causes too much disruption.
2020-06-29 13:39:09 -07:00
Alex Vandiver d2de5aced8 puppet: Remove unnecessary supervisor service name variable. 2020-06-29 13:39:09 -07:00
Alex Vandiver 73805f8279 puppet: Stop removing file that contains only comments.
In modern PostgreSQL, this file, provided by `postgresql-common`, has
no non-comment, non-blank lines.  There's hence no reason to remove
it.
2020-06-29 13:37:42 -07:00
Alex Vandiver 6e3a424921 puppet: Install the latest postgresql-client on frontend hosts.
Frontend hosts in multiple-host configurations (including docker
hosts) need a `psql` binary installed.  ca9d27175b switched to not
setting `postgresql.version` in `zulip.conf`, which in turn means that
`$zulip::base::postgres_version` is unset.  This, in turn, led to the
frontend hosts installing `postgresql-client-`, whose trailing dash
causes apt to _uninstall_ that package.

Unconditionally install `postgresql-client` with no explicit version
attached.  This is a metapackage which depends on the latest client
package, which currently means it will install `postgresql-client-12`.

On single-host installs which have configured `postgresql.version` in
`zulip.conf` to be a lower version, this will result in
`postgresql-client-12` existing alongside another
version (e.g. `postgresql-client-10`); `psql` will give the most
recent.  This is acceptable because the semantic meaning of the
postgresql version in `zulip.conf` is about the database engine
itself, not the command-line client.
2020-06-29 13:37:16 -07:00
Alex Vandiver 2c36bb19b2 puppet: Pull out `unzip` package which is identical in both cases. 2020-06-29 13:37:16 -07:00
Alex Vandiver 876ee4a8ed installer: Remove code specific to stretch or xenial.
Support for Xenial and Stretch was removed (5154ddafca, 0f4b1076ad,
8944e0ad53, 79acd5ae40, 1219a2e854), but not all codepaths were
updated to remove their conditionals on it.

Remove all code predicated on Xenial or Stretch.  debathena support
was migrated to Bionic, since that appears to be the current state of
existing debathena servers.
2020-06-24 12:57:38 -07:00
Anders Kaseorg a9e59b6bd3 memcached: Change the default MEMCACHED_USERNAME to zulip@localhost.
This prevents memcached from automatically appending the hostname to
the username, which was a source of problems on servers where the
hostname was changed.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-19 21:22:30 -07:00
Alex Vandiver 7250d41bf7 puppet: Fix the path to install-wall-g 2020-06-17 15:23:18 -07:00
Alex Vandiver 03bffd3938 upgrade-zulip: Pin the postgres version to the OS default.
We would prefer to use the postgres packages from Postgres themselves,
if available.  However, this requires ensures that, for existing
installs, we preserve the same version of postgres as their base
distribution installed.

Move the version-determination logic from being computed at puppet
interpolation time, to being computed at install time and pinned into
zulip.conf.
2020-06-16 17:05:46 -07:00
Tim Abbott 26396c5e25 puppet: Fix exceptions with multiple certbot declarations.
Since 9e8f1aacb3, zulip_ops machines
might have two Package declarations for `certbot`, which doesn't work
in puppet.

The fix is, as usual, to use our `zulip::safepackage` wrapper instead.
2020-06-15 18:21:33 -07:00
Alex Vandiver bff3b540b1 puppet: Postgres replication should always switch to latest timeline.
Omission of this setting makes resuming after a primary switchover
difficult-to-impossible.  It is the default in PostgreSQL 12.
2020-06-15 16:18:07 -07:00
Alex Vandiver f8fc3a16eb puppet: Use "primary" / "replica" consistently in comments.
The style guide for Zulip is to always use "primary" and "replica"
when describing database replication.  Adjust a few comments under
`puppet/` that do not adhere to this.

Unfortunately, some references still remain to the insensitive and
inaccurate "master" / "slave" terminology.  However, these are only in
files which we are attempting to preserve as close to the upstream
versions they are derived from (e.g. postgresql.conf,
postfix/master.cf).
2020-06-15 16:18:07 -07:00
Alex Vandiver 5f433d6eeb puppet: Remove vestigial check_postgres.pl.
65774e1c4f switched from using the bundled check_postgres.pl to using
the version from packages; the file itself remained, however.

Remove it, and clean up references to it.

Fixes #15389.
2020-06-15 16:18:07 -07:00
Alex Vandiver 7d4a370a57 puppet: Move monitoring of pg replication to the pg hosts.
Instead of SSH'ing around to them, run directly on the database hosts.
This means that the replicas do not know how many bytes behind they
are in _receiving_ the wall logs; thus, the monitoring also extends to
the primary database, which knows that information for each replica.
This also allows for detecting when there are too few active replicas.
2020-06-15 16:18:07 -07:00
Anders Kaseorg 5dc9b55c43 python: Manually convert more percent-formatting to f-strings.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-14 23:27:22 -07:00
Anders Kaseorg 74c17bf94a python: Convert more percent formatting to Python 3.6 f-strings.
Generated by pyupgrade --py36-plus.

Now including %d, %i, %u, and multi-line strings.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-14 23:27:22 -07:00
Anders Kaseorg 1ed2d9b4a0 logging: Use logging.exception and exc_info for unexpected exceptions.
logging.exception() and logging.debug(exc_info=True),
etc. automatically include a traceback.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-14 23:27:22 -07:00
Tim Abbott 80589099d8 puppet: Fix typo in logic for whether to install certbot.
Fixes #15372.
2020-06-14 16:04:39 -07:00
rht 89af2f381d puppet: Link postgres dict symlinks to hunspell files on CentOS.
This is a temporary measure until we can find the directory of postgresql dicts
on CentOS.
2020-06-13 17:53:38 -07:00
rht 36a5ca5015 puppet: Add cyrus-sasl to memcached_packages on RedHat.
This is to mirror the sasl2-bin package on Debian.
2020-06-13 17:49:51 -07:00
rht e776d2d159 puppet: Abstract out owner:group of memcached-sasldb2. 2020-06-13 17:49:51 -07:00
Anders Kaseorg 91a86c24f5 python: Replace None defaults with empty collections where appropriate.
Use read-only types (List ↦ Sequence, Dict ↦ Mapping, Set ↦
AbstractSet) to guard against accidental mutation of the default
value.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-13 15:31:27 -07:00
Alex Vandiver 97b9308781 puppet: Merge multiple postgres roles in `zulip_ops`.
All differences between the primary and replica roles having been
merged, fold the `postgres_common`, `postgres_master`, and
`postgres_slave` roles into just `postgres_appdb`.
2020-06-12 14:57:46 -07:00
Alex Vandiver 55bd31721d puppet: Remove custom `vm.dirty_ratio` and `vm.dirty_background_ratio`.
These values differed between the primary and secondary database
hosts, for unclear reasons.  The differences date back to their
introduction in 387f63deaa.  As the comment in the replica
confguration notes, settings of `vm.dirty_ratio = 10` and
`vm.dirty_background_ratio = 5` matched the kernel defaults for
"newer" kernels; however, kernel 2.6.30 bumped those to 20 and 10,
respectively[1], as a fix for underlying logic now being more correct.

Remove these overrides; they should at very least be consistent across
roles, and the previous values look to be an attempt to tune for a
very much older version of the Linux kernel, which was using an
different, buggier, algorithm under the hood.

[1] 1b5e62b42b
2020-06-12 14:57:46 -07:00
Alex Vandiver f39816e768 puppet: Stop distributing recovery.conf file.
This file controls streaming replication, and recovery using wal-g on
the secondary.  The `primary_conninfo` data needs to change on short
notice when database failover happens, in a way that is not suitable
for being controlled by puppet.

PostgreSQL 12, in fact, removes the use of the `recovery.conf` file[1];
the `primary_conninfo` and `restore_command` information goes into the
main `postgresql.conf` file, and the standby status is controlled by
the presence of absence of an empty `standby.signal` file.

Remove the puppet control of the `recovery.conf` file.

[1] https://pgstef.github.io/2018/11/26/postgresql12_preview_recovery_conf_disappears.html
2020-06-12 14:57:46 -07:00
Alex Vandiver 316498a169 puppet: Remove unnecessary nagios authentication setup.
Since the nagios authentication is stored _in the database_, it is
unnecessary to run if the database is simply a replica of the
production database.  The only case in which this statement would have
an effect is if the postgres node contains a _different_ (or empty)
database, which `setup_disks` now effectively prevents.

Remove the unnecessary step.
2020-06-11 21:01:49 -07:00
Alex Vandiver 0774f54c1b puppet: Move to `setup_disks` to postgres_common.
The tooling should now be run no matter if the node is a primary or
replica.
2020-06-11 21:01:49 -07:00
Alex Vandiver 6f6a0e890a puppet: Run setup_disks based on symlink; remove mdadm dependency.
481613a344 updated the `setup_disks` script to no longer reference
`mdadm`, since we no longer set up RAID on servers.

Update the puppet that would call it to remove the `mdadm` dependency,
and run only if the state is not what it produces -- namely, a symlink
for `/var/lib/postgresql`, which must point to an existent
`/srv/postgresql` directory.
2020-06-11 21:01:49 -07:00
Alex Vandiver 1dc2de5026 puppet: Update setup-disks to be idempotent.
The end state it produces is _either_:

 - `/srv/postgresql` already existed, which was symlinked into
   `/var/lib/postgresql`; postgres is left untouched.  This is the
   situation if `setup_disks` is run on the database primary, or a
   replica which was correctly configured.

 - An empty `/srv/postgresql` now exists, symlinked into
   `/var/lib/postgresql`, and postgres is stopped.  This is the
   situation if `puppet` was just run on a new host, or a
   previously-configured host was rebooted (clearing the temporary
   disk in `/dev/nvme0`)

In the latter case, where `/srv/postgresql` is now empty, any previous
contents of `/var/lib/postgresql` are placed under `/root`,
timestamped for uniqueness.

In either case, the tool should now be idempotent.
2020-06-11 21:01:49 -07:00
Alex Vandiver 8373f5f4b9 puppet: Make parent directories of postgresql.conf
This fixes errors when provisioning a new system (or version of
postgres) when the configuration file cannot be written because its
parent directories do not exist.

Files inherently depend on their containing directories, so no
explicit dependencies are necessary.
2020-06-11 20:56:55 -07:00
Alex Vandiver 9fd7a026ad puppet: Pull postgres data directory into postgres_appdb_base.
The `pg_datadir` variable was only used, and accurate, for CentOS.
Pull it out into `postgres_app_base`, broaden it to being accurate on
Debian-based systems as well, and use it consistently in the
templates.
2020-06-11 20:56:55 -07:00
Alex Vandiver 16c4cea951 puppet: Pull postgres config directory into postgres_appdb_base.
As the previous commit, this is currently only used in tuning, but is
a property of the whole postgres configuration; move it there, as just
the directory, not the file.

Use this directory consistently in the erb templates.  Since we
produce a `pg_hba.conf`, it makes sense that we point to the path that we
know that we explicitly wrote to, for instance.
2020-06-11 20:56:55 -07:00
Alex Vandiver 2a7373b602 puppet: Pull postgres restart config into postgres_appdb_base.
While it is only currently used in the tuning configuration, it is a
property of the base configuration, and fits more clearly into the
case block there.
2020-06-11 20:56:55 -07:00
Anders Kaseorg 365fe0b3d5 python: Sort imports with isort.
Fixes #2665.

Regenerated by tabbott with `lint --fix` after a rebase and change in
parameters.

Note from tabbott: In a few cases, this converts technical debt in the
form of unsorted imports into different technical debt in the form of
our largest files having very long, ugly import sequences at the
start.  I expect this change will increase pressure for us to split
those files, which isn't a bad thing.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-11 16:45:32 -07:00
Anders Kaseorg 69730a78cc python: Use trailing commas consistently.
Automatically generated by the following script, based on the output
of lint with flake8-comma:

import re
import sys

last_filename = None
last_row = None
lines = []

for msg in sys.stdin:
    m = re.match(
        r"\x1b\[35mflake8    \|\x1b\[0m \x1b\[1;31m(.+):(\d+):(\d+): (\w+)", msg
    )
    if m:
        filename, row_str, col_str, err = m.groups()
        row, col = int(row_str), int(col_str)

        if filename == last_filename:
            assert last_row != row
        else:
            if last_filename is not None:
                with open(last_filename, "w") as f:
                    f.writelines(lines)

            with open(filename) as f:
                lines = f.readlines()
            last_filename = filename
        last_row = row

        line = lines[row - 1]
        if err in ["C812", "C815"]:
            lines[row - 1] = line[: col - 1] + "," + line[col - 1 :]
        elif err in ["C819"]:
            assert line[col - 2] == ","
            lines[row - 1] = line[: col - 2] + line[col - 1 :].lstrip(" ")

if last_filename is not None:
    with open(last_filename, "w") as f:
        f.writelines(lines)

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-06-11 16:04:12 -07:00
Alex Vandiver b114eb2f10 puppet: Rename env-wal-e to env-wal-g.
It runs wal-g now, not wal-e; make its name respect that.
2020-06-11 15:52:43 -07:00
Alex Vandiver 4fe0444108 puppet: Install wal-g, not wal-e. 2020-06-11 15:52:43 -07:00
Alex Vandiver 39d6185ce7 puppet: Remove python-dateutil requirement from pg_backup_and_purge.
1f565a9f41 removed the `package` lines which install
`python-dateutil`, but not the line in `puppet_ops` that reference it;
as such, Puppet manifests in puppet_ops fail to compile.

Remove the stale reference to `python-dateutil`, which is unnecessary
since the code is python3, not python2.
2020-06-11 14:28:55 -07:00
Anders Kaseorg ca4357fd64 python: Use standard NoReturn (Python ≥ 3.6).
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-11 12:56:52 -07:00
Mateusz Mandera fbc96d56d5 sharding: Fix permissions on the nginx_sharding.conf file.
The zulip user needs to be able to read the file, when running the
backup tool.
We put root:root as owner on other nginx config files, so it's probably
correct to keep the ownership as it is, and set the mode to 0644.
2020-06-11 12:56:06 -07:00
Anders Kaseorg 67e7a3631d python: Convert percent formatting to Python 3.6 f-strings.
Generated by pyupgrade --py36-plus.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-10 15:02:09 -07:00
arpit551 03d563ce0f postgres: Changed max_connections in postgres 12 config template.
Value of max_connections is now 1000 like in other postgres versions
template.
2020-06-08 21:59:57 -07:00
arpit551 9e8f1aacb3 certbot: Switch to use certbot from apt.
certbot-auto doesn’t work on Ubuntu 20.04, and won’t be updated; we
migrate to instead using the certbot package shipped with the OS
instead. Also made sure that sure certbot gets installed when running
zulip-puppet-apply, to handle existing systems.
2020-06-08 21:59:29 -07:00
arpit551 7e75a7e336 postgres: Fix syntax error in postgres 12 config.
<% used as example in postgres 12 config is being confused with erb syntax
so added extra % as <%% means literal <%.
2020-06-08 21:57:54 -07:00
arpit551 7d11be5ca5 puppet: Add Zulip specific postgres configuration for 12.
Based on the work done in a03e478.
2020-06-08 21:57:54 -07:00
arpit551 4e52f1bc53 puppet: Commit an upstream version of postgres 12 config.
In preparation for adding production support for Ubuntu Focal.
2020-06-08 21:57:54 -07:00
Tim Abbott 71078adc50 docs: Update URLs to use https://zulip.com.
We're migrating to using the cleaner zulip.com domain, which involves
changing all of our links from ReadTheDocs and other places to point
to the cleaner URL.
2020-06-08 18:10:45 -07:00
Anders Kaseorg 1f565a9f41 timezone: Use standard library datetime.timezone.utc consistently.
datetime.timezone is available in Python ≥ 3.2.  This also lets us
remove a pytz dependency from the PostgreSQL scripts.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-05 09:34:17 -07:00
Alex Vandiver 8b1d49dbc7 puppet: Rename "wiki" realm to "monitoring".
This is vestigial.

It requires manually altering the `htdigest` file (not stored in this
repo) to change the digest realm from `wiki` to `monitoring`, and will
re-prompt users for their passwords if the browsers currently store
them.
2020-05-30 12:26:21 -07:00
Alex Vandiver b33aa8da7f postgresql: Update setup-disks to use `service postgresql`.
Using `service postgresql` makes it no longer linked to the specific
version/cluster that is on the host.
2020-05-30 12:14:24 -07:00
Alex Vandiver 4e370cda75 postgresql: Update setup-disks to drop /mnt disabling.
Hosts do not start out with a `/mnt`; there is no need to disable it.
2020-05-30 12:14:24 -07:00
Alex Vandiver a7d85b7e69 postgresql: Update setup-disks to not move /tmp.
Drop the change to move `/tmp` onto the local disk.  Doing this move
confuses `resolved` until there is a restart, and has no clear
benefits.  The change came in during bf82fadc95, but does not describe
the reasoning; it is particularly puzzling, since postgresql stores
its temporary files under `$PGDATA/base/pgsql_tmp`.
2020-05-30 12:14:24 -07:00
Alex Vandiver 481613a344 postgresql: Update setup-disks to not use RAID.
Do not RAID the disks together.  This was previously done when they
were spinning media, for reliability; running them on an SSD obviates
this sufficiently.  This means that updating the initramfs is also not
necessary.
2020-05-30 12:14:24 -07:00
Alex Vandiver b537563bc1 postgresql: Set the current primary host. 2020-05-30 12:14:24 -07:00
Alex Vandiver ad2918ea51 puppet: Remove `postgres_other` nagios hostgroup.
This no longer has any rules specific to it.  We leave the `postgres`
munin group (which now only contains `postgres_appdb`) as
future-proofing, and so that `postgres_appdb` matches to the puppet
manifest of the same name.
2020-05-28 17:24:35 -07:00
Alex Vandiver 2c73fbdcb6 puppet: Remove munin monitoring for no-longer-used "postgres_other".
The `wiki` and `trac` products are no longer used.
2020-05-28 17:24:35 -07:00
Tim Abbott 0b93e09e72 puppet: Add nginx configuration for blog.zulip.org move. 2020-05-26 14:47:05 -07:00
Anders Kaseorg f5b33f9398 python: Further pyupgrade changes.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-05-26 11:43:40 -07:00
Anders Kaseorg 333f7d16c9 logging: Pass more format arguments to logging.
Commit bdc365d0fe (#14852) missed this
because of https://github.com/returntocorp/semgrep/issues/831.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-05-26 11:42:23 -07:00
Anders Kaseorg 824d97987b process_fts_updates: Use cursor.execute correctly.
Commit b501d04f6a (#14841) missed this
because of https://github.com/returntocorp/semgrep/issues/831.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-05-26 11:42:23 -07:00
arpit551 439f0d3004 install: Ad production support for Zulip on Ubuntu Focal.
Install script now runs on Focal.  Python 2 is now installed via the
`python2` package in Focal.
2020-05-25 16:58:42 -07:00
Tim Abbott 220620e7cf sharding: Add basic sharding configuration for Tornado.
This allows straight-forward configuration of realm-based Tornado
sharding through simply editing /etc/zulip/zulip.conf to configure
shards and running scripts/refresh-sharding-and-restart.

Co-Author-By: Mateusz Mandera <mateusz.mandera@zulip.com>
2020-05-20 13:47:20 -07:00
Tim Abbott cdd3b7efbc tornado: Configure upstreams for TORNADO_PROCESSES. 2020-05-20 13:43:48 -07:00
Tim Abbott c3d3324295 puppet: Add link to the sources for Zephyr patches. 2020-05-19 20:54:11 -07:00
Tim Abbott a35e71ebbc puppet: Update package name for boto-on-python3.
The python3-boto3 package is the maintained fork that supports Python
3; it was renamed in Ubuntu Bionic from the original Ubuntu Xenial name.
2020-05-19 20:25:11 -07:00
Tim Abbott 1c28770810 puppet: Fix apt_repo_debathena setup_file path.
There was a typo introduced here when scripts_path was added.
2020-05-19 20:21:30 -07:00
Tim Abbott c43b3d95e2 puppet: Switch env-wal-e to use wal-g rather than wal-e.
wal-g is the modern reimplementation of wal-e that supports current
postgres.  It requires a bit of extra configuration to specify the AWS
region.
2020-05-15 16:45:36 -07:00
Anders Kaseorg fcca4a38b6 puppet: Work around memcached SASL configuration path bug.
memcached 1.5.22 in Ubuntu 20.04 has a bug where it looks for its SASL
configuration at /etc/sasl2/memcached.conf/memcached.conf instead of
/etc/sasl2/memcached.conf.

https://bugs.launchpad.net/ubuntu/+source/memcached/+bug/1878721

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-05-14 23:25:24 -07:00
Tim Abbott b3c5f2c13e puppet: Remove check_postgres_replication_lag hostname hardcoding.
Since this runs on the Nagios server, which already has the relevant
hostnames defined in zulip.conf, we can just read it from there.
2020-05-11 23:42:36 -07:00
Tim Abbott 225bbf3633 puppet: Update check_postgres_replication_lag for postgres 10.
These functions were renamed in postgres 10.
2020-05-11 15:59:23 -07:00
Tim Abbott d8ea649869 puppet: Cast tornado_processes to Integer.
This is the latest mechanism in puppet for turning a string into an
integer.

We update an adjacent comment while we're at it.
2020-05-11 00:54:48 -07:00
Tim Abbott 6319c181eb puppet: Use actual name for the bind9-host package.
Using the `host` virtual package confused Puppet into reporting it was
doing work every time one did a puppet run, resulting in unnecessarily
spammy output.
2020-05-11 00:51:53 -07:00
Mateusz Mandera dd40649e04 queue_processors: Remove the slow_queries queue.
While this functionality to post slow queries to a Zulip stream was
very useful in the early days of Zulip, when there were only a few
hundred accounts, it's long since been useless since (1) the total
request volume on larger Zulip servers run by Zulip developers, and
(2) other server operators don't want real-time notifications of slow
backend queries.  The right structure for this is just a log file.

We get rid of the queue and replace it with a "zulip.slow_queries"
logger, which will still log to /var/log/zulip/slow_queries.log for
ease of access to this information and propagate to the other logging
handlers.  Reducing the amount of queues is good for lowering zulip's
memory footprint and restart performance, since we run at least one
dedicated queue worker process for each one in most configurations.
2020-05-11 00:45:13 -07:00
Tim Abbott 21a04e2dbc puppet: Use nice to deprioritize various processes.
Our priority hierarchy is:
(1) Tornado and base services like memcached, redis, etc.
(2) Django and message sender queue workers.
(3) Everything else.

Ideally, we'd have something a bit more fine-grained (e.g. some queue
workers are potentially in the sending path, while others aren't), but
this should have a big impact on ensuring Tornado gets the resources
it needs during load spikes.

I think this has a good chance of causing some load spikes that would
previously have resulted in a user-facing delivery delays no longer
having any significant user-facing impact.
2020-05-10 23:28:25 -07:00
shubhamgupta2956 9cd8644c7c uploads: Add support for ".jpe" file extension.
Currently when the user uploads files with ".jpe" file extension, the
markdown is converted to link but the image is not embedded.

This commit adds the support for ".jpe" file extension.

Fixes #14863
2020-05-10 22:55:52 -07:00
Anders Kaseorg 8cdf2801f7 python: Convert more variable type annotations to Python 3.6 style.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-05-08 16:42:43 -07:00
Anders Kaseorg 708c6f4f11 puppet: Finally vanquish the cursed integer conversion conditional.
We no longer support Puppet 3.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-05-08 16:42:43 -07:00
Tim Abbott 50d8d61d3c puppet: Remove unnecssary/broken ;.
This breaks the Xenial build, which we're removing soon, but it's
unnecessary in any case.
2020-05-07 16:23:37 -07:00
Tim Abbott 03991d098a puppet: Add optional postgres version override.
This makes it convenient to run an alternative postgres version.
2020-05-07 09:33:24 -07:00
Mateusz Mandera 4643e48f60 retention: Add a daily cron job.
This will run archive_messages management command at 6am every day, 1
hour after soft_deactivate_users (which runs at 5am).
2020-05-05 10:11:38 -07:00
Tim Abbott 4034f6f99e nagios: Fix check_postgres_replication_lag.
This expects to be run outside a virtualenv and thus without
typing_extensions available.
2020-05-03 00:14:54 -07:00
Tim Abbott 4f3976b917 process_fts_updates: Clean up logging output.
This saves a couple lines of spammy output in the run-dev.py startup
experience, and will be better output in production as well.
2020-05-01 11:51:20 -07:00
Anders Kaseorg c0ffa71fa9 nginx: Replace unanchored regexes in location directives.
We could anchor the regexes, but there’s no need for the power (and
responsibility) of regexes here.

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-04-24 16:58:19 -07:00
Anders Kaseorg 5e01a0ae8b zulip-ec2-configure-interfaces: Convert function type annotations.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-04-24 13:06:54 -07:00
Anders Kaseorg f8339f019d python: Convert assignment type annotations to Python 3.6 style.
Commit split by tabbott; this has changes to scripts/, tools/, and
puppet/.

scripts/lib/hash_reqs.py, scripts/lib/setup_venv.py,
scripts/lib/zulip_tools.py, and tools/lib/provision.py are excluded so
tools/provision still gives the right error message on Ubuntu 16.04
with Python 3.5.

Generated by com2ann, with whitespace fixes and various manual fixes
for runtime issues:

-shebang_rules: List[Rule] = [
+shebang_rules: List["Rule"] = [

-trailing_whitespace_rule: Rule = {
+trailing_whitespace_rule: "Rule" = {

-whitespace_rules: List[Rule] = [
+whitespace_rules: List["Rule"] = [

-comma_whitespace_rule: List[Rule] = [
+comma_whitespace_rule: List["Rule"] = [

-prose_style_rules: List[Rule] = [
+prose_style_rules: List["Rule"] = [

-html_rules: List[Rule] = whitespace_rules + prose_style_rules + [
+html_rules: List["Rule"] = whitespace_rules + prose_style_rules + [

-    target_port: int = None
+    target_port: int

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-04-24 13:06:54 -07:00
Anders Kaseorg 09ea778db1 nginx: Listen for ACME challenges on port 80 too.
This should make Certbot renewals more reliable.

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-04-23 16:22:04 -07:00
Aman Agrawal 2dc6d09c2a python3-upgrade: Move python2 scripts to run on python3. 2020-04-22 16:13:15 -07:00
Anders Kaseorg 5901e7ba7e python: Convert function type annotations to Python 3 style.
Generated by com2ann (slightly patched to avoid also converting
assignment type annotations, which require Python 3.6), followed by
some manual whitespace adjustment, and six fixes for runtime issues:

-    def __init__(self, token: Token, parent: Optional[Node]) -> None:
+    def __init__(self, token: Token, parent: "Optional[Node]") -> None:

-def main(options: argparse.Namespace) -> NoReturn:
+def main(options: argparse.Namespace) -> "NoReturn":

-def fetch_request(url: str, callback: Any, **kwargs: Any) -> Generator[Callable[..., Any], Any, None]:
+def fetch_request(url: str, callback: Any, **kwargs: Any) -> "Generator[Callable[..., Any], Any, None]":

-def assert_server_running(server: subprocess.Popen[bytes], log_file: Optional[str]) -> None:
+def assert_server_running(server: "subprocess.Popen[bytes]", log_file: Optional[str]) -> None:

-def server_is_up(server: subprocess.Popen[bytes], log_file: Optional[str]) -> bool:
+def server_is_up(server: "subprocess.Popen[bytes]", log_file: Optional[str]) -> bool:

-    method_kwarg_pairs: List[FuncKwargPair],
+    method_kwarg_pairs: "List[FuncKwargPair]",

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-04-18 20:42:48 -07:00
pemontto fd34bc5161
puppet: Allow /etc/zulip to be a symlink.
This PR updates the puppet manifest to allow /etc/zulip to be a symlink. The current behaviour overwrites /etc/zulip if it is link to another directory, which is problematic with docker-zulip and 
in particular the `LINK_SETTINGS_TO_DATA` setting.
2020-04-17 12:45:05 -07:00
Tim Abbott 777a3b6c18 puppet: Fix nagios check to not require typing_extensions. 2020-04-16 17:56:05 -07:00
Tim Abbott e1ce53ac46 puppet: Update nagios checks for disk to exclude kernel filesystems.
The fact that we have to explicitly list these is almost certainly a
bug in check_disk, but at least this works.
2020-04-16 17:49:29 -07:00
Tim Abbott cfbb617f5c puppet: Update nagios configuration for checking local disk. 2020-04-16 17:48:36 -07:00
Tim Abbott 9821dfa9fc puppet: The letsencrypt package is debian is now certbot.
It was an alias starting with Ubuntu Xenial, and will eventually be
removed.
2020-04-16 17:30:01 -07:00
Tim Abbott 8e5a866122 puppet: Update tuning for load average monitoring. 2020-04-16 16:47:05 -07:00
Tim Abbott b1ff823798 puppet: Remove old zulipbot configuration.
We haven't used zulipbot hosted here for years.
2020-04-16 16:18:48 -07:00
Anders Kaseorg 99242138a7 static: Serve webpack bundles from the root domain.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-04-10 00:48:02 -07:00
Anders Kaseorg c734bbd95d python: Modernize legacy Python 2 syntax with pyupgrade.
Generated by `pyupgrade --py3-plus --keep-percent-format` on all our
Python code except `zthumbor` and `zulip-ec2-configure-interfaces`,
followed by manual indentation fixes.

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-04-09 16:43:22 -07:00
Vishnu KS 449f7e2d4b team: Generate team page data using cron job.
This eliminates the contributors data as a possible source of
flakiness when installing Zulip from Git.

Fixes #14351.
2020-04-08 12:52:31 -07:00
Anders Kaseorg 15d68c40dd nginx: Set X-XSS-Protection: 1; mode=block.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-04-05 16:13:53 -07:00
Anders Kaseorg 79c215626e nginx: Set X-Content-Type-Options: nosniff globally.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-04-05 16:13:53 -07:00