Commit Graph

125 Commits

Author SHA1 Message Date
Alex Vandiver 7a1132d605 puppet: Switch golang and smokescreen to use /srv.
/srv and /opt have very similar usages; but we should be internally
consistent.

Move these two (the only usages of /opt) to match the rest in /srv.
2020-10-16 13:00:06 -07:00
Alex Vandiver fffea9612b puppet: Add an outgoing HTTP/HTTPS proxy server.
Use https://github.com/stripe/smokescreen to provide a server for an
outgoing proxy, run under supervisor.  This will allow centralized
blocking of internal metadata IPs, localhost, and so forth, as well as
providing default request timeouts (10s by default).
2020-10-15 15:18:35 -07:00
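
As a rough illustration of the kind of destination filtering fffea9612b describes (internal metadata IPs, localhost, and so forth), the sketch below classifies addresses with Python's standard `ipaddress` module. It is not smokescreen's actual logic; the helper name `is_internal_address` and the sample addresses are assumptions made for this example.

```python
import ipaddress


def is_internal_address(addr: str) -> bool:
    """Return True for addresses an outgoing proxy would typically refuse.

    Hypothetical helper: covers loopback (localhost), link-local (which
    includes the 169.254.169.254 cloud metadata address), and private ranges.
    """
    ip = ipaddress.ip_address(addr)
    return ip.is_loopback or ip.is_link_local or ip.is_private


# Sample addresses chosen for illustration only.
for candidate in ("127.0.0.1", "169.254.169.254", "10.0.0.5", "8.8.8.8"):
    verdict = "blocked" if is_internal_address(candidate) else "allowed"
    print(candidate, verdict)
```

A real proxy would also have to resolve hostnames before applying such a check, which this sketch leaves out.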
Alex Vandiver f61ac4a28d puppet: Move frontend monitoring into its own file.
This allows it to be pulled in for deploys like czo, which don't use
the full `zulip_ops::app_frontend`, but we wish to monitor.
2020-10-13 17:37:32 -07:00
Alex Vandiver c8df9a150e puppet: Drop all log2zulip configuration.
Disabled on webservers in 047817b6b0, it has since lingered in
configuration, as well as running (to no effect) every minute on the
loadbalancer.

Remove the vestiges of its configuration.
2020-10-13 11:00:50 -07:00
Alex Vandiver b431b1b021 puppet: Remove misleading motd.
This banner shows on lb1, advertising itself as lb0.  There is no
compelling reason for a custom motd, especially one which needs to
be reconfigured for each host.
2020-10-13 11:00:36 -07:00
Alex Vandiver 4fd7df4e8c puppet: Remove absent of check-apns-tokens.
This was marked as ensure absent in d02101a401, in v1.7.0 in 2017.
2020-09-29 18:17:08 -07:00
Alex Vandiver 872a349508 puppet: Remove absent of log2zulip.
This was marked as ensure absent in 047817b6b0, in v2.0.0 in 2018.
2020-09-29 18:17:08 -07:00
Alex Vandiver 57d88eedd8 puppet: Only install rabbitmq cron jobs via zulip_ops.
The rabbitmq cron jobs exist in order to call rabbitmqctl as root and
write the output to files that nagios can consume, since nagios is not
allowed to run rabbitmqctl.

In systems which do not have nagios configured, these every-minute
cron jobs add non-trivial load, to no effect.  Move their
installation into `zulip_ops`.  In doing so, also combine the cron.d
files into a single file; this allows us to `ensure => absent` the old
filenames, removing them from existing systems.  Leave the resulting
combined cron.d file in `zulip`, since it is still of general utility
and note.
2020-09-29 17:44:44 -07:00
Tim Abbott 5a1243db3c puppet: Use correct scope for zulip_ops::munin_plugin. 2020-07-15 21:49:45 -07:00
Alex Vandiver 48c3c33d10 puppet: Fully-qualify the munin-plugin name 2020-07-14 17:58:51 -07:00
Alex Vandiver 6c27f07c1d puppet: Move PostgreSQL backups to their own class.
wal-g was used in `puppet/zulip` by env-wal-g, but only installed in
`puppet/zulip_ops`.

Merge all of the dependencies of doing backups using wal-g (wal-g
installation, the pg_backup_and_purge job, the nagios plugin that
verifies it happens) into a common base class in `puppet/zulip`, since
it is generally useful.
2020-07-14 00:40:25 -07:00
Anders Kaseorg 15483c09cb puppet: Add missing trailing commas.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-07-13 15:36:06 -07:00
Alex Vandiver 3691a94efe puppet: Configure munin and nagios under apache with puppet.
This swaps in the actually-in-use munin configuration file;
otherwise, it is an implementation of the configuration as it exists
on the machine.
2020-07-13 13:23:11 -07:00
Alex Vandiver 4e42164b4a munin: Add plugins to prod hosts. 2020-07-13 13:23:11 -07:00
Alex Vandiver 2a14212b27 munin: Add a helper resource definition for munin plugins. 2020-07-13 12:49:28 -07:00
Alex Vandiver eda2c4b8e2 puppet: Split munin-node from munin-server.
No plugins are installed inside the /usr/local/munin/lib this creates
in munin-node, nor are they symlinked into /etc/munin/plugins, so
no non-default plugins are added by this.
2020-07-13 12:49:28 -07:00
Alex Vandiver 8cff27f67d puppet: Pull hosts from zulip.conf, not hardcoded list.
The one complexity is that hosts_fullstack are treated differently, as
they are not currently found in the manual `hosts` list, and as such
do not get munin monitoring.
2020-07-10 00:14:09 -07:00
Alex Vandiver 24383a5082 puppet: Rename hosts_domain so hosts_prefix can be grepped for. 2020-07-10 00:14:09 -07:00
Anders Kaseorg 9900298315 zthumbor: Remove Python 2 residue.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-07-06 18:44:58 -07:00
Alex Vandiver a21a086f5c puppet: nagios-plugins-basic is replaced by monitoring-plugins-basic.
In Bionic, nagios-plugins-basic is a transitional package which
depends on monitoring-plugins-basic.  In Focal, it is a virtual
package, which means that every time puppet runs, it tries to
re-install the nagios-plugins-basic package.

Switch all instances to referring to `$zulip::common::nagios_plugins`,
and repoint that to monitoring-plugins-basic.
2020-06-29 14:58:01 -07:00
Alex Vandiver 876ee4a8ed installer: Remove code specific to stretch or xenial.
Support for Xenial and Stretch was removed (5154ddafca, 0f4b1076ad,
8944e0ad53, 79acd5ae40, 1219a2e854), but not all codepaths were
updated to remove their conditionals on it.

Remove all code predicated on Xenial or Stretch.  debathena support
was migrated to Bionic, since that appears to be the current state of
existing debathena servers.
2020-06-24 12:57:38 -07:00
Alex Vandiver 7250d41bf7 puppet: Fix the path to install-wall-g 2020-06-17 15:23:18 -07:00
Tim Abbott 26396c5e25 puppet: Fix exceptions with multiple certbot declarations.
Since 9e8f1aacb3, zulip_ops machines
might have two Package declarations for `certbot`, which doesn't work
in puppet.

The fix is, as usual, to use our `zulip::safepackage` wrapper instead.
2020-06-15 18:21:33 -07:00
Alex Vandiver f8fc3a16eb puppet: Use "primary" / "replica" consistently in comments.
The style guide for Zulip is to always use "primary" and "replica"
when describing database replication.  Adjust a few comments under
`puppet/` that do not adhere to this.

Unfortunately, some references still remain to the insensitive and
inaccurate "master" / "slave" terminology.  However, these are only in
files which we are attempting to preserve as close to the upstream
versions they are derived from (e.g. postgresql.conf,
postfix/master.cf).
2020-06-15 16:18:07 -07:00
Alex Vandiver 97b9308781 puppet: Merge multiple postgres roles in `zulip_ops`.
All differences between the primary and replica roles having been
merged, fold the `postgres_common`, `postgres_master`, and
`postgres_slave` roles into just `postgres_appdb`.
2020-06-12 14:57:46 -07:00
Alex Vandiver 55bd31721d puppet: Remove custom `vm.dirty_ratio` and `vm.dirty_background_ratio`.
These values differed between the primary and secondary database
hosts, for unclear reasons.  The differences date back to their
introduction in 387f63deaa.  As the comment in the replica
configuration notes, settings of `vm.dirty_ratio = 10` and
`vm.dirty_background_ratio = 5` matched the kernel defaults for
"newer" kernels; however, kernel 2.6.30 bumped those to 20 and 10,
respectively[1], as a fix for underlying logic now being more correct.

Remove these overrides; they should at very least be consistent across
roles, and the previous values look to be an attempt to tune for a
very much older version of the Linux kernel, which was using a
different, buggier, algorithm under the hood.

[1] 1b5e62b42b
2020-06-12 14:57:46 -07:00
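
To make the comparison in 55bd31721d concrete, the following sketch scans sysctl-style text for the two settings and reports any that differ from the 20 and 10 defaults quoted in the commit message. The function name `find_overrides` and the sample input are assumptions for this example, not part of the Zulip puppet code.

```python
# Defaults as quoted in the commit message for kernels since 2.6.30.
KERNEL_DEFAULTS = {"vm.dirty_ratio": 20, "vm.dirty_background_ratio": 10}


def find_overrides(sysctl_text: str) -> dict:
    """Return the tracked settings whose value differs from the defaults."""
    overrides = {}
    for raw in sysctl_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments (sysctl.conf allows '#' and ';').
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key in KERNEL_DEFAULTS:
            number = int(value.strip())
            if number != KERNEL_DEFAULTS[key]:
                overrides[key] = number
    return overrides


# Sample input mirroring the replica values mentioned in the commit message.
sample = "# tuning\nvm.dirty_ratio = 10\nvm.dirty_background_ratio = 5\n"
print(find_overrides(sample))
```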
Alex Vandiver f39816e768 puppet: Stop distributing recovery.conf file.
This file controls streaming replication, and recovery using wal-g on
the secondary.  The `primary_conninfo` data needs to change on short
notice when database failover happens, in a way that is not suitable
for being controlled by puppet.

PostgreSQL 12, in fact, removes the use of the `recovery.conf` file[1];
the `primary_conninfo` and `restore_command` information goes into the
main `postgresql.conf` file, and the standby status is controlled by
the presence or absence of an empty `standby.signal` file.

Remove the puppet control of the `recovery.conf` file.

[1] https://pgstef.github.io/2018/11/26/postgresql12_preview_recovery_conf_disappears.html
2020-06-12 14:57:46 -07:00
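
The PostgreSQL 12 behaviour described in f39816e768 can be sketched as a simple file-presence test. The helper name `is_standby` is hypothetical, and the temporary directory stands in for a real data directory, whose location varies by installation.

```python
import tempfile
from pathlib import Path


def is_standby(data_dir) -> bool:
    """Report whether a PostgreSQL 12+ data directory is marked as a standby.

    Standby mode is signalled by the mere presence of `standby.signal`;
    the file's contents are not consulted.
    """
    return (Path(data_dir) / "standby.signal").exists()


# Demonstration against a throwaway directory, not a real cluster.
with tempfile.TemporaryDirectory() as data_dir:
    print(is_standby(data_dir))  # no signal file yet
    (Path(data_dir) / "standby.signal").touch()
    print(is_standby(data_dir))  # signal file now present
```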
Alex Vandiver 316498a169 puppet: Remove unnecessary nagios authentication setup.
Since the nagios authentication is stored _in the database_, it is
unnecessary to run if the database is simply a replica of the
production database.  The only case in which this statement would have
an effect is if the postgres node contains a _different_ (or empty)
database, which `setup_disks` now effectively prevents.

Remove the unnecessary step.
2020-06-11 21:01:49 -07:00
Alex Vandiver 0774f54c1b puppet: Move `setup_disks` to postgres_common.
The tooling should now be run no matter if the node is a primary or
replica.
2020-06-11 21:01:49 -07:00
Alex Vandiver 6f6a0e890a puppet: Run setup_disks based on symlink; remove mdadm dependency.
481613a344 updated the `setup_disks` script to no longer reference
`mdadm`, since we no longer set up RAID on servers.

Update the puppet that would call it to remove the `mdadm` dependency,
and run only if the state is not what it produces -- namely, a symlink
for `/var/lib/postgresql`, which must point to an existent
`/srv/postgresql` directory.
2020-06-11 21:01:49 -07:00
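
The guard condition described in 6f6a0e890a (skip `setup_disks` when `/var/lib/postgresql` is already a symlink to an existing `/srv/postgresql` directory) might be expressed along these lines. This is a Python sketch of the check only, not the puppet resource itself; the function name `disks_already_set_up` is an assumption, and the demonstration uses temporary paths in place of the real ones.

```python
import os
import tempfile


def disks_already_set_up(link_path: str, target_dir: str) -> bool:
    """True when link_path is a symlink resolving to the existing target_dir."""
    return (
        os.path.islink(link_path)
        and os.path.isdir(target_dir)
        and os.path.realpath(link_path) == os.path.realpath(target_dir)
    )


# Demonstration with stand-ins for /var/lib/postgresql and /srv/postgresql.
with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "srv-postgresql")
    link = os.path.join(root, "var-lib-postgresql")
    os.mkdir(target)
    print(disks_already_set_up(link, target))  # link does not exist yet
    os.symlink(target, link)
    print(disks_already_set_up(link, target))  # link now points at target
```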
Alex Vandiver 16c4cea951 puppet: Pull postgres config directory into postgres_appdb_base.
As with the previous commit, this is currently only used in tuning, but is
a property of the whole postgres configuration; move it there, as just
the directory, not the file.

Use this directory consistently in the erb templates.  Since we
produce a `pg_hba.conf`, it makes sense that we point to the path that we
know that we explicitly wrote to, for instance.
2020-06-11 20:56:55 -07:00
Alex Vandiver 4fe0444108 puppet: Install wal-g, not wal-e. 2020-06-11 15:52:43 -07:00
Alex Vandiver 39d6185ce7 puppet: Remove python-dateutil requirement from pg_backup_and_purge.
1f565a9f41 removed the `package` lines which install
`python-dateutil`, but not the line in `puppet_ops` that references it;
as such, Puppet manifests in puppet_ops fail to compile.

Remove the stale reference to `python-dateutil`, which is unnecessary
since the code is python3, not python2.
2020-06-11 14:28:55 -07:00
Alex Vandiver 8b1d49dbc7 puppet: Rename "wiki" realm to "monitoring".
This is vestigial.

It requires manually altering the `htdigest` file (not stored in this
repo) to change the digest realm from `wiki` to `monitoring`, and will
re-prompt users for their passwords if the browsers currently store
them.
2020-05-30 12:26:21 -07:00
Tim Abbott c3d3324295 puppet: Add link to the sources for Zephyr patches. 2020-05-19 20:54:11 -07:00
Tim Abbott a35e71ebbc puppet: Update package name for boto-on-python3.
The python3-boto3 package is the maintained fork that supports Python
3; it was renamed in Ubuntu Bionic from the original Ubuntu Xenial name.
2020-05-19 20:25:11 -07:00
Tim Abbott 1c28770810 puppet: Fix apt_repo_debathena setup_file path.
There was a typo introduced here when scripts_path was added.
2020-05-19 20:21:30 -07:00
Tim Abbott 6319c181eb puppet: Use actual name for the bind9-host package.
Using the `host` virtual package confused Puppet into reporting it was
doing work every time one did a puppet run, resulting in unnecessarily
spammy output.
2020-05-11 00:51:53 -07:00
Tim Abbott 9821dfa9fc puppet: The letsencrypt package in Debian is now certbot.
It was an alias starting with Ubuntu Xenial, and will eventually be
removed.
2020-04-16 17:30:01 -07:00
Vishnu KS 449f7e2d4b team: Generate team page data using cron job.
This eliminates the contributors data as a possible source of
flakiness when installing Zulip from Git.

Fixes #14351.
2020-04-08 12:52:31 -07:00
Tim Abbott 271319fb13 puppet: Fix hacky release test for whether we're in EC2.
The result is still a bit hacky, but guaranteed to be correct if we
adjust the OS version of our systems, which we of course will do over
time.
2019-06-25 22:19:04 -07:00
Tim Abbott 8d8cfb314b puppet: Remove zulip_ops configuration for trusty.
There are no longer any zulip_ops systems using trusty.
2019-06-25 22:09:06 -07:00
Tim Abbott 0ec1b4e82c puppet: Move check_send_receive_time to the _once ruleset.
We don't actually want this bundle of message-sending Nagios
checks to run on every single server.
2019-06-16 15:48:35 -07:00
Tim Abbott df83979c76 zulip_ops: Extract a prod_app_frontend_once ruleset. 2019-06-16 15:48:35 -07:00
Tim Abbott 738cfe54c3 puppet: Move app_frontend_once out of prod configuration.
That logic made it inconvenient to run multiple prod servers with the
same top-level puppet configuration.
2019-06-16 15:24:20 -07:00
Tim Abbott e85250941d puppet: Fix quoting of commented-out python3-boto.
This will avoid a linter error if/when we uncomment it.
2019-06-13 14:39:24 -07:00
Tim Abbott 337efe0fb7 puppet: Remove puppet-el, which no longer exists.
This package was only ever available on Ubuntu Xenial.
2019-06-13 14:39:24 -07:00
Vishnu Ks ecdd3bea43 billing: Add cron job to run invoice_plans once a day.
Fixes #11960
2019-04-29 11:23:17 -07:00
Anders Kaseorg 9f7c0b7e65 postgres_master.pp: Fix wacky su command line.
The construction `su postgres -c -- bash -c 'psql …'` didn’t behave the
way it reads, and only worked by accident:

1. `-c --` sets the command to `--`.
2. `bash` sets the first argument to `bash`.
3. `-c 'psql …'` replaces the command with `psql …`.

Thus, `su` ended up executing `<shell> -c 'psql …' bash`, where
`<shell>` is the `postgres` user’s login shell, usually also `bash`,
which then executed 'psql …' and ignored the extra `bash`.

Unconfuse this construction.

Note from tabbott: The old code didn't even work by accident, it was
just broken.  The right fix is to move the quoting around properly.

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2019-04-12 17:27:23 -07:00
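
The quoting problem discussed in 9f7c0b7e65 comes down to `su -c` taking exactly one argument as the command string. The sketch below shows one way to build such an argument safely with Python's `shlex` module. It is not the fix applied in the commit; the helper name `su_argv` and the `SELECT 1` query are assumptions for illustration.

```python
import shlex


def su_argv(user: str, inner_argv: list) -> list:
    """Build an argv for `su` that passes the whole inner command to -c.

    shlex.join quotes each inner argument, so the user's login shell
    re-splits the string back into exactly inner_argv.
    """
    return ["su", user, "-c", shlex.join(inner_argv)]


argv = su_argv("postgres", ["psql", "-c", "SELECT 1"])
print(argv)

# Round trip: the single -c argument splits back into the original command.
print(shlex.split(argv[3]))
```

Passing the list to `subprocess.run` without `shell=True` would avoid a second layer of shell quoting altogether.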
Tim Abbott 047817b6b0 puppet: Disable log2zulip cron job.
It hasn't been working for years, but more importantly, it spams up
root's mail queue so that one can't find important things in there
(e.g. the fact that the long-term-idle cron job was failing).
2019-01-05 10:56:44 -08:00