zulip

Commit Graph

Author	SHA1	Message	Date
Alex Vandiver	47e16a5d41	puppet: Tidy old smokescreen binaries.	2021-11-19 15:29:28 -08:00
Alex Vandiver	239ac8413e	puppet: Embed golang version into binary path, to rebuild on new golang. This will cause the output binary path to be sensitive to golang version, causing it to be rebuilt on new golang, and an updated supervisor config file written out, and thus supervisor also restarted.	2021-11-19 15:29:28 -08:00
Alex Vandiver	216eeba2dd	puppet: Factor out smokescreen binary path.	2021-11-19 15:29:28 -08:00
Alex Vandiver	3a7cef6582	puppet: Switch smokescreen to using zulip::external_dep, so it tidies.	2021-11-19 15:29:28 -08:00
Alex Vandiver	ea08111d60	puppet: Move /srv/smokescreen-src to /srv/zulip-smokescreen-src. As with the previous commit for `/srv/golang`, we have the custom of namespacing things under `/srv` with `zulip-` to help ensure that we play nice with anything else that happens to be on the host.	2021-11-19 15:29:28 -08:00
Anders Kaseorg	c64e1adb19	puppet: Upgrade Smokescreen v0.0.2-59-gbfca45c to v0.0.2-63-gdc40301. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-11-19 15:29:28 -08:00
Alex Vandiver	bb9d2df1ae	puppet: Extract an external-tarball-dependency manifest.	2021-11-19 15:29:28 -08:00
Alex Vandiver	3c8d7e2598	puppet: Tidy old golang directories. This relies on behavior which is only in Puppet 5.5.1 and above, which means it must be skipped on Ubuntu 18.04.	2021-11-19 15:29:28 -08:00
Alex Vandiver	2fc4acdf81	puppet: Move /srv/golang to /srv/zulip-golang. We have the custom of namespacing things under `/srv` with `zulip-` to help ensure that we play nice with anything else that happens to be on the host.	2021-11-19 15:29:28 -08:00
Alex Vandiver	00a4abb642	puppet: Switch dependency to the golang binary we need.	2021-11-19 15:29:28 -08:00
Alex Vandiver	2d5f813094	puppet: Stop making a /srv/golang symlink. Nothing needs this extra directory.	2021-11-19 15:29:28 -08:00
Alex Vandiver	93af6c7f06	puppet: Factor out golang variables.	2021-11-19 15:29:28 -08:00
Alex Vandiver	21be36f15f	puppet: Shorten golang version variable name.	2021-11-19 15:29:28 -08:00
Alex Vandiver	6b9e74adee	puppet: Upgrade golang from 1.16.4 to 1.17.3.	2021-11-19 15:29:28 -08:00
Alex Vandiver	514801c509	puppet: Split out golang toolchain into its own manifest.	2021-11-19 15:29:28 -08:00
Alex Vandiver	610a0b2d59	nagios: `pg_is_in_recovery()` is better to know replica/primary status. It is possible to be in recovery, and downloading WAL logs from archives, and not yet be replicating. If one only checks the streaming log status, it reports as "no replicas" which is technically accurate but not a useful summation of the state of the replica.	2021-11-17 13:38:26 -08:00
Alex Vandiver	83091cbc96	puppet: Swap the one use of the `cron` resource for an /etc/cron.d file. The `cron` resource places its contents in the user's crontab, which makes it unlike every other cron job that Zulip installs. Switch to using `/etc/cron.d` files, like all other cron jobs.	2021-11-16 16:17:32 -08:00
Alex Vandiver	90e1a0400e	puppet: Add a few more inter-resource dependencies. None of these are important; they just express semantic dependencies.	2021-11-16 16:17:32 -08:00
Alex Vandiver	49ad188449	rate_limit: Add a flag to lump all TOR exit node IPs together. TOR users are legitimate users of the system; however, that system can also be used for abuse -- specifically, by evading IP-based rate-limiting. For the purposes of IP-based rate-limiting, add a RATE_LIMIT_TOR_TOGETHER flag, defaulting to false, which lumps all requests from TOR exit nodes into the same bucket. This may allow a TOR user to deny other TOR users access to the find-my-account and new-realm endpoints, but this is a low cost for cutting off a significant potential abuse vector. If enabled, the list of TOR exit nodes is fetched from their public endpoint once per hour, via a cron job, and cached on disk. Django processes load this data from disk, and cache it in memcached. Requests are spared from the burden of checking disk on failure via a circuitbreaker, which trips if there are two failures in a row, and only begins trying again after 10 minutes.	2021-11-16 11:42:00 -08:00
Alex Vandiver	01c007ceaf	puppet: Remove an out-of-date comment. Comment was missed in `9d57fa9759`.	2021-11-09 21:52:17 -08:00
Alex Vandiver	7af2fa2e92	puppet: Use sysv status command, not supervisorctl status. Since Supervisor 4, which is installed on Ubuntu 20.04 and Debian 11, `supervisorctl status` returns exit code 3 if any of the supervisor-controlled processes are not running. Using `supervisorctl status` as the Puppet `status` command for Supervisor leads to unnecessarily trying to "start" a Supervisor process which is already started, but happens to have one or more of its managed processes stopped. This is an unnecessary no-op in production environments, but in docker-init environments, such as in CI, attempting to start the process a second time is an error. Switch to checking if supervisor is running by way of sysv init. This fixes the potential error in CI, as well as eliminates unnecessary "starts" of supervisor when it was already running -- a situation which made zulip-puppet-apply not idempotent: ``` root@alexmv-prod:~# supervisorctl status process-fts-updates STOPPED Nov 10 12:33 AM smokescreen RUNNING pid 1287280, uptime 0:35:32 zulip-django STOPPED Nov 10 12:33 AM zulip-tornado STOPPED Nov 10 12:33 AM [...] root@alexmv-prod:~# ~zulip/deployments/current/scripts/zulip-puppet-apply --force Notice: Compiled catalog for alexmv-prod.zulipdev.org in environment production in 2.32 seconds Notice: /Stage[main]/Zulip::Supervisor/Service[supervisor]/ensure: ensure changed 'stopped' to 'running' Notice: Applied catalog in 0.91 seconds root@alexmv-prod:~# ~zulip/deployments/current/scripts/zulip-puppet-apply --force Notice: Compiled catalog for alexmv-prod.zulipdev.org in environment production in 2.35 seconds Notice: /Stage[main]/Zulip::Supervisor/Service[supervisor]/ensure: ensure changed 'stopped' to 'running' Notice: Applied catalog in 0.92 seconds ```	2021-11-09 21:52:17 -08:00
Alex Vandiver	8a1bb43b23	puppet: Adjust for templated paths and settings, set C.UTF-8 locale.	2021-11-08 18:21:46 -08:00
Alex Vandiver	d3e9a71d42	puppet: Check in upstream PostgreSQL 14 configuration file. Note that one `<%u%%d>` has to be escaped as `<%%u%%d>`.	2021-11-08 18:21:46 -08:00
Adam Benesh	c881430f4c	puppet: Add WSGIApplicationGroup config to Apache SSO example. Zulip apparently is now affected by a bad interaction between Apache's WSGI using Python subinterpreters and C extension modules like `re2` that are not designed for it. The solution is apparently to set WSGIApplicationGroup to %{GLOBAL}, which disables Apache's use of Python subinterpreters. See https://serverfault.com/questions/514242/non-responsive-apache-mod-wsgi-after-installing-scipy/514251#514251 for background. Fixes #19924.	2021-10-08 15:07:23 -07:00
Tim Abbott	33b5fa633a	process_fts_updates: Fix docker-zulip support. In the series of migrations to this tool's configuration to support specifying an arbitrary database name (e.g. `c17f502bb0`), we broke support for running process_fts_updates on the application server, connected to a remote database server. That workflow is used by docker-zulip and presumably other settings like Amazon RDS. The fix is to import the Zulip virtualenv (if available) when running on an application server. This is better than just supporting this case, since both docker-zulip and an Amazon RDS database are settings where it would be inconvenient to run process-fts-updates directly on the database server. (In the former case, because we want to avoid having a strong version dependency on the postgres container). Details are available in this conversation: https://chat.zulip.org/#narrow/stream/49-development-help/topic/Logic.20in.20process_fts_updates.20seems.20to.20be.20broken/near/1251894 Thanks to Erik Tews for reporting and help in debugging this issue.	2021-09-27 18:17:33 -05:00
Alex Vandiver	1806e0f45e	puppet: Remove zulip.org configuration.	2021-08-26 17:21:31 -07:00
Alex Vandiver	27881babab	puppet: Increase prometheus storage, from the default 15d.	2021-08-24 23:40:43 -07:00
Alex Vandiver	faf71eea41	upgrade-postgresql: Do not remove other supervisor configs. We previously used `zulip-puppet-apply` with a custom config file, with an updated PostgreSQL version but more limited set of `puppet_classes`, to pre-create the basic settings for the new cluster before running `pg_upgradecluster`. Unfortunately, the supervisor config uses `purge => true` to remove all SUPERVISOR configuration files that are not included in the puppet configuration; this leads to it removing all other supervisor processes during the upgrade, only to add them back and start them during the second `zulip-puppet-apply`. It also leads to `process-fts-updates` not being started after the upgrade completes; this is the one supervisor config file which was not removed and re-added, and thus the one that is not re-started due to having been re-added. This was not detected in CI because CI added a `start-server` command which was not in the upgrade documentation. Set a custom facter fact that prevents the `purge` behaviour of the supervisor configuration. We want to preserve that behaviour in general, and using `zulip-puppet-apply` continues to be the best way to pre-set-up the PostgreSQL configuration -- but we wish to avoid that behaviour when we know we are applying a subset of the puppet classes. Since supervisor configs are no longer removed and re-added, this requires an explicit start-server step in the instructions after the upgrades complete. This brings the documentation into alignment with what CI is testing.	2021-08-24 19:00:58 -07:00
Alex Vandiver	e46e862f2b	puppet: Add a bare-bones zulipbot profile. This sets up the firewalls appropriate for zulipbot, but does not automate any of the configuration of zulipbot itself.	2021-08-24 16:05:58 -07:00
Alex Vandiver	5857dcd9b4	puppet: Configure ip6tables in parallel to ipv4. Previously, IPv6 firewalls were left at the default all-open. Configure IPv6 equivalently to IPv4.	2021-08-24 16:05:46 -07:00
Alex Vandiver	845509a9ec	puppet: Be explicit that existing iptables are only ipv4.	2021-08-24 16:05:46 -07:00
Anders Kaseorg	09564e95ac	mypy: Add types-psycopg2. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-08-09 20:32:19 -07:00
Alex Vandiver	4dd289cb9d	puppet: Enable prometheus monitoring of supervisord. To be able to read the UNIX socket, this requires running node_exporter as zulip, not as prometheus.	2021-08-03 21:47:02 -07:00
Alex Vandiver	aa940bce72	puppet: Disable hwmon collector, which does nothing on cloud hosts.	2021-08-03 21:47:02 -07:00
Alex Vandiver	23a355df0f	puppet: Move backup time earlier, from 10am to 7pm America/Los_Angeles. This is less likely to overlap with common evening deploy times.	2021-08-03 18:32:45 -05:00
Alex Vandiver	e94b6afb00	nagios: Remove broken check_email_deliverer_* checks and related code. These checks suffer from a couple notable problems: - They are only enabled on staging hosts -- where they should never be run. Since `ef6d0ec5ca`, these supervisor processes are only run on one host, and never on the staging host. - They run as the `nagios` user, which does not have appropriate permissions, and thus the checks always fail. Specifically, `nagios` does not have permissions to run `supervisorctl`, since the socket is owned by the `zulip` user, and mode 0700; and the `nagios` user does not have permission to access Zulip secrets to run `./manage.py print_email_delivery_backlog`. Rather than rewrite these checks to run on a cron as zulip, and check those file contents as the nagios user, drop these checks -- they can be rewritten at a later point, or replaced with Prometheus alerting, and currently serve only to cause always-failing Nagios checks, which normalizes alert failures. Leave the files installed if they currently exist, rather than cluttering puppet with `ensure => absent`; they do no harm if they are left installed.	2021-08-03 16:07:13 -07:00
Mateusz Mandera	57f14b247e	bots: Specify realm for nagios bots messages in check_send_receive_time.	2021-07-26 15:33:13 -07:00
Alex Vandiver	befe204be4	puppet: Run the supervisor-restart step only after it is started. In an initial install, the following is a potential rule ordering: ``` Notice: /Stage[main]/Zulip::Supervisor/File[/etc/supervisor/conf.d/zulip]/ensure: created Notice: /Stage[main]/Zulip::Supervisor/File[/etc/supervisor/supervisord.conf]/content: content changed '{md5}99dc7e8a1178ede9ae9794aaecbca436' to '{md5}7ef9771d2c476c246a3ebd95fab784cb' Notice: /Stage[main]/Zulip::Supervisor/Exec[supervisor-restart]: Triggered 'refresh' from 1 event [...] Notice: /Stage[main]/Zulip::App_frontend_base/File[/etc/supervisor/conf.d/zulip/zulip.conf]/ensure: defined content as '{md5}d98ac8a974d44efb1d1bb2ef8b9c3dee' [...] Notice: /Stage[main]/Zulip::App_frontend_once/File[/etc/supervisor/conf.d/zulip/zulip-once.conf]/ensure: defined content as '{md5}53f56ae4b95413bfd7a117e3113082dc' [...] Notice: /Stage[main]/Zulip::Process_fts_updates/File[/etc/supervisor/conf.d/zulip/zulip_db.conf]/ensure: defined content as '{md5}96092d7f27d76f48178a53b51f80b0f0' Notice: /Stage[main]/Zulip::Supervisor/Service[supervisor]/ensure: ensure changed 'stopped' to 'running' ``` The last line is misleading -- supervisor was already started by the `supervisor-restart` process on the third line. As can be shown with `zulip-puppet-apply --debug`, the last line just installs supervisor to run on startup, using `systemctl`: ``` Debug: Executing: 'supervisorctl status' Debug: Executing: '/usr/bin/systemctl unmask supervisor' Debug: Executing: '/usr/bin/systemctl start supervisor' ``` This means the list of processes started by supervisor depends entirely on which configuration files were successfully written out by puppet before the initial `supervisor-restart` ran. Since `zulip_db.conf` is written later than the rest, the initial install often fails to start the `process-fts-updates` process. 
In this state, an explicit `supervisorctl restart` or `supervisorctl reread && supervisorctl update` is required for the service to be found and started. Reorder the `supervisor-restart` exec to only run after the service is started. Because all supervisor configuration files have a `notify` of the service, this forces the ordering of: ``` (package) -> (config files) -> (service) -> (optional restart) ``` On first startup, this will start and then immediately restart supervisor, which is unfortunate but unavoidable -- and not terribly relevant, since the database will not have been created yet, and thus most processes will be in a restart loop for failing to connect to it.	2021-07-22 14:09:01 -07:00
Alex Vandiver	ee7c849f8a	puppet: Work around sysvinit supervisor init bug. The sysvinit script for supervisor has a long-standing bug where `/etc/init.d/supervisor restart` stops but does not then start the supervisor process. Work around this by making restart then try to start, and return if it is currently running.	2021-07-22 14:09:01 -07:00
Alex Vandiver	7e65421b1f	puppet: Ensure psycopg2 is installed before running process_fts_updates. Not having the package installed will cause startup failures in `process_fts_updates`; ensure that we've installed the package before we potentially start the service.	2021-07-14 17:24:52 -07:00
Alex Vandiver	528e5adaab	smokescreen: Default to only listening on 127.0.0.1. This prevents Smokescreen from acting as an open proxy. Fixes #19214.	2021-07-14 15:40:26 -07:00
Alex Vandiver	e6bae4f1dd	puppet: Remove zulip::nagios class. `93f62b999e` removed the last file in puppet/zulip/files/nagios_plugins/zulip_nagios_server, which means the singular rule in zulip::nagios no longer applies cleanly. Remove the `zulip::nagios` class, as it is no longer needed.	2021-07-09 17:29:41 -07:00
Anders Kaseorg	93f62b999e	nagios: Replace check_website_response with standard check_http plugin. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-07-09 16:47:03 -07:00
Vishnu KS	e0f5fadb79	billing: Downgrade small realms that are behind on payments. An organization with at most 5 users that is behind on payments isn't worth spending time on investigating the situation. For larger organizations, we likely want somewhat different logic that at least does not void invoices.	2021-07-02 13:19:12 -07:00
Anders Kaseorg	91bfebca7d	install: Replace wget with curl. curl uses Happy Eyeballs to avoid long timeouts on systems with broken IPv6. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-06-25 09:05:07 -07:00
Anders Kaseorg	3b60b25446	ci: Remove bullseye hack. base-files 11.1 marked bullseye as Debian 11 in /etc/os-release. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-06-24 14:35:51 -07:00
Alex Vandiver	d51272cc3d	puppet: Remove zulip_deliver_scheduled_* from zulip-workers:. Staging and other hosts that are `zulip::app_frontend_base` but not `zulip::app_frontend_once` do not have a /etc/supervisor/conf.d/zulip/zulip-once.conf and as such do not have `zulip_deliver_scheduled_emails` or `zulip_deliver_scheduled_messages` and thus supervisor will fail to reload. Making the contents of `zulip-workers` contingent on if the server is _also_ a `-once` server is complicated, and would involve using Concat fragments, which severely limit readability. Instead, expel those two from `zulip-workers`; this is somewhat reasonable, since they use an entirely different codepath from zulip_events_, using the database rather than RabbitMQ for their queuing.	2021-06-14 17:12:59 -07:00
Alex Vandiver	6c72698df2	puppet: Move zulip_ops supervisor config into /etc/supervisor/conf.d/zulip/. This is similar cleanup to `3ab9b31d2f`, but only affects zulip_ops services; it serves to ensure that any of these services which are no longer enabled are automatically removed from supervisor. Note that this will cause a supervisor restart on all affected hosts, which will restart all supervisor services.	2021-06-14 17:12:59 -07:00
Alex Vandiver	df09607202	puppet: Switch to $zulip::common::supervisor_conf_dir variable.	2021-06-14 17:12:59 -07:00
Alex Vandiver	391f78a9c1	puppet: Move supervisor-not-in-/etc/supervisor/conf.d/ to common place.	2021-06-14 17:12:59 -07:00
Alex Vandiver	dd90083ed7	puppet: Provide FQDN of self as URI, so the certificate validates. Failure to do this results in: ``` psql: error: failed to connect to `host=localhost user=zulip database=zulip`: failed to write startup message (x509: certificate is valid for [redacted], not localhost) ```	2021-06-14 00:14:48 -07:00
Alex Vandiver	c90ff80084	puppet: Bump grafana version to 8.0.1. Most notably, this fixes an annoying bug with CloudWatch metrics being repeated in graphs.	2021-06-10 15:49:08 -07:00
Alex Vandiver	d905eb6131	puppet: Add a database teleport server. Host-based md5 auth for 127.0.0.1 must be removed from `pg_hba.conf`, otherwise password authentication is preferred over certificate-based authentication for localhost.	2021-06-08 22:21:21 -07:00
Alex Vandiver	100a899d5d	puppet: Add grafana server.	2021-06-08 22:21:00 -07:00
Alex Vandiver	459f37f041	puppet: Add prometheus server.	2021-06-08 22:21:00 -07:00
Alex Vandiver	19fb58e845	puppet: Add prometheus node exporter.	2021-06-08 22:21:00 -07:00
Alex Vandiver	a2b1009ed5	puppet: Turn on "authentication" which defaults to user with all rights. Nagios refuses to allow any modifications with use_authentication off; re-enabled "authentication" but set a default user, which (by way of the `*` permissions in `359f37389a`) is allowed to take all actions.	2021-06-08 15:19:28 -07:00
Alex Vandiver	61b6fc865c	puppet: Add a label to teleport applications, to allow RBAC. Roles can only grant or deny access based on labels; set one based on the application name.	2021-06-08 15:19:04 -07:00
Alex Vandiver	4aff5b1d22	puppet: Allow access to `/` in nagios. This was a regression in `51b985b40d`.	2021-06-07 22:40:58 -07:00
Alex Vandiver	54768c2210	puppet: Remove now-unused basic auth support files. `51b985b40d` made these unnecessary.	2021-06-07 16:17:45 -07:00
Alex Vandiver	359f37389a	puppet: Remove in-nagios auth restrictions. `51b985b40d` made nagios only accessible from localhost, or as proxied via teleport. Remove the HTTP-level auth requirements.	2021-06-07 16:17:45 -07:00
Alex Vandiver	2352fac6b5	puppet: Fix indentation.	2021-06-02 18:38:38 -07:00
Alex Vandiver	51b985b40d	puppet: Move nagios to behind teleport. This makes the server only accessible via localhost, by way of the Teleport application service.	2021-06-02 18:38:38 -07:00
Alex Vandiver	4f51d32676	puppet: Add a teleport application server. This requires switching to a reverse tunnel for the auth connection, with the side effect that the `zulip_ops::teleport::node` manifest can be applied on servers anywhere in the Internet; they do not need to have any publicly-available open ports.	2021-06-02 18:38:38 -07:00
Alex Vandiver	c59421682f	puppet: Add a teleport node on every host. Teleport nodes[1] are the equivalent to SSH servers. In addition to this config, joining the teleport cluster will require presenting a one-time "join token" from the proxy server[2], which may either be short-lived or static. [1] https://goteleport.com/docs/architecture/nodes/ [2] https://goteleport.com/docs/admin-guide/#adding-nodes-to-the-cluster	2021-06-02 18:38:38 -07:00
Alex Vandiver	1cdf14d195	puppet: Add a teleport server. See https://goteleport.com/docs/architecture/overview/ for the general architecture of a Teleport cluster. This commit adds a Teleport auth[1] and proxy[2] server. The auth server serves as a CA for granting time-bounded access to users and authenticating nodes on the cluster; the proxy provides access and a management UI. [1] https://goteleport.com/docs/architecture/authentication/ [2] https://goteleport.com/docs/architecture/proxy/	2021-06-02 18:38:38 -07:00
Alex Vandiver	3ebd627c50	puppet: Fix "import" -> "include" in chat_zulip_org.	2021-06-02 11:02:34 -07:00
Alex Vandiver	2130fc0645	puppet: Add an explicit class for czo.	2021-06-01 22:18:50 -07:00
Alex Vandiver	c9141785fd	puppet: Use concat fragments to place port allows next to services. This means that services will only open their ports if they are actually run, without having to clutter rules.v4 with a lot of `if` statements. This does not go as far as using `puppetlabs/firewall`[1] because that would represent an additional DSL to learn; raw IPtables sections can easily be inserted into the generated iptables file via `concat::fragment` (either inline, or as a separate file), but config can be centralized next to the appropriate service. [1] https://forge.puppet.com/modules/puppetlabs/firewall	2021-05-27 21:14:48 -07:00
Alex Vandiver	4f79b53825	puppet: Factor out firewall config.	2021-05-27 21:14:48 -07:00
Alex Vandiver	87a109e3e0	puppet: Pull in pinned puppet modules. Using puppet modules from the puppet forge judiciously will allow us to simplify the configuration somewhat; this specifically pulls in the stdlib module, which we were already using parts of.	2021-05-27 21:14:48 -07:00
Alex Vandiver	f3eea72c2a	setup: Merge multiple setup-apt-repo scripts into one. This moves the `.asc` files into subdirectories, and writes out the according `.list` files into them. It moves from templates to written-out `.list` files for clarity and ease of implementation (Debian and Ubuntu need different templates for `zulip`), and as a way of making explicit which releases are supported for each list. For the special-case of the PGroonga signing key, we source an additional file within the directory. This simplifies the process for adding another class of `.list` file.	2021-05-26 14:42:29 -07:00
Alex Vandiver	4f017614c5	nagios: Replace check_fts_update_log with a process_fts_updates flag. This avoids having to duplicate the connection logic from process_fts_updates. Co-authored-by: Adam Birds <adam.birds@adbwebdesigns.co.uk>	2021-05-25 13:56:05 -07:00
Alex Vandiver	ab130ceb35	nagios: Support arbitrary database user and dbname in replication check. Co-authored-by: Adam Birds <adam.birds@adbwebdesigns.co.uk>	2021-05-25 13:56:05 -07:00
Alex Vandiver	c17f502bb0	process_fts_updates: Support arbitrary database user and dbname. Co-authored-by: Adam Birds <adam.birds@adbwebdesigns.co.uk>	2021-05-25 13:56:05 -07:00
Alex Vandiver	02fc0d3e1d	db: Drop None and empty-string checking in arguments. psycopg2 treats None and "" the same as not-provided: ``` assert connect(user="zulip", dbname="zulip") assert connect(user="zulip", dbname="zulip", host="") assert connect(user="zulip", dbname="zulip", host=None) with Raises("no password supplied"): connect(user="zulip", dbname="zulip", host="localhost") assert connect(user="zulip", dbname="zulip", port="") assert connect(user="zulip", dbname="zulip", port=None) assert connect(user="zulip", dbname="zulip", port=5432) with Raises("could not connect to server"): connect(user="zulip", dbname="zulip", port=5000) assert connect(dbname="zulip", host="localhost", password="right-password") with Raises("no password supplied"): connect(dbname="zulip", host="localhost", password="") with Raises("no password supplied"): connect(dbname="zulip", host="localhost", password=None) with Raises("password authentication failed"): connect(dbname="zulip", host="localhost", password="wrong") ``` Co-authored-by: Adam Birds <adam.birds@adbwebdesigns.co.uk>	2021-05-25 13:46:58 -07:00
Alex Vandiver	9c652eb16b	db: Use the pre-computed values from settings. Rather than duplicate logic from `computed_settings`, use the values that were computed therein. Co-authored-by: Adam Birds <adam.birds@adbwebdesigns.co.uk>	2021-05-25 13:46:58 -07:00
Alex Vandiver	94d7c29d92	db: Use the same codepath for cases (2) and (3). Using the second branch _only_ for case (3), of a PostgreSQL server on a different host, leaves it untested in CI. It also brings in an unnecessary Django dependency. Co-authored-by: Adam Birds <adam.birds@adbwebdesigns.co.uk>	2021-05-25 13:46:58 -07:00
Alex Vandiver	add6971ad9	db: Make USING_PGROONGA logic clearer. We only need to read the `zulip.conf` file to determine if we're using PGROONGA if we are on the PostgreSQL machine, with no access to Django. Co-authored-by: Adam Birds <adam.birds@adbwebdesigns.co.uk>	2021-05-25 13:46:58 -07:00
Alex Vandiver	75bf19c9d9	db: Combine the `if "host" in pg_args` stanza with earlier clause. The only way in which "host" could be set is in cases (1) or (2), when it was potentially read from Django's settings. In case (3), we already know we are on the same host as the PostgreSQL server. This unifies the two separated checks, which are actually the same check. Co-authored-by: Adam Birds <adam.birds@adbwebdesigns.co.uk>	2021-05-25 13:46:58 -07:00
Alex Vandiver	67fc8e84ea	db: Clarify the 3 different cases that process_fts_updates must support. Co-authored-by: Adam Birds <adam.birds@adbwebdesigns.co.uk>	2021-05-25 13:46:58 -07:00
Alex Vandiver	116e41f1da	puppet: Move files out and back when mounting /srv. Specifically, this affects /srv/zulip-aws-tools.	2021-05-23 13:29:23 -07:00
Alex Vandiver	ea98549e88	puppet: Always install linux-image-virtual, for ksplice support.	2021-05-23 13:29:23 -07:00
Alex Vandiver	0b1dd27841	puppet: AWS mounts its extra disks with inconsistent names. It is now /dev/nvme1n1, not /dev/nvme0n1; but it always has a consistent major/minor node. Source the file that defines these.	2021-05-23 13:29:23 -07:00
Alex Vandiver	82797dd53c	settings: Standardize the name of the deliver_scheduled_messages logs. This makes it match its command name, and other logfile name.	2021-05-18 12:39:28 -07:00
Alex Vandiver	343a1396af	puppet: Rename logfile for deliver_scheduled_messages to be consistent.	2021-05-18 12:39:28 -07:00
Alex Vandiver	ef6d0ec5ca	puppet: Only run deliver_scheduled_messages and _emails on one server. `deliver_scheduled_emails` and `deliver_scheduled_messages` use the `ScheduledEmail` and `ScheduledMessage` tables as a queue, effectively, pulling values off of them. As noted in their comments, this is not safe to run on multiple hosts at once. As such, split out the supervisor files for them.	2021-05-18 12:39:28 -07:00
Alex Vandiver	033a96aa5d	puppet: Fix check_ssl_certificate check to check named host, not self.	2021-05-17 18:38:30 -07:00
Alex Vandiver	a2b7a5ef4b	puppet: Clarify 20m keepalive time from the LB is a max; it can be less.	2021-05-17 14:56:51 -07:00
Alex Vandiver	66a232e303	smokescreen: Bump version of Go and Smokescreen. Move version pins to the latest versions of Go and Smokescreen.	2021-05-12 10:08:42 -10:00
Alex Vandiver	feb7870db7	puppet: Adjust thresholds on autovac_freeze. These thresholds are in relationship to the `autovacuum_freeze_max_age`, not the XID wraparound, which happens at 2^31-1. As such, it is perfectly normal that they hit 100%, and then autovacuum kicks in and brings it back down. The unusual condition is that PostgreSQL pushes past the point where an autovacuum would be triggered -- therein lies the XID wraparound danger. With the `autovacuum_freeze_max_age` set to 2000000000 in `postgresql.conf`, XID wraparound happens at 107.3%. Set the warning and error thresholds to below this, but above 100% so this does not trigger constantly.	2021-05-11 17:11:47 -07:00
Alex Vandiver	0f1611286d	management: Rename the deliver_email command to deliver_scheduled_email. This makes it parallel with deliver_scheduled_messages, and clarifies that it is not used for simply sending outgoing emails (e.g. the `email_senders` queue). This also renames the supervisor job to match.	2021-05-11 13:07:29 -07:00
Anders Kaseorg	544bbd5398	docs: Fix capitalization mistakes. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-05-10 09:57:26 -07:00
Tim Abbott	ad0be6cea1	puppet: Remove thumbor.conf nginx configuration. This was missing in `405bc8dabf`.	2021-05-07 16:57:29 -07:00
Anders Kaseorg	9d57fa9759	puppet: Use pgrep -x to avoid accidental matches. Matching the full process name (-x without -f) or full command line (-xf) is less prone to mistakes like matching a random substring of some other command line or pgrep matching itself. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-05-07 08:54:41 -07:00
Anders Kaseorg	405bc8dabf	requirements: Remove Thumbor. Thumbor and tc-aws have been dragging their feet on Python 3 support for years, and even the alphas and unofficial forks we’ve been running don’t seem to be maintained anymore. Depending on these projects is no longer viable for us. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-05-06 20:07:32 -07:00
Alex Vandiver	eda9ce2364	locale: Use `C.UTF-8` rather than `en_US.UTF-8`. The `en_US.UTF-8` locale may not be configured or generated on all installs; it also requires that the `locales` package be installed. If users generate the `en_US.UTF-8` locale without adding it to the permanent set of system locales, the generated `en_US.UTF-8` stops working when the `locales` package is updated. Switch to using `C.UTF-8` in all cases, which is guaranteed to be installed. Fixes #15819.	2021-05-04 08:51:46 -07:00
Alex Vandiver	ddb9d16132	puppet: Install procps, for pgrep. In puppet, we use pgrep in the collection stage, to see if rabbitmq is running. Sufficiently bare-bones systems will not have `procps` (which provides `pgrep`) installed yet, which makes the install abort when running `puppet` for the first time. Just installing the `procps` package in Puppet is insufficient, because the check in the `unless` block runs when Puppet is determining which resources it needs to instantiate, and in what order; any package installation has yet to happen. As `erlang-base` (which provides `epmd`) happens to have a dependency of `procps`, any system without `pgrep` will also not have `epmd` installed or running. Regardless, it is safe to run `epmd -daemon` even if one is already running, as the comment above notes.	2021-05-03 14:48:52 -07:00
Alex Vandiver	3577c6dbd4	puppet: `pgrep -f something` can match itself. Using `pgrep -f epmd` to determine if `epmd` is running is a race condition with itself, since the pgrep is attempting to match the "full process name" and its own full process name contains "epmd". This leads to epmd not being started when it should be, which in turn leads to rabbitmq-server failing to start. Use the standard trick for this, namely a one-character character class, to prevent self-matching.	2021-05-03 14:48:52 -07:00
Jennifer Hwang	c9f5946239	puppet: Add override for queue_workers_multiprocess. With tweaks to the documentation by tabbott. This uses the following configuration option: [application_server] queue_workers_multiprocess = false	2021-04-20 14:37:15 -07:00
Tim Abbott	bb676f1143	smokescreen: Move supervisor configuration to managed directory. We've established the conf.d/zulip directory as the recommended path for Zulip-managed configuration files, so this belongs there.	2021-04-16 14:05:42 -07:00
Gaurav Pandey	303e7b9701	ci: Add Debian bullseye to production test suite.	2021-04-15 21:38:31 -07:00
Gaurav Pandey	feb720b463	install: Add beta support for debian bullseye for production. This won't work on a real bullseye system until Bullseye actually officially releases. Fixes part of #17863.	2021-04-15 21:38:31 -07:00
Alex Vandiver	9de35d98d3	puppet: Ensure a snakeoil certificate, for Postfix and PostgreSQL. We use the snakeoil TLS certificate for PostgreSQL and Postfix; some VMs install the `ssl-cert` package but (reasonably) don't build the snakeoil certs into the image. Build them as needed. Fixes #14955.	2021-04-15 21:37:55 -07:00
Anders Kaseorg	b01d43f339	mypy: Fix strict_equality violations. puppet/zulip/files/nagios_plugins/zulip_postgresql/check_postgresql_replication_lag:98: error: Non-overlapping equality check (left operand type: "List[List[str]]", right operand type: "Literal[0]") [comparison-overlap] zerver/tests/test_realm.py:650: error: Non-overlapping container check (element type: "Dict[str, Any]", container item type: "str") [comparison-overlap] Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-04-13 09:18:18 -07:00
Alex Vandiver	93f3b41811	puppet: Also move avatars to the same nginx include file.	2021-04-09 08:28:42 -07:00
Alex Vandiver	aae8f454ce	puppet: Simplify uploads handling. `uploads-route.noserve` and `uploads-route.internal` contained identical location blocks for `/upload`, since differentiation was necessary for Trusty until 33c941407b72; move the now-common sections into `app`. This leaves the only differences between internal and S3 serving as a single block which should be included or not based on config; move it to a file which may or may not be placed in `app.d/`.	2021-04-09 08:28:42 -07:00
Alex Vandiver	fb26c6b7ca	puppet: Move uwsgi_pass setting into uwsgi_params. We only ever call `uwsgi_pass django` in association with `include uwsgi_params`; refactor it in.	2021-04-09 08:28:42 -07:00
Alex Vandiver	9cf9d5f2cf	puppet: Move HTTP_X_REAL_IP setting into uwsgi_params. This effectively also adds it to serving `/user_uploads`, where its lack would cause failures to list the actual IP address.	2021-04-09 08:28:42 -07:00
Alex Vandiver	795517bd52	puppet: Only set X-Real-IP once. `07779ea879` added an additional `proxy_set_header` of `X-Real-IP` to `puppet/zulip/files/nginx/zulip-include-common/proxy`; as noted in that commit, Tornado longpoll proxies already included such a line. Unfortunately, this equates to setting that header _twice_ for Tornado ports, like so: ``` X-Real-Ip: 198.199.116.58 X-Real-Ip: 198.199.116.58 ``` ...which is represented, once parsed by Django, as an IP of `198.199.116.58, 198.199.116.58`. For IPv4, this odd "IP address" has no problems, and appears in the access logs accordingly; for IPv6 addresses, however, its length is such that it overflows a call to `getaddrinfo` when attempting to determine the validity of the IP. Remove the now-duplicated inclusion of the header.	2021-04-09 08:28:42 -07:00
Alex Vandiver	07779ea879	middleware: Do not trust X-Forwarded-For; use X-Real-Ip, set from nginx. The `X-Forwarded-For` header is a list of proxies' IP addresses; each proxy appends the remote address of the host it received its request from to the list, as it passes the request down. A naïve parsing, as SetRemoteAddrFromForwardedFor did, would thus interpret the first address in the list as the client's IP. However, clients can pass in arbitrary `X-Forwarded-For` headers, which would allow them to spoof their IP address. `nginx`'s behavior is to treat the addresses as untrusted unless they match an allowlist of known proxies. By setting `real_ip_recursive on`, it also allows this behavior to be applied repeatedly, moving from right to left down the `X-Forwarded-For` list, stopping at the right-most that is untrusted. Rather than re-implement this logic in Django, pass the first untrusted value that `nginx` computes down into Django via `X-Real-Ip` header. This allows consistent IP addresses in logs between `nginx` and Django. Proxied calls into Tornado (which don't use UWSGI) already passed this header, as Tornado logging respects it.	2021-03-31 14:19:38 -07:00
Anders Kaseorg	29e4c71ec4	puppet: Reformat custom Ruby modules with Rufo. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-03-24 12:12:04 -07:00
Alex Vandiver	6ee74b3433	puppet: Check health of APT repository.	2021-03-23 19:27:42 -07:00
Alex Vandiver	c01345d20c	puppet: Add nagios check for long-lived certs that do not auto-renew.	2021-03-23 19:27:27 -07:00
Alex Vandiver	9ea86c861b	puppet: Add a nagios alert configuration for smokescreen. This verifies that the proxy is working by accessing a highly-available website through it. Since failure of this equates to failures of Sentry notifications and Android mobile push notifications, this is a paging service.	2021-03-18 10:11:15 -07:00
Anders Kaseorg	129ea6dd11	nginx: Consistently listen on IPv6 and with HTTP/2. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-03-17 17:46:32 -07:00
Alex Vandiver	15c58cce5a	puppet: Create new nginx logfiles as the zulip user, not as www-data. All of `/var/log/nginx/` is chown'd to `zulip` and the nginx processes themselves run as `zulip`, and would thus (on their own) create new logfiles as `zulip`. Having `logrotate` create them as the package default of `www-data` means that they are momentarily unreadable by the `zulip` user just after rotation, which can cause problems with logtail scripts. Commit the standard `nginx` logrotate configuration, but with the `zulip` user instead of the `www-data` user.	2021-03-16 14:45:13 -07:00
Alex Vandiver	3314fefaec	puppet: Do not require a venv for zulip-puppet-apply. `0663b23d54` changed zulip-puppet-apply to use the venv, because it began using `yaml` to parse the output of puppet to determine if changes would happen. However, not every install ends with a venv; notably, non-frontend servers do not have one. Attempting to run zulip-puppet-apply on them hence now fails. Remove this dependency on the venv, by installing a system python3-yaml package -- though in reality, this package is already an indirect dependency of the system. Especially since pyyaml is quite stable, we're not using it in any interesting way, and it does not actually add to the dependencies, it is preferable to parsing the YAML by hand in this instance.	2021-03-14 17:50:57 -07:00
Alex Vandiver	52f155873f	puppet: Ensure that all `scripts/lib/install` packages are installed. These have all been required packages for some time, but this helps keep the install-time list more clearly a subset of the upgrade-time list.	2021-03-14 17:50:57 -07:00
Alex Vandiver	06c07109e4	puppet: Add missing semicolons left off in `ba3b88c81b`.	2021-03-12 15:48:53 -08:00
Alex Vandiver	024282b51e	Revert "puppet: Use rabbitmq as the user for its config files." This reverts commit `211232978f`. The `rabbitmq` user does not exist yet on first install, and the goal is to create the `rabbitmq-env.conf` file before the package is installed.	2021-03-12 15:37:19 -08:00
Alex Vandiver	ba3b88c81b	puppet: Explicitly use the snakeoil certificates for nginx. In production, the `wildcard-zulipchat.com.combined-chain.crt` file is just a symlink to the snakeoil certificates; but we do not puppet that symlink, which makes new hosts fail to start cleanly. Instead, point explicitly to the snakeoil certificate, and explain why.	2021-03-12 13:31:54 -08:00
Alex Vandiver	211232978f	puppet: Use rabbitmq as the user for its config files. This matches the initial ownership by the `rabbitmq-server` package.	2021-03-12 13:31:03 -08:00
Alex Vandiver	ef188af82d	puppet: Use two location blocks, instead of nesting them. Directives in `location` blocks may or may not inherit from surrounding `location` blocks; specifically, `add_header` directives do not[1]: > There could be several add_header directives. These directives are > inherited from the previous configuration level if and only if there > are no add_header directives defined on the current level. In order to maintain the same headers (including, critically, `Access-Control-Allow-Origin`) as the surrounding block, all `add_header` directives must thus be repeated (which includes the `include`). For clarity, un-nest and repeat the entire `location` block as was used for `/static/`, but with the additional `add_header`. This is preferred to the use of an `if $request_uri` statement to add the header, as those can have unexpected or undefined results[2]. [1] http://nginx.org/en/docs/http/ngx_http_headers_module.html#add_header [2] https://www.nginx.com/resources/wiki/start/topics/depth/ifisevil/	2021-03-11 21:09:15 -08:00
Alex Vandiver	306bf930f5	puppet: Add a warning if ksplice is enabled but has no key set.	2021-03-10 17:57:20 -08:00
Alex Vandiver	a215c83c2d	puppet: Switch to more explicit variable rather than reuse a nagios one. Redis is not nagios, and this only leads to confusion as to why there is a nagios domain setting on frontend servers; it also leaves the `redis0` part of the name buried in the template. Switch to an explicit variable for the redis hostname.	2021-03-10 11:44:54 -08:00
Alex Vandiver	a5b29398fc	puppet: Only install ksplice uptrack if there is an access key.	2021-03-10 11:44:11 -08:00
Alex Vandiver	189e86e18e	puppet: Set aggressive caching headers on immutable webpack files. A partial fix for #3470.	2021-03-07 22:00:32 -08:00
Alex Vandiver	e63f170027	puppet: Add access time and host to nginx access logs. `2e20ab1658` attempted to add this; but there are multiple locations that access logs are set, and the most specific wins.	2021-03-04 18:06:47 -08:00
Alex Vandiver	8961885b0f	puppet: Add smokescreen to logrotate.	2021-03-02 17:16:38 -08:00
Alex Vandiver	d938dd9d4a	puppet: Document smokescreen installation, and move to puppet/zulip/. This is more broadly useful than for just Kandra; provide documentation and means to install Smokescreen for stand-alone servers, and motivate its use somewhat more.	2021-03-02 17:16:38 -08:00
Alex Vandiver	2f5eae5c68	puppet: Minor formatting.	2021-02-28 17:03:29 -08:00
Alex Vandiver	a759d26a32	puppet: Make ksplice config not world-readable, use 'adm' group. This matches the configuration that ksplice itself creates the file and directory with.	2021-02-28 17:03:29 -08:00
Tim Abbott	957c16aa77	nagios: Tweak prod load monitoring parameters. Ultimately this monitoring isn't that helpful, but we're mainly interested in when it spikes to very high numbers.	2021-02-26 08:39:52 -08:00
Alex Vandiver	32149c6a1c	puppet: Add ksplice uptrack for kernel hotpatches.	2021-02-25 18:05:47 -08:00
Alex Vandiver	173d2dec3d	puppet: Check in defensive restart-camo cron job. This was found on lb1; add it to the camo install on smokescreen.	2021-02-24 16:42:21 -08:00
Alex Vandiver	d15e6990e5	puppet: Only execute setup-apt-repo if necessary. This means that in steady-state, `zulip-puppet-apply` is expected to produce no changes or commands to execute. The verification step of `setup-apt-repo` is quite fast, so this cleans up the output for very little cost.	2021-02-23 18:16:02 -08:00
Alex Vandiver	0b736ef4cf	puppet: Remove puppet_ops configuration for separate loadbalancer host.	2021-02-22 16:05:13 -08:00
Alex Vandiver	e30b524896	iptables: Limit smokescreen port 4750, add camo port. Limit incoming connections to port 4750 to only the smokescreen host, and also allow access to the Camo server on that host, on port 9292.	2021-02-17 13:52:38 -08:00
Alex Vandiver	1caff01463	puppet: Configure nginx for long keep-alives when behind a loadbalancer. These optimizations only make sense when all connections at a TCP level are coming from the same host or set of hosts; as such, they are only enabled if `loadbalancer.ips` is set in the `zulip.conf`.	2021-02-17 10:25:33 -08:00
Alex Vandiver	a88af1b5a2	camo: Install on smokescreen host.	2021-02-16 08:12:31 -08:00
Alex Vandiver	29f60bad20	smokescreen: Put the version into the supervisorctl command. This makes it reload correctly if the version is changed.	2021-02-16 08:12:31 -08:00
Anders Kaseorg	6e4c3e41dc	python: Normalize quotes with Black. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-02-12 13:11:19 -08:00
Anders Kaseorg	11741543da	python: Reformat with Black, except quotes. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-02-12 13:11:19 -08:00
Anders Kaseorg	5028c081cb	python: Merge concatenated string literals that Black would uglify. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-02-12 13:11:19 -08:00
Alex Vandiver	559cdf7317	puppet: Set APT::Periodic::Unattended-Upgrade in apt config. This is required for unattended upgrades to actually run regularly. In some distributions, it may be found in 20auto-upgrades, but placing it here makes it more discoverable.	2021-02-12 08:59:19 -08:00
Ganesh Pawar	65e23dd713	puppet: Add Zulip specific postgresql configuration for 13. Based on the work done in `a03e4784c7`.	2021-02-05 09:30:34 -08:00
Ganesh Pawar	90a3dc8a91	puppet: Add upstream version of postgresql 13 config. This is a prep commit to add provision support for Ubuntu 20.10 Groovy.	2021-02-05 09:30:34 -08:00
Tim Abbott	fd8504e06b	munin: Update to use NAGIOS_BOT_HOST. We haven't actively used this plugin in years, and so it was never converted from the 2014-era monitoring to detect the hostname. This seems worth fixing since we may want to migrate this logic to a more modern monitoring system, and it's helpful to have it correct.	2021-01-27 12:07:09 -08:00
Alex Vandiver	ab035f76de	puppet: Be more restrictive about mm addresses. These will always have only 32 characters after the `mm`.	2021-01-26 10:13:58 -08:00
Alex Vandiver	a53092687e	puppet: Only match incoming gateway address on our mail domain. `79931051bd` allows outgoing emails from localhost, but outgoing recipients are still subjected to virtualmaps. This caused all outgoing email from Zulip with destination addresses containing `.`, `+`, or starting with `mm`, to be redirected back through the email gateway. Bracket the virtualmap addresses used for local delivery to the mail gateway with a restriction on the domain matching the `postfix.mailname` configuration, regex-escaped, so those only apply to email destined for that domain. The hostname is _not_ moved from `mydestination` to `virtual_alias_domains`, as that would preclude delivery to actually-local addresses, like `postmaster@`.	2021-01-26 10:13:58 -08:00
Alex Vandiver	c2526844e9	worker: Remove SignupWorker and friends. ZULIP_FRIENDS_LIST_ID and MAILCHIMP_API_KEY are not currently used in production. This removes the unused 'signups' queue and worker.	2021-01-17 11:16:35 -08:00
Tim Abbott	4ee58f408b	process_fts_updates: Make normal development startup silent. We run this tool at DEBUG log level in production, so we will still see the notice on startup there; this avoids a spammy line in the development environment output.	2020-12-20 12:19:49 -08:00
Sutou Kouhei	0d3f9fc855	install: Use PGroonga packages built for PostgreSQL packages by PGDG. This is because we have always used PostgreSQL packages by PGDG since Zulip 3.0. Fixes #16058.	2020-12-18 15:38:21 -08:00
Alex Vandiver	4868a4fe48	puppet: Set a long timeout on wal-g wal-push, to prevent stalls. `wal-g wal-push` has a known bug with occasionally hanging after file upload to S3[1]; set a rather long timeout on the upload process, so that we don't simply stall forever when archiving WAL segments. [1] https://github.com/wal-g/wal-g/issues/656	2020-11-20 11:32:36 -08:00
Sourabh Rana	419f163906	nginx: Increase file upload size from 25mb to 80mb.	2020-11-19 00:49:49 -08:00
Alex Vandiver	90ca06d873	puppet: Allow unattended upgrades of -updates in addition to -security. This ensures that software will be fully up-to-date, not just with security patches.	2020-11-13 16:45:05 -08:00
Alex Vandiver	2e20ab1658	puppet: Log the "Host" header and total response time. Logging `Host` is useful for determining access patterns to realms, especially if ROOT_DOMAIN_LANDING_PAGE is set. Total response time is useful in debugging access and performance patterns.	2020-11-13 16:42:32 -08:00
Tim Abbott	494a685827	puppet: Fix typo in name of missedmessage_emails consumer. This has been present since this check was introduced in `45c9c3cc30`.	2020-10-29 12:28:54 -07:00
Tim Abbott	ab3cb2b3bf	puppet: Fix internal redis puppet configuration. The inherits rule is required for overriding existing configuration files; while the `::profile` piece was missed in the recent ::profile migration.	2020-10-29 11:53:43 -07:00
Alex Vandiver	6b9d7000b5	puppet: Set proxy environment variables. These are respected by `urllib`, and thus also `requests`. We set `HTTP_proxy`, not `HTTP_PROXY`, because the latter is ignored in situations which might be running under CGI -- in such cases it may be coming from the `Proxy:` header in the request.	2020-10-28 12:17:35 -07:00
Alex Vandiver	8b0f32ee07	puppet: Move environment-setting into configuration, not command.	2020-10-28 12:13:04 -07:00
Alex Vandiver	b9797770d3	provision: Rename backup directory to postgresql.	2020-10-28 11:57:03 -07:00
Alex Vandiver	1f7132f50d	docs: Standardize on PostgreSQL, not Postgres.	2020-10-28 11:55:16 -07:00
Alex Vandiver	eaa99359b1	puppet: Rename to check_postgresql_replication_lag.	2020-10-28 11:51:52 -07:00
Alex Vandiver	53e59a0a13	puppet: Rename check_postgres_backup to check_postgresql_backup.	2020-10-28 11:51:52 -07:00
Alex Vandiver	45f6c79c4a	puppet: Rename postgres_ variables to postgresql_.	2020-10-28 11:51:52 -07:00
Alex Vandiver	e124324050	puppet: Rename postgres_appdb in nagios to postgresql.	2020-10-28 11:51:52 -07:00
Alex Vandiver	a155430eb5	docs: Document all zulip.conf settings. This provides a single reference point for all zulip.conf settings; these mostly link out to the more complete documentation about each setting, elsewhere. Fixes #12490.	2020-10-27 13:31:57 -07:00
Alex Vandiver	e81bc19e45	puppet: Remove shims for old classes, except dockervoyager. The upgrade mechanism in the previous commit negates the need for them -- with the exception of dockervoyager.	2020-10-27 13:29:19 -07:00
Alex Vandiver	d24c571bab	puppet: Automatically back up the database if we have the secrets. This avoids folks having to manually add to the puppet_classes.	2020-10-27 13:29:19 -07:00
Alex Vandiver	e7798d2797	puppet: Move zulip_ops::profile::postgres_appdb to postgresql.	2020-10-27 13:29:19 -07:00
Alex Vandiver	9f25389bff	puppet: Move top-level zulip_ops deployments to zulip_ops::profile.	2020-10-27 13:29:19 -07:00
Alex Vandiver	5365af544a	puppet: Rename zulip::profile::rabbit to ::rabbitmq.	2020-10-27 13:29:19 -07:00
Alex Vandiver	188af57296	puppet: Rename postgres_appdb to postgresql. There is only one PostgreSQL database; the "appdb" is irrelevant. Also use "postgresql," as it is the name of the software, whereas "postgres" is the name of the binary and the colloquial name. This is minor cleanup, but enabled by the other renames in the previous commit.	2020-10-27 13:29:19 -07:00
Alex Vandiver	91cb0988e1	puppet: Generalize docker detection. This also has the benefit of detecting zulip::dockervoyager as well as zulip::profile::docker.	2020-10-27 13:29:19 -07:00
Alex Vandiver	0f25acc7b3	puppet: Rename "voyager"/"dockervoyager" to "standalone"/"docker". The "voyager" name is non-intuitive and not significant. `zulip::voyager` and `zulip::dockervoyager` stubs are kept for back-compatibility with existing `zulip.conf` files.	2020-10-27 13:29:19 -07:00
Alex Vandiver	c2185a81d6	puppet: Move top-level zulip deployments into "profile" directory. This moves the puppet configuration closer to the "roles and profiles method"[1] which is suggested for organizing puppet classes. Notably, here it makes clear which classes are meant to be able to stand alone as deployments. Shims are left behind at the previous names, for compatibility with existing `zulip.conf` files when upgrading. [1] https://puppet.com/docs/pe/2019.8/the_roles_and_profiles_method	2020-10-27 13:29:19 -07:00
Alex Vandiver	27cfb14d92	puppet: Only include zulip::base for top-level deploys. This also removes direct includes of `zulip::common`, making `zulip::base` gatekeep the inclusion of it. This helps enforce that any top-level deploy only needs to include a single class, and that any configuration which is not meant to be deployed by itself will not apply, due to lack of `zulip::common` include. The following commit will better differentiate these top-level deploys by moving them into a subdirectory.	2020-10-27 13:29:19 -07:00
Alex Vandiver	34e8c2c61e	puppet: Move total_memory_mb from zulip::base into zulip::common. This makes `zulip::common` used only for variable-setting, and `zulip::base` used only for resource creation.	2020-10-27 13:29:19 -07:00
Alex Vandiver	7bb888c2ec	puppet: Template supervisor.conf for redhat paths.	2020-10-27 13:29:19 -07:00
Alex Vandiver	3ab9b31d2f	puppet: Purge all un-managed supervisor configuration files. Relying on `defined(Class['...'])` makes the class sensitive to resource evaluation ordering, and thus brittle. It is also only functional for a single service (thumbor). Generalize by using `purge => true` for the directory to automatically remove all un-managed files. This is more general than the previous form, and may result in additional not-managed services being removed.	2020-10-27 13:29:19 -07:00
Alex Vandiver	1d54630b4e	log: Rename email-deliverer.log to match other files.	2020-10-25 14:56:37 -07:00
Alex Vandiver	93d661d119	puppet: Configure logrotate for all logger files. This adds log rotation to all /var/log/zulip files.	2020-10-25 14:56:37 -07:00
Alex Vandiver	c296b5d819	puppet: Allow unattended-upgrades for all but servers. Restarting servers is what can cause service interruptions, and increase risk. Add all of the servers that we use to the list of ignored packages, and uncomment the default allowed-origins in order to enable unattended upgrades.	2020-10-23 16:46:06 -07:00
Anders Kaseorg	72d6ff3c3b	docs: Fix more capitalization issues. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-10-23 11:46:55 -07:00
Alex Vandiver	a7d1fd9ffb	puppet: Remove non-working apt::source. `d2aa81858c` replaced the `apt::source` to set up debathena with `Exec['setup-apt-repo-debathena']`, but mistakenly left the `apt::source` in place in `zmirror` (but not `zmirror_personals`). The `apt::source` resource type was later removed in `c9d54f7854`, making the manifest fail to apply on `zmirror`. Remove the broken and unnecessary `apt::source` resource.	2020-10-23 11:31:20 -07:00
Alex Vandiver	48e06c25ba	puppet: Switch nagios SSH checks to id_ed25519 key. The ssh-rsa algorithm was deprecated[1] in OpenSSH 8.2 (2020-02-14) and will be removed in a future release. [1] https://www.openssh.com/txt/release-8.4	2020-10-22 16:42:30 -07:00
Alex Vandiver	0ea20bd7d8	puppet: Move postgres_version into postgres_common. This property is not related to the base zulip install; move it to zulip::postgres_common, which is already used as a namespace for various postgres variables.	2020-10-22 11:32:25 -07:00
Alex Vandiver	25e995b677	puppet: Move normal_queues to the one place that uses it.	2020-10-22 11:32:25 -07:00
Alex Vandiver	423b5c2be2	puppet: Move queue error and stats directories to just the app host.	2020-10-22 11:31:05 -07:00
Alex Vandiver	4d4c21499a	puppet: Move supervisor dependency into process_fts_updates. PostgreSQL itself has no dependency on supervisor; rather, the FTS updates do.	2020-10-22 11:30:53 -07:00
Alex Vandiver	ca971ebc59	puppet: Remove empty zulip_ops class.	2020-10-22 11:30:53 -07:00
Alex Vandiver	16af05758d	puppet: Move zulip_org into zulip_ops. This class is not of general interest.	2020-10-22 11:30:53 -07:00
Alex Vandiver	ad566c491d	puppet: Drop now-unused zulip_ops::git class.	2020-10-22 11:30:53 -07:00
Alex Vandiver	50e9e2ed20	puppet: Make zulip::base include zulip::apt_repository. There was likely more dependency complexity prior to `97766102df`, but there is now no reason to require that consumers explicitly include zulip::apt_repository.	2020-10-22 11:30:53 -07:00
Alex Vandiver	2dc6d26ec6	puppet: Fix included monitoring class name.	2020-10-19 22:30:20 -07:00
Alex Vandiver	7a1132d605	puppet: Switch golang and smokescreen to use /srv. /srv and /opt have very similar usages; but we should be internally consistent. Move these two (the only usages of /opt) to match the rest in /srv.	2020-10-16 13:00:06 -07:00
Alex Vandiver	78b92a51cc	puppet: Allow access to smokescreen port via iptables.	2020-10-15 15:18:35 -07:00
Alex Vandiver	0d5356969e	puppet: Reformat ipv4 iptables rules comments.	2020-10-15 15:18:35 -07:00
Alex Vandiver	fffea9612b	puppet: Add an outgoing HTTP/HTTPS proxy server. Use https://github.com/stripe/smokescreen to provide a server for an outgoing proxy, run under supervisor. This will allow centralized blocking of internal metadata IPs, localhost, and so forth, as well as providing default request timeouts (10s by default).	2020-10-15 15:18:35 -07:00
Anders Kaseorg	dfaea9df65	shfmt: Reformat shell scripts with shfmt. https://github.com/mvdan/sh Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-10-15 15:16:00 -07:00
Alex Vandiver	f61ac4a28d	puppet: Move frontend monitoring into its own file. This allows it to be pulled in for deploys like czo, which don't use the full `zulip_ops::app_frontend`, but we wish to monitor.	2020-10-13 17:37:32 -07:00
Tim Abbott	7c2c82b190	nginx: Update nginx configuration for fhir/hl7 organization. We should eventually add templating for the set of hosts here, but it's worth merging this change to remove the deleted hostname and replace it with the current one.	2020-10-13 16:50:26 -07:00
Anders Kaseorg	723d285e46	nginx: Redirect {www.,}zulipchat.com, www.zulip.com to zulip.com. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-10-13 16:49:23 -07:00
Alex Vandiver	c8df9a150e	puppet: Drop all log2zulip configuration. Disabled on webservers in `047817b6b0`, it has since lingered in configuration, as well as running (to no effect) every minute on the loadbalancer. Remove the vestiges of its configuration.	2020-10-13 11:00:50 -07:00
Alex Vandiver	b431b1b021	puppet: Remove misleading motd. This banner shows on lb1, advertising itself as lb0. There is no compelling reason for a custom motd, especially one which needs to be reconfigured for each host.	2020-10-13 11:00:36 -07:00
Alex Vandiver	45c9c3cc30	queue: Monitor user_activity queue, now that it has a consumer. Since this was using repeated individual get() calls previously, it could not be monitored for having a consumer. Add it in, by marking it of queue type "consumer" (the default), and adding Nagios lines for it. Also adjust missedmessage_emails to be monitored; it stopped using LoopQueueProcessingWorker in `5cec566cb9`, but was never added back into the set of monitored consumers.	2020-10-11 14:19:42 -07:00
Alex Vandiver	4fd7df4e8c	puppet: Remove absent of check-apns-tokens. This was marked as ensure absent in `d02101a401`, in v1.7.0 in 2017.	2020-09-29 18:17:08 -07:00
Alex Vandiver	872a349508	puppet: Remove absent of log2zulip. This was marked as ensure absent in `047817b6b0`, in v2.0.0 in 2018.	2020-09-29 18:17:08 -07:00
Alex Vandiver	0137772fdb	puppet: Remove absent of calculate-first-visible-message-id. This was marked as ensure absent in `dc7d44a245`, in v1.9.0 in 2018.	2020-09-29 18:17:08 -07:00
Alex Vandiver	966c8dc23d	puppet: Remove absent of email-mirror cron job. This was marked as ensure absent in `24f8492236`, in v1.3.0 in 2014.	2020-09-29 18:17:08 -07:00
Alex Vandiver	430d3b8554	puppet: Remove absent of libapache2-mod-wsgi. This was marked as ensure absent in `89b97e7480`, in v1.7.0 in 2017, though it did not take effect until `6e55aa2ce6`, in v1.9.0 in 2018.	2020-09-29 18:17:08 -07:00
Alex Vandiver	12085552d5	puppet: Tidy indentation.	2020-09-29 17:44:44 -07:00
Alex Vandiver	57d88eedd8	puppet: Only install rabbitmq cron jobs via zulip_ops. The rabbitmq cron jobs exist in order to call rabbitmqctl as root and write the output to files that nagios can consume, since nagios is not allowed to run rabbitmqctl. In systems which do not have nagios configured, these every-minute cron jobs add non-insignificant load, to no effect. Move their installation into `zulip_ops`. In doing so, also combine the cron.d files into a single file; this allows us to `ensure => absent` the old filenames, removing them from existing systems. Leave the resulting combined cron.d file in `zulip`, since it is still of general utility and note.	2020-09-29 17:44:44 -07:00
Alex Vandiver	79931051bd	puppet: Permit outgoing mail from postfix. The configuration change made in `1c17583ad5` only allowed delivery to those specific Zulip addresses. However, it also prevented the mailserver from being used as an outgoing email relay from Zulip, since all mail that passed through the mailserver (from any originator) was required to have a `RCPT TO` that matched those regexes. Allow mail originating from `mynetworks` to have arbitrary addresses in `RCPT TO`.	2020-09-25 15:09:27 -07:00
Alex Vandiver	36ea307fbf	puppet: Depend other changes on sharding.py validation. Use the validation of the tornado sharding config that `stage_updated_sharding` does, by depending on it. This ensures that we don't write out a supervisor or nginx config based on a bad (e.g. non-sequential) list of tornado ports.	2020-09-25 10:52:40 -07:00
Alex Vandiver	c0e240277b	tornado: Remove fingerprinting, write out .tmp files always. Fingerprinting the config is somewhat brittle -- it requires custom bootstrapping for old (fingerprint-less) configs, and may have false-positives. Since generating the config is lightweight, do so into the .tmp files, and compare the output to the originals to determine if there are changes to apply. In order to both surface errors, as well as notify the user in case a restart is necessary, we must run it twice. The `onlyif` functionality cannot show configuration errors to the user, only determine if the command runs or not. We thus run the command once, judging errors as "interesting" enough to run the actual command, whose failure will be verbose in Puppet and halt any steps that depend on it. Removing the `onlyif` would result in `stage_updated_sharding` showing up in the output of every Puppet run, which obscures the important messages it displays when an update to sharding is necessary. Removing the `command` (e.g. making it an `echo`) would result in removing the ability to report configuration errors. We thus have no choice but to run it twice; this is thankfully low-overhead.	2020-09-25 10:52:40 -07:00
Alex Vandiver	2a12fedcf1	tornado: Remove explicit tornado_processes setting; compute it. We can compute the intended number of processes from the sharding configuration. In doing so, also validate that all of the ports are contiguous. This removes a discrepancy between `scripts/lib/sharding.py` and other parts of the codebase about if merely having a `[tornado_sharding]` section is sufficient to enable sharding. Having behaviour which changes merely based on if an empty section exists is surprising. This does require that a (presumably empty) `9800` configuration line exist, but making that default explicit is useful. After this commit, configuring sharding can be done by adding to `zulip.conf`: ``` [tornado_sharding] 9800 = # default 9801 = other_realm ``` Followed by running `./scripts/refresh-sharding-and-restart`.	2020-09-18 15:13:40 -07:00
Alex Vandiver	f638518722	tornado: Move default production port to 9800. In development and test, we keep the Tornado port at 9993 and 9983, respectively; this allows tests to run while a dev instance is running. In production, moving to port 9800 consistently removes an odd edge case, when just one worker is on an entirely different port than if two workers are used.	2020-09-18 15:13:40 -07:00
Alex Vandiver	ff94254598	tornado: Log to files by port number. Without an explicit port number, the `stdout_logfile` values for each port are identical. Supervisor apparently decides that it will de-conflict this by appending an arbitrary number to the end: ``` /var/log/zulip/tornado.log /var/log/zulip/tornado.log.1 /var/log/zulip/tornado.log.10 /var/log/zulip/tornado.log.2 /var/log/zulip/tornado.log.3 /var/log/zulip/tornado.log.7 /var/log/zulip/tornado.log.8 /var/log/zulip/tornado.log.9 ``` This is quite confusing, since most other files in `/var/log/zulip/` use `.1` to mean logrotate was used. Also note that these are not all sequential -- 4, 5, and 6 are mysteriously missing, though they were used in previous restarts. This can make it extremely hard to debug logs from a particular Tornado shard. Give the logfiles a consistent name, and set them up to logrotate.	2020-09-14 22:17:51 -07:00
Alex Vandiver	efdaa58c24	supervisor: Use more specific process_name than "port-9800". Making this include "zulip-tornado" makes it clearer in supervisor logs. Without this, one only sees: ``` 2020-09-14 03:43:13,788 INFO waiting for port-9807 to stop 2020-09-14 03:43:14,466 INFO stopped: port-9807 (exit status 1) 2020-09-14 03:43:14,469 INFO spawned: 'port-9807' with pid 24289 2020-09-14 03:43:15,470 INFO success: port-9807 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs) ```	2020-09-14 22:17:51 -07:00
Alex Vandiver	e9d0bdea65	puppet: Coerce uwsgi_listen_backlog_limit into an int before doing math.	2020-09-14 21:22:13 -07:00
Alex Vandiver	8adf530400	puppet: Generate sharding in puppet, then refresh-sharding-and-restart. This supports running puppet to pick up new sharding changes, which will warn of the need to finalize them via `refresh-sharding-and-restart`, or simply running that directly.	2020-09-14 16:27:15 -07:00
Alex Vandiver	0de356c2df	puppet: Move generation of tornado nginx upstreams into tornado_sharding. This puts the creation of the upstreams referenced by `nginx_sharding.conf` adjacent to their use.	2020-09-14 16:27:15 -07:00
Alex Vandiver	bf029d99f1	sharding: Also mark sharding.json 644 for consistency. There is no reason to limit this to 640; mark it 644 for consistency with the other file.	2020-09-14 16:27:15 -07:00
Alex Vandiver	1c17583ad5	puppet: Restrict postfix incoming addresses to postmaster and zulip. This removes the possibility of local user enumeration via RCPT TO.	2020-09-11 18:49:22 -07:00
Alex Vandiver	482c964dd3	puppet: Logrotate for webhook exceptions.	2020-09-10 17:47:21 -07:00
Alex Vandiver	e38051736d	puppet: Wrap and sort logrotate config.	2020-09-10 17:47:21 -07:00
Anders Kaseorg	75c59a820d	python: Convert subprocess.Popen.communicate to run or check_output. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-09-03 17:42:35 -07:00
Anders Kaseorg	fbfd4b399d	python: Elide action="store" for argparse arguments. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-09-03 16:17:14 -07:00
Anders Kaseorg	1f2ac1962f	python: Elide default=None for argparse arguments. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-09-03 16:17:14 -07:00
Anders Kaseorg	d751e0cece	puppet: Don’t install netcat. It’s been unused since commit `0af22dad18` (#13239). Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-09-03 10:33:47 -07:00
Anders Kaseorg	ab120a03bc	python: Replace unnecessary intermediate lists with generators. Mostly suggested by the flake8-comprehensions plugin. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-09-02 11:15:41 -07:00
Anders Kaseorg	a5dbab8fb0	python: Remove redundant dest for argparse arguments. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-09-02 11:04:10 -07:00
Anders Kaseorg	dbdf67301b	memcached: Switch from pylibmc to python-binary-memcached. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-08-06 12:51:14 -07:00
Casper Kvan Clausen	ed7a6d5e4d	puppet: Support nginx_listen_port with http_only	2020-08-03 12:58:12 -07:00
Alex Vandiver	cd530d627b	uwsgi: Stop generating IOError and SIGPIPE on client close. Clients that close their socket to nginx suddenly also cause nginx to close its connection to uwsgi. When uwsgi finishes computing the response, it thus tries to write to a closed socket, and generates either IOError or SIGPIPE failures. Since these are caused by the _client_ closing the connection suddenly, they are not actionable by the server. At particularly high volumes, this could represent some sort of server-side failure; however, this is better detected by examining status codes at the loadbalancer. nginx uses the error code 499 for this occurrence: https://httpstatuses.com/499 Stop uwsgi from generating this family of exception entirely, using configuration for uwsgi[1]; it documents these errors as "(annoying)," hinting at their general utility. [1] https://uwsgi-docs.readthedocs.io/en/latest/Options.html#ignore-sigpipe	2020-07-31 10:40:09 -07:00
Alex Vandiver	ceb909dbc5	puppet: Increase backlogged socket count based on uwsgi backlog. Increasing the uwsgi listen backlog is intended to allow it to handle higher connection rates during server restart, when many clients may be trying to connect. The kernel, in turn, needs to have a proportionally increased somaxconn so as to not refuse the connection. Set somaxconn to 2x the uwsgi backlog, but no lower than the default (128).	2020-07-28 21:16:26 -07:00
Alex Vandiver	38d01cd4db	puppet: Generalize install-wal-g to be arbitrary tarballs.	2020-07-24 17:24:57 -07:00
Tim Abbott	5a1243db3c	puppet: Use correct scope for zulip_ops::munin_plugin.	2020-07-15 21:49:45 -07:00
Alex Vandiver	48c3c33d10	puppet: Fully-qualify the munin-plugin name	2020-07-14 17:58:51 -07:00
Alex Vandiver	c68333040b	puppet: Revert PostgreSQL setting of recovery_target_timeline. Prior to PostgreSQL 12, the `recovery_target_timeline` setting is only valid in a `recovery.conf` file, as that file has its own configuration parser. As such, including it in `postgresql.conf` results in an error, and PostgreSQL will fail to start. Remove the setting, reverting `bff3b540b1`. This fixes PostgreSQL 9.5, 9.6, 10, and 11; while the setting is not an error in a PostgreSQL 12 configuration file, it is unnecessary since `latest` is the default.	2020-07-14 16:28:20 -07:00
Alex Vandiver	31d80a77d4	puppet: Update nagios check_postgres_replication_lag to be on DB hosts `7d4a370a57` attempted to move the replication check to run on the PostgreSQL hosts. While it updated the _check_ to assume it was running and talking to a local PostgreSQL instance, the configuration and installation for the check were not updated. As such, the check ran on the nagios host for each DB host, and produced no output. Start distributing the check to all appdb hosts, and configure nagios to use the SSH tunnel to get there.	2020-07-14 16:27:18 -07:00
Alex Vandiver	2174db27db	puppet: Put the dependencies on pg_backup_and_purge itself, and ensure them.	2020-07-14 00:40:25 -07:00
Alex Vandiver	6c27f07c1d	puppet: Move PostgreSQL backups to their own class. wal-g was used in `puppet/zulip` by env-wal-g, but only installed in `puppet/zulip_ops`. Merge all of the dependencies of doing backups using wal-g (wal-g installation, the pg_backup_and_purge job, the nagios plugin that verifies it happens) into a common base class in `puppet/zulip`, since it is generally useful.	2020-07-14 00:40:25 -07:00
Anders Kaseorg	15483c09cb	puppet: Add missing trailing commas. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-07-13 15:36:06 -07:00
Alex Vandiver	3691a94efe	puppet: Configure munin and nagios under apache with puppet. This swaps in the actually-in-use munin configuration file; otherwise, it is an implementation of the configuration as it exists on the machine.	2020-07-13 13:23:11 -07:00
Alex Vandiver	4e42164b4a	munin: Add plugins to prod hosts.	2020-07-13 13:23:11 -07:00
Alex Vandiver	2a14212b27	munin: Add a helper resource definition for munin plugins.	2020-07-13 12:49:28 -07:00
Alex Vandiver	7c7b5fcd6f	munin: Deal with spaces in the channel names.	2020-07-13 12:49:28 -07:00
Alex Vandiver	eda2c4b8e2	puppet: Split munin-node from munin-server. No plugins are installed inside the /usr/local/munin/lib this creates in munin-node, nor are they symlinked into /etc/munin/plugins, so no non-default plugins are added by this.	2020-07-13 12:49:28 -07:00
Alex Vandiver	ddc7bb5a45	munin: Fix the path to check_send_receive_time.	2020-07-13 12:49:28 -07:00
Alex Vandiver	8be544e7eb	munin: Rename monitoring plugin to use zulip name, not humbug.	2020-07-13 12:49:28 -07:00
Alex Vandiver	1b3560af94	nagios: Stop assuming /api is where zulip client is. The api/ directory was removed in f9ba3cb60c; as that commit notes, we use the python-zulip-api module for that, added in `938597c5da`.	2020-07-13 12:49:28 -07:00
Mateusz Mandera	57d3ef42b8	puppet: Don't run thumbor services in production. Fixes #15649. Currently, no production services use thumbor; so, it makes sense to not run them in production systems.	2020-07-10 14:22:17 -07:00
Alex Vandiver	f0f29584aa	puppet: Add an arity count ("at least two") to zulipconf function.	2020-07-10 00:14:09 -07:00
Alex Vandiver	8cff27f67d	puppet: Pull hosts from zulip.conf, not hardcoded list. The one complexity is that hosts_fullstack are treated differently, as they are not currently found in the manual `hosts` list, and as such do not get munin monitoring.	2020-07-10 00:14:09 -07:00
Alex Vandiver	24383a5082	puppet: Rename hosts_domain so hosts_prefix can be grepped for.	2020-07-10 00:14:09 -07:00
Alex Vandiver	a4e7c7a27e	nagios: Remove check_memcached. check_memcached does not support memcached authentication even in its latest release (it’s in a TODO item comment, and that’s it), and was never particularly useful.	2020-07-10 00:12:48 -07:00
Anders Kaseorg	ebf7f4d0f6	zthumbor: Rename thumbor.conf to thumbor_settings.py. So we can apply all our lint checks to it. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-07-06 18:44:58 -07:00
Anders Kaseorg	9900298315	zthumbor: Remove Python 2 residue. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-07-06 18:44:58 -07:00
Alex Vandiver	17002f2a0e	puppet: Allow passing an alternate config path to zulip-puppet-apply. When temporary configuration changes are desired, this lets one set up an alternate `zulip.conf` to apply while leaving the true one in place.	2020-07-06 18:30:16 -07:00
Alex Vandiver	64b44a12f5	puppet: Add an exec rule to reload the whole supervisor config. When supervisor is first installed, it is started automatically, and creates the socket, owned by root. Subsequent reconfiguration in puppet only calls `reread + update`, which is insufficient to apply the `chown = zulip:zulip` line in `supervisord.conf`, leaving the socket owned by `root` and the last part of the installation unable to restart `supervisor` services as the `zulip` user. The `chown` line in `scripts/lib/install` exists to paper over this. Add a separate exec target for changes to `supervisord.conf` itself, which restarts the full service. This leaves the default `restart` action on the service for the lightweight `reread + update` action, which is more common. We use `systemctl` only on redhat-esque builds, because CI runs Ubuntu, but init is not systemd in that context. `systemctl reload` is sufficient to re-apply the socket ownership, but a full `restart` and not `reload` is necessary under `/etc/init.d/supervisor`.	2020-07-01 10:40:54 -07:00
Alex Vandiver	dd91f8edba	puppet: Move supervisor start command into zulip::common. Move this command alongside the rest of the distro-dependent supervisor paths.	2020-07-01 10:40:53 -07:00
Alex Vandiver	a5d63cfedf	wal-g: Update pg_backup_and_purge for wal-g format. wal-g has a slightly different format than wal-e in its `backup-list` output; it only contains three columns: - `name` - `last_modified` - `wal_segment_backup_start` ...rather than wal-e's plethora, most of which were blank: - `name` - `last_modified` - `expanded_size_bytes` - `wal_segment_backup_start` - `wal_segment_offset_backup_start` - `wal_segment_backup_stop` - `wal_segment_offset_backup_stop` Remove one argument from the split.	2020-06-29 17:17:26 -07:00
Alex Vandiver	a21a086f5c	puppet: nagios-plugins-basic is replaced by monitoring-plugins-basic. In Bionic, nagios-plugins-basic is a transitional package which depends on monitoring-plugins-basic. In Focal, it is a virtual package, which means that every time puppet runs, it tries to re-install the nagios-plugins-basic package. Switch all instances to referring to `$zulip::common::nagios_plugins`, and repoint that to monitoring-plugins-basic.	2020-06-29 14:58:01 -07:00
Alex Vandiver	6fdcb4aa17	puppet: Move supervisor conf file path into zulip::common. Move this config file alongside the rest of the distro-dependent paths.	2020-06-29 13:41:05 -07:00
Alex Vandiver	93401448b9	puppet: Explain value of reload && update trick for supervisor. While the stock reload works just fine, it causes too much disruption.	2020-06-29 13:39:09 -07:00
Alex Vandiver	d2de5aced8	puppet: Remove unnecessary supervisor service name variable.	2020-06-29 13:39:09 -07:00
Alex Vandiver	73805f8279	puppet: Stop removing file that contains only comments. In modern PostgreSQL, this file, provided by `postgresql-common`, has no non-comment, non-blank lines. There's hence no reason to remove it.	2020-06-29 13:37:42 -07:00
Alex Vandiver	6e3a424921	puppet: Install the latest postgresql-client on frontend hosts. Frontend hosts in multiple-host configurations (including docker hosts) need a `psql` binary installed. `ca9d27175b` switched to not setting `postgresql.version` in `zulip.conf`, which in turn means that `$zulip::base::postgres_version` is unset. This, in turn, led to the frontend hosts installing `postgresql-client-`, whose trailing dash causes apt to _uninstall_ that package. Unconditionally install `postgresql-client` with no explicit version attached. This is a metapackage which depends on the latest client package, which currently means it will install `postgresql-client-12`. On single-host installs which have configured `postgresql.version` in `zulip.conf` to be a lower version, this will result in `postgresql-client-12` existing alongside another version (e.g. `postgresql-client-10`); `psql` will give the most recent. This is acceptable because the semantic meaning of the postgresql version in `zulip.conf` is about the database engine itself, not the command-line client.	2020-06-29 13:37:16 -07:00
Alex Vandiver	2c36bb19b2	puppet: Pull out `unzip` package which is identical in both cases.	2020-06-29 13:37:16 -07:00
Alex Vandiver	876ee4a8ed	installer: Remove code specific to stretch or xenial. Support for Xenial and Stretch was removed (`5154ddafca`, `0f4b1076ad`, `8944e0ad53`, `79acd5ae40`, `1219a2e854`), but not all codepaths were updated to remove their conditionals on it. Remove all code predicated on Xenial or Stretch. debathena support was migrated to Bionic, since that appears to be the current state of existing debathena servers.	2020-06-24 12:57:38 -07:00
Anders Kaseorg	a9e59b6bd3	memcached: Change the default MEMCACHED_USERNAME to zulip@localhost. This prevents memcached from automatically appending the hostname to the username, which was a source of problems on servers where the hostname was changed. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-19 21:22:30 -07:00
Alex Vandiver	7250d41bf7	puppet: Fix the path to install-wal-g	2020-06-17 15:23:18 -07:00
Alex Vandiver	03bffd3938	upgrade-zulip: Pin the postgres version to the OS default. We would prefer to use the postgres packages from Postgres themselves, if available. However, this requires ensuring that, for existing installs, we preserve the same version of postgres as their base distribution installed. Move the version-determination logic from being computed at puppet interpolation time, to being computed at install time and pinned into zulip.conf.	2020-06-16 17:05:46 -07:00
Tim Abbott	26396c5e25	puppet: Fix exceptions with multiple certbot declarations. Since `9e8f1aacb3`, zulip_ops machines might have two Package declarations for `certbot`, which doesn't work in puppet. The fix is, as usual, to use our `zulip::safepackage` wrapper instead.	2020-06-15 18:21:33 -07:00
Alex Vandiver	bff3b540b1	puppet: Postgres replication should always switch to latest timeline. Omission of this setting makes resuming after a primary switchover difficult-to-impossible. It is the default in PostgreSQL 12.	2020-06-15 16:18:07 -07:00
Alex Vandiver	f8fc3a16eb	puppet: Use "primary" / "replica" consistently in comments. The style guide for Zulip is to always use "primary" and "replica" when describing database replication. Adjust a few comments under `puppet/` that do not adhere to this. Unfortunately, some references still remain to the insensitive and inaccurate "master" / "slave" terminology. However, these are only in files which we are attempting to preserve as close to the upstream versions they are derived from (e.g. postgresql.conf, postfix/master.cf).	2020-06-15 16:18:07 -07:00
Alex Vandiver	5f433d6eeb	puppet: Remove vestigial check_postgres.pl. `65774e1c4f` switched from using the bundled check_postgres.pl to using the version from packages; the file itself remained, however. Remove it, and clean up references to it. Fixes #15389.	2020-06-15 16:18:07 -07:00
Alex Vandiver	7d4a370a57	puppet: Move monitoring of pg replication to the pg hosts. Instead of SSH'ing around to them, run directly on the database hosts. This means that the replicas do not know how many bytes behind they are in _receiving_ the WAL logs; thus, the monitoring also extends to the primary database, which knows that information for each replica. This also allows for detecting when there are too few active replicas.	2020-06-15 16:18:07 -07:00
Anders Kaseorg	5dc9b55c43	python: Manually convert more percent-formatting to f-strings. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-14 23:27:22 -07:00
Anders Kaseorg	74c17bf94a	python: Convert more percent formatting to Python 3.6 f-strings. Generated by pyupgrade --py36-plus. Now including %d, %i, %u, and multi-line strings. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-14 23:27:22 -07:00
Anders Kaseorg	1ed2d9b4a0	logging: Use logging.exception and exc_info for unexpected exceptions. logging.exception() and logging.debug(exc_info=True), etc. automatically include a traceback. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-14 23:27:22 -07:00
Tim Abbott	80589099d8	puppet: Fix typo in logic for whether to install certbot. Fixes #15372.	2020-06-14 16:04:39 -07:00
rht	89af2f381d	puppet: Link postgres dict symlinks to hunspell files on CentOS. This is a temporary measure until we can find the directory of postgresql dicts on CentOS.	2020-06-13 17:53:38 -07:00
rht	36a5ca5015	puppet: Add cyrus-sasl to memcached_packages on RedHat. This is to mirror the sasl2-bin package on Debian.	2020-06-13 17:49:51 -07:00
rht	e776d2d159	puppet: Abstract out owner:group of memcached-sasldb2.	2020-06-13 17:49:51 -07:00
Anders Kaseorg	91a86c24f5	python: Replace None defaults with empty collections where appropriate. Use read-only types (List ↦ Sequence, Dict ↦ Mapping, Set ↦ AbstractSet) to guard against accidental mutation of the default value. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-13 15:31:27 -07:00
Alex Vandiver	97b9308781	puppet: Merge multiple postgres roles in `zulip_ops`. All differences between the primary and replica roles having been merged, fold the `postgres_common`, `postgres_master`, and `postgres_slave` roles into just `postgres_appdb`.	2020-06-12 14:57:46 -07:00
Alex Vandiver	55bd31721d	puppet: Remove custom `vm.dirty_ratio` and `vm.dirty_background_ratio`. These values differed between the primary and secondary database hosts, for unclear reasons. The differences date back to their introduction in `387f63deaa`. As the comment in the replica configuration notes, settings of `vm.dirty_ratio = 10` and `vm.dirty_background_ratio = 5` matched the kernel defaults for "newer" kernels; however, kernel 2.6.30 bumped those to 20 and 10, respectively[1], as a fix for underlying logic now being more correct. Remove these overrides; they should at very least be consistent across roles, and the previous values look to be an attempt to tune for a very much older version of the Linux kernel, which was using a different, buggier, algorithm under the hood. [1] `1b5e62b42b`	2020-06-12 14:57:46 -07:00
Alex Vandiver	f39816e768	puppet: Stop distributing recovery.conf file. This file controls streaming replication, and recovery using wal-g on the secondary. The `primary_conninfo` data needs to change on short notice when database failover happens, in a way that is not suitable for being controlled by puppet. PostgreSQL 12, in fact, removes the use of the `recovery.conf` file[1]; the `primary_conninfo` and `restore_command` information goes into the main `postgresql.conf` file, and the standby status is controlled by the presence or absence of an empty `standby.signal` file. Remove the puppet control of the `recovery.conf` file. [1] https://pgstef.github.io/2018/11/26/postgresql12_preview_recovery_conf_disappears.html	2020-06-12 14:57:46 -07:00
Alex Vandiver	316498a169	puppet: Remove unnecessary nagios authentication setup. Since the nagios authentication is stored _in the database_, it is unnecessary to run if the database is simply a replica of the production database. The only case in which this statement would have an effect is if the postgres node contains a _different_ (or empty) database, which `setup_disks` now effectively prevents. Remove the unnecessary step.	2020-06-11 21:01:49 -07:00
Alex Vandiver	0774f54c1b	puppet: Move `setup_disks` to postgres_common. The tooling should now be run no matter if the node is a primary or replica.	2020-06-11 21:01:49 -07:00
Alex Vandiver	6f6a0e890a	puppet: Run setup_disks based on symlink; remove mdadm dependency. `481613a344` updated the `setup_disks` script to no longer reference `mdadm`, since we no longer set up RAID on servers. Update the puppet that would call it to remove the `mdadm` dependency, and run only if the state is not what it produces -- namely, a symlink for `/var/lib/postgresql`, which must point to an existent `/srv/postgresql` directory.	2020-06-11 21:01:49 -07:00
Alex Vandiver	1dc2de5026	puppet: Update setup-disks to be idempotent. The end state it produces is _either_: - `/srv/postgresql` already existed, which was symlinked into `/var/lib/postgresql`; postgres is left untouched. This is the situation if `setup_disks` is run on the database primary, or a replica which was correctly configured. - An empty `/srv/postgresql` now exists, symlinked into `/var/lib/postgresql`, and postgres is stopped. This is the situation if `puppet` was just run on a new host, or a previously-configured host was rebooted (clearing the temporary disk in `/dev/nvme0`) In the latter case, where `/srv/postgresql` is now empty, any previous contents of `/var/lib/postgresql` are placed under `/root`, timestamped for uniqueness. In either case, the tool should now be idempotent.	2020-06-11 21:01:49 -07:00
Alex Vandiver	8373f5f4b9	puppet: Make parent directories of postgresql.conf This fixes errors when provisioning a new system (or version of postgres) when the configuration file cannot be written because its parent directories do not exist. Files inherently depend on their containing directories, so no explicit dependencies are necessary.	2020-06-11 20:56:55 -07:00
Alex Vandiver	9fd7a026ad	puppet: Pull postgres data directory into postgres_appdb_base. The `pg_datadir` variable was only used, and accurate, for CentOS. Pull it out into `postgres_appdb_base`, broaden it to being accurate on Debian-based systems as well, and use it consistently in the templates.	2020-06-11 20:56:55 -07:00
Alex Vandiver	16c4cea951	puppet: Pull postgres config directory into postgres_appdb_base. As the previous commit, this is currently only used in tuning, but is a property of the whole postgres configuration; move it there, as just the directory, not the file. Use this directory consistently in the erb templates. Since we produce a `pg_hba.conf`, it makes sense that we point to the path that we know that we explicitly wrote to, for instance.	2020-06-11 20:56:55 -07:00

1496 Commits