zulip

Commit Graph

Author	SHA1	Message	Date
Alex Vandiver	eaaa2fbff8	nagios: Use canonical "hostgroup_name" consistently.	2022-06-22 12:07:38 -07:00
Alex Vandiver	e8996b53a5	nagios: Remove unused has_swap hostgroup.	2022-06-22 12:07:38 -07:00
Alex Vandiver	33472ee9ff	nagios: Remove unused stats host set.	2022-06-22 12:07:38 -07:00
Alex Vandiver	bc4f4b4862	nagios: Make the pageable/not/flaky tri-state clearer.	2022-06-22 12:07:38 -07:00
Alex Vandiver	c74f195fba	nagios: Split AWS and non-AWS hosts, for ntp checks. The non-AWS hosts cannot use the AWS ntp server for their check.	2022-06-22 12:07:38 -07:00
Alex Vandiver	872efdee58	nagios: Fold single- and multitornado_frontends back into frontends. `5abf4dee92` made this distinction, then multitornado_frontends was never used; the singletornado_frontends alerting worked even for the multiple-Tornado instances. Remove the useless and misleading distinction.	2022-06-22 12:07:38 -07:00
Anders Kaseorg	dc6af98e52	nginx: Add Cache-Control headers for Django-hashed static files. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2022-06-21 17:26:23 -07:00
Alex Vandiver	0645656fd8	process_fts_updates: Nagios may lack permissions to load Django config. Even if Django and PostgreSQL are on the same host, the `nagios` user may lack permissions to read accessory configuration files needed to load the Django configuration (e.g. authentication keys). Catch those failures, and switch to loading the required settings from `/etc/zulip/zulip.conf`.	2022-06-21 12:50:13 -07:00
Anders Kaseorg	a7f9c4f958	logging: Pass more format arguments to logging. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2022-06-03 12:27:23 -07:00
Alex Vandiver	aa46d8d2a8	puppet: Enable strict typo checking in uwsgi.	2022-06-02 13:20:48 -07:00
Alex Vandiver	18ec3b6215	puppet: Enable background worker threads in uwsgi. Without this, uwsgi does not release the GIL before going back into `epoll_wait` to wait for the next request. This results in any background threads languishing, unserviced.[1] Practically, this results in Sentry background reporter threads timing out when attempting to post results -- but only in situations with low traffic, as in those significant time is spent in `epoll_wait`. This is seen in logs as: WARN [urllib3.connectionpool] Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1131)'))': /api/123456789/envelope/ Or: WARN [urllib3.connectionpool] Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ProxyError('Cannot connect to proxy.', RemoteDisconnected('Remote end closed connection without response'))': /api/123456789/envelope/ Sentry attempts to detect this and warn, but due to startup ordering, the warning is not printed without lazy-loading. Enable threads, at a minuscule performance cost, in order to support background workers like Sentry[2]. [1] https://github.com/unbit/uwsgi/issues/1141#issuecomment-169042767 [2] https://docs.sentry.io/clients/python/advanced/#a-note-on-uwsgi	2022-06-02 13:20:48 -07:00
Alex Vandiver	919c904091	puppet: Give the uwsgi processes a shorter process name. Previously, the complete command line, which is quite long, is shown: 3963143 ? SN 0:00 /home/zulip/deployments/current/zulip-current-venv/bin/uwsgi --ini /etc/zulip/uwsgi.ini 3963144 ? SN 0:03 \_ /home/zulip/deployments/current/zulip-current-venv/bin/uwsgi --ini /etc/zulip/uwsgi.ini 3963145 ? SN 0:03 \_ /home/zulip/deployments/current/zulip-current-venv/bin/uwsgi --ini /etc/zulip/uwsgi.ini 3963146 ? SN 0:03 \_ /home/zulip/deployments/current/zulip-current-venv/bin/uwsgi --ini /etc/zulip/uwsgi.ini 3963147 ? SN 0:03 \_ /home/zulip/deployments/current/zulip-current-venv/bin/uwsgi --ini /etc/zulip/uwsgi.ini 3963148 ? SN 0:03 \_ /home/zulip/deployments/current/zulip-current-venv/bin/uwsgi --ini /etc/zulip/uwsgi.ini 3963149 ? SN 0:03 \_ /home/zulip/deployments/current/zulip-current-venv/bin/uwsgi --ini /etc/zulip/uwsgi.ini Configure uwsgi to rename and number the processes. This results in: 3907613 ? SN 0:00 zulip-django uWSGI master 3907614 ? SN 0:05 \_ zulip-django uWSGI worker 1 3907615 ? SN 0:03 \_ zulip-django uWSGI worker 2 3907616 ? SN 0:05 \_ zulip-django uWSGI worker 3 3907617 ? SN 0:05 \_ zulip-django uWSGI worker 4 3907618 ? SN 0:05 \_ zulip-django uWSGI worker 5 3907619 ? SN 0:05 \_ zulip-django uWSGI worker 6	2022-06-02 13:20:48 -07:00
Alex Vandiver	a522ad1d9a	puppet: Always create a uwsgi master control socket. This is potentially useful even with rolling restarts disabled.	2022-06-02 13:20:48 -07:00
Alex Vandiver	721a101f12	puppet: Reorganize and comment uwsgi.ini file. As the uwsgi documentation is somewhat obtuse, more comments are added here than might usually be.	2022-06-02 13:20:48 -07:00
Alex Vandiver	3741c1c034	puppet: Switch to checking time against the AWS timeserver. Since this is what chrony is sync'ing to, it lessens the chance of spurious firings of this alert. See https://aws.amazon.com/blogs/aws/keeping-time-with-amazon-time-sync-service/	2022-05-31 22:57:32 -07:00
Alex Vandiver	a201e3b25b	puppet: Upgrade wal-g to 2.0.0.	2022-05-22 14:51:18 -07:00
Alex Vandiver	c8ee53619d	puppet: Upgrade go and smokescreen.	2022-05-22 14:51:18 -07:00
Alex Vandiver	4a5e530743	puppet: Upgrade Grafana to 8.5.3, for CVE-2022-29170.	2022-05-22 14:51:18 -07:00
Alex Vandiver	baed1214f2	puppet: Only fix certbot certificates if https is enabled. This is a reprise of `c97162e485`, but for the case where certbot certs are no longer in use by way of enabling `http_only` and letting another server handle TLS termination. Fixes: #22034.	2022-05-17 15:03:44 -07:00
Alex Vandiver	62f234328d	puppet: Include the OS-enabled nginx module configurations. This allows system-level configuration to be done by `apt-get install` of nginx modules, which place their load statements in this directory. The initial import in `ed0cb0a5f8` of the stock nginx config omitted this include -- one potential explanation was in an effort to reduce the memory footprint of the server. The default nginx install enables: 50-mod-http-auth-pam.conf 50-mod-http-dav-ext.conf 50-mod-http-echo.conf 50-mod-http-geoip2.conf 50-mod-http-geoip.conf 50-mod-http-image-filter.conf 50-mod-http-subs-filter.conf 50-mod-http-upstream-fair.conf 50-mod-http-xslt-filter.conf 50-mod-mail.conf 50-mod-stream.conf While Zulip doesn't actively use any of these, they likely don't do any harm to simply be loaded -- they are loaded into every nginx by default. Having the `modules-enabled` include allows easier extension of the server, as neither of the existing wildcard includes (`/etc/nginx/conf.d/*.conf` and `/etc/nginx/zulip-include/app.d/*.conf`) is in the top context, and thus neither is able to load modules.	2022-05-17 15:03:07 -07:00
Alex Vandiver	814841c9ec	puppet: Remove typo'd cron job. `54b6a83412` fixed the typo introduced in `49ad188449`, but that does not clean up existing installs which had the file with the wrong name already. Remove the file with the typo'd name, so two jobs do not race, and fix the typo in the comment.	2022-05-16 14:57:21 -07:00
Alex Vandiver	20b7a2d450	puppet: Each worker should chdir after forking. The top-level `chdir` setting only does the chdir once, at initial `uwsgi` startup time. Rolling restarts, however, require that `uwsgi` pick up the _new_ value of the `current` directory, and start new workers in that directory -- as currently implemented, rolling restarts cannot restart into newer versions of the code, only the same one in which they were started. Use [configurable hooks][1] to execute the `chdir` after every fork. This causes the following behaviour: ``` Thu May 12 18:56:55 2022 - chain reload starting... Thu May 12 18:56:55 2022 - chain next victim is worker 1 Gracefully killing worker 1 (pid: 1757689)... worker 1 killed successfully (pid: 1757689) Respawned uWSGI worker 1 (new pid: 1757969) Thu May 12 18:56:56 2022 - chain is still waiting for worker 1... running "chdir:/home/zulip/deployments/current" (post-fork)... Thu May 12 18:56:57 2022 - chain is still waiting for worker 1... Thu May 12 18:56:58 2022 - chain is still waiting for worker 1... Thu May 12 18:56:59 2022 - chain is still waiting for worker 1... WSGI app 0 (mountpoint='') ready in 3 seconds on interpreter 0x55dfca409170 pid: 1757969 (default app) Thu May 12 18:57:00 2022 - chain next victim is worker 2 [...] ``` ..and so forth down the line of processes. Each process is correctly started in the _current_ value of `current`, and thus picks up the correct code. [1]: https://uwsgi-docs.readthedocs.io/en/latest/Hooks.html	2022-05-12 21:54:02 -07:00
Alex Vandiver	7f6a77da31	puppet: Add a redis exporter.	2022-05-03 17:13:44 -07:00
Anders Kaseorg	e9ba9b0e0d	zulip-ec2-configure-interfaces: Remove. Our current EC2 systems don’t have an interface named ‘eth0’, and if they did, this script would do nothing but crash with ImportError because we have never installed boto.utils for Python 3. (The message of commit `2a4d851a7c` made an effort to document for future researchers why this script should not have been blindly converted to Python 3. However, commit `2dc6d09c2a` (#14278) was evidently unresearched and untested.) Signed-off-by: Anders Kaseorg <anders@zulip.com>	2022-05-03 02:25:59 -07:00
Alex Vandiver	d891b9590a	puppet: Fix non-replicated PostgreSQL 10 and 11 configuration. `6f5ae8d13d` removed the `$replication` variable from the configurations of PostgreSQL 12 and higher, but left it in the templates for PostgreSQL 10 and 11. Because `undef != ''`, deployments on PostgreSQL 10 and 11 started trying to push to S3 backups, regardless of if they were configured, leaving frequent log messages like: ``` 2022-04-30 12:45:47.805 UTC [626d24ec.1f8db0]: [107-1] LOG: archiver process (PID 2086106) exited with exit code 1 2022-04-30 12:45:49.680 UTC [626d24ee.1f8dc3]: [18-1] LOG: checkpoint complete: wrote 19 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=1.910 s, sync=0.022 s, total=1.950 s; sync files=16, longest=0.018 s, average=0.002 s; distance=49 kB, estimate=373 kB /usr/bin/timeout: failed to run command "/usr/local/bin/env-wal-g": No such file or directory 2022-04-30 12:46:17.852 UTC [626d2f99.1fd4e9]: [1-1] FATAL: archive command failed with exit code 127 2022-04-30 12:46:17.852 UTC [626d2f99.1fd4e9]: [2-1] DETAIL: The failed archive command was: /usr/bin/timeout 10m /usr/local/bin/env-wal-g wal-push pg_wal/000000010000000300000080 ``` Switch the PostgreSQL 10 and 11 configuration to check `s3_backups_bucket`, like the other versions.	2022-05-02 16:46:10 -07:00
Anders Kaseorg	646a4d19a3	puppet: Remove quotes for enumerable values. https://puppet.com/docs/puppet/7/style_guide.html#style_guide_module_design-quoting “If a string is a value from an enumerable set of options, such as present and absent, it SHOULD NOT be enclosed in quotes at all.” Signed-off-by: Anders Kaseorg <anders@zulip.com>	2022-04-29 22:06:46 -07:00
Alex Vandiver	c97162e485	puppet: Check that certbot certs are in use before fixing them. It is possible to have previously installed certbot, but switched back to using self-signed certificates -- in which case renewing them using certbot may fail. Verify that the certificate is a symlink into certbot's output directory before running `fix-standalone-certbot`.	2022-04-27 16:01:15 -07:00
Anders Kaseorg	098a514599	python: Use Python 3.8 shlex.join function. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2022-04-27 12:57:49 -07:00
Alex Vandiver	35db1ee435	puppet: Only include "app_service" section if there are apps. This works around gravitational/teleport#12256, but also produces config files that are slightly cleaner.	2022-04-26 16:36:13 -07:00
Anders Kaseorg	a7e6cb7705	puppet: ‘supervisorctl stop all’ before restarting Supervisor. This fixes a failure of the 3.4 upgrade test running on Ubuntu 20.04 with Supervisor 4. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2022-04-26 16:32:02 -07:00
Alex Vandiver	e5548ecba0	puppet: Upgrade external dependencies.	2022-04-21 13:54:14 -07:00
Alex Vandiver	1151118cc8	puppet: Upgrade Grafana to 8.4.6.	2022-04-12 16:41:45 -07:00
Alex Vandiver	572443edc6	puppet: Remove memcached SASL workaround. https://bugs.launchpad.net/ubuntu/+source/memcached/+bug/1878721 was fixed and released in Focal on 2020-06-24. We don't bother with an `ensure => absent` because leaving this in-place for existing installs does no harm.	2022-04-08 14:59:45 -07:00
Anders Kaseorg	935cb605a5	puppet: Do not ensure Chrony is running. Commit `f6d27562fa` (#21564) tried to ensure Chrony is running, which fails in containers where Chrony doesn’t have permission to update the host clock. The Debian package should still attempt to start it, and Puppet should still restart it when chrony.conf is modified. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2022-03-30 11:37:54 -07:00
Alex Vandiver	f6d27562fa	puppet: Configure chrony to use AWS-local NTP sources. This prevents hosts from spewing traffic to random hosts across the Internet.	2022-03-25 17:07:53 -07:00
Alex Vandiver	5e128e7cad	puppet: Extract the wal-g configuration from the backups. This will allow it to be used for monitoring, to check the state in S3 rather than just trusting the backups when they said they ran.	2022-03-25 17:05:30 -07:00
Alex Vandiver	d7b59c86ce	puppet: Build wal-g from source for aarch64. Since wal-g does not provide binaries for aarch64, build them from source. While building them from source for amd64 would better ensure that build process is tested, the build process takes 7min and 700M of temp files, which is an unacceptable cost; we thus only build on aarch64. Since the wal-g build process uses submodules, which are not in the GitHub export, we clone the full wal-g repository. Because the repository is relatively small, we clone it anew on each new version, rather than attempt to manage the remotes. Fixes #21070.	2022-03-22 15:02:35 -07:00
Alex Vandiver	4d4c320a07	puppet: Switch from ntp to chrony. Chrony is the recommended time server for Ubuntu since 18.04[1], and is the default on Red Hat; it is more accurate, and has lower memory usage, than ntp, which is only getting best-effort security maintenance. See: - https://wiki.ubuntu.com/BionicBeaver/ReleaseNotes#Chrony - https://chrony.tuxfamily.org/comparison.html - https://engineering.fb.com/2020/03/18/production-engineering/ntp-service/	2022-03-22 13:07:27 -07:00
Alex Vandiver	a2c8be9cd5	puppet: Increase download timeout from 5m to 10m. The default timeout for `exec` commands in Puppet is 5 minutes[1]. On slow connections, this may not be sufficient to complete larger downloads, such as the ~135MB golang tarball. Increase the timeout to 10 minutes; this is a minimum download speed of ~225kB/s. Fixes #21449. [1]: https://puppet.com/docs/puppet/5.5/types/exec.html#exec-attribute-timeout	2022-03-21 15:47:04 -07:00
Alex Vandiver	9e850b08f3	puppet: Fix the PostgreSQL paths to recovery.conf / standby.conf.	2022-03-20 16:16:04 -07:00
Alex Vandiver	1bd5723cd2	puppet: Add a prometheus monitor for tornado processes.	2022-03-20 16:12:11 -07:00
Alex Vandiver	6b91652d9a	puppet: Open the grok_exporter port. The complete grok_exporter configuration is not ready to be committed, but this at least prepares the way for it.	2022-03-20 16:12:11 -07:00
Alex Vandiver	6558655fc6	puppet: Add rabbitmq prometheus plugin, and open the firewall.	2022-03-20 16:12:11 -07:00
Alex Vandiver	bdd2f35d05	puppet: Switch czo to using zulip_ops::app_frontend_monitoring. This was clearly intended in `f61ac4a28d` but never executed.	2022-03-20 16:12:11 -07:00
Alex Vandiver	17699bea44	puppet: postgresql_backups is auto-included if s3_backups_bucket is set. Since `6496d43148`.	2022-03-20 16:12:11 -07:00
Alex Vandiver	bedc7c2986	puppet: Smokescreen is now auto-included in standalone. Since `c33562f0a8`.	2022-03-20 16:12:11 -07:00
Alex Vandiver	6489c832a3	puppet: Upgrade third-party package versions.	2022-03-17 11:44:05 -07:00
Alex Vandiver	d17006da55	puppet: Support setting an `ssl_mode` verification level.	2022-03-15 12:43:50 -07:00
Alex Vandiver	253bef27f5	puppet: Support password-based PostgreSQL replication.	2022-03-15 12:43:50 -07:00
Sahil Batra	f0606b34ad	user_groups: Add cron job for adding users to full members system group. This commit adds a cron job which runs every hour to add users to the full members system group once they are promoted to full members. This should ensure that full member status is available no more than an hour after configuration suggests it should be.	2022-03-14 18:53:47 -07:00
Alex Vandiver	6f5ae8d13d	puppet: wal-g backups are required for replication. Previously, it was possible to configure `wal-g` backups without replication enabled; this resulted in only daily backups, not streaming backups. It was also possible to enable replication without configuring the `wal-g` backups bucket; this simply failed to work. Make `wal-g` backups always streaming, and warn loudly if replication is enabled but `wal-g` is not configured.	2022-03-11 10:09:35 -08:00
Alex Vandiver	6496d43148	puppet: Only s3_backups_bucket is required for backups. `s3_backups_key` / `s3_backups_secret_key` are optional, as the permissions could come from the EC2 instance's role.	2022-03-11 10:09:35 -08:00
Alex Vandiver	19beed2709	puppet: Default s3_region to the current ec2 region.	2022-03-11 10:09:35 -08:00
Anders Kaseorg	b3260bd610	docs: Use Debian and Ubuntu version numbers over development codenames. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2022-02-23 12:04:24 -08:00
Anders Kaseorg	1629d6bfb3	python: Reformat with Black 22 (stable). Signed-off-by: Anders Kaseorg <anders@zulip.com>	2022-02-18 18:03:13 -08:00
Alex Vandiver	c656d933fa	puppet: Switch from $::memorysize_mb to non-legacy $::memory.	2022-02-15 12:04:37 -08:00
Alex Vandiver	f2f4462e71	puppet: Switch from $::fqdn to non-legacy $::networking.	2022-02-15 12:04:37 -08:00
Alex Vandiver	bb4c0799cc	puppet: Switch to the canonical case for $::os['family']. The == operator in Puppet is case-insensitive for ASCII characters[1], which is potentially surprising. Switch to the canonical case that `$::os['family']` returns. [1] https://puppet.com/docs/puppet/5.5/lang_expressions.html#string-encoding-and-comparisons	2022-02-15 12:04:37 -08:00
Alex Vandiver	d4eefbbeea	puppet: Switch from $::osfamily to non-legacy $::os.	2022-02-15 12:04:37 -08:00
Alex Vandiver	a787ebe0e2	puppet: Switch from $::architecture to non-legacy $::os.	2022-02-15 12:04:37 -08:00
Alex Vandiver	d7e8733705	puppet: Use goarch for wal-g. wal-g does not currently provide pre-built binaries for arm64/aarch64 (see #21070) but if they begin to, it will likely be with the goarch names.	2022-02-15 12:04:37 -08:00
Alex Vandiver	abdbe4ca83	puppet: Use goarch for go-camo.	2022-02-15 12:04:37 -08:00
Alex Vandiver	be2f2a5bde	puppet: Use goarch for golang. Fixes: #21051.	2022-02-15 12:04:37 -08:00
Alex Vandiver	788daa953b	puppet: Factor out $::architecture case statement for golang.	2022-02-15 12:04:37 -08:00
Anders Kaseorg	f6a701090c	setup-apt-repos: Don’t install lsb_release. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2022-02-14 16:38:53 -08:00
Anders Kaseorg	45f4db9702	puppet: Remove unused $release_name. It would confuse a future Debian 15.10 release with Ubuntu 15.10, it relies on the legacy fact $::operatingsystemrelease, the modern fact $::os provides this information without extra logic, and it’s unused as of commit `03bffd3938`. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2022-02-14 16:38:53 -08:00
Alex Vandiver	291c5e87b6	puppet: Upgrade prometheus to 2.33.1.	2022-02-09 20:32:24 -08:00
Alex Vandiver	2d538c2356	puppet: Upgrade grafana to 8.3.6.	2022-02-09 20:32:24 -08:00
Alex Vandiver	f2e66c0b20	puppet: Upgrade go-camo to 2.4.0.	2022-02-09 20:32:24 -08:00
Alex Vandiver	51a516384d	puppet: Upgrade golang to 1.17.6.	2022-02-09 20:32:24 -08:00
Alex Vandiver	48263a01dd	puppet: Upgrade puppet libraries.	2022-02-09 20:32:24 -08:00
Alex Vandiver	e032b38661	puppet: Fix typo in uwsgi exporter dependency.	2022-02-08 15:17:17 -08:00
Alex Vandiver	b3900bec7e	puppet: Upgrade Grafana to 8.3.5. https://grafana.com/docs/grafana/latest/release-notes/release-notes-8-3-5/	2022-02-08 11:13:40 -08:00
Alex Vandiver	a46f6df91e	CVE-2021-43799: Write rabbitmq configuration before starting. Zulip writes a `rabbitmq.config` configuration file which locks down RabbitMQ to listen only on localhost:5672, as well as the RabbitMQ distribution port, on localhost:25672. The "distribution port" is part of Erlang's clustering configuration; while it is documented that the protocol is fundamentally insecure ([1], [2]) and can result in remote arbitrary execution of code, by default the RabbitMQ configuration on Debian and Ubuntu leaves it publicly accessible, with weak credentials. The configuration file that Zulip writes, while effective, is only written _after_ the package has been installed and the service started, which leaves the port exposed until RabbitMQ or system restart. Ensure that rabbitmq's `/etc/rabbitmq/rabbitmq.config` is written before rabbitmq is installed or starts, and that changes to that file trigger a restart of the service, such that the ports are only ever bound to localhost. This does not mitigate existing installs, since it does not force a rabbitmq restart. [1] https://www.erlang.org/doc/apps/erts/erl_dist_protocol.html [2] https://www.erlang.org/doc/reference_manual/distributed.html#distributed-erlang-system	2022-01-25 01:48:05 +00:00
Alex Vandiver	43d63bd5a1	puppet: Always set the RabbitMQ nodename to zulip@localhost. This is required in order to lock down the RabbitMQ port to only listen on localhost. If the nodename is `rabbit@hostname`, in most circumstances the hostname will resolve to an external IP, which the rabbitmq port will not be bound to. Installs which used `rabbit@hostname`, due to RabbitMQ having been installed before Zulip, would not have functioned if the host or RabbitMQ service was restarted, as the localhost restrictions in the RabbitMQ configuration would have made rabbitmqctl (and Zulip cron jobs that call it) unable to find the rabbitmq server. The previous commit ensures that configure-rabbitmq is re-run after the nodename has changed. However, rabbitmq needs to be stopped before `rabbitmq-env.conf` is changed; we use an `onlyif` on an `exec` to print the warning about the node change, and let the subsequent config change and notify of the service and configure-rabbitmq to complete the re-configuration.	2022-01-25 01:48:02 +00:00
Alex Vandiver	3bfcfeac24	puppet: Run configure-rabbitmq on nodename change. `/etc/rabbitmq/rabbitmq-env.conf` sets the nodename; anytime the nodename changes, the backing database changes, and this requires re-creating the rabbitmq users and permissions. Trigger this in puppet by running configure-rabbitmq after the file changes.	2022-01-25 01:46:51 +00:00
Alex Vandiver	694c4dfe8f	puppet: Admit we leave epmd port 4369 open on all interfaces. The Erlang `epmd` daemon listens on port 4369, and provides information (without authentication) about which Erlang processes are listening on what ports. This information is not itself a vulnerability, but may provide information for remote attackers about what local Erlang services (such as `rabbitmq-server`) are running, and where. `epmd` supports an `ERL_EPMD_ADDRESS` environment variable to limit which interfaces it binds on. While this environment variable is set in `/etc/default/rabbitmq-server`, Zulip unfortunately attempts to start `epmd` using an explicit `exec` block, which ignores those settings. Regardless, this lack of `ERL_EPMD_ADDRESS` variable only controls `epmd`'s startup upon first installation. Upon reboot, there are two ways in which `epmd` might be started, neither of which respect `ERL_EPMD_ADDRESS`: - On Focal, an `epmd` service exists and is activated, which uses systemd's configuration to choose which interfaces to bind on, and thus `ERL_EPMD_ADDRESS` is irrelevant. - On Bionic (and Focal, due to a broken dependency from `rabbitmq-server` to `epmd@` instead of `epmd`, which may lead to the explicit `epmd` service losing a race), `epmd` is started by `rabbitmq-server` when it does not detect a running instance. Unfortunately, only `/etc/init.d/rabbitmq-server` would respect `/etc/default/rabbitmq-server` -- and it defers the actual startup to using systemd, which does not pass the environment variable down. Thus, `ERL_EPMD_ADDRESS` is also irrelevant here. We unfortunately cannot limit `epmd` to only listening on localhost, due to a number of overlapping bugs and limitations: - Manually starting `epmd` with `-address 127.0.0.1` silently fails to start on hosts with IPv6 disabled, due to an Erlang bug ([1], [2]). 
- The dependencies of the systemd `rabbitmq-server` service can be fixed to include the `epmd` service, and systemd can be made to bind to `127.0.0.1:4369` and pass that socket to `epmd`, bypassing the above bug. However, the startup of this service is not guaranteed, because it races with other sources of `epmd` (see below). - Any process that runs `rabbitmqctl` results in `epmd` being started if one is not currently running; these instances do not respect any environment variables as to which addresses to bind on. This is also triggered by `service rabbitmq-server status`, as well as various Zulip cron jobs which inspect the rabbitmq queues. As such, it is difficult-to-impossible to ensure that some other `epmd` process will not win the race and open the port on all interfaces. Since the only known exposure from leaving port 4369 open is information that rabbitmq is running on the host, and the complexity of adjusting this to only bind on localhost is high, we remove the setting which does not address the problem, and document that the port is left open, and should be protected via system-level or network-level firewalls. [1]: https://bugs.launchpad.net/ubuntu/+source/erlang/+bug/1374109 [2]: https://github.com/erlang/otp/issues/4820	2022-01-25 01:46:51 +00:00
Alex Vandiver	2713e90eaf	puppet: Remove rabbitmq_mochiweb configuration. mochiweb was renamed to web_dispatch in RabbitMQ 3.8.0, and the plugin is not enabled. Nor does this control the management interface, which would listen on port 15672.	2022-01-25 01:46:51 +00:00
Alex Vandiver	a3adaf4aa3	puppet: Fix standalone certbot configurations. This addresses the problems mentioned in the previous commit, but for existing installations which have `authenticator = standalone` in their configurations. This reconfigures all hostnames in certbot to use the webroot authenticator, and attempts to force-renew their certificates. Force-renewal is necessary because certbot contains no way to merely update the configuration. Let's Encrypt allows for multiple extra renewals per week, so this is a reasonable cost. Because the certbot configuration is `configobj`, and not `configparser`, we have no way to easily parse to determine if webroot is in use; additionally, `certbot certificates` does not provide this information. We use `grep`, on the assumption that this will catch nearly all cases. It is possible that this will find `authenticator = standalone` certificates which are managed by Certbot, but not Zulip certificates. These certificates would also fail to renew while Zulip is running, so switching them to use the Zulip webroot would still be an improvement. Fixes #20593.	2022-01-24 12:13:44 -08:00
Anders Kaseorg	97e4e9886c	python: Replace universal_newlines with text. This is supported in Python ≥ 3.7. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2022-01-23 22:16:01 -08:00
Anders Kaseorg	a58a71ef43	Remove Ubuntu 18.04 support. As a consequence: • Bump minimum supported Python version to 3.7. • Move Vagrant environment to Debian 10, which has Python 3.7. • Move CI frontend tests to Debian 10. • Move production build test to Debian 10. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2022-01-21 17:26:14 -08:00
Alex Vandiver	3bbe5c1110	puppet: Put comments on iptables lines. In addition to documenting the rules.v4 and rules.v6 files slightly, these comments show up in `iptables -L`: ``` root@hostname:~# iptables -L INPUT Chain INPUT (policy ACCEPT) target prot opt source destination ACCEPT all -- anywhere anywhere LOGDROP all -- anywhere localhost/8 ACCEPT all -- anywhere anywhere state RELATED,ESTABLISHED ACCEPT tcp -- anywhere anywhere tcp dpt:ssh /* ssh */ ACCEPT tcp -- anywhere anywhere tcp dpt:3000 /* grafana */ ACCEPT tcp -- anywhere anywhere tcp dpt:9100 /* node_exporter */ LOGDROP all -- anywhere anywhere ```	2022-01-21 16:46:14 -08:00
Alex Vandiver	6bc5849ea8	puppet: Remove now-unused debathena apt repository.	2022-01-18 14:13:28 -08:00
Alex Vandiver	b3f07cc98d	puppet: Replace debathena zephyr package with equivalent puppet file.	2022-01-18 14:13:28 -08:00
Alex Vandiver	a6d7539571	puppet: Replace debathena krb5 package with equivalent puppet file.	2022-01-18 14:13:28 -08:00
Alex Vandiver	75224ea5de	puppet: python-dev is now purely virtual; install python2.7-dev.	2022-01-18 14:13:28 -08:00
Alex Vandiver	fc1adef28a	puppet: Fix server_name of internal staging server.	2022-01-18 12:36:56 -08:00
Alex Vandiver	7e630b81f8	puppet: Switch to using snakeoil certs for staging. This parallels `ba3b88c81b`, but for the staging host.	2022-01-18 12:36:56 -08:00
Alex Vandiver	fb4d9764fa	puppet: Bump Grafana version, for 8.3.4 security release.	2022-01-18 12:33:02 -08:00
Alex Vandiver	434bda01c7	puppet: Enable camo prometheus metrics. Doing so requires protecting /metrics from direct access when proxied through nginx. If camo is placed on a separate host, the equivalent /metrics URL may need to be protected. See https://github.com/cactus/go-camo#metrics for details on the statistics so reported. Note that 5xx responses are _expected_ from go-camo's statistics, as it returns 502 status code when the remote server responds with 500/502/503/504, or 504 when the remote host times out.	2022-01-13 14:19:18 -08:00
Alex Vandiver	0b8a6a51b8	puppet: Remove all parts of AWS kernels. Otherwise, we just uninstall the meta-package, and still restart into the installed AWS kernel.	2022-01-12 15:52:19 -08:00
Alex Vandiver	4d7e6b26df	puppet: Provide more attributes to teleport on ssh nodes.	2022-01-12 14:15:45 -08:00
Alex Vandiver	339e70671c	puppet: Switch Grafana to Grafana 8 Unified Alerting.	2022-01-11 14:27:11 -08:00
Alex Vandiver	6a7eecee9a	puppet: Increase load paging thresholds.	2022-01-11 09:38:31 -08:00
Alex Vandiver	1e80b844f4	puppet: Disable apparmor profile for msmtp. As the nagios user, we want to read the msmtp configuration from ~nagios, which apparmor's profile does not allow msmtp to do.	2022-01-11 09:38:31 -08:00
Alex Vandiver	3c95ad82c6	puppet: Upgrade to nagios4. This updates the puppeted nagios configuration file for the Nagios4 defaults.	2022-01-11 09:38:31 -08:00
Alex Vandiver	d328d3dd4d	puppet: Allow routing camo requests through an outgoing proxy. Because Camo includes logic to deny access to private subnets, routing its requests through Smokescreen is generally not necessary. However, it may be necessary if Zulip has configured a non-Smokescreen exit proxy. Default Camo to using the proxy only if it is not Smokescreen, with a new `proxy.enable_for_camo` setting to override this behaviour if need be. Note that that setting is in `zulip.conf` on the host with Camo installed -- not the Zulip frontend host, if they are different. Fixes: #20550.	2022-01-07 12:08:10 -08:00
Alex Vandiver	2c5fc1827c	puppet: Standardize what values are bools, and what true is. For `no_serve_uploads`, `http_only`, which previously specified "non-empty" to enable, this tightens what values are true. For `pgroonga` and `queue_workers_multiprocess`, this broadens the possible values from `enabled`, and `true` respectively.	2022-01-07 12:08:10 -08:00
Alex Vandiver	1e672e4d82	puppet: Remove unused $no_serve_uploads in app_frontend.	2022-01-07 12:08:10 -08:00
Alex Vandiver	6218ed91c2	puppet: Use lazy-apps and uwsgi control sockets for rolling reloads. Restarting the uwsgi processes by way of supervisor opens a window during which nginx 502's all responses. uwsgi has a configuration called "chain reloading" which allows for rolling restart of the uwsgi processes, such that only one process at once is unavailable; see uwsgi documentation ([1]). The tradeoff is that this requires that the uwsgi processes load the libraries after forking, rather than before ("lazy apps"); in theory this can lead to larger memory footprints, since they are not shared. In practice, as Django defers much of the loading, this is not as much of an issue. In a very basic test of memory consumption (measured by total memory - free - caches - buffers; 6 uwsgi workers), both immediately after restarting Django, and after requesting `/` 60 times with 6 concurrent requests: \| Non-lazy \| Lazy app \| Difference ------------------+------------+------------+------------- Fresh \| 2,827,216 \| 2,870,480 \| +43,264 After 60 requests \| 3,332,284 \| 3,409,608 \| +77,324 ..................\|............\|............\|............. Difference \| +505,068 \| +539,128 \| +34,060 That is, "lazy app" loading increased the footprint pre-requests by 43MB, and after 60 requests grew the memory footprint by 539MB, as opposed to non-lazy loading, which grew it by 505MB. Using wsgi "lazy app" loading does increase the memory footprint, but not by a large percentage. The other effect is that processes may be served by either old or new code during the restart window. This may cause transient failures when new frontend code talks to old backend code. Enable chain-reloading during graceful, puppetless restarts, but only if enabled via a zulip.conf configuration flag. Fixes #2559. [1]: https://uwsgi-docs.readthedocs.io/en/latest/articles/TheArtOfGracefulReloading.html#chain-reloading-lazy-apps	2022-01-05 14:48:52 -08:00

1442 Commits