Django has a `SECURE_PROXY_SSL_HEADER` setting[^1] which controls whether
it examines a header, usually provided by upstream proxies, in order to
treat requests as "secure" even if the proximal HTTP connection was not
encrypted. This header is usually the `X-Forwarded-Proto` header, and
the Django documentation carries prominent warnings that this setting
must not be enabled unless `X-Forwarded-Proto` is explicitly controlled
by the proxy and cannot be supplied by the end-user.
In the absence of this setting, Django checks the `wsgi.url_scheme`
property of the WSGI environment[^2].
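For reference, the setting's documented form is a two-tuple of the
normalized header name and the required value; a minimal `settings.py`
sketch:

```python
# settings.py -- only safe when the proxy strips or overwrites any
# client-supplied X-Forwarded-Proto header before forwarding.
SECURE_PROXY_SSL_HEADER = ("HTTP_X_FORWARDED_PROTO", "https")
```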
Zulip did not control the value of the `X-Forwarded-Proto` header,
because it did not set the `SECURE_PROXY_SSL_HEADER` setting (though
see below). However, uwsgi has undocumented code which silently
overrides the `wsgi.url_scheme` property based on the
`HTTP_X_FORWARDED_PROTO` property[^3] (and hence the
`X-Forwarded-Proto` header), thus doing the same as enabling the
Django `SECURE_PROXY_SSL_HEADER` setting, but in a way that cannot be
disabled. It also sets `wsgi.url_scheme` to `https` if the
`X-Forwarded-SSL` header is set to `on` or `1`[^4], providing an
alternate route to deceive Django.
These combine to make Zulip always trust `X-Forwarded-Proto` or
`X-Forwarded-SSL` headers from external sources, making it possible to
trick Django into thinking a request is "secure" when it is not.
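The override is easy to observe directly; a minimal WSGI application
(illustrative, not Zulip's code) run under uwsgi will report `https`
for a plain-HTTP request carrying the spoofed header:

```python
# app.py -- run with, e.g.: uwsgi --http :8000 --wsgi-file app.py
# Compare:  curl http://localhost:8000/
# against:  curl -H 'X-Forwarded-Proto: https' http://localhost:8000/
def application(environ, start_response):
    scheme = environ.get("wsgi.url_scheme", "unknown")
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [f"wsgi.url_scheme: {scheme}\n".encode()]
```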
However, Zulip is not accessible via unencrypted channels, since it
redirects all `http` requests to `https` at the nginx level; this
mitigates the vulnerability.
Regardless, we harden Zulip against the vulnerability enabled by this
undocumented uwsgi feature, by stripping `X-Forwarded-SSL` headers
before they reach uwsgi, and by setting `X-Forwarded-Proto` only if the
request was received directly from a trusted proxy.
Tornado, because it does not use uwsgi, is an entirely separate
codepath. It uses the `proxy_set_header` values from
`puppet/zulip/files/nginx/zulip-include-common/proxy`, which set
`X-Forwarded-Proto` to the scheme that nginx received the request
over. As such, `SECURE_PROXY_SSL_HEADER` was set in Tornado, and only
Tornado; since the header was always set in nginx, this was safe.
However, it was also _incorrect_ in cases where nginx did not do SSL
termination, but an upstream proxy did -- it would mark those requests
as insecure when they were actually secure. We adjust the
`proxy_set_header X-Forwarded-Proto` used to talk to Tornado to
respect the proxy if it is trusted, or the local scheme if not.
[^1]: https://docs.djangoproject.com/en/4.2/ref/settings/#secure-proxy-ssl-header
[^2]: https://wsgi.readthedocs.io/en/latest/definitions.html#envvar-wsgi.url_scheme
[^3]: 73efb013e9/core/protocol.c (L558-L561)
[^4]: 73efb013e9/core/protocol.c (L531-L534)
1c76036c61 raised the number of `minfds` in Supervisor from 40k to
1M. If Supervisor cannot guarantee that number of available file
descriptors, it will fail to start; `/etc/security/limits.conf` was
hence adjusted upwards as well. However, in some virtualized
environments, including Proxmox LXC, setting
`/etc/security/limits.conf` may not be enough to raise the
system-level limits. This causes `supervisord` with the larger
`minfds` to fail to start.
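To see what a given environment actually grants, the process's own view
of the limit can be inspected from Python; a minimal diagnostic sketch:

```python
import resource

# If the hard limit reported here is below supervisor's configured
# `minfds`, supervisord will refuse to start in this environment.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"RLIMIT_NOFILE: soft={soft}, hard={hard}")
```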
The limit of 1000000 was chosen to be arbitrarily high, assuming it
came without cost; it is not expected to ever be reached on any
deployment. 262b19346e already lowered one aspect of that
changeset, upon determining it did come with a cost. Potentially
breaking virtualized deployments during upgrade is another cost of
that change.
Lower `minfds` back down to 40k, partially reverting
1c76036c61, but allow adjusting it upwards for extremely large
deployments. We do not expect any except the largest deployments to
ever hit the 40k limit, and a frictionless deployment for the
vanishingly small number of huge deployments is not worth the
potential upgrade hiccups for the much more frequent smaller
deployments.
When upgraded, the `erlang-base` package automatically stops all
services which depend on the Erlang runtime; for Zulip, this is the
`rabbitmq-server` service. This results in an unexpected outage of
Zulip.
Block unattended upgrades of the `erlang-base` package.
a522ad1d9a mistakenly deleted this variable assignment, which made
the `zulip.conf` configuration setting have no effect -- uwsgi's
`lazy_apps` were not enabled, which are required for rolling restarts.
Instead of copying over a mostly-unchanged `postgresql.conf`, we
transition to deploying a `conf.d/zulip.conf` which contains only the
material changes we make to the file, which were previously appended
to the end.
While shipping separate `postgresql.conf` files for each supported
version would be useful if there were a large variety of supported
options between versions, there is no such variation at present, and
the burden of overriding the entire default configuration is that it
must be kept up to date with the package's version.
Otherwise, this output goes into `/var/spool/mail/postgres`, which is
not terribly helpful. We do not write to `/var/log/zulip` because the
backup runs as the `postgres` user, and `/var/log/zulip` is owned by
`zulip` and chmod 750.
Since backups may now be taken on arbitrary hosts, we need a blackbox
monitor verifying that _some_ backup was produced.
Add a Prometheus exporter which calls `wal-g backup-list` and reports
statistics about the backups.
This could be extended to include `wal-g wal-verify`, but that
requires a connection to the PostgreSQL server.
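A minimal sketch of such an exporter, assuming the `prometheus_client`
library is available and that `wal-g backup-list --detail --json` emits
a JSON array whose entries include a `time` field (the metric names
here are illustrative, not necessarily the real exporter's):

```python
import json
import subprocess
import time
from datetime import datetime

from prometheus_client import Gauge, start_http_server

# Illustrative metric names; the actual exporter's may differ.
BACKUP_COUNT = Gauge("wal_g_backups", "Number of backups wal-g reports")
LATEST_BACKUP = Gauge(
    "wal_g_latest_backup_timestamp_seconds",
    "Completion time of the most recent wal-g backup",
)

def collect() -> None:
    output = subprocess.check_output(
        ["wal-g", "backup-list", "--detail", "--json"]
    )
    backups = json.loads(output)
    BACKUP_COUNT.set(len(backups))
    if backups:
        # RFC 3339 timestamps; normalize the trailing "Z" for fromisoformat.
        times = [
            datetime.fromisoformat(b["time"].replace("Z", "+00:00")).timestamp()
            for b in backups
        ]
        LATEST_BACKUP.set(max(times))

if __name__ == "__main__":
    start_http_server(9351)  # arbitrary port for this sketch
    while True:
        collect()
        time.sleep(300)
```

An alert can then fire when `time() - wal_g_latest_backup_timestamp_seconds`
exceeds the expected backup interval.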
Taking backups on the database primary adds additional disk load,
which can impact the performance of the application.
Switch to taking backups on replicas, if they exist. Some deployments
may have multiple replicas, and taking backups on all of them is
wasteful and potentially confusing; add a flag to inhibit taking
nightly snapshots on the host.
If the deployment is a single instance of PostgreSQL, with no
replicas, it takes backups as before, modulo the extra flag to allow
skipping taking them.
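The host-selection logic might be sketched as below; `pg_is_in_recovery()`
is PostgreSQL's standard way to detect a replica, while `skip_backups`
and `has_replicas` are hypothetical stand-ins for the actual
`zulip.conf` plumbing:

```python
import subprocess

def is_replica() -> bool:
    # pg_is_in_recovery() returns true on streaming replicas.
    out = subprocess.check_output(
        ["psql", "-tAc", "SELECT pg_is_in_recovery()"], text=True
    )
    return out.strip() == "t"

def should_take_backup(skip_backups: bool, has_replicas: bool) -> bool:
    # Hypothetical names standing in for the actual configuration plumbing.
    if skip_backups:
        return False  # nightly snapshots explicitly inhibited on this host
    if has_replicas:
        return is_replica()  # prefer replicas, keeping load off the primary
    return True  # single-instance deployment: back up the primary as before
```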
7c023042cf moved the logrotate configuration from a static file to a
templated one, but missed that the static file was still referenced
from `zulip_ops::app_frontend`; it only updated
`zulip::profile::app_frontend`. This caused errors when applying
Puppet on any `zulip_ops::app_frontend` host.
Prior to 7c023042cf, the Puppet rule was identical between those two
classes; deduplicate it by moving the updated template definition into
`zulip::app_frontend_base`, which is common to those two classes and
not used in any other classes.
Since logrotate runs in a daily cron, this practically means "daily,
but only if the log is larger than 500M." For large installs with heavy
traffic, this is effectively daily rotation with 10 days of retention;
for small installs, logs are kept for an unknown amount of time.
Switch to daily logfiles, defaulting to 14 days to match nginx; this
can be overridden using a zulip.conf setting. This makes it easier to
ensure that access logs are only kept for a bounded period of time.
Following zulip/python-zulip-api/pull/758/, we're no longer using
python-zephyr, and don't need to build it from source. Additionally,
we no longer need to build a forked Zephyr package, since ZLoadSession
and ZDumpSession were merged in
e6a545e759.
To avoid changing the `supervisor.conf` file, which would require a
restart of supervisor (and thus of all services running under it, which
is extremely disruptive), we carefully leave the contents unchanged for
most installs, and use `concat` to append a new piece to the file only
for the zmirror configuration.
We see connection timeouts and other access issues when the job runs
exactly on the hour, either due to load on their servers from similar
cron jobs, or due to their own operational processes.
Move the job to run at :17 past the hour to avoid these access issues.
Increasing worker_connections has a memory cost, unlike the rest of
the changes in 1c76036c61d8; setting it to 1 million caused nginx to
consume several GB of memory.
Reduce the default down to 10k, and allow deployments to configure it
upwards if necessary. `worker_rlimit_nofile` is left at 1M, since it has no
impact on memory consumption.
There is no reason that the base node access method should be run
under supervisor, which exists primarily to give the `zulip` user the
ability to restart its managed services. This access is unnecessary
for Teleport, and also causes unwanted restarts of Teleport services
when the `supervisor` base configuration changes. Additionally,
supervisor does not support the in-place upgrade process that Teleport
uses, since Teleport replaces its core process with a new one.
Switch to installing a systemd configuration file (as generated by
`teleport install systemd`) for each part of Teleport, customized to
pass a `--config` path. As such, we explicitly disable the `teleport`
service provided by the package.
The supervisor process is shut down by dint of no longer installing
the file, which purges it from the managed directory, and reloads
Supervisor to pick up the removed service.
Zulip already has a server-side Sentry integration; however, it has
historically used the Zulip-specific `blueslip` library for monitoring
browser-side errors, which sends errors to email, as well as,
optionally, to an internal `#errors` stream. While this is sufficient
for low volumes of users, and useful in that it does not rely on
outside services, at higher volumes it is very difficult to do any
analysis or filtering of the errors. Client-side errors are
exceptionally noisy, with many false positives due to browser
extensions or similar, so picking out real errors from a stream of
un-grouped emails or messages is quite difficult.
Add a client-side JavaScript Sentry integration. To provide useful
backtraces, this requires extending the pre-deploy hooks to upload the
source maps to Sentry. Additional keys are added to the non-public
API of `page_params` to control the DSN, realm identifier, and sample
rates.
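Schematically, the server-side additions might look like the following;
the key names are hypothetical, since the actual `page_params` keys are
non-public API and may be named differently:

```python
# Hypothetical sketch of the server-side additions; actual key names
# in Zulip's non-public page_params API may differ.
page_params = {
    # ...existing keys...
    "sentry_dsn": "https://publickey@o0.ingest.sentry.io/0",  # per-deploy DSN
    "sentry_realm_key": "example-realm",  # tags events with the realm
    "sentry_sample_rate": 0.1,  # fraction of sessions reported
}
```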
The current threshold of 40k descriptors was set in 2016, chosen to be
"at least 40x our current scale." At present, that only provides a
50% safety margin. Increase to 1 million to provide the same 40x
buffer as previously.
The highest value currently allowed by the kernels in
production (Linux 5.3.0) is 1048576; this is set as the hard limit.
The 1 million limit is likely far above what the system can handle for
other reasons (memory, CPU, etc.). While this removes a potential
safeguard against overload from too many connections, the longpoll
architecture means we would generally prefer to service more
connections at lower quality (due to CPU limitations) rather than
randomly reject additional connections.
Relevant prior commits:
- 836f313e69
- f2f97dd335
- ec23996538
- 8806ec698a
- e4fce10f46
After reflecting a bit on the last commit, I think it's substantially
easier to understand what's happening if these two tasks are defined
in the same file, since we want their timing to differ to avoid
potential races.