zulip

Commit Graph

Author	SHA1	Message	Date
Alex Vandiver	a46f6df91e	CVE-2021-43799: Write rabbitmq configuration before starting. Zulip writes a `rabbitmq.config` configuration file which locks down RabbitMQ to listen only on localhost:5672, as well as the RabbitMQ distribution port, on localhost:25672. The "distribution port" is part of Erlang's clustering configuration; while it is documented that the protocol is fundamentally insecure ([1], [2]) and can result in remote arbitrary execution of code, by default the RabbitMQ configuration on Debian and Ubuntu leaves it publicly accessible, with weak credentials. The configuration file that Zulip writes, while effective, is only written _after_ the package has been installed and the service started, which leaves the port exposed until RabbitMQ or system restart. Ensure that rabbitmq's `/etc/rabbitmq/rabbitmq.config` is written before rabbitmq is installed or starts, and that changes to that file trigger a restart of the service, such that the ports are only ever bound to localhost. This does not mitigate existing installs, since it does not force a rabbitmq restart. [1] https://www.erlang.org/doc/apps/erts/erl_dist_protocol.html [2] https://www.erlang.org/doc/reference_manual/distributed.html#distributed-erlang-system	2022-01-25 01:48:05 +00:00
Alex Vandiver	43d63bd5a1	puppet: Always set the RabbitMQ nodename to zulip@localhost. This is required in order to lock down the RabbitMQ port to only listen on localhost. If the nodename is `rabbit@hostname`, in most circumstances the hostname will resolve to an external IP, which the rabbitmq port will not be bound to. Installs which used `rabbit@hostname`, due to RabbitMQ having been installed before Zulip, would not have functioned if the host or RabbitMQ service was restarted, as the localhost restrictions in the RabbitMQ configuration would have made rabbitmqctl (and Zulip cron jobs that call it) unable to find the rabbitmq server. The previous commit ensures that configure-rabbitmq is re-run after the nodename has changed. However, rabbitmq needs to be stopped before `rabbitmq-env.conf` is changed; we use an `onlyif` on an `exec` to print the warning about the node change, and let the subsequent config change and notify of the service and configure-rabbitmq to complete the re-configuration.	2022-01-25 01:48:02 +00:00
Alex Vandiver	3bfcfeac24	puppet: Run configure-rabbitmq on nodename change. `/etc/rabbitmq/rabbitmq-env.conf` sets the nodename; anytime the nodename changes, the backing database changes, and this requires re-creating the rabbitmq users and permissions. Trigger this in puppet by running configure-rabbitmq after the file changes.	2022-01-25 01:46:51 +00:00
Alex Vandiver	694c4dfe8f	puppet: Admit we leave epmd port 4369 open on all interfaces. The Erlang `epmd` daemon listens on port 4369, and provides information (without authentication) about which Erlang processes are listening on what ports. This information is not itself a vulnerability, but may provide information for remote attackers about what local Erlang services (such as `rabbitmq-server`) are running, and where. `epmd` supports an `ERL_EPMD_ADDRESS` environment variable to limit which interfaces it binds on. While this environment variable is set in `/etc/default/rabbitmq-server`, Zulip unfortunately attempts to start `epmd` using an explicit `exec` block, which ignores those settings. Regardless, this lack of `ERL_EPMD_ADDRESS` variable only controls `epmd`'s startup upon first installation. Upon reboot, there are two ways in which `epmd` might be started, neither of which respects `ERL_EPMD_ADDRESS`:
- On Focal, an `epmd` service exists and is activated, which uses systemd's configuration to choose which interfaces to bind on, and thus `ERL_EPMD_ADDRESS` is irrelevant.
- On Bionic (and Focal, due to a broken dependency from `rabbitmq-server` to `epmd@` instead of `epmd`, which may lead to the explicit `epmd` service losing a race), `epmd` is started by `rabbitmq-server` when it does not detect a running instance. Unfortunately, only `/etc/init.d/rabbitmq-server` would respect `/etc/default/rabbitmq-server` -- and it defers the actual startup to using systemd, which does not pass the environment variable down. Thus, `ERL_EPMD_ADDRESS` is also irrelevant here.
We unfortunately cannot limit `epmd` to only listening on localhost, due to a number of overlapping bugs and limitations:
- Manually starting `epmd` with `-address 127.0.0.1` silently fails to start on hosts with IPv6 disabled, due to an Erlang bug ([1], [2]).
- The dependencies of the systemd `rabbitmq-server` service can be fixed to include the `epmd` service, and systemd can be made to bind to `127.0.0.1:4369` and pass that socket to `epmd`, bypassing the above bug. However, the startup of this service is not guaranteed, because it races with other sources of `epmd` (see below).
- Any process that runs `rabbitmqctl` results in `epmd` being started if one is not currently running; these instances do not respect any environment variables as to which addresses to bind on. This is also triggered by `service rabbitmq-server status`, as well as various Zulip cron jobs which inspect the rabbitmq queues.
As such, it is difficult-to-impossible to ensure that some other `epmd` process will not win the race and open the port on all interfaces. Since the only known exposure from leaving port 4369 open is information that rabbitmq is running on the host, and the complexity of adjusting this to only bind on localhost is high, we remove the setting which does not address the problem, and document that the port is left open, and should be protected via system-level or network-level firewalls. [1]: https://bugs.launchpad.net/ubuntu/+source/erlang/+bug/1374109 [2]: https://github.com/erlang/otp/issues/4820	2022-01-25 01:46:51 +00:00
Alex Vandiver	2713e90eaf	puppet: Remove rabbitmq_mochiweb configuration. mochiweb was renamed to web_dispatch in RabbitMQ 3.8.0, and the plugin is not enabled. Nor does this control the management interface, which would listen on port 15672.	2022-01-25 01:46:51 +00:00
Alex Vandiver	a3adaf4aa3	puppet: Fix standalone certbot configurations. This addresses the problems mentioned in the previous commit, but for existing installations which have `authenticator = standalone` in their configurations. This reconfigures all hostnames in certbot to use the webroot authenticator, and attempts to force-renew their certificates. Force-renewal is necessary because certbot contains no way to merely update the configuration. Let's Encrypt allows for multiple extra renewals per week, so this is a reasonable cost. Because the certbot configuration is `configobj`, and not `configparser`, we have no way to easily parse to determine if webroot is in use; additionally, `certbot certificates` does not provide this information. We use `grep`, on the assumption that this will catch nearly all cases. It is possible that this will find `authenticator = standalone` certificates which are managed by Certbot, but not Zulip certificates. These certificates would also fail to renew while Zulip is running, so switching them to use the Zulip webroot would still be an improvement. Fixes #20593.	2022-01-24 12:13:44 -08:00
Anders Kaseorg	97e4e9886c	python: Replace universal_newlines with text. This is supported in Python ≥ 3.7. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2022-01-23 22:16:01 -08:00
Anders Kaseorg	a58a71ef43	Remove Ubuntu 18.04 support. As a consequence: • Bump minimum supported Python version to 3.7. • Move Vagrant environment to Debian 10, which has Python 3.7. • Move CI frontend tests to Debian 10. • Move production build test to Debian 10. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2022-01-21 17:26:14 -08:00
Alex Vandiver	3bbe5c1110	puppet: Put comments on iptables lines. In addition to documenting the rules.v4 and rules.v6 files slightly, these comments show up in `iptables -L`:
```
root@hostname:~# iptables -L INPUT
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     all  --  anywhere             anywhere
LOGDROP    all  --  anywhere             localhost/8
ACCEPT     all  --  anywhere             anywhere             state RELATED,ESTABLISHED
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:ssh /* ssh */
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:3000 /* grafana */
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:9100 /* node_exporter */
LOGDROP    all  --  anywhere             anywhere
```	2022-01-21 16:46:14 -08:00
Alex Vandiver	6bc5849ea8	puppet: Remove now-unused debathena apt repository.	2022-01-18 14:13:28 -08:00
Alex Vandiver	b3f07cc98d	puppet: Replace debathena zephyr package with equivalent puppet file.	2022-01-18 14:13:28 -08:00
Alex Vandiver	a6d7539571	puppet: Replace debathena krb5 package with equivalent puppet file.	2022-01-18 14:13:28 -08:00
Alex Vandiver	75224ea5de	puppet: python-dev is now purely virtual; install python2.7-dev.	2022-01-18 14:13:28 -08:00
Alex Vandiver	fc1adef28a	puppet: Fix server_name of internal staging server.	2022-01-18 12:36:56 -08:00
Alex Vandiver	7e630b81f8	puppet: Switch to using snakeoil certs for staging. This parallels `ba3b88c81b`, but for the staging host.	2022-01-18 12:36:56 -08:00
Alex Vandiver	fb4d9764fa	puppet: Bump Grafana version, for 8.3.4 security release.	2022-01-18 12:33:02 -08:00
Alex Vandiver	434bda01c7	puppet: Enable camo prometheus metrics. Doing so requires protecting /metrics from direct access when proxied through nginx. If camo is placed on a separate host, the equivalent /metrics URL may need to be protected. See https://github.com/cactus/go-camo#metrics for details on the statistics so reported. Note that 5xx responses are _expected_ from go-camo's statistics, as it returns 502 status code when the remote server responds with 500/502/503/504, or 504 when the remote host times out.	2022-01-13 14:19:18 -08:00
Alex Vandiver	0b8a6a51b8	puppet: Remove all parts of AWS kernels. Otherwise, we just uninstall the meta-package, and still restart into the installed AWS kernel.	2022-01-12 15:52:19 -08:00
Alex Vandiver	4d7e6b26df	puppet: Provide more attributes to teleport on ssh nodes.	2022-01-12 14:15:45 -08:00
Alex Vandiver	339e70671c	puppet: Switch Grafana to Grafana 8 Unified Alerting.	2022-01-11 14:27:11 -08:00
Alex Vandiver	6a7eecee9a	puppet: Increase load paging thresholds.	2022-01-11 09:38:31 -08:00
Alex Vandiver	1e80b844f4	puppet: Disable apparmor profile for msmtp. As the nagios user, we want to read the msmtp configuration from ~nagios, which apparmor's profile does not allow msmtp to do.	2022-01-11 09:38:31 -08:00
Alex Vandiver	3c95ad82c6	puppet: Upgrade to nagios4. This updates the puppeted nagios configuration file for the Nagios4 defaults.	2022-01-11 09:38:31 -08:00
Alex Vandiver	d328d3dd4d	puppet: Allow routing camo requests through an outgoing proxy. Because Camo includes logic to deny access to private subnets, routing its requests through Smokescreen is generally not necessary. However, it may be necessary if Zulip has configured a non-Smokescreen exit proxy. Default Camo to using the proxy only if it is not Smokescreen, with a new `proxy.enable_for_camo` setting to override this behaviour if need be. Note that that setting is in `zulip.conf` on the host with Camo installed -- not the Zulip frontend host, if they are different. Fixes: #20550.	2022-01-07 12:08:10 -08:00
Alex Vandiver	2c5fc1827c	puppet: Standardize what values are bools, and what true is. For `no_serve_uploads`, `http_only`, which previously specified "non-empty" to enable, this tightens what values are true. For `pgroonga` and `queue_workers_multiprocess`, this broadens the possible values from `enabled`, and `true` respectively.	2022-01-07 12:08:10 -08:00
Alex Vandiver	1e672e4d82	puppet: Remove unused $no_serve_uploads in app_frontend.	2022-01-07 12:08:10 -08:00
Alex Vandiver	6218ed91c2	puppet: Use lazy-apps and uwsgi control sockets for rolling reloads. Restarting the uwsgi processes by way of supervisor opens a window during which nginx 502's all responses. uwsgi has a configuration called "chain reloading" which allows for rolling restart of the uwsgi processes, such that only one process at a time is unavailable; see uwsgi documentation ([1]). The tradeoff is that this requires that the uwsgi processes load the libraries after forking, rather than before ("lazy apps"); in theory this can lead to larger memory footprints, since they are not shared. In practice, as Django defers much of the loading, this is not as much of an issue. In a very basic test of memory consumption (measured by total memory - free - caches - buffers; 6 uwsgi workers), both immediately after restarting Django, and after requesting `/` 60 times with 6 concurrent requests:
                  | Non-lazy   | Lazy app   | Difference
------------------+------------+------------+-------------
Fresh             |  2,827,216 |  2,870,480 |   +43,264
After 60 requests |  3,332,284 |  3,409,608 |   +77,324
..................|............|............|.............
Difference        |   +505,068 |   +539,128 |   +34,060
That is, "lazy app" loading increased the footprint pre-requests by 43MB, and after 60 requests grew the memory footprint by 539MB, as opposed to non-lazy loading, which grew it by 505MB. Using wsgi "lazy app" loading does increase the memory footprint, but not by a large percentage. The other effect is that processes may be served by either old or new code during the restart window. This may cause transient failures when new frontend code talks to old backend code. Enable chain-reloading during graceful, puppetless restarts, but only if enabled via a zulip.conf configuration flag. Fixes #2559. [1]: https://uwsgi-docs.readthedocs.io/en/latest/articles/TheArtOfGracefulReloading.html#chain-reloading-lazy-apps	2022-01-05 14:48:52 -08:00
Alex Vandiver	4a95967a33	puppet: Gather uwsgi stats from chat.zulip.org.	2022-01-03 21:26:57 -08:00
Alex Vandiver	8a5be972d2	puppet: Add a uwsgi exporter for monitoring. This allows investigation of how many workers are busy, and to track "harakiri" terminations.	2022-01-03 15:25:58 -08:00
Alex Vandiver	d6c40d24d4	puppet: Manage current smokescreen binary so it is not tidied. Fix another tidy error caused by 1e4e6a09af23; as also noted in `f9a39b6703`, these resources are necessary so that tidy does not clean up smokescreen and then force it to be recompiled.	2022-01-03 15:24:42 -08:00
Alex Vandiver	f9a39b6703	puppet: Manage extracted resources again. `1e4e6a09af` removed the resources for the unpacked directory, on the argument that they were unnecessary. However, the directory (or file, see below) that is unpacked must be managed, or it will be tidied on the next puppet apply. Add back the resource for `$dir`, but mark it `ensure => present`, to support tarballs which only unpack to a single file (e.g. wal-g).	2022-01-02 12:11:53 -08:00
Alex Vandiver	54b6a83412	puppet: Fix typo in cron job name.	2021-12-31 17:39:53 -08:00
Alex Vandiver	941800cf12	puppet: Upgrade external dependencies.	2021-12-31 11:14:40 -08:00
Alex Vandiver	6f693d10d9	puppet: Fix version of node_exporter. This was a copy/paste bug introduced in `f166f9f7d6`.	2021-12-30 23:33:34 +00:00
Anders Kaseorg	82748d45d8	install-yarn: Use test -ef in case /srv is a symlink. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-12-30 13:42:07 -08:00
Alex Vandiver	c094867a74	puppet: Add aarch64 build hashes to external dependencies. wal-g does not currently ship aarch64 binaries; the compilation process ([1]) is somewhat complicated, so we defer the decision about how to support wal-g for aarch64 until a later date. [1]: https://github.com/wal-g/wal-g/blob/master/docs/PostgreSQL.md#installing	2021-12-29 16:35:15 -08:00
Alex Vandiver	f166f9f7d6	puppet: Centralize versions and sha256 hashes of external dependencies. This will make it easier to update versions of these dependencies.	2021-12-29 16:35:15 -08:00
Alex Vandiver	57662689a9	puppet: Provide a constant homedir for grafana user. The homedir of a user cannot be changed if any processes are running as them, so having it change over time as upgrades happen will break puppet application, as the old grafana process under supervisor will effectively lock changes to the user's homedir. Unfortunately, that means that this change will thus fail to puppet-apply unless `supervisorctl stop grafana` is run first, but there's no way around that.	2021-12-29 16:35:15 -08:00
Alex Vandiver	6e55e52694	puppet: Pull out grafana $data_dir.	2021-12-29 16:35:15 -08:00
Alex Vandiver	51d3862c7e	puppet: Move wal-g to external_dep, in /srv/zulip-wal-g-*.	2021-12-29 16:35:15 -08:00
Alex Vandiver	1e4e6a09af	puppet: Stop making resources for external binaries and directories. In the event that extracting doesn't produce the binary we expected it to, all this will do is create an _empty_ file where we expect the binary to be. This will likely muddle debugging. Since the only reason the resource was made in the first place was to make dependencies clear, switch to depending on the External_Dep itself, when such a dependency is needed.	2021-12-29 16:35:15 -08:00
Alex Vandiver	3c163a7d5e	puppet: Move slash out of $dir by convention.	2021-12-29 16:35:15 -08:00
Alex Vandiver	bb5a2c8138	puppet: Move prometheus to external_dep.	2021-12-29 16:35:15 -08:00
Alex Vandiver	2d6c096904	puppet: Move node_exporter to external_dep.	2021-12-29 16:35:15 -08:00
Alex Vandiver	d2a78bac7e	puppet: Adjust wal-g release version and SHA256. wal-g apparently removed the 1.1.1 release; replace it with the equivalent rc.	2021-12-29 16:35:15 -08:00
Alex Vandiver	7a9074ecfd	puppet: Use shorter local variable for supervisor conf.d dir.	2021-12-28 09:24:01 -08:00
Alex Vandiver	670fad0cc4	puppet: Drop now-unnecessary supervisor file removals.	2021-12-28 09:24:01 -08:00
Alex Vandiver	20eab264cf	puppet: Remove dependency on scripts.lib.zulip_tools. `ab130ceb35` added a dependency on scripts.lib.zulip_tools; however, check_postgresql_replication_lag is run on hosts which do not have a zulip tree installed. Inline the simple functions that were imported.	2021-12-14 14:48:53 -08:00
Alex Vandiver	71b56f7c1c	puppet: process_fts_updates connects as nagios (or provided username). It should not use the configured zulip username, but should instead pull from the login user (likely `nagios`), or an explicit alternate provided PostgreSQL username. Failure to do so results in Nagios failures because the `nagios` login does not have permissions to authenticate as the `zulip` PostgreSQL user. This requires CI changes, as the install tests install as the `zulip` login username, which allowed Nagios tests to pass previously; with the custom database and username, however, they must be passed to process_fts_updates explicitly when validating the install.	2021-12-14 14:48:53 -08:00
Alex Vandiver	9d67e37166	puppet: Nagios connects as itself, in check_postgresql_replication_lag.	2021-12-14 14:48:53 -08:00
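
As a hedged illustration of commit `97e4e9886c` in the list above (`universal_newlines` replaced with `text`): in Python 3.7 and later, the `subprocess` module accepts `text=True` as an alias for `universal_newlines=True`. The sketch below is not taken from the Zulip tree; the helper name `run_text` and the command it runs are assumptions chosen only for illustration.

```python
import subprocess
import sys
from typing import List


def run_text(args: List[str]) -> str:
    """Run a command and return its stdout decoded as str.

    Hypothetical helper, not part of Zulip. `text=True` (Python >= 3.7)
    makes subprocess decode the pipes, exactly as the older spelling
    `universal_newlines=True` does.
    """
    result = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        text=True,  # previously: universal_newlines=True
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout


# Use the current interpreter as the child process so the example
# does not depend on any external command being installed.
output = run_text([sys.executable, "-c", "print('hello')"])
```

Both spellings remain accepted by `subprocess`, so a change like this is a readability cleanup that becomes possible once Python 3.7 is the minimum supported version.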

1319 Commits