The RabbitMQ docs state ([1]):
RabbitMQ nodes and CLI tools (e.g. rabbitmqctl) use a cookie to
determine whether they are allowed to communicate with each
other. [...] The cookie is just a string of alphanumeric
characters up to 255 characters in size. It is usually stored in a
local file.
...and goes on to state (emphasis ours):
If the file does not exist, Erlang VM will try to create one with
a randomly generated value when the RabbitMQ server starts
up. Using such generated cookie files are **appropriate in
development environments only.**
The auto-generated cookie does not use cryptographic sources of
randomness, and generates 20 characters of `[A-Z]`. Because of a
semi-predictable seed, the entropy of this password is thus less than
the idealized log2(26^20) ≈ 94 bits of entropy; in actuality, it is
36 bits of entropy, or potentially as low as 20 if the performance of
the server is known.
These sizes are well within the scope of remote brute-force attacks.
On provision, install, and upgrade, replace the default insecure
20-character Erlang cookie with a cryptographically secure
255-character string (the max length allowed).
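
As a minimal sketch of the replacement described above (not the code
this commit adds), a cookie can be drawn from the operating system's
CSPRNG. The alphabet, the constant names, and the helper names are
assumptions for illustration; the RabbitMQ docs only say
"alphanumeric".

```python
import math
import secrets
import string

# Assumption: letters and digits; the docs only say "alphanumeric".
ALPHABET = string.ascii_letters + string.digits
COOKIE_LENGTH = 255  # the maximum length allowed, per the docs quoted above


def generate_cookie(length: int = COOKIE_LENGTH) -> str:
    """Return a random cookie drawn from the OS CSPRNG via `secrets`."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def ideal_entropy_bits(alphabet_size: int, length: int) -> float:
    """Entropy, in bits, of a uniformly random string of this shape."""
    return length * math.log2(alphabet_size)


if __name__ == "__main__":
    print(len(generate_cookie()))
    # Idealized entropy of the default cookie: 20 characters of [A-Z].
    print(round(ideal_entropy_bits(26, 20)))
    # Idealized entropy of the replacement cookie.
    print(round(ideal_entropy_bits(len(ALPHABET), COOKIE_LENGTH)))
```

The idealized figure for the default cookie comes out at about 94
bits, matching the number above; the actual entropy is lower because
of the seed.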
[1] https://www.rabbitmq.com/clustering.html#erlang-cookie
Zulip writes a `rabbitmq.config` configuration file which locks down
RabbitMQ to listen only on localhost:5672, as well as the RabbitMQ
distribution port, on localhost:25672.
The "distribution port" is part of Erlang's clustering configuration;
while it is documented that the protocol is fundamentally
insecure ([1], [2]) and can result in remote arbitrary execution of
code, by default the RabbitMQ configuration on Debian and Ubuntu
leaves it publicly accessible, with weak credentials.
The configuration file that Zulip writes, while effective, is only
written _after_ the package has been installed and the service
started, which leaves the port exposed until RabbitMQ or system
restart.
Ensure that rabbitmq's `/etc/rabbitmq/rabbitmq.config` is written
before rabbitmq is installed or starts, and that changes to that file
trigger a restart of the service, such that the ports are only ever
bound to localhost. This does not mitigate existing installs, since
it does not force a rabbitmq restart.
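
One way to check the "only ever bound to localhost" property is to
classify the addresses a service is listening on. The sketch below is
illustrative only: the helper names are hypothetical, and the list of
(host, port) pairs is assumed to have been gathered separately, for
example from the output of a tool such as `ss`.

```python
import ipaddress


def is_loopback(host: str) -> bool:
    """True if the listen address is a loopback address."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Wildcards such as "*" and unknown names are treated as exposed.
        return False


def exposed_listeners(listeners):
    """Return the (host, port) pairs that are not bound to loopback."""
    return [(host, port) for host, port in listeners if not is_loopback(host)]


if __name__ == "__main__":
    # Hypothetical listeners: AMQP on localhost, distribution port on all
    # interfaces (the situation this commit is meant to prevent).
    print(exposed_listeners([("127.0.0.1", 5672), ("0.0.0.0", 25672)]))
```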
[1] https://www.erlang.org/doc/apps/erts/erl_dist_protocol.html
[2] https://www.erlang.org/doc/reference_manual/distributed.html#distributed-erlang-system
This is required in order to lock down the RabbitMQ port to only
listen on localhost. If the nodename is `rabbit@hostname`, in most
circumstances the hostname will resolve to an external IP, which the
rabbitmq port will not be bound to.
Installs which used `rabbit@hostname`, due to RabbitMQ having been
installed before Zulip, would not have functioned if the host or
RabbitMQ service was restarted, as the localhost restrictions in the
RabbitMQ configuration would have made rabbitmqctl (and Zulip cron
jobs that call it) unable to find the rabbitmq server.
The previous commit ensures that configure-rabbitmq is re-run after
the nodename has changed. However, rabbitmq needs to be stopped
before `rabbitmq-env.conf` is changed; we use an `onlyif` on an `exec`
to print the warning about the node change, and let the subsequent
config change, and its notification of the service and
configure-rabbitmq, complete the re-configuration.
The Erlang `epmd` daemon listens on port 4369, and provides
information (without authentication) about which Erlang processes are
listening on what ports. This information is not itself a
vulnerability, but may provide information for remote attackers about
what local Erlang services (such as `rabbitmq-server`) are running,
and where.
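
To illustrate what an unauthenticated client can learn, here is a
rough sketch of a names query. It assumes the `NAMES_REQ` message
(tag 110) and the text reply format described in the Erlang
distribution protocol documentation; the function names are
hypothetical.

```python
import re
import socket
import struct

EPMD_PORT = 4369
# Assumed wire format: 2-byte big-endian length, then the tag 110 ('n').
NAMES_REQ = b"\x00\x01n"


def parse_names_response(data: bytes):
    """Parse a reply: 4-byte epmd port, then 'name X at port N' lines."""
    if len(data) < 4:
        raise ValueError("response too short")
    (epmd_port,) = struct.unpack(">I", data[:4])
    text = data[4:].decode("utf-8", "replace")
    nodes = {}
    for match in re.finditer(r"name (\S+) at port (\d+)", text):
        nodes[match.group(1)] = int(match.group(2))
    return epmd_port, nodes


def query_epmd(host: str = "127.0.0.1", port: int = EPMD_PORT, timeout: float = 2.0):
    """Ask an epmd instance which nodes are registered, and on which ports."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(NAMES_REQ)
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # the reply ends when epmd closes the connection
                break
            chunks.append(chunk)
    return parse_names_response(b"".join(chunks))
```

Calling `query_epmd()` requires a running `epmd`;
`parse_names_response` can be exercised on its own with captured
bytes.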
`epmd` supports an `ERL_EPMD_ADDRESS` environment variable to limit
which interfaces it binds on. While this environment variable is set
in `/etc/default/rabbitmq-server`, Zulip unfortunately attempts to
start `epmd` using an explicit `exec` block, which ignores those
settings.
Regardless, the missing `ERL_EPMD_ADDRESS` variable only affects
`epmd`'s startup upon first installation. Upon reboot, there are two
ways in which `epmd` might be started, neither of which respects
`ERL_EPMD_ADDRESS`:
- On Focal, an `epmd` service exists and is activated, which uses
systemd's configuration to choose which interfaces to bind on, and
thus `ERL_EPMD_ADDRESS` is irrelevant.
- On Bionic (and Focal, due to a broken dependency from
`rabbitmq-server` to `epmd@` instead of `epmd`, which may lead to
the explicit `epmd` service losing a race), `epmd` is started by
`rabbitmq-server` when it does not detect a running instance.
Unfortunately, only `/etc/init.d/rabbitmq-server` respects
`/etc/default/rabbitmq-server` -- and it defers the actual startup
to using systemd, which does not pass the environment variable
down. Thus, `ERL_EPMD_ADDRESS` is also irrelevant here.
We unfortunately cannot limit `epmd` to only listening on localhost,
due to a number of overlapping bugs and limitations:
- Manually starting `epmd` with `-address 127.0.0.1` silently fails
to start on hosts with IPv6 disabled, due to an Erlang bug ([1],
[2]).
- The dependencies of the systemd `rabbitmq-server` service can be
fixed to include the `epmd` service, and systemd can be made to
bind to `127.0.0.1:4369` and pass that socket to `epmd`, bypassing
the above bug. However, the startup of this service is not
guaranteed, because it races with other sources of `epmd` (see
below).
- Any process that runs `rabbitmqctl` results in `epmd` being started
if one is not currently running; these instances do not respect any
environment variables as to which addresses to bind on. This is
also triggered by `service rabbitmq-server status`, as well as
various Zulip cron jobs which inspect the rabbitmq queues. As
such, it is difficult-to-impossible to ensure that some other
`epmd` process will not win the race and open the port on all
interfaces.
Since the only known exposure from leaving port 4369 open is
information that rabbitmq is running on the host, and the complexity
of adjusting this to only bind on localhost is high, we remove the
setting which does not address the problem, and document that the port
is left open, and should be protected via system-level or
network-level firewalls.
[1]: https://bugs.launchpad.net/ubuntu/+source/erlang/+bug/1374109
[2]: https://github.com/erlang/otp/issues/4820
As a consequence:
• Bump minimum supported Python version to 3.7.
• Move Vagrant environment to Debian 10, which has Python 3.7.
• Move CI frontend tests to Debian 10.
• Move production build test to Debian 10.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
With tweaks to security-model.md by tabbott to expand the SSO acronym.
Ignored, but still needs discussion on whether we should exclude this
rule:
```
The word ‘install’ is not a noun.
✗ ...ble to connect to the client during the install process: So you'll need to shut down a...
^^^^^^^
✓ ...ble to connect to the client during the installation process: So you'll need to shut down a...
A_INSTALL: a/the + install
The word ‘install’ is not a noun.
✗ ...detected at install time will cause the install to abort. If you already have PostgreSQ...
^^^^^^^
✓ ...detected at install time will cause the installation to abort. If you already have PostgreSQ...
A_INSTALL: a/the + install
```
This page contains a lot of GSoC-related material other than just
project ideas.
We would also want to add a redirect from the old URL to the new
one from the RTD admin page.
Because Camo includes logic to deny access to private subnets, routing
its requests through Smokescreen is generally not necessary. However,
it may be necessary if Zulip has configured a non-Smokescreen exit
proxy.
Default Camo to using the proxy only if it is not Smokescreen, with a
new `proxy.enable_for_camo` setting to override this behaviour if need
be. Note that that setting is in `zulip.conf` on the host with Camo
installed -- not the Zulip frontend host, if they are different.
Fixes: #20550.
For `no_serve_uploads` and `http_only`, which previously treated any
non-empty value as enabled, this tightens what values are true. For
`pgroonga` and `queue_workers_multiprocess`, this broadens the
possible values from `enabled` and `true`, respectively.
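
The general idea of a single, strict boolean parser can be sketched
as follows. The accepted spellings below are an assumption chosen for
illustration, not the list this commit implements, and `parse_bool`
is a hypothetical name.

```python
# Assumed spellings; the real accepted values may differ.
TRUE_VALUES = {"1", "y", "t", "true", "yes", "enable", "enabled"}
FALSE_VALUES = {"", "0", "n", "f", "false", "no", "disable", "disabled"}


def parse_bool(value, default=False):
    """Map a config string to a bool, rejecting unknown spellings."""
    if value is None:
        return default  # setting not present at all
    normalized = value.strip().lower()
    if normalized in TRUE_VALUES:
        return True
    if normalized in FALSE_VALUES:
        return False
    raise ValueError(f"invalid boolean setting: {value!r}")
```

Under a scheme like this, a setting such as `http_only = false` is no
longer treated as enabled merely because it is non-empty.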
Restarting the uwsgi processes by way of supervisor opens a window
during which nginx 502's all responses. uwsgi has a configuration
called "chain reloading" which allows for rolling restart of the uwsgi
processes, such that only one process at a time is unavailable; see
uwsgi documentation ([1]).
The tradeoff is that this requires that the uwsgi processes load the
libraries after forking, rather than before ("lazy apps"); in theory
this can lead to larger memory footprints, since they are not shared.
In practice, as Django defers much of the loading, this is not as much
of an issue. In a very basic test of memory consumption (measured by
total memory - free - caches - buffers; 6 uwsgi workers), both
immediately after restarting Django, and after requesting `/` 60 times
with 6 concurrent requests:
                  | Non-lazy   | Lazy app   | Difference
------------------+------------+------------+-------------
Fresh             | 2,827,216  | 2,870,480  | +43,264
After 60 requests | 3,332,284  | 3,409,608  | +77,324
..................|............|............|.............
Difference        | +505,068   | +539,128   | +34,060
That is, "lazy app" loading increased the footprint pre-requests by
43MB, and after 60 requests grew the memory footprint by 539MB, as
opposed to non-lazy loading, which grew it by 505MB. Using uwsgi
"lazy app" loading does increase the memory footprint, but not by a
large percentage.
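
The arithmetic behind those figures can be reproduced directly from
the measurements. The units are assumed to be kB, which is consistent
with the "43MB" reading above; the dictionary layout and function
names are only for illustration.

```python
# Measurements copied from the table; units assumed to be kB.
MEASUREMENTS = {
    "fresh": {"non_lazy": 2_827_216, "lazy": 2_870_480},
    "after_60": {"non_lazy": 3_332_284, "lazy": 3_409_608},
}


def overhead(row):
    """Return (absolute, percent) extra memory used by lazy loading."""
    diff = row["lazy"] - row["non_lazy"]
    return diff, 100 * diff / row["non_lazy"]


def growth(mode):
    """Memory growth from a fresh restart to after 60 requests."""
    return MEASUREMENTS["after_60"][mode] - MEASUREMENTS["fresh"][mode]


if __name__ == "__main__":
    for name, row in MEASUREMENTS.items():
        diff, percent = overhead(row)
        print(f"{name}: +{diff:,} ({percent:.1f}%)")
    print(f"growth: non-lazy +{growth('non_lazy'):,}, lazy +{growth('lazy'):,}")
```

In relative terms, the lazy-loading overhead works out to roughly
1.5% when fresh and 2.3% after 60 requests.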
The other effect is that requests may be served by either old or new
code during the restart window. This may cause transient failures
when new frontend code talks to old backend code.
Enable chain-reloading during graceful, puppetless restarts, but only
if enabled via a zulip.conf configuration flag.
Fixes #2559.
[1]: https://uwsgi-docs.readthedocs.io/en/latest/articles/TheArtOfGracefulReloading.html#chain-reloading-lazy-apps
One should never have to manually symlink things in /usr/bin,
especially with -f. That should be managed by the system package
manager. Indeed, on CentOS 7 and 8, one can simply install the
python3 package and get a working /usr/bin/python3.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
This replaces the TERMS_OF_SERVICE and PRIVACY_POLICY settings with
just a POLICIES_DIRECTORY setting, in order to support installations
(like Zulip Cloud) where there are more policies than just those two.
With minor changes by Eeshan Garg.
The certbot package installs its own systemd timer (and cron job,
which disables itself if systemd is enabled) which updates
certificates. This process races with the cron job which Zulip
installs -- the only difference being that Zulip respects the
`certbot.auto_renew` setting, and that it passes the deploy hook.
This means that occasionally nginx would not be reloaded, when the
systemd timer caught the expiration first.
Remove the custom cron job and `certbot-maybe-renew` script, and
reconfigure certbot to always reload nginx after deploying, using
certbot directory hooks.
Since `certbot.auto_renew` can't have an effect, remove the setting.
In turn, this removes the need for `--no-zulip-conf` to
`setup-certbot`. `--deploy-hook` is similarly removed, as running
deploy hooks to restart nginx is now the default; pass
`--no-directory-hooks` in standalone mode to not attempt to reload
nginx. The other property of `--deploy-hook`, of skipping symlinking
into place, is given its own flag.
We recently changed /developer-community to /development-community.
Now that this change is in production, we can also migrate the
external links in our ReadTheDocs documentation.
PostgreSQL 11 and below used a configuration file named
`recovery.conf` to manage replicas and standbys; support for this was
removed in PostgreSQL 12[1], and the configuration parameters were
moved into the main `postgresql.conf`.
Add `zulip.conf` settings for the primary server hostname and
replication username, so that the complete `postgresql.conf`
configuration on PostgreSQL 14 can continue to be managed, even when
replication is enabled. For consistency, also begin writing out the
`recovery.conf` for PostgreSQL 11 and below.
In PostgreSQL 12 configuration and later, the `wal_level =
hot_standby` setting is removed, as `hot_standby` is equivalent to
`replica`, which is the default value[2]. Similarly, the
`hot_standby = on` setting is also the default[3].
Documentation is added for these features, and the commentary on the
"Export and Import" page referencing files under `puppet/zulip_ops/`
is removed, as those files no longer have any replication-specific
configuration.
[1]: https://www.postgresql.org/docs/current/recovery-config.html
[2]: https://www.postgresql.org/docs/12/runtime-config-wal.html#GUC-WAL-LEVEL
[3]: https://www.postgresql.org/docs/12/runtime-config-replication.html#GUC-HOT-STANDBY
When Zulip is run behind one or more reverse proxies, you must
configure `loadbalancer.ips` so that Zulip respects the client IP
addresses found in the `X-Forwarded-For` header. This is not
immediately clear from the documentation, so this commit makes it more
clear and augments the existing examples to showcase this need.
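
To show why the proxy addresses must be configured, here is a sketch
of the general technique for picking a client IP from
`X-Forwarded-For`. It is not Zulip's or nginx's implementation;
`client_ip` and its parameters are hypothetical, with
`trusted_proxies` playing the role of `loadbalancer.ips`.

```python
import ipaddress


def client_ip(remote_addr, x_forwarded_for, trusted_proxies):
    """Pick the client IP, trusting the header only from known proxies."""
    trusted = {ipaddress.ip_address(ip) for ip in trusted_proxies}
    if ipaddress.ip_address(remote_addr) not in trusted:
        # The direct peer is not a known proxy, so the header could be
        # forged by the client; ignore it.
        return remote_addr
    hops = [h.strip() for h in (x_forwarded_for or "").split(",") if h.strip()]
    # Walk from the nearest hop outwards, skipping our own proxies.
    for hop in reversed(hops):
        try:
            addr = ipaddress.ip_address(hop)
        except ValueError:
            break  # malformed entry: stop trusting the header
        if addr not in trusted:
            return str(addr)
    return remote_addr


if __name__ == "__main__":
    # Hypothetical addresses: 10.0.0.5 is the configured reverse proxy.
    print(client_ip("10.0.0.5", "203.0.113.7", ["10.0.0.5"]))
    print(client_ip("10.0.0.5", "203.0.113.7", []))
```

With no proxies configured, every request appears to come from the
proxy itself, which is the failure mode the documentation change
warns about.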
Fixes: #19073
Fixes the link to the Neil Green presentation on TypeScript
vs CoffeeScript vs ES6.
This is a change from slides to a video because the slides are
no longer available.
OIDC config features a get_secret call (so it requires adding an import)
as well as having a bunch of its instructions in the form of comments on
the various keys of the config dict - thus users should really update
settings.py to fetch all of that.
- Add missing link for GitHub.
- Fix broken links to Matt Ringel's blog post.
- Add link to Julia Evans blog post.
- Add section heading for "Questions Are Important."
- Rearrange some content to fit with new section heading.
With additional tweaks from tabbott:
* Avoid linking to chat.zulip.org other than via our documentation.
* Avoid the CZO abbreviation.
The upstream of the `camo` repository[1] has been unmaintained for
several years, and is now archived by the owner. Additionally, it has
a number of limitations:
- It is installed as a sysinit service, which does not run under
Docker
- It does not prevent access to internal IPs, like 127.0.0.1
- It does not respect standard `HTTP_proxy` environment variables,
making it unable to use Smokescreen to prevent the prior flaw
- It occasionally just crashes, and thus must have a cron job to
restart it.
Swap camo out for the drop-in replacement go-camo[2], which has the
same external API, requiring no changes to Django code, but is more
actively maintained. Additionally, it resolves all of the above
complaints.
go-camo is not configured to use Smokescreen as a proxy, because its
own private-IP filtering prevents using a proxy which lies within that
IP space. It is also unclear if the addition of Smokescreen would
provide any additional protection over the existing IP address
restrictions in go-camo.
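
The conflict between private-IP filtering and a proxy in private
address space can be illustrated with a small sketch. This is not
go-camo's actual filter; `is_internal` is a hypothetical helper built
on Python's `ipaddress` module, and the addresses are examples.

```python
import ipaddress


def is_internal(address: str) -> bool:
    """True for loopback, private-range, and link-local addresses."""
    ip = ipaddress.ip_address(address)
    return ip.is_private or ip.is_loopback or ip.is_link_local


if __name__ == "__main__":
    # A filter like this blocks requests to internal services...
    print(is_internal("127.0.0.1"))
    # ...but it would equally reject a proxy listening on a private
    # address (hypothetical address shown), which is the conflict
    # described above.
    print(is_internal("192.168.0.10"))
    print(is_internal("8.8.8.8"))
```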
go-camo has a subset of the security headers that our nginx reverse
proxy sets, and which camo set; provide the missing headers with `-H`
to ensure that go-camo, if exposed from behind some other non-nginx
load-balancer, still provides the necessary security headers.
Fixes #18351 by moving to supervisor.
Fixes zulip/docker-zulip#298 also by moving to supervisor.
[1] https://github.com/atmos/camo
[2] https://github.com/cactus/go-camo