Updating the pgroonga package is not sufficient to upgrade the
extension in PostgreSQL -- an `ALTER EXTENSION pgroonga UPDATE` must
explicitly be run[^1]. Failure to do so can lead to unexpected behavior,
including crashes of PostgreSQL.
Extend the existing `pgroonga_setup.sql.applied` file to track
which version of the PostgreSQL extension has been configured. If the
file exists but is empty, we run `ALTER EXTENSION pgroonga UPDATE`
regardless -- if it is a no-op, it still succeeds with a `NOTICE`:
```
zulip=# ALTER EXTENSION pgroonga UPDATE;
NOTICE: version "3.0.8" of extension "pgroonga" is already installed
ALTER EXTENSION
```
The simple `ALTER EXTENSION` is sufficient for the
backwards-compatible case[^1] -- which, for our usage, is every
upgrade since 0.9 -> 1.0. Since version 1.0 was released in 2015,
before pgroonga support was added to Zulip in 2016, we can assume for
the moment that all pgroonga upgrades are backwards-compatible, and
not bother regenerating indexes.
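As a sketch of the intended logic (in Python for illustration; the marker-file path and helper name are hypothetical, and the real implementation is driven from the Puppet-managed setup tooling):

```python
import subprocess
from pathlib import Path

# Hypothetical location; the real marker file lives wherever the setup
# tooling already keeps pgroonga_setup.sql.applied.
APPLIED_FILE = Path("/var/lib/zulip/pgroonga_setup.sql.applied")


def maybe_update_pgroonga(packaged_version: str) -> None:
    if not APPLIED_FILE.exists():
        # Initial setup (CREATE EXTENSION) has not run yet; nothing to upgrade.
        return
    applied_version = APPLIED_FILE.read_text().strip()
    # An empty file predates version tracking; since the UPDATE is idempotent
    # (it merely emits a NOTICE when it is a no-op), run it regardless.
    if applied_version != packaged_version:
        subprocess.check_call(
            ["psql", "-d", "zulip", "-c", "ALTER EXTENSION pgroonga UPDATE"]
        )
        APPLIED_FILE.write_text(packaged_version + "\n")
```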
Fixes: #25989.
[^1]: https://pgroonga.github.io/upgrade/
This was only necessary for PGroonga 1.x, and the `pgroonga` schema
will most likely be removed at some point in the future, which will
make this statement error out.
Drop the unnecessary statement.
The syntax in `/etc/resolv.conf` does not include any brackets:
```
nameserver 2001:db8::a3
```
However, the format of the nginx `resolver` directive[^1] requires that
IPv6 addresses be enclosed in brackets.
Adjust the `resolver_ip` puppet function to surround any IPv6
addresses extracted from `/etc/resolv.conf` with square brackets, and
to add brackets to any addresses from `application_server.resolver`
that lack them.
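For illustration, the bracketing rule expressed in Python (the real code is a Puppet function; `add_brackets` is a hypothetical name):

```python
import ipaddress


def add_brackets(addr: str) -> str:
    """Wrap bare IPv6 addresses in brackets, as nginx's `resolver` requires."""
    if addr.startswith("["):
        return addr  # already bracketed, e.g. supplied that way in zulip.conf
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return addr  # hostnames and anything unparseable pass through unchanged
    return f"[{addr}]" if ip.version == 6 else addr


assert add_brackets("2001:db8::a3") == "[2001:db8::a3]"
assert add_brackets("127.0.0.53") == "127.0.0.53"
```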
Fixes: #26013.
[^1]: http://nginx.org/en/docs/http/ngx_http_core_module.html#resolver
04cf68b45e made nginx responsible for downloading (and caching)
files from S3. As noted in that commit, nginx implements its own
non-blocking DNS resolver, since the base syscall is blocking, and so
it requires an explicit nameserver configuration. That commit used
127.0.0.53, which is provided by systemd-resolved, as the resolver.
However, that service may not always be enabled and running, and may
in fact not even be installed (e.g. on Docker). Switch to parsing
`/etc/resolv.conf` and using the first-provided nameserver. In many
deployments, this will still be `127.0.0.53`, but for others it will
provide a working DNS server which is external to the host.
In the event that a server is misconfigured and has no resolvers in
`/etc/resolv.conf`, it will error out:
```console
Error: Evaluation Error: Error while evaluating a Function Call, No nameservers found in /etc/resolv.conf! Configure one by setting application_server.nameserver in /etc/zulip/zulip.conf (file: /home/zulip/deployments/current/puppet/zulip/manifests/app_frontend_base.pp, line: 76, column: 70) on node example.zulipdev.org
```
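The parsing is straightforward; sketched here in Python rather than the actual Puppet function:

```python
from pathlib import Path


def first_nameserver(path: str = "/etc/resolv.conf") -> str:
    for line in Path(path).read_text().splitlines():
        parts = line.split()
        # Relevant lines look like "nameserver 127.0.0.53".
        if len(parts) >= 2 and parts[0] == "nameserver":
            return parts[1]
    raise RuntimeError(
        "No nameservers found in /etc/resolv.conf! Configure one by "
        "setting application_server.nameserver in /etc/zulip/zulip.conf"
    )
```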
Django has a `SECURE_PROXY_SSL_HEADER` setting[^1] which controls if
it examines a header, usually provided by upstream proxies, to allow
it to treat requests as "secure" even if the proximal HTTP connection
was not encrypted. This header is usually the `X-Forwarded-Proto`
header, and the Django configuration has large warnings about ensuring
that this setting is not enabled unless `X-Forwarded-Proto` is
explicitly controlled by the proxy, and cannot be supplied by the
end-user.
In the absence of this setting, Django checks the `wsgi.url_scheme`
property of the WSGI environment[^2].
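For context, this is the standard Django setting in question (shown here generically, not as Zulip's actual settings file); it is only safe when the named header is guaranteed to be set or stripped by the proxy:

```python
# settings.py
# Treat a request as secure when the trusted proxy says it arrived over TLS.
SECURE_PROXY_SSL_HEADER = ("HTTP_X_FORWARDED_PROTO", "https")
```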
Zulip did not control the value of the `X-Forwarded-Proto` header,
because it did not set the `SECURE_PROXY_SSL_HEADER` setting (though
see below). However, uwsgi has undocumented code which silently
overrides the `wsgi.url_scheme` property based on the
`HTTP_X_FORWARDED_PROTO` property[^3] (and hence the
`X-Forwarded-Proto` header), thus doing the same as enabling the
Django `SECURE_PROXY_SSL_HEADER` setting, but in a way that cannot be
disabled. It also sets `wsgi.url_scheme` to `https` if the
`X-Forwarded-SSL` header is set to `on` or `1`[^4], providing an
alternate route to deceive Django.
These combine to make Zulip always trust `X-Forwarded-Proto` or
`X-Forwarded-SSL` headers from external sources, and thus able to
trick Django into thinking a request is "secure" when it is not.
However, Zulip is not accessible via unencrypted channels, since it
redirects all `http` requests to `https` at the nginx level; this
mitigates the vulnerability.
Regardless, we harden Zulip against this vulnerability provided by the
undocumented uwsgi feature, by stripping off `X-Forwarded-SSL` headers
before they reach uwsgi, and setting `X-Forwarded-Proto` only if the
request was received directly from a trusted proxy.
Tornado, because it does not use uwsgi, is an entirely separate
codepath. It uses the `proxy_set_header` values from
`puppet/zulip/files/nginx/zulip-include-common/proxy`, which set
`X-Forwarded-Proto` to the scheme that nginx received the request
over. As such, `SECURE_PROXY_SSL_HEADER` was set in Tornado, and only
Tornado; since the header was always set in nginx, this was safe.
However, it was also _incorrect_ in cases where nginx did not do SSL
termination, but an upstream proxy did -- it would mark those requests
as insecure when they were actually secure. We adjust the
`proxy_set_header X-Forwarded-Proto` used to talk to Tornado to
respect the proxy if it is trusted, or the local scheme if not.
[^1]: https://docs.djangoproject.com/en/4.2/ref/settings/#secure-proxy-ssl-header
[^2]: https://wsgi.readthedocs.io/en/latest/definitions.html#envvar-wsgi.url_scheme
[^3]: 73efb013e9/core/protocol.c (L558-L561)
[^4]: 73efb013e9/core/protocol.c (L531-L534)
1c76036c61 raised the number of `minfds` in Supervisor from 40k to
1M. If Supervisor cannot guarantee that number of available file
descriptors, it will fail to start; `/etc/security/limits.conf` was
hence adjusted upwards as well. However, on some virtualized
environments, including Proxmox LXC, setting
`/etc/security/limits.conf` may not be enough to raise the
system-level limits. This causes `supervisord` with the larger
`minfds` to fail to start.
The limit of 1000000 was chosen to be arbitrarily high, assuming it
came without cost; it is not expected to ever be reached on any
deployment. 262b19346e already lowered one aspect of that
changeset, upon determining it did come with a cost. Potentially
breaking virtualized deployments during upgrade is another cost of
that change.
Lower `minfds` back down to 40k, partially reverting
1c76036c61, but allow adjusting it upwards for extremely large
deployments. We do not expect any except the largest deployments to
ever hit the 40k limit, and a frictionless deployment for the
vanishingly small number of huge deployments is not worth the
potential upgrade hiccups for the much more frequent smaller
deployments.
When upgraded, the `erlang-base` package automatically stops all
services which depend on the Erlang runtime; for Zulip, this is the
`rabbitmq-server` service. This results in an unexpected outage of
Zulip.
Block unattended upgrades of the `erlang-base` package.
a522ad1d9a mistakenly deleted this variable assignment, which made
the `zulip.conf` configuration setting not work -- uwsgi's `lazy_apps`
were not enabled, which are required for rolling restart.
Instead of copying over a mostly-unchanged `postgresql.conf`, we
transition to deploying a `conf.d/zulip.conf` which contains the
only material changes we made to the file, which were previously
appended to the end.
While shipping separate `postgresql.conf` files for each
supported version is useful if there is large variety in supported
options between versions, there is no such variation at present,
and the burden of overriding the entire default configuration is that
it must be kept up to date with the package's version.
Otherwise, this output goes into `/var/spool/mail/postgres`, which is
not terribly helpful. We do not write to `/var/log/zulip` because the
backup runs as the `postgres` user, and `/var/log/zulip` is owned by
zulip and chmod 750.
Since backups may now be taken on arbitrary hosts, we need a blackbox
monitor that _some_ backup was produced.
Add a Prometheus exporter which calls `wal-g backup-list` and reports
statistics about the backups.
This could be extended to include `wal-g wal-verify`, but that
requires a connection to the PostgreSQL server.
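A minimal sketch of such an exporter in Python, assuming the `prometheus_client` library, an arbitrary port, and wal-g's default tabular `backup-list` output (header row, modification timestamp in the second column); the metric names here are illustrative, not the exporter's actual ones:

```python
import calendar
import subprocess
import time

from prometheus_client import Gauge, start_http_server

BACKUP_COUNT = Gauge("walg_backup_count", "Number of backups wal-g knows about")
LAST_BACKUP_TIME = Gauge(
    "walg_last_backup_time_seconds", "Unix time of the most recent backup"
)


def collect() -> None:
    output = subprocess.check_output(["wal-g", "backup-list"], text=True)
    rows = output.strip().splitlines()[1:]  # skip the header row
    BACKUP_COUNT.set(len(rows))
    try:
        # Assumes an RFC 3339 "modified" column, e.g. 2023-01-02T03:04:05Z
        newest = max(row.split()[1] for row in rows)
        LAST_BACKUP_TIME.set(
            calendar.timegm(time.strptime(newest, "%Y-%m-%dT%H:%M:%SZ"))
        )
    except (IndexError, ValueError):
        pass  # leave the gauge untouched if the output format differs


if __name__ == "__main__":
    start_http_server(9399)  # illustrative port
    while True:
        collect()
        time.sleep(60)
```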
Taking backups on the database primary adds additional disk load,
which can impact the performance of the application.
Switch to taking backups on replicas, if they exist. Some deployments
may have multiple replicas, and taking backups on all of them is
wasteful and potentially confusing; add a flag to inhibit taking
nightly snapshots on the host.
If the deployment is a single instance of PostgreSQL, with no
replicas, it takes backups as before, modulo the extra flag to allow
skipping taking them.
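The resulting decision, sketched in Python (the flag and argument names are illustrative, not the actual zulip.conf keys):

```python
import subprocess


def is_replica() -> bool:
    # A streaming replica reports true from pg_is_in_recovery(); in practice
    # this would run as the postgres user.
    out = subprocess.check_output(
        ["psql", "-tAc", "SELECT pg_is_in_recovery()"], text=True
    )
    return out.strip() == "t"


def should_take_nightly_backup(has_replicas: bool, skip_backups: bool) -> bool:
    if skip_backups:  # explicit opt-out on this host
        return False
    if has_replicas:  # prefer offloading backups from the primary...
        return is_replica()  # ...so only replicas take them
    return True  # standalone primary: take backups as before
```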
7c023042cf moved the logrotate configuration to being a templated
file, from a static file, but missed that the static file was still
referenced from `zulip_ops::app_frontend`; it only updated
`zulip::profile::app_frontend`. This caused errors in applying puppet
on any `zulip_ops::app_frontend` host.
Prior to 7c023042cf, the Puppet rule was identical between those two
classes; deduplicate it by moving the updated template
definition into `zulip::app_frontend_base` which is common to those
two classes and not used in any other classes.
Since logrotate runs in a daily cron, this practically means "daily,
but only if it's larger than 500M." For large installs with large
traffic, this is effectively daily for 10 days; for small installs, it
is an unknown amount of time.
Switch to daily logfiles, defaulting to 14 days to match nginx; this
can be overridden using a zulip.conf setting. This makes it easier to
ensure that access logs are only kept for a bounded period of time.
Following zulip/python-zulip-api/pull/758/, we're no longer using
python-zephyr, and don't need to build it from source. Additionally,
we no longer need to build a forked Zephyr package, since ZLoadSession
and ZDumpSession were merged in
e6a545e759.
Because changing the `supervisor.conf` file requires a restart of
supervisor (and thus of all services running under it, which is
extremely disruptive), we carefully leave its contents unchanged for
most installs, and use `concat` to append a new section to the file
only for the zmirror configuration.
We see connection timeouts and other access issues when these jobs
run exactly on the hour, either due to load on their servers from
similar cron jobs, or from their own operational processes.
Move these jobs to run at :17 past the hour, to avoid these access
issues.
Increasing worker_connections has a memory cost, unlike the rest of
the changes in 1c76036c61d8; setting it to 1 million caused nginx to
consume several GB of memory.
Reduce the default down to 10k, and allow deployments to raise it
if necessary. `worker_rlimit_nofile` is left at 1M, since it has no
impact on memory consumption.
There is no reason that the base node access method should be run
under supervisor, which exists primarily to give access to the `zulip`
user to restart its managed services. This access is unnecessary for
Teleport, and also causes unwanted restarts of Teleport services when
the `supervisor` base configuration changes. Additionally,
supervisor does not support the in-place upgrade process that Teleport
uses, as it replaces its core process with a new one.
Switch to installing a systemd configuration file (as generated by
`teleport install systemd`) for each part of Teleport, customized to
pass a `--config` path. As such, we explicitly disable the `teleport`
service provided by the package.
The supervisor process is shut down by dint of no longer installing
the file, which purges it from the managed directory, and reloads
Supervisor to pick up the removed service.
Zulip already has support for server-side Sentry integration;
however, it has historically used the Zulip-specific `blueslip`
library for monitoring browser-side errors. The latter sends errors
to email, as well as, optionally, to an internal `#errors` stream.
While this is sufficient for low volumes of users, and useful in that
it does not rely on outside services, at higher volumes it is very
difficult to do any analysis or filtering of the errors. Client-side
errors are exceptionally noisy, with many false positives due to
browser extensions or similar, so picking out real errors from a
stream of un-grouped emails or messages is quite difficult.
Add a client-side JavaScript Sentry integration. To provide useful
backtraces, this requires extending the pre-deploy hooks to upload the
source-maps to Sentry. Additional keys are added to the non-public
API of `page_params` to control the DSN, realm identifier, and sample
rates.
The current threshold of 40k descriptors was set in 2016, chosen to be
"at least 40x our current scale." At present, that only provides a
50% safety margin. Increase to 1 million to provide the same 40x
buffer as previously.
The highest value currently allowed by the kernels in
production (linux 5.3.0) is 1048576. This is set as the hard limit.
The 1 million limit is likely far above what the system can handle for
other reasons (memory, cpu, etc). While this removes a potential
safeguard on overload due to too many connections, due to the longpoll
architecture we would generally prefer to service more connections at
lower quality (due to CPU limitations) rather than randomly reject
additional connections.
Relevant prior commits:
- 836f313e69
- f2f97dd335
- ec23996538
- 8806ec698a
- e4fce10f46
After reflecting a bit on the last commit, I think it's substantially
easier to understand what's happening for these two tasks to be
defined in the same file, because we want the timing to be different
to avoid potential races.
5db55c38dc switched from `ensure => present` to the more specific
`ensure => directory` on the premise that tarballs would result in
more than one file being copied out of them. However, we only extract
a single file from the wal-g tarball, and install it at the output
path. The new rule attempts to replace it with an empty directory
after extraction.
Switch back to `ensure => present` for the tarball codepath.
These hooks are run immediately around the critical section of the
upgrade. If the upgrade fails for preparatory reasons, the pre-deploy
hook may not be run; if it fails during the upgrade, the post-deploy
hook will not be run. Hooks are called from the CWD of the new
deploy, with arguments of the old version and the new version. If
they exit with non-0 exit code, the deploy aborts.
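Conceptually, the hook runner behaves like this Python sketch (the hook directory path is hypothetical):

```python
import glob
import subprocess
import sys


def run_hooks(kind: str, old_version: str, new_version: str, new_deploy_dir: str) -> None:
    # kind is "pre-deploy" or "post-deploy"; each hook runs from the new
    # deploy's directory and receives the old and new versions as arguments.
    for hook in sorted(glob.glob(f"/etc/zulip/hooks/{kind}.d/*")):
        result = subprocess.run([hook, old_version, new_version], cwd=new_deploy_dir)
        if result.returncode != 0:
            print(f"Hook {hook} exited with {result.returncode}; aborting deploy.")
            sys.exit(1)
```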
Similar to the previous commit, Django was responsible for setting the
Content-Disposition based on the filename, whereas the Content-Type
was set by nginx based on the filename. This difference is not
exploitable, as even if they somehow disagreed with Django's expected
Content-Type, nginx will only ever respond with Content-Types found in
`uploads.types` -- none of which are unsafe for user-supplied content.
However, for consistency, have Django provide both Content-Type and
Content-Disposition headers.
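On the Django side this amounts to something like the following simplified sketch (not Zulip's actual view code):

```python
import mimetypes
from urllib.parse import quote

from django.http import HttpResponse


def set_download_headers(response: HttpResponse, filename: str, safe_inline: bool) -> None:
    # Derive the Content-Type from the filename, mirroring what nginx's
    # uploads.types mapping would have chosen.
    response["Content-Type"] = (
        mimetypes.guess_type(filename)[0] or "application/octet-stream"
    )
    disposition = "inline" if safe_inline else "attachment"
    # RFC 5987 encoding handles non-ASCII filenames.
    response["Content-Disposition"] = f"{disposition}; filename*=UTF-8''{quote(filename)}"
```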
The Content-Type of user-provided uploads was provided by the browser
at initial upload time, and stored in S3; however, 04cf68b45e
switched to determining the Content-Disposition merely from the
filename. This makes uploads vulnerable to a stored XSS, wherein a
file uploaded with a content-type of `text/html` and an extension of
`.png` would be served to browsers as `Content-Disposition: inline`,
which is unsafe.
The `Content-Security-Policy` headers in the previous commit mitigate
this, but only for browsers which support them.
Revert parts of 04cf68b45e, specifically by allowing S3 to provide
the Content-Disposition header, and using the
`ResponseContentDisposition` argument when necessary to override it to
`attachment`. Because we expect S3 responses to vary based on this
argument, we include it in the cache key; since the query parameter
has dashes in it, we can't use the helper `$arg_` variables, and
must parse it from the query parameters manually.
Adding the disposition may decrease the cache hit rate somewhat, but
downloads are infrequent enough that it is unlikely to have a
noticeable effect. We take care to not adjust the cache key for
requests which do not specify the disposition.
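The S3 side uses the standard boto3 presigned-URL parameter; for example (bucket and key names are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Ask S3 to override the stored Content-Disposition for this response only.
# Since the signed query string varies with this parameter, nginx must fold
# it into its cache key, as described above.
url = s3.generate_presigned_url(
    "get_object",
    Params={
        "Bucket": "example-zulip-uploads",
        "Key": "user_uploads/123/abc/image.png",
        "ResponseContentDisposition": "attachment",
    },
    ExpiresIn=60,
)
```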
This was missed in 04cf68b45ebb5c03247a0d6453e35ffc175d55da; as this
content is fundamentally untrusted, it must be served with
`Content-Security-Policy` headers in order to be safe. These headers
were not provided previously for S3 content because it was served from
the S3 domain.
This mitigates content served from Zulip which could be a stored XSS,
but only in browsers which support Content-Security-Policy headers;
see subsequent commit for the complete solution.
In nginx, `location` blocks operate on the _decoded_ URI[^1]:
> The matching is performed against a normalized URI, after decoding
> the text encoded in the “%XX” form
This means that if a user-uploaded file contains characters that are
not URI-safe, the browser encodes them in UTF-8 and then URI-encodes
them -- and nginx decodes them and reassembles the original character
before running the `location ~ ^/...` match. This means that the `$2`
_is not URI-encoded_ and _may contain non-ASCII characters_.
When `proxy_pass` is passed a value containing one or more variables,
it does no encoding on that expanded value, assuming that the bytes
are exactly as they should be passed to the upstream. This means that
directly calling `proxy_pass https://$1/$2` would result in sending
high-bit characters to the S3 upstream, which would rightly balk.
However, a longstanding bug in nginx's `set` directive[^2] means that
the following line:
```nginx
set $download_url https://$1/$2;
```
...results in nginx accidentally URI-encoding $1 and $2 when they are
inserted, resulting in a `$download_url` which is suitable to pass to
`proxy_pass`. This bug is only present with numeric capture
variables, not named captures; this is particularly relevant because
numeric captures are easily overridden by additional regexes
elsewhere, as subsequent commits will add.
Fixing this is complicated; nginx does not supply any way to escape
values[^3], besides a third-party module[^4] which is an undue
complication to begin using. The only variable which nginx exposes
which is _not_ un-escaped already is `$request_uri`, which contains
the very original URL sent by the browser -- and thus can't respect
any work done in Django to generate the `X-Accel-Redirect` (e.g., for
`/user_uploads/temporary/` URLs). We also cannot pass these URLs to
nginx via query-parameters, since `$arg_foo` values are not
URI-decoded by nginx, there is no function to do so[^3], and the
values must be URI-encoded because they themselves are URLs with query
parameters.
Extra-URI-encode the path that we pass to the `X-Accel-Redirect`
location, for S3 redirects. We rely on the `location` block
un-escaping that layer, leaving `$s3_hostname` and `$s3_path` as they
were intended in Django.
This works around the nginx bug, with no behaviour change.
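On the Django side, the extra layer is just one more `quote()` of the path before emitting the header; a simplified sketch (the internal location prefix is illustrative):

```python
from urllib.parse import quote

from django.http import HttpResponse


def s3_accel_redirect(s3_hostname: str, s3_path: str) -> HttpResponse:
    response = HttpResponse()
    # Encode the path one extra time; nginx's `location` matching decodes
    # exactly one layer, leaving $s3_hostname and $s3_path as Django intended.
    response["X-Accel-Redirect"] = (
        f"/internal/s3/{s3_hostname}/{quote(s3_path.lstrip('/'))}"
    )
    return response
```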
[^1]: http://nginx.org/en/docs/http/ngx_http_core_module.html#location
[^2]: https://trac.nginx.org/nginx/ticket/348
[^3]: https://trac.nginx.org/nginx/ticket/52
[^4]: https://github.com/openresty/set-misc-nginx-module#set_escape_uri
Black 23 enforces some slightly more specific rules about empty line
counts and redundant parenthesis removal, but the result is still
compatible with Black 22.
(This does not actually upgrade our Python environment to Black 23
yet.)
Signed-off-by: Anders Kaseorg <anders@zulip.com>
The `postfix.mailname` setting in `/etc/zulip.conf` was previously
only used for incoming mail, to identify in Postfix configuration
which messages were "local."
Also set `/etc/mailname`, which is used by Postfix to set how it
identifies to other hosts when sending outgoing email.
Co-authored-by: Alex Vandiver <alexmv@zulip.com>
Puppet _always_ sets the `+x` bit on directories if they have the `r`
bit set for that slot[^1]:
> When specifying numeric permissions for directories, Puppet sets the
> search permission wherever the read permission is set.
As such, for instance, `0640` is actually applied as `0750`.
Fix what we "want" to match what puppet is applying, by adding the `x`
bit. In none of these cases did we actually intend the directory to
not be executable.
[^1]: https://www.puppet.com/docs/puppet/5.5/types/file.html#file-attribute-mode
This was last really used in d7a3570c7e, in 2013, when it was
`/home/humbug/logs`.
Repoint the one obscure piece of tooling that writes there, and remove
the places that created it.
Zulip runs puppet manually, using the command-line tool; it does not
make use of the `puppet` service which, by default, attempts to
contact a host named `puppet` every two minutes to get a manifest to
apply. These attempts can generate log spam and user confusion.
Disable and stop the `puppet` service via puppet.
When file uploads are stored in S3, Zulip serves them as a 302
redirect to S3. Because browsers do not cache redirects, this means that
no image contents can be cached -- and upon every page load or reload,
every recently-posted image must be re-fetched. This incurs extra
load on the Zulip server, as well as potentially excessive bandwidth
usage from S3, and on the client's connection.
Switch to fetching the content from S3 in nginx, and serving the
content from nginx. These have `Cache-control: private, immutable`
headers set on the response, allowing browsers to cache them locally.
Because nginx fetching from S3 can be slow, and requests for uploads
will generally be bunched around when a message containing them is
first posted, we instruct nginx to cache the contents locally. This
is safe because uploaded file contents are immutable; access control
is still mediated by Django. The nginx cache key is the URL without
query parameters, as those parameters include a time-limited signed
authentication parameter which lets nginx fetch the non-public file.
This adds a number of nginx-level configuration parameters to control
the caching which nginx performs, including the amount of in-memory
index for the cache, the maximum storage of the cache on disk, and how
long data is retained in the cache. The currently-chosen figures are
reasonable for small to medium deployments.
The most notable effect of this change is in allowing browsers to
cache uploaded image content; however, while there will be many fewer
requests, it also improves request latency. The
following tests were done with a non-AWS client in SFO, a server and
S3 storage in us-east-1, and with 100 requests after 10 requests of
warm-up (to fill the nginx cache). The mean and standard deviation
are shown.
| | Redirect to S3 | Caching proxy, hot | Caching proxy, cold |
| ----------------- | ------------------- | ------------------- | ------------------- |
| Time in Django | 263.0 ms ± 28.3 ms | 258.0 ms ± 12.3 ms | 258.0 ms ± 12.3 ms |
| Small file (842b) | 586.1 ms ± 21.1 ms | 266.1 ms ± 67.4 ms | 288.6 ms ± 17.7 ms |
| Large file (660k) | 959.6 ms ± 137.9 ms | 609.5 ms ± 13.0 ms | 648.1 ms ± 43.2 ms |
The hot-cache performance is faster for both large and small files,
since it saves the client from having to make a second request to
a separate host. This performance improvement remains at least 100ms
even if the client is on the same coast as the server.
Cold nginx caches are only slightly slower than hot caches, because
VPC access to S3 endpoints is extremely fast (assuming it is in the
same region as the host), and nginx can pool connections to S3 and
reuse them.
However, all of the 648ms taken to serve a cold-cache large file is
occupied in nginx, as opposed to the only 263ms which was spent in
nginx when using redirects to S3. This means that to overall spend
less time responding to uploaded-file requests in nginx, clients will
need to find files in their local cache, and skip making an
uploaded-file request, at least 60% of the time. Modeling shows a
reduction in the number of client requests by about 70% - 80%.
The `Content-Disposition` header logic can now also be entirely shared
with the local-file codepath, as can the `url_only` path used by
mobile clients. While we could provide the direct-to-S3 temporary
signed URL to mobile clients, we choose to provide the
served-from-Zulip signed URL, to better control caching headers on it,
and greater consistency. In doing so, we adjust the salt used for the
URL; since these URLs are only valid for 60s, the effect of this salt
change is minimal.
Moving `/user_avatars/` to being served partially through Django
removes the need for the `no_serve_uploads` nginx reconfiguring when
switching between S3 and local backends. This is important because a
subsequent commit will move S3 attachments to being served through
nginx, which would make `no_serve_uploads` an entirely nonsensical
name.
Serve the files through Django, with an offload for the actual image
response to an internal nginx route. In development, serve the files
directly in Django.
We do _not_ mark the contents as immutable for caching purposes, since
the path for avatar images is hashed only by their user-id and a salt,
and as such is reused when a user's avatar is updated.
The `django-sendfile2` module unfortunately only supports a single
`SENDFILE` root path -- an invariant which subsequent commits need to
break. Especially as Zulip only runs with a single webserver, and
thus sendfile backend, the functionality is simple to inline.
It is worth noting that the following headers from the initial Django
response are _preserved_, if present, and sent unmodified to the
client; all other headers are overridden by those supplied by the
internal redirect[^1]:
- Content-Type
- Content-Disposition
- Accept-Ranges
- Set-Cookie
- Cache-Control
- Expires
As such, we explicitly unset the Content-Type header to allow nginx to
set it from the static file, but set Content-Disposition and
Cache-Control as we want them to be.
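In Django terms, that looks roughly like the following (a sketch; the header values are placeholders for whatever the content calls for):

```python
from django.http import HttpResponse


def internal_nginx_redirect(internal_path: str) -> HttpResponse:
    response = HttpResponse()
    # Drop Django's default Content-Type so nginx determines it from the
    # file on disk; it would otherwise be preserved across the redirect.
    del response["Content-Type"]
    response["X-Accel-Redirect"] = internal_path
    response["Content-Disposition"] = "inline"
    response["Cache-Control"] = "private, immutable"  # placeholder value
    return response
```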
[^1]: https://www.nginx.com/resources/wiki/start/topics/examples/xsendfile/
As uploads are a feature of the application, not of a generic nginx
deployment, move them into the `zulip::app_frontend_base` class. This
is purely for organizational clarity -- we do not support deployments
that have `zulip::nginx` but not `zulip::app_frontend_base`.
‘exit’ is pulled in for the interactive interpreter as a side effect
of the site module; this can be disabled with python -S and shouldn’t
be relied on.
Also, use the NoReturn type where appropriate.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Starting with wal-g 2.0.1, they provide `aarch64` assets[^1].
Effectively revert d7b59c86ce, and use
the pre-built binary for `aarch64` rather than spend a bunch of space
and time having to build it from source.
[^1]: https://github.com/wal-g/wal-g/releases/tag/v2.0.1
A number of autossh connections are already left open for
port-forwarding Munin ports; autossh starts the connections and
ensures that they are automatically restarted if they are severed.
However, this represents a missed opportunity. Nagios's monitoring
uses a large number of SSH connections to the remote hosts to run
commands on them; each of these connections requires doing a complete
SSH handshake and authentication, which can have non-trivial network
latency, particularly for hosts which may be located far away, in a
network topology sense (up to 1s for a no-op command!).
Use OpenSSH's ability to multiplex multiple connections over a single
socket, to reuse the already-established connection. We leave an
explicit `ControlMaster no` in the general configuration, and not
`auto`, as we do not wish any of the short-lived Nagios connections to
get promoted to being a control socket if the autossh is not running
for some reason.
We enable protocol-level keepalives, to give a better chance of the
socket being kept open.
These hosts were excluded from `zulipconf_nagios_hosts` in
8cff27f67d, because it was replicating the previously hard-coded
behaviour exactly. That behaviour was an accident of history, in that
4fbe201187 and before had simply not monitored hosts of this class.
There is no reason to not add SSH tunnels and munin monitoring for
these hosts; stop skipping them.
Since Django factors request.is_secure() into its CSRF check, we need
this to tell it to consider requests forwarded from nginx to Tornado
as secure.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
One should now be able to configure a regex by appending _regex to the
port number:
```
[tornado_sharding]
9802_regex = ^[l-p].*\.zulipchat\.com$
```
Signed-off-by: Anders Kaseorg <anders@zulip.com>
https://nginx.org/en/docs/http/ngx_http_map_module.html
Since Puppet doesn’t manage the contents of nginx_sharding.conf after
its initial creation, it needs to be renamed so we can give it
different default contents.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Some legitimate requests in Zulip can take more than 20s to be
processed, and we do not currently have a problem that a 20s limit
here would prevent.
The `needrestart` tool added in 22.04 is useful in terms of listing
which services may need to be restarted to pick up updated libraries.
However, it prompts about the current state of services needing
restart for *every* subsequent `apt-get upgrade`, and defaulting core
services to restarting requires carefully manually excluding them
every time, at risk of causing an unscheduled outage.
Build a list of default-off services based on the list in
unattended-upgrades.
The default value in uwsgi is 4k; receiving more than this amount from
nginx leads to a 502 response (though, happily, the backend uwsgi does not
terminate).
ab18dbfde5 originally increased it from the unstated uwsgi default
of 4096, to 8192; b1da797955 made it configurable, in order to allow
requests from clients with many cookies, without causing 502's[1].
nginx defaults to a limitation of 1k, with 4 additional 8k header
lines allowed[2]; any request larger than that returns a response of
`400 Request Header Or Cookie Too Large`. The largest header size
theoretically possible from nginx, by default, is thus 33k, though
that would require packing four separate headers to exactly 8k each.
Remove the gap between nginx's limit and uwsgi's, which could trigger
502s, by removing the uwsgi configurability, and setting a 64k size in
uwsgi (the max allowable), which is larger than nginx's default limit.
uWSGI's documentation of `buffer-size` ([3], [4]) also notes that "It
is a security measure too, so adapt to your app needs instead of
maxing it out." Python has no security issues with buffers of 64k,
and there is no appreciable memory footprint difference to having a
larger buffer available in uwsgi.
[1]: https://chat.zulip.org/#narrow/stream/31-production-help/topic/works.20in.20Edge.20not.20Chrome/near/719523
[2]: https://nginx.org/en/docs/http/ngx_http_core_module.html#client_header_buffer_size
[3]: https://uwsgi-docs.readthedocs.io/en/latest/ThingsToKnow.html
[4]: https://uwsgi-docs.readthedocs.io/en/latest/Options.html#buffer-size
Support for this header was removed in Chrome 78, Safari 15.4, and
Edge 17. It was never supported in Firefox.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
This check loads Django, and as such must be run as the zulip user.
Repeat the same pattern used elsewhere in nagios, of writing a state
file, which is read by `check_cron_file`.
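The state-file pattern is simple: the check runs from cron as the zulip user and atomically writes its result, which the unprivileged `check_cron_file` Nagios check then reads. A rough Python sketch (the path and record format here are assumptions, not the actual Zulip format):

```python
import os
import time

STATE_FILE = "/var/lib/nagios_state/check_example"  # illustrative path


def write_state(status: str, message: str) -> None:
    tmp = STATE_FILE + ".tmp"
    with open(tmp, "w") as f:
        # Timestamp first, so staleness can also be detected.
        f.write(f"{int(time.time())}|{status}|{message}\n")
    os.replace(tmp, STATE_FILE)  # atomic: readers never see a partial file
```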
Replication checks should only run on primary and replicas, not
standalone hosts; while `autovac_freeze` currently only runs on
primary hosts, it functions identically on replicas, and is fine to
run there.
Make `autovac_freeze` run on all `postgresql` hosts, and make
standalone hosts no longer `postgres_primary`, so they do not fail the
replication tests.
This style of check just looks for matching process names using
`check_remote_arg_string`, which dates to 8edbd64bb8. These were
added because the original two (`missedmessage_emails` and
`slow_queries`) did not create consumers, instead polling for events.
Switch these to checking the queue consumer counts that the
`check-rabbitmq-consumers` check is already writing out. Since the
`missedmessage_emails` was _already_ checked via the consumer check, a
duplicate is not added.
Even the `pageable_servers` group did not page for high load -- in
part because what was "high" depends on the servers. Set slightly
better limits based on server role.
`zmirror` itself was `zmirror_main` + `zmirrorp` but was unused; we
consistently just use the term `zmirror` for the non-personals server,
so use it as the hostgroup name.
The Redis nagios checks themselves are done against `redis` +
`frontends` groups, so there is no need to misleadingly place
`frontends` in the `redis` hostgroup.
5abf4dee92 made this distinction, then multitornado_frontends was
never used; the singletornado_frontends alerting worked even for the
multiple-Tornado instances.
Remove the useless and misleading distinction.
Even if Django and PostgreSQL are on the same host, the `nagios` user
may lack permissions to read accessory configuration files needed to
load the Django configuration (e.g. authentication keys).
Catch those failures, and switch to loading the required settings from
`/etc/zulip/zulip.conf`.
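Sketched in Python (the zulip.conf section and key names here are illustrative):

```python
import configparser


def database_host() -> str:
    try:
        # Importing the Django settings can fail when the nagios user cannot
        # read accessory configuration such as /etc/zulip/zulip-secrets.conf.
        from django.conf import settings

        return settings.DATABASES["default"]["HOST"]
    except Exception:
        # Fall back to the world-readable /etc/zulip/zulip.conf.
        config = configparser.RawConfigParser()
        config.read("/etc/zulip/zulip.conf")
        return config.get("postgresql", "database_host", fallback="localhost")
```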
Without this, uwsgi does not release the GIL before going back into
`epoll_wait` to wait for the next request. This results in any
background threads languishing, unserviced.[1]
Practically, this results in Sentry background reporter threads timing
out when attempting to post results -- but only in situations with low
traffic, as in those significant time is spent in `epoll_wait`. This
is seen in logs as:
WARN [urllib3.connectionpool] Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1131)'))': /api/123456789/envelope/
Or:
WARN [urllib3.connectionpool] Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ProxyError('Cannot connect to proxy.', RemoteDisconnected('Remote end closed connection without response'))': /api/123456789/envelope/
Sentry attempts to detect this and warn, but due to startup ordering,
the warning is not printed without lazy-loading.
Enable threads, at a minuscule performance cost, in order to support
background workers like Sentry[2].
[1] https://github.com/unbit/uwsgi/issues/1141#issuecomment-169042767
[2] https://docs.sentry.io/clients/python/advanced/#a-note-on-uwsgi
This is a reprise of c97162e485, but for the case where certbot
certs are no longer in use by way of enabling `http_only` and letting
another server handle TLS termination.
Fixes: #22034.
This allows system-level configuration to be done by `apt-get install`
of nginx modules, which place their load statements in this directory.
The initial import in ed0cb0a5f8 of the stock nginx config omitted
this include -- one potential explanation was in an effort to reduce
the memory footprint of the server.
The default nginx install enables:
50-mod-http-auth-pam.conf
50-mod-http-dav-ext.conf
50-mod-http-echo.conf
50-mod-http-geoip2.conf
50-mod-http-geoip.conf
50-mod-http-image-filter.conf
50-mod-http-subs-filter.conf
50-mod-http-upstream-fair.conf
50-mod-http-xslt-filter.conf
50-mod-mail.conf
50-mod-stream.conf
While Zulip doesn't actively use any of these, they likely don't do
any harm to simply be loaded -- they are loaded into every nginx by
default.
Having the `modules-enabled` include allows easier extension of the
server, as neither of the existing wildcard
includes (`/etc/nginx/conf.d/*.conf` and
`/etc/nginx/zulip-include/app.d/*.conf`) are in the top context, and
thus able to load modules.
54b6a83412 fixed the typo introduced in 49ad188449, but that does
not clean up existing installs which had the file with the wrong name
already.
Remove the file with the typo'd name, so two jobs do not race, and fix
the typo in the comment.
The top-level `chdir` setting only does the chdir once, at initial
`uwsgi` startup time. Rolling restarts, however, require
that `uwsgi` pick up the _new_ value of the `current` directory, and
start new workers in that directory -- as currently implemented,
rolling restarts cannot restart into newer versions of the code, only
the same one in which they were started.
Use [configurable hooks][1] to execute the `chdir` after every fork.
This causes the following behaviour:
```
Thu May 12 18:56:55 2022 - chain reload starting...
Thu May 12 18:56:55 2022 - chain next victim is worker 1
Gracefully killing worker 1 (pid: 1757689)...
worker 1 killed successfully (pid: 1757689)
Respawned uWSGI worker 1 (new pid: 1757969)
Thu May 12 18:56:56 2022 - chain is still waiting for worker 1...
running "chdir:/home/zulip/deployments/current" (post-fork)...
Thu May 12 18:56:57 2022 - chain is still waiting for worker 1...
Thu May 12 18:56:58 2022 - chain is still waiting for worker 1...
Thu May 12 18:56:59 2022 - chain is still waiting for worker 1...
WSGI app 0 (mountpoint='') ready in 3 seconds on interpreter 0x55dfca409170 pid: 1757969 (default app)
Thu May 12 18:57:00 2022 - chain next victim is worker 2
[...]
```
...and so forth down the line of processes. Each process is correctly
started in the _current_ value of `current`, and thus picks up the
correct code.
[1]: https://uwsgi-docs.readthedocs.io/en/latest/Hooks.html
Our current EC2 systems don’t have an interface named ‘eth0’, and if
they did, this script would do nothing but crash with ImportError
because we have never installed boto.utils for Python 3.
(The message of commit 2a4d851a7c made
an effort to document for future researchers why this script should
not have been blindly converted to Python 3. However, commit
2dc6d09c2a (#14278) was evidently
unresearched and untested.)
Signed-off-by: Anders Kaseorg <anders@zulip.com>
6f5ae8d13d removed the `$replication` variable from the
configurations of PostgreSQL 12 and higher, but left it in the
templates for PostgreSQL 10 and 11. Because `undef != ''`,
deployments on PostgreSQL 10 and 11 started trying to push to S3
backups, regardless of if they were configured, leaving frequent log
messages like:
```
2022-04-30 12:45:47.805 UTC [626d24ec.1f8db0]: [107-1] LOG: archiver process (PID 2086106) exited with exit code 1
2022-04-30 12:45:49.680 UTC [626d24ee.1f8dc3]: [18-1] LOG: checkpoint complete: wrote 19 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=1.910 s, sync=0.022 s, total=1.950 s; sync files=16, longest=0.018 s, average=0.002 s; distance=49 kB, estimate=373 kB
/usr/bin/timeout: failed to run command "/usr/local/bin/env-wal-g": No such file or directory
2022-04-30 12:46:17.852 UTC [626d2f99.1fd4e9]: [1-1] FATAL: archive command failed with exit code 127
2022-04-30 12:46:17.852 UTC [626d2f99.1fd4e9]: [2-1] DETAIL: The failed archive command was: /usr/bin/timeout 10m /usr/local/bin/env-wal-g wal-push pg_wal/000000010000000300000080
```
Switch the PostgreSQL 10 and 11 configuration to check
`s3_backups_bucket`, like the other versions.
It is possible to have previously installed certbot, but switched back
to using self-signed certificates -- in which case renewing them using
certbot may fail.
Verify that the certificate is a symlink into certbot's output
directory before running `fix-standalone-certbot`.
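The check itself is just a symlink test; in Python terms (the certificate path is an assumption about the install's layout):

```python
import os

CERT_PATH = "/etc/ssl/certs/zulip.combined-chain.crt"  # assumed install path


def is_certbot_managed(path: str = CERT_PATH) -> bool:
    # Only treat the certificate as certbot-managed if it is a symlink that
    # resolves into certbot's output directory.
    return os.path.islink(path) and os.path.realpath(path).startswith(
        "/etc/letsencrypt/live/"
    )
```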
Commit f6d27562fa (#21564) tried to
ensure Chrony is running, which fails in containers where Chrony
doesn’t have permission to update the host clock.
The Debian package should still attempt to start it, and Puppet should
still restart it when chrony.conf is modified.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Since wal-g does not provide binaries for aarch64, build them from
source. While building them from source for arm64 would better ensure
that the build process is tested, the build process takes 7min and 700M of
temp files, which is an unacceptable cost; we thus only build on
aarch64.
Since the wal-g build process uses submodules, which are not in the
Github export, we clone the full wal-g repository. Because the
repository is relatively small, we clone it anew on each new version,
rather than attempt to manage the remotes.
Fixes #21070.
The default timeout for `exec` commands in Puppet is 5 minutes[1]. On
slow connections, this may not be sufficient to complete larger
downloads, such as the ~135MB golang tarball.
Increase the timeout to 10 minutes; this implies a minimum download
speed of ~225kB/s.
Fixes #21449.
[1]: https://puppet.com/docs/puppet/5.5/types/exec.html#exec-attribute-timeout
This commit adds a cron job which runs every hour to add users to the
full members system group once they become eligible for full member
status.
This should ensure that full member status is available no more than
an hour after configuration suggests it should be.
Previously, it was possible to configure `wal-g` backups without
replication enabled; this resulted in only daily backups, not
streaming backups. It was also possible to enable replication without
configuring the `wal-g` backups bucket; this simply failed to work.
Make `wal-g` backups always streaming, and warn loudly if replication
is enabled but `wal-g` is not configured.
It would confuse a future Debian 15.10 release with Ubuntu 15.10; it
relies on the legacy fact $::operatingsystemrelease; the modern fact
$::os provides this information without extra logic; and it's unused
as of commit 03bffd3938.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Zulip writes a `rabbitmq.config` configuration file which locks down
RabbitMQ to listen only on localhost:5672, as well as the RabbitMQ
distribution port, on localhost:25672.
The "distribution port" is part of Erlang's clustering configuration;
while it is documented that the protocol is fundamentally
insecure ([1], [2]) and can result in remote arbitrary execution of
code, by default the RabbitMQ configuration on Debian and Ubuntu
leaves it publicly accessible, with weak credentials.
The configuration file that Zulip writes, while effective, is only
written _after_ the package has been installed and the service
started, which leaves the port exposed until RabbitMQ or system
restart.
Ensure that rabbitmq's `/etc/rabbitmq/rabbitmq.config` is written
before rabbitmq is installed or starts, and that changes to that file
trigger a restart of the service, such that the ports are only ever
bound to localhost. This does not mitigate existing installs, since
it does not force a rabbitmq restart.
[1] https://www.erlang.org/doc/apps/erts/erl_dist_protocol.html
[2] https://www.erlang.org/doc/reference_manual/distributed.html#distributed-erlang-system
This is required in order to lock down the RabbitMQ port to only
listen on localhost. If the nodename is `rabbit@hostname`, in most
circumstances the hostname will resolve to an external IP, which the
rabbitmq port will not be bound to.
Installs which used `rabbit@hostname`, due to RabbitMQ having been
installed before Zulip, would not have functioned if the host or
RabbitMQ service was restarted, as the localhost restrictions in the
RabbitMQ configuration would have made rabbitmqctl (and Zulip cron
jobs that call it) unable to find the rabbitmq server.
The previous commit ensures that configure-rabbitmq is re-run after
the nodename has changed. However, rabbitmq needs to be stopped
before `rabbitmq-env.conf` is changed; we use an `onlyif` on an `exec`
to print the warning about the node change, and let the subsequent
config change and notify of the service and configure-rabbitmq to
complete the re-configuration.
`/etc/rabbitmq/rabbitmq-env.conf` sets the nodename; anytime the
nodename changes, the backing database changes, and this requires
re-creating the rabbitmq users and permissions.
Trigger this in puppet by running configure-rabbitmq after the file
changes.
The Erlang `epmd` daemon listens on port 4369, and provides
information (without authentication) about which Erlang processes are
listening on what ports. This information is not itself a
vulnerability, but may provide information for remote attackers about
what local Erlang services (such as `rabbitmq-server`) are running,
and where.
`epmd` supports an `ERL_EPMD_ADDRESS` environment variable to limit
which interfaces it binds on. While this environment variable is set
in `/etc/default/rabbitmq-server`, Zulip unfortunately attempts to
start `epmd` using an explicit `exec` block, which ignores those
settings.
Regardless, this lack of `ERL_EPMD_ADDRESS` variable only controls
`epmd`'s startup upon first installation. Upon reboot, there are two
ways in which `epmd` might be started, neither of which respect
`ERL_EPMD_ADDRESS`:
- On Focal, an `epmd` service exists and is activated, which uses
systemd's configuration to choose which interfaces to bind on, and
thus `ERL_EPMD_ADDRESS` is irrelevant.
- On Bionic (and Focal, due to a broken dependency from
`rabbitmq-server` to `epmd@` instead of `epmd`, which may lead to
the explicit `epmd` service losing a race), `epmd` is started by
`rabbitmq-server` when it does not detect a running instance.
Unfortunately, only `/etc/init.d/rabbitmq-server` would respect
`/etc/default/rabbitmq-server` -- and it defers the actual startup
to using systemd, which does not pass the environment variable
down. Thus, `ERL_EPMD_ADDRESS` is also irrelevant here.
We unfortunately cannot limit `epmd` to only listening on localhost,
due to a number of overlapping bugs and limitations:
- Manually starting `epmd` with `-address 127.0.0.1` silently fails
to start on hosts with IPv6 disabled, due to an Erlang bug ([1],
[2]).
- The dependencies of the systemd `rabbitmq-server` service can be
fixed to include the `epmd` service, and systemd can be made to
bind to `127.0.0.1:4369` and pass that socket to `epmd`, bypassing
the above bug. However, the startup of this service is not
guaranteed, because it races with other sources of `epmd` (see
below).
- Any process that runs `rabbitmqctl` results in `epmd` being started
if one is not currently running; these instances do not respect any
environment variables as to which addresses to bind on. This is
also triggered by `service rabbitmq-server status`, as well as
various Zulip cron jobs which inspect the rabbitmq queues. As
such, it is difficult-to-impossible to ensure that some other
`epmd` process will not win the race and open the port on all
interfaces.
Since the only known exposure from leaving port 4369 open is
information that rabbitmq is running on the host, and the complexity
of adjusting this to only bind on localhost is high, we remove the
setting which does not address the problem, and document that the port
is left open, and should be protected via system-level or
network-level firewalls.
[1]: https://bugs.launchpad.net/ubuntu/+source/erlang/+bug/1374109
[2]: https://github.com/erlang/otp/issues/4820
mochiweb was renamed to web_dispatch in RabbitMQ 3.8.0, and the plugin
is not enabled. Nor does this control the management interface, which
would listen on port 15672.
This addresses the problems mentioned in the previous commit, but for
existing installations which have `authenticator = standalone` in
their configurations.
This reconfigures all hostnames in certbot to use the webroot
authenticator, and attempts to force-renew their certificates.
Force-renewal is necessary because certbot contains no way to merely
update the configuration. Let's Encrypt allows for multiple extra
renewals per week, so this is a reasonable cost.
Because the certbot configuration is `configobj`, and not
`configparser`, we have no way to easily parse to determine if webroot
is in use; additionally, `certbot certificates` does not provide this
information. We use `grep`, on the assumption that this will catch
nearly all cases.
It is possible that this will find `authenticator = standalone`
certificates which are managed by Certbot, but not Zulip certificates.
These certificates would also fail to renew while Zulip is running, so
switching them to use the Zulip webroot would still be an improvement.
Fixes #20593.
As a consequence:
• Bump minimum supported Python version to 3.7.
• Move Vagrant environment to Debian 10, which has Python 3.7.
• Move CI frontend tests to Debian 10.
• Move production build test to Debian 10.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Doing so requires protecting /metrics from direct access when proxied
through nginx. If camo is placed on a separate host, the equivalent
/metrics URL may need to be protected.
See https://github.com/cactus/go-camo#metrics for details on the
statistics so reported. Note that 5xx responses are _expected_ from
go-camo's statistics, as it returns 502 status code when the remote
server responds with 500/502/503/504, or 504 when the remote host
times out.
Because Camo includes logic to deny access to private subnets, routing
its requests through Smokescreen is generally not necessary. However,
it may be necessary if Zulip has configured a non-Smokescreen exit
proxy.
Default Camo to using the proxy only if it is not Smokescreen, with a
new `proxy.enable_for_camo` setting to override this behaviour if need
be. Note that that setting is in `zulip.conf` on the host with Camo
installed -- not the Zulip frontend host, if they are different.
Fixes: #20550.
For `no_serve_uploads` and `http_only`, which previously treated any
non-empty value as enabling the feature, this tightens which values
count as true. For `pgroonga` and `queue_workers_multiprocess`, this
broadens the accepted values from `enabled` and `true`, respectively.
Restarting the uwsgi processes by way of supervisor opens a window
during which nginx 502's all responses. uwsgi has a configuration
called "chain reloading" which allows for rolling restart of the uwsgi
processes, such that only one process at a time is unavailable; see
uwsgi documentation ([1]).
The tradeoff is that this requires that the uwsgi processes load the
libraries after forking, rather than before ("lazy apps"); in theory
this can lead to larger memory footprints, since they are not shared.
In practice, as Django defers much of the loading, this is not as much
of an issue. In a very basic test of memory consumption (measured by
total memory - free - caches - buffers; 6 uwsgi workers), both
immediately after restarting Django, and after requesting `/` 60 times
with 6 concurrent requests:
| Non-lazy | Lazy app | Difference
------------------+------------+------------+-------------
Fresh | 2,827,216 | 2,870,480 | +43,264
After 60 requests | 3,332,284 | 3,409,608 | +77,324
..................|............|............|.............
Difference | +505,068 | +539,128 | +34,060
That is, "lazy app" loading increased the footprint pre-requests by
43MB, and after 60 requests grew the memory footprint by 539MB, as
opposed to non-lazy loading, which grew it by 505MB. Using uwsgi "lazy
app" loading does increase the memory footprint, but not by a large
percentage.
The other effect is that processes may be served by either old or new
code during the restart window. This may cause transient failures
when new frontend code talks to old backend code.
Enable chain-reloading during graceful, puppetless restarts, but only
if enabled via a zulip.conf configuration flag.
Fixes #2559.
[1]: https://uwsgi-docs.readthedocs.io/en/latest/articles/TheArtOfGracefulReloading.html#chain-reloading-lazy-apps
Fix another tidy error caused by 1e4e6a09af23; as also noted in
f9a39b6703, these resources are necessary so that tidy does not
clean up smokescreen and then force a recompilation of it.