Since logrotate runs in a daily cron, this practically means "daily,
but only if it's larger than 500M." For large installs with large
traffic, this is effectively daily for 10 days; for small installs, it
is an unknown amount of time.
Switch to daily logfiles, defaulting to 14 days to match nginx; this
can be overridden using a zulip.conf setting. This makes it easier to
ensure that access logs are only kept for a bounded period of time.
Increasing worker_connections has a memory cost, unlike the rest of
the changes in 1c76036c61d8; setting it to 1 million caused nginx to
consume several GB of memory.
Reduce the default down to 10k, and allow deploys to configure it up
if necessary. `worker_rlimit_nofile` is left at 1M, since it has no
impact on memory consumption.
This is the behaviour inherited from Django[^1]. While setting the
password to empty (`email_password = `) in
`/etc/zulip/zulip-secrets.conf` also would suffice, it's unclear what
the user would have been putting into `EMAIL_HOST_USER` in that
context.
Because we previously did not warn when `email_password` was not
present in `zulip-secrets.conf`, having the error message clarify the
correct configuration for disabling SMTP auth is important.
Fixes: #23938.
[^1]: https://docs.djangoproject.com/en/4.1/ref/settings/#std-setting-EMAIL_HOST_USER
The documentation for restoring backups referenced that it needed to
be to the same version of PostgreSQL, but did not explain how to do
that.
Link to the relevant section of the installer documentation, and name
the flag explicitly.
Fixes: #23691
These hooks are run immediately around the critical section of the
upgrade. If the upgrade fails for preparatory reasons, the pre-deploy
hook may not be run; if it fails during the upgrade, the post-deploy
hook will not be run. Hooks are called from the CWD of the new
deploy, with arguments of the old version and the new version. If
they exit with non-0 exit code, the deploy aborts.
We will hopefully be able to just this in #16208 to document what
users need to configure in order to do this manually, but the content
here will be useful for anyone who hasn't set that up regardless.
This adds a new endpoint /jwt/fetch_api_key that accepts a JWT and can
be used to fetch API keys for a certain user. The target realm is
inferred from the request and the user email is part of the JWT.
A JSON containing an user API key, delivery email and (optionally)
raw user profile data is returned in response.
The profile data in the response is optional and can be retrieved by
setting the POST param "include_profile" to "true" (default=false).
Co-authored-by: Mateusz Mandera <mateusz.mandera@zulip.com>
- Renames "Customize Zulip" to "Server configuration".
- Cross-links "Server configuration" with "System and deployment
configuration".
Fixes part of #23984.
The `postfix.mailname` setting in `/etc/zulip.conf` was previously
only used for incoming mail, to identify in Postfix configuration
which messages were "local."
Also set `/etc/mailname`, which is used by Postfix to set how it
identifies to other hosts when sending outgoing email.
Co-authored-by: Alex Vandiver <alexmv@zulip.com>
When file uploads are stored in S3, this means that Zulip serves as a
302 to S3. Because browsers do not cache redirects, this means that
no image contents can be cached -- and upon every page load or reload,
every recently-posted image must be re-fetched. This incurs extra
load on the Zulip server, as well as potentially excessive bandwidth
usage from S3, and on the client's connection.
Switch to fetching the content from S3 in nginx, and serving the
content from nginx. These have `Cache-control: private, immutable`
headers set on the response, allowing browsers to cache them locally.
Because nginx fetching from S3 can be slow, and requests for uploads
will generally be bunched around when a message containing them are
first posted, we instruct nginx to cache the contents locally. This
is safe because uploaded file contents are immutable; access control
is still mediated by Django. The nginx cache key is the URL without
query parameters, as those parameters include a time-limited signed
authentication parameter which lets nginx fetch the non-public file.
This adds a number of nginx-level configuration parameters to control
the caching which nginx performs, including the amount of in-memory
index for he cache, the maximum storage of the cache on disk, and how
long data is retained in the cache. The currently-chosen figures are
reasonable for small to medium deployments.
The most notable effect of this change is in allowing browsers to
cache uploaded image content; however, while there will be many fewer
requests, it also has an improvement on request latency. The
following tests were done with a non-AWS client in SFO, a server and
S3 storage in us-east-1, and with 100 requests after 10 requests of
warm-up (to fill the nginx cache). The mean and standard deviation
are shown.
| | Redirect to S3 | Caching proxy, hot | Caching proxy, cold |
| ----------------- | ------------------- | ------------------- | ------------------- |
| Time in Django | 263.0 ms ± 28.3 ms | 258.0 ms ± 12.3 ms | 258.0 ms ± 12.3 ms |
| Small file (842b) | 586.1 ms ± 21.1 ms | 266.1 ms ± 67.4 ms | 288.6 ms ± 17.7 ms |
| Large file (660k) | 959.6 ms ± 137.9 ms | 609.5 ms ± 13.0 ms | 648.1 ms ± 43.2 ms |
The hot-cache performance is faster for both large and small files,
since it saves the client the time having to make a second request to
a separate host. This performance improvement remains at least 100ms
even if the client is on the same coast as the server.
Cold nginx caches are only slightly slower than hot caches, because
VPC access to S3 endpoints is extremely fast (assuming it is in the
same region as the host), and nginx can pool connections to S3 and
reuse them.
However, all of the 648ms taken to serve a cold-cache large file is
occupied in nginx, as opposed to the only 263ms which was spent in
nginx when using redirects to S3. This means that to overall spend
less time responding to uploaded-file requests in nginx, clients will
need to find files in their local cache, and skip making an
uploaded-file request, at least 60% of the time. Modeling shows a
reduction in the number of client requests by about 70% - 80%.
The `Content-Disposition` header logic can now also be entirely shared
with the local-file codepath, as can the `url_only` path used by
mobile clients. While we could provide the direct-to-S3 temporary
signed URL to mobile clients, we choose to provide the
served-from-Zulip signed URL, to better control caching headers on it,
and greater consistency. In doing so, we adjust the salt used for the
URL; since these URLs are only valid for 60s, the effect of this salt
change is minimal.
Moving `/user_avatars/` to being served partially through Django
removes the need for the `no_serve_uploads` nginx reconfiguring when
switching between S3 and local backends. This is important because a
subsequent commit will move S3 attachments to being served through
nginx, which would make `no_serve_uploads` entirely nonsensical of a
name.
Serve the files through Django, with an offload for the actual image
response to an internal nginx route. In development, serve the files
directly in Django.
We do _not_ mark the contents as immutable for caching purposes, since
the path for avatar images is hashed only by their user-id and a salt,
and as such are reused when a user's avatar is updated.
The authenticate_by_username limit of 5 attempts per 30 minutes can get
annoying in some cases where the user really forgot their password and
should be allowed to keep trying with admin approvial - so we should
document the command that allows unblocking them.
The documentation included the full policy for the file uploads
bucket, but only one additional statement for the avatars bucket; the
reader needed to assemble the full policy themselves.
Switch to explicitly providing the full policy for both.
Fixes#23110.
Since this is being moved to admin-facing documentation, also adds a
paragraph about the main concern with enabling this on a server that's
not zulip.com.
We should rearrange Zulip's developer docs to make it easier to
find the documentation that new contributors need.
Name changes
Rename "Code contribution guide" section -> "Contributing to Zulip".
Rename "Contributing to Zulip" page -> "Contributing guide".
Organizational changes to the newly-named "Contributing to Zulip":
Move up "Contributing to Zulip", as the third link in sidebar index.
Move up renamed "Contributing guide" page to the top of this section.
Move up "Zulip code of Conduct", as the second link of this section.
Move down "Licensing", as the last link of this section.
Move "Accessibility" just below "HTML and CSS" in Subsystems section.
Update all links according to the changes above.
Redirects should be added as needed.
Fixes: #22517.
This is not a feature intended to be used outside zulip.com, since it
just sets your server to have the zulip.com landing pages. I think
it's only been turned on by people who were confused by this text.
The default value in uwsgi is 4k; receiving more than this amount from
nginx leads to a 502 response (though, happily, the backend uwsgi does not
terminate).
ab18dbfde5 originally increased it from the unstated uwsgi default
of 4096, to 8192; b1da797955 made it configurable, in order to allow
requests from clients with many cookies, without causing 502's[1].
nginx defaults to a limitation of 1k, with 4 additional 8k header
lines allowed[2]; any request larger than that returns a response of
`400 Request Header Or Cookie Too Large`. The largest header size
theoretically possible from nginx, by default, is thus 33k, though
that would require packing four separate headers to exactly 8k each.
Remove the gap between nginx's limit and uwsgi's, which could trigger
502s, by removing the uwsgi configurability, and setting a 64k size in
uwsgi (the max allowable), which is larger than nginx's default limit.
uWSGI's documentation of `buffer-size` ([3], [4]) also notes that "It
is a security measure too, so adapt to your app needs instead of
maxing it out." Python has no security issues with buffers of 64k,
and there is no appreciable memory footprint difference to having a
larger buffer available in uwsgi.
[1]: https://chat.zulip.org/#narrow/stream/31-production-help/topic/works.20in.20Edge.20not.20Chrome/near/719523
[2]: https://nginx.org/en/docs/http/ngx_http_core_module.html#client_header_buffer_size
[3]: https://uwsgi-docs.readthedocs.io/en/latest/ThingsToKnow.html
[4]: https://uwsgi-docs.readthedocs.io/en/latest/Options.html#buffer-size
Backups are written every 16k of WAL archive, and by default do not
have an upper limit on how out of date they are, as `archive_timeout`
defaults to 0.
Also emphasize that these are streaming backups, not just one
point-in-time backup daily.
Fixes#21976.
Notable changes:
- Describe `X-Forwarded-For` by name.
- Switch each specific proxy to numbered steps.
- Link back to the `X-Forwarded-For` section in each proxy
- Default to using HTTPS, not HTTP, for the backend.
- Include the HTTP-to-HTTPS redirect code for all proxies; it is
important that it happen at the proxy, as the backend is unaware of
it.
- Call out Apache2 modules which are necessary.
- Specify where the dhparam.pem file can be found.
- Call out the `Host:` header forwarding necessary, and document
`USE_X_FORWARDED_HOST` if that is not possible.
- Standardize on 20 minutes of connection timeout.
As a consequence:
• Bump minimum supported Python version to 3.8.
• Move Vagrant environment to Ubuntu 20.04, which has Python 3.8.
• Move CI frontend tests to Ubuntu 20.04.
• Move production build test to Ubuntu 20.04.
• Move 3.4 upgrade test to Ubuntu 20.04.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
On the Debian 10 -> 11 upgrade, the server is running Zulip 4.x, which
lets us pass `--audit-fts-indexes` to `upgrade-zulip-stage-2` rather
than run the command as a separate step.
The reindex-textual-data tool needs the venv to be cable to run;
switch the order of the last two steps, making them now match the
Debian 9 -> 10 and 10 -> upgrades.
Ref #21296.
When the credentials are provided by dint of being run on an EC2
instance with an assigned Role, we must be able to fetch the instance
metadata from IMDS -- which is precisely the type of internal-IP
request that Smokescreen denies.
While botocore supports a `proxies` argument to the `Config` object,
this is not actually respected when making the IMDS queries; only the
environment variables are read from. See
https://github.com/boto/botocore/issues/2644
As such, implement S3_SKIP_PROXY by monkey-patching the
`botocore.utils.should_bypass_proxies` function, to allow requests to
IMDS to be made without Smokescreen impeding them.
Fixes#20715.
Previously, it was possible to configure `wal-g` backups without
replication enabled; this resulted in only daily backups, not
streaming backups. It was also possible to enable replication without
configuring the `wal-g` backups bucket; this simply failed to work.
Make `wal-g` backups always streaming, and warn loudly if replication
is enabled but `wal-g` is not configured.
Ubuntu 22.04 pushed a post-feature-freeze update to Python 3.10,
breaking virtual environments in a Debian patch
(https://bugs.launchpad.net/ubuntu/+source/python3.10/+bug/1962791).
Also, our antique version of Tornado doesn’t work in 3.10, and we’ll
need to do some work to upgrade that.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
This uses the myst_heading_anchors option to automatically generate
header anchors and make Sphinx aware of them. See
https://myst-parser.readthedocs.io/en/latest/syntax/optional.html#auto-generated-header-anchors.
Note: to be compatible with GitHub, MyST-Parser uses a slightly
different convention for .md fragment links than .html fragment links
when punctuation is involved. This does not affect the generated
fragment links in the HTML output.
Fixes#13264.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
We previously had a convention of redundantly including the directory
in relative links to reduce mistakes when moving content from one file
to another. However, these days we have a broken link checker in
test-documentation, and after #21237, MyST-Parser will check relative
links (including fragments) when you run build-docs.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
This was only used for upgrading from Zulip < 1.9.0, which is no
longer possible because Zulip < 2.1.0 had no common supported
platforms with current main.
If we ever want this optimization for a future migration, it would be
better implemented using Django merge migrations.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
This is required in order to lock down the RabbitMQ port to only
listen on localhost. If the nodename is `rabbit@hostname`, in most
circumstances the hostname will resolve to an external IP, which the
rabbitmq port will not be bound to.
Installs which used `rabbit@hostname`, due to RabbitMQ having been
installed before Zulip, would not have functioned if the host or
RabbitMQ service was restarted, as the localhost restrictions in the
RabbitMQ configuration would have made rabbitmqctl (and Zulip cron
jobs that call it) unable to find the rabbitmq server.
The previous commit ensures that configure-rabbitmq is re-run after
the nodename has changed. However, rabbitmq needs to be stopped
before `rabbitmq-env.conf` is changed; we use an `onlyif` on an `exec`
to print the warning about the node change, and let the subsequent
config change and notify of the service and configure-rabbitmq to
complete the re-configuration.
The Erlang `epmd` daemon listens on port 4369, and provides
information (without authentication) about which Erlang processes are
listening on what ports. This information is not itself a
vulnerability, but may provide information for remote attackers about
what local Erlang services (such as `rabbitmq-server`) are running,
and where.
`epmd` supports an `ERL_EPMD_ADDRESS` environment variable to limit
which interfaces it binds on. While this environment variable is set
in `/etc/default/rabbitmq-server`, Zulip unfortunately attempts to
start `epmd` using an explicit `exec` block, which ignores those
settings.
Regardless, this lack of `ERL_EPMD_ADDRESS` variable only controls
`epmd`'s startup upon first installation. Upon reboot, there are two
ways in which `epmd` might be started, neither of which respect
`ERL_EPMD_ADDRESS`:
- On Focal, an `epmd` service exists and is activated, which uses
systemd's configuration to choose which interfaces to bind on, and
thus `ERL_EPMD_ADDRESS` is irrelevant.
- On Bionic (and Focal, due to a broken dependency from
`rabbitmq-server` to `epmd@` instead of `epmd`, which may lead to
the explicit `epmd` service losing a race), `epmd` is started by
`rabbitmq-server` when it does not detect a running instance.
Unfortunately, only `/etc/init.d/rabbitmq-server` would respects
`/etc/default/rabbitmq-server` -- and it defers the actual startup
to using systemd, which does not pass the environment variable
down. Thus, `ERL_EPMD_ADDRESS` is also irrelevant here.
We unfortunately cannot limit `epmd` to only listening on localhost,
due to a number of overlapping bugs and limitations:
- Manually starting `epmd` with `-address 127.0.0.1` silently fails
to start on hosts with IPv6 disabled, due to an Erlang bug ([1],
[2]).
- The dependencies of the systemd `rabbitmq-server` service can be
fixed to include the `epmd` service, and systemd can be made to
bind to `127.0.0.1:4369` and pass that socket to `epmd`, bypassing
the above bug. However, the startup of this service is not
guaranteed, because it races with other sources of `epmd` (see
below).
- Any process that runs `rabbitmqctl` results in `epmd` being started
if one is not currently running; these instances do not respect any
environment variables as to which addresses to bind on. This is
also triggered by `service rabbitmq-server status`, as well as
various Zulip cron jobs which inspect the rabbitmq queues. As
such, it is difficult-to-impossible to ensure that some other
`epmd` process will not win the race and open the port on all
interfaces.
Since the only known exposure from leaving port 4369 open is
information that rabbitmq is running on the host, and the complexity
of adjusting this to only bind on localhost is high, we remove the
setting which does not address the problem, and document that the port
is left open, and should be protected via system-level or
network-level firewalls.
[1]: https://bugs.launchpad.net/ubuntu/+source/erlang/+bug/1374109
[2]: https://github.com/erlang/otp/issues/4820
As a consequence:
• Bump minimum supported Python version to 3.7.
• Move Vagrant environment to Debian 10, which has Python 3.7.
• Move CI frontend tests to Debian 10.
• Move production build test to Debian 10.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
With tweaks to security-model.md by tabbott to expand the SSO acronym.
Ignored, but still needs discussion on whether we should exclude this
rule:
```
The word ‘install’ is not a noun.
✗ ...ble to connect to the client during the install process: So you'll need to shut down a...
^^^^^^^
✓ ...ble to connect to the client during the installation process: So you'll need to shut down a...
A_INSTALL: a/the + install
The word ‘install’ is not a noun.
✗ ...detected at install time will cause the install to abort. If you already have PostgreSQ...
^^^^^^^
✓ ...detected at install time will cause the installation to abort. If you already have PostgreSQ...
A_INSTALL: a/the + install
```