While this could be done previously by calling
`upgrade-zulip-from-git --remote-url /srv/zulip.git`, the explicit
argument makes this more straightforward, and avoids churning the
`refs/remotes/origin/` namespace.
Decouple the sending of client restart events from the restarting of
the servers. Restarts use the new Tornado restart-clients endpoint to
inject "restart" events into queues of clients which were loaded from
the previous Tornado process. The rate is controlled by the
`application_server.client_restart_rate`, in clients per minute, or a
flag to `restart-clients` which overrides it. Note that a web client
will also spread its restart over 5 minutes, so artificially-slow
client restarts are generally not very necessary.
Restarts of clients are deferred to until after post-deploy hooks are
run, such that the pre- and post- deploy hooks are around the actual
server restarts, even if pushing restart events to clients takes
significant time.
This flag was generally used not because we wanted to avoid
restarting Tornado, but because we wanted to avoid increasing load
the server when all of the clients were told to reload.
Since we have laid the groundwork for separately telling Tornado to
tell clients to restart, we remove the --skip-tornado flag; the next
commit will add the ability to skip client restarts.
This queue is used to things which definitionally may take longer than
a request, so paging after 60s is rather aggressive. This is
especially true because this queue has a very long tail of very slow
tasks -- p99 of task time in this queue is 8.5s, while p99.9 is 197s.
Raise the paging threshold to 15 minutes. While there are
semi-user-facing tasks which use this queue (primarily marking
messages as read), those being delayed for minutes is already a real
possibility if they are stuck behind a large realm export -- and this
is not a situation which should necessarily page, since it is not
solvable by the administrator.
Filling caches needs to happen close to when the server is restarted,
as the gap opens us up to race conditions with user modifications. If
there are migrations, however, it must happen within the critical
period after the migrations are applied.
Move the call to fill the caches to within the `shutdown_server`
function, so that we push it as close to the server shutdown as
possible.
Tweaked provision script to run successfully in Fedora 38 and
included a script to build the groonga libs from source because
the packages in Fedora repos are outdated.
There is a major version jump from the last supported version (F34)
which is EOL so references and support for older versions were
removed.
Fixes: #20635
nginx sets the value of the `$http_host` variable to the empty string
when using http/3, as there is technically no `Host:` header sent:
https://github.com/nginx-quic/nginx-quic/issues/3
Users with a browser that support http/3 will send their first request
to nginx with http/2, and get an expected HTTP 200 -- but any
subsequent requests will fail with am HTTP 400, since the browser will
have upgraded to http/3, which has an empty `Host` header, which Zulip
rejects.
Switch to the `$host` variable, which works for all HTTP versions.
Co-authored-by: Alex Vandiver <alexmv@zulip.com>
Restore the default django.utils.log.AdminEmailHandler when
ERROR_REPORTING is enabled. Those with more sophisticated needs can
turn it off and use Sentry or a Sentry-compatible system.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
The claim in the comment from c8ec3dfcf6, that we can and should use
the current deploy's venv, misses one key case -- when upgrading the
operating system, the current deploy's venv is unworkable, since it
was configured for a previous version of Python. As such, any attempt
to load Django to verify the version of PostgreSQL it is talking to
must happen after the venv is configured.
Move the database version check into
`scripts/lib/check-database-compatibility`, which also moves it after
the new venv is configured.
Because we no longer reliably know, at `apt-get upgrade` time, what
version of PostgreSQL is installed, we hold all versions of the
pgroonga packages.
This ensures that the next `upgrade-zulip-from-git` has access to the
commit history of the initial install, if it was from a forked
repository. `/home/zulip/deployments/current` and `/srv/zulip.git`
are not quite organized into the steady-state that they will have
after one `upgrade-zulip-from-git`:
- `/home/zulip/deployments/current` is its own clone, not a worktree
- `/srv/zulip.git` has an origin of `/home/zulip/deployments/current`
- `remote.origin.mirror` is set on `/srv/zulip.git`
- `remote.origin.fetch` is `+refs/*:refs/*`
All but the first are automatically cleaned up by
`upgrade-zulip-from-git` when it is next run, using the code added in
30457ecd02. The additional complexity of making an existing
independent clone into a worktree seem not worth solving the first
point.
Updating the pgroonga package is not sufficient to upgrade the
extension in PostgreSQL -- an `ALTER EXTENSION pgroonga UPDATE` must
explicitly be run[^1]. Failure to do so can lead to unexpected behavior,
including crashes of PostgreSQL.
Expand on the existing `pgroonga_setup.sql.applied` file, to track
which version of the PostgreSQL extension has been configured. If the
file exists but is empty, we run `ALTER EXTENSION pgroonga UPDATE`
regardless -- if it is a no-op, it still succeeds with a `NOTICE`:
```
zulip=# ALTER EXTENSION pgroonga UPDATE;
NOTICE: version "3.0.8" of extension "pgroonga" is already installed
ALTER EXTENSION
```
The simple `ALTER EXTENSION` is sufficient for the
backwards-compatible case[^1] -- which, for our usage, is every
upgrade since 0.9 -> 1.0. Since version 1.0 was released in 2015,
before pgroonga support was added to Zulip in 2016, we can assume for
the moment that all pgroonga upgrades are backwards-compatible, and
not bother regenerating indexes.
Fixes: #25989.
[^1]: https://pgroonga.github.io/upgrade/
If `zulip-puppet-apply` is run during an upgrade, it will immediately
try to re-`stop-server` before running migrations; if the last step in
the puppet application was to restart `supervisor`, it may not be
listening on its UNIX socket yet. In such cases, `socket.connect()`
throws a `FileNotFoundError`:
```
Traceback (most recent call last):
File "./scripts/stop-server", line 53, in <module>
services = list_supervisor_processes(services, only_running=True)
File "./scripts/lib/supervisor.py", line 34, in list_supervisor_processes
processes = rpc().supervisor.getAllProcessInfo()
File "/usr/lib/python3.9/xmlrpc/client.py", line 1116, in __call__
return self.__send(self.__name, args)
File "/usr/lib/python3.9/xmlrpc/client.py", line 1456, in __request
response = self.__transport.request(
File "/usr/lib/python3.9/xmlrpc/client.py", line 1160, in request
return self.single_request(host, handler, request_body, verbose)
File "/usr/lib/python3.9/xmlrpc/client.py", line 1172, in single_request
http_conn = self.send_request(host, handler, request_body, verbose)
File "/usr/lib/python3.9/xmlrpc/client.py", line 1285, in send_request
self.send_content(connection, request_body)
File "/usr/lib/python3.9/xmlrpc/client.py", line 1315, in send_content
connection.endheaders(request_body)
File "/usr/lib/python3.9/http/client.py", line 1250, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/lib/python3.9/http/client.py", line 1010, in _send_output
self.send(msg)
File "/usr/lib/python3.9/http/client.py", line 950, in send
self.connect()
File "./scripts/lib/supervisor.py", line 10, in connect
self.sock.connect(self.host)
FileNotFoundError: [Errno 2] No such file or directory
```
Catch the `FileNotFoundError` and retry twice more, with backoff. If
it fails repeatedly, point to `service supervisor status` for further
debugging, as `FileNotFoundError` is rather misleading -- the file
exists, it simply is not accepting connections.
Instead of copying over a mostly-unchanged `postgresql.conf`, we
transition to deploying a `conf.d/zulip.conf` which contains the
only material changes we made to the file, which were previously
appended to the end.
While shipping separate while `postgresql.conf` files for each
supported version is useful if there is large variety in supported
options between versions, there is not no such variation at current,
and the burden of overriding the entire default configuration is that
it must be keep up to date wit the package's version.
Installs which are upgrading to current `main`, and are upgrading for
the very first time from an install which was originally from git,
have a `/home/zulip/deployments/current` which, unlike all later
upgrades, is not a `git worktree` of `/srv/zulip.git`, but rather a
direct `git clone` of some arbitrary URL. As such, it does not have
an `upstream` remote, nor a cached `zulip-git-version` file.
This makes later attempts to determine the pre-upgrade revision of
git (for pre-deploy hooks) fail, as without a `zulip-git-version`
file, `ZULIP_VERSION` is insufficiently-specific (e.g. `6.1+git`), and
there is no guarantee the necessary tags exist either.
While we can make fresh git installs set up an `upstream` and run
`./tools/cache-zulip-git-version` going forward (see subsequent
commit), that does not address the issue for deploys which already
exist. For those, we must configure and fetch a `remote` in the old
checkout, followed by re-generating a cached `zulip-git-version`.
Fixes: #25076.
On Docker for Mac with the gRPC FUSE or VirtioFS file sharing
implementations, we nondeterministically get errors like this from
pnpm install:
pnpm: ENOENT: no such file or directory, copyfile '/srv/zulip/.pnpm-store/v3/files/7d/6b44bb658625281b48194e5a3d3a07452bea1f256506dd16f7a21941ef3f0d259e1bcd0cc6202642bf1fd129bc187e6a3921d382d568d312bd83f3023979a0' -> '/srv/zulip/node_modules/.pnpm/regexpu-core@5.3.2/node_modules/_tmp_3227_7f867a9c510832f5f82601784e21e7be/LICENSE-MIT.txt'
Subcommand of ./lib/provision.py failed with exit status 1: /usr/local/bin/pnpm install --frozen-lockfile
Actual error output for the subcommand is just above this.
Work around this using --package-import-method=copy.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Ever since we started bundling the app with webpack, there’s been less
and less overlap between our ‘static’ directory (files belonging to
the frontend app) and Django’s interpretation of the ‘static’
directory (files served directly to the web).
Split the app out to its own ‘web’ directory outside of ‘static’, and
remove all the custom collectstatic --ignore rules. This makes it
much clearer what’s actually being served to the web, and what’s being
bundled by webpack. It also shrinks the release tarball by 3%.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
These hooks are run immediately around the critical section of the
upgrade. If the upgrade fails for preparatory reasons, the pre-deploy
hook may not be run; if it fails during the upgrade, the post-deploy
hook will not be run. Hooks are called from the CWD of the new
deploy, with arguments of the old version and the new version. If
they exit with non-0 exit code, the deploy aborts.
Corepack manages multiple per-project version of Yarn and PNPM, which
means we have to maintain less installation code, and could help us
switch away from Yarn 1 without making the system unusable for
development of other Yarn 1 projects.
https://nodejs.org/api/corepack.html
The Unicode spaces in the timerender test resulted from an ICU
upgrade: https://github.com/nodejs/node/pull/45068.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Black 23 enforces some slightly more specific rules about empty line
counts and redundant parenthesis removal, but the result is still
compatible with Black 22.
(This does not actually upgrade our Python environment to Black 23
yet.)
Signed-off-by: Anders Kaseorg <anders@zulip.com>
This was last really used in d7a3570c7e, in 2013, when it was
`/home/humbug/logs`.
Repoint the one obscure piece of tooling that writes there, and remove
the places that created it.
`check_version` in `install-yarn` had the rather careful check that
the yarn it installed into `/usr/bin/yarn` was the yarn which was
first in the user's `$PATH`. This caused problems when the user had a
pre-existing `/usr/local/bin/yarn`; however, those problems are
limited to the `install-yarn` script itself, since the nearly all
calls to yarn from Zulip's code already hardcode the `/srv/zulip-yarn`
location, and do not depend on what is in `$PATH`.
Remove the checks in `install-yarn` that depend on the local `$PATH`,
and stop installing our `yarn` into it. We also adjust the two
callsites which did not specify the full path to `yarn`, so use
`/srv/zulip-yarn`.
Fixes: #23993
Co-authored-by: Alex Vandiver <alexmv@zulip.com>
‘exit’ is pulled in for the interactive interpreter as a side effect
of the site module; this can be disabled with python -S and shouldn’t
be relied on.
Also, use the NoReturn type where appropriate.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Starting with wal-g 2.0.1, they provide `aarch64` assets[^1].
Effectively revert d7b59c86ce, and use
the pre-built binary for `aarch64` rather than spend a bunch of space
and time having to build it from source.
[^1]: https://github.com/wal-g/wal-g/releases/tag/v2.0.1
If there is a syntax error in `settings.py`, `restart-server` should
provide a reasonable message about this. It did so prior to
af08bcdb3f, becausde any invocation `./manage.py` without
`--skip-checks` will verify `settings.py`, among several other checks.
After af08bcdb3f, there are no `./manage.py` calls in most restarts,
which fa77be6e6c took further.
Add an explicit `./manage.py check` in the default case.
upgrade-zulip-stage-2 overrides this by passing `--skip-checks`, for
performance. This also means that `upgrade-zulip-from-git` itself
picks up the same `--skip-checks` flag, since it inherits the same
flag parsing, though that is perhaps of dubious utility.
Although Node.js 18 is not the active LTS release for another 3 weeks,
the Node.js 16 end-of-life date was moved forward to September 2023,
(https://nodejs.org/en/blog/announcements/nodejs16-eol/), so it seems
prudent to switch now.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Fixes “E713 Test for membership should be `not in`” found by
ruff (https://github.com/charliermarsh/ruff).
Signed-off-by: Anders Kaseorg <anders@zulip.com>
One should now be able to configure a regex by appending _regex to the
port number:
[tornado_sharding]
9802_regex = ^[l-p].*\.zulipchat\.com$
Signed-off-by: Anders Kaseorg <anders@zulip.com>
https://nginx.org/en/docs/http/ngx_http_map_module.html
Since Puppet doesn’t manage the contents of nginx_sharding.conf after
its initial creation, it needs to be renamed so we can give it
different default contents.
Signed-off-by: Anders Kaseorg <anders@zulip.com>