This queue is used for things which definitionally may take longer than
a request, so paging after 60s is rather aggressive. This is
especially true because this queue has a very long tail of very slow
tasks -- p99 of task time in this queue is 8.5s, while p99.9 is 197s.
Raise the paging threshold to 15 minutes. While there are
semi-user-facing tasks which use this queue (primarily marking
messages as read), those being delayed for minutes is already a real
possibility if they are stuck behind a large realm export -- and this
is not a situation which should necessarily page, since it is not
solvable by the administrator.
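Sketched, the intended policy looks something like this (the queue and
constant names here are illustrative, not the actual monitoring code):
```
# Illustrative sketch only; queue and constant names are assumed.
PAGING_THRESHOLD_SECONDS = {
    "default": 60,
    "slow_tasks": 15 * 60,  # long tail: p99 is 8.5s, but p99.9 is 197s
}

def should_page(queue_name: str, oldest_unprocessed_age_s: float) -> bool:
    threshold = PAGING_THRESHOLD_SECONDS.get(
        queue_name, PAGING_THRESHOLD_SECONDS["default"]
    )
    return oldest_unprocessed_age_s > threshold
```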
Filling caches needs to happen close to when the server is restarted,
as the gap opens us up to race conditions with user modifications. If
there are migrations, however, it must happen within the critical
period after the migrations are applied.
Move the call to fill the caches to within the `shutdown_server`
function, so that we push it as close to the server shutdown as
possible.
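A minimal sketch of the resulting ordering, with assumed helper names:
```
import subprocess

did_shutdown = False

def shutdown_server() -> None:
    global did_shutdown
    if did_shutdown:
        return
    # Fill caches immediately before stopping the server, minimizing
    # the window in which user modifications can race with the caches.
    fill_memcached_caches()  # assumed helper name
    subprocess.check_call(["./scripts/stop-server"])
    did_shutdown = True
```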
This can happen if `machine.pgroonga` is set during initial
installation. We cannot run `CREATE EXTENSION PGROONGA` because the
database that we need to run that statement in does not exist yet;
make the command a silent no-op that does not create the
`pgroonga_setup.sql.applied` flag file, such that a later
`zulip-puppet-apply` once the database exists can pick up and install
the extension.
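For illustration, the guard amounts to something like the following
(a Python sketch with assumed paths; the real logic lives in the puppet
manifest):
```
import subprocess
from pathlib import Path

FLAG = Path("/usr/share/pgroonga/pgroonga_setup.sql.applied")  # assumed path

def maybe_setup_pgroonga() -> None:
    db_exists = (
        subprocess.run(
            ["psql", "-tAc", "SELECT 1 FROM pg_database WHERE datname = 'zulip'"],
            capture_output=True,
            text=True,
        ).stdout.strip()
        == "1"
    )
    if not db_exists:
        # Silent no-op: without the flag file, a later zulip-puppet-apply
        # retries once the database exists.
        return
    subprocess.check_call(
        ["psql", "-d", "zulip", "-c", "CREATE EXTENSION IF NOT EXISTS PGROONGA"]
    )
    FLAG.touch()
```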
Tweaked the provision script to run successfully on Fedora 38, and
included a script to build the Groonga libraries from source, because
the packages in the Fedora repos are outdated.
There is a major version jump from the last supported version (F34),
which is EOL, so references to and support for older versions were
removed.
Fixes: #20635
nginx sets the value of the `$http_host` variable to the empty string
when using http/3, as there is technically no `Host:` header sent:
https://github.com/nginx-quic/nginx-quic/issues/3
Users with a browser that supports http/3 will send their first request
to nginx with http/2, and get an expected HTTP 200 -- but any
subsequent requests will fail with an HTTP 400, since the browser will
have upgraded to http/3, where the `Host` header is empty -- which
Zulip rejects.
Switch to the `$host` variable, which works for all HTTP versions.
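Illustratively, in a proxy stanza (the exact directives in our nginx
configuration may differ):
```
# Before: empty under http/3, causing Zulip to reject the request.
proxy_set_header Host $http_host;
# After: $host falls back to the server name when no Host header is sent.
proxy_set_header Host $host;
```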
Co-authored-by: Alex Vandiver <alexmv@zulip.com>
Restore the default django.utils.log.AdminEmailHandler when
ERROR_REPORTING is enabled. Those with more sophisticated needs can
turn it off and use Sentry or a Sentry-compatible system.
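For reference, a minimal Django `LOGGING` sketch of the stock handler
being restored (Zulip's actual logging configuration is more involved):
```
LOGGING = {
    "version": 1,
    "handlers": {
        "mail_admins": {
            "level": "ERROR",
            "class": "django.utils.log.AdminEmailHandler",
        },
    },
    "loggers": {
        # Emails server administrators (per settings.ADMINS) on errors.
        "django": {"handlers": ["mail_admins"], "level": "ERROR"},
    },
}
```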
Signed-off-by: Anders Kaseorg <anders@zulip.com>
The claim in the comment from c8ec3dfcf6, that we can and should use
the current deploy's venv, misses one key case -- when upgrading the
operating system, the current deploy's venv is unworkable, since it
was configured for a previous version of Python. As such, any attempt
to load Django to verify the version of PostgreSQL it is talking to
must happen after the venv is configured.
Move the database version check into
`scripts/lib/check-database-compatibility`, which also moves it after
the new venv is configured.
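The check itself boils down to asking the database through Django,
which is only possible once the venv exists; a sketch:
```
import os

import django

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "zproject.settings")
django.setup()  # imports Django, so it requires the new venv
from django.db import connection

with connection.cursor() as cursor:
    cursor.execute("SHOW server_version")
    pg_version = cursor.fetchone()[0]
```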
Because we no longer reliably know, at `apt-get upgrade` time, what
version of PostgreSQL is installed, we hold all versions of the
pgroonga packages.
This ensures that the next `upgrade-zulip-from-git` has access to the
commit history of the initial install, if it was from a forked
repository. `/home/zulip/deployments/current` and `/srv/zulip.git`
are not quite organized into the steady state that they will have
after one `upgrade-zulip-from-git`:
- `/home/zulip/deployments/current` is its own clone, not a worktree
- `/srv/zulip.git` has an origin of `/home/zulip/deployments/current`
- `remote.origin.mirror` is set on `/srv/zulip.git`
- `remote.origin.fetch` is `+refs/*:refs/*`
All but the first are automatically cleaned up by
`upgrade-zulip-from-git` when it is next run, using the code added in
30457ecd02. The additional complexity of making an existing
independent clone into a worktree does not seem worth solving the
first point.
Updating the pgroonga package is not sufficient to upgrade the
extension in PostgreSQL -- an `ALTER EXTENSION pgroonga UPDATE` must
explicitly be run[^1]. Failure to do so can lead to unexpected behavior,
including crashes of PostgreSQL.
Expand on the existing `pgroonga_setup.sql.applied` file, to track
which version of the PostgreSQL extension has been configured. If the
file exists but is empty, we run `ALTER EXTENSION pgroonga UPDATE`
regardless -- if it is a no-op, it still succeeds with a `NOTICE`:
```
zulip=# ALTER EXTENSION pgroonga UPDATE;
NOTICE: version "3.0.8" of extension "pgroonga" is already installed
ALTER EXTENSION
```
The simple `ALTER EXTENSION` is sufficient for the
backwards-compatible case[^1] -- which, for our usage, is every
upgrade since 0.9 -> 1.0. Since version 1.0 was released in 2015,
before pgroonga support was added to Zulip in 2016, we can assume for
the moment that all pgroonga upgrades are backwards-compatible, and
not bother regenerating indexes.
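A sketch of the tracking logic, with assumed paths and helper names:
```
from pathlib import Path

# Path and helper names are assumed for illustration.
flag = Path("/usr/share/pgroonga/pgroonga_setup.sql.applied")
installed = pgroonga_package_version()  # e.g. from the OS package metadata

# An empty (pre-existing) flag file never matches the installed version,
# so we run the backwards-compatible ALTER EXTENSION, which is a safe
# no-op if the extension is already current.
if flag.exists() and flag.read_text().strip() != installed:
    run_sql("ALTER EXTENSION pgroonga UPDATE")
    flag.write_text(installed + "\n")
```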
Fixes: #25989.
[^1]: https://pgroonga.github.io/upgrade/
This was only necessary for PGroonga 1.x, and the `pgroonga` schema
will most likely be removed at some point in the future, which will
make this statement error out.
Drop the unnecessary statement.
If the `postgresql.version` in `/etc/zulip/zulip.conf` is out of date
or wrong, upgrading to the actual current version would drop your
production database without prompting. While we do document taking a
Zulip backup (which includes a database backup) before running
`upgrade-postgresql`[^1], not everyone does so, with possibly
catastrophic consequences.
Do a true end-to-end check of the version in `/etc/zulip/zulip.conf`
by asking Django to query the database for its version, checking that
against the configured value, and aborting if there is any
disagreement.
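A sketch of the comparison, with an assumed helper for the server-side
version:
```
import configparser
import sys
from typing import Optional

def configured_postgresql_version() -> Optional[str]:
    config = configparser.RawConfigParser()
    config.read("/etc/zulip/zulip.conf")
    return config.get("postgresql", "version", fallback=None)

configured = configured_postgresql_version()
actual = postgresql_server_version()  # assumed helper; e.g. SHOW server_version
if configured is not None and configured != actual:
    print(f"PostgreSQL is version {actual}, but zulip.conf says {configured}!")
    sys.exit(1)
```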
[^1]: https://zulip.readthedocs.io/en/latest/production/upgrade.html#upgrading-postgresql
If `zulip-puppet-apply` is run during an upgrade, it will immediately
try to re-`stop-server` before running migrations; if the last step in
the puppet application was to restart `supervisor`, it may not be
listening on its UNIX socket yet. In such cases, `socket.connect()`
throws a `FileNotFoundError`:
```
Traceback (most recent call last):
  File "./scripts/stop-server", line 53, in <module>
    services = list_supervisor_processes(services, only_running=True)
  File "./scripts/lib/supervisor.py", line 34, in list_supervisor_processes
    processes = rpc().supervisor.getAllProcessInfo()
  File "/usr/lib/python3.9/xmlrpc/client.py", line 1116, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python3.9/xmlrpc/client.py", line 1456, in __request
    response = self.__transport.request(
  File "/usr/lib/python3.9/xmlrpc/client.py", line 1160, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib/python3.9/xmlrpc/client.py", line 1172, in single_request
    http_conn = self.send_request(host, handler, request_body, verbose)
  File "/usr/lib/python3.9/xmlrpc/client.py", line 1285, in send_request
    self.send_content(connection, request_body)
  File "/usr/lib/python3.9/xmlrpc/client.py", line 1315, in send_content
    connection.endheaders(request_body)
  File "/usr/lib/python3.9/http/client.py", line 1250, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.9/http/client.py", line 1010, in _send_output
    self.send(msg)
  File "/usr/lib/python3.9/http/client.py", line 950, in send
    self.connect()
  File "./scripts/lib/supervisor.py", line 10, in connect
    self.sock.connect(self.host)
FileNotFoundError: [Errno 2] No such file or directory
```
Catch the `FileNotFoundError` and retry twice more, with backoff. If
it fails repeatedly, point to `service supervisor status` for further
debugging, as `FileNotFoundError` is rather misleading -- the file
exists, it simply is not accepting connections.
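A sketch of the retry loop, with `rpc()` as in
`scripts/lib/supervisor.py` (delay values illustrative):
```
import time

for attempt in range(3):
    try:
        processes = rpc().supervisor.getAllProcessInfo()
        break
    except FileNotFoundError:
        if attempt == 2:
            print("supervisor does not seem to be running;")
            print("try `service supervisor status` for further debugging.")
            raise
        # The socket file exists; supervisor is simply not accepting
        # connections on it yet.
        time.sleep(2**attempt)
```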