This adds `--automated` and `--no-automated` flags to all Zulip
management commands, whose default is based on if STDIN is a TTY.
This enables cron jobs and supervisor commands to continue to report
to Sentry, and manually-run commands (when reporting to Sentry does
not provide value, since the user can see them) to not.
Note that this only applies to Zulip commands -- core Django
commands (e.g. `./manage.py`) do not grow support for `--automated`
and will always report exceptions to Sentry.
`manage.py` subcommands in the `upgrade` and `restart-server` paths
are marked as `--automated`, since those may be run semi-unattended,
and they are useful to log to Sentry.
The "invites" worker exists to do two things -- make a Confirmation
object, and send the outgoing email. Making the Confirmation object
in a background process from where the PreregistrationUser is created
temporarily leaves the PreregistrationUser in invalid state, and
results in 500's, and the user not immediately seeing the sent
invitation. That the "invites" worker also wants to create the
Confirmation object means that "resending" an invite invalidates the
URL in the previous email, which can be confusing to the user.
Moving the Confirmation creation to the same transaction solves both
of these issues, and leaves the "invites" worker with nothing to do
but send the email; as such, we remove it entirely, and use the
existing "email_senders" worker to send the invites. The volume of
invites is small enough that this will not affect other uses of that
worker.
Fixes: #21306Fixes: #24275
Testing for it in Python means that we have to worry about keeping the
`upgrade-zulip-stage-2` backwards-compatible with all versions of
Python which we could ever be upgrading from -- which is all of them.
Factor out the "supported operating systems" check, and share it
between upgrade and install codepaths.
`--no-init-db` is used to silence the need for `--hostname` and
`--email` arguments; it is a proxy for "this is not a frontend host."
We would ideally like to use `has_class` to know if the user's
provided puppet classes are include an `app_frontend`, and thus
`--hostname` and `--email` are required -- but doing that requires
several other steps, and we would like this feedback to be immediate.
We make the presence of `--puppet-classes` equivalent to
`--no-init-db`, since nearly every configuration with
`--puppet-classes` does not install both a database and a frontend,
which is what is required to initialize a database.
While this could be done previously by calling
`upgrade-zulip-from-git --remote-url /srv/zulip.git`, the explicit
argument makes this more straightforward, and avoids churning the
`refs/remotes/origin/` namespace.
Decouple the sending of client restart events from the restarting of
the servers. Restarts use the new Tornado restart-clients endpoint to
inject "restart" events into queues of clients which were loaded from
the previous Tornado process. The rate is controlled by the
`application_server.client_restart_rate`, in clients per minute, or a
flag to `restart-clients` which overrides it. Note that a web client
will also spread its restart over 5 minutes, so artificially-slow
client restarts are generally not very necessary.
Restarts of clients are deferred to until after post-deploy hooks are
run, such that the pre- and post- deploy hooks are around the actual
server restarts, even if pushing restart events to clients takes
significant time.
This flag was generally used not because we wanted to avoid
restarting Tornado, but because we wanted to avoid increasing load
the server when all of the clients were told to reload.
Since we have laid the groundwork for separately telling Tornado to
tell clients to restart, we remove the --skip-tornado flag; the next
commit will add the ability to skip client restarts.
This queue is used to things which definitionally may take longer than
a request, so paging after 60s is rather aggressive. This is
especially true because this queue has a very long tail of very slow
tasks -- p99 of task time in this queue is 8.5s, while p99.9 is 197s.
Raise the paging threshold to 15 minutes. While there are
semi-user-facing tasks which use this queue (primarily marking
messages as read), those being delayed for minutes is already a real
possibility if they are stuck behind a large realm export -- and this
is not a situation which should necessarily page, since it is not
solvable by the administrator.
Filling caches needs to happen close to when the server is restarted,
as the gap opens us up to race conditions with user modifications. If
there are migrations, however, it must happen within the critical
period after the migrations are applied.
Move the call to fill the caches to within the `shutdown_server`
function, so that we push it as close to the server shutdown as
possible.