On a system where ‘apt-get update’ has never been run, ‘apt-cache
policy’ may show no repositories at all. Try to correct this with
‘apt-get update’ before giving up.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
The certbot package installs its own systemd timer (and cron job,
which disabled itself if systemd is enabled) which updates
certificates. This process races with the cron job which Zulip
installs -- the only difference being that Zulip respects the
`certbot.auto_renew` setting, and that it passes the deploy hook.
This means that occasionally nginx would not be reloaded, when the
systemd timer caught the expiration first.
Remove the custom cron job and `certbot-maybe-renew` script, and
reconfigure certbot to always reload nginx after deploying, using
certbot directory hooks.
Since `certbot.auto_renew` can't have an effect, remove the setting.
In turn, this removes the need for `--no-zulip-conf` to
`setup-certbot`. `--deploy-hook` is similarly removed, as running
deploy hooks to restart nginx is now the default; pass
`--no-directory-hooks` in standalone mode to not attempt to reload
nginx. The other property of `--deploy-hook`, of skipping symlinking
into place, is given its own flog.
We've had a number of unhappy reports of upgrades failing due to
webpack requiring too much memory. While the previous commit will
likely fix this issue for everyone, it's worth improving the error
message for failures here.
We avoid doing the stop+retry ourselves, because that could cause an
outage in a production system if webpack fails for another reason.
Fixes#20105.
Since the upgrade to Webpack 5, we've been seeing occasional reports
that servers with roughly 4GiB of RAM were getting OOM kills while
running webpack.
Since we can't readily optimize the memory requirements for webpack
itself, we should raise the RAM requirements for doing the
lower-downtime upgrade strategy.
Fixes#20231.
scripts.lib.node_cache expects Yarn to be in /srv/zulip-yarn, so if
it’s installed somewhere else, even if it’s the right version, we need
to reinstall it.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
It recently started failing on Debian 10 (buster). We immediately
follow this by replacing these packages with our own versions from
pip.txt, anyway.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
The support for bullseye was added in #17951
but it was not documented as bullseye was
frozen and did not have proper configuration
files, hence wasn't documented.
Since now bullseye is released as a stable
version, it's support can be documented.
The usual output from this command looks like
Notice: Compiled catalog for localhost in environment production in 2.33 seconds
Notice: /Stage[main]/Zulip::Apt_repository/Exec[setup_apt_repo]/returns: current_value 'notrun', should be ['0'] (noop)
Notice: Class[Zulip::Apt_repository]: Would have triggered 'refresh' from 1 event
Notice: Stage[main]: Would have triggered 'refresh' from 1 event
Notice: Applied catalog in 1.20 seconds
which doesn’t seem abnormally alarming, and hiding it makes failures
harder to diagnose.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
These changes are all independent of each other; I just didn’t feel
like making dozens of commits for them.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Nonexistent processes and groups passed to `supervisortctl status` are
printed to STDOUT as follows:
```
$ supervisorctl status zulip-django nonexistent-process nonexistent-group:*
nonexistent-process: ERROR (no such process)
nonexistent-group: ERROR (no such group)
zulip-django RUNNING pid 16043, uptime 17:31:31
```
On supervisor 4 and above, this exits with an exit code of 4;
previously, it returned exit code 0. Ubuntu 18.04 has version 3.3.1,
and Ubuntu 20.04 has version 4.1.0.
Skip any lines with `ERROR (no such ...)`, and accept exit code 4 from
`supervisorctl status`.
This parameter is somewhat useful, and adding this also fixes a
regression where purge-old-deployments would crash since the changes
around c5580607a7 because of
inconsistent supported args lists.
Fixes#16659.
If the server is behind a reverse proxy with http_only=True, the
requests made by email-mirror-postfix need to use http, as https
doesn't work.
Staging and other hosts that are `zulip::app_frontend_base` but not
`zulip::app_frontend_once` do not have a
/etc/supervisor/conf.d/zulip/zulip-once.conf and as such do not have
`zulip_deliver_scheduled_emails` or `zulip_deliver_scheduled_messages`
and thus supervisor will fail to reload.
Making the contents of `zulip-workers` contingent on if the server is
_also_ a `-once` server is complicated, and would involve using Concat
fragments, which severely limit readability.
Instead, expel those two from `zulip-workers`; this is somewhat
reasonable, since they are use an entirely different codepath from
zulip_events_*, using the database rather than RabbitMQ for their
queuing.
This commit will allow us to pass the arguments in the
'clean...' functions when calling the `main` function (in
`provision`). It also changes args parsing
function location to `if __name__ == "__main__"` block as
we wouldn't need it to parse args when we call the
function.
We convert the `clean-unused-caches` script to a
python file so we can run it in provision by importing it
instead of running the script, hence saving some time.