
1135 Commits

Author SHA1 Message Date
Anders Kaseorg 1cc1de82cd reindex-textual-data: Reindex textual functional indexes too.
This catches nine functional indexes that the previous query didn’t:

upper_preregistration_email_idx
upper_stream_name_idx
upper_subject_idx
upper_userprofile_email_idx
zerver_message_recipient_upper_subject
zerver_mutedtopic_stream_topic
zerver_stream_realm_id_name_uniq
zerver_userprofile_realm_id_delivery_email_uniq
zerver_userprofile_realm_id_email_uniq

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-01-07 10:37:04 -08:00
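The commit message does not include the query itself. As a minimal sketch, once the affected index names are known (for example, the nine listed above), rebuilding them amounts to one `REINDEX INDEX` statement per index. The helper names below are hypothetical. The unqualified index names assume the right schema is on the connection's `search_path`.

```python
from typing import List


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def reindex_statements(index_names: List[str]) -> List[str]:
    """Build one REINDEX statement per index (hypothetical helper)."""
    return [f"REINDEX INDEX {quote_ident(name)};" for name in sorted(index_names)]


for statement in reindex_statements(["upper_subject_idx", "zerver_mutedtopic_stream_topic"]):
    print(statement)
```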
Alex Vandiver 6218ed91c2 puppet: Use lazy-apps and uwsgi control sockets for rolling reloads.
Restarting the uwsgi processes by way of supervisor opens a window
during which nginx returns 502 errors for all requests.  uwsgi has a
configuration called "chain reloading" which allows for a rolling
restart of the uwsgi processes, such that only one process at a time
is unavailable; see the uwsgi documentation [1].

The tradeoff is that this requires that the uwsgi processes load the
libraries after forking, rather than before ("lazy apps"); in theory
this can lead to larger memory footprints, since they are not shared.
In practice, as Django defers much of the loading, this is not as much
of an issue.  In a very basic test of memory consumption (measured by
total memory - free - caches - buffers; 6 uwsgi workers), both
immediately after restarting Django, and after requesting `/` 60 times
with 6 concurrent requests:

                      |  Non-lazy  |  Lazy app  | Difference
    ------------------+------------+------------+-------------
    Fresh             |  2,827,216 |  2,870,480 |   +43,264
    After 60 requests |  3,332,284 |  3,409,608 |   +77,324
    ..................|............|............|.............
    Difference        |   +505,068 |   +539,128 |   +34,060

That is, "lazy app" loading increased the footprint before any requests by
43MB, and after 60 requests grew the memory footprint by 539MB, as
opposed to non-lazy loading, which grew it by 505MB.  Using uwsgi "lazy
app" loading does increase the memory footprint, but not by a large
percentage.
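The phrase "not by a large percentage" can be quantified from the table itself. The sketch below computes the relative overhead of lazy loading at each stage. It is plain arithmetic on the reported figures, which the text treats as kB, and nothing more.

```python
from typing import Tuple

# Measurements copied from the table above.
NON_LAZY = {"fresh": 2_827_216, "after_60_requests": 3_332_284}
LAZY = {"fresh": 2_870_480, "after_60_requests": 3_409_608}


def lazy_overhead(stage: str) -> Tuple[int, float]:
    """Return (absolute extra memory, percent over non-lazy) for a stage."""
    extra = LAZY[stage] - NON_LAZY[stage]
    return extra, 100 * extra / NON_LAZY[stage]


for stage in NON_LAZY:
    extra, pct = lazy_overhead(stage)
    print(f"{stage}: +{extra:,} ({pct:.2f}%)")
# fresh: +43,264 (1.53%)
# after_60_requests: +77,324 (2.32%)
```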

The other effect is that requests may be served by either old or new
code during the restart window.  This may cause transient failures
when new frontend code talks to old backend code.

Enable chain-reloading during graceful, puppetless restarts, but only
if enabled via a zulip.conf configuration flag.

Fixes #2559.

[1]: https://uwsgi-docs.readthedocs.io/en/latest/articles/TheArtOfGracefulReloading.html#chain-reloading-lazy-apps
2022-01-05 14:48:52 -08:00
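uwsgi's chain reload is driven through its master FIFO. With the `lazy-apps` and `master-fifo` options set in the uwsgi configuration, writing the command character `c` to that FIFO asks the master to restart workers one at a time. The sketch below assumes a FIFO path that is hypothetical here; it must match whatever `master-fifo` is set to. This is not necessarily how Zulip's restart scripts issue the command.

```python
from pathlib import Path

# Hypothetical location; must equal the `master-fifo` setting in uwsgi's config.
UWSGI_CONTROL_FIFO = Path("/home/zulip/deployments/uwsgi-control")


def request_chain_reload(fifo: Path = UWSGI_CONTROL_FIFO) -> None:
    """Ask the uwsgi master to chain-reload its workers.

    "c" is the master-FIFO command for a chain reload; uwsgi only supports
    chain reloading when lazy-apps is enabled.  Opening a real FIFO for
    writing blocks until the uwsgi master has it open for reading.
    """
    with open(fifo, "wb") as control:
        control.write(b"c")
```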
Alex Vandiver 4aaa250623 zulip_tools: Fix a typo in a comment. 2022-01-05 14:48:52 -08:00
Alex Vandiver 9d85f64e5a upgrade-zulip-stage-2: Pass through --skip-tornado and --less-graceful.
These restart-server arguments are useful to be able to provide to
`upgrade-zulip`.
2021-12-31 11:17:14 -08:00
Alex Vandiver fb3368b482 restart-server: Factor out argparser, to allow reuse. 2021-12-31 11:17:14 -08:00
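One idiomatic way to share a parser between `restart-server` and `upgrade-zulip-stage-2` is argparse's `parents=` mechanism. This is a sketch, not the actual scripts. `restart_parser`, `passthrough_flags`, and the `deploy_path` positional are hypothetical; only the two flag names come from the commits above.

```python
import argparse
from typing import List


def restart_parser() -> argparse.ArgumentParser:
    """Options shared by restart-server and the upgrade scripts (hypothetical)."""
    # add_help=False avoids a duplicate -h/--help in parsers that inherit from it.
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--skip-tornado", action="store_true")
    parser.add_argument("--less-graceful", action="store_true")
    return parser


def passthrough_flags(args: argparse.Namespace) -> List[str]:
    """Rebuild the shared flags so they can be handed on to restart-server."""
    flags = []
    if args.skip_tornado:
        flags.append("--skip-tornado")
    if args.less_graceful:
        flags.append("--less-graceful")
    return flags


upgrade = argparse.ArgumentParser(parents=[restart_parser()])
upgrade.add_argument("deploy_path")  # hypothetical positional argument
args = upgrade.parse_args(["/tmp/new-deploy", "--skip-tornado"])
print(passthrough_flags(args))  # ['--skip-tornado']
```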
Alex Vandiver 93f3da4c05 upgrade-from-git: Pass unknown options through to the upgrade process. 2021-12-31 11:17:14 -08:00
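In Python, `argparse.ArgumentParser.parse_known_args` is the usual tool for this. It returns the options it recognizes plus a list of the leftovers, instead of exiting on unknown options. The sketch below uses a hypothetical `refname` positional.

```python
import argparse
from typing import List, Tuple


def split_args(argv: List[str]) -> Tuple[argparse.Namespace, List[str]]:
    parser = argparse.ArgumentParser()
    parser.add_argument("refname")  # hypothetical: the git ref to upgrade to
    # Unrecognized options land in the second element instead of raising an error.
    return parser.parse_known_args(argv)


args, extra = split_args(["main", "--skip-tornado", "--less-graceful"])
print(args.refname, extra)  # main ['--skip-tornado', '--less-graceful']
```

The leftover list can then be appended unchanged to the upgrade command. One caveat: argparse cannot know whether an unknown option takes a value. A value written as a separate word after an unknown option may therefore be taken as a positional argument; the `--option=value` form avoids this.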
Anders Kaseorg 82748d45d8 install-yarn: Use test -ef in case /srv is a symlink.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-12-30 13:42:07 -08:00
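`test A -ef B` is true when both paths refer to the same file (same device and inode), so it sees through symlinks where a string comparison would not. The Python analogue is `os.path.samefile`. The wrapper below is a hypothetical helper that, like `test -ef`, returns false instead of raising when either path is missing.

```python
import os
import tempfile


def same_file(a: str, b: str) -> bool:
    try:
        return os.path.samefile(a, b)
    except OSError:  # e.g. FileNotFoundError when a path does not exist
        return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        real = os.path.join(tmp, "srv-real")
        link = os.path.join(tmp, "srv")  # stands in for a symlinked /srv
        os.mkdir(real)
        os.symlink(real, link)
        print(link == real, same_file(link, real))  # False True
```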
Anders Kaseorg 0b454dda12 install: Try apt-get update if the Ubuntu universe check fails.
On a system where ‘apt-get update’ has never been run, ‘apt-cache
policy’ may show no repositories at all.  Try to correct this with
‘apt-get update’ before giving up.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-12-16 17:56:23 -08:00
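The control flow is: check, refresh the package lists once, then check again. The Python sketch below assumes, hypothetically, that the universe component shows up as `c=universe` in the release lines of `apt-cache policy`. The command runner is injectable so the logic can be exercised without apt.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], str]


def run_command(cmd: List[str]) -> str:
    return subprocess.check_output(cmd, universal_newlines=True)


def universe_enabled(policy_output: str) -> bool:
    # Assumed marker; the real install script may match the output differently.
    return "c=universe" in policy_output


def ensure_universe(run: Runner = run_command) -> bool:
    if universe_enabled(run(["apt-cache", "policy"])):
        return True
    # On a fresh system, `apt-cache policy` may list no repositories at all
    # until the package lists have been downloaded once.
    run(["apt-get", "update"])
    return universe_enabled(run(["apt-cache", "policy"]))
```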
Alex Vandiver f6520a97cd setup-certbot: Reinstate nginx reload after installation.
If nginx was already installed, and we're using the webroot method of
initializing certbot, nginx needs to be reloaded.  Hooks in
`/etc/letsencrypt/renewal-hooks/deploy/` do not run during initial
`certbot certonly`, so an explicit reload is required.
2021-12-10 16:43:53 -08:00
Alex Vandiver 01e8f752a8 puppet: Use certbot package timer, not our own cron job.
The certbot package installs its own systemd timer (and cron job,
which disables itself if systemd is enabled) which updates
certificates.  This process races with the cron job which Zulip
installs -- the only difference being that Zulip respects the
`certbot.auto_renew` setting, and that it passes the deploy hook.
This means that occasionally nginx would not be reloaded, when the
systemd timer caught the expiration first.

Remove the custom cron job and `certbot-maybe-renew` script, and
reconfigure certbot to always reload nginx after deploying, using
certbot directory hooks.

Since `certbot.auto_renew` can't have an effect, remove the setting.
In turn, this removes the need for `--no-zulip-conf` to
`setup-certbot`.  `--deploy-hook` is similarly removed, as running
deploy hooks to restart nginx is now the default; pass
`--no-directory-hooks` in standalone mode to not attempt to reload
nginx.  The other property of `--deploy-hook`, of skipping symlinking
into place, is given its own flag.
2021-12-09 13:47:33 -08:00
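Certbot runs executable files in `/etc/letsencrypt/renewal-hooks/deploy/` after each successful renewal, which is what makes an always-on nginx reload possible. The sketch below installs such a hook. The file name and the `systemctl reload nginx` command are assumptions, not necessarily what Zulip's puppet configuration writes.

```python
import tempfile
from pathlib import Path

# Assumed hook contents: reload nginx so it picks up the renewed certificate.
HOOK_BODY = "#!/bin/sh\nset -e\nsystemctl reload nginx\n"


def install_deploy_hook(hook_dir: Path, name: str = "020-reload-nginx") -> Path:
    """Write an executable deploy hook into certbot's hook directory."""
    hook_dir.mkdir(parents=True, exist_ok=True)
    hook = hook_dir / name
    hook.write_text(HOOK_BODY)
    hook.chmod(0o755)  # only executable files in the hook directory are run
    return hook


if __name__ == "__main__":
    # Real target: Path("/etc/letsencrypt/renewal-hooks/deploy"); a temp dir here.
    with tempfile.TemporaryDirectory() as tmp:
        print(install_deploy_hook(Path(tmp) / "deploy"))
```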
Tim Abbott 9aa2e0ad45 upgrade-zulip-from-git: Improve webpack failure error handling.
We've had a number of unhappy reports of upgrades failing due to
webpack requiring too much memory.  While the previous commit will
likely fix this issue for everyone, it's worth improving the error
message for failures here.

We avoid doing the stop+retry ourselves, because that could cause an
outage in a production system if webpack fails for another reason.

Fixes #20105.
2021-12-09 12:26:34 -08:00
Tim Abbott 72b381d749 upgrade-zulip-from-git: Require more memory to run webpack.
Since the upgrade to Webpack 5, we've been seeing occasional reports
that servers with roughly 4GiB of RAM were getting OOM kills while
running webpack.

Since we can't readily optimize the memory requirements for webpack
itself, we should raise the RAM requirements for doing the
lower-downtime upgrade strategy.

Fixes #20231.
2021-12-09 12:23:25 -08:00
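The sketch below gates on `MemTotal` from `/proc/meminfo`, which is reported in kB. The commit does not say what the new minimum is, so the threshold is a hypothetical parameter, not Zulip's value.

```python
def mem_total_kb(meminfo: str) -> int:
    """Parse the MemTotal line, e.g. 'MemTotal:  4030264 kB'."""
    for line in meminfo.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found in meminfo output")


def can_run_webpack_live(meminfo: str, min_kb: int) -> bool:
    """Use the lower-downtime upgrade path only with enough RAM (hypothetical)."""
    return mem_total_kb(meminfo) >= min_kb


# Usage on Linux: can_run_webpack_live(open("/proc/meminfo").read(), min_kb=...)
```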
Alex Vandiver 939d2e2705 scripts: Only stop/start existing tornado processes.
Stopping both `zulip-tornado` and `zulip-tornado:*` causes errors on
deploys with tornado sharding, as the plain `zulip-tornado` service
does not exist.

Pass `zulip-tornado:*`, which matches both plain `zulip-tornado`, as
well as the sharded `zulip-tornado:zulip-tornado-port-9800` cases.
2021-12-08 14:06:06 -08:00
Tim Abbott 73d503995a scripts: Fix running compare-settings-to-template from any CWD.
This matches the number of dirname() calls for other files in its
directory.

Fixes #20489.
2021-12-07 14:45:53 -08:00
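Scripts locate the deployment root by taking `dirname()` of their own absolute path once per directory level. The count therefore has to match how deep the script sits. Resolving the path with `abspath` first is what makes the result independent of the current working directory. The sketch below uses a hypothetical helper and an illustrative path.

```python
import os


def deploy_root(script_path: str, levels: int) -> str:
    """Strip `levels` trailing components from the script's absolute path."""
    path = os.path.abspath(script_path)
    for _ in range(levels):
        path = os.path.dirname(path)
    return path


# Illustrative layout: a script two directories below the root needs three calls.
print(deploy_root("/srv/zulip/scripts/setup/some-script", 3))  # /srv/zulip
print(deploy_root("/srv/zulip/scripts/setup/some-script", 2))  # /srv/zulip/scripts
```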
Anders Kaseorg 2e5af073b7 install-node: Upgrade Node.js from 16.13.0 to 16.13.1.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-12-03 14:33:53 -08:00
Anders Kaseorg 2e1a8ff632 configure-rabbitmq: Increase startup timeout.
Starting RabbitMQ at boot seems to have gotten slower, which broke
‘vagrant up --provision’.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-12-03 14:32:23 -08:00
Alex Vandiver 3455fc137a upgrade-postgresql: Check for extension upgrade steps. 2021-11-20 07:13:50 -08:00
Alex Vandiver 544e8c569e install: Switch default to PostgreSQL 14. 2021-11-08 18:21:46 -08:00
Alex Vandiver f77bbd3323 upgrade-postgresql: Switch to vacuumdb --all --analyze-only --jobs 10.
The `analyze_new_cluster.sh` script output by `pg_upgrade` just runs
`vacuumdb --all --analyze-in-stages`, which runs three passes over the
database, getting better stats each time.  Each of these passes is
independent; the third pass does not require the first two.
`--analyze-in-stages` is only provided to get "something" into the
database, on the theory that it could then be started and used.  Since
we wait for all three passes to complete before starting the database,
the first two passes add no value.

Additionally, PostgreSQL 14 and up stop writing the
`analyze_new_cluster.sh` script as part of `pg_upgrade`, suggesting
the equivalent `vacuumdb --all --analyze-in-stages` call instead.

Switch to explicitly call `vacuumdb --all --analyze-only`, since we do
not gain any benefit from `--analyze-in-stages`.  We also enable
parallelism, with `--jobs 10`, in order to analyze up to 10 tables in
parallel.  This may increase load, but will accelerate the upgrade
process.
2021-11-08 18:21:46 -08:00
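The command described above could be built from Python as in the sketch below. The flags are the documented `vacuumdb` options named in the commit. The helper names are hypothetical, and connection options (user, host) are left out.

```python
import subprocess
from typing import List


def vacuumdb_analyze_command(jobs: int = 10) -> List[str]:
    return [
        "vacuumdb",
        "--all",           # every database in the cluster
        "--analyze-only",  # refresh planner statistics; no VACUUM
        "--jobs",
        str(jobs),         # analyze up to `jobs` tables concurrently
    ]


def analyze_upgraded_cluster(jobs: int = 10) -> None:
    subprocess.check_call(vacuumdb_analyze_command(jobs))
```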
Anders Kaseorg f2a443a736 install-node: Upgrade Node.js from 14.18.1 to 16.13.0.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-11-05 17:34:13 -07:00
Anders Kaseorg 458844a2f5 install-yarn: Verify that the install location is /srv/zulip-yarn.
scripts.lib.node_cache expects Yarn to be in /srv/zulip-yarn, so if
it’s installed somewhere else, even if it’s the right version, we need
to reinstall it.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-11-03 16:49:58 -07:00
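The check amounts to "right version and right place". The sketch below uses a hypothetical helper. It relies on `os.path.commonpath` rather than a string-prefix test, so a sibling directory such as `/srv/zulip-yarn-old` does not count as being inside `/srv/zulip-yarn`.

```python
import os
from typing import Optional

EXPECTED_YARN_DIR = "/srv/zulip-yarn"


def yarn_needs_reinstall(yarn_path: Optional[str], version: Optional[str],
                         wanted_version: str,
                         expected_dir: str = EXPECTED_YARN_DIR) -> bool:
    if yarn_path is None or version != wanted_version:
        return True
    # Resolve symlinks on both sides before comparing locations.
    found = os.path.realpath(yarn_path)
    expected = os.path.realpath(expected_dir)
    return os.path.commonpath([found, expected]) != expected
```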
rht bb8504d925 lint: Fix typos found by codespell. 2021-10-19 16:51:13 -07:00
Anders Kaseorg 291087d70c install-yarn: Upgrade Yarn from 1.22.11 to 1.22.17.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-10-17 07:15:09 -07:00
Anders Kaseorg 7df96b78c6 install-node: Upgrade Node.js from 14.17.6 to 14.18.1.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-10-17 07:15:09 -07:00
Anders Kaseorg 2f993f1a79 install-node: Stop using NVM.
NVM doesn’t check hashes or signatures and really just adds
complexity we don’t need.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-09-24 06:58:32 -07:00
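Checking a download against a pinned hash is straightforward with `hashlib`. The sketch below assumes the expected digest comes from somewhere trusted, such as a value pinned in the install script.

```python
import hashlib
import os
import tempfile


def sha256_file(path: str) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large tarballs need not fit in memory.
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_sha256: str) -> None:
    actual = sha256_file(path)
    if actual != expected_sha256.lower():
        raise ValueError(f"{path}: expected sha256 {expected_sha256}, got {actual}")


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello")
    try:
        verify_download(tmp.name, hashlib.sha256(b"hello").hexdigest())
        print("checksum ok")
    finally:
        os.unlink(tmp.name)
```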
Anders Kaseorg 902883d818 setup_venv: Skip virtualenv’s automatic download of setuptools.
It recently started failing on Debian 10 (buster).  We immediately
follow this by replacing these packages with our own versions from
pip.txt, anyway.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-09-23 14:29:04 -07:00
Anders Kaseorg 08e459b393 zulip_tools: Convert "".format to Python 3.6 f-strings.
Generated automatically by pyupgrade.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-09-22 13:58:46 -07:00
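The rewrite is mechanical: each positional `{}` placeholder becomes the expression that filled it. The example below is illustrative; the variable names are made up, not taken from `zulip_tools`.

```python
version = "14.17.6"
arch = "x64"

# Before: str.format with positional placeholders.
old_style = "node-v{}-linux-{}.tar.xz".format(version, arch)
# After: the equivalent f-string (Python 3.6+).
new_style = f"node-v{version}-linux-{arch}.tar.xz"

print(new_style)  # node-v14.17.6-linux-x64.tar.xz
```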
Anders Kaseorg 9bed17e0ab install-node: Upgrade Node.js from 14.17.5 to 14.17.6.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-09-13 10:12:43 -07:00
Gaurav Pandey 502697d239 docs: Add documentation for bullseye support.
Support for bullseye was added in #17951,
but it was not documented, because bullseye was
frozen and did not have proper configuration
files.

Now that bullseye has been released as a stable
version, its support can be documented.
2021-09-09 11:05:16 -07:00
Anders Kaseorg 915884bff7 docs: Apply bullet style changes from Prettier.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-09-08 12:06:24 -07:00
Anders Kaseorg 02582c6956 upgrade-zulip-from-git: Run git fetch with --prune.
This prevents upgrading to an obsolete version of a branch that has
been deleted or renamed.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-09-01 05:34:57 -07:00
Anders Kaseorg 3cb66d59ac install: Remove /dev/null redirect for zulip-puppet-apply.
The usual output from this command looks like

Notice: Compiled catalog for localhost in environment production in 2.33 seconds
Notice: /Stage[main]/Zulip::Apt_repository/Exec[setup_apt_repo]/returns: current_value 'notrun', should be ['0'] (noop)
Notice: Class[Zulip::Apt_repository]: Would have triggered 'refresh' from 1 event
Notice: Stage[main]: Would have triggered 'refresh' from 1 event
Notice: Applied catalog in 1.20 seconds

which doesn’t seem abnormally alarming, and hiding it makes failures
harder to diagnose.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-08-31 16:30:53 -07:00
Alex Vandiver faf71eea41 upgrade-postgresql: Do not remove other supervisor configs.
We previously used `zulip-puppet-apply` with a custom config file,
with an updated PostgreSQL version but more limited set of
`puppet_classes`, to pre-create the basic settings for the new cluster
before running `pg_upgradecluster`.

Unfortunately, the supervisor config uses `purge => true` to remove
all supervisor configuration files that are not included in the puppet
configuration; this leads to it removing all other supervisor
processes during the upgrade, only to add them back and start them
during the second `zulip-puppet-apply`.

It also leads to `process-fts-updates` not being started after the
upgrade completes; this is the one supervisor config file which was
not removed and re-added, and thus the one that is not re-started due
to having been re-added.  This was not detected in CI because CI added
a `start-server` command which was not in the upgrade documentation.

Set a custom facter fact that prevents the `purge` behaviour of the
supervisor configuration.  We want to preserve that behaviour in
general, and using `zulip-puppet-apply` continues to be the best way
to pre-set-up the PostgreSQL configuration -- but we wish to avoid
that behaviour when we know we are applying a subset of the puppet
classes.

Since supervisor configs are no longer removed and re-added, this
requires an explicit start-server step in the instructions after the
upgrades complete.  This brings the documentation into alignment with
what CI is testing.
2021-08-24 19:00:58 -07:00
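Facter turns environment variables named `FACTER_<name>` into facts, which is one way to hand a one-off fact to a single puppet run. The sketch below works under that assumption. The fact name `leave_supervisor` is hypothetical, and the manifest would still need to consult the fact before setting `purge => true`.

```python
import os
from typing import Dict, Mapping, Optional


def env_with_facts(facts: Mapping[str, str],
                   base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy the environment and add one FACTER_<name> variable per fact."""
    env = dict(os.environ if base is None else base)
    for name, value in facts.items():
        env[f"FACTER_{name}"] = value
    return env


# Hypothetical fact name; pass the result as `env=` to the zulip-puppet-apply call.
env = env_with_facts({"leave_supervisor": "true"})
```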
Anders Kaseorg 7b2e585213 install-yarn: Upgrade Yarn from 1.22.10 to 1.22.11.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-08-23 12:33:27 -07:00
Anders Kaseorg ebb8e9109c install-node: Upgrade Node.js from 14.17.3 to 14.17.5.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-08-23 12:29:04 -07:00
Anders Kaseorg 4206e5f00b python: Remove locally dead code.
These changes are all independent of each other; I just didn’t feel
like making dozens of commits for them.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-08-19 01:51:37 -07:00
Alex Vandiver c9bb2c16cc restart-server: Add a --skip-tornado flag.
Tornado restarts are the most user-visible; provide a means to restart
everything but them, for changes which are known to not affect
Tornado.
2021-08-04 10:57:53 -07:00
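The sketch below shows the selection logic with hypothetical names. When the flag is set, anything whose supervisor name starts with `zulip-tornado` is left out of the restart. This also covers sharded names like `zulip-tornado:zulip-tornado-port-9800`. The `zulip-workers:*` entry is illustrative.

```python
from typing import List


def services_to_restart(services: List[str], skip_tornado: bool) -> List[str]:
    if not skip_tornado:
        return list(services)
    return [s for s in services if not s.startswith("zulip-tornado")]


services = ["zulip-django", "zulip-tornado:zulip-tornado-port-9800", "zulip-workers:*"]
print(services_to_restart(services, skip_tornado=True))
# ['zulip-django', 'zulip-workers:*']
```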
Tim Abbott d439a2a53e emails: Create wider marketing email base template.
For our marketing emails, we want a width that's more appropriate for
newsletter context, vs. the narrow emails we use for transactional
content.

I haven't figured out a cleaner way to do this than duplicating most
of email_base_default.source.html. But it's not a big deal to
duplicate, since we've been changing that base template only about
once a year.
2021-08-03 11:57:31 -07:00
Anders Kaseorg 5483ebae37 python: Convert "".format to Python 3.6 f-strings.
Generated automatically by pyupgrade.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-08-02 15:53:52 -07:00
Anders Kaseorg ad5f0c05b5 python: Remove default "utf8" argument for encode(), decode().
Partially generated by pyupgrade.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-08-02 15:53:52 -07:00
Anders Kaseorg 1760897a8c python: Remove default "r" mode for open().
Generated automatically by pyupgrade.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-08-02 15:53:52 -07:00
Anders Kaseorg 3665deb93a python: Remove unnecessary intermediate lists.
Generated automatically by pyupgrade.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-08-02 15:53:52 -07:00
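Each of these automated rewrites relies on a Python 3 default, so behavior is unchanged. The equivalences are checked directly below, using illustrative values.

```python
import os
import tempfile

# encode()/decode() default to UTF-8, so "utf8" is redundant.
text = "Zulip ✓"
assert text.encode("utf8") == text.encode()
assert text.encode().decode("utf8") == text.encode().decode()

# open() defaults to mode "r" (read-only text).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("hello\n")
with open(tmp.name, "r") as explicit, open(tmp.name) as implicit:
    assert explicit.read() == implicit.read()
os.unlink(tmp.name)

# Intermediate lists: comprehensions and generators can feed set()/sorted() directly.
words = ["b", "a", "b"]
assert set([w for w in words]) == {w for w in words}
assert sorted([w.upper() for w in words]) == sorted(w.upper() for w in words)
print("all equivalences hold")
```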
manavdesai27 572cef9a0f provision: Add support for Fedora 34. 2021-07-20 12:10:41 -07:00
Alex Vandiver 91282ab490 reindex-textual-data: Provide a tool to reindex all text indices.
The script is added to upgrade steps for 20.04 and Buster because
those are the upgrades that cross glibc 2.28, which is most
problematic.  It will also be called out in the upgrade notes, to
catch those that have already done that upgrade.
2021-07-19 16:34:23 -07:00
Anders Kaseorg 47897c76a2 scripts: Use curl -f (--fail).
This makes curl exit with nonzero status on HTTP 4xx/5xx errors.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-07-13 16:47:49 -07:00
Alex Vandiver 16691110a6 scripts: Only stop/restart zulip_deliver_scheduled_* processes if known.
Running `supervisorctl stop` or `supervisorctl restart` on a process
name which is not known is an error:

```
$ supervisorctl stop nonexistent-process
nonexistent-process: ERROR (no such process)
$ echo $?
1
```

ef6d0ec5ca moved `zulip_deliver_scheduled_*` out of the `workers:` group.  Since upgrades
run `stop-server` before applying puppet, the list of processes at
that time is from the previous version of Zulip, so may not have the
new `zulip_deliver_scheduled_*` names -- and the `stop-server` will
hence fail.

If the upgrade is not applying puppet, it will `restart-server`. At
that point, the old names will still be in the configuration, so
relying on the current `supervisorctl status` is the best gauge of what
exists to restart.

In short, only ever stop/start/restart the `zulip_deliver_scheduled_*`
processes if `supervisorctl status` knows about them already.
2021-07-09 10:04:53 -07:00
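The guard reduces to: match the wanted pattern against what supervisor currently reports, and act only on the matches. The sketch below uses `fnmatch` for the `*` wildcard. The helper and the example process names are hypothetical.

```python
import fnmatch
from typing import List, Optional


def stop_command(pattern: str, known: List[str]) -> Optional[List[str]]:
    """Return a supervisorctl stop command for known matches, or None."""
    matches = fnmatch.filter(known, pattern)
    if not matches:
        return None  # nothing to stop; avoids "ERROR (no such process)"
    return ["supervisorctl", "stop", *matches]


new_version = ["zulip-django", "zulip_deliver_scheduled_emails"]
old_version = ["zulip-django"]
print(stop_command("zulip_deliver_scheduled_*", new_version))
print(stop_command("zulip_deliver_scheduled_*", old_version))  # None
```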
Alex Vandiver c94bdd8534 zulip_tools: Find missing processes/groups in list_supervisor_processes.
Nonexistent processes and groups passed to `supervisorctl status` are
printed to STDOUT as follows:

```
$ supervisorctl status zulip-django nonexistent-process nonexistent-group:*
nonexistent-process: ERROR (no such process)
nonexistent-group: ERROR (no such group)
zulip-django                     RUNNING   pid 16043, uptime 17:31:31
```

On supervisor 4 and above, this exits with an exit code of 4;
previously, it returned exit code 0.  Ubuntu 18.04 has version 3.3.1,
and Ubuntu 20.04 has version 4.1.0.

Skip any lines with `ERROR (no such ...)`, and accept exit code 4 from
`supervisorctl status`.
2021-07-09 10:04:53 -07:00
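The sketch below implements that parsing and checks it against the sample output above. The regular expression and function names are hypothetical. Following the commit, exit codes 0 (supervisor 3) and 4 (supervisor 4 with unknown names) are accepted. Supervisor may return other nonzero codes in other situations, and this sketch would treat those as errors.

```python
import re
import subprocess
from typing import List

NO_SUCH = re.compile(r": ERROR \(no such (process|group)\)$")


def parse_status(output: str) -> List[str]:
    """Return process names from `supervisorctl status`, skipping unknown ones."""
    names = []
    for line in output.splitlines():
        if not line.strip() or NO_SUCH.search(line):
            continue
        names.append(line.split()[0])
    return names


def list_supervisor_processes(*patterns: str) -> List[str]:
    result = subprocess.run(["supervisorctl", "status", *patterns],
                            stdout=subprocess.PIPE, universal_newlines=True)
    if result.returncode not in (0, 4):
        raise subprocess.CalledProcessError(result.returncode, result.args)
    return parse_status(result.stdout)


SAMPLE = """\
nonexistent-process: ERROR (no such process)
nonexistent-group: ERROR (no such group)
zulip-django                     RUNNING   pid 16043, uptime 17:31:31
"""
print(parse_status(SAMPLE))  # ['zulip-django']
```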
Alex Vandiver 85a9c0982a zulip_tools: Extract out `list_supervisor_processes`. 2021-07-09 10:04:53 -07:00
Anders Kaseorg d83c91526b install-node: Upgrade Node.js from 14.17.0 to 14.17.3.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-07-05 14:51:24 -07:00
Anders Kaseorg 684dad8145 tools: Use root-based absolute import for tools.lib, etc.
Mypy can’t follow absolute imports based on directories other than the
root.  This was hiding some type errors due to ignore_missing_imports.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-07-05 12:21:52 -07:00