The contents in the database are unchanged across the PostgreSQL
restart; as such, there is no reason to invalidate the caches.
This step was inherited from the general operating system upgrade
documentation. When Python versions change, such as during OS
upgrades, we must ensure that memcached is cleared. However, the
`do-release-upgrade` process uninstalled and upgraded to a new
memcached, as well as likely restarted the system; a separate step for
OS upgrades to restart memcached is thus unnecessary.
pg_upgradecluster has two possibilities for `--method`: `dump`, and
`upgrade`. The former is the default, and does a `pg_dump` of all of
the databases in the old cluster and feeds them into the new cluster.
This is a sure-fire way of getting the same information in both
databases, but may be extremely slow on large databases, and is
guaranteed to fail on servers whose databases take up >50% of their
disk.
The `--method=upgrade` method, by contrast, uses pg_upgrade to copy
the raw database data file over to the new cluster, and then fiddles
with their internal structure as needed by the upgrade to let them be
correct for the new version[1]. This is slightly faster than the
dump/load method, since it skips the serialization step, but still
requires that there be enough space on disk for both old and new
versions at once. `pg_upgrade` is currently supported for all
versions of PostgreSQL from 8.4 to 12.
Using `pg_upgrade` incurs slightly more risk, but since the it is
widely used by now, using it in the relatively-controlled Zulip server
environment is reasonable. The expected worst failure is failure to
upgrade, not corruption or data loss.
Additionally passing `--link` uses hardlinks to link the data files
into both the old and new directories simultaneously. This resolve
both the runtime of the operation, as well as the disk space usage.
The only potential downside to this is that as soon as writes have
occurred on the upgraded cluster, the old cluster can no longer be
started. Since this tooling intends to remove the old cluster
immediately after the upgrade completes successfully, this is not a
significant drawback.
Switch to using `--method=upgrade --link`. This technique spits out
two shell scripts which are expected to be run after completion of the
upgrade; one re-analyzes the statistics, the other does an `rm -rf` of
the data where it is still hardlinked in the old cluster. Extract the
location of these scripts from parsing the `pg_upgradecluster` output;
since the path is not static, we must rely on it being relatively easy
to parse. The risk of the path changing is lower, and has more
obvious failure modes, than inserting the current contents of these
upgrade steps into the overall `upgrade-postgres`.
[1] https://www.postgresql.org/docs/12/pgupgrade.html
Running `pg-upgradecluster` runs the `CREATE TEXT SEARCH DICTIONARY`
and `CREATE TEXT SEARCH CONFIGURATION` from
`zerver/migrations/0001_initial.py` on the new PostgreSQL cluster;
this requires that the stopwords file and dictionary exist _prior_
to `pg_upgradecluster` being run.
This causes a minor dependency conflict -- we do not wish to duplicate
the functionality from `zulip::postgres_appdb_base` which configures
those files, but installing all of `zulip::postgres_appdb_tuned` will
attempt to restart PostgreSQL -- which has not configured the cluster
for the new version yet.
In order to split out configuration of the prerequisites for the
application database, and the steps required to run it, we need to be
able to apply only part of the puppet configuration. Use the
newly-added `--config` argument to provide a more limited `zulip.conf`
which only applies `zulip::postgres_appdb_base` to the new version of
Postgres, creating the required tsearch data files.
This also preserves the property that a failure at any point prior to
the `pg_upgradecluster` is easily recoverable, by re-running
`zulip-puppet-apply`.
This is based on the existing steps in the documentation, with
additional changes now that the PostgreSQL version is stored in
`/etc/zulip/zulip.conf`.
This is a prep commit. Running terminate-psql-sessions command on
docker-zulip results in the script exiting with non-zero exit status
2. This is because the current session also gets terminated while
running terminate-psql-sessions command. To prevent that from happening
we don't terminate the session created by terminate-psql-sessions.
Fixes#2665.
Regenerated by tabbott with `lint --fix` after a rebase and change in
parameters.
Note from tabbott: In a few cases, this converts technical debt in the
form of unsorted imports into different technical debt in the form of
our largest files having very long, ugly import sequences at the
start. I expect this change will increase pressure for us to split
those files, which isn't a bad thing.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Automatically generated by the following script, based on the output
of lint with flake8-comma:
import re
import sys
last_filename = None
last_row = None
lines = []
for msg in sys.stdin:
m = re.match(
r"\x1b\[35mflake8 \|\x1b\[0m \x1b\[1;31m(.+):(\d+):(\d+): (\w+)", msg
)
if m:
filename, row_str, col_str, err = m.groups()
row, col = int(row_str), int(col_str)
if filename == last_filename:
assert last_row != row
else:
if last_filename is not None:
with open(last_filename, "w") as f:
f.writelines(lines)
with open(filename) as f:
lines = f.readlines()
last_filename = filename
last_row = row
line = lines[row - 1]
if err in ["C812", "C815"]:
lines[row - 1] = line[: col - 1] + "," + line[col - 1 :]
elif err in ["C819"]:
assert line[col - 2] == ","
lines[row - 1] = line[: col - 2] + line[col - 1 :].lstrip(" ")
if last_filename is not None:
with open(last_filename, "w") as f:
f.writelines(lines)
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
certbot-auto doesn’t work on Ubuntu 20.04, and won’t be updated; we
migrate to instead using the certbot package shipped with the OS
instead. Also made sure that sure certbot gets installed when running
zulip-puppet-apply, to handle existing systems.
We try to avoid importing Django settings unless
we really need them, since we want this program
to run very quickly during `provision` (when
secrets have already been generated earlier).
Generated by autopep8, with the setup.cfg configuration from #14532.
I’m not sure why pycodestyle didn’t already flag these.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
Previously, the send_custom_email code path leaked files in paths that
were not `.gitignored`, under templates/zerver/emails.
This became problematic when we added automated tests for this code
path, as it meant we leaked these files every time `test-backend` ran.
Fix this by ensuring all the files we generate are in this special
subdirectory.
isort 5 knows not to reorder imports across function calls, so this
will stop isort from breaking our code.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
"Zulip Voyager" was a name invented during the Hack Week to open
source Zulip for what a single-system Zulip server might be called, as
a Star Trek pun on the code it was based on, "Zulip Enterprise".
At the time, we just needed a name quickly, but it was never a good
name, just a placeholder. This removes that placeholder name from
much of the codebase. A bit more work will be required to transition
the `zulip::voyager` Puppet class, as that has some migration work
involved.
At some point the PostgreSQL Docker image started creating the zulip
database for us, which caused our CREATE DATABASE to fail.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
Our recent fixes to using the system's configured memcached settings
broke populate_db, because its hacky clear_database helper is called
with a hacked-up settings module.
We fix this by first moving this out-of-place code from models.py into
populate_db, and then saving the settings required to access memcached
so that we can use them in clear_database.
We also fix a mypy erorr in flush-memcached that matches the same
issue fixed in clear_database.
We'll be soon documenting a production workflow that involves using
it, and that means it needs to live under scripts/ (since tools/ isn't
present in release tarballs).
We no longer use tsearch_extras, and the camo patch is irrelevant on
systemd systems (Xenial and newer). So we no longer need to
provide/install a PPA at all.
Closes#13027.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
Now that we're implemented tsearch_extras in pure postgres, we no
longer need a custom extension. This should help us considerably, as
it means we no longer need to ship custom apt packages at all.
Fixes#467.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
As a result of dropping support for trusty, we can remove our old
pattern of putting `if False` before importing the typing module,
which was essential for Python 3.4 support, but not required and maybe
harmful on newer versions.
cron_file_helper
check_rabbitmq_consumers
hash_reqs
check_zephyr_mirror
check_personal_zephyr_mirrors
check_cron_file
zulip_tools
check_postgres_replication_lag
api_test_helpers
purge-old-deployments
setup_venv
node_cache
clean_venv_cache
clean_node_cache
clean_emoji_cache
pg_backup_and_purge
restore-backup
generate_secrets
zulip-ec2-configure-interfaces
diagnose
check_user_zephyr_mirror_liveness
Previously, if you restored onto a different production system from
the one where you took the backup, backup restoration would fail
because the generated rabbitmq passwords for the two systems would be
different, and we didn't update the restored system to use the
password from the original system.
Fixes#12114.
This should ensure that we apply any special configuration for the
database system (e.g. installing `pgroonga`) before we try to restore
the database contents from the archive.
For pgroonga in particular, this is important so that we can preserve
the configuration of the extension in the `pg_restore` process.
Fixes#12345.
With the S3 file upload backend, we don't store uploads locally, so
the `uploads` directory in the backup will be empty, and more
importantly, LOCAL_UPLOADS_DIR will be None, which the previous code
crashed on.
We have been semi-accidentally relying on the fact that terminate-psql-sessions
fails silently when there are PIDs we don't have permission to terminate.
This actually happens somewhat often, generally when we're doing a series of
operations in quick succession by different users, because postgres processes
live a little longer than the `psql` shell that started them.
As part of adding ON_STOP_ERROR to all of our postgres commands, it makes
sense to enforce we don't fail here, but that means we need to actually filter
the target PIDs to only ones we can actually kill.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
Also use psql -e (--echo-queries) in scripts that use ‘set -x’, so
errors can be traced to a specific query from the output.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
Fixes permission errors when running restore-backup on a tarball
inaccessible to the zulip user.
Fixes#12125.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
There’s no reason to do this unless you’re, like, trying to trip the
Let’s Encrypt rate limits (or perhaps trying to manually test this code).
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
/bin/sh and /usr/bin/env are the only two binaries that NixOS provides
at a fixed path (outside a buildFHSUserEnv sandbox).
This discussion was split from #11004.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
This library was absolutely essential as part of our Python 2->3
migration process, but all of its calls should be either no-ops or
encode/decode operations.
Note also that the library has been wrong since the incorrect
refactoring in 1f9244e060.
Fixes#10807.
This commit allows specifying Subject Alternative Names to issue certs
for multiple domains using certbot. The first name passed to certbot-auto
becomes the common name for the certificate; common name and the other
names are then added to the SAN field. All of these arguments are now
positional. Also read the following for the certbot syntax reference:
https://community.letsencrypt.org/t/how-to-specify-subject-name-on-san/Fixes#10674.
In scripts/setup/terminate-psql-sessions line 16:
major=$(echo "$version" | cut -d. -f1,2)
^-- SC2034: major appears unused. Verify use (or export if used externally).
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
In scripts/setup/terminate-psql-sessions line 5:
[ "$1" = "`echo -e "$1\n$2" | sort -V | tail -n1`" ]
^-- SC2006: Use $(..) instead of legacy `..`.
^-- SC1117: Backslash is literal in "\n". Prefer explicit escaping: "\\n".
In scripts/setup/terminate-psql-sessions line 20:
major=$(echo $version | cut -d. -f1,2)
^-- SC2086: Double quote to prevent globbing and word splitting.
In scripts/setup/terminate-psql-sessions line 24:
tables=$(echo "'$@'" | sed "s/ /','/g")
^-- SC2145: Argument mixes string and array. Use * or separate argument.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
In scripts/setup/setup-certbot line 64:
if [ -z "$DOMAIN" -o -z "$EMAIL" ]; then
^-- SC2166: Prefer [ p ] || [ q ] as [ p -o q ] is not well defined.
In scripts/setup/setup-certbot line 73:
method_args=(--webroot --webroot-path=/var/lib/zulip/certbot-webroot/)
^-- SC2191: The = here is literal. To assign by index, use ( [index]=value ) with no spaces. To keep as literal, quote it.
In scripts/setup/setup-certbot line 112:
if [ -z "$deploy_hook" ]; then
^-- SC2128: Expanding an array without an index only gives the first element.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
In scripts/setup/postgres-init-db line 12:
records=`su "$POSTGRES_USER" -c "psql -Atc 'SELECT COUNT(*) FROM zulip.zerver_message;' zulip" | cat`
^-- SC2006: Use $(..) instead of legacy `..`.
In scripts/setup/postgres-init-db line 35:
source "$(dirname "$0")/terminate-psql-sessions" postgres zulip zulip_base
^-- SC1090: Can't follow non-constant source. Use a directive to specify location.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
In scripts/setup/install line 18:
if [ $failed = 1 ]; then
^-- SC2086: Double quote to prevent globbing and word splitting.
In scripts/setup/install line 19:
echo -e "\033[0;31m"
^-- SC1117: Backslash is literal in "\0". Prefer explicit escaping: "\\0".
In scripts/setup/install line 25:
echo -e "\033[0m"
^-- SC1117: Backslash is literal in "\0". Prefer explicit escaping: "\\0".
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
In scripts/setup/initialize-database line 38:
echo -e "\033[32mPopulating default database failed."
^-- SC1117: Backslash is literal in "\0". Prefer explicit escaping: "\\0".
In scripts/setup/initialize-database line 42:
echo -e "\033[0m"
^-- SC1117: Backslash is literal in "\0". Prefer explicit escaping: "\\0".
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
In scripts/setup/generate-self-signed-cert line 36:
if [ -n "$EXISTS_OK" ] && [ -e "$KEYFILE" -a -e "$CERTFILE" ]; then
^-- SC2166: Prefer [ p ] && [ q ] as [ p -a q ] is not well defined.
In scripts/setup/generate-self-signed-cert line 40:
if [ -z "$FORCE" ] && [ -e "$KEYFILE" -o -e "$CERTFILE" ]; then
^-- SC2166: Prefer [ p ] || [ q ] as [ p -o q ] is not well defined.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
In scripts/setup/configure-rabbitmq line 13:
sudo rabbitmqctl $RABBITMQ_FLAGS delete_user "$RABBITMQ_USERNAME" || true
^-- SC2086: Double quote to prevent globbing and word splitting.
In scripts/setup/configure-rabbitmq line 14:
sudo rabbitmqctl $RABBITMQ_FLAGS delete_user zulip || true
^-- SC2086: Double quote to prevent globbing and word splitting.
In scripts/setup/configure-rabbitmq line 15:
sudo rabbitmqctl $RABBITMQ_FLAGS delete_user guest || true
^-- SC2086: Double quote to prevent globbing and word splitting.
In scripts/setup/configure-rabbitmq line 16:
sudo rabbitmqctl $RABBITMQ_FLAGS add_user "$RABBITMQ_USERNAME" "$RABBITMQ_PASSWORD"
^-- SC2086: Double quote to prevent globbing and word splitting.
In scripts/setup/configure-rabbitmq line 17:
sudo rabbitmqctl $RABBITMQ_FLAGS set_user_tags "$RABBITMQ_USERNAME" administrator
^-- SC2086: Double quote to prevent globbing and word splitting.
In scripts/setup/configure-rabbitmq line 18:
sudo rabbitmqctl $RABBITMQ_FLAGS set_permissions -p / "$RABBITMQ_USERNAME" '.*' '.*' '.*'
^-- SC2086: Double quote to prevent globbing and word splitting.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
--agree-tos is useful for the Docker environment, where we won't have
an interactive shell present for agreeing to the ToS.
--deploy-hook is also useful for the Docker environment; it makes it
possible to customize what deploy hook (if any) we pass into the
underlying cerbot command.
This is multi-stage build which first builds tsearch-extras with the
current version of postgres and then configs postgres for zulip. The
zulip config installs the hunspell dictionaries, stop words file,
tsearch-extras, and creates the initial database.
**Testing Plan:**
1) `docker-compose up` the existing config.
2) Build the new image
3) Edit docker-compose.yml to use the new image id
4) `docker-compose up` and verify full text search is still working.
The zulip user has no need to see this file; it's used by nginx.
And when we set up the cert early in install, there's no zulip user
yet anyway, so this fails.
Thanks to the magic of `set -x`, I noticed this:
```
+ cat
++ ssl-cert
/tmp/src/zulip-server/scripts/setup/generate-self-signed-cert: line 49: ssl-cert: command not found
+ apt-get install -y openssl
[...]
```
In other words, we were trying to run `ssl-cert` -- the name of a
Debian package I meant to refer to in a comment inside the templated
temporary config file for `openssl req` -- as if it were a command.
It wasn't, hence the error.
Because `set -e` has loopholes like a sieve, this didn't cause the
script to exit, just produced this funny output and presumably caused
the config file's comment to be missing a word. In principle, it
could do something surprising if for some reason there were a command
named `ssl-cert` on PATH.
Fix it.
This gives us just one way of adopting a self-signed cert, rather than
one script which would generate a new one and an option to another
which would symlink to the system's snakeoil cert. Now those two
codepaths converge, and do the same thing.
The small advantage of generating our own over the alternative is that
it lets us set the name in the cert to EXTERNAL_HOST, rather than the
system's hostname as embedded in the system snakeoil certs. Not a big
deal, but might make things go slightly smoother if some browsers are
lenient (in a way that they probably shouldn't be.)
Take the core of the logic from how Debian generates the system's
/etc/ssl/certs/ssl-cert-snakeoil.pem ; that gives me more confidence
in the various config choices, and it also demonstrates a much cleaner
way to use the `openssl` tool. Also replace the outer shell logic for
CLI and logging with a cleaner version.
It's not appropriate for our script to pass the `--agree-tos` flag
without any evidence of the user actually having any knowledge of,
let alone intent to agree to, any such ToS. Stop doing that.
Fortunately this script hasn't been part of any release, so it's
likely that no users have gone down this path.
The script already won't work without them; so if the user gets the
invocation wrong, give a halfway-reasonable error rather than just
crash into the ground.
This allows the installer to continue using this script for the
`standalone` method, while the no-argument form now uses the same
`webroot` method as the renewal cron job, suitable for running
by hand to adopt Certbot after initial install.
This causes the cron job to run only when a Zulip-managed certbot
install is actually set up.
Inside `install`, zulip.conf doesn't yet exist when we run
setup-certbot, so we write the setting later. But we also give
setup-certbot the ability to write the setting itself, so that we
can recommend it in instructions for adopting certbot in an
existing Zulip installation.
This helps make this script suitable to run on existing installations,
by mitigating any worry about clobbering existing certs with links to
the new ones, in case the admin changes their mind or was using the
certs for something else too.
This enforces our use of a consistent style in how we access Python
modules; "from os.path import dirname" is a particularly popular
abbreviation inconsistent with our style, and so it deserves a lint
rule.
Commit message and error text tweaked by tabbott.
Fixes#6543.
This causes `upgrade-zulip-from-git`, as well as a no-option run of
`tools/build-release-tarball`, to produce a Zulip install running
Python 3, rather than Python 2. In particular this means that the
virtualenv we create, in which all application code runs, is Python 3.
One shebang line, on `zulip-ec2-configure-interfaces`, explicitly
keeps Python 2, and at least one external ops script, `wal-e`, also
still runs on Python 2. See discussion on the respective previous
commits that made those explicit. There may also be some other
third-party scripts we use, outside of this source tree and running
outside our virtualenv, that still run on Python 2.
We may not necessarily be running out of /root/zulip or any particular path,
but the point this comment was really trying to make in the first place stands.
Make it more clearly and still-accurately.
This will simplify step 1 of prod-install instruction to reduce
suffering in testing/experimenting production environments.
Attribution: the scripts/setup/configure-certs is based on @galexrt's
5c0daf6211
Further tweaked by tabbott to rename the script and edit the messages.
This replaces nvm in npm-wrapper by harcoding the path the way we do
with node. The main benefit is that this saves a few hundred
milliseconds every time we invoke npm.
When we added support for automatically adding new secrets in
generate_secrets.py, we failed to account for the possibility that a
human editor might have let the secrets file without a trailing
newline.
We address this by adding a leading newline before our new secret.
Fixes#5209.
Now, generate_secrets.py will never overwrite existing secrets. In
addition to being a safer model in generate, this fixes 2 significant
issues:
(1) It makes it much easier to preserve secrets like Oauth tokens in a
development environment (previously, provision would destroy them).
(2) It makes it possible to automatically add new secrets as part of
the upgrade process. In particular, this is useful for the
zulip_org_id settings.
Fixes#4797.
This fixes a significant performance issue with LaTeX rendering (and
other things that invoked node) where starting up node took a few
hundred milliseconds due to nvm initialization.
Tweaked by tabbott to avoid copying the node binary itself, instead
using a tiny wrapper script.
This is important primarily because it's possible a future version of
node will expect to find libraries/dependencies/etc. installed via NVM
at some path related to the path of the node binary itself, and that's
more guaranteed with this new model.
Fixes#4618.
Now that we're no longer actively debugging this tool, there's no need
to have it print everything it's doing.
This will make `test-backend` a lot nicer to use.
generate-secrets.py now requires --development for development environment
setup or --production for production environment setup (and one of these
options is mandatory).
This solves the problem that it was somewhat easy to accidentally run
generate-secrets.py without the `-d` option while doing manual development
environment setup.
Fixes: #1911.
NVM takes a specific node version and installs the node package and
a corresponding compatible npm package.
We use it in a somewhat hackish way to install node/npm globally with
a pinned version, since that's how we actually want to consume node in
our development environment.
Other details:
- Travis CI now is configured to use the version of node installed by
provision; the easiest way to do this was to sabotage the existing node
installation.
- jsdom is upgraded to a current version, which both requires recent
node and also is required for the tests to pass with recent node.
This fixes running the node tests on Xenial.
Fixes#1498.
[tweaked by tabbott]
The manage.py change effectively switches the Zulip production server
to use the virtualenv, since all of our supervisord commands for the
various Python services go through manage.py.
Additionally, this migrates the production scripts and Nagios plugins
to use the virtualenv as well.
Previously, we used shell quoting that would result in the shell variable not
being substituted. Instead, we use `"`s that will allow for variable
substitution.
Previously these were hardcoded in zproject/settings.py to be accessed
on localhost.
[Modified by Tim Abbott to adjust comments and fix configure-rabbitmq]
This fixes an annoying issue where one tries to rebuild the database,
and it fails due to there being existing connections.
The one thing that is potentially scary about this implementation is
that it means it's now a lot easier to accidentally drop your
production database by running the wrong script; might be worth adding
a "--force" flag controlling this behavior or something.
Thanks to Nemanja Stanarevic and Neeraj Wahi for prototypes of this
implementation! They did most of the work and testing for this.
This fixes some issues that we've had where commands will fail is
confusing ways after the database is rebuilt because data from before
the database was dropped is still in the memcached cache.
This fixes issue #123. Namely, the script in scripts/setup/install was
returning 0. Adding `set -e` and `set -o pipeline` causes the install
script to exit and return 1 if any part fails, including piping output
(`set -o pipeline` does this).
Most of our installation process is idempotent, but this step in
particular is not, so it's important to provide a clear error message
about how to proceed.