These are more correct to the sense of "is this a service we
configured for Zulip", and removes potential confusion around the 0/1
values being backwards from how binary is usually interpreted.
Using checks of `,$PUPPET_CLASSES,` is repetitive and error-prone; it
does not properly deal with `zulip_ops::` classes, for instance, which
include the `zulip::` classes.
As alluded to in ca9d27175b, this can be fixed by inspecting the
classes that would be applied, using `puppet --write-catalog-summary`.
We work around the chicken-and-egg problem alluded to therein by
writing out as complete `zulip.conf` as would be necessary, before
running puppet and removing the sections we then know to not be
needed.
Unfortunately, there are two checks for `$PUPPET_CLASSES` which cannot
be switched to this technique, as they concern errors that we wish to
catch quite early, and thus before we have puppet installed. Since we
expect failures of those to only concern warnings, and only be
mistakenly omitted for internal `zulip_ops::` classes, this seems a
reasonable risk to admit in exchange for catching common errors early.
When supervisor is first installed, it is started automatically, and
creates the socket, owned by root. Subsequent reconfiguration in
puppet only calls `reread + update`, which is insufficient to apply
the `chown = zulip:zulip` line in `supervisord.conf`, leaving the
socket owned by `root` and the last part of the installation unable to
restart `supervisor` services as the `zulip` user. The `chown` line
in `scripts/lib/install` exists to paper over this.
Add a separate exec target for changes to `supervisord.conf` itself,
which restarts the full service. This leaves the default `restart`
action on the service for the lightweight `reread + update` action,
which is more common.
We use `systemctl` only on redhat-esque builds, because CI runs
Ubuntu, but init is not systemd in that context. `systemctl reload`
is sufficient to re-apply the socket ownership, but a full `restart`
and not `reload` is necessary under `/etc/init.d/supervisor`.
49a7a66004 and immediately previous commits began installing
PostgreSQL 12 from their apt repository. On machines which already
have the distribution-provided version of PostgreSQL installed,
however, this leads to failure to apply puppet when restarting
PostgreSQL 12, as both attempt to claim the same port.
During installation, if we will be installing PostgreSQL, look for
other versions than what we will install, and abort if they are
found. This is safer than attempting to automatically uninstall or
reconfigure existing databases.
This allows for installing from-scratch with a different pinned
version of PostgreSQL, and provides a single place to change when the
default should increase.
Using `/etc/init.d/postgresql` as the detection of if Postgres is on
the server is incorrect, because this line runs _before_ puppet and
any packages are installed. Thus, it cannot tell the difference
between a new Ubuntu one-host first-time-install without PostgreSQL
yet, and one which is merely a front-end and will never have
PostgreSQL. This leads to failures in first-time installs:
```
Error: Evaluation Error: Error while evaluating a Function Call,
Could not find template 'zulip/postgresql//postgresql.conf.template.erb'
```
The only way to detect if PostgreSQL will be present in the _end_
state of the install is to examine the puppet classes that are
applied.
To do this, we must inspect `PUPPET_CLASSES`. Unfortunately, this can
be fragile to subclassing (e.g. `zulip_ops::postgres_appdb`). We
might desire to use `puppet apply --write-catalog-summary` to deduce
the _applied_ classes, which would unroll the inheritance; however,
this causes a chicken-and-egg problem, because `zulip.conf` must be
already written out (including a value for `postgresql.version`, if
necessary!) before such a puppet run could successfully complete.
Switch to predicating the `postgresql.version` key on the puppet
classes that are known to install postgres.
0f4b1076ad removed Ubuntu 16.04 "xenial" and Debian 9 "stretch" from
the printed list of supported operating systems, but left them in the
verification check that controls if that message is printed,
effectively continuing to support them.
Conversely, 439f0d3004 added Ubuntu 20.04 "focal" to the check, but
not to the printed list.
Synchronize to check and print the right supported distributions:
Ubuntu 18.04 "bionic", Ubuntu 20.04 "focal", and Debian 10 "buster".
The previous commit removed the only behavior difference between the
two flags; both of them skip user/database creation, and the tables
therein.
Of the two options `--no-init-db` is more explicit as to what it does,
as opposed to just one facet of when it might be used; remove
`--remote-postgres`.
Since `--postgres-missing-dictionaries` edits `/etc/zulip/zulip.conf`,
it interferes with the intent of `--no-overwrite-settings`.
Make the two settings conflict, to prevent this unclear state.
The `--no-init-db` option previously only controlled if
`initialize-database` was run, which sets up the tables inside the
database. If PostgreSQL was installed locally, it still attempted to
create the user and empty database.
This fails on hosts which are remote PostgreSQL hosts, and not
application hosts, as:
- They may already have a local database, and while
`initialize-datbase` will detect and offer to abort if one is
found,`--no-init-db` seems like it should be the option to not
overwrite it
- `flush-memcached` requires that a local venv be installed, which it
often is not on non-frontend machines.
Skip the database configuration when run with `--no-init-db`.
Since we now support Postgres versions from 10 to 12, we might as well
have new installations start on Postgres 12 to avoid unnecessary
migration/upgrade work.
Since now we want to use production suites on Circle CI so there
is no need to set TRAVIS in env while running scripts.
CIRCLECI is set default in the enviroment of Circle CI builds
so we can use it directly.
Also Travis CI had rabbitmq-server installed so we had to add workaround
in install script to avoid the error. That workaround is removed.
This simplifies the RDS installation process to avoid awkwardly
requiring running the installer twice, and also is significantly more
robust in handling issues around rerunning the installer.
Finally, the answer for whether dictionaries are missing is available
to Django for future use in warnings/etc. around full-text search not
being great with this configuration, should they be required.
This commit finishes adding end-to-end support for the install script
on Debian Buster (making it production ready). Some support for this
was already added in prior commits such as
99414e2d96.
We plan to revert the postgres hunks of this once we've built
tsearch_extras for our packagecloud archive.
Fixes#9828.
Now that we have the run_as_root helper function, we don't need to
install sudo to run Zulip in production
This reverts commit a7d7d181ea.
Fixes#10036.
On usage errors (except --help), write usage message to stderr and
exit with nonzero status.
Forbid setting the hostname and email to the example values. Those
are specifically checked for and would fail later.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
Since yarn has a package.json conveniently available, we can parse
that with jq, saving the expensive operation of starting up yarn.
This saves ~300ms in a no-op provision.
This fixes an actual user-facing issue in our mobile push
notifications documentation (where we were incorrectly failing to
quote the argument to `./manage.py register_server` making it not
work), as well as preventing future similar issues from occurring
again via a linter rule.
This commit allows specifying Subject Alternative Names to issue certs
for multiple domains using certbot. The first name passed to certbot-auto
becomes the common name for the certificate; common name and the other
names are then added to the SAN field. All of these arguments are now
positional. Also read the following for the certbot syntax reference:
https://community.letsencrypt.org/t/how-to-specify-subject-name-on-san/Fixes#10674.
By far the dominant cause of errors when installing apt packages is
not having the Universe repository enabled in Ubuntu bionic (this
seems to have started happening a lot recently; I wonder if Ubuntu
changed the defaults for new server installs or something?).
In any case, providing that suggestion in the error output should help
reduce these a lot.
In scripts/lib/install line 71:
ZULIP_PATH="$(readlink -f $(dirname $0)/../..)"
^-- SC2046: Quote this to prevent word splitting.
^-- SC2086: Double quote to prevent globbing and word splitting.
In scripts/lib/install line 105:
mem_kb=$(cat /proc/meminfo | head -n1 | awk '{print $2}')
^-- SC2002: Useless cat. Consider 'cmd < file | ..' or 'cmd file | ..' instead.
In scripts/lib/install line 141:
apt-get -y dist-upgrade $APT_OPTIONS
^-- SC2086: Double quote to prevent globbing and word splitting.
In scripts/lib/install line 145:
$ADDITIONAL_PACKAGES
^-- SC2086: Double quote to prevent globbing and word splitting.
In scripts/lib/install line 254:
if [ -n "ZULIP_ADMINISTRATOR" ]; then
^-- SC2157: Argument to -n is always true due to literal strings.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
We use it to drop privileges from root to other users in the installer
process (which ideally, we would remove, but it will take some
annoying refactoring).
This should generally be safe to do, since the default sudo
permissions only allow root to use it anyway.
See https://github.com/zulip/zulip/issues/10036 for the follow-up
issue of removing the need to do this.
Previously, we were having issues installing on Debian Stretch with
non-English locales, because `locale-gen` actually doesn't take a
locale as an argument (and thus `locale-gen en_US.UTF-8` did nothing).
We should instead be calling localedef directly.
Thanks to Tom Daff for debugging this.
Fixes#10629.
For building Zulip in an environment where a custom CA certificate is
required to access the public Internet, one needs to be able to
specify that CA certificate for all network access done by the Zulip
installer/build process. This change allows configuring that via the
environment.
Apparently, perl at least expects LANG, LANGUAGE, and LC_ALL to be
consistent, and thus apt spits out a bunch of warnings if these are
different. So if we're forcing LC_ALL in these installer/upgrade
script blocks, we should force the rest too.
I believe this fixes the remaining locale part of #9946.
This commits adds the necessary puppet configuration and
installer/upgrade code for installing and managing the thumbor service
in production. This configuration is gated by the 'thumbor.pp'
manifest being enabled (which is not yet the default), and so this
commit should have no effect in a default Zulip production environment
(or in the long term, in any Zulip production server that isn't using
thumbor).
Credit for this effort is shared by @TigorC (who initiated the work on
this project), @joshland (who did a great deal of work on this and got
it working during PyCon 2017) and @adnrs96, who completed the work.
The docker installer configuration incorrectly had has_appserver set
to 0; this meant that (A) the docker-zulip code needed to copy the
block of code in the installer for the `has_appserver` case into the
Dockerfile (unnecessarily), and (B) one couldn't use `install` from a
Git ref (because the static asset compiler didn't end up in the right
place).
It appears that docker-zulip tried to set this flag in their `install`
command line, but the construction inside `install` meant that didn't
work.
It wasn't obvious reading this message that you can perfectly well
bring your own SSL/TLS certificate; unless you read quite a bit
between the lines where we say "could not find", or followed the link
to the detailed docs, the message sounded like you had to either use
--certbot or --self-signed-cert.
So, explicitly mention the BYO option. Because the "complete chain"
requirement is a bit tricky, don't try to give instructions for it
in this message; just refer the reader to the docs.
Also, drop the logic to identify which of the files is missing; it
certainly makes the code more complex, and I think even the error
message is actually clearer when it just gives the complete list of
required files -- it's much more likely that the reader doesn't know
what's required than that they do and have missed one, and even then
it's easy for them to look for themselves.
The installation isn't really complete here, and wasn't even when this
was the only success case; the instructions we're giving are for the
next step in the installation.
These instructions don't say what to do in an actual use case for this
option, but decent instructions there will require having a concrete
use case in front of us and designing the flow for it. At this stage,
just say where we are in the normal flow, and an admin who's chosen to
go off that flow can figure out how they want to vary it from there.
This flips the experimental `--express` option to be the default.
We retain the old behavior, where the script exits before
`initialize-database`, as an option `--no-init-db`; it might be useful
in e.g. a migration scenario (from a Zulip install elsewhere, or
another chat system) where the admin wants to set up the database
separately.
The install instructions are adjusted to match, getting shorter by two
steps and a bunch of words. I think this opens up opportunities to
refactor the text to simplify things further, too, but leaving that
for another commit.
Also tweak the "production" test suite to match.
Kind of unfortunate because the `sudo` interface for running a command
is objectively better -- a list of arguments, rather than a string to
be re-parsed by the shell. But some bare-bones machine images lack
`sudo`, so this makes things a bit more portable.
We'll make this the normal behavior soon, once we're satisfied with
our arrangements for sending the admin straight to realm creation and
using the app without configuring email. The instructions in the docs
will also have to change accordingly, of course.
This causes us to give an error if you pass the installer any
positional arguments, e.g. with `--`. There's no reason you'd want
to do this, but I accidentally did it by passing an extra `--` to
the `test-install/install` wrapper and spent a few minutes on
confused debugging.
This gives us just one way of adopting a self-signed cert, rather than
one script which would generate a new one and an option to another
which would symlink to the system's snakeoil cert. Now those two
codepaths converge, and do the same thing.
The small advantage of generating our own over the alternative is that
it lets us set the name in the cert to EXTERNAL_HOST, rather than the
system's hostname as embedded in the system snakeoil certs. Not a big
deal, but might make things go slightly smoother if some browsers are
lenient (in a way that they probably shouldn't be.)
Before this fix, the installer has an extremely annoying bug where
when run inside a container with `lxc-attach`, when the installer
finishes, the `lxc-attach` just hangs and doesn't respond even to
C-c or C-z. The only way to get the terminal back is to root around
from some other terminal to find the PID and kill it; then run
something like `stty sane` to fix the messed-up terminal settings
left behind.
After bisecting pieces of the install script to locate which step
was causing the issue, it comes down to the `service camo restart`.
The comment here indicates that we knew about an annoying bug here
years ago, and just swept it under the rug by skipping this step
when in Travis. >_<
The issue can be reproduced by running simply `service camo restart`
under `lxc-attach` instead of the installer; or `service camo start`,
following a `service camo stop`. If `lxc-attach` is used to get an
interactive shell, these commands appear to work fine; but then when
that shell exits, the same hang appears. So, when we start camo
we're evidently leaving some kind of mess that entangles the daemon
with our shell.
Looking at the camo initscript where it starts the daemon, there's
not much code, and one flag jumps out as suspicious:
start-stop-daemon --start --quiet --pidfile $PIDFILE -bm \
--exec $DAEMON --no-close -c nobody --test > /dev/null 2>&1 \
|| return 1
start-stop-daemon --start --quiet --pidfile $PIDFILE -bm \
--no-close -c nobody --exec $DAEMON -- \
$DAEMON_ARGS >> /var/log/camo/camo.log 2>&1 \
|| return 2
What does `--no-close` do?
-C, --no-close
Do not close any file descriptor when forcing the daemon
into the background (since version 1.16.5). Used for
debugging purposes to see the process output, or to
redirect file descriptors to log the process output.
And in fact, looking in /proc/PID/fd while a hang is happening finds
that fd 0 on the camo daemon process, aka stdin, is connected to our
terminal.
So, stop that by denying the initscript our stdin in the first place.
This fixes the problem.
The Debian maintainer turns out to be "Zulip Debian Packaging Team",
at debian@zulip.com; so this package and its bugs are basically ours.
This provides a major simplification for non-production installs,
including our own testing (it's already in both the test-install
harness script and the "production" test suite) as well as potential
admins evaluating Zulip.
Ultimately this should probably be the default behavior, with perhaps
something shown to admins on the web as a reminder and link to help on
installing a better certificate. For now, pending working through
that, just get the behavior in and leave it opt-in.
The third-party `install-yarn.sh` script uses `curl`, and we invoke it
in `install-node`. So we need to install it as a dependency.
We've mostly gotten away with this because it's common for `curl` to
already be installed; but it isn't always.
This updates commit 11ab545f3 "install: Set the locale ..."
to be somewhat cleaner, and to explain more in the commit message.
In some environments, either pip itself fails or some packages fail to
install, and setting the locale to en_US.UTF-8 resolves the issue.
We heard reports of this kind of behavior with at least two different
sets of symptoms, with 1.7.0 or its release candidates:
https://chat.zulip.org/#narrow/stream/general/subject/Trusty.201.2E7.20Upgrade/near/302214https://chat.zulip.org/#narrow/stream/production.20help/subject/1.2E6.20to.201.2E7/near/306250
In all reported cases, commit 11ab545f3 or equivalent fixed the issue.
Setting LC_CTYPE is redundant when also setting LC_ALL, because LC_ALL
overrides all `LC_*` environment variables; so skip that. Also move
the line in `install` to a more appropriate spot, and adjust the
comments.
This allows the installer to continue using this script for the
`standalone` method, while the no-argument form now uses the same
`webroot` method as the renewal cron job, suitable for running
by hand to adopt Certbot after initial install.
This causes the cron job to run only when a Zulip-managed certbot
install is actually set up.
Inside `install`, zulip.conf doesn't yet exist when we run
setup-certbot, so we write the setting later. But we also give
setup-certbot the ability to write the setting itself, so that we
can recommend it in instructions for adopting certbot in an
existing Zulip installation.
This should make it easier to script the installation process, and
also conveniently are the options one would want for the --certbot
option.
Significantly modified by tabbott to have a sane right interface,
include --help, and avoid printing all the `set -x` garbage before the
usage notices.
Historically, one has needed to build a release tarball in order to
use/test the Zulip installer, but you could upgrade a Zulip server
from Git. However, the only reason for that requirement was that we
didn't run `tools/update-prod-static` as part of the install script if
it's required. A good test for that case is whether we're in a Git
repository, but a better one is to check whether the prod-static
content exists in the tarball paths.
Fixes#3704.
This should make it much more likely that users see this before
waiting a long time for other things to happen, since the `apt-get
dist-upgrade` step is really slow. We can't move further to the top,
since this requires `lsb_release` to be installed.