install: Start on an LXC-based dev/test environment for the installer.

In order to do development on the installer itself in a sane way,
we need a reasonably fast and automatic way to get a fresh environment
to try to run it in.

This calls for some form of virtualization.  Choices include:

 * A public cloud, like EC2 or Digital Ocean.  These could work, if we
   wrote some suitable scripts against their APIs, to manage
   appropriate base images (as AMIs or snapshots respectively) and to
   start fresh instances/droplets from a base image.  There'd be some
   latency on starting a new VM, and this would also require the user
   to have an account on the relevant cloud with API access to create
   images and VMs.

 * A local whole-machine VM system (hypervisor) like VirtualBox or
   VMware, perhaps managing the configuration through Vagrant.  These
   hypervisors can be unstable and painfully slow.  They're often the
   only way to get development work done on a Mac or Windows machine,
   which is why we use them there for the normal Zulip development
   environment; but I don't really want to find out how their
   instability scales when constantly spawning fresh VMs from an image.

 * Containers.  The new hotness, the name on everyone's lips, is Docker.
   But Docker is not designed for virtualizing a traditional Unix server,
   complete with its own init system and a fleet of processes with a
   shared filesystem -- in other words, the platform Zulip's installer
   and deployment system are for.  Docker brings its own quite
   different model of deployment, and someday we may port Zulip from
   the traditional Unix server to the Docker-style deployment model,
   but for testing our traditional-Unix-server deployment we need a
   (virtualized) traditional Unix server.

 * Containers, with LXC.  LXC provides containers that function as
   traditional Unix servers; because of the magic of containers, the
   overhead is quite low, and LXC offers handy snapshotting features
   so that we can quickly start up a fresh environment from a base
   image.  Running LXC does require a Linux base system.  For
   contributors whose local development machine isn't already Linux,
   the same solutions are available as for our normal development
   environment: the base system for running LXC could be e.g. a
   Vagrant-managed VirtualBox VM, or a machine in a public cloud.

This commit adds a first version of such a thing, using LXC to manage
a base image plus a fresh container for each test run.  The test
containers function as VMs: once installed, all the Zulip services run
normally in them and can be managed in the normal production ways.

This initial version has a shortage of usage messages or docs, and
likely has some sharp edges.  It also requires familiarity with the
basics of LXC commands in order to make good use of the resulting
containers: `lxc-ls -f`, `lxc-attach`, `lxc-stop`, and `lxc-start`,
in particular.
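
As a rough sketch of a typical session, the helper below prints the
commands one would run rather than running them, so it can be previewed
on a machine without LXC.  The tarball path, the example container name,
and the helper names `base_name` and `session_commands` are illustrative
assumptions, not part of this commit.

```shell
#!/bin/bash
# Illustrative helpers only; neither function exists in this commit.

# Base container name, following the scheme used by the scripts below.
base_name() {
    echo "zulip-install-$1-base"
}

# Print the commands for a typical session instead of running them.
# Arguments: release, path to a release tarball, test container name.
session_commands() {
    local release="$1" tarball="$2" container="$3"
    cat <<EOF
sudo tools/test-install/install $release $tarball
sudo lxc-ls -f
sudo lxc-attach -n $container
sudo lxc-stop -n $container
sudo lxc-start -n $(base_name "$release")
EOF
}

session_commands trusty /path/to/zulip-server.tar.gz zulip-install-trusty-XXXXX
```

Because the test containers are created with `lxc-copy --ephemeral`,
stopping one should also discard it; `lxc-start` is mainly useful for
bringing the base container back up.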
Greg Price 2018-01-19 16:14:40 -08:00
parent a5be1fb109
commit bf5f1b5f20
2 changed files with 114 additions and 0 deletions

tools/test-install/install Executable file

@@ -0,0 +1,60 @@
#!/bin/bash
set -ex
if [ "$EUID" -ne 0 ]; then
    echo "error: this script must be run as root" >&2
    exit 1
fi
RELEASE="$1"
INSTALLER="$2"
THIS_DIR="$(dirname "$(readlink -f "$0")")"
BASE_CONTAINER_NAME=zulip-install-"$RELEASE"-base
if ! lxc-info -n "$BASE_CONTAINER_NAME" >/dev/null 2>&1; then
    "$THIS_DIR"/prepare-base "$RELEASE"
fi
while [ -z "$CONTAINER_NAME" ] || lxc-info -n "$CONTAINER_NAME" >/dev/null 2>&1; do
    CONTAINER_NAME="$(mktemp -u zulip-install-"$RELEASE"-XXXXX)"
done
lxc-copy --ephemeral --keepdata -n "$BASE_CONTAINER_NAME" -N "$CONTAINER_NAME"
run() {
    lxc-attach -n "$CONTAINER_NAME" -- "$@"
}
# Wait for the container to boot, polling.
ok=
for i in {1..60}; do
    runlevel="$(run runlevel 2>/dev/null)" || { sleep 1; continue; }
    # Booted once the output ends in a digit (e.g. "N 2", not "unknown").
    if [ "$runlevel" != "${runlevel%[0-9]}" ]; then
        ok=1
        break
    fi
    sleep 1
done
if [ -z "$ok" ]; then
    echo "error: timeout waiting for container to boot" >&2
    exit 1
fi
# TODO kill this with an installer flag
run apt-get install -y openssl ssl-cert
run ln -nsf /etc/ssl/certs/ssl-cert-snakeoil.pem /etc/ssl/certs/zulip.combined-chain.crt
run ln -nsf /etc/ssl/private/ssl-cert-snakeoil.key /etc/ssl/private/zulip.key
# TODO make this a proper dep -- else
# /tmp/zulip-server-1.7.1/scripts/lib/../../scripts/lib/third/install-yarn.sh: line 43: curl: command not found
run apt-get install -y curl
<"$INSTALLER" run dd of=/tmp/zulip-server.tar.gz
run tar -xf /tmp/zulip-server.tar.gz -C /tmp/
run sh -c '/tmp/zulip-server-*/scripts/setup/install'
# TODO install ends as a zombie (workaround: `sudo ps aux | grep lxc-attach`, kill that)
# TODO settings.py, initialize-database, create realm
# TODO eatmydata, for speed
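
The boot-wait loop in `install` could be factored into a reusable
function along the following lines.  This is a minimal sketch, not the
commit's code: the name `wait_for_boot` and the `POLL_INTERVAL`
variable are hypothetical, and it assumes (as the script does) that
`runlevel` prints output ending in a digit once the container is up.

```shell
#!/bin/bash
# Hypothetical helper: run a probe command up to TRIES times and succeed
# as soon as its output ends in a digit (e.g. "N 2" from `runlevel`).
# Usage: wait_for_boot TRIES COMMAND [ARGS...]
wait_for_boot() {
    local tries="$1"
    shift
    local i=0 out
    while [ "$i" -lt "$tries" ]; do
        # The probe may fail outright while the container is still starting.
        if out="$("$@" 2>/dev/null)" && [ "$out" != "${out%[0-9]}" ]; then
            return 0
        fi
        sleep "${POLL_INTERVAL:-1}"
        i=$((i + 1))
    done
    return 1
}
```

In `install` this might be called as
`wait_for_boot 60 lxc-attach -n "$CONTAINER_NAME" -- runlevel`,
printing the timeout error and exiting if it returns nonzero.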

tools/test-install/prepare-base Executable file

@@ -0,0 +1,54 @@
#!/bin/bash
set -ex
if [ "$EUID" -ne 0 ]; then
    echo "error: this script must be run as root" >&2
    exit 1
fi
RELEASE="$1"
ARCH=amd64 # TODO: maybe i686 too
# TODO: xenial too
case "$RELEASE" in
    trusty) ;;
    *)
        echo "error: unsupported target release: $RELEASE" >&2
        exit 1
        ;;
esac
CONTAINER_NAME=zulip-install-$RELEASE-base
if ! lxc-info -n "$CONTAINER_NAME" >/dev/null 2>&1; then
    lxc-create -n "$CONTAINER_NAME" -t download -- -d ubuntu -r "$RELEASE" -a "$ARCH"
fi
lxc-start -n "$CONTAINER_NAME"
run() {
    lxc-attach -n "$CONTAINER_NAME" -- "$@"
}
run passwd -d root
run apt-get update
# As an optimization, we install a bunch of packages the installer
# would install for itself.
run apt-get install -y --no-install-recommends \
    xvfb parallel netcat unzip zip jq python3-pip wget \
    build-essential python3-dev \
    closure-compiler memcached rabbitmq-server redis-server \
    hunspell-en-us supervisor libssl-dev yui-compressor puppet \
    gettext libffi-dev libfreetype6-dev libz-dev libjpeg-dev \
    libldap2-dev libmemcached-dev python-dev python-pip \
    python-virtualenv python-six libxml2-dev libxslt1-dev libpq-dev \
    postgresql-9.3
run ln -sf /usr/share/zoneinfo/Etc/UTC /etc/localtime
run locale-gen en_US.UTF-8 || true
echo "LC_ALL=en_US.UTF-8" | run tee /etc/default/locale
# TODO: on failure, either stop or print message
lxc-stop -n "$CONTAINER_NAME"
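
One possible way to address the "on failure, either stop or print
message" TODO is an `EXIT` trap.  The following is only a sketch under
assumptions: the `cleanup` function and the `STOP_CMD` override are
hypothetical names introduced here, the latter so the logic can be
exercised without LXC.

```shell
#!/bin/bash
# Sketch only: STOP_CMD is a hypothetical override used for testing;
# by default the real lxc-stop is invoked.
cleanup() {
    # On entry to an EXIT trap, $? holds the script's exit status.
    local status=$?
    if [ "$status" -ne 0 ]; then
        echo "error: prepare-base failed (exit $status); stopping $CONTAINER_NAME" >&2
    fi
    "${STOP_CMD:-lxc-stop}" -n "$CONTAINER_NAME" || true
}

# In prepare-base this would be registered right after lxc-start:
#   trap cleanup EXIT
```

With the trap in place the container is stopped on both success and
failure, so the final explicit `lxc-stop` would become redundant.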