install: Work around a bug in the (our) Debian package for camo.

Before this fix, the installer has an extremely annoying bug: when
it is run inside a container with `lxc-attach`, then once the installer
finishes, the `lxc-attach` just hangs and doesn't respond even to
C-c or C-z.  The only way to get the terminal back is to root around
from some other terminal to find the PID and kill it, then run
something like `stty sane` to fix the messed-up terminal settings
left behind.

Bisecting pieces of the install script to locate which step was
causing the issue narrows it down to the `service camo restart`.
The comment next to that step indicates that we knew about an
annoying bug here years ago, and just swept it under the rug by
skipping this step when in Travis. >_<

The issue can be reproduced by running simply `service camo restart`
under `lxc-attach` instead of the installer; or `service camo start`,
following a `service camo stop`.  If `lxc-attach` is used to get an
interactive shell, these commands appear to work fine; but then when
that shell exits, the same hang appears.  So, when we start camo
we're evidently leaving some kind of mess that entangles the daemon
with our shell.

Looking at the camo initscript where it starts the daemon, there's
not much code, and one flag jumps out as suspicious:

  start-stop-daemon --start --quiet --pidfile $PIDFILE -bm \
    --exec $DAEMON --no-close -c nobody --test > /dev/null 2>&1 \
    || return 1
  start-stop-daemon --start --quiet --pidfile $PIDFILE -bm \
    --no-close -c nobody --exec $DAEMON -- \
    $DAEMON_ARGS >> /var/log/camo/camo.log 2>&1 \
    || return 2

What does `--no-close` do?

 -C, --no-close
     Do not close any file descriptor when forcing the daemon
     into the background (since version 1.16.5).  Used for
     debugging purposes to see the process output, or to
     redirect file descriptors to log the process output.

And in fact, looking in /proc/PID/fd while a hang is happening shows
that fd 0 of the camo daemon process, aka stdin, is connected to our
terminal.
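
For anyone repeating that check, here is a minimal sketch of the
/proc inspection.  The helper name `stdin_target` is made up for
illustration, and a `sleep` process stands in for the camo daemon;
it is Linux-only, since it relies on /proc.

```shell
# Print the file that fd 0 (stdin) of process $1 points at.
# Linux-only: reads the /proc filesystem.
stdin_target() {
    readlink "/proc/$1/fd/0"
}

# Stand-in for the daemon (not camo itself): a background process
# started while the shell's stdin is /dev/null, so that is what it
# inherits.
{ sleep 30 & demo_pid=$!; } </dev/null

stdin_target "$demo_pid"    # prints /dev/null
kill "$demo_pid"
```

Pointed at the real daemon's PID during a hang, the same helper
would be expected to print a terminal device rather than /dev/null.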

So, stop that by denying the initscript our stdin in the first place.
This fixes the problem.
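
The effect of the redirect can be seen in isolation with a small
sketch.  The wrapper name `without_stdin` is hypothetical, and
`readlink` here just stands in for any child command reporting its
own stdin (Linux-only, via /proc).

```shell
# Run a command with stdin cut off, the same way the fix invokes
# the initscript.
without_stdin() {
    "$@" </dev/null
}

# The child reports its own fd 0; with the redirect it is /dev/null
# rather than whatever terminal the caller was attached to.
without_stdin readlink /proc/self/fd/0    # prints /dev/null
```

In the installer this corresponds to `service camo restart </dev/null`:
whatever the initscript leaks to the daemon, it can no longer be our
terminal.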

The Debian maintainer turns out to be "Zulip Debian Packaging Team",
at debian@zulip.com; so this package and its bugs are basically ours.
Greg Price 2018-01-22 17:27:35 -08:00
parent 6e7ae9a239
commit 2a59b2d2ac
2 changed files with 6 additions and 4 deletions


@@ -223,9 +223,12 @@ if [ "$has_appserver" = 0 ]; then
 fi
 # Restart camo since generate_secrets.py likely replaced its secret key
-if [ "$has_camo" = 0 ] && [ -z "$TRAVIS" ]; then
-    # We don't run this in Travis CI due to a weird hang bug
-    service camo restart
+if [ "$has_camo" = 0 ]; then
+    # Cut off stdin because a bug in the Debian packaging for camo
+    # causes our stdin to leak to the daemon, which can cause tools
+    # invoking the installer to hang.
+    # TODO: fix in Debian too.
+    service camo restart </dev/null
 fi
 if [ "$has_rabbit" = 0 ]; then


@@ -72,6 +72,5 @@ if [ -z "ok" ]; then
 fi
 run eatmydata -- /tmp/src/zulip-server/scripts/setup/install --snakeoil-cert "${INSTALLER_ARGS[@]}"
-# TODO install ends as a zombie (workaround: `sudo ps aux | grep lxc-attach`, kill that)
 # TODO settings.py, initialize-database, create realm