With this change, we are now testing the production static asset
pipeline and installation process in a new testing job (and also run
the frontend/backend tests separately).
This means that changes that break the Zulip static asset pipeline or
production installation process are more likely to fail tests. The
testing is imperfect in that it does not have proper isolation -- we
build a complete Zulip development environment and then install a
Zulip production environment on top of it, so e.g. any apt
dependencies installed for Zulip development will still be available
for the Zulip production environment. But, it's better than nothing!
A good v2 of this would be to have the production setup process just
install the minimum stuff needed to run `build-release-tarball` and
then uninstall it / clean it up so that we can do a more clear
production installation, but that's more work.
This fixes an annoying issue where one tries to rebuild the database,
and it fails due to there being existing connections.
The one thing that is potentially scary about this implementation is
that it means it's now a lot easier to accidentally drop your
production database by running the wrong script; might be worth adding
a "--force" flag controlling this behavior or something.
Thanks to Nemanja Stanarevic and Neeraj Wahi for prototypes of this
implementation! They did most of the work and testing for this.
This fixes some issues that we've had where commands will fail is
confusing ways after the database is rebuilt because data from before
the database was dropped is still in the memcached cache.
This fixes issue #123. Namely, the script in scripts/setup/install was
returning 0. Adding `set -e` and `set -o pipeline` causes the install
script to exit and return 1 if any part fails, including piping output
(`set -o pipeline` does this).
Most of our installation process is idempotent, but this step in
particular is not, so it's important to provide a clear error message
about how to proceed.
While the docu on https://www.zulip.org/server.html says:
```
cd /root/zulip
./scripts/setup/install
```
This script downloads the `python-django-guardian_1.3-1~zulip4_all.deb` file to current working dir (`/root/zulip` if you follow the docu), but tries to install it from /root/.
This fails obviously. So i changed the download location to /tmp/.
If there's a problem with Django settings then RMQPW would just be
empty, causing more confusing errors downstream.
(imported from commit 5948b1a15eb92fc032ea02e499be58365d8e9ecb)
Source LOCAL_DATABASE_PASSWORD and INITIAL_PASSWORD_SALT from the secrets file.
Fix the creation of pgpass file.
Tim's note: This will definitely break the original purpose of the
tool but it should be pretty easy to add that back as an option.
(imported from commit 8ab31ea2b7cbc80a4ad2e843a2529313fad8f5cf)
The manual step here is that we need to do the `puppet apply` before
pushing this commit, or `restart-server` will crash.
Previously we shut down everything in one group, which performed
poorly with supervisor's bad performance on restarting many daemons at
once. Now we shut down the unimportant stuff, then the important
stuff, bring back the important stuff, and then bring back the
unimportant stuff.
This new model has a little over 5s of downtime for the core
user-facing daemons -- which is still far more than would be ideal,
but a lot less than the 13s or so that we had before.
Here's some logs with the current setup for the tornado/django downtime:
2013-12-19 20:16:51,995 restart-server: Stopping daemons
2013-12-19 20:16:53,461 restart-server: Starting daemons
2013-12-19 20:16:57,146 restart-server: Starting workers
Compare with the behavior on master today:
2013-12-19 20:21:45,281 restart-server: Stopping daemons
2013-12-19 20:21:49,225 restart-server: Starting daemons
2013-12-19 20:21:58,463 restart-server: Done!
(imported from commit b2c1ba77f3dc989551d0939779208465a8410435)
This reverts commit acef4c0027b77053497ef6e9f7aa4b61703205c3.
Despite the lower total downtime, this caused more user-facing downtime.
(imported from commit 5cce032bb20abe83853a65ee72bf0bb28af403cc)
This shaves about 1.5 seconds off our restart time on ls-dev (9s ->
7.5s). Still too slow, but it's a little bit better.
(imported from commit acef4c0027b77053497ef6e9f7aa4b61703205c3)
We don't use apache in the main app -- only for the SSO situation --
this code was just copied from our own install script. And it caused
problems at CUSTOMER13 because they installed Apache in preparation for
the SSO integration, but restarting it failed.
(imported from commit 3f2961574134847c836e8b69736f60d9f8790201)