Apparently, the refactoring to make this script only run when changes
are present was buggy, in that if `apt-get update` failed, running
provision against wouldn't rerun `apt-get update`, resulting in a
broken state that requires expertise to fix. This closes that gap, by
using a stamp file to ensure we always successfully update apt before
proceeding.
It doesn't fix existing installations.
Modify `generate_sha1sum_node_modules()` such that it can calculate
the hash for a particular installation.
Tweaked by tabbott to use os.path.realpath in the setup_dir
calculation, to ensure it's consistent.
This should make it much more likely that users see this before
waiting a long time for other things to happen, since the `apt-get
dist-upgrade` step is really slow. We can't move further to the top,
since this requires `lsb_release` to be installed.
Given the path of directory containing all the caches, a list of
caches in use and threshold days, this function returns a list
of caches which can be removed safely.
This function returns a list of all the deployments directories
which are newer than some threshold number of days including the
`/root/zulip` directory if it exists.
This saves us from spending 200-250ms of CPU time importing Django
again just to log that we're running a management command. On
`scripts/restart-server`, this saves us from one thundering herd of
Django startups when all the queue workers are restarted; but there's
still the Django startup for the `manage.py` process itself for each
worker, so on a machine with e.g. 2 (virtual) cores the restart is
still painful.
This causes `upgrade-zulip-from-git`, as well as a no-option run of
`tools/build-release-tarball`, to produce a Zulip install running
Python 3, rather than Python 2. In particular this means that the
virtualenv we create, in which all application code runs, is Python 3.
One shebang line, on `zulip-ec2-configure-interfaces`, explicitly
keeps Python 2, and at least one external ops script, `wal-e`, also
still runs on Python 2. See discussion on the respective previous
commits that made those explicit. There may also be some other
third-party scripts we use, outside of this source tree and running
outside our virtualenv, that still run on Python 2.
We now call the create_large_migrations management command as part of
upgrade-zulip-stage-2 if needed, so that we can create large indexes
while the app is still up.
We can't fully support it until we fix the tsearch_extras availability
issue, but for now, this is an improvement.
Tweaked by tabbott to cover the outstanding tsearch_extras issue.
Also make our dependency on `six` (for e.g. `replace-tarball-shebang`)
explicit -- we've been getting it via `python-pip`, but `python3-pip`
(on trusty) doesn't have that dependency for some reason.
Since we can use both perfer_offline=True and False in a since build
prefer_offline shouldn't be used as a cache key or it will confuse the
cleanup script. Since yarn install (if successful) should be idempotent.
This will probably be ok.
If we do wind up with a symlink lying around at `local_settings.py`,
it won't do us any harm and shouldn't be materially more confusing
than the regular file we've long had there for almost all installs.
It'll also only last as long as the current deploy. So just
let it be, and simplify the code a bit.
Also add a line to help the reader understand the remaining half of
this logic (which is essential so long as people might have pre-1.4.0
deploys lying around that they eventually get around to trying to
upgrade). The fact that it's addressed to a situation which exists
only in the past of this tree, not in its present, makes a brief
comment potentially very helpful.
This replaces nvm in npm-wrapper by harcoding the path the way we do
with node. The main benefit is that this saves a few hundred
milliseconds every time we invoke npm.
For performance reasons, we spawn each linter in a separate OS thread.
The downside of this is that all lints would end up in stdout without
much visual separation, resulting in confusing error log. This commit
introduce the `print_err` function, which shows which linter each line
of lint is from.
Basically we just seperate out the sha1sum generation for the
node modules so that it can be reused later for cache clearance
logic. This is achieved by adding a function which returns the
sha1sum based HEX digest.
The Zulip email mirror script called by postfix had performance/load
issues, because it spent so much time on startup/import due to use of
the Zulip virtualenv.
The script was rewritten using pure python (no Django) to improve
performance.
The install script was failing on 2nd+ attempts if the first attempt
was interrupted.
This failure happened because zulip-venv already existed at
`current_venv_path`. Changing the `ln` command's flags from `-s` to
`-nsf` should make this part of the script idempotent.
This fixes a significant performance issue with LaTeX rendering (and
other things that invoked node) where starting up node took a few
hundred milliseconds due to nvm initialization.
Tweaked by tabbott to avoid copying the node binary itself, instead
using a tiny wrapper script.
This is important primarily because it's possible a future version of
node will expect to find libraries/dependencies/etc. installed via NVM
at some path related to the path of the node binary itself, and that's
more guaranteed with this new model.
Fixes#4618.
This fixes a performance problem where we were previously starting up
a full Django process (~0.7s even on a fast machine) every time a new
email came in, potentially allowing users to accidentally DoS a Zulip
server. Now, we just post over HTTPS, allowing the existing thread
pool support to do its job.
- Add script wrapper to communicate postfix pipe with django web server
over HTTP(S). It uses shared_secret authentication mode.
- Add django view to process messages from email mirror server.
- Clean management command `email-mirror`. Left just functional
for cron email processing.
- Add routes for new tornado view.
- Change pipe script in master process postfix config template
based on updated script.
- Add tests.
Tweaked by tabbott to adjust the directory and set better defaults.
Fixes#2421.
Follow-on from #2373/ PR https://github.com/zulip/zulip/pull/4316, to set an
appropriate umask also when upgrading so files have appropriate permissions.
I've tested this by starting from a clean install, deleting /srv/* so new
files are downloaded, and then doing an upgrade. It worked starting with both
a current version from master and an older release installed with a less
restrictive umask and then the umask changed.
Fixes#2373.
* Now queue_workers.py sorts queue names and prints them on their own
line. Previously it's output was nondeterministic.
* Simplified grep strategy for removing the "test" worker.
This list was likely to end up out of date quickly, since it wasn't
documented that you need to update it when adding a queue. The best
solution is to just not require it to be updated.
Now that we no longer use node_modules at all in production (it's only
used to generate static assets), we don't include `node_modules` in
the production tarballs, and thus we shouldn't attempt to copy
`node_modules` out of the production tarballs when installing.
Fixes a regression introduced in
d71f2e7b9b.
This saves about a minute of downtime when using
upgrade-zulip-from-git in the default configuration.
It should also save several seconds of downtime when upgrading to a
production release tarball as well.
This indirectly causes the RabbitMQ node name for new Zulip
installations to default to zulip@localhost, which would eliminate the
persistent problems we have had
Fixes#194, #465, #1375, #1751.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
This adds a dependency on the realpath package on trusty; we could try
to remove it if needed, but given that realpath is included in
coreutils on Xenial (and presumably anything else modern), I think
it's reasonable to add it.
Fixes#1797.
Previously, success_stamp was touched whenever we used a particular
node_modules version; it makes more sense to only touch it when the
node_modules directory has actually changed.
get_package_names did not correctly strip the GitHub URLs from package
names, resulting in the "package names" for our dependencies installed
from Git being tracked with the complete sha1sum included in the name.
This meant that upgrading our virtualenvs incorrectly ended up
resorting to creating an entirely new virtualenv whenever we changed a
dependency that had previously been installed from GitHub URLs.
generate-secrets.py now requires --development for development environment
setup or --production for production environment setup (and one of these
options is mandatory).
This solves the problem that it was somewhat easy to accidentally run
generate-secrets.py without the `-d` option while doing manual development
environment setup.
Fixes: #1911.
This is a first pass at building a framework for collecting various
stats about realms, users, streams, etc. Includes:
* New analytics tables for storing counts data
* Raw SQL queries for pulling data from zerver/models.py tables
* Aggregation functions for aggregating hourly stats into daily stats, and
aggregating user/stream level stats into realm level stats
* A management command for pulling the data
Note that counts.py was added to the linter exclude list due to errors
around %%s.
This adds a new system for copying packages from old virtualenvs that
are sufficiently similar to the new virtualenv required.
In practice, this results in a huge performance improvement for
re-provisioning Zulip development environments when the requirements
files have changed (which is the dominant performance problem with
provision today).
Fixes: #1507.
Between releases 1.3.13 and 1.4.0, local_settings.py was renamed to
prod_settings.py. The upgrade scripts were adjusted to reflect this name
change. But because the first part of the upgrade script is run with the
currently installed version's code, the symlink to /etc/zulip/settings.py is
created with the old name. This was causing upgrade-zulip-stage-2 to fail.
Now upgrade-zulip-stage-2 creates the symlink at zproject/prod_settings.py
if it doesn't already exist.
Fixes#1731.
Because rabbitmq doesn't support changing the nodename of a running
rabbitmq node, Zulip installations suffered a plague of issues where
e.g. a Zulip server would reboot, the hostname would change, and
suddenly the local rabbitmq instance being used by Zulip would stop
working.
We address this problem by using, by default, a fixed rabbitmq
nodename, but providing server administrators the option to set the
rabbitmq nodename used by Zulip however they choose.
To upgrade an existing server to use this new configuration, one will
need to add something like the following to /etc/zulip/zulip.conf:
[rabbitmq]
nodename = zulip@localhost
However, I don't believe we have the puppet code in place to make this
work correctly at initial installation without rabbitmq-server being
already installed (but off), as we can easily setup in Travis CI but I
haven't been willing to do for the installer. So for now, this just
fixes our Travis CI problems.
Fixes: #1579.
This reverts commit 3f95e567c1.
Apparently `apt-add-repository` fails periodically in CI. I suspect
this is some sort of silly networking problem, but given that all
we're saving is a few lines of code, the old version was better if
this fails basically ever.
Previously, the install script would fail if you passed various
non-default puppet rules, since the code to configure and restart
services that runs later on in the install script largely ran
unconditionally, regardless of whether the relevant service was
actually installed on the target system.
This should make the main install script reusable for installing
e.g. a dedicated Postgres server for use with Zulip.
This reverts commit f1f48f305e.
The use of sklearn unfortunately caused a substantial slowdown to the
Zulip provisioning process, which didn't seem worth it for a
relatively minor feature.
In python 3, subprocess uses bytes for input and output if
universal_newlines=False (the default). It uses str for input and
output if universal_newlines=True.
Since we're dealing with strings here, add universal_newlines=True
to subprocess.check_output calls.
This is important for both ensuring the Nagios checks work correctly
in production, as well as making sure the `zulip` user can access the
virtualenv (owned by the `travis` user) in Travis CI.
The manage.py change effectively switches the Zulip production server
to use the virtualenv, since all of our supervisord commands for the
various Python services go through manage.py.
Additionally, this migrates the production scripts and Nagios plugins
to use the virtualenv as well.
Apparently, c74a74dc74 introduced a bug
where we are no longer correctly depending on build-essential as part
of the Zulip development environment installation process.
Fixes#1111.
This is needed because hash_reqs.py is used to create a virtualenv.
Currently we only use virtualenv in development, but we will soon
start using it in production. Scripts used in production should be
put in scripts/.
Camo is a caching image proxy, used in Zulip to avoid mixed-content
warnings by proxying HTTP image content over HTTPS. We've been using
it in zulip.com production for years; this change makes it available
in standalone Zulip deployments.
The main function of prompting inside `manage.py migrate` is to ask
the user if they want to delete stale content-types, which is
unimportant and likely scary, so we disable doing so.
This automatically loads settings, zerver.models.* and
zerver.lib.actions.* when you start `manage.py shell`, which should
save a bit of time basically every time someone uses it.
Fixes#275.
A common issue when doing a Zulip upgrade is trying to pass
upgrade-zulip a tarball path under /root, which doesn't work because
the Zulip user doesn't have permission to read the tarball. We
could fix this by just unpacking the tarballs as root, but it seemed
like a nicer approach would be to archive the release tarballs
somewhere readable by the Zulip user (/home/zulip/archives) and unpack
them from there.
Fixes#208.
The point of the lock is to prevent two deployments happening at the
same time and racing with each other, not to prevent doing any future
deployments after an error happens (which is what the current
implementation does in practice).
Addresses part of #208.
The #! line processing interpreted the argument to pass to `env` as
"python2.7 -u", which obviously isn't a real program.
We fix this by setting the PYTHONUNBUFFERED environment variable
inside the program, which has the same effect.
Thanks to Dan Fedele for the bug report and suggested solution!
With this change, we are now testing the production static asset
pipeline and installation process in a new testing job (and also run
the frontend/backend tests separately).
This means that changes that break the Zulip static asset pipeline or
production installation process are more likely to fail tests. The
testing is imperfect in that it does not have proper isolation -- we
build a complete Zulip development environment and then install a
Zulip production environment on top of it, so e.g. any apt
dependencies installed for Zulip development will still be available
for the Zulip production environment. But, it's better than nothing!
A good v2 of this would be to have the production setup process just
install the minimum stuff needed to run `build-release-tarball` and
then uninstall it / clean it up so that we can do a more clear
production installation, but that's more work.
While the docu on https://www.zulip.org/server.html says:
```
cd /root/zulip
./scripts/setup/install
```
This script downloads the `python-django-guardian_1.3-1~zulip4_all.deb` file to current working dir (`/root/zulip` if you follow the docu), but tries to install it from /root/.
This fails obviously. So i changed the download location to /tmp/.
We don't use apache in the main app -- only for the SSO situation --
this code was just copied from our own install script. And it caused
problems at CUSTOMER13 because they installed Apache in preparation for
the SSO integration, but restarting it failed.
(imported from commit 3f2961574134847c836e8b69736f60d9f8790201)