Revert c8f034e9a "queue: Remove missedmessage_email_senders code."
As the comment in the code says, it ensures a smooth upgrade path
from 1.7.x; we can delete it in master after 1.8.0 is released.
The removal commit was merged early due to a communication failure.
We do the following here:
* Remove libjasper-dev from THUMBOR_VENV_DEPENDENCIES.
Reason: This dependancy wasn't really needed by us for using
thumbor. It was a dependancy for using open-cv as Imaging Engine
in thumbor but we use PIL (Pillow now) as Imaging Engine.
* Add zlib1g-dev, libfreetype6-dev to THUMBOR_VENV_DEPENDENCIES.
Reason: These are dependancies of Pillow which are required for it
Pillow to function. Since we use Pillow in thumbor as Imaging Engine
we need these. Stuff before this didn't break because we also use
Pillow in development Environment and have these dependancies
installed from VENV_DEPENDENCIES as well.
The zulip user has no need to see this file; it's used by nginx.
And when we set up the cert early in install, there's no zulip user
yet anyway, so this fails.
We'll make this the normal behavior soon, once we're satisfied with
our arrangements for sending the admin straight to realm creation and
using the app without configuring email. The instructions in the docs
will also have to change accordingly, of course.
This script iterates over all the mobile.json resources and creates a
single file at static/locale/mobile_info.json which contains total and
not-translated strings information against each language. After doing
this, it deletes all the mobile i18n resources downloaded by
tools/sync-translations because we neither want to check them in our
repository nor we want to make our repository dirty.
This causes us to give an error if you pass the installer any
positional arguments, e.g. with `--`. There's no reason you'd want
to do this, but I accidentally did it by passing an extra `--` to
the `test-install/install` wrapper and spent a few minutes on
confused debugging.
Thanks to the magic of `set -x`, I noticed this:
```
+ cat
++ ssl-cert
/tmp/src/zulip-server/scripts/setup/generate-self-signed-cert: line 49: ssl-cert: command not found
+ apt-get install -y openssl
[...]
```
In other words, we were trying to run `ssl-cert` -- the name of a
Debian package I meant to refer to in a comment inside the templated
temporary config file for `openssl req` -- as if it were a command.
It wasn't, hence the error.
Because `set -e` has loopholes like a sieve, this didn't cause the
script to exit, just produced this funny output and presumably caused
the config file's comment to be missing a word. In principle, it
could do something surprising if for some reason there were a command
named `ssl-cert` on PATH.
Fix it.
This gives us just one way of adopting a self-signed cert, rather than
one script which would generate a new one and an option to another
which would symlink to the system's snakeoil cert. Now those two
codepaths converge, and do the same thing.
The small advantage of generating our own over the alternative is that
it lets us set the name in the cert to EXTERNAL_HOST, rather than the
system's hostname as embedded in the system snakeoil certs. Not a big
deal, but might make things go slightly smoother if some browsers are
lenient (in a way that they probably shouldn't be.)
Take the core of the logic from how Debian generates the system's
/etc/ssl/certs/ssl-cert-snakeoil.pem ; that gives me more confidence
in the various config choices, and it also demonstrates a much cleaner
way to use the `openssl` tool. Also replace the outer shell logic for
CLI and logging with a cleaner version.
Before this fix, the installer has an extremely annoying bug where
when run inside a container with `lxc-attach`, when the installer
finishes, the `lxc-attach` just hangs and doesn't respond even to
C-c or C-z. The only way to get the terminal back is to root around
from some other terminal to find the PID and kill it; then run
something like `stty sane` to fix the messed-up terminal settings
left behind.
After bisecting pieces of the install script to locate which step
was causing the issue, it comes down to the `service camo restart`.
The comment here indicates that we knew about an annoying bug here
years ago, and just swept it under the rug by skipping this step
when in Travis. >_<
The issue can be reproduced by running simply `service camo restart`
under `lxc-attach` instead of the installer; or `service camo start`,
following a `service camo stop`. If `lxc-attach` is used to get an
interactive shell, these commands appear to work fine; but then when
that shell exits, the same hang appears. So, when we start camo
we're evidently leaving some kind of mess that entangles the daemon
with our shell.
Looking at the camo initscript where it starts the daemon, there's
not much code, and one flag jumps out as suspicious:
start-stop-daemon --start --quiet --pidfile $PIDFILE -bm \
--exec $DAEMON --no-close -c nobody --test > /dev/null 2>&1 \
|| return 1
start-stop-daemon --start --quiet --pidfile $PIDFILE -bm \
--no-close -c nobody --exec $DAEMON -- \
$DAEMON_ARGS >> /var/log/camo/camo.log 2>&1 \
|| return 2
What does `--no-close` do?
-C, --no-close
Do not close any file descriptor when forcing the daemon
into the background (since version 1.16.5). Used for
debugging purposes to see the process output, or to
redirect file descriptors to log the process output.
And in fact, looking in /proc/PID/fd while a hang is happening finds
that fd 0 on the camo daemon process, aka stdin, is connected to our
terminal.
So, stop that by denying the initscript our stdin in the first place.
This fixes the problem.
The Debian maintainer turns out to be "Zulip Debian Packaging Team",
at debian@zulip.com; so this package and its bugs are basically ours.
This provides a major simplification for non-production installs,
including our own testing (it's already in both the test-install
harness script and the "production" test suite) as well as potential
admins evaluating Zulip.
Ultimately this should probably be the default behavior, with perhaps
something shown to admins on the web as a reminder and link to help on
installing a better certificate. For now, pending working through
that, just get the behavior in and leave it opt-in.
It's not appropriate for our script to pass the `--agree-tos` flag
without any evidence of the user actually having any knowledge of,
let alone intent to agree to, any such ToS. Stop doing that.
Fortunately this script hasn't been part of any release, so it's
likely that no users have gone down this path.
The third-party `install-yarn.sh` script uses `curl`, and we invoke it
in `install-node`. So we need to install it as a dependency.
We've mostly gotten away with this because it's common for `curl` to
already be installed; but it isn't always.
This commit just copies all the code from MissedMessageSendingWorker
class to a new EmailSendingWorker class. All the logic to send an email
through a queue was already there. This commit only makes the logic
generic. It does so by creating a special purpose queue called
'email_senders' to send any type of email. To make
MissedMessageSendingWorker still work we derive it from
EmailSendingWorker. All the tests that were testing
MissedMessageSendingWorker now run against EmailSendingWorker.
Apparently, this was checking the wrong path in Travis CI, and thus
never actually running (meaning we'd accumulate every `node_modules`
directory ever in the Travis caches, which in turn resulted in very
slow builds).
This updates commit 11ab545f3 "install: Set the locale ..."
to be somewhat cleaner, and to explain more in the commit message.
In some environments, either pip itself fails or some packages fail to
install, and setting the locale to en_US.UTF-8 resolves the issue.
We heard reports of this kind of behavior with at least two different
sets of symptoms, with 1.7.0 or its release candidates:
https://chat.zulip.org/#narrow/stream/general/subject/Trusty.201.2E7.20Upgrade/near/302214https://chat.zulip.org/#narrow/stream/production.20help/subject/1.2E6.20to.201.2E7/near/306250
In all reported cases, commit 11ab545f3 or equivalent fixed the issue.
Setting LC_CTYPE is redundant when also setting LC_ALL, because LC_ALL
overrides all `LC_*` environment variables; so skip that. Also move
the line in `install` to a more appropriate spot, and adjust the
comments.
This fixes a bug where, when a user is unsubscribed from a stream,
they might have unread messages on that stream leak. While it might
seem to be a minor problem, it can cause significant problems for
computing the `unread_msgs` data structures, since it means we need to
add an extra filter for whether the user is still subscribed, either
in the backend or in the UI.
Fixes#7095.
This commit renames various source requirements files like `dev.txt`,
`mypy.txt` etc to `dev.in`, `mypy.in` etc and various locked requirements
files like `dev_lock.txt`, `mypy_lock.txt` etc to `dev.txt`, `mypy.txt`
etc. This will help in emphasizing to the user that *.in are actually
input to `update-locked-requirements` tool which should be run after
updating any of these.
In this commit we add new dependencies needed for running thumbor.
Also we add the script for creating the virtual environment ready
for thumbor.
Note: Thumbor will use python2 and thus have different virtualenv
dedicated to it.
Credits to @TigorC and @joshland as well for there work on this.
The script already won't work without them; so if the user gets the
invocation wrong, give a halfway-reasonable error rather than just
crash into the ground.
This allows the installer to continue using this script for the
`standalone` method, while the no-argument form now uses the same
`webroot` method as the renewal cron job, suitable for running
by hand to adopt Certbot after initial install.
Certbot replaces the cert files under /etc/letsencrypt/live/,
which our nginx config refers to symlinks to; but it doesn't
tell nginx there's been an update, so nginx keeps serving the
old cert.
This is fine as long as nginx is restarted, or just told to
reload its config, at some point before the cert actually
expires about 30 days later. Which is probably the common
case, but of course we should make it just work. So, if we
actually renew a cert, tell nginx to reload its config now.
This causes the cron job to run only when a Zulip-managed certbot
install is actually set up.
Inside `install`, zulip.conf doesn't yet exist when we run
setup-certbot, so we write the setting later. But we also give
setup-certbot the ability to write the setting itself, so that we
can recommend it in instructions for adopting certbot in an
existing Zulip installation.
This helps make this script suitable to run on existing installations,
by mitigating any worry about clobbering existing certs with links to
the new ones, in case the admin changes their mind or was using the
certs for something else too.
Except in:
- docs/writing-bots-guide.md, because bots are supposed to be Python 2
compatible
- puppet/zulip_ops/files/zulip-ec2-configure-interfaces, because this
script is still on python2.7
- tools/lint
- tools/linter_lib
- tools/lister.py
For the latter two, because they might be yanked away to a separate repo
for general use with other FLOSS projects.
This didn't work at all when one did a `vagrant destroy` and then
`vagrant up`, because the cache state would be preserved even though
the machine is gone.
Fixes#5981.
This should make it easier to script the installation process, and
also conveniently are the options one would want for the --certbot
option.
Significantly modified by tabbott to have a sane right interface,
include --help, and avoid printing all the `set -x` garbage before the
usage notices.
Based on #450, with commits
restructured by Rein Zustand.
Tweaks by Rein Zustand:
- Replace configure-cert with generate-self-signed-certs
- `mv scripts/lib/create-zulip-admin.sh scripts/lib/create-zulip-admin`
Deployments whose name is not in the format of a timestamp are
always included in the `recent_deployments` and are not deleted,
hence we don't need to check for them explicitly.
We were checking for whether an item in the deployments directory
represents a directory but were using its relative path which was
causing a false value to be returned for all items irrespective of
their being a directory or not if the script was invoked from some
where other than the deployments directory.
This commit re-arranges the arguments of `purge_unused_caches()`
function in order to remain consistent with other similar functions
in the library like `may_be_perform_caching()`.
This function will replace the repetitive definition of `parse_args()`
in various cache cleaning scripts. Also adds a `--verbose` argument
to the parser.
Historically, one has needed to build a release tarball in order to
use/test the Zulip installer, but you could upgrade a Zulip server
from Git. However, the only reason for that requirement was that we
didn't run `tools/update-prod-static` as part of the install script if
it's required. A good test for that case is whether we're in a Git
repository, but a better one is to check whether the prod-static
content exists in the tarball paths.
Fixes#3704.
This enforces our use of a consistent style in how we access Python
modules; "from os.path import dirname" is a particularly popular
abbreviation inconsistent with our style, and so it deserves a lint
rule.
Commit message and error text tweaked by tabbott.
Fixes#6543.
The recent rewrite of purge-old-deployments accidentally attempted to
purge the symlinks, sockets, lock, and other files in the deployment
directory.
The new version has been tested out in production successfully.
Expands `purge-old-deployments` such that now it accepts the threshold
days as argument. Also `clean-unused-caches` script is automatically
run after purging the old deployments so that the orphaned caches
gets automatically cleaned.
Fixes: #5726.
Based on the `dry_run` flag, this function either purges the list
of directories passed to them or prints a listing of the directories
it would have purged/kept_back, had the `dry_run` flag been false.
Apparently, the refactoring to make this script only run when changes
are present was buggy, in that if `apt-get update` failed, running
provision against wouldn't rerun `apt-get update`, resulting in a
broken state that requires expertise to fix. This closes that gap, by
using a stamp file to ensure we always successfully update apt before
proceeding.
It doesn't fix existing installations.
Modify `generate_sha1sum_node_modules()` such that it can calculate
the hash for a particular installation.
Tweaked by tabbott to use os.path.realpath in the setup_dir
calculation, to ensure it's consistent.
In dev always include the currently active cache in order not to break
current installation in case dependencies are updated with bumping the
provision version.
This should make it much more likely that users see this before
waiting a long time for other things to happen, since the `apt-get
dist-upgrade` step is really slow. We can't move further to the top,
since this requires `lsb_release` to be installed.
Given the path of directory containing all the caches, a list of
caches in use and threshold days, this function returns a list
of caches which can be removed safely.
This function returns a list of all the deployments directories
which are newer than some threshold number of days including the
`/root/zulip` directory if it exists.
This saves us from spending 200-250ms of CPU time importing Django
again just to log that we're running a management command. On
`scripts/restart-server`, this saves us from one thundering herd of
Django startups when all the queue workers are restarted; but there's
still the Django startup for the `manage.py` process itself for each
worker, so on a machine with e.g. 2 (virtual) cores the restart is
still painful.
This causes `upgrade-zulip-from-git`, as well as a no-option run of
`tools/build-release-tarball`, to produce a Zulip install running
Python 3, rather than Python 2. In particular this means that the
virtualenv we create, in which all application code runs, is Python 3.
One shebang line, on `zulip-ec2-configure-interfaces`, explicitly
keeps Python 2, and at least one external ops script, `wal-e`, also
still runs on Python 2. See discussion on the respective previous
commits that made those explicit. There may also be some other
third-party scripts we use, outside of this source tree and running
outside our virtualenv, that still run on Python 2.
We now call the create_large_migrations management command as part of
upgrade-zulip-stage-2 if needed, so that we can create large indexes
while the app is still up.
We can't fully support it until we fix the tsearch_extras availability
issue, but for now, this is an improvement.
Tweaked by tabbott to cover the outstanding tsearch_extras issue.
We may not necessarily be running out of /root/zulip or any particular path,
but the point this comment was really trying to make in the first place stands.
Make it more clearly and still-accurately.
Also make our dependency on `six` (for e.g. `replace-tarball-shebang`)
explicit -- we've been getting it via `python-pip`, but `python3-pip`
(on trusty) doesn't have that dependency for some reason.
Since we can use both perfer_offline=True and False in a since build
prefer_offline shouldn't be used as a cache key or it will confuse the
cleanup script. Since yarn install (if successful) should be idempotent.
This will probably be ok.
If we do wind up with a symlink lying around at `local_settings.py`,
it won't do us any harm and shouldn't be materially more confusing
than the regular file we've long had there for almost all installs.
It'll also only last as long as the current deploy. So just
let it be, and simplify the code a bit.
Also add a line to help the reader understand the remaining half of
this logic (which is essential so long as people might have pre-1.4.0
deploys lying around that they eventually get around to trying to
upgrade). The fact that it's addressed to a situation which exists
only in the past of this tree, not in its present, makes a brief
comment potentially very helpful.
This will simplify step 1 of prod-install instruction to reduce
suffering in testing/experimenting production environments.
Attribution: the scripts/setup/configure-certs is based on @galexrt's
5c0daf6211
Further tweaked by tabbott to rename the script and edit the messages.
This replaces nvm in npm-wrapper by harcoding the path the way we do
with node. The main benefit is that this saves a few hundred
milliseconds every time we invoke npm.
For performance reasons, we spawn each linter in a separate OS thread.
The downside of this is that all lints would end up in stdout without
much visual separation, resulting in confusing error log. This commit
introduce the `print_err` function, which shows which linter each line
of lint is from.
We document the `deployment.git_repo_url` setting in `/etc/zulip/zulip.conf`
to control where this script fetches from, and don't say that it's
only read on the first such upgrade and cached thereafter. The documented
behavior seems like the right behavior. So use the currently configured
URL every time, by writing it anew into the config of our cache repo.
Basically we just seperate out the sha1sum generation for the
node modules so that it can be reused later for cache clearance
logic. This is achieved by adding a function which returns the
sha1sum based HEX digest.
When we added support for automatically adding new secrets in
generate_secrets.py, we failed to account for the possibility that a
human editor might have let the secrets file without a trailing
newline.
We address this by adding a leading newline before our new secret.
Fixes#5209.
The Zulip email mirror script called by postfix had performance/load
issues, because it spent so much time on startup/import due to use of
the Zulip virtualenv.
The script was rewritten using pure python (no Django) to improve
performance.
The install script was failing on 2nd+ attempts if the first attempt
was interrupted.
This failure happened because zulip-venv already existed at
`current_venv_path`. Changing the `ln` command's flags from `-s` to
`-nsf` should make this part of the script idempotent.
Now, generate_secrets.py will never overwrite existing secrets. In
addition to being a safer model in generate, this fixes 2 significant
issues:
(1) It makes it much easier to preserve secrets like Oauth tokens in a
development environment (previously, provision would destroy them).
(2) It makes it possible to automatically add new secrets as part of
the upgrade process. In particular, this is useful for the
zulip_org_id settings.
Fixes#4797.
This fixes a significant performance issue with LaTeX rendering (and
other things that invoked node) where starting up node took a few
hundred milliseconds due to nvm initialization.
Tweaked by tabbott to avoid copying the node binary itself, instead
using a tiny wrapper script.
This is important primarily because it's possible a future version of
node will expect to find libraries/dependencies/etc. installed via NVM
at some path related to the path of the node binary itself, and that's
more guaranteed with this new model.
Fixes#4618.
Also puts them into a processing queue, though the queue processor
does nothing.
Rewritten by tabbott to avoid unnecessary database queries in
do_send_messages.
This fixes a performance problem where we were previously starting up
a full Django process (~0.7s even on a fast machine) every time a new
email came in, potentially allowing users to accidentally DoS a Zulip
server. Now, we just post over HTTPS, allowing the existing thread
pool support to do its job.
- Add script wrapper to communicate postfix pipe with django web server
over HTTP(S). It uses shared_secret authentication mode.
- Add django view to process messages from email mirror server.
- Clean management command `email-mirror`. Left just functional
for cron email processing.
- Add routes for new tornado view.
- Change pipe script in master process postfix config template
based on updated script.
- Add tests.
Tweaked by tabbott to adjust the directory and set better defaults.
Fixes#2421.
Follow-on from #2373/ PR https://github.com/zulip/zulip/pull/4316, to set an
appropriate umask also when upgrading so files have appropriate permissions.
I've tested this by starting from a clean install, deleting /srv/* so new
files are downloaded, and then doing an upgrade. It worked starting with both
a current version from master and an older release installed with a less
restrictive umask and then the umask changed.
Fixes#2373.
- Add new 'missedmessage_email_senders' queue for sending missed messages emails.
- Add the new worker to process 'missedmessage_email_senders' queue.
- Split aggregation missed messages and sending missed messages email
to separate queue workers.
- Adapt tests for sending missed emails to the new logic.
Fixes#2607
* Now queue_workers.py sorts queue names and prints them on their own
line. Previously it's output was nondeterministic.
* Simplified grep strategy for removing the "test" worker.
This list was likely to end up out of date quickly, since it wasn't
documented that you need to update it when adding a queue. The best
solution is to just not require it to be updated.
Now that we no longer use node_modules at all in production (it's only
used to generate static assets), we don't include `node_modules` in
the production tarballs, and thus we shouldn't attempt to copy
`node_modules` out of the production tarballs when installing.
Fixes a regression introduced in
d71f2e7b9b.
This saves about a minute of downtime when using
upgrade-zulip-from-git in the default configuration.
It should also save several seconds of downtime when upgrading to a
production release tarball as well.
This indirectly causes the RabbitMQ node name for new Zulip
installations to default to zulip@localhost, which would eliminate the
persistent problems we have had
Fixes#194, #465, #1375, #1751.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
This adds a dependency on the realpath package on trusty; we could try
to remove it if needed, but given that realpath is included in
coreutils on Xenial (and presumably anything else modern), I think
it's reasonable to add it.
Fixes#1797.
Previously, success_stamp was touched whenever we used a particular
node_modules version; it makes more sense to only touch it when the
node_modules directory has actually changed.
get_package_names did not correctly strip the GitHub URLs from package
names, resulting in the "package names" for our dependencies installed
from Git being tracked with the complete sha1sum included in the name.
This meant that upgrading our virtualenvs incorrectly ended up
resorting to creating an entirely new virtualenv whenever we changed a
dependency that had previously been installed from GitHub URLs.