We clean up the code related to launching
processes here.
We extract:
server_processes
We also extract these helper for webpack
stuff:
do_one_time_webpack_compile
start_webpack_watcher
And then we move the code to actually launch
them lexically within the file (so as not to
be obscured by various function definitions).
Here is the new output for displaying ports:
Zulip services will listen on ports:
9991: web proxy
9992: Django
9993: Tornado
9994: webpack
9995: Thumbor
Note to Vagrant users: Only the proxy port (9991) is exposed.
I tone down the yellow for the Vagrant warning, and I show
the web proxy in cyan to emphasize it.
I also extracted the code into a function, and I don't call
that function until after `app.listen()`. (The users probably
won't notice much difference in the timing of this message, but
the message won't show if the `listen` step fails for some
reason, which I think is what we want here.)
We remove the import-tools code that was plunked
right into the middle of our command line
arguments.
Then we add a local var called `DESCRIPTION` to
fix some ugly code formatting, and we stop with the
unnecessary `r` prefix to the multi-line string.
We recently changed our droplet setup such that their
host names no longer include zulipdev.org. This caused
a few things to break.
The particular symptom that this commit fixes is that
we were trying to server static assets from
showell:9991 instead of showell.zulipdev.org:9991,
which meant that you couldn't use the app locally.
(The server would start, but the site's pretty unusable
without static assets.)
Now we rely 100% on `dev_settings.py` to set
`EXTERNAL_HOST` for any droplet users who don't set
that var in their own environment. That allows us to
remove some essentially duplicate code in `run-dev.py`.
We also set `IS_DEV_DROPLET` explicitly, so that other
code doesn't have to make inferences or duplicate
logic to detemine whether we're a droplet or not.
And then in `settings.py` we use `IS_DEV_DROPLET` to
know that we can use a prod-like method of calculating
`STATIC_URL`, instead of hard coding `localhost`.
We may want to iterate on this further--this was
sort of a quick fix to get droplets functional again.
It's possible we can re-configure droplets to have
folks get reasonable `EXTERNAL_HOST` settings in their
bash profiles, or something like that, although that
may have its own tradeoffs.
This reverts commit 36a8e61e67 (#13934).
The Django 2.2 autoreloader works by forking into a child process that
exits with status 3 when a file changes, and a parent process that
restarts the child when it exits with status 3. Setting this
environment variable had the effect of pretending we were already the
child process, without a parent process to restart it. Therefore,
changing any code used by the queue processor caused it to exit rather
than restart.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
In Django 2.2 the autoreload system has changed.
DJANGO_AUTORELOAD_ENV env variable should be set when calling code
that'll use the autoreloader. Otherwise there's some kind of race
condition in the autoreload code when SIGINT is sent, where
restart_with_reloader() (called only if the env variable isn't set)
has the subprocess module calling p.kill() on a process that's already
exited, raising ProcessLookupError and printing an ugly traceback. This
causes non-deterministic test-run-dev failures.
Zulip has had a small use of WebSockets (specifically, for the code
path of sending messages, via the webapp only) since ~2013. We
originally added this use of WebSockets in the hope that the latency
benefits of doing so would allow us to avoid implementing a markdown
local echo; they were not. Further, HTTP/2 may have eliminated the
latency difference we hoped to exploit by using WebSockets in any
case.
While we’d originally imagined using WebSockets for other endpoints,
there was never a good justification for moving more components to the
WebSockets system.
This WebSockets code path had a lot of downsides/complexity,
including:
* The messy hack involving constructing an emulated request object to
hook into doing Django requests.
* The `message_senders` queue processor system, which increases RAM
needs and must be provisioned independently from the rest of the
server).
* A duplicate check_send_receive_time Nagios test specific to
WebSockets.
* The requirement for users to have their firewalls/NATs allow
WebSocket connections, and a setting to disable them for networks
where WebSockets don’t work.
* Dependencies on the SockJS family of libraries, which has at times
been poorly maintained, and periodically throws random JavaScript
exceptions in our production environments without a deep enough
traceback to effectively investigate.
* A total of about 1600 lines of our code related to the feature.
* Increased load on the Tornado system, especially around a Zulip
server restart, and especially for large installations like
zulipchat.com, resulting in extra delay before messages can be sent
again.
As detailed in
https://github.com/zulip/zulip/pull/12862#issuecomment-536152397, it
appears that removing WebSockets moderately increases the time it
takes for the `send_message` API query to return from the server, but
does not significantly change the time between when a message is sent
and when it is received by clients. We don’t understand the reason
for that change (suggesting the possibility of a measurement error),
and even if it is a real change, we consider that potential small
latency regression to be acceptable.
If we later want WebSockets, we’ll likely want to just use Django
Channels.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
Previously, while Django code that relied on EXTERNAL_HOST and other
settings would know the Zulip server is actually on port 9991, the
upcoming Django SAML code in python-social-auth would end up detecting
a port of 9992 (the one the Django server is actually listening on).
We fix this using X-Forwarded-Port.
Apparently Tornado decompresses gzip responses by default. Worse, it
fails to adjust the Content-Length header when it does.
https://github.com/tornadoweb/tornado/issues/2743
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
A HEAD response has a Content-Length but no body; it’s not correct in
that case to let Tornado default Content-Length to 0.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
As of #367, `tools/run-dev-queue-processors` has evolved into nothing
more than an unnecessarily elaborate wrapper around `manage.py
process_queue --all`. Remove it (mostly to make it marginally easier
to Tab-complete `tools/run-dev.py`, if I’m being honest).
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
This seems to be a common enough pitfall to justify
a bit of extra handling. Example output:
$ ./tools/run-dev.py
Clearing memcached ...
Flushing memcached...
OK
Starting Zulip services on ports: web proxy: ...
Note: only port 9991 is exposed to the host in a Vagrant environment.
ERROR: You probably have another server running!!!
Traceback (most recent call last):
File "./tools/run-dev.py", line 421, in <module>
app.listen(proxy_port, address=options.interface)
File "/srv/zulip-py3-venv/lib/python3.5/...
server.listen(port, address)
File "/srv/zulip-py3-venv/lib/python3.5/...
sockets = bind_sockets(port, address=address)
File "/srv/zulip-py3-venv/lib/python3.5/...
sock.bind(sockaddr)
OSError: [Errno 98] Address already in use
Terminated
Before this change, the way we loaded
webpack for various tools was brittle.
First, I addressed test-api and test-help-documentation.
These tools used to be unable to run standalone on a
clean provision, because they were (indirectly)
calling tools/webpack without the `--test` option.
The problem was a bit obscure, since running things
like `./tools/test-backend` or `./tools/test-all` in
your workflow would create `./var/webpack-stats-test.json`
for the broken tools (and then they would work).
The tools themselves weren't broken; they were the
only relying on the common `test_server_running` helper.
And even that helper wasn't broken; it was just that
`run-dev.py` wasn't respecting the `--test` option.
So I made it so that `./tools/run-dev` passes in `--test` to
`./tools/webpack`.
To confuse matters even more, for some reason Casper
uses `./webpack-stats-production.json` via various
hacks for its webpack configuration, so when I fixed
the other tests, it broke Casper.
Here is the Casper-related hack in zproject/test_settings.py,
which was in place before my change and remains
after it:
if CASPER_TESTS:
WEBPACK_FILE = 'webpack-stats-production.json'
else:
WEBPACK_FILE = os.path.join('var', 'webpack-stats-test.json')
I added similar logic in tools/webpack:
if "CASPER_TESTS" in os.environ:
build_for_prod_or_casper(args.quiet)
I also made the helper functions in `./tools/webpack` have
nicer names.
So, now tools should all be able to run standalone and not
rely on previous tools creating webpack stats files for
them and leaving them in the file system. That's good.
Things are still a bit janky, though. It's not completely
clear to me why `test-js-with-casper` should work off of
a different webpack configuration than the other tests.
For now most of the jankiness is around Casper, and we have
hacks in two different places, `zproject/test_settings.py` and
`tools/webpack` to force it to use the production stats
file instead of the "test" one, even though Casper uses
test-like settings for other things like which database
you're using.
This changes run-dev.py to ensure that we have in fact compiled
handlebars templates before running webpack, which is the right model.
Future work will likely include running the handlebars compiler from
webpack, and thus eliminating this extra process.
This commit adds a --quiet argument to tools/webpack which removes
the verbose output from webpack and replaces it with showing only
errors. It also makes tools/run-dev --tests use this argument while
running webpack for testing.
Tweaked by tabbott to clean up the code a bit.
Webpack dev server by default does host checking for requests. so
in dev enviorment if the the request came for zulipdev.com it would not
send js files which caused dev envoirment to not work.
For now, this does nothing in a production environment, but it should
simplify the process of doing testing on the Thumbor implementation,
by integrating a lot of dependency management logic.
This reverts commit 66261f1cc. See parent commit for reason; here,
provision worked but `tools/run-dev.py` would give errors.
We need to figure out a test that reproduces these issues, then make a
version of these changes that keeps that test working, before we
re-merge them.
This method was new in Tornado 4.0. It saves us from having to get
the time ourselves and do the arithmetic -- which not only makes the
code a bit shorter, but also easier to get right. Tornado docs (see
http://www.tornadoweb.org/en/stable/ioloop.html) say we should have
been getting the time from `ioloop.time()` rather than hardcoding
`time.time()`, because the loop could e.g. be running on the
`time.monotonic()` clock.
Tweaked by tabbott to not remove it from lister.py, linter_lib, and
friends, since those are intended to support both Python 2 and 3
(we're planning to extract them from the repository).
This causes `upgrade-zulip-from-git`, as well as a no-option run of
`tools/build-release-tarball`, to produce a Zulip install running
Python 3, rather than Python 2. In particular this means that the
virtualenv we create, in which all application code runs, is Python 3.
One shebang line, on `zulip-ec2-configure-interfaces`, explicitly
keeps Python 2, and at least one external ops script, `wal-e`, also
still runs on Python 2. See discussion on the respective previous
commits that made those explicit. There may also be some other
third-party scripts we use, outside of this source tree and running
outside our virtualenv, that still run on Python 2.
This allow the webbpack dev server to properly reload JavaScript modules
while running in dev without restarting the server. We need to connect
to webpack-dev-server directly because SockJS doesn't support more than
one connection on the same host/port.
This hack saved a lot of time debugging weird issues back in 2016, but
it's no longer needed.
Anyone rebasing a branch from 2016 will need to provision afterwards
regardless, which will fix this issue automatically, and more
importantly, these changes were made obsolete when we moved to the
cached `node_modules` model.
This fixes an issue where if you saved a Python file (even just
changing whitespace) while casper tests were running, the Tornado
server being used would restart, triggering a confusing error like
this:
ReferenceError: Can't find variable: $
Traceback:
undefined:2
:4
Suite explicitly interrupted without any message given.