In Django 2.2 the autoreload system has changed.
DJANGO_AUTORELOAD_ENV env variable should be set when calling code
that'll use the autoreloader. Otherwise there's some kind of race
condition in the autoreload code when SIGINT is sent, where
restart_with_reloader() (called only if the env variable isn't set)
has the subprocess module calling p.kill() on a process that's already
exited, raising ProcessLookupError and printing an ugly traceback. This
causes non-deterministic test-run-dev failures.
Zulip has had a small use of WebSockets (specifically, for the code
path of sending messages, via the webapp only) since ~2013. We
originally added this use of WebSockets in the hope that the latency
benefits of doing so would allow us to avoid implementing a markdown
local echo; they were not. Further, HTTP/2 may have eliminated the
latency difference we hoped to exploit by using WebSockets in any
case.
While we’d originally imagined using WebSockets for other endpoints,
there was never a good justification for moving more components to the
WebSockets system.
This WebSockets code path had a lot of downsides/complexity,
including:
* The messy hack involving constructing an emulated request object to
hook into doing Django requests.
* The `message_senders` queue processor system, which increases RAM
needs and must be provisioned independently from the rest of the
server).
* A duplicate check_send_receive_time Nagios test specific to
WebSockets.
* The requirement for users to have their firewalls/NATs allow
WebSocket connections, and a setting to disable them for networks
where WebSockets don’t work.
* Dependencies on the SockJS family of libraries, which has at times
been poorly maintained, and periodically throws random JavaScript
exceptions in our production environments without a deep enough
traceback to effectively investigate.
* A total of about 1600 lines of our code related to the feature.
* Increased load on the Tornado system, especially around a Zulip
server restart, and especially for large installations like
zulipchat.com, resulting in extra delay before messages can be sent
again.
As detailed in
https://github.com/zulip/zulip/pull/12862#issuecomment-536152397, it
appears that removing WebSockets moderately increases the time it
takes for the `send_message` API query to return from the server, but
does not significantly change the time between when a message is sent
and when it is received by clients. We don’t understand the reason
for that change (suggesting the possibility of a measurement error),
and even if it is a real change, we consider that potential small
latency regression to be acceptable.
If we later want WebSockets, we’ll likely want to just use Django
Channels.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
Previously, while Django code that relied on EXTERNAL_HOST and other
settings would know the Zulip server is actually on port 9991, the
upcoming Django SAML code in python-social-auth would end up detecting
a port of 9992 (the one the Django server is actually listening on).
We fix this using X-Forwarded-Port.
Apparently Tornado decompresses gzip responses by default. Worse, it
fails to adjust the Content-Length header when it does.
https://github.com/tornadoweb/tornado/issues/2743
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
A HEAD response has a Content-Length but no body; it’s not correct in
that case to let Tornado default Content-Length to 0.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
As of #367, `tools/run-dev-queue-processors` has evolved into nothing
more than an unnecessarily elaborate wrapper around `manage.py
process_queue --all`. Remove it (mostly to make it marginally easier
to Tab-complete `tools/run-dev.py`, if I’m being honest).
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
This seems to be a common enough pitfall to justify
a bit of extra handling. Example output:
$ ./tools/run-dev.py
Clearing memcached ...
Flushing memcached...
OK
Starting Zulip services on ports: web proxy: ...
Note: only port 9991 is exposed to the host in a Vagrant environment.
ERROR: You probably have another server running!!!
Traceback (most recent call last):
File "./tools/run-dev.py", line 421, in <module>
app.listen(proxy_port, address=options.interface)
File "/srv/zulip-py3-venv/lib/python3.5/...
server.listen(port, address)
File "/srv/zulip-py3-venv/lib/python3.5/...
sockets = bind_sockets(port, address=address)
File "/srv/zulip-py3-venv/lib/python3.5/...
sock.bind(sockaddr)
OSError: [Errno 98] Address already in use
Terminated
Before this change, the way we loaded
webpack for various tools was brittle.
First, I addressed test-api and test-help-documentation.
These tools used to be unable to run standalone on a
clean provision, because they were (indirectly)
calling tools/webpack without the `--test` option.
The problem was a bit obscure, since running things
like `./tools/test-backend` or `./tools/test-all` in
your workflow would create `./var/webpack-stats-test.json`
for the broken tools (and then they would work).
The tools themselves weren't broken; they were the
only relying on the common `test_server_running` helper.
And even that helper wasn't broken; it was just that
`run-dev.py` wasn't respecting the `--test` option.
So I made it so that `./tools/run-dev` passes in `--test` to
`./tools/webpack`.
To confuse matters even more, for some reason Casper
uses `./webpack-stats-production.json` via various
hacks for its webpack configuration, so when I fixed
the other tests, it broke Casper.
Here is the Casper-related hack in zproject/test_settings.py,
which was in place before my change and remains
after it:
if CASPER_TESTS:
WEBPACK_FILE = 'webpack-stats-production.json'
else:
WEBPACK_FILE = os.path.join('var', 'webpack-stats-test.json')
I added similar logic in tools/webpack:
if "CASPER_TESTS" in os.environ:
build_for_prod_or_casper(args.quiet)
I also made the helper functions in `./tools/webpack` have
nicer names.
So, now tools should all be able to run standalone and not
rely on previous tools creating webpack stats files for
them and leaving them in the file system. That's good.
Things are still a bit janky, though. It's not completely
clear to me why `test-js-with-casper` should work off of
a different webpack configuration than the other tests.
For now most of the jankiness is around Casper, and we have
hacks in two different places, `zproject/test_settings.py` and
`tools/webpack` to force it to use the production stats
file instead of the "test" one, even though Casper uses
test-like settings for other things like which database
you're using.
This changes run-dev.py to ensure that we have in fact compiled
handlebars templates before running webpack, which is the right model.
Future work will likely include running the handlebars compiler from
webpack, and thus eliminating this extra process.
This commit adds a --quiet argument to tools/webpack which removes
the verbose output from webpack and replaces it with showing only
errors. It also makes tools/run-dev --tests use this argument while
running webpack for testing.
Tweaked by tabbott to clean up the code a bit.
Webpack dev server by default does host checking for requests. so
in dev enviorment if the the request came for zulipdev.com it would not
send js files which caused dev envoirment to not work.
For now, this does nothing in a production environment, but it should
simplify the process of doing testing on the Thumbor implementation,
by integrating a lot of dependency management logic.
This reverts commit 66261f1cc. See parent commit for reason; here,
provision worked but `tools/run-dev.py` would give errors.
We need to figure out a test that reproduces these issues, then make a
version of these changes that keeps that test working, before we
re-merge them.
This method was new in Tornado 4.0. It saves us from having to get
the time ourselves and do the arithmetic -- which not only makes the
code a bit shorter, but also easier to get right. Tornado docs (see
http://www.tornadoweb.org/en/stable/ioloop.html) say we should have
been getting the time from `ioloop.time()` rather than hardcoding
`time.time()`, because the loop could e.g. be running on the
`time.monotonic()` clock.
Tweaked by tabbott to not remove it from lister.py, linter_lib, and
friends, since those are intended to support both Python 2 and 3
(we're planning to extract them from the repository).
This causes `upgrade-zulip-from-git`, as well as a no-option run of
`tools/build-release-tarball`, to produce a Zulip install running
Python 3, rather than Python 2. In particular this means that the
virtualenv we create, in which all application code runs, is Python 3.
One shebang line, on `zulip-ec2-configure-interfaces`, explicitly
keeps Python 2, and at least one external ops script, `wal-e`, also
still runs on Python 2. See discussion on the respective previous
commits that made those explicit. There may also be some other
third-party scripts we use, outside of this source tree and running
outside our virtualenv, that still run on Python 2.
This allow the webbpack dev server to properly reload JavaScript modules
while running in dev without restarting the server. We need to connect
to webpack-dev-server directly because SockJS doesn't support more than
one connection on the same host/port.
This hack saved a lot of time debugging weird issues back in 2016, but
it's no longer needed.
Anyone rebasing a branch from 2016 will need to provision afterwards
regardless, which will fix this issue automatically, and more
importantly, these changes were made obsolete when we moved to the
cached `node_modules` model.
This fixes an issue where if you saved a Python file (even just
changing whitespace) while casper tests were running, the Tornado
server being used would restart, triggering a confusing error like
this:
ReferenceError: Can't find variable: $
Traceback:
undefined:2
:4
Suite explicitly interrupted without any message given.
This helps make the Zulip development environment somewhat more robust
to new contributors, since it will give them a nice warning if they
try running any of our development tools outside the Zulip virtualenv.
Fixes#3468.
- Add pid file of development processes group, which allows to
manage development processes group with os utils. Also it allows to
kill subprocesses when parent process was closed incorrectly.
- Add tool 'stop_dev_server' to stop development server by pid file.
Fixes#1547
Previously, we got the following as a part of the output when running
`tools/run-dev.py` without provisioning:
It looks like you checked out a branch that has added
dependencies beyond what you last provisioned. Your tests
are likely to fail until you add dependencies by provisioning.
which is a bit confusing.
- Add rewriting 'X-REAL-IP' header for each request through dev
proxy server.
- Change webpack configuration ip to 0.0.0.0 to fix socket.io build
for launching remote dev server.
When we migrated run-dev.py from Twisted to Tornado a few weeks ago,
the --interface argument wasn't properly ported and thus was ignored.
This restores the original functionality of defaulting to only
listening on localhost.
Ideally, we'd replace the vagrant/zulipdev user check with something
that just checks whether a special file that is created by the
Vagrant/remote-dev-vm creation process exists; that would be more
robust.
(Why is -u needed at all? I’m not sure, but test-run-dev spins forever
“Polling run-dev...” without it.)
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
- Use tornado as proxy server for development environment,
replacing twisted (this doesn't support websockets).
- Upgrade tornado version to 4.4.1 (needs to be coupled to the
above since neither change works without the other)
- All necessary strings was converted to bytestring
- Added twisted as py3 dependency
- Change type annotation for method getchild of class Resource
- Remove activating python2 env section from run-dev.py script
Fixes#1256
The new messages make it more obvious which services are started
from run-dev.py, and explicitly call out where to access the web
proxy to reach the Zulip web UI. This is a common confusion for
new administrators/developers. Messages are output before the
processes are launched, as run-dev.py does not currently have a
way to know if they started successfully.
Example output:
Starting Zulip services on ports: web proxy: 9991, Django: 9992, Tornado: 9993, webpack: 9994
Note: only port 9991 is exposed to the host in a Vagrant environment.
Alternate behavior for automated testing:
If run-dev.py is invoked with --test, don't include the webpack
port as it isn't used.
Tested on Ubuntu 14.04, by running run-dev.py at a shell prompt and
via the test-all script.
Fixes#1861
Clear memcached when tools/run-dev.py is run. This prevents
errors on using a different python version because values are
pickled before being stored in memcached and different python
versions implement pickling differently.
Also provide a command-line option --no-clear-mc to prevent
memcached from being cleared.
runtornado unbuffers its output using
sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0).
This is not python 3 compatible since we can't specify
buffering on a text stream in python 3. So use the '-u'
option of python when calling runtornado.py to make output
unbuffered.
This works around a nasty problem with Webpack that you can't run two
copies of the Webpack development server on the same project at the
same time (even if on different ports). The second copy doesn't fail,
it just hangs waiting for some lock, which is confusing; but even if
that were to be solved, we don't actually need the webpack development
server running to run the Casper tests; we just need bundle.js built.
So the easy solution is to just run webpack manually and be sure to
include bundle.js in the JS_SPECS entry.
As a follow-up to this change, we should clean up how test_settings.py
is implemented to not require duplicating code from settings.py.
Fixes#878.
manage_args is set to a list of arguments a few lines later in the
function, making this initialization as the empty string useless and
confusing.
Discovered using mypy.
Django's `manage.py runserver` prints a relatively low-information log
line for every request of the form:
[14/Dec/2015 00:43:06]"GET /static/js/message_list.js HTTP/1.0" 200 21969
This is pretty spammy, especially given that we already have our own
middleware printing a more detailed version of the same log lines:
2015-12-14 00:43:06,935 INFO 127.0.0.1 GET 200 0ms /static/js/message_list.js (unauth via ?)
Since runserver doesn't have support controlling whether these log
lines are printed, we wrap it with a small bit of code that silences
the log lines for 200/304 requests (aka the uninteresting ones).
The old language was confusing because "the interface" could refer to something
like eth0, but in actuality refers to the IP/hostname to listen on.
(imported from commit 4f77d72a4dfcdbe7e7747c6228975aa68dfbe6ac)
This makes it simpler to test between two VMs by allowing you to bind to
non localhost interfaces.
(imported from commit f70755533b52ff8c49fd916941d2210fb8c33b47)
Importing zerver.worker.queue_processors (which is needed to get the list of
workers to start) is slow because it, in turn, imports a bunch of stuff. So we
move the process of starting up queue processing workers into another script
that gets started in parallel with everything else.
(imported from commit 839bada6dc7b93825c69b0d8fd9fbe2de75eabee)
Have run-dev.py watch for template changes by calling
`./tools/compile-handlebars-templates forever`. This doesn't
have much effect until the subsequent commit, but it does
alert users to broken templates.
(imported from commit 3fa5f403cabe0057f6f43180f1d09db669d98682)
This replaces the --noworkers option; the new --minimal option
starts a couple "essential" workers.
(imported from commit 4ca08709052c47257bc0448e51760edb4969d92e)
This requires a puppet apply on each of staging and prod0 to update
the nginx configuration to support the new URL when it is deployed.
(imported from commit a35a71a563fd1daca0d3ea4ec6874c5719a8564f)
New dependency: sockjs-tornado
One known limitation is that we don't clean up sessions for
non-websockets transports. This is a bug in Tornado so I'm going to
look at upgrading us to the latest version:
https://github.com/mrjoes/sockjs-tornado/issues/47
(imported from commit 31cdb7596dd5ee094ab006c31757db17dca8899b)