Commit Graph

83 Commits

Author SHA1 Message Date
Anders Kaseorg 333f7d16c9 logging: Pass more format arguments to logging.
Commit bdc365d0fe (#14852) missed this
because of https://github.com/returntocorp/semgrep/issues/831.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-05-26 11:42:23 -07:00
Tim Abbott 0f1bdcc46f restart-server: Restart Tornado processes individually.
After some testing, I've confirmed that this seems to behave
significantly better in terms of the number of failed requests due to
Tornado being the process of restarting compared with the previous
version, as each individual process is only down for a short time,
rather than all of them being down at once.
2020-03-27 06:23:34 -07:00
Anders Kaseorg ea6934c26d dependencies: Remove WebSockets system for sending messages.
Zulip has had a small use of WebSockets (specifically, for the code
path of sending messages, via the webapp only) since ~2013.  We
originally added this use of WebSockets in the hope that the latency
benefits of doing so would allow us to avoid implementing a markdown
local echo; they were not.  Further, HTTP/2 may have eliminated the
latency difference we hoped to exploit by using WebSockets in any
case.

While we’d originally imagined using WebSockets for other endpoints,
there was never a good justification for moving more components to the
WebSockets system.

This WebSockets code path had a lot of downsides/complexity,
including:

* The messy hack involving constructing an emulated request object to
  hook into doing Django requests.
* The `message_senders` queue processor system, which increases RAM
  needs and must be provisioned independently from the rest of the
  server).
* A duplicate check_send_receive_time Nagios test specific to
  WebSockets.
* The requirement for users to have their firewalls/NATs allow
  WebSocket connections, and a setting to disable them for networks
  where WebSockets don’t work.
* Dependencies on the SockJS family of libraries, which has at times
  been poorly maintained, and periodically throws random JavaScript
  exceptions in our production environments without a deep enough
  traceback to effectively investigate.
* A total of about 1600 lines of our code related to the feature.
* Increased load on the Tornado system, especially around a Zulip
  server restart, and especially for large installations like
  zulipchat.com, resulting in extra delay before messages can be sent
  again.

As detailed in
https://github.com/zulip/zulip/pull/12862#issuecomment-536152397, it
appears that removing WebSockets moderately increases the time it
takes for the `send_message` API query to return from the server, but
does not significantly change the time between when a message is sent
and when it is received by clients.  We don’t understand the reason
for that change (suggesting the possibility of a measurement error),
and even if it is a real change, we consider that potential small
latency regression to be acceptable.

If we later want WebSockets, we’ll likely want to just use Django
Channels.

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-01-14 22:34:00 -08:00
Anders Kaseorg 8d91bebf95 restart-server: Warn if the shell’s PWD goes through an updated symlink.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2019-09-21 12:02:15 -07:00
Harshit Bansal 50ef91bb08 scripts: Add argparse option to `restart-zerver` for `--fill-cache`.
Nowm unless you specify `--fill-cache`, memcached caches will not be
pre-filled after a server restart. This will be helpful when someone
is in a hurry (e.g. if the server is down right now, or if he/she
testing a configuration change in a newly setup server), it's best to
just restart without pre-filling the cache.

Fixes: #10900.
2019-01-14 15:20:01 -08:00
Anders Kaseorg a694c3cafd scripts/restart-server: Avoid shelling out for ln.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
2018-11-28 17:26:54 -08:00
Tim Abbott 5a56925495 restart-server: Fix restarting server with multiple tornado processes.
Previously, we unconditionally tried to restart the Tornado process
name corresponding to the historically always-true case of a single
Tornado process.  This resulted in Tornado not being automatically
restarted on a production deployment on servers with more than one
Tornado process configured.
2018-11-27 17:20:05 -08:00
Tim Abbott 1a0e9fe2f9 restart-server: Restart tornado early.
This dramatically reduces the Tornado downtime when restarting a Zulip
server, which is generally the most significant source of user-facing
bad experiences.
2018-10-16 15:04:07 -07:00
Abhilash Verma 0e2322a322 logging: Show timestamp in UTC in non-django production scripts.
Done in pair programming with @aero31aero.

Fixes #9678.
2018-08-20 12:52:40 -07:00
Tim Abbott a8e5551395 restart-server: Ensure we restart process-fts-updates.
This is mostly important in that if you're running this as part of a
follow-up to a failed upgrade, and you don't do this,
process-fts-updates will be left not running, resulting in full-text
search not updating.
2018-07-30 16:27:53 -07:00
Joshua Schmidlkofer b1a57d144f thumbor: Add production installer/puppet support.
This commits adds the necessary puppet configuration and
installer/upgrade code for installing and managing the thumbor service
in production.  This configuration is gated by the 'thumbor.pp'
manifest being enabled (which is not yet the default), and so this
commit should have no effect in a default Zulip production environment
(or in the long term, in any Zulip production server that isn't using
thumbor).

Credit for this effort is shared by @TigorC (who initiated the work on
this project), @joshland (who did a great deal of work on this and got
it working during PyCon 2017) and @adnrs96, who completed the work.
2018-07-12 20:37:34 +05:30
rht 71188d7b0a scripts: Remove import print_function. 2017-09-29 15:43:30 -07:00
Greg Price a099e698e2 py3: Switch almost all shebang lines to use `python3`.
This causes `upgrade-zulip-from-git`, as well as a no-option run of
`tools/build-release-tarball`, to produce a Zulip install running
Python 3, rather than Python 2.  In particular this means that the
virtualenv we create, in which all application code runs, is Python 3.

One shebang line, on `zulip-ec2-configure-interfaces`, explicitly
keeps Python 2, and at least one external ops script, `wal-e`, also
still runs on Python 2.  See discussion on the respective previous
commits that made those explicit.  There may also be some other
third-party scripts we use, outside of this source tree and running
outside our virtualenv, that still run on Python 2.
2017-08-16 17:54:43 -07:00
Umair Khan 336a041ac0 Django 1.10: Use uWSGI.
Fixes: #1121

With some tweaks by tabbott to make the number of processes configurable.
2016-12-13 21:40:43 -08:00
Anders Kaseorg 207cf6302b Always start python via shebang lines.
This is preparation for supporting using Python 3 in production.

Signed-off-by: Anders Kaseorg <andersk@mit.edu>
2016-11-26 14:46:37 -08:00
Taranjeet Singh d606b95242 zulip_tools.py: Move zulip_tools.py in scripts/lib.
This commit moves zulip_tools.py as part of cleaning the root directory
and organizing proejct into better directory structure.
2016-08-15 16:44:50 -07:00
Tim Abbott 1158a86ae7 restart-server: Maintain a last symlink. 2016-08-04 17:02:48 -07:00
Eklavya Sharma 51ea5c1602 scripts/: Make subprocess calls unicode-aware. 2016-07-26 12:06:41 -07:00
Eklavya Sharma 11732f9ab0 Make all scripts in scripts/ pass mypy check. 2016-07-24 00:17:21 +05:30
Eklavya Sharma 94e4b39112 Replace python2.7 by python everywhere. 2016-05-29 05:03:08 -07:00
Eklavya Sharma 149938d468 Change shebangs from python2.7 to python. 2016-05-29 05:03:08 -07:00
Tim Abbott 6e1872987d Move bin/get-django-setting to scripts/. 2016-05-07 19:37:06 -07:00
Eklavya Sharma c59185e119 Apply Python 3 futurize transform libfuturize.fixes.fix_print_with_import
Refer #256
2016-03-10 22:02:17 -08:00
Steven Oud d5435fad1d Consistently use /usr/bin/env python2.7 in shebangs and commands. 2015-10-21 22:58:21 +00:00
Tim Abbott 5b8894cd25 Rename USING_SSO to something more clear.
(imported from commit 94e8ae84b01419783872a5d09bafe5c2eb933c18)
2015-08-18 20:48:15 -07:00
Tim Abbott b2d01e2da0 [manual] restart-server: Minimize downtime for message sender worker.
The manual step here is that we need to do the `puppet apply` before
pushing this commit, or `restart-server` will crash.

Previously we shut down everything in one group, which performed
poorly with supervisor's bad performance on restarting many daemons at
once.  Now we shut down the unimportant stuff, then the important
stuff, bring back the important stuff, and then bring back the
unimportant stuff.

This new model has a little over 5s of downtime for the core
user-facing daemons -- which is still far more than would be ideal,
but a lot less than the 13s or so that we had before.

Here's some logs with the current setup for the tornado/django downtime:
2013-12-19 20:16:51,995 restart-server: Stopping daemons
2013-12-19 20:16:53,461 restart-server: Starting daemons
2013-12-19 20:16:57,146 restart-server: Starting workers

Compare with the behavior on master today:
2013-12-19 20:21:45,281 restart-server: Stopping daemons
2013-12-19 20:21:49,225 restart-server: Starting daemons
2013-12-19 20:21:58,463 restart-server: Done!

(imported from commit b2c1ba77f3dc989551d0939779208465a8410435)
2013-12-19 17:21:23 -05:00
Zev Benjamin 9b2aa657be Revert "restart-server: Use 'all' instead of specifying the supervisor jobs to operate on explicitly"
This reverts commit acef4c0027b77053497ef6e9f7aa4b61703205c3.

Despite the lower total downtime, this caused more user-facing downtime.

(imported from commit 5cce032bb20abe83853a65ee72bf0bb28af403cc)
2013-11-21 15:14:38 -05:00
Zev Benjamin a363b7185d restart-server: Use 'all' instead of specifying the supervisor jobs to operate on explicitly
This shaves about 1.5 seconds off our restart time on ls-dev (9s ->
7.5s).  Still too slow, but it's a little bit better.

(imported from commit acef4c0027b77053497ef6e9f7aa4b61703205c3)
2013-11-15 15:23:02 -05:00
Zev Benjamin 974159ec94 Move apache2 restart for SSO sites to restart-server
(imported from commit f999e2b0591a11442c1d3fdba2393ecf6e78bad3)
2013-11-15 11:34:48 -05:00
Zev Benjamin 787215d743 [manual] Switch over to new /etc/zulip/zulip.conf config file
Run the following commands as root before deploying this branch:
 # /root/zulip/tools/migrate-server-config
 # rm /etc/zulip/machinetype /etc/zulip/server /etc/zulip/local /etc/humbug-machinetype /etc/humbug-server /etc/humbug-local

(imported from commit aa7dcc50d2f4792ce33834f14761e76512fca252)
2013-11-05 14:14:19 -05:00
Zev Benjamin 57bef07832 [manual] Move /etc/humbug-* files into /etc/zulip
The moved files are:
humbug-server
humbug-local
humbug-machinetype

Their new names are their old names with 'humbug-' removed.

zulip-puppet-apply must be run before this commit is deployed

(imported from commit f4eb523244d3409b5809c279301225d3fdf0c230)
2013-10-30 15:42:25 -04:00
Tim Abbott fffd4f3c59 Move zulip_tools library to root of repository.
(imported from commit 2fada9d2acbcf81f8e2b3de8caadbf335141dfaa)
2013-10-28 10:54:48 -04:00
Tim Abbott 56e9ad230e [manual] Move our deployment scripts to scripts/.
This will require updating the post-receive code on git.zulip.net to
work.

(imported from commit 2e51fa2d7b891c1138d3f22ae534cfb8a6cf174c)
2013-10-28 10:54:48 -04:00