restart-server: Restart Tornado processes individually.

After some testing, I've confirmed that this seems to behave
significantly better than the previous version in terms of the number
of requests that fail because Tornado is in the process of restarting:
each individual process is only down for a short time, rather than all
of them being down at once.
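The availability argument above can be made concrete with a toy model. This is a sketch under loudly stated assumptions, not a measurement of Zulip. It assumes that a group restart stops every process and starts none until the slowest has finished stopping. It also assumes that a one-at-a-time restart takes each process down only for its own stop plus start time. The `ProcTiming` class, the two helper functions, and the timings are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProcTiming:
    # Hypothetical timings in seconds; real values depend on the server.
    stop: float   # time for the process to shut down
    start: float  # time for the process to come back up and serve requests


def group_restart_downtime(procs: List[ProcTiming]) -> List[float]:
    # Assumed group-restart behavior: all processes are stopped together and
    # none is started until the slowest one has finished stopping, so every
    # process is down for the slowest stop time plus its own start time.
    slowest_stop = max(p.stop for p in procs)
    return [slowest_stop + p.start for p in procs]


def rolling_restart_downtime(procs: List[ProcTiming]) -> List[float]:
    # Assumed one-at-a-time behavior: each process is down only for its own
    # stop + start time, while the others keep serving requests.
    return [p.stop + p.start for p in procs]


# One slow-to-stop process among three (illustrative numbers only).
example = [ProcTiming(1.0, 2.0), ProcTiming(10.0, 2.0), ProcTiming(1.0, 2.0)]
print(group_restart_downtime(example))    # [12.0, 12.0, 12.0]
print(rolling_restart_downtime(example))  # [3.0, 12.0, 3.0]
```

In the group case, one slow process keeps all three ports unavailable for 12 seconds at the same time. In the rolling case, at most one process is down at any moment. The model ignores request routing and client retries.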
Tim Abbott 2020-03-27 06:23:34 -07:00
parent cb1fb94ac8
commit 0f1bdcc46f
1 changed file with 8 additions and 1 deletion


@@ -60,8 +60,15 @@ except (configparser.NoSectionError, configparser.NoOptionError):
 # insufficient priority. This is important, because Tornado is the
 # main source of user-visible downtime when we restart a Zulip server.
 if tornado_processes > 1:
-    subprocess.check_call(["supervisorctl", "restart", "zulip-tornado:*"])
+    for p in range(9800, 9800+tornado_processes):
+        # Restart Tornado processes individually for a better rate of
+        # restarts. This also avoids behavior with restarting a whole
+        # supervisord group where if any individual process is slow to
+        # stop, the whole bundle stays stopped for an extended time.
+        logging.info("Restarting Tornado process on port %s" % (p,))
+        subprocess.check_call(["supervisorctl", "restart", "zulip-tornado:port-%s" % (p,)])
 else:
+    logging.info("Restarting Tornado process")
     subprocess.check_call(["supervisorctl", "restart", "zulip-tornado", "zulip-tornado:*"])
 # Restart the uWSGI and related processes via supervisorctl.
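The patched branch can be sketched with the command construction separated from its execution. This makes the one-process-at-a-time ordering easy to check without a running supervisord. The base port 9800 and the `zulip-tornado:port-<N>` program names come from the diff. The helper names `tornado_restart_commands` and `restart_tornado`, and the injectable `run` parameter, are hypothetical and not part of Zulip's `restart-server`.

```python
import logging
import subprocess
from typing import Callable, List, Sequence


def tornado_restart_commands(tornado_processes: int, base_port: int = 9800) -> List[List[str]]:
    """Hypothetical helper: build the supervisorctl argv lists the patch runs."""
    if tornado_processes > 1:
        # One restart per port, so only one Tornado process is down at a time.
        return [
            ["supervisorctl", "restart", "zulip-tornado:port-%s" % (p,)]
            for p in range(base_port, base_port + tornado_processes)
        ]
    # Single-process setups keep the original combined command.
    return [["supervisorctl", "restart", "zulip-tornado", "zulip-tornado:*"]]


def restart_tornado(tornado_processes: int,
                    run: Callable[[Sequence[str]], object] = subprocess.check_call) -> None:
    """Hypothetical wrapper: run each restart command in order, waiting for each."""
    for argv in tornado_restart_commands(tornado_processes):
        logging.info("Running %s", " ".join(argv))
        run(argv)
```

With the default `run`, each `subprocess.check_call` blocks until `supervisorctl` exits. It raises `CalledProcessError` on a non-zero exit status, so a failed restart stops the loop before the next port is touched. Passing something like `calls.append` as `run` records the commands instead of executing them.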