zulip/zerver/management/commands/runtornado.py

109 lines
3.7 KiB
Python
Raw Normal View History

import logging
import sys
from typing import Any
from urllib.parse import SplitResult
import __main__
from django.conf import settings
from django.core.management.base import BaseCommand, CommandError, CommandParser
from tornado import autoreload, ioloop
settings.RUNNING_INSIDE_TORNADO = True
from zerver.lib.debug import interactive_debug_listen
from zerver.tornado.application import create_tornado_application, setup_tornado_rabbitmq
from zerver.tornado.event_queue import (
add_client_gc_hook,
get_wrapped_process_notification,
missedmessage_hook,
setup_event_queue,
)
dependencies: Remove WebSockets system for sending messages. Zulip has had a small use of WebSockets (specifically, for the code path of sending messages, via the webapp only) since ~2013. We originally added this use of WebSockets in the hope that the latency benefits of doing so would allow us to avoid implementing a markdown local echo; they were not. Further, HTTP/2 may have eliminated the latency difference we hoped to exploit by using WebSockets in any case. While we’d originally imagined using WebSockets for other endpoints, there was never a good justification for moving more components to the WebSockets system. This WebSockets code path had a lot of downsides/complexity, including: * The messy hack involving constructing an emulated request object to hook into doing Django requests. * The `message_senders` queue processor system, which increases RAM needs and must be provisioned independently from the rest of the server). * A duplicate check_send_receive_time Nagios test specific to WebSockets. * The requirement for users to have their firewalls/NATs allow WebSocket connections, and a setting to disable them for networks where WebSockets don’t work. * Dependencies on the SockJS family of libraries, which has at times been poorly maintained, and periodically throws random JavaScript exceptions in our production environments without a deep enough traceback to effectively investigate. * A total of about 1600 lines of our code related to the feature. * Increased load on the Tornado system, especially around a Zulip server restart, and especially for large installations like zulipchat.com, resulting in extra delay before messages can be sent again. As detailed in https://github.com/zulip/zulip/pull/12862#issuecomment-536152397, it appears that removing WebSockets moderately increases the time it takes for the `send_message` API query to return from the server, but does not significantly change the time between when a message is sent and when it is received by clients. We don’t understand the reason for that change (suggesting the possibility of a measurement error), and even if it is a real change, we consider that potential small latency regression to be acceptable. If we later want WebSockets, we’ll likely want to just use Django Channels. Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2019-07-23 01:43:40 +02:00
from zerver.tornado.sharding import notify_tornado_queue_name
if settings.USING_RABBITMQ:
from zerver.lib.queue import TornadoQueueClient, set_queue_client
class Command(BaseCommand):
help = "Starts a Tornado Web server wrapping Django."
def add_arguments(self, parser: CommandParser) -> None:
parser.add_argument(
"addrport",
help="[port number or ipaddr:port]",
)
def handle(self, *args: Any, **options: Any) -> None:
interactive_debug_listen()
addrport = options["addrport"]
assert isinstance(addrport, str)
from tornado import httpserver
if addrport.isdigit():
addr, port = "", int(addrport)
else:
r = SplitResult("", addrport, "", "", "")
if r.port is None:
raise CommandError(f"{addrport!r} does not have a valid port number.")
addr, port = r.hostname or "", r.port
if not addr:
addr = "127.0.0.1"
if settings.DEBUG:
logging.basicConfig(
level=logging.INFO, format="%(asctime)s %(levelname)-8s %(message)s"
)
def inner_run() -> None:
from django.utils import translation
translation.activate(settings.LANGUAGE_CODE)
# We pass display_num_errors=False, since Django will
# likely display similar output anyway.
self.check(display_num_errors=False)
print(f"Tornado server (re)started on port {port}")
if settings.USING_RABBITMQ:
queue_client = TornadoQueueClient()
set_queue_client(queue_client)
# Process notifications received via RabbitMQ
queue_name = notify_tornado_queue_name(port)
queue_client.start_json_consumer(
queue_name, get_wrapped_process_notification(queue_name)
)
try:
# Application is an instance of Django's standard wsgi handler.
application = create_tornado_application()
# start tornado web server in single-threaded mode
http_server = httpserver.HTTPServer(application, xheaders=True)
http_server.listen(port, address=addr)
from zerver.tornado.ioloop_logging import logging_data
logging_data["port"] = str(port)
tornado: Move SIGTERM shutdown handler into a callback. A SIGTERM can show up at any point in the ioloop, even in places which are not prepared to handle it. This results in the process ignoring the `sys.exit` which the SIGTERM handler calls, with an uncaught SystemExit exception: ``` 2021-11-09 15:37:49.368 ERR [tornado.application:9803] Uncaught exception Traceback (most recent call last): File "/home/zulip/deployments/2021-11-08-05-10-23/zulip-py3-venv/lib/python3.6/site-packages/tornado/http1connection.py", line 238, in _read_message delegate.finish() File "/home/zulip/deployments/2021-11-08-05-10-23/zulip-py3-venv/lib/python3.6/site-packages/tornado/httpserver.py", line 314, in finish self.delegate.finish() File "/home/zulip/deployments/2021-11-08-05-10-23/zulip-py3-venv/lib/python3.6/site-packages/tornado/routing.py", line 251, in finish self.delegate.finish() File "/home/zulip/deployments/2021-11-08-05-10-23/zulip-py3-venv/lib/python3.6/site-packages/tornado/web.py", line 2097, in finish self.execute() File "/home/zulip/deployments/2021-11-08-05-10-23/zulip-py3-venv/lib/python3.6/site-packages/tornado/web.py", line 2130, in execute **self.path_kwargs) File "/home/zulip/deployments/2021-11-08-05-10-23/zulip-py3-venv/lib/python3.6/site-packages/tornado/gen.py", line 307, in wrapper yielded = next(result) File "/home/zulip/deployments/2021-11-08-05-10-23/zulip-py3-venv/lib/python3.6/site-packages/tornado/web.py", line 1510, in _execute result = method(*self.path_args, **self.path_kwargs) File "/home/zulip/deployments/2021-11-08-05-10-23/zerver/tornado/handlers.py", line 150, in get request = self.convert_tornado_request_to_django_request() File "/home/zulip/deployments/2021-11-08-05-10-23/zerver/tornado/handlers.py", line 113, in convert_tornado_request_to_django_request request = WSGIRequest(environ) File "/home/zulip/deployments/2021-11-08-05-10-23/zulip-py3-venv/lib/python3.6/site-packages/django/core/handlers/wsgi.py", line 66, in __init__ script_name = get_script_name(environ) File "/home/zulip/deployments/2021-11-08-05-10-23/zerver/tornado/event_queue.py", line 611, in <lambda> signal.signal(signal.SIGTERM, lambda signum, stack: sys.exit(1)) SystemExit: 1 ``` Supervisor then terminates the process with a SIGKILL, which results in dropping data held in the tornado process, as it does not dump its queue. The only command which is safe to run in the signal handler is `ioloop.add_callback_from_signal`, which schedules the callback to run during the course of the normal ioloop. This callbacks does an orderly shutdown of the server and the ioloop before exiting.
2021-11-12 03:27:02 +01:00
setup_event_queue(http_server, port)
add_client_gc_hook(missedmessage_hook)
if settings.USING_RABBITMQ:
setup_tornado_rabbitmq(queue_client)
instance = ioloop.IOLoop.instance()
if hasattr(__main__, "add_reload_hook"):
autoreload.start()
instance.start()
except KeyboardInterrupt:
# Monkey patch tornado.autoreload to prevent it from continuing
# to watch for changes after catching our SystemExit. Otherwise
# the user needs to press Ctrl+C twice.
__main__.wait = lambda: None
sys.exit(0)
inner_run()