zulip/zerver/management/commands/runtornado.py

121 lines
4.3 KiB
Python

import logging
import sys
from typing import Any, Callable
from urllib.parse import SplitResult
from django.conf import settings
from django.core.management.base import BaseCommand, CommandError, CommandParser
from tornado import ioloop
from tornado.log import app_log
# We must call zerver.tornado.ioloop_logging.instrument_tornado_ioloop
# before we import anything else from our project in order for our
# Tornado load logging to work; otherwise we might accidentally import
# zerver.lib.queue (which will instantiate the Tornado ioloop) before
# this.
from zerver.tornado.ioloop_logging import instrument_tornado_ioloop
settings.RUNNING_INSIDE_TORNADO = True
instrument_tornado_ioloop()
from zerver.lib.debug import interactive_debug_listen
from zerver.tornado.application import create_tornado_application, setup_tornado_rabbitmq
from zerver.tornado.autoreload import start as zulip_autoreload_start
from zerver.tornado.event_queue import (
    add_client_gc_hook,
    get_wrapped_process_notification,
    missedmessage_hook,
    setup_event_queue,
)
from zerver.tornado.sharding import notify_tornado_queue_name
if settings.USING_RABBITMQ:
    from zerver.lib.queue import TornadoQueueClient, get_queue_client
def handle_callback_exception(callback: Callable[..., Any]) -> None:
    logging.exception("Exception in callback", stack_info=True)
    app_log.error("Exception in callback %r", callback, exc_info=True)
class Command(BaseCommand):
    help = "Starts a Tornado Web server wrapping Django."

    def add_arguments(self, parser: CommandParser) -> None:
        parser.add_argument(
            "addrport",
            help="[port number or ipaddr:port]",
        )

    def handle(self, *args: Any, **options: Any) -> None:
        interactive_debug_listen()
        addrport = options["addrport"]
        assert isinstance(addrport, str)

        import django
        from tornado import httpserver

        if addrport.isdigit():
            addr, port = "", int(addrport)
        else:
            r = SplitResult("", addrport, "", "", "")
            if r.port is None:
                raise CommandError(f"{addrport!r} does not have a valid port number.")
            addr, port = r.hostname or "", r.port

        if not addr:
            addr = "127.0.0.1"

        if settings.DEBUG:
            logging.basicConfig(
                level=logging.INFO, format="%(asctime)s %(levelname)-8s %(message)s"
            )

        def inner_run() -> None:
            from django.conf import settings
            from django.utils import translation

            translation.activate(settings.LANGUAGE_CODE)

            # We pass display_num_errors=False, since Django will
            # likely display similar output anyway.
            self.check(display_num_errors=False)
            print(f"Tornado server (re)started on port {port}")

            if settings.USING_RABBITMQ:
                queue_client = get_queue_client()
                assert isinstance(queue_client, TornadoQueueClient)
                # Process notifications received via RabbitMQ
                queue_name = notify_tornado_queue_name(port)
                queue_client.start_json_consumer(
                    queue_name, get_wrapped_process_notification(queue_name)
                )

            try:
                # Application is an instance of Django's standard wsgi handler.
                application = create_tornado_application()
                if settings.AUTORELOAD:
                    zulip_autoreload_start()

                # start tornado web server in single-threaded mode
                http_server = httpserver.HTTPServer(application, xheaders=True)
                http_server.listen(port, address=addr)

                from zerver.tornado.ioloop_logging import logging_data

                logging_data["port"] = str(port)
                setup_event_queue(http_server, port)
                add_client_gc_hook(missedmessage_hook)
                setup_tornado_rabbitmq()

                instance = ioloop.IOLoop.instance()

                if django.conf.settings.DEBUG:
                    instance.set_blocking_log_threshold(5)
                    instance.handle_callback_exception = handle_callback_exception
                instance.start()
            except KeyboardInterrupt:
                sys.exit(0)

        inner_run()
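
The `addrport` parsing in `handle` above can be tried on its own, without Django or Tornado. The following is a minimal sketch under stated assumptions: `parse_addrport` is a hypothetical helper name that does not exist in the file, and it raises `ValueError` where the command raises `CommandError`, so that the example has no Django dependency. It shows how passing the argument to `SplitResult` as a bare netloc lets the standard library split host and port, including bracketed IPv6 literals.

```python
from typing import Tuple
from urllib.parse import SplitResult


def parse_addrport(addrport: str) -> Tuple[str, int]:
    """Hypothetical helper mirroring the parsing done in Command.handle."""
    if addrport.isdigit():
        # A bare number is treated as a port with no explicit address.
        addr, port = "", int(addrport)
    else:
        # Treat the whole argument as a URL netloc so that SplitResult
        # exposes .hostname and .port for "host:port" and "[v6]:port".
        r = SplitResult("", addrport, "", "", "")
        if r.port is None:
            # The management command raises CommandError here; ValueError
            # is an assumption made to keep this sketch Django-free.
            raise ValueError(f"{addrport!r} does not have a valid port number.")
        addr, port = r.hostname or "", r.port

    # As in the command, fall back to the loopback address.
    if not addr:
        addr = "127.0.0.1"
    return addr, port


if __name__ == "__main__":
    for arg in ("9993", "0.0.0.0:8080", "[::1]:9993"):
        print(arg, "->", parse_addrport(arg))
```

Keeping the parsing in a small pure function like this would make it easy to unit test, though the original command inlines the logic in `handle`.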