puppet: Use lazy-apps and uwsgi control sockets for rolling reloads.

Restarting the uwsgi processes by way of supervisor opens a window
during which nginx returns a 502 for every request.  uwsgi has a
feature called "chain reloading" which allows for a rolling restart of
the uwsgi processes, such that only one process at a time is
unavailable; see the uwsgi documentation ([1]).
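
As a rough illustration of what triggering a chain reload can look like, the sketch below writes the `c` command to a uwsgi master FIFO.  The helper name `send_master_fifo_command` and the non-blocking open are assumptions made for this example; the commit itself uses a plain blocking `open(..., "w")`.

```python
import errno
import os


def send_master_fifo_command(fifo_path: str, command: bytes = b"c") -> bool:
    """Write one command to a uwsgi master FIFO (hypothetical helper).

    Returns False when the FIFO does not exist or nothing has it open
    for reading, instead of blocking forever on open().
    """
    try:
        # O_NONBLOCK makes a write-only open of a FIFO fail with ENXIO
        # when there is no reader (for example, uwsgi is not running).
        fd = os.open(fifo_path, os.O_WRONLY | os.O_NONBLOCK)
    except OSError as e:
        if e.errno in (errno.ENOENT, errno.ENXIO):
            return False
        raise
    try:
        # "c" is the master FIFO command for a chain reload.
        os.write(fd, command)
    finally:
        os.close(fd)
    return True
```

A caller could fall back to a full `supervisorctl restart` when this returns False.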

The tradeoff is that this requires that the uwsgi processes load the
libraries after forking, rather than before ("lazy apps"); in theory
this can lead to larger memory footprints, since the loaded libraries
are not shared between processes.  In practice, as Django defers much
of the loading, this is not as much of an issue.  A very basic test of
memory consumption (measured as total memory - free - caches -
buffers, with 6 uwsgi workers) was run both immediately after
restarting Django, and after requesting `/` 60 times with 6
concurrent requests:

                      |  Non-lazy  |  Lazy app  | Difference
    ------------------+------------+------------+-------------
    Fresh             |  2,827,216 |  2,870,480 |   +43,264
    After 60 requests |  3,332,284 |  3,409,608 |   +77,324
    ..................|............|............|.............
    Difference        |   +505,068 |   +539,128 |   +34,060

That is, "lazy app" loading increased the footprint before any
requests by about 43MB; serving 60 requests then grew the footprint by
about 539MB, as opposed to about 505MB with non-lazy loading.  Using
uwsgi "lazy app" loading does increase the memory footprint, but only
by roughly 1.5% when fresh and 2.3% after 60 requests.
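
The relative overhead can be checked directly from the table.  The sketch below copies the four measurements and computes the absolute and percentage increase of lazy over non-lazy loading; the names are illustrative, and the unit is assumed to be kB, since the text reads 43,264 as 43MB.

```python
# Figures copied from the table above; the unit is assumed to be kB.
FRESH = {"non_lazy": 2_827_216, "lazy": 2_870_480}
AFTER_60 = {"non_lazy": 3_332_284, "lazy": 3_409_608}


def overhead(measurement):
    """Return (extra memory, extra as a percentage of the non-lazy figure)."""
    extra = measurement["lazy"] - measurement["non_lazy"]
    return extra, 100 * extra / measurement["non_lazy"]


for label, measurement in (("Fresh", FRESH), ("After 60 requests", AFTER_60)):
    extra, percent = overhead(measurement)
    print(f"{label}: +{extra:,} ({percent:.2f}%)")
```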

The other effect is that requests may be served by either old or new
code during the restart window.  This may cause transient failures
when new frontend code talks to old backend code.

Enable chain-reloading during graceful, puppetless restarts, but only
if enabled via a zulip.conf configuration flag.
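
The conditions under which the chain reload is used can be summarized as a small predicate.  This is a sketch that mirrors the check added to the restart script in this commit; the function name `should_chain_reload` and its parameters are made up for the example.

```python
def should_chain_reload(
    action: str,
    less_graceful: bool,
    rolling_restart: str,
    fifo_exists: bool,
) -> bool:
    """Decide whether to chain-reload uwsgi instead of restarting it.

    Hypothetical helper: rolling_restart is the raw zulip.conf value
    (empty string when unset), and fifo_exists says whether the uwsgi
    control FIFO is present on disk.
    """
    return (
        action == "restart"
        and not less_graceful
        and rolling_restart != ""
        and fifo_exists
    )
```

Any non-empty value enables the flag, so `should_chain_reload("restart", False, "true", True)` is true, while an unset flag, a `--less-graceful` restart, or a missing FIFO falls back to the ordinary supervisor restart.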

Fixes #2559.

[1]: https://uwsgi-docs.readthedocs.io/en/latest/articles/TheArtOfGracefulReloading.html#chain-reloading-lazy-apps
Alex Vandiver 2021-12-31 20:20:49 -08:00 committed by Tim Abbott
parent 4aaa250623
commit 6218ed91c2
4 changed files with 45 additions and 2 deletions

@@ -623,6 +623,14 @@ override is useful both Docker systems (where the above algorithm
 might see the host's memory, not the container's) and/or when using
 remote servers for postgres, memcached, redis, and RabbitMQ.
+#### `rolling_restart`
+If set to a non-empty value, when using `./scripts/restart-server` to
+restart Zulip, restart the uwsgi processes one-at-a-time, instead of
+all at once. This decreases the number of 502's served to clients, at
+the cost of slightly increased memory usage, and the possibility that
+different requests will be served by different versions of the code.
 #### `uwsgi_buffer_size`
 Override the default uwsgi buffer size of 8192.

@@ -119,6 +119,12 @@ class zulip::app_frontend_base {
     notify => Service[$zulip::common::supervisor_service],
   }
+  $uwsgi_rolling_restart = zulipconf('application_server', 'rolling_restart', '')
+  if $uwsgi_rolling_restart == '' {
+    file { '/home/zulip/deployments/uwsgi-control':
+      ensure => absent,
+    }
+  }
   $uwsgi_listen_backlog_limit = zulipconf('application_server', 'uwsgi_listen_backlog_limit', 128)
   $uwsgi_buffer_size = zulipconf('application_server', 'uwsgi_buffer_size', 8192)
   $uwsgi_processes = zulipconf('application_server', 'uwsgi_processes', $uwsgi_default_processes)

@@ -16,6 +16,13 @@ gid=zulip
 stats=/home/zulip/deployments/uwsgi-stats
+<% if @uwsgi_rolling_restart != '' -%>
+master-fifo=/home/zulip/deployments/uwsgi-control
+# lazy-apps are required for rolling restarts:
+# https://uwsgi-docs.readthedocs.io/en/latest/articles/TheArtOfGracefulReloading.html#preforking-vs-lazy-apps-vs-lazy
+lazy-apps=true
+<% end -%>
 ignore-sigpipe = true
 ignore-write-errors = true
 disable-write-exception = true

@@ -13,6 +13,7 @@ from scripts.lib.zulip_tools import (
     ENDC,
     OKGREEN,
     WARNING,
+    get_config,
     get_config_file,
     get_tornado_ports,
     has_application_server,
@@ -128,8 +129,29 @@ if has_application_server():
         subprocess.check_call(["supervisorctl", action, "zulip-tornado:*"])
     # Finally, restart the Django uWSGI processes.
-    logging.info("%s django server", verbing)
-    subprocess.check_call(["supervisorctl", action, "zulip-django"])
+    if (
+        action == "restart"
+        and not args.less_graceful
+        and get_config(config_file, "application_server", "rolling_restart") != ""
+        and os.path.exists("/home/zulip/deployments/uwsgi-control")
+    ):
+        # See if it's currently running
+        uwsgi_status = subprocess.run(
+            ["supervisorctl", "status", "zulip-django"],
+            stdout=subprocess.DEVNULL,
+        )
+        if uwsgi_status.returncode == 0:
+            logging.info("Starting rolling restart of django server")
+            with open("/home/zulip/deployments/uwsgi-control", "w") as control_socket:
+                # "c" is chain-reloading:
+                # https://uwsgi-docs.readthedocs.io/en/latest/MasterFIFO.html#available-commands
+                control_socket.write("c")
+        else:
+            logging.info("Starting django server")
+            subprocess.check_call(["supervisorctl", "start", "zulip-django"])
+    else:
+        logging.info("%s django server", verbing)
+        subprocess.check_call(["supervisorctl", action, "zulip-django"])
     using_sso = subprocess.check_output(["./scripts/get-django-setting", "USING_APACHE_SSO"])
     if using_sso.strip() == b"True":