#!/usr/bin/env python3
import os
import sys
import pwd
import subprocess
import logging
import time

sys.path.append(os.path.join(os.path.dirname(__file__), '..'))
from scripts.lib.zulip_tools import ENDC, OKGREEN, DEPLOYMENTS_DIR
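
# Log timestamps in UTC (gmtime) so entries line up across servers.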
logging.Formatter.converter = time.gmtime
logging.basicConfig(format="%(asctime)s restart-server: %(message)s",
                    level=logging.INFO)
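
# Run everything from the root of the deployment that contains this
# copy of the script.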
deploy_path = os.path.realpath(os.path.join(os.path.dirname(__file__), '..'))
os.chdir(deploy_path)
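
# Everything below must run as the zulip user.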
if pwd.getpwuid(os.getuid()).pw_name != "zulip":
    logging.error("Must be run as user 'zulip'.")
    sys.exit(1)

# Send a statsd event on restarting the server
subprocess.check_call(["./manage.py", "send_stats", "incr",
                       "events.server_restart", str(int(time.time()))])
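
# Fill the memcached caches while the old server is still running, so
# the restarted server comes back up with warm caches.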
logging.info("Filling memcached caches")
subprocess.check_call(["./manage.py", "fill_memcached_caches"])

core_server_services = ["zulip-django", "zulip-tornado", "zulip-senders:*"]
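# zulip-thumbor runs only on servers with the Thumbor thumbnailing
# service installed, so include it only when its supervisor config exists.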
if os.path.exists("/etc/supervisor/conf.d/thumbor.conf"):
    core_server_services.append("zulip-thumbor")

# Restart the uWSGI and related processes via supervisorctl.
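# We stop the unimportant worker processes first, then the core
# user-facing daemons, and bring them back up in the reverse order.
# Restarting in two phases like this keeps the downtime for the core
# daemons to a few seconds, rather than waiting on supervisor to
# restart everything at once.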
logging.info("Stopping workers")
subprocess.check_call(["supervisorctl", "stop", "zulip-workers:*"])
logging.info("Stopping server core")
subprocess.check_call(["supervisorctl", "stop"] + core_server_services)
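
# Update the deployment symlinks under DEPLOYMENTS_DIR: "last" points
# at the previous deployment, "current" at the one being started.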
current_symlink = os.path.join(DEPLOYMENTS_DIR, "current")
last_symlink = os.path.join(DEPLOYMENTS_DIR, "last")
if os.readlink(current_symlink) != deploy_path:
    subprocess.check_call(["ln", '-nsf', os.readlink(current_symlink), last_symlink])
    subprocess.check_call(["ln", '-nsf', deploy_path, current_symlink])

logging.info("Starting server core")
subprocess.check_call(["supervisorctl", "start"] + core_server_services)
logging.info("Starting workers")
subprocess.check_call(["supervisorctl", "start", "zulip-workers:*"])
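
# When Apache-based SSO is in use, Apache runs its own WSGI processes
# for Zulip, which need to be restarted to pick up the new code too.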
using_sso = subprocess.check_output(['./scripts/get-django-setting', 'USING_APACHE_SSO'])
if using_sso.strip() == b'True':
    logging.info("Restarting Apache WSGI process...")
    subprocess.check_call(["pkill", "-f", "apache2", "-u", "zulip"])
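
# On servers hosting the database, also restart the full-text-search
# updater daemon.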
if os.path.exists("/etc/supervisor/conf.d/zulip_db.conf"):
    subprocess.check_call(["supervisorctl", "restart", "process-fts-updates"])

logging.info("Done!")
print(OKGREEN + "Application restarted successfully!" + ENDC)