zulip

Commit Graph

Author	SHA1	Message	Date
Gaurav Pandey	af08bcdb3f	management: Delete send_stats command. This command is part of a statsd infrastructure that we stopped supporting years ago. Its only purpose for some time has been to provide sample code for how the restart script might trigger a notification to a graphing system, which doesn't justify maintaining it. Fixes part of #18898.	2021-06-25 09:13:48 -07:00
Alex Vandiver	d51272cc3d	puppet: Remove zulip_deliver_scheduled_* from zulip-workers:. Staging and other hosts that are `zulip::app_frontend_base` but not `zulip::app_frontend_once` do not have a /etc/supervisor/conf.d/zulip/zulip-once.conf and as such do not have `zulip_deliver_scheduled_emails` or `zulip_deliver_scheduled_messages` and thus supervisor will fail to reload. Making the contents of `zulip-workers` contingent on if the server is _also_ a `-once` server is complicated, and would involve using Concat fragments, which severely limit readability. Instead, expel those two from `zulip-workers`; this is somewhat reasonable, since they use an entirely different codepath from zulip_events_, using the database rather than RabbitMQ for their queuing.	2021-06-14 17:12:59 -07:00
Tim Abbott	de47feab43	scripts: Fix check for services running when upgrading. When upgrading from a pre-4.0 release, scripts/stop-server logic would check whether supervisord configuration files were present to determine what it needed to restart, but only considered paths to those files that are introduced in Zulip 4.0. Fixed #18493.	2021-05-13 18:57:19 -07:00
Robert Imschweiler	534d78232c	scripts: Add {start,stop,restart}-server support for postgresql role. During the upgrade process of a postgresql-only Zulip installation, (`puppet_classes = zulip::profile::postgresql` in `/etc/zulip/zulip.conf`) either `scripts/start-server` or `scripts/stop-server` fail because they try to handle supervisor services that are not available (e.g. Tornado) since only `/etc/supervisor/conf.d/zulip/zulip_db.conf` is present and not `/etc/supervisor/conf.d/zulip/zulip.conf`. While this wasn't previously supported, it's a pretty reasonable thing to do, and can be readily supported by just adding a few conditionals.	2021-05-07 09:41:05 -07:00
Anders Kaseorg	9d57fa9759	puppet: Use pgrep -x to avoid accidental matches. Matching the full process name (-x without -f) or full command line (-xf) is less prone to mistakes like matching a random substring of some other command line or pgrep matching itself. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-05-07 08:54:41 -07:00
Anders Kaseorg	405bc8dabf	requirements: Remove Thumbor. Thumbor and tc-aws have been dragging their feet on Python 3 support for years, and even the alphas and unofficial forks we’ve been running don’t seem to be maintained anymore. Depending on these projects is no longer viable for us. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-05-06 20:07:32 -07:00
Alex Vandiver	daabc52a78	restart-server: Reorder supervisorctl calls for less downtime. Instead of taking the "onion" approach, where all services are stopped, and then started back up again, default to a rolling restart across all processes. This draws out how long the overall "restart" takes, but minimizes the time that any of the services are down. This minimizes user-visible impact and queue buildup. In cases where speed is more important than minimal impact (for example, there is already a current outage), a --less-graceful flag is provided, which brings the services down more suddenly, and back up in a still-correct order.	2021-04-30 16:47:15 -07:00
Alex Vandiver	ec12a6128a	scripts: Add a start-server as well. In general, `./scripts/restart-server` will already work in any circumstance where the server is already stopped and needs to be started. However, it will output a couple minor warnings, and it is not readily obvious that it will work correctly. Add an alias for `restart-server` named `start-server`, for parallelism with `stop-server`, which omits the steps of `restart-server` which would stop the server first.	2021-04-21 10:24:08 -07:00
Alex Vandiver	31169526ec	scripts: Say "Zulip" rather than "Application".	2021-04-21 10:24:08 -07:00
Alex Vandiver	0de8357820	scripts: Fix path to additional Zulip supervisor files. The path which contains all of the Zulip supervisor files changed in `3ab9b31d2f` to make it easier to purge now-unwanted supervisor configuration files. However, the paths that the zulip upgrade process, and restart-server, look at were not adjusted. Fix the supervisor configuration file paths.	2021-04-21 10:24:08 -07:00
Anders Kaseorg	6e4c3e41dc	python: Normalize quotes with Black. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-02-12 13:11:19 -08:00
Anders Kaseorg	11741543da	python: Reformat with Black, except quotes. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-02-12 13:11:19 -08:00
Alex Vandiver	2a12fedcf1	tornado: Remove explicit tornado_processes setting; compute it. We can compute the intended number of processes from the sharding configuration. In doing so, also validate that all of the ports are contiguous. This removes a discrepancy between `scripts/lib/sharding.py` and other parts of the codebase about if merely having a `[tornado_sharding]` section is sufficient to enable sharding. Having behaviour which changes merely based on if an empty section exists is surprising. This does require that a (presumably empty) `9800` configuration line exist, but making that default explicit is useful. After this commit, configuring sharding can be done by adding to `zulip.conf`: ``` [tornado_sharding] 9800 = # default 9801 = other_realm ``` Followed by running `./scripts/refresh-sharding-and-restart`.	2020-09-18 15:13:40 -07:00
Alex Vandiver	efdaa58c24	supervisor: Use more specific process_name than "port-9800". Making this include "zulip-tornado" makes it clearer in supervisor logs. Without this, one only sees: ``` 2020-09-14 03:43:13,788 INFO waiting for port-9807 to stop 2020-09-14 03:43:14,466 INFO stopped: port-9807 (exit status 1) 2020-09-14 03:43:14,469 INFO spawned: 'port-9807' with pid 24289 2020-09-14 03:43:15,470 INFO success: port-9807 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs) ```	2020-09-14 22:17:51 -07:00
Alex Vandiver	dc58dec231	restart-server: Start services in opposite order from stop. `supervisorctl` starts and stops its arguments sequentially, in the order they are passed[1]. Start them in the opposite order from the order in which they were stopped -- this puts the dependencies first, and the most core services (`zulip-django`) last. While the only "dependency" here is currently thumbor, this sets us up in case others are added later. [1] https://github.com/Supervisor/supervisor/blob/master/supervisor/supervisorctl.py#L782	2020-09-14 16:27:15 -07:00
Anders Kaseorg	b4597a8ca8	python: Elide default for store_{true,false} argparse arguments. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-09-03 16:17:14 -07:00
Anders Kaseorg	1ded51aa9d	python: Replace list literal concatenation with * unpacking. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-09-02 11:15:41 -07:00
Anders Kaseorg	a5dbab8fb0	python: Remove redundant dest for argparse arguments. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-09-02 11:04:10 -07:00
Anders Kaseorg	5dc9b55c43	python: Manually convert more percent-formatting to f-strings. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-14 23:27:22 -07:00
Anders Kaseorg	365fe0b3d5	python: Sort imports with isort. Fixes #2665. Regenerated by tabbott with `lint --fix` after a rebase and change in parameters. Note from tabbott: In a few cases, this converts technical debt in the form of unsorted imports into different technical debt in the form of our largest files having very long, ugly import sequences at the start. I expect this change will increase pressure for us to split those files, which isn't a bad thing. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-11 16:45:32 -07:00
Anders Kaseorg	67e7a3631d	python: Convert percent formatting to Python 3.6 f-strings. Generated by pyupgrade --py36-plus. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-10 15:02:09 -07:00
Anders Kaseorg	333f7d16c9	logging: Pass more format arguments to logging. Commit `bdc365d0fe` (#14852) missed this because of https://github.com/returntocorp/semgrep/issues/831. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-05-26 11:42:23 -07:00
Tim Abbott	0f1bdcc46f	restart-server: Restart Tornado processes individually. After some testing, I've confirmed that this seems to behave significantly better in terms of the number of failed requests due to Tornado being in the process of restarting compared with the previous version, as each individual process is only down for a short time, rather than all of them being down at once.	2020-03-27 06:23:34 -07:00
Anders Kaseorg	ea6934c26d	dependencies: Remove WebSockets system for sending messages. Zulip has had a small use of WebSockets (specifically, for the code path of sending messages, via the webapp only) since ~2013. We originally added this use of WebSockets in the hope that the latency benefits of doing so would allow us to avoid implementing a markdown local echo; they were not. Further, HTTP/2 may have eliminated the latency difference we hoped to exploit by using WebSockets in any case. While we’d originally imagined using WebSockets for other endpoints, there was never a good justification for moving more components to the WebSockets system. This WebSockets code path had a lot of downsides/complexity, including: * The messy hack involving constructing an emulated request object to hook into doing Django requests. * The `message_senders` queue processor system, which increases RAM needs and must be provisioned independently from the rest of the server. * A duplicate check_send_receive_time Nagios test specific to WebSockets. * The requirement for users to have their firewalls/NATs allow WebSocket connections, and a setting to disable them for networks where WebSockets don’t work. * Dependencies on the SockJS family of libraries, which has at times been poorly maintained, and periodically throws random JavaScript exceptions in our production environments without a deep enough traceback to effectively investigate. * A total of about 1600 lines of our code related to the feature. * Increased load on the Tornado system, especially around a Zulip server restart, and especially for large installations like zulipchat.com, resulting in extra delay before messages can be sent again. As detailed in https://github.com/zulip/zulip/pull/12862#issuecomment-536152397, it appears that removing WebSockets moderately increases the time it takes for the `send_message` API query to return from the server, but does not significantly change the time between when a message is sent and when it is received by clients. We don’t understand the reason for that change (suggesting the possibility of a measurement error), and even if it is a real change, we consider that potential small latency regression to be acceptable. If we later want WebSockets, we’ll likely want to just use Django Channels. Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2020-01-14 22:34:00 -08:00
Anders Kaseorg	8d91bebf95	restart-server: Warn if the shell’s PWD goes through an updated symlink. Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2019-09-21 12:02:15 -07:00
Harshit Bansal	50ef91bb08	scripts: Add argparse option to `restart-server` for `--fill-cache`. Now, unless you specify `--fill-cache`, memcached caches will not be pre-filled after a server restart. This is helpful when someone is in a hurry (e.g. if the server is down right now, or if they are testing a configuration change on a newly set up server) and it is best to just restart without pre-filling the cache. Fixes: #10900.	2019-01-14 15:20:01 -08:00
Anders Kaseorg	a694c3cafd	scripts/restart-server: Avoid shelling out for ln. Signed-off-by: Anders Kaseorg <andersk@mit.edu>	2018-11-28 17:26:54 -08:00
Tim Abbott	5a56925495	restart-server: Fix restarting server with multiple tornado processes. Previously, we unconditionally tried to restart the Tornado process name corresponding to the historically always-true case of a single Tornado process. This resulted in Tornado not being automatically restarted on a production deployment on servers with more than one Tornado process configured.	2018-11-27 17:20:05 -08:00
Tim Abbott	1a0e9fe2f9	restart-server: Restart tornado early. This dramatically reduces the Tornado downtime when restarting a Zulip server, which is generally the most significant source of user-facing bad experiences.	2018-10-16 15:04:07 -07:00
Abhilash Verma	0e2322a322	logging: Show timestamp in UTC in non-django production scripts. Done in pair programming with @aero31aero. Fixes #9678.	2018-08-20 12:52:40 -07:00
Tim Abbott	a8e5551395	restart-server: Ensure we restart process-fts-updates. This is mostly important in that if you're running this as part of a follow-up to a failed upgrade, and you don't do this, process-fts-updates will be left not running, resulting in full-text search not updating.	2018-07-30 16:27:53 -07:00
Joshua Schmidlkofer	b1a57d144f	thumbor: Add production installer/puppet support. This commit adds the necessary puppet configuration and installer/upgrade code for installing and managing the thumbor service in production. This configuration is gated by the 'thumbor.pp' manifest being enabled (which is not yet the default), and so this commit should have no effect in a default Zulip production environment (or in the long term, in any Zulip production server that isn't using thumbor). Credit for this effort is shared by @TigorC (who initiated the work on this project), @joshland (who did a great deal of work on this and got it working during PyCon 2017) and @adnrs96, who completed the work.	2018-07-12 20:37:34 +05:30
rht	71188d7b0a	scripts: Remove import print_function.	2017-09-29 15:43:30 -07:00
Greg Price	a099e698e2	py3: Switch almost all shebang lines to use `python3`. This causes `upgrade-zulip-from-git`, as well as a no-option run of `tools/build-release-tarball`, to produce a Zulip install running Python 3, rather than Python 2. In particular this means that the virtualenv we create, in which all application code runs, is Python 3. One shebang line, on `zulip-ec2-configure-interfaces`, explicitly keeps Python 2, and at least one external ops script, `wal-e`, also still runs on Python 2. See discussion on the respective previous commits that made those explicit. There may also be some other third-party scripts we use, outside of this source tree and running outside our virtualenv, that still run on Python 2.	2017-08-16 17:54:43 -07:00
Umair Khan	336a041ac0	Django 1.10: Use uWSGI. Fixes: #1121 With some tweaks by tabbott to make the number of processes configurable.	2016-12-13 21:40:43 -08:00
Anders Kaseorg	207cf6302b	Always start python via shebang lines. This is preparation for supporting using Python 3 in production. Signed-off-by: Anders Kaseorg <andersk@mit.edu>	2016-11-26 14:46:37 -08:00
Taranjeet Singh	d606b95242	zulip_tools.py: Move zulip_tools.py in scripts/lib. This commit moves zulip_tools.py as part of cleaning the root directory and organizing the project into a better directory structure.	2016-08-15 16:44:50 -07:00
Tim Abbott	1158a86ae7	restart-server: Maintain a last symlink.	2016-08-04 17:02:48 -07:00
Eklavya Sharma	51ea5c1602	scripts/: Make subprocess calls unicode-aware.	2016-07-26 12:06:41 -07:00
Eklavya Sharma	11732f9ab0	Make all scripts in scripts/ pass mypy check.	2016-07-24 00:17:21 +05:30
Eklavya Sharma	94e4b39112	Replace python2.7 by python everywhere.	2016-05-29 05:03:08 -07:00
Eklavya Sharma	149938d468	Change shebangs from python2.7 to python.	2016-05-29 05:03:08 -07:00
Tim Abbott	6e1872987d	Move bin/get-django-setting to scripts/.	2016-05-07 19:37:06 -07:00
Eklavya Sharma	c59185e119	Apply Python 3 futurize transform libfuturize.fixes.fix_print_with_import Refer #256	2016-03-10 22:02:17 -08:00
Steven Oud	d5435fad1d	Consistently use /usr/bin/env python2.7 in shebangs and commands.	2015-10-21 22:58:21 +00:00
Tim Abbott	5b8894cd25	Rename USING_SSO to something more clear. (imported from commit 94e8ae84b01419783872a5d09bafe5c2eb933c18)	2015-08-18 20:48:15 -07:00
Tim Abbott	b2d01e2da0	[manual] restart-server: Minimize downtime for message sender worker. The manual step here is that we need to do the `puppet apply` before pushing this commit, or `restart-server` will crash. Previously we shut down everything in one group, which performed poorly with supervisor's bad performance on restarting many daemons at once. Now we shut down the unimportant stuff, then the important stuff, bring back the important stuff, and then bring back the unimportant stuff. This new model has a little over 5s of downtime for the core user-facing daemons -- which is still far more than would be ideal, but a lot less than the 13s or so that we had before. Here's some logs with the current setup for the tornado/django downtime: 2013-12-19 20:16:51,995 restart-server: Stopping daemons 2013-12-19 20:16:53,461 restart-server: Starting daemons 2013-12-19 20:16:57,146 restart-server: Starting workers Compare with the behavior on master today: 2013-12-19 20:21:45,281 restart-server: Stopping daemons 2013-12-19 20:21:49,225 restart-server: Starting daemons 2013-12-19 20:21:58,463 restart-server: Done! (imported from commit b2c1ba77f3dc989551d0939779208465a8410435)	2013-12-19 17:21:23 -05:00
Zev Benjamin	9b2aa657be	Revert "restart-server: Use 'all' instead of specifying the supervisor jobs to operate on explicitly" This reverts commit acef4c0027b77053497ef6e9f7aa4b61703205c3. Despite the lower total downtime, this caused more user-facing downtime. (imported from commit 5cce032bb20abe83853a65ee72bf0bb28af403cc)	2013-11-21 15:14:38 -05:00
Zev Benjamin	a363b7185d	restart-server: Use 'all' instead of specifying the supervisor jobs to operate on explicitly This shaves about 1.5 seconds off our restart time on ls-dev (9s -> 7.5s). Still too slow, but it's a little bit better. (imported from commit acef4c0027b77053497ef6e9f7aa4b61703205c3)	2013-11-15 15:23:02 -05:00
Zev Benjamin	974159ec94	Move apache2 restart for SSO sites to restart-server (imported from commit f999e2b0591a11442c1d3fdba2393ecf6e78bad3)	2013-11-15 11:34:48 -05:00
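
Commit 2a12fedcf1 above describes computing the number of Tornado processes from the `[tornado_sharding]` section of `zulip.conf` and validating that the ports are contiguous. The following is only a minimal sketch of that idea, not the code in `scripts/lib/sharding.py`: the function name `tornado_ports`, the use of `configparser`, the assumption that ports must start at 9800, and the fallback to a single port when the section is missing or empty are all illustrative assumptions.

```python
import configparser
from typing import List

BASE_PORT = 9800  # assumed first Tornado port, taken from the commit's example


def tornado_ports(config_text: str) -> List[int]:
    """Return the sorted Tornado ports implied by a [tornado_sharding] section.

    Hypothetical helper: a missing or empty section is treated as a single
    process on BASE_PORT, and any gap in the port sequence is an error.
    """
    parser = configparser.RawConfigParser()
    parser.read_string(config_text)

    if not parser.has_section("tornado_sharding"):
        return [BASE_PORT]

    # Each key in the section is a port number; the values (realm names)
    # are ignored here because only the process count matters.
    ports = sorted(int(key) for key in parser.options("tornado_sharding"))
    if not ports:
        return [BASE_PORT]

    # Contiguity check: ports must be exactly BASE_PORT, BASE_PORT + 1, ...
    expected = list(range(BASE_PORT, BASE_PORT + len(ports)))
    if ports != expected:
        raise ValueError(f"Tornado ports must be contiguous from {BASE_PORT}: {ports}")
    return ports


if __name__ == "__main__":
    example = "[tornado_sharding]\n9800 = # default\n9801 = other_realm\n"
    # The number of Tornado processes is the number of configured ports.
    print(len(tornado_ports(example)))
```

Under these assumptions, the process count is simply the length of the returned list, so no separate `tornado_processes` setting is needed.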


54 Commits