zulip


Author	SHA1	Message	Date
Alex Vandiver	31d80a77d4	puppet: Update nagios check_postgres_replication_lag to be on DB hosts. `7d4a370a57` attempted to move the replication check onto the PostgreSQL hosts. While it updated the _check_ to assume it was running on, and talking to, a local PostgreSQL instance, the configuration and installation for the check were not updated. As such, the check ran on the nagios host for each DB host, and produced no output. Start distributing the check to all appdb hosts, and configure nagios to use the SSH tunnel to get there.	2020-07-14 16:27:18 -07:00
Alex Vandiver	6c27f07c1d	puppet: Move PostgreSQL backups to their own class. wal-g was used in `puppet/zulip` by env-wal-g, but only installed in `puppet/zulip_ops`. Merge all of the dependencies of doing backups using wal-g (wal-g installation, the pg_backup_and_purge job, the nagios plugin that verifies it happens) into a common base class in `puppet/zulip`, since it is generally useful.	2020-07-14 00:40:25 -07:00
Alex Vandiver	3691a94efe	puppet: Configure munin and nagios under apache with puppet. This swaps in the actually-in-use munin configuration file; otherwise, it is an implementation of the configuration as it exists on the machine.	2020-07-13 13:23:11 -07:00
Alex Vandiver	7c7b5fcd6f	munin: Deal with spaces in the channel names.	2020-07-13 12:49:28 -07:00
Alex Vandiver	ddc7bb5a45	munin: Fix the path to check_send_receive_time.	2020-07-13 12:49:28 -07:00
Alex Vandiver	8be544e7eb	munin: Rename monitoring plugin to use zulip name, not humbug.	2020-07-13 12:49:28 -07:00
Alex Vandiver	a4e7c7a27e	nagios: Remove check_memcached. check_memcached does not support memcached authentication even in its latest release (it’s in a TODO item comment, and that’s it), and was never particularly useful.	2020-07-10 00:12:48 -07:00
Alex Vandiver	a21a086f5c	puppet: nagios-plugins-basic is replaced by monitoring-plugins-basic. In Bionic, nagios-plugins-basic is a transitional package which depends on monitoring-plugins-basic. In Focal, it is a virtual package, which means that every time puppet runs, it tries to re-install the nagios-plugins-basic package. Switch all instances to referring to `$zulip::common::nagios_plugins`, and repoint that to monitoring-plugins-basic.	2020-06-29 14:58:01 -07:00
Alex Vandiver	876ee4a8ed	installer: Remove code specific to stretch or xenial. Support for Xenial and Stretch was removed (`5154ddafca`, `0f4b1076ad`, `8944e0ad53`, `79acd5ae40`, `1219a2e854`), but not all codepaths were updated to remove their conditionals on them. Remove all code predicated on Xenial or Stretch. debathena support was migrated to Bionic, since that appears to be the current state of existing debathena servers.	2020-06-24 12:57:38 -07:00
Alex Vandiver	5f433d6eeb	puppet: Remove vestigial check_postgres.pl. `65774e1c4f` switched from using the bundled check_postgres.pl to using the version from packages; the file itself remained, however. Remove it, and clean up references to it. Fixes #15389.	2020-06-15 16:18:07 -07:00
Alex Vandiver	7d4a370a57	puppet: Move monitoring of pg replication to the pg hosts. Instead of SSH'ing around to them, run directly on the database hosts. This means that the replicas do not know how many bytes behind they are in _receiving_ the WAL logs; thus, the monitoring also extends to the primary database, which knows that information for each replica. This also allows for detecting when there are too few active replicas.	2020-06-15 16:18:07 -07:00
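The primary-side half of this check can be sketched in Python. This is a minimal illustration, not the Zulip check itself: it assumes rows shaped like PostgreSQL's `pg_stat_replication` view (`application_name`, `flush_lsn`) plus the primary's current WAL position, and the thresholds `max_lag_bytes` and `min_replicas` are hypothetical. An LSN written as `X/Y` is the high and low 32-bit halves, in hexadecimal, of a 64-bit byte position.

```python
from typing import List, Mapping, Sequence, Tuple


def lsn_to_bytes(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def check_replicas(
    current_lsn: str,
    replicas: Sequence[Mapping[str, str]],
    max_lag_bytes: int = 16 * 1024 * 1024,  # hypothetical threshold
    min_replicas: int = 1,  # hypothetical threshold
) -> Tuple[bool, List[str]]:
    """Judge replication health from the primary's point of view.

    `current_lsn` is the primary's current WAL position; each entry in
    `replicas` stands in for one row of pg_stat_replication.
    """
    problems: List[str] = []
    head = lsn_to_bytes(current_lsn)
    for row in replicas:
        # Bytes of WAL this replica has not yet flushed to its own disk.
        lag = head - lsn_to_bytes(row["flush_lsn"])
        if lag > max_lag_bytes:
            problems.append(f"{row['application_name']} is {lag} bytes behind")
    # Only the primary can notice that replicas are missing entirely.
    if len(replicas) < min_replicas:
        problems.append(f"only {len(replicas)} active replicas")
    return (not problems, problems)
```

On a live primary, PostgreSQL can compute the same byte difference itself with `pg_wal_lsn_diff(pg_current_wal_lsn(), flush_lsn)`.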
Anders Kaseorg	74c17bf94a	python: Convert more percent formatting to Python 3.6 f-strings. Generated by pyupgrade --py36-plus. Now including %d, %i, %u, and multi-line strings. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-14 23:27:22 -07:00
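The kind of rewrite described here can be shown as a before/after pair (the function names and strings are hypothetical, not taken from the Zulip codebase). Adjacent string literals are joined before `%` is applied, so a multi-line template formats as one string; in the f-string version, each literal that interpolates needs its own `f` prefix.

```python
def summary_percent(user: str, unread: int, streams: int) -> str:
    # Before: printf-style formatting; %i and %u behave like %d in Python 3.
    return (
        "%s has %d unread messages "
        "in %i streams" % (user, unread, streams)
    )


def summary_fstring(user: str, unread: int, streams: int) -> str:
    # After: the equivalent f-string, one f prefix per interpolating literal.
    return (
        f"{user} has {unread} unread messages "
        f"in {streams} streams"
    )
```

One caveat when converting by hand: the two forms agree only when the value is already an `int`, since `"%d" % 2.5` gives `"2"` while `f"{2.5}"` gives `"2.5"`.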
Anders Kaseorg	91a86c24f5	python: Replace None defaults with empty collections where appropriate. Use read-only types (List ↦ Sequence, Dict ↦ Mapping, Set ↦ AbstractSet) to guard against accidental mutation of the default value. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-13 15:31:27 -07:00
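A small illustration of the pattern, with hypothetical function names. Default values are evaluated once and shared across calls, so an empty `[]` or `{}` default is only safe if nothing mutates it; annotating it as `Sequence`, `Mapping`, or `AbstractSet` lets a type checker such as mypy reject mutation inside the function.

```python
from typing import AbstractSet, List, Mapping, Optional, Sequence


def recipients_old(emails: Optional[List[str]] = None) -> str:
    # Before: a None sentinel, replaced with a fresh list on every call.
    if emails is None:
        emails = []
    return ", ".join(emails)


def recipients(
    emails: Sequence[str] = [],
    names: Mapping[str, str] = {},
    muted: AbstractSet[str] = set(),
) -> str:
    # After: empty defaults. These objects are created once and shared by
    # every call, which is safe only because nothing mutates them; the
    # read-only annotations let mypy reject emails.append(...) or
    # names[...] = ... inside this function.
    shown = [names.get(email, email) for email in emails if email not in muted]
    return ", ".join(shown)
```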
Alex Vandiver	55bd31721d	puppet: Remove custom `vm.dirty_ratio` and `vm.dirty_background_ratio`. These values differed between the primary and secondary database hosts, for unclear reasons. The differences date back to their introduction in `387f63deaa`. As the comment in the replica configuration notes, settings of `vm.dirty_ratio = 10` and `vm.dirty_background_ratio = 5` matched the kernel defaults for "newer" kernels; however, kernel 2.6.30 bumped those to 20 and 10, respectively[1], because the underlying logic had become more correct. Remove these overrides; they should at the very least be consistent across roles, and the previous values look to be an attempt to tune for a much older version of the Linux kernel, which used a different, buggier algorithm under the hood. [1] `1b5e62b42b`	2020-06-12 14:57:46 -07:00
Alex Vandiver	f39816e768	puppet: Stop distributing recovery.conf file. This file controls streaming replication, and recovery using wal-g on the secondary. The `primary_conninfo` data needs to change on short notice when database failover happens, in a way that is not suitable for being controlled by puppet. PostgreSQL 12, in fact, removes the use of the `recovery.conf` file[1]; the `primary_conninfo` and `restore_command` information goes into the main `postgresql.conf` file, and the standby status is controlled by the presence or absence of an empty `standby.signal` file. Remove the puppet control of the `recovery.conf` file. [1] https://pgstef.github.io/2018/11/26/postgresql12_preview_recovery_conf_disappears.html	2020-06-12 14:57:46 -07:00
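The PostgreSQL 12 signal-file convention mentioned above can be sketched as a file-system check. The helper names are hypothetical, the sketch only inspects the data directory (it does not talk to PostgreSQL), and it ignores `recovery.signal`, which PostgreSQL 12 uses for targeted recovery rather than standby mode.

```python
import os


def server_role(pgdata: str) -> str:
    """Classify a PostgreSQL 12+ data directory by its signal file.

    A server starts as a standby when a standby.signal file exists in its
    data directory; primary_conninfo and restore_command are read from
    postgresql.conf rather than recovery.conf.
    """
    if os.path.exists(os.path.join(pgdata, "standby.signal")):
        return "standby"
    return "primary"


def mark_as_standby(pgdata: str) -> None:
    # An empty file is all that is needed; append mode leaves an
    # existing file untouched, so calling this twice is harmless.
    with open(os.path.join(pgdata, "standby.signal"), "a"):
        pass
```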
Alex Vandiver	316498a169	puppet: Remove unnecessary nagios authentication setup. Since the nagios authentication is stored _in the database_, it is unnecessary to run if the database is simply a replica of the production database. The only case in which this statement would have an effect is if the postgres node contains a _different_ (or empty) database, which `setup_disks` now effectively prevents. Remove the unnecessary step.	2020-06-11 21:01:49 -07:00
Alex Vandiver	1dc2de5026	puppet: Update setup-disks to be idempotent. The end state it produces is _either_:
- `/srv/postgresql` already existed, which was symlinked into `/var/lib/postgresql`; postgres is left untouched. This is the situation if `setup_disks` is run on the database primary, or a replica which was correctly configured.
- An empty `/srv/postgresql` now exists, symlinked into `/var/lib/postgresql`, and postgres is stopped. This is the situation if `puppet` was just run on a new host, or a previously-configured host was rebooted (clearing the temporary disk in `/dev/nvme0`).
In the latter case, where `/srv/postgresql` is now empty, any previous contents of `/var/lib/postgresql` are placed under `/root`, timestamped for uniqueness. In either case, the tool should now be idempotent.	2020-06-11 21:01:49 -07:00
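The two end states can be sketched in Python to make the idempotence concrete. This is a hypothetical rendering, not the actual setup-disks script: paths are parameters so it can run against a scratch directory, `stop_postgres` is a stand-in callback, and it assumes "symlinked into" means `/var/lib/postgresql` is itself a link to `/srv/postgresql`.

```python
import os
import shutil
import time
from typing import Callable


def setup_disks(
    srv_dir: str,
    var_lib_dir: str,
    backup_root: str,
    stop_postgres: Callable[[], None],
) -> str:
    """Drive the data directory into one of the two end states.

    Returns "untouched" or "fresh" to report which end state applied.
    """
    if os.path.isdir(srv_dir):
        # End state 1: the data already lives in srv_dir (the primary, or a
        # correctly configured replica). Only restore the symlink if it is
        # missing; PostgreSQL is left alone.
        if not os.path.lexists(var_lib_dir):
            os.symlink(srv_dir, var_lib_dir)
        return "untouched"

    # End state 2: a new host, or a reboot wiped the temporary disk.
    stop_postgres()
    os.makedirs(srv_dir)
    if os.path.islink(var_lib_dir):
        # A link left over from before the disk was wiped; nothing to keep.
        os.unlink(var_lib_dir)
    elif os.path.exists(var_lib_dir):
        # Keep the previous contents, timestamped for uniqueness.
        stamp = time.strftime("%Y%m%d-%H%M%S")
        shutil.move(var_lib_dir, os.path.join(backup_root, f"postgresql-{stamp}"))
    os.symlink(srv_dir, var_lib_dir)
    return "fresh"
```

Running it a second time finds `srv_dir` already present and changes nothing, which is the idempotence property the commit describes.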
Anders Kaseorg	365fe0b3d5	python: Sort imports with isort. Fixes #2665. Regenerated by tabbott with `lint --fix` after a rebase and change in parameters. Note from tabbott: In a few cases, this converts technical debt in the form of unsorted imports into different technical debt in the form of our largest files having very long, ugly import sequences at the start. I expect this change will increase pressure for us to split those files, which isn't a bad thing. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-11 16:45:32 -07:00
Anders Kaseorg	69730a78cc	python: Use trailing commas consistently. Automatically generated by the following script, based on the output of lint with flake8-comma:
```python
import re
import sys

last_filename = None
last_row = None
lines = []

for msg in sys.stdin:
    m = re.match(
        r"\x1b\[35mflake8 \|\x1b\[0m \x1b\[1;31m(.+):(\d+):(\d+): (\w+)", msg
    )
    if m:
        filename, row_str, col_str, err = m.groups()
        row, col = int(row_str), int(col_str)

        if filename == last_filename:
            assert last_row != row
        else:
            if last_filename is not None:
                with open(last_filename, "w") as f:
                    f.writelines(lines)

            with open(filename) as f:
                lines = f.readlines()
            last_filename = filename
        last_row = row

        line = lines[row - 1]
        if err in ["C812", "C815"]:
            lines[row - 1] = line[: col - 1] + "," + line[col - 1 :]
        elif err in ["C819"]:
            assert line[col - 2] == ","
            lines[row - 1] = line[: col - 2] + line[col - 1 :].lstrip(" ")

if last_filename is not None:
    with open(last_filename, "w") as f:
        f.writelines(lines)
```
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2020-06-11 16:04:12 -07:00
Alex Vandiver	b114eb2f10	puppet: Rename env-wal-e to env-wal-g. It runs wal-g now, not wal-e; make its name respect that.	2020-06-11 15:52:43 -07:00
Anders Kaseorg	67e7a3631d	python: Convert percent formatting to Python 3.6 f-strings. Generated by pyupgrade --py36-plus. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-10 15:02:09 -07:00
Alex Vandiver	8b1d49dbc7	puppet: Rename "wiki" realm to "monitoring". This is vestigial. It requires manually altering the `htdigest` file (not stored in this repo) to change the digest realm from `wiki` to `monitoring`, and will re-prompt users for their passwords if the browsers currently store them.	2020-05-30 12:26:21 -07:00
Alex Vandiver	b33aa8da7f	postgresql: Update setup-disks to use `service postgresql`. Using `service postgresql` makes it no longer linked to the specific version/cluster that is on the host.	2020-05-30 12:14:24 -07:00
Alex Vandiver	4e370cda75	postgresql: Update setup-disks to drop /mnt disabling. Hosts do not start out with a `/mnt`; there is no need to disable it.	2020-05-30 12:14:24 -07:00
Alex Vandiver	a7d85b7e69	postgresql: Update setup-disks to not move /tmp. Drop the change to move `/tmp` onto the local disk. Doing this move confuses `resolved` until there is a restart, and has no clear benefits. The change came in during `bf82fadc95`, but does not describe the reasoning; it is particularly puzzling, since postgresql stores its temporary files under `$PGDATA/base/pgsql_tmp`.	2020-05-30 12:14:24 -07:00
Alex Vandiver	481613a344	postgresql: Update setup-disks to not use RAID. Do not RAID the disks together. This was previously done when they were spinning media, for reliability; running them on an SSD obviates this sufficiently. This means that updating the initramfs is also not necessary.	2020-05-30 12:14:24 -07:00
Alex Vandiver	b537563bc1	postgresql: Set the current primary host.	2020-05-30 12:14:24 -07:00
Alex Vandiver	ad2918ea51	puppet: Remove `postgres_other` nagios hostgroup. This no longer has any rules specific to it. We leave the `postgres` munin group (which now only contains `postgres_appdb`) as future-proofing, and so that `postgres_appdb` matches the puppet manifest of the same name.	2020-05-28 17:24:35 -07:00
Alex Vandiver	2c73fbdcb6	puppet: Remove munin monitoring for no-longer-used "postgres_other". The `wiki` and `trac` products are no longer used.	2020-05-28 17:24:35 -07:00
Anders Kaseorg	f5b33f9398	python: Further pyupgrade changes. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-05-26 11:43:40 -07:00
Tim Abbott	220620e7cf	sharding: Add basic sharding configuration for Tornado. This allows straight-forward configuration of realm-based Tornado sharding through simply editing /etc/zulip/zulip.conf to configure shards and running scripts/refresh-sharding-and-restart. Co-Author-By: Mateusz Mandera <mateusz.mandera@zulip.com>	2020-05-20 13:47:20 -07:00
Mateusz Mandera	dd40649e04	queue_processors: Remove the slow_queries queue. While this functionality to post slow queries to a Zulip stream was very useful in the early days of Zulip, when there were only a few hundred accounts, it has long since been useless, since (1) the total request volume on larger Zulip servers run by Zulip developers is far too high for that to be useful, and (2) other server operators don't want real-time notifications of slow backend queries. The right structure for this is just a log file. We get rid of the queue and replace it with a "zulip.slow_queries" logger, which will still log to /var/log/zulip/slow_queries.log for ease of access to this information and propagate to the other logging handlers. Reducing the number of queues is good for lowering Zulip's memory footprint and improving restart performance, since we run at least one dedicated queue worker process for each one in most configurations.	2020-05-11 00:45:13 -07:00
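A dedicated logger that writes to its own file and still propagates can be expressed with Python's standard `logging.config.dictConfig`, the same dictionary format Django's `LOGGING` setting uses. This is a trimmed, hypothetical configuration rather than Zulip's actual settings: the handler and formatter names are invented, and the log path is a parameter.

```python
import logging
import logging.config


def configure_logging(slow_query_log: str) -> None:
    logging.config.dictConfig({
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {
            "default": {"format": "%(asctime)s %(levelname)-8s %(message)s"},
        },
        "handlers": {
            # Hypothetical handler name; writes only slow-query records.
            "slow_queries_file": {
                "class": "logging.FileHandler",
                "filename": slow_query_log,
                "formatter": "default",
            },
        },
        "loggers": {
            "zulip.slow_queries": {
                "level": "INFO",
                "handlers": ["slow_queries_file"],
                # Also hand records up to the "zulip" and root loggers.
                "propagate": True,
            },
        },
    })
```

Code that previously enqueued a slow-query event would instead call `logging.getLogger("zulip.slow_queries").info(...)`, with no queue worker process involved.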
Anders Kaseorg	c0ffa71fa9	nginx: Replace unanchored regexes in location directives. We could anchor the regexes, but there’s no need for the power (and responsibility) of regexes here. Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2020-04-24 16:58:19 -07:00
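Why unanchored regexes in `location` blocks are risky can be shown with Python's `re.search` standing in for nginx's regex matching; the location pattern and URIs below are hypothetical, not Zulip's nginx configuration. A regex location (`location ~ pattern`) matches anywhere in the URI and treats `.` as any character, while a plain prefix location only matches at the start.

```python
import re

# Hypothetical location pattern and request URIs, for illustration only.
LOCATION = "/static/"
URIS = ["/static/app.js", "/user_uploads/static/page.html", "/api/v1/static/"]


def regex_location_matches(pattern: str, uri: str) -> bool:
    # Like nginx's `location ~ pattern`: an unanchored regex can match
    # anywhere in the URI, and "." matches any character.
    return re.search(pattern, uri) is not None


def prefix_location_matches(prefix: str, uri: str) -> bool:
    # Like nginx's plain `location prefix`: only a leading match counts.
    return uri.startswith(prefix)
```

Only the first URI is the intended match, but the regex version accepts all three; nginx also offers `location = /uri` when an exact match is wanted.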
Anders Kaseorg	5e01a0ae8b	zulip-ec2-configure-interfaces: Convert function type annotations. Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2020-04-24 13:06:54 -07:00
Anders Kaseorg	f8339f019d	python: Convert assignment type annotations to Python 3.6 style. Commit split by tabbott; this has changes to scripts/, tools/, and puppet/. scripts/lib/hash_reqs.py, scripts/lib/setup_venv.py, scripts/lib/zulip_tools.py, and tools/lib/provision.py are excluded so tools/provision still gives the right error message on Ubuntu 16.04 with Python 3.5. Generated by com2ann, with whitespace fixes and various manual fixes for runtime issues:
```diff
-shebang_rules: List[Rule] = [
+shebang_rules: List["Rule"] = [
-trailing_whitespace_rule: Rule = {
+trailing_whitespace_rule: "Rule" = {
-whitespace_rules: List[Rule] = [
+whitespace_rules: List["Rule"] = [
-comma_whitespace_rule: List[Rule] = [
+comma_whitespace_rule: List["Rule"] = [
-prose_style_rules: List[Rule] = [
+prose_style_rules: List["Rule"] = [
-html_rules: List[Rule] = whitespace_rules + prose_style_rules + [
+html_rules: List["Rule"] = whitespace_rules + prose_style_rules + [
- target_port: int = None
+ target_port: int
```
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2020-04-24 13:06:54 -07:00
Aman Agrawal	2dc6d09c2a	python3-upgrade: Move python2 scripts to run on python3.	2020-04-22 16:13:15 -07:00
Anders Kaseorg	5901e7ba7e	python: Convert function type annotations to Python 3 style. Generated by com2ann (slightly patched to avoid also converting assignment type annotations, which require Python 3.6), followed by some manual whitespace adjustment, and six fixes for runtime issues:
```diff
- def __init__(self, token: Token, parent: Optional[Node]) -> None:
+ def __init__(self, token: Token, parent: "Optional[Node]") -> None:
-def main(options: argparse.Namespace) -> NoReturn:
+def main(options: argparse.Namespace) -> "NoReturn":
-def fetch_request(url: str, callback: Any, kwargs: Any) -> Generator[Callable[..., Any], Any, None]:
+def fetch_request(url: str, callback: Any, kwargs: Any) -> "Generator[Callable[..., Any], Any, None]":
-def assert_server_running(server: subprocess.Popen[bytes], log_file: Optional[str]) -> None:
+def assert_server_running(server: "subprocess.Popen[bytes]", log_file: Optional[str]) -> None:
-def server_is_up(server: subprocess.Popen[bytes], log_file: Optional[str]) -> bool:
+def server_is_up(server: "subprocess.Popen[bytes]", log_file: Optional[str]) -> bool:
- method_kwarg_pairs: List[FuncKwargPair],
+ method_kwarg_pairs: "List[FuncKwargPair]",
```
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2020-04-18 20:42:48 -07:00
Tim Abbott	e1ce53ac46	puppet: Update nagios checks for disk to exclude kernel filesystems. The fact that we have to explicitly list these is almost certainly a bug in check_disk, but at least this works.	2020-04-16 17:49:29 -07:00
Tim Abbott	cfbb617f5c	puppet: Update nagios configuration for checking local disk.	2020-04-16 17:48:36 -07:00
Tim Abbott	8e5a866122	puppet: Update tuning for load average monitoring.	2020-04-16 16:47:05 -07:00
Anders Kaseorg	c734bbd95d	python: Modernize legacy Python 2 syntax with pyupgrade. Generated by `pyupgrade --py3-plus --keep-percent-format` on all our Python code except `zthumbor` and `zulip-ec2-configure-interfaces`, followed by manual indentation fixes. Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2020-04-09 16:43:22 -07:00
Vishnu KS	449f7e2d4b	team: Generate team page data using cron job. This eliminates the contributors data as a possible source of flakiness when installing Zulip from Git. Fixes #14351.	2020-04-08 12:52:31 -07:00
Stefan Weil	d2fa058cc1	text: Fix some typos (most of them found and fixed by codespell). Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-03-27 17:25:56 -07:00
Anders Kaseorg	7ff9b22500	docs: Convert many http URLs to https. Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2020-03-26 21:35:32 -07:00
Anders Kaseorg	687553a661	setup_path_on_import: Replace with setup_path function. isort 5 knows not to reorder imports across function calls, so this will stop isort from breaking our code. Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2020-02-25 15:40:21 -08:00
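The difference between an import-time side effect and an explicit call can be sketched with a hypothetical helper; Zulip's real `setup_path` may differ in what it adds to the path and in its signature. What matters is that the path change happens in a visible function call, which isort 5 treats as a barrier it will not reorder imports across.

```python
import sys


def setup_path(root: str) -> None:
    """Hypothetical helper: put `root` at the front of sys.path, once."""
    if root not in sys.path:
        sys.path.insert(0, root)


# Usage pattern in a script (module names hypothetical):
#     from mylib.setup_path import setup_path
#     setup_path("/srv/myapp")
#     import myapp.settings  # stays below the call after isort runs
```

With the older import-for-side-effect style, isort could legitimately move `import myapp.settings` above the import that set up the path, breaking the script.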
Mateusz Mandera	4c5a8e6f0c	queue: Remove missedmessage_email_senders.	2020-01-31 12:13:51 -08:00
Tim Abbott	dd969b5339	install: Remove references to "Zulip Voyager". "Zulip Voyager" was a name invented during the Hack Week to open source Zulip for what a single-system Zulip server might be called, as a Star Trek pun on the code it was based on, "Zulip Enterprise". At the time, we just needed a name quickly, but it was never a good name, just a placeholder. This removes that placeholder name from much of the codebase. A bit more work will be required to transition the `zulip::voyager` Puppet class, as that has some migration work involved.	2020-01-30 12:40:41 -08:00
Tim Abbott	d70e799466	bots: Remove FEEDBACK_BOT implementation. This legacy cross-realm bot hasn't been used in several years, as far as I know. If we wanted to re-introduce it, I'd want to implement it as an embedded bot using those common APIs, rather than the totally custom hacky code used for it that involves unnecessary queue workers and similar details. Fixes #13533.	2020-01-25 22:41:39 -08:00
Anders Kaseorg	ea6934c26d	dependencies: Remove WebSockets system for sending messages. Zulip has had a small use of WebSockets (specifically, for the code path of sending messages, via the webapp only) since ~2013. We originally added this use of WebSockets in the hope that the latency benefits of doing so would allow us to avoid implementing a markdown local echo; they did not. Further, HTTP/2 may have eliminated the latency difference we hoped to exploit by using WebSockets in any case. While we’d originally imagined using WebSockets for other endpoints, there was never a good justification for moving more components to the WebSockets system. This WebSockets code path had a lot of downsides/complexity, including:
* The messy hack involving constructing an emulated request object to hook into doing Django requests.
* The `message_senders` queue processor system, which increases RAM needs and must be provisioned independently from the rest of the server.
* A duplicate check_send_receive_time Nagios test specific to WebSockets.
* The requirement for users to have their firewalls/NATs allow WebSocket connections, and a setting to disable them for networks where WebSockets don’t work.
* Dependencies on the SockJS family of libraries, which has at times been poorly maintained, and periodically throws random JavaScript exceptions in our production environments without a deep enough traceback to effectively investigate.
* A total of about 1600 lines of our code related to the feature.
* Increased load on the Tornado system, especially around a Zulip server restart, and especially for large installations like zulipchat.com, resulting in extra delay before messages can be sent again.
As detailed in https://github.com/zulip/zulip/pull/12862#issuecomment-536152397, it appears that removing WebSockets moderately increases the time it takes for the `send_message` API query to return from the server, but does not significantly change the time between when a message is sent and when it is received by clients. We don’t understand the reason for that change (suggesting the possibility of a measurement error), and even if it is a real change, we consider that potential small latency regression to be acceptable. If we later want WebSockets, we’ll likely want to just use Django Channels. Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2020-01-14 22:34:00 -08:00
Tim Abbott	f84c037225	puppet: Tune check_postgres_locks parameters. This has been a spurious alert for a long time. It's unclear that this check is useful at all, but if it spikes dramatically above what's normal, there's perhaps still utility in being alerted.	2019-10-23 15:04:38 -07:00


182 Commits