zulip

Commit Graph

Author	SHA1	Message	Date
Ganesh Pawar	65e23dd713	puppet: Add Zulip specific postgresql configuration for 13. Based on the work done in `a03e4784c7`.	2021-02-05 09:30:34 -08:00
Ganesh Pawar	90a3dc8a91	puppet: Add upstream version of postgresql 13 config. This is a prep commit to add provision support for Ubuntu 20.10 Groovy.	2021-02-05 09:30:34 -08:00
Tim Abbott	fd8504e06b	munin: Update to use NAGIOS_BOT_HOST. We haven't actively used this plugin in years, and so it was never converted from the 2014-era monitoring to detect the hostname. This seems worth fixing since we may want to migrate this logic to a more modern monitoring system, and it's helpful to have it correct.	2021-01-27 12:07:09 -08:00
Alex Vandiver	ab035f76de	puppet: Be more restrictive about mm addresses. These will always have only 32 characters after the `mm`.	2021-01-26 10:13:58 -08:00
Alex Vandiver	a53092687e	puppet: Only match incoming gateway address on our mail domain. `79931051bd` allows outgoing emails from localhost, but outgoing recipients are still subjected to virtualmaps. This caused all outgoing email from Zulip with destination addresses containing `.`, `+`, or starting with `mm`, to be redirected back through the email gateway. Bracket the virtualmap addresses used for local delivery to the mail gateway with a restriction on the domain matching the `postfix.mailname` configuration, regex-escaped, so those only apply to email destined for that domain. The hostname is _not_ moved from `mydestination` to `virtual_alias_domains`, as that would preclude delivery to actually-local addresses, like `postmaster@`.	2021-01-26 10:13:58 -08:00
Alex Vandiver	c2526844e9	worker: Remove SignupWorker and friends. ZULIP_FRIENDS_LIST_ID and MAILCHIMP_API_KEY are not currently used in production. This removes the unused 'signups' queue and worker.	2021-01-17 11:16:35 -08:00
Tim Abbott	4ee58f408b	process_fts_updates: Make normal development startup silent. We run this tool at DEBUG log level in production, so we will still see the notice on startup there; this avoids a spammy line in the development environment output.	2020-12-20 12:19:49 -08:00
Sutou Kouhei	0d3f9fc855	install: Use PGroonga packages built for PostgreSQL packages by PGDG. We have always used PostgreSQL packages by PGDG since Zulip 3.0. Fixes #16058.	2020-12-18 15:38:21 -08:00
Alex Vandiver	4868a4fe48	puppet: Set a long timeout on wal-g wal-push, to prevent stalls. `wal-g wal-push` has a known bug with occasionally hanging after file upload to S3[1]; set a rather long timeout on the upload process, so that we don't simply stall forever when archiving WAL segments. [1] https://github.com/wal-g/wal-g/issues/656	2020-11-20 11:32:36 -08:00
Sourabh Rana	419f163906	nginx: Increase file upload size from 25mb to 80mb.	2020-11-19 00:49:49 -08:00
Alex Vandiver	90ca06d873	puppet: Allow unattended upgrades of -updates in addition to -security. This ensures that software will be fully up-to-date, not just with security patches.	2020-11-13 16:45:05 -08:00
Alex Vandiver	2e20ab1658	puppet: Log the "Host" header and total response time. Logging `Host` is useful for determining access patterns to realms, especially if ROOT_DOMAIN_LANDING_PAGE is set. Total response time is useful in debugging access and performance patterns.	2020-11-13 16:42:32 -08:00
Tim Abbott	494a685827	puppet: Fix typo in name of missedmessage_emails consumer. This has been present since this check was introduced in `45c9c3cc30`.	2020-10-29 12:28:54 -07:00
Tim Abbott	ab3cb2b3bf	puppet: Fix internal redis puppet configuration. The inherits rule is required for overriding existing configuration files; while the `::profile` piece was missed in the recent ::profile migration.	2020-10-29 11:53:43 -07:00
Alex Vandiver	6b9d7000b5	puppet: Set proxy environment variables. These are respected by `urllib`, and thus also `requests`. We set `HTTP_proxy`, not `HTTP_PROXY`, because the latter is ignored in situations which might be running under CGI -- in such cases it may be coming from the `Proxy:` header in the request.	2020-10-28 12:17:35 -07:00
Alex Vandiver	8b0f32ee07	puppet: Move environment-setting into configuration, not command.	2020-10-28 12:13:04 -07:00
Alex Vandiver	b9797770d3	provision: Rename backup directory to postgresql.	2020-10-28 11:57:03 -07:00
Alex Vandiver	1f7132f50d	docs: Standardize on PostgreSQL, not Postgres.	2020-10-28 11:55:16 -07:00
Alex Vandiver	eaa99359b1	puppet: Rename to check_postgresql_replication_lag.	2020-10-28 11:51:52 -07:00
Alex Vandiver	53e59a0a13	puppet: Rename check_postgres_backup to check_postgresql_backup.	2020-10-28 11:51:52 -07:00
Alex Vandiver	45f6c79c4a	puppet: Rename postgres_ variables to postgresql_.	2020-10-28 11:51:52 -07:00
Alex Vandiver	e124324050	puppet: Rename postgres_appdb in nagios to postgresql.	2020-10-28 11:51:52 -07:00
Alex Vandiver	a155430eb5	docs: Document all zulip.conf settings. This provides a single reference point for all zulip.conf settings; these mostly link out to the more complete documentation about each setting, elsewhere. Fixes #12490.	2020-10-27 13:31:57 -07:00
Alex Vandiver	e81bc19e45	puppet: Remove shims for old classes, except dockervoyager. The upgrade mechanism in the previous commit negates the need for them -- with the exception of dockervoyager.	2020-10-27 13:29:19 -07:00
Alex Vandiver	d24c571bab	puppet: Automatically back up the database if we have the secrets. This avoids folks having to manually add to the puppet_classes.	2020-10-27 13:29:19 -07:00
Alex Vandiver	e7798d2797	puppet: Move zulip_ops::profile::postgres_appdb to postgresql.	2020-10-27 13:29:19 -07:00
Alex Vandiver	9f25389bff	puppet: Move top-level zulip_ops deployments to zulip_ops::profile.	2020-10-27 13:29:19 -07:00
Alex Vandiver	5365af544a	puppet: Rename zulip::profile::rabbit to ::rabbitmq.	2020-10-27 13:29:19 -07:00
Alex Vandiver	188af57296	puppet: Rename postgres_appdb to postgresql. There is only one PostgreSQL database; the "appdb" is irrelevant. Also use "postgresql," as it is the name of the software, whereas "postgres" is the name of the binary and the colloquial name. This is minor cleanup, but enabled by the other renames in the previous commit.	2020-10-27 13:29:19 -07:00
Alex Vandiver	91cb0988e1	puppet: Generalize docker detection. This also has the benefit of detecting zulip::dockervoyager as well as zulip::profile::docker.	2020-10-27 13:29:19 -07:00
Alex Vandiver	0f25acc7b3	puppet: Rename "voyager"/"dockervoyager" to "standalone"/"docker". The "voyager" name is non-intuitive and not significant. `zulip::voyager` and `zulip::dockervoyager` stubs are kept for back-compatibility with existing `zulip.conf` files.	2020-10-27 13:29:19 -07:00
Alex Vandiver	c2185a81d6	puppet: Move top-level zulip deployments into "profile" directory. This moves the puppet configuration closer to the "roles and profiles method"[1] which is suggested for organizing puppet classes. Notably, here it makes clear which classes are meant to be able to stand alone as deployments. Shims are left behind at the previous names, for compatibility with existing `zulip.conf` files when upgrading. [1] https://puppet.com/docs/pe/2019.8/the_roles_and_profiles_method	2020-10-27 13:29:19 -07:00
Alex Vandiver	27cfb14d92	puppet: Only include zulip::base for top-level deploys. This also removes direct includes of `zulip::common`, making `zulip::base` gatekeep the inclusion of it. This helps enforce that any top-level deploy only needs to include a single class, and that any configuration which is not meant to be deployed by itself will not apply, due to lack of `zulip::common` include. The following commit will better differentiate these top-level deploys by moving them into a subdirectory.	2020-10-27 13:29:19 -07:00
Alex Vandiver	34e8c2c61e	puppet: Move total_memory_mb from zulip::base into zulip::common. This makes `zulip::common` used only for variable-setting, and `zulip::base` used only for resource creation.	2020-10-27 13:29:19 -07:00
Alex Vandiver	7bb888c2ec	puppet: Template supervisor.conf for redhat paths.	2020-10-27 13:29:19 -07:00
Alex Vandiver	3ab9b31d2f	puppet: Purge all un-managed supervisor configuration files. Relying on `defined(Class['...'])` makes the class sensitive to resource evaluation ordering, and thus brittle. It is also only functional for a single service (thumbor). Generalize by using `purge => true` for the directory to automatically remove all un-managed files. This is more general than the previous form, and may result in additional not-managed services being removed.	2020-10-27 13:29:19 -07:00
Alex Vandiver	1d54630b4e	log: Rename email-deliverer.log to match other files.	2020-10-25 14:56:37 -07:00
Alex Vandiver	93d661d119	puppet: Configure logrotate for all logger files. This adds log rotation to all /var/log/zulip files.	2020-10-25 14:56:37 -07:00
Alex Vandiver	c296b5d819	puppet: Allow unattended-upgrades for all but servers. Restarting servers is what can cause service interruptions, and increase risk. Add all of the servers that we use to the list of ignored packages, and uncomment the default allowed-origins in order to enable unattended upgrades.	2020-10-23 16:46:06 -07:00
Anders Kaseorg	72d6ff3c3b	docs: Fix more capitalization issues. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-10-23 11:46:55 -07:00
Alex Vandiver	a7d1fd9ffb	puppet: Remove non-working apt::source. `d2aa81858c` replaced the `apt::source` to set up debathena with `Exec['setup-apt-repo-debathena']`, but mistakenly left the `apt::source` in place in `zmirror` (but not `zmirror_personals`). The `apt::source` resource type was later removed in `c9d54f7854`, making the manifest fail to apply on `zmirror`. Remove the broken and unnecessary `apt::source` resource.	2020-10-23 11:31:20 -07:00
Alex Vandiver	48e06c25ba	puppet: Switch nagios SSH checks to id_ed25519 key. The ssh-rsa algorithm was deprecated[1] in OpenSSH 8.2 (2020-02-14) and will be removed in a future release. [1] https://www.openssh.com/txt/release-8.4	2020-10-22 16:42:30 -07:00
Alex Vandiver	0ea20bd7d8	puppet: Move postgres_version into postgres_common. This property is not related to the base zulip install; move it to zulip::postgres_common, which is already used as a namespace for various postgres variables.	2020-10-22 11:32:25 -07:00
Alex Vandiver	25e995b677	puppet: Move normal_queues to the one place that uses it.	2020-10-22 11:32:25 -07:00
Alex Vandiver	423b5c2be2	puppet: Move queue error and stats directories to just the app host.	2020-10-22 11:31:05 -07:00
Alex Vandiver	4d4c21499a	puppet: Move supervisor dependency into process_fts_updates. PostgreSQL itself has no dependency on supervisor; rather, the FTS updates do.	2020-10-22 11:30:53 -07:00
Alex Vandiver	ca971ebc59	puppet: Remove empty zulip_ops class.	2020-10-22 11:30:53 -07:00
Alex Vandiver	16af05758d	puppet: Move zulip_org into zulip_ops. This class is not of general interest.	2020-10-22 11:30:53 -07:00
Alex Vandiver	ad566c491d	puppet: Drop now-unused zulip_ops::git class.	2020-10-22 11:30:53 -07:00
Alex Vandiver	50e9e2ed20	puppet: Make zulip::base include zulip::apt_repository. There was likely more dependency complexity prior to `97766102df`, but there is now no reason to require that consumers explicitly include zulip::apt_repository.	2020-10-22 11:30:53 -07:00
Alex Vandiver	2dc6d26ec6	puppet: Fix included monitoring class name.	2020-10-19 22:30:20 -07:00
Alex Vandiver	7a1132d605	puppet: Switch golang and smokescreen to use /srv. /srv and /opt have very similar usages; but we should be internally consistent. Move these two (the only usages of /opt) to match the rest in /srv.	2020-10-16 13:00:06 -07:00
Alex Vandiver	78b92a51cc	puppet: Allow access to smokescreen port via iptables.	2020-10-15 15:18:35 -07:00
Alex Vandiver	0d5356969e	puppet: Reformat ipv4 iptables rules comments.	2020-10-15 15:18:35 -07:00
Alex Vandiver	fffea9612b	puppet: Add an outgoing HTTP/HTTPS proxy server. Use https://github.com/stripe/smokescreen to provide a server for an outgoing proxy, run under supervisor. This will allow centralized blocking of internal metadata IPs, localhost, and so forth, as well as providing default request timeouts (10s by default).	2020-10-15 15:18:35 -07:00
Anders Kaseorg	dfaea9df65	shfmt: Reformat shell scripts with shfmt. https://github.com/mvdan/sh Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-10-15 15:16:00 -07:00
Alex Vandiver	f61ac4a28d	puppet: Move frontend monitoring into its own file. This allows it to be pulled in for deploys like czo, which don't use the full `zulip_ops::app_frontend`, but we wish to monitor.	2020-10-13 17:37:32 -07:00
Tim Abbott	7c2c82b190	nginx: Update nginx configuration for fhir/hl7 organization. We should eventually add templating for the set of hosts here, but it's worth merging this change to remove the deleted hostname and replace it with the current one.	2020-10-13 16:50:26 -07:00
Anders Kaseorg	723d285e46	nginx: Redirect {www.,}zulipchat.com, www.zulip.com to zulip.com. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-10-13 16:49:23 -07:00
Alex Vandiver	c8df9a150e	puppet: Drop all log2zulip configuration. Disabled on webservers in `047817b6b0`, it has since lingered in configuration, as well as running (to no effect) every minute on the loadbalancer. Remove the vestiges of its configuration.	2020-10-13 11:00:50 -07:00
Alex Vandiver	b431b1b021	puppet: Remove misleading motd. This banner shows on lb1, advertising itself as lb0. There is no compelling reason for a custom motd, especially one which needs to be reconfigured for each host.	2020-10-13 11:00:36 -07:00
Alex Vandiver	45c9c3cc30	queue: Monitor user_activity queue, now that it has a consumer. Since this was using repeated individual get() calls previously, it could not be monitored for having a consumer. Add it in, by marking it of queue type "consumer" (the default), and adding Nagios lines for it. Also adjust missedmessage_emails to be monitored; it stopped using LoopQueueProcessingWorker in `5cec566cb9`, but was never added back into the set of monitored consumers.	2020-10-11 14:19:42 -07:00
Alex Vandiver	4fd7df4e8c	puppet: Remove absent of check-apns-tokens. This was marked as ensure absent in `d02101a401`, in v1.7.0 in 2017.	2020-09-29 18:17:08 -07:00
Alex Vandiver	872a349508	puppet: Remove absent of log2zulip. This was marked as ensure absent in `047817b6b0`, in v2.0.0 in 2018.	2020-09-29 18:17:08 -07:00
Alex Vandiver	0137772fdb	puppet: Remove absent of calculate-first-visible-message-id. This was marked as ensure absent in `dc7d44a245`, in v1.9.0 in 2018.	2020-09-29 18:17:08 -07:00
Alex Vandiver	966c8dc23d	puppet: Remove absent of email-mirror cron job. This was marked as ensure absent in `24f8492236`, in v1.3.0 in 2014.	2020-09-29 18:17:08 -07:00
Alex Vandiver	430d3b8554	puppet: Remove absent of libapache2-mod-wsgi. This was marked as ensure absent in `89b97e7480`, in v1.7.0 in 2017, though it did not take effect until `6e55aa2ce6`, in v1.9.0 in 2018.	2020-09-29 18:17:08 -07:00
Alex Vandiver	12085552d5	puppet: Tidy indentation.	2020-09-29 17:44:44 -07:00
Alex Vandiver	57d88eedd8	puppet: Only install rabbitmq cron jobs via zulip_ops. The rabbitmq cron jobs exist in order to call rabbitmqctl as root and write the output to files that nagios can consume, since nagios is not allowed to run rabbitmqctl. In systems which do not have nagios configured, these every-minute cron jobs add non-insignificant load, to no effect. Move their installation into `zulip_ops`. In doing so, also combine the cron.d files into a single file; this allows us to `ensure => absent` the old filenames, removing them from existing systems. Leave the resulting combined cron.d file in `zulip`, since it is still of general utility and note.	2020-09-29 17:44:44 -07:00
Alex Vandiver	79931051bd	puppet: Permit outgoing mail from postfix. The configuration change made in `1c17583ad5` only allowed delivery to those specific Zulip addresses. However, it also prevented the mailserver from being used as an outgoing email relay from Zulip, since all mail that passed through the mailserver (from any originator) was required to have a `RCPT TO` that matched those regexes. Allow mail originating from `mynetworks` to have arbitrary addresses in `RCPT TO`.	2020-09-25 15:09:27 -07:00
Alex Vandiver	36ea307fbf	puppet: Depend other changes on sharding.py validation. Use the validation of the tornado sharding config that `stage_updated_sharding` does, by depending on it. This ensures that we don't write out a supervisor or nginx config based on a bad (e.g. non-sequential) list of tornado ports.	2020-09-25 10:52:40 -07:00
Alex Vandiver	c0e240277b	tornado: Remove fingerprinting, write out .tmp files always. Fingerprinting the config is somewhat brittle -- it requires either custom bootstrapping for old (fingerprint-less) configs, and may have false-positives. Since generating the config is lightweight, do so into the .tmp files, and compare the output to the originals to determine if there are changes to apply. In order to both surface errors, as well as notify the user in case a restart is necessary, we must run it twice. The `onlyif` functionality cannot show configuration errors to the user, only determine if the command runs or not. We thus run the command once, judging errors as "interesting" enough to run the actual command, whose failure will be verbose in Puppet and halt any steps that depend on it. Removing the `onlyif` would result in `stage_updated_sharding` showing up in the output of every Puppet run, which obscures the important messages it displays when an update to sharding is necessary. Removing the `command` (e.g. making it an `echo`) would result in removing the ability to report configuration errors. We thus have no choice but to run it twice; this is thankfully low-overhead.	2020-09-25 10:52:40 -07:00
Alex Vandiver	2a12fedcf1	tornado: Remove explicit tornado_processes setting; compute it. We can compute the intended number of processes from the sharding configuration. In doing so, also validate that all of the ports are contiguous. This removes a discrepancy between `scripts/lib/sharding.py` and other parts of the codebase about if merely having a `[tornado_sharding]` section is sufficient to enable sharding. Having behaviour which changes merely based on if an empty section exists is surprising. This does require that a (presumably empty) `9800` configuration line exist, but making that default explicit is useful. After this commit, configuring sharding can be done by adding to `zulip.conf`: ``` [tornado_sharding] 9800 = # default 9801 = other_realm ``` Followed by running `./scripts/refresh-sharding-and-restart`.	2020-09-18 15:13:40 -07:00
Alex Vandiver	f638518722	tornado: Move default production port to 9800. In development and test, we keep the Tornado port at 9993 and 9983, respectively; this allows tests to run while a dev instance is running. In production, moving to port 9800 consistently removes an odd edge case, when just one worker is on an entirely different port than if two workers are used.	2020-09-18 15:13:40 -07:00
Alex Vandiver	ff94254598	tornado: Log to files by port number. Without an explicit port number, the `stdout_logfile` values for each port are identical. Supervisor apparently decides that it will de-conflict this by appending an arbitrary number to the end: ``` /var/log/zulip/tornado.log /var/log/zulip/tornado.log.1 /var/log/zulip/tornado.log.10 /var/log/zulip/tornado.log.2 /var/log/zulip/tornado.log.3 /var/log/zulip/tornado.log.7 /var/log/zulip/tornado.log.8 /var/log/zulip/tornado.log.9 ``` This is quite confusing, since most other files in `/var/log/zulip/` use `.1` to mean logrotate was used. Also note that these are not all sequential -- 4, 5, and 6 are mysteriously missing, though they were used in previous restarts. This can make it extremely hard to debug logs from a particular Tornado shard. Give the logfiles a consistent name, and set them up to logrotate.	2020-09-14 22:17:51 -07:00
Alex Vandiver	efdaa58c24	supervisor: Use more specific process_name than "port-9800". Making this include "zulip-tornado" makes it clearer in supervisor logs. Without this, one only sees: ``` 2020-09-14 03:43:13,788 INFO waiting for port-9807 to stop 2020-09-14 03:43:14,466 INFO stopped: port-9807 (exit status 1) 2020-09-14 03:43:14,469 INFO spawned: 'port-9807' with pid 24289 2020-09-14 03:43:15,470 INFO success: port-9807 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs) ```	2020-09-14 22:17:51 -07:00
Alex Vandiver	e9d0bdea65	puppet: Coerce uwsgi_listen_backlog_limit into an int before doing math.	2020-09-14 21:22:13 -07:00
Alex Vandiver	8adf530400	puppet: Generate sharding in puppet, then refresh-sharding-and-restart. This supports running puppet to pick up new sharding changes, which will warn of the need to finalize them via `refresh-sharding-and-restart`, or simply running that directly.	2020-09-14 16:27:15 -07:00
Alex Vandiver	0de356c2df	puppet: Move generation of tornado nginx upstreams into tornado_sharding. This puts the creation of the upstreams referenced by `nginx_sharding.conf` adjacent to their use.	2020-09-14 16:27:15 -07:00
Alex Vandiver	bf029d99f1	sharding: Also mark sharding.json 644 for consistency. There is no reason to limit this to 640; mark it 644 for consistency with the other file.	2020-09-14 16:27:15 -07:00
Alex Vandiver	1c17583ad5	puppet: Restrict postfix incoming addresses to postmaster and zulip. This removes the possibility of local user enumeration via RCPT TO.	2020-09-11 18:49:22 -07:00
Alex Vandiver	482c964dd3	puppet: Logrotate for webhook exceptions.	2020-09-10 17:47:21 -07:00
Alex Vandiver	e38051736d	puppet: Wrap and sort logrotate config.	2020-09-10 17:47:21 -07:00
Anders Kaseorg	75c59a820d	python: Convert subprocess.Popen.communicate to run or check_output. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-09-03 17:42:35 -07:00
Anders Kaseorg	fbfd4b399d	python: Elide action="store" for argparse arguments. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-09-03 16:17:14 -07:00
Anders Kaseorg	1f2ac1962f	python: Elide default=None for argparse arguments. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-09-03 16:17:14 -07:00
Anders Kaseorg	d751e0cece	puppet: Don’t install netcat. It’s been unused since commit `0af22dad18` (#13239). Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-09-03 10:33:47 -07:00
Anders Kaseorg	ab120a03bc	python: Replace unnecessary intermediate lists with generators. Mostly suggested by the flake8-comprehension plugin. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-09-02 11:15:41 -07:00
Anders Kaseorg	a5dbab8fb0	python: Remove redundant dest for argparse arguments. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-09-02 11:04:10 -07:00
Anders Kaseorg	dbdf67301b	memcached: Switch from pylibmc to python-binary-memcached. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-08-06 12:51:14 -07:00
Casper Kvan Clausen	ed7a6d5e4d	puppet: Support nginx_listen_port with http_only	2020-08-03 12:58:12 -07:00
Alex Vandiver	cd530d627b	uwsgi: Stop generating IOError and SIGPIPE on client close. Clients that close their socket to nginx suddenly also cause nginx to close its connection to uwsgi. When uwsgi finishes computing the response, it thus tries to write to a closed socket, and generates either IOError or SIGPIPE failures. Since these are caused by the _client_ closing the connection suddenly, they are not actionable by the server. At particularly high volumes, this could represent some sort of server-side failure; however, this is better detected by examining status codes at the loadbalancer. nginx uses the error code 499 for this occurrence: https://httpstatuses.com/499 Stop uwsgi from generating this family of exception entirely, using configuration for uwsgi[1]; it documents these errors as "(annoying)," hinting at their general utility. [1] https://uwsgi-docs.readthedocs.io/en/latest/Options.html#ignore-sigpipe	2020-07-31 10:40:09 -07:00
Alex Vandiver	ceb909dbc5	puppet: Increase backlogged socket count based on uwsgi backlog. Increasing the uwsgi listen backlog is intended to allow it to handle higher connection rates during server restart, when many clients may be trying to connect. The kernel, in turn, needs to have a proportionally increased somaxconn so as to not refuse the connection. Set somaxconn to 2x the uwsgi backlog, but no lower than the default (128).	2020-07-28 21:16:26 -07:00
Alex Vandiver	38d01cd4db	puppet: Generalize install-wal-g to be arbitrary tarballs.	2020-07-24 17:24:57 -07:00
Tim Abbott	5a1243db3c	puppet: Use correct scope for zulip_ops::munin_plugin.	2020-07-15 21:49:45 -07:00
Alex Vandiver	48c3c33d10	puppet: Fully-qualify the munin-plugin name	2020-07-14 17:58:51 -07:00
Alex Vandiver	c68333040b	puppet: Revert PostgreSQL setting of recovery_target_timeline. Prior to PostgreSQL 12, the `recovery_target_timeline` setting is only valid in a `recovery.conf` file, as that file has its own configuration parser. As such, including it in `postgresql.conf` results in an error, and PostgreSQL will fail to start. Remove the setting, reverting `bff3b540b1`. This fixes PostgreSQL 9.5, 9.6, 10, and 11; while the setting is not an error in a PostgreSQL 12 configuration file, it is unnecessary since `latest` is the default.	2020-07-14 16:28:20 -07:00
Alex Vandiver	31d80a77d4	puppet: Update nagios check_postgres_replication_lag to be on DB hosts. `7d4a370a57` attempted to move the replication check onto the PostgreSQL hosts. While it updated the _check_ to assume it was running and talking to a local PostgreSQL instance, the configuration and installation for the check were not updated. As such, the check ran on the nagios host for each DB host, and produced no output. Start distributing the check to all appdb hosts, and configure nagios to use the SSH tunnel to get there.	2020-07-14 16:27:18 -07:00
Alex Vandiver	2174db27db	puppet: Put the dependencies on pg_backup_and_purge itself, and ensure them.	2020-07-14 00:40:25 -07:00
Alex Vandiver	6c27f07c1d	puppet: Move PostgreSQL backups to their own class. wal-g was used in `puppet/zulip` by env-wal-g, but only installed in `puppet/zulip_ops`. Merge all of the dependencies of doing backups using wal-g (wal-g installation, the pg_backup_and_purge job, the nagios plugin that verifies it happens) into a common base class in `puppet/zulip`, since it is generally useful.	2020-07-14 00:40:25 -07:00
Anders Kaseorg	15483c09cb	puppet: Add missing trailing commas. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-07-13 15:36:06 -07:00
Alex Vandiver	3691a94efe	puppet: Configure munin and nagios under apache with puppet. This swaps in the actually-in-use munin configuration file; otherwise, it is an implementation of the configuration as it exists on the machine.	2020-07-13 13:23:11 -07:00
Alex Vandiver	4e42164b4a	munin: Add plugins to prod hosts.	2020-07-13 13:23:11 -07:00
Alex Vandiver	2a14212b27	munin: Add a helper resource definition for munin plugins.	2020-07-13 12:49:28 -07:00
Alex Vandiver	7c7b5fcd6f	munin: Deal with spaces in the channel names.	2020-07-13 12:49:28 -07:00
Alex Vandiver	eda2c4b8e2	puppet: Split munin-node from munin-server. No plugins are installed inside the /usr/local/munin/lib this creates in munin-node, nor are they symlinked into /etc/munin/plugins, so no non-default plugins are added by this.	2020-07-13 12:49:28 -07:00
Alex Vandiver	ddc7bb5a45	munin: Fix the path to check_send_receive_time.	2020-07-13 12:49:28 -07:00
Alex Vandiver	8be544e7eb	munin: Rename monitoring plugin to use zulip name, not humbug.	2020-07-13 12:49:28 -07:00
Alex Vandiver	1b3560af94	nagios: Stop assuming /api is where zulip client is. The api/ directory was removed in f9ba3cb60c; as that commit notes, we use the python-zulip-api module for that, added in `938597c5da`.	2020-07-13 12:49:28 -07:00
Mateusz Mandera	57d3ef42b8	puppet: Don't run thumbor services in production. Fixes #15649. Currently, no production services use thumbor; so, it makes sense to not run them in production systems.	2020-07-10 14:22:17 -07:00
Alex Vandiver	f0f29584aa	puppet: Add an arity count ("at least two") to zulipconf function.	2020-07-10 00:14:09 -07:00
Alex Vandiver	8cff27f67d	puppet: Pull hosts from zulip.conf, not hardcoded list. The one complexity is that hosts_fullstack are treated differently, as they are not currently found in the manual `hosts` list, and as such do not get munin monitoring.	2020-07-10 00:14:09 -07:00
Alex Vandiver	24383a5082	puppet: Rename hosts_domain so hosts_prefix can be grepped for.	2020-07-10 00:14:09 -07:00
Alex Vandiver	a4e7c7a27e	nagios: Remove check_memcached. check_memcached does not support memcached authentication even in its latest release (it’s in a TODO item comment, and that’s it), and was never particularly useful.	2020-07-10 00:12:48 -07:00
Anders Kaseorg	ebf7f4d0f6	zthumbor: Rename thumbor.conf to thumbor_settings.py. So we can apply all our lint checks to it. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-07-06 18:44:58 -07:00
Anders Kaseorg	9900298315	zthumbor: Remove Python 2 residue. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-07-06 18:44:58 -07:00
Alex Vandiver	17002f2a0e	puppet: Allow passing an alternate config path to zulip-puppet-apply. When temporary configuration changes are desired, this lets one set up an alternate `zulip.conf` to apply while leaving the true one in place.	2020-07-06 18:30:16 -07:00
Alex Vandiver	64b44a12f5	puppet: Add an exec rule to reload the whole supervisor config. When supervisor is first installed, it is started automatically, and creates the socket, owned by root. Subsequent reconfiguration in puppet only calls `reread + update`, which is insufficient to apply the `chown = zulip:zulip` line in `supervisord.conf`, leaving the socket owned by `root` and the last part of the installation unable to restart `supervisor` services as the `zulip` user. The `chown` line in `scripts/lib/install` exists to paper over this. Add a separate exec target for changes to `supervisord.conf` itself, which restarts the full service. This leaves the default `restart` action on the service for the lightweight `reread + update` action, which is more common. We use `systemctl` only on redhat-esque builds, because CI runs Ubuntu, but init is not systemd in that context. `systemctl reload` is sufficient to re-apply the socket ownership, but a full `restart` and not `reload` is necessary under `/etc/init.d/supervisor`.	2020-07-01 10:40:54 -07:00
Alex Vandiver	dd91f8edba	puppet: Move supervisor start command into zulip::common. Move this command alongside the rest of the distro-dependent supervisor paths.	2020-07-01 10:40:53 -07:00
Alex Vandiver	a5d63cfedf	wal-g: Update pg_backup_and_purge for wal-g format. wal-g has a slightly different format than wal-e in its `backup-list` output; it only contains three columns: - `name` - `last_modified` - `wal_segment_backup_start` rather than wal-e's plethora, most of which were blank: - `name` - `last_modified` - `expanded_size_bytes` - `wal_segment_backup_start` - `wal_segment_offset_backup_start` - `wal_segment_backup_stop` - `wal_segment_offset_backup_stop` Remove one argument from the split.	2020-06-29 17:17:26 -07:00
Alex Vandiver	a21a086f5c	puppet: nagios-plugins-basic is replaced by monitoring-plugins-basic. In Bionic, nagios-plugins-basic is a transitional package which depends on monitoring-plugins-basic. In Focal, it is a virtual package, which means that every time puppet runs, it tries to re-install the nagios-plugins-basic package. Switch all instances to referring to `$zulip::common::nagios_plugins`, and repoint that to monitoring-plugins-basic.	2020-06-29 14:58:01 -07:00
Alex Vandiver	6fdcb4aa17	puppet: Move supervisor conf file path into zulip::common. Move this config file alongside the rest of the distro-dependent paths.	2020-06-29 13:41:05 -07:00
Alex Vandiver	93401448b9	puppet: Explain value of reload && update trick for supervisor. While the stock reload works just fine, it causes too much disruption.	2020-06-29 13:39:09 -07:00
Alex Vandiver	d2de5aced8	puppet: Remove unnecessary supervisor service name variable.	2020-06-29 13:39:09 -07:00
Alex Vandiver	73805f8279	puppet: Stop removing file that contains only comments. In modern PostgreSQL, this file, provided by `postgresql-common`, has no non-comment, non-blank lines. There's hence no reason to remove it.	2020-06-29 13:37:42 -07:00
Alex Vandiver	6e3a424921	puppet: Install the latest postgresql-client on frontend hosts. Frontend hosts in multiple-host configurations (including docker hosts) need a `psql` binary installed. `ca9d27175b` switched to not setting `postgresql.version` in `zulip.conf`, which in turn means that `$zulip::base::postgres_version` is unset. This, in turn, led to the frontend hosts installing `postgresql-client-`, whose trailing dash causes apt to _uninstall_ that package. Unconditionally install `postgresql-client` with no explicit version attached. This is a metapackage which depends on the latest client package, which currently means it will install `postgresql-client-12`. On single-host installs which have configured `postgresql.version` in `zulip.conf` to be a lower version, this will result in `postgresql-client-12` existing alongside another version (e.g. `postgresql-client-10`); `psql` will give the most recent. This is acceptable because the semantic meaning of the postgresql version in `zulip.conf` is about the database engine itself, not the command-line client.	2020-06-29 13:37:16 -07:00
Alex Vandiver	2c36bb19b2	puppet: Pull out `unzip` package which is identical in both cases.	2020-06-29 13:37:16 -07:00
Alex Vandiver	876ee4a8ed	installer: Remove code specific to stretch or xenial. Support for Xenial and Stretch was removed (`5154ddafca`, `0f4b1076ad`, `8944e0ad53`, `79acd5ae40`, `1219a2e854`), but not all codepaths were updated to remove their conditionals on it. Remove all code predicated on Xenial or Stretch. debathena support was migrated to Bionic, since that appears to be the current state of existing debathena servers.	2020-06-24 12:57:38 -07:00
Anders Kaseorg	a9e59b6bd3	memcached: Change the default MEMCACHED_USERNAME to zulip@localhost. This prevents memcached from automatically appending the hostname to the username, which was a source of problems on servers where the hostname was changed. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-19 21:22:30 -07:00
Alex Vandiver	7250d41bf7	puppet: Fix the path to install-wal-g	2020-06-17 15:23:18 -07:00
Alex Vandiver	03bffd3938	upgrade-zulip: Pin the postgres version to the OS default. We would prefer to use the postgres packages from Postgres themselves, if available. However, this requires ensuring that, for existing installs, we preserve the same version of postgres as their base distribution installed. Move the version-determination logic from being computed at puppet interpolation time, to being computed at install time and pinned into zulip.conf.	2020-06-16 17:05:46 -07:00
Tim Abbott	26396c5e25	puppet: Fix exceptions with multiple certbot declarations. Since `9e8f1aacb3`, zulip_ops machines might have two Package declarations for `certbot`, which doesn't work in puppet. The fix is, as usual, to use our `zulip::safepackage` wrapper instead.	2020-06-15 18:21:33 -07:00
Alex Vandiver	bff3b540b1	puppet: Postgres replication should always switch to latest timeline. Omission of this setting makes resuming after a primary switchover difficult-to-impossible. It is the default in PostgreSQL 12.	2020-06-15 16:18:07 -07:00
Alex Vandiver	f8fc3a16eb	puppet: Use "primary" / "replica" consistently in comments. The style guide for Zulip is to always use "primary" and "replica" when describing database replication. Adjust a few comments under `puppet/` that do not adhere to this. Unfortunately, some references still remain to the insensitive and inaccurate "master" / "slave" terminology. However, these are only in files which we are attempting to preserve as close to the upstream versions they are derived from (e.g. postgresql.conf, postfix/master.cf).	2020-06-15 16:18:07 -07:00
Alex Vandiver	5f433d6eeb	puppet: Remove vestigial check_postgres.pl. `65774e1c4f` switched from using the bundled check_postgres.pl to using the version from packages; the file itself remained, however. Remove it, and clean up references to it. Fixes #15389.	2020-06-15 16:18:07 -07:00
Alex Vandiver	7d4a370a57	puppet: Move monitoring of pg replication to the pg hosts. Instead of SSH'ing around to them, run directly on the database hosts. This means that the replicas do not know how many bytes behind they are in _receiving_ the WAL logs; thus, the monitoring also extends to the primary database, which knows that information for each replica. This also allows for detecting when there are too few active replicas.	2020-06-15 16:18:07 -07:00
Anders Kaseorg	5dc9b55c43	python: Manually convert more percent-formatting to f-strings. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-14 23:27:22 -07:00
Anders Kaseorg	74c17bf94a	python: Convert more percent formatting to Python 3.6 f-strings. Generated by pyupgrade --py36-plus. Now including %d, %i, %u, and multi-line strings. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-14 23:27:22 -07:00
Anders Kaseorg	1ed2d9b4a0	logging: Use logging.exception and exc_info for unexpected exceptions. logging.exception() and logging.debug(exc_info=True), etc. automatically include a traceback. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-14 23:27:22 -07:00
Tim Abbott	80589099d8	puppet: Fix typo in logic for whether to install certbot. Fixes #15372.	2020-06-14 16:04:39 -07:00
rht	89af2f381d	puppet: Link postgres dict symlinks to hunspell files on CentOS. This is a temporary measure until we can find the directory of postgresql dicts on CentOS.	2020-06-13 17:53:38 -07:00
rht	36a5ca5015	puppet: Add cyrus-sasl to memcached_packages on RedHat. This is to mirror the sasl2-bin package on Debian.	2020-06-13 17:49:51 -07:00
rht	e776d2d159	puppet: Abstract out owner:group of memcached-sasldb2.	2020-06-13 17:49:51 -07:00
Anders Kaseorg	91a86c24f5	python: Replace None defaults with empty collections where appropriate. Use read-only types (List ↦ Sequence, Dict ↦ Mapping, Set ↦ AbstractSet) to guard against accidental mutation of the default value. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-13 15:31:27 -07:00
Alex Vandiver	97b9308781	puppet: Merge multiple postgres roles in `zulip_ops`. All differences between the primary and replica roles having been merged, fold the `postgres_common`, `postgres_master`, and `postgres_slave` roles into just `postgres_appdb`.	2020-06-12 14:57:46 -07:00
Alex Vandiver	55bd31721d	puppet: Remove custom `vm.dirty_ratio` and `vm.dirty_background_ratio`. These values differed between the primary and secondary database hosts, for unclear reasons. The differences date back to their introduction in `387f63deaa`. As the comment in the replica configuration notes, settings of `vm.dirty_ratio = 10` and `vm.dirty_background_ratio = 5` matched the kernel defaults for "newer" kernels; however, kernel 2.6.30 bumped those to 20 and 10, respectively[1], as a fix for underlying logic now being more correct. Remove these overrides; they should at the very least be consistent across roles, and the previous values look to be an attempt to tune for a very much older version of the Linux kernel, which was using a different, buggier algorithm under the hood. [1] `1b5e62b42b`	2020-06-12 14:57:46 -07:00
Alex Vandiver	f39816e768	puppet: Stop distributing recovery.conf file. This file controls streaming replication, and recovery using wal-g on the secondary. The `primary_conninfo` data needs to change on short notice when database failover happens, in a way that is not suitable for being controlled by puppet. PostgreSQL 12, in fact, removes the use of the `recovery.conf` file[1]; the `primary_conninfo` and `restore_command` information goes into the main `postgresql.conf` file, and the standby status is controlled by the presence or absence of an empty `standby.signal` file. Remove the puppet control of the `recovery.conf` file. [1] https://pgstef.github.io/2018/11/26/postgresql12_preview_recovery_conf_disappears.html	2020-06-12 14:57:46 -07:00
Alex Vandiver	316498a169	puppet: Remove unnecessary nagios authentication setup. Since the nagios authentication is stored _in the database_, it is unnecessary to run if the database is simply a replica of the production database. The only case in which this statement would have an effect is if the postgres node contains a _different_ (or empty) database, which `setup_disks` now effectively prevents. Remove the unnecessary step.	2020-06-11 21:01:49 -07:00
Alex Vandiver	0774f54c1b	puppet: Move `setup_disks` to postgres_common. The tooling should now be run no matter if the node is a primary or replica.	2020-06-11 21:01:49 -07:00
Alex Vandiver	6f6a0e890a	puppet: Run setup_disks based on symlink; remove mdadm dependency. `481613a344` updated the `setup_disks` script to no longer reference `mdadm`, since we no longer set up RAID on servers. Update the puppet that would call it to remove the `mdadm` dependency, and run only if the state is not what it produces -- namely, a symlink for `/var/lib/postgresql`, which must point to an existent `/srv/postgresql` directory.	2020-06-11 21:01:49 -07:00

1200 Commits