Commit Graph

39 Commits

Author SHA1 Message Date
Tim Abbott 3ed7d658f8 Migrate check_send_receive_time to zulip/. 2016-04-05 13:27:04 -07:00
Tim Abbott ca45ec3f3f Migrate check_email_deliverer plugins to zulip/. 2016-04-05 13:27:04 -07:00
Tim Abbott 4e10424512 Migrate check_worker_memory to zulip/. 2016-04-05 13:27:04 -07:00
Tim Abbott 59b46278be Move check_queue_worker_errors into subdirectory.
This fixes an issue where this worker wasn't even being installed
properly in a way that sets us up for doing further reorganization of
the Zulip Nagios plugins.
2016-04-05 13:27:04 -07:00
Tim Abbott 2adf6d822f puppet: Fix process_queue command lines to use the new argument style.
cd2348e9ae broke installing Zulip in
production since it didn't correctly update the puppet configuration
to call the process_queue script using the new argument format.

This commit isn't ideal in that I'd prefer to not require updating
puppet in sync with the actual running code, but we don't have a great
mechanism for doing that.

Fixes #586.
2016-03-27 23:17:16 -07:00
Zev Benjamin 965f923ac3 Remove postgres2 configuration 2016-03-23 20:41:42 -07:00
Zev Benjamin ae2560a027 Add postgres3 configuration 2016-03-23 20:41:25 -07:00
David Roe e3f38acbce Enterprise => Voyager.
(imported from commit 41b9a67301aeaf5fd40bbbb8f34a326ca98431fd)
2015-08-21 10:33:35 -07:00
Zev Benjamin 23c108a05c nagios: Check HTTPS instead of HTTP
(imported from commit ba0bb76d9bea6661e5396308eb431ff95ef51771)
2014-06-05 17:30:15 -07:00
Luke Faraone b383884019 Change expected autossh processes to 10
(imported from commit 41b06ce3f7cded7a29101a6de2d471bdffab5bcc)
2014-05-15 10:49:54 -07:00
Zev Benjamin 286bd3005d nagios: Disable idle transaction checks
We apparently still have some process that occationally sits idle in a
transaction for a while, which makes this alert super noisy.

(imported from commit 074b04ad746bac0da1b8714763538d1ce22da64e)
2014-03-17 14:17:43 -04:00
Zev Benjamin f7b64827e4 nagios: Don't check txn_time on trac
Doing so requires superuser privileges because check_postgres.pl only connects
to one database for that action.  We could theoretically work around this, but I
don't think it's worthwhile for non-production DBs.

(imported from commit 3ab06e4dd6f844c81128b81709cdc3cdfbe37c47)
2014-03-14 20:48:46 -04:00
Zev Benjamin d445386adc nagios: Add Postgres check for disabled triggers
(imported from commit 08ff85aecfc44c9226e637383464fae4d2b8997a)
2014-03-14 20:48:44 -04:00
Zev Benjamin 1653541e83 nagios: Re-enable Postgres transaction time checks
We believe these will generally no longer be disruptive now that we have
autocommit enabled.

(imported from commit c8c1301e0d4b188d6708173cd8c8b16279e3d910)
2014-03-14 20:48:44 -04:00
Tim Abbott 12309c61b6 Remove Nagios monitoring for the old email mirror.
(imported from commit fc4d95b12d5ee29438a2d3e7d8d694e8aa21f202)
2014-03-12 21:15:19 -04:00
Jessica McKellar e7ef654b45 [puppet]: Adjust zmirror Nagios checks to be more tolerant of a bad network.
We get a lot of alerts and sometimes pages due to network blips.

(imported from commit 4766585e71533b8551d49fa61bc4653114a65457)
2014-03-11 13:06:16 -04:00
Zev Benjamin 32d66d6f73 [manual] Monitor the new redis servers with nagios and munin
We have to start the tunnels up manually and add them to the wiki

(imported from commit aa5f80630a651c3fb33bba321e9d4444b5c498a2)
2014-02-10 13:23:28 -05:00
Tim Abbott 5108253e97 nagios: Make Zephyr mirroring alerts not pageable.
(imported from commit ab98af762b1edf93703fc865496aedc59ce7bd2d)
2014-01-24 13:53:48 -05:00
Zev Benjamin 759d33fad1 puppet: Check all disks via nagios, not just /
(imported from commit 0bc9fc150e791ce3ccec99688f3593a8678a87c9)
2014-01-23 13:37:27 -05:00
Zev Benjamin 49f2657c8d nagios: Add check_postgres checks for the trac and wiki databases
We don't do the sequence check because that requires read access to the database
itself, which the zulip user doesn't have.

(imported from commit fba7604826353b2974e9757f01dcb426297993b3)
2014-01-22 12:07:56 -05:00
Zev Benjamin 3840cf760f nagios: Move a few services from hostgroup postgres -> hostgroup postgres_appdb
(imported from commit 54a738f19f176d36526d40968c379f6357d56e6b)
2014-01-22 12:07:56 -05:00
Zev Benjamin 1ae040c7fb nagios: Specify the db and user for check_postgres via arguments
(imported from commit c3b1a7fe7c63094ed8956ed1bdf4861d747637bd)
2014-01-22 12:07:56 -05:00
Zev Benjamin a974301b8b nagios: Add trac to the postgres_other hostgroup
(imported from commit 7e531b982b8f8961f2201cdc8b88d90d5d238907)
2014-01-22 12:07:56 -05:00
Zev Benjamin 41e274a8e4 nagios: Split postgres hostgroup into more fine-grained groups
(imported from commit ab5fcc0893fb8635defecdf3045a3ffdd5e26f14)
2014-01-22 12:07:56 -05:00
Zev Benjamin c045644097 puppet: Run check_ntp_time against an NTP pool instead of time.mit.edu
MIT implemented NTP rate-limiting to defend against on-going reflection attacks,
which was causing our nagios checks to fail intermittently.  When the attacks
die down or when external sites fix their NTP configurations, checking against
time.mit.edu will stop failing.  However, there also isn't much of a reason to
stick with checking against a single server.

(imported from commit 2c2a1a04646b880b010cbb4b6d94016b1eccd1a0)
2014-01-06 17:30:09 -05:00
Tim Abbott bdcc2e5c52 nagios: Set max_check_attempts to 3 for batched queue processors.
(imported from commit ec0ac86726cd6ff3d0fdfcfcb161d3329fca02ac)
2013-12-19 17:31:41 -05:00
Kevin Mehall f929e51776 puppet: Make Camo Nagios check waste less bandwidth
Use http://www.google.com/favicon.ico instead of a 1.7MB animated gif from
imgur.

(imported from commit 94993af35bf87b0f22e6e743a9ba1cc1c5c9a78f)
2013-12-13 17:27:01 -05:00
Tim Abbott 606d8a4f9b Add Nagios check for queue worker memory usage.
This is detect future memory leaks.

(imported from commit 75fd4c2ad41ea71e87a53fb33e2106c5773909d5)
2013-12-04 10:27:44 -05:00
Zev Benjamin 7af4b92b98 puppet: Rename app to prod0 in nagios
(imported from commit c2d1c2c06276a816ef33e057d3f859c755490cb3)
2013-11-25 11:43:16 -05:00
Zev Benjamin 9f2af6fd0d puppet: Fix postgres_primary alias
(imported from commit 1cd199224e45700fac03e68c99f9d4f7d9212b45)
2013-11-25 11:43:16 -05:00
Zev Benjamin 847d4dfbca puppet: Specify hosts for the postgres autovac_freeze check via a hostgroup
(imported from commit d0afc1b78015740fa9638563a5672d3400dd5002)
2013-11-23 12:08:49 -05:00
Zev Benjamin 139518ccbe puppet: Remove postgres0 from nagios and munin configs
(imported from commit 6a4eb208b2a344d65d684cf904ba882a5400056d)
2013-11-23 12:06:27 -05:00
Zev Benjamin bf8fb3c0df puppet: Add postgres2 to nagios monitoring
(imported from commit 799b1304eebe49cf6d8153fb2bfd0b11a3bcab00)
2013-11-23 08:10:44 -05:00
Zev Benjamin 658972dda3 [manual] puppet: Add postgres2 to munin monitoring
You must run
autossh -2 -fN -M 20018 -L 5009:localhost:4949 nagios@postgres2.zulip.net
as nagios on nagios.zulip.net after deploying this commit.

(imported from commit bd8a61f99555ccf0a0010d79dbd89017aaafbb8f)
2013-11-23 08:10:44 -05:00
Tim Abbott b50db26a18 puppet: Add monitoring for camo.
(imported from commit b3cf29b02de285cf860fc173183cb6f4f3a17c74)
2013-11-19 15:25:14 -05:00
acrefoot 0175440afc [manual] fixup nagios postmaster configuration
(imported from commit e3c00b31bbb0ced38e62d31ae80b58e8c6374c7f)
2013-11-13 17:37:54 -05:00
acrefoot 6d38285a2e fixup supervisor oops related to postmaster config
(imported from commit 8b5c39f0d13abb5e1def9f88a2ab82cfa67b42f6)
2013-11-13 17:15:55 -05:00
acrefoot eab6a1d190 [manual] add nagios checks for email_deliver
manual step: puppet apply, make sure that these nagios checks are working properly

(imported from commit abc75b8a5b153510243c14035b820fbc864b7776)
2013-11-13 16:41:36 -05:00
Tim Abbott b5979a3fed [manual]: Rename zulip-internal puppet module to zulip_internal.
(imported from commit 64ac7ec0f3495b1fe7810da3d4d41263c52b9b3b)
2013-11-05 17:06:32 -05:00