Commit Graph

21 Commits

Author SHA1 Message Date
Leo Franchi 113180b7b7 nagios: Don't page about load/disk levels on non-critical servers.
Add a pageable_servers and not_pageable_servers hostgroup, and only page for
app/postgres/zmirror.

(imported from commit 15c286324e942bd38e2a600a3b9091044f117e28)
2013-06-05 10:20:56 -04:00
Leo Franchi 25b915fa6a Enable rabbitmq consumer checks on app
(imported from commit e3df8bc849dc0e1ae2e7782c0c9be5c08d4818c2)
2013-05-20 23:29:54 -04:00
Leo Franchi 3d4e239247 Check rabbitmq consumers for all important queues
(imported from commit 1279d33e3e1c36ee8da01859875d24b54e14e2e6)
2013-05-17 01:02:35 -04:00
Leo Franchi 52f6c720d9 Add new stats server to logging
(imported from commit b3647ab039c902d09a92082c3e98b5b066e6a5c8)
2013-04-29 16:44:41 -04:00
Leo Franchi b3a3054f64 Slightly raise thresholds for load on nagios
(imported from commit 2dbc06c8ba204c10f6d6b590bc4858e07692540b)
2013-04-22 10:22:35 -04:00
Leo Franchi 350cf79ba0 Add a nagios check for a notify_tornado consumer
(imported from commit 050536bb4ac7384d5b98d5cf6cb7430b2b00dbd5)
2013-04-17 09:24:28 -04:00
Tim Abbott 5b1b2257bd nagios: Commit Luke's testing contact.
(imported from commit d88951f42ad7753777b8e0ab2d47b9bb61ff3f76)
2013-04-16 12:02:42 -04:00
Tim Abbott bb3b63206a nagios: Comment out the postgres time checks (they're too noisy).
(imported from commit c9569cdbd2909ea7fb8c8c14a681201ee033c62b)
2013-04-16 12:02:42 -04:00
Tim Abbott b73ac39a25 nagios: Run check_send_receive_time check against both staging and prod.
(imported from commit 749c5f04fba4832debe8a4e702914fa47d1fbeaa)
2013-04-16 12:02:42 -04:00
Tim Abbott 73886a95fd nagios: Update app.humbughq.com to use its primary hostname.
(imported from commit 39d291e06b0fa223ae4bb76022b26464b969a505)
2013-04-16 12:02:42 -04:00
Jessica McKellar fe7fedd252 nagios: add check for send_invite_emails process.
(imported from commit b30e55241249a02ee61fac2d3f7abecc4d8318bd)
2013-04-10 16:58:17 -04:00
Luke Faraone d89f5670bb Add nagios check to verify mailchimp is running on staging/app.
(imported from commit 2aa79cc6252aadaa0a212b5c60eff9c5c55b7781)
2013-04-05 14:44:18 -07:00
Leo Franchi 2a334a6328 Tighten rabbitmq thresholds and page_admins
(imported from commit 373014bf75346286b55b0ea7d370b21de49ffa33)
2013-03-22 15:55:49 -04:00
Tim Abbott 72d7adce93 nagios: Lower default check intervals and default counts.
The defaults are quite large for a small site like ours where one
server down means an outage (e.g. only check every 5 minutes and then
require 4 failures before we alert the admins).

(imported from commit 3b2f436bbb716262f4ee939434749be535ffd6d3)
2013-02-20 16:47:55 -05:00
Tim Abbott f547bdce9e nagios: Add swap check.
(imported from commit 37ffdb8dfc117e728acc6c3fe4bae671c66ce4c9)
2013-02-20 11:10:45 -05:00
Tim Abbott be834815aa nagios: Rename paging_admins to page_admins.
I think the name is a little clearer.

(imported from commit cd707b76339cb85365f007701c6313aa6d65b4a3)
2013-02-19 15:40:18 -05:00
Tim Abbott 02ff5bc38d Nagios: Change new services to paging mode.
(imported from commit 4406485179224287f4b7dfbaaa8ed4f97e6debbc)
2013-02-19 15:40:18 -05:00
Leo Franchi 9bb699f917 Add a nagios plugin for checking rabbitmq queue sizes
(imported from commit 32bd03bcfe4c4a4221ace17f83adb175f591c8ea)
2013-02-19 15:22:55 -05:00
Tim Abbott 63827c2301 Make the Nagios integration configurable, available, and documented.
(imported from commit 1208fc08ed366a892763c3b29b9aeafa90b29981)
2013-02-14 17:50:00 -05:00
Leo Franchi 0a0c4bb9a0 [manual] Use rabbitmq for asynchronous presence updating
Note: When deploying, the process-user-activity-commandline script must be restarted.

(imported from commit 63ee795c9c7a7db4a40170cff5636dc1dd0b46a8)
2013-02-11 18:05:57 -05:00
Zev Benjamin da95bb2988 puppet: Move all puppetized config files to the humbug module and reference them with puppet URLs
(imported from commit f0f325bbad381b87c12c6f7888f4dd5d6989f09f)
2013-02-08 16:06:34 -05:00