Leo Franchi
113180b7b7
nagios: Don't page about load/disk/ levels on non-critical servers.
...
Add pageable_servers and not_pageable_servers hostgroups, and only page for
app/postgres/zmirror.
(imported from commit 15c286324e942bd38e2a600a3b9091044f117e28)
2013-06-05 10:20:56 -04:00
Leo Franchi
25b915fa6a
Enable rabbitmq consumer checks on app
...
(imported from commit e3df8bc849dc0e1ae2e7782c0c9be5c08d4818c2)
2013-05-20 23:29:54 -04:00
Leo Franchi
3d4e239247
Check rabbitmq consumers for all important queues
...
(imported from commit 1279d33e3e1c36ee8da01859875d24b54e14e2e6)
2013-05-17 01:02:35 -04:00
Leo Franchi
52f6c720d9
Add new stats server to logging
...
(imported from commit b3647ab039c902d09a92082c3e98b5b066e6a5c8)
2013-04-29 16:44:41 -04:00
Leo Franchi
b3a3054f64
Slightly raise thresholds for load on nagios
...
(imported from commit 2dbc06c8ba204c10f6d6b590bc4858e07692540b)
2013-04-22 10:22:35 -04:00
Leo Franchi
350cf79ba0
Add a nagios check for a notify_tornado consumer
...
(imported from commit 050536bb4ac7384d5b98d5cf6cb7430b2b00dbd5)
2013-04-17 09:24:28 -04:00
Tim Abbott
5b1b2257bd
nagios: Commit Luke's testing contact.
...
(imported from commit d88951f42ad7753777b8e0ab2d47b9bb61ff3f76)
2013-04-16 12:02:42 -04:00
Tim Abbott
bb3b63206a
nagios: Comment out the postgres time checks (they're too noisy).
...
(imported from commit c9569cdbd2909ea7fb8c8c14a681201ee033c62b)
2013-04-16 12:02:42 -04:00
Tim Abbott
b73ac39a25
nagios: Run check_send_receive_time check against both staging and prod.
...
(imported from commit 749c5f04fba4832debe8a4e702914fa47d1fbeaa)
2013-04-16 12:02:42 -04:00
Tim Abbott
73886a95fd
nagios: Update app.humbughq.com to use its primary hostname.
...
(imported from commit 39d291e06b0fa223ae4bb76022b26464b969a505)
2013-04-16 12:02:42 -04:00
Jessica McKellar
fe7fedd252
nagios: add check for send_invite_emails process.
...
(imported from commit b30e55241249a02ee61fac2d3f7abecc4d8318bd)
2013-04-10 16:58:17 -04:00
Luke Faraone
d89f5670bb
Add nagios check to verify mailchimp is running on staging/app.
...
(imported from commit 2aa79cc6252aadaa0a212b5c60eff9c5c55b7781)
2013-04-05 14:44:18 -07:00
Leo Franchi
2a334a6328
Tighten rabbitmq thresholds and page_admins
...
(imported from commit 373014bf75346286b55b0ea7d370b21de49ffa33)
2013-03-22 15:55:49 -04:00
Tim Abbott
72d7adce93
nagios: Lower default check intervals and default counts.
...
The defaults are quite large for a small site like ours where one
server down means an outage (e.g. only check every 5 minutes and then
require 4 failures before we alert the admins).
(imported from commit 3b2f436bbb716262f4ee939434749be535ffd6d3)
2013-02-20 16:47:55 -05:00
Tim Abbott
f547bdce9e
nagios: Add swap check.
...
(imported from commit 37ffdb8dfc117e728acc6c3fe4bae671c66ce4c9)
2013-02-20 11:10:45 -05:00
Tim Abbott
be834815aa
nagios: Rename paging_admins to page_admins.
...
I think the name is a little clearer.
(imported from commit cd707b76339cb85365f007701c6313aa6d65b4a3)
2013-02-19 15:40:18 -05:00
Tim Abbott
02ff5bc38d
Nagios: Change new services to paging mode.
...
(imported from commit 4406485179224287f4b7dfbaaa8ed4f97e6debbc)
2013-02-19 15:40:18 -05:00
Leo Franchi
9bb699f917
Add a nagios plugin for checking rabbitmq queue sizes
...
(imported from commit 32bd03bcfe4c4a4221ace17f83adb175f591c8ea)
2013-02-19 15:22:55 -05:00
Tim Abbott
63827c2301
Make the Nagios integration configurable, available, and documented.
...
(imported from commit 1208fc08ed366a892763c3b29b9aeafa90b29981)
2013-02-14 17:50:00 -05:00
Leo Franchi
0a0c4bb9a0
[manual] Use rabbitmq for asynchronous presence updating
...
Note: When deploying, the process-user-activity-commandline script needs to be restarted
(imported from commit 63ee795c9c7a7db4a40170cff5636dc1dd0b46a8)
2013-02-11 18:05:57 -05:00
Zev Benjamin
da95bb2988
puppet: Move all puppetized config files to the humbug module and reference them with puppet URLs
...
(imported from commit f0f325bbad381b87c12c6f7888f4dd5d6989f09f)
2013-02-08 16:06:34 -05:00