Commit Graph

36 Commits

Author SHA1 Message Date
Tim Abbott 6a5cb0e48c puppet: Make problems with Zephyr mirroring pageable.
Generally this indicates sending messages is completely broken.
2017-10-12 00:16:32 -07:00
Tim Abbott e6e7bcf6e1 nagios: Move camo_check_url into configuration. 2017-10-05 21:09:24 -07:00
Tim Abbott 96c3014da0 nagios: Automate configuration of outgoing email with msmtp.
Now we no longer need to check in a bunch of hostnames in order to
configure Nagios.
2017-10-05 20:29:47 -07:00
Tim Abbott 162eaf8917 nagios: Modify check for swap to allow no swap.
If a machine is configured with no swap intentationally, that
shouldn't be a Nagios problem.  This alert is intended to flag
machines which are swapping.
2017-10-05 20:07:44 -07:00
Tim Abbott 886a8853ac nagios: Move server-specific config into hostgroups.
These new hostgroups exist so we can eliminate explicit references to
individual hosts in services.cfg.
2017-10-05 20:06:48 -07:00
Tim Abbott 5193936bc3 nagios: Add Memcached and Redis monitoring.
These are standard Nagios plugins that might be sometimes helpful.
2017-10-05 20:06:16 -07:00
Tim Abbott f7d554d533 nagios: Rename zmirror2 to zmirrorp in configuration.
The "p" stands for "personals", aka zephyr private messages, which is
what this host manages.
2017-10-05 20:06:08 -07:00
Tim Abbott 062d280914 puppet: Clean up unnecessary pagerduty_nagios.cfg. 2017-10-05 19:23:33 -07:00
Tim Abbott 6017d3dec5 puppet: Move contacts.cfg to be a template. 2017-10-05 19:23:33 -07:00
Tim Abbott 09aec3e467 puppet: Move hosts.cfg to be managed by a template. 2017-10-05 19:23:33 -07:00
Tim Abbott 5a80c029a2 nagios: Update path to sync_public_streams to match new config. 2017-10-05 13:34:27 -07:00
Reid Barton ccb4c5c26f bots: Move zephyr-related files to api/integrations/zephyr/. 2017-05-26 15:07:02 -07:00
Elliott Jin 0ec9e54954 bots: Add queue and QueueProcessingWorker for embedded bots. 2017-05-25 15:00:51 -07:00
vaibhav 8881b5eb9f Outgoing Webhook System: Check for @-mentioned outgoing webhook bots.
Also puts them into a processing queue, though the queue processor
does nothing.

Rewritten by tabbott to avoid unnecessary database queries in
do_send_messages.
2017-05-02 09:22:04 -07:00
K.Kanakhin 6a801db1c2 missed-emails-sending: Move email sending to separate queue worker.
- Add new 'missedmessage_email_senders' queue for sending missed messages emails.
- Add the new worker to process 'missedmessage_email_senders' queue.
- Split aggregation missed messages and sending missed messages email
  to separate queue workers.
- Adapt tests for sending missed emails to the new logic.

Fixes #2607
2017-03-07 20:08:40 -08:00
Tim Abbott fa8045a484 puppet: Add websockets Nagios test to configuration.
Since browser clients send messages via websockets and not the API,
this is an important element in making sure mission-critical Zulip
functionality is working.
2017-02-08 11:13:19 -08:00
JefftheBest1 9de75f5167 Fixed typos with separate 2017-01-12 04:52:05 -08:00
Tim Abbott 3e32102016 nagios: Fix various critical issues not tagged as pageable. 2017-01-06 21:49:20 -08:00
Tim Abbott 93c2c19775 nagios: Increase process count limits. 2017-01-06 21:49:15 -08:00
Tim Abbott 9ab8e7ba34 nagios: Disable swap checks for servers with no swap. 2017-01-06 21:39:07 -08:00
Tim Abbott 3e01ed1f73 nagios: Increase NTP max_check_attempts.
NTP often suffers from brief interruptions of service that lead to
spurious Nagios alerts; it makes sense to suppress these.
2017-01-06 21:32:43 -08:00
Tim Abbott 65774e1c4f zulip_ops: use check_postgres package from apt. 2017-01-06 21:18:55 -08:00
Tim Abbott 4fbe201187 puppet: Automate autossh process monitoring maintenance.
Previously, the Zulip Nagios configuration effectively hardcoded the
count for how many system should have autossh connections.
2016-10-26 00:49:03 -07:00
Tim Abbott 0a5a2c4eda nagios: Automate authorized users list maintenance. 2016-10-26 00:37:29 -07:00
Tim Abbott d490e83645 puppet: Upgrade nagios cgi.cfg with modern defaults. 2016-10-26 00:31:41 -07:00
Tim Abbott 1159ad4857 puppet: Upgrade nagios.cfg with modern defaults. 2016-10-26 00:31:41 -07:00
Tim Abbott 73178e5e5a puppet: Run check_send_receive_time via a cron job.
This allows the actual nagios work involved with
check_send_receive_time nagios checks to be done by an unprivileged
"nagios" user rather than the "zulip" user.
2016-10-26 00:26:52 -07:00
Tim Abbott 96cf330649 puppet: ssh as the nagios user instead of zulip user.
This is a follow-up to 4f58fef54b,
touching services.cfg instead of commands.cfg.
2016-10-26 00:23:47 -07:00
Tim Abbott c3727c9886 nagios: Remove old zulip.com trac/git/replica servers.
These are unlikely to be relevant to anyone.
2016-10-26 00:21:53 -07:00
Tim Abbott 383f39b543 nagios: Enable allow_empty_hostgroup_assignment.
This fixes the configuration being broken when we remove some of the
old zulip.com hosts that are unlikely to be of interest to anyone.
2016-10-26 00:19:21 -07:00
Tim Abbott 4f58fef54b zulip_ops: Use nagios user for all Nagios checks.
There's no reason these Nagios checks needs to run as the
semi-priviliged Zulip user.
2016-10-26 00:17:26 -07:00
Tim Abbott 32d244dbe5 puppet: Add Nagios checks for other consumers. 2016-10-26 00:11:08 -07:00
Tim Abbott 080dd8c987 nagios: Ignore kthreads in check_procs tests.
Modern Linux can have a lot of kernel threads not doing anything.
Since this isn't interesting from a monitoring perpsective, we ignore
these.
2016-10-26 00:10:40 -07:00
Tim Abbott f9ad75f58e puppet: Remove configuration for old zulip.com bots host.
This configuration didn't do anything anyway and just clutters the
repo.
2016-10-26 00:01:29 -07:00
Tim Abbott 2227e77cce puppet: Remove Dropbox usernames from Nagios config. 2016-10-25 23:55:42 -07:00
Tim Abbott 36e336edc3 puppet: Rename zulip_internal to zulip_ops.
The old "zulip_internal" name was from back when Zulip, Inc. had two
distributions of Zulip, the enterprise distribution in puppet/zulip/
and the "internal" SAAS distribution in puppet/zulip_internal.  I
think the name is a bit confusing in the new fully open-source Zulip
work, so we're replacing it with "zulip_ops".  I don't think the new
name is perfect, but it's better.

In the following commits, we'll delete a bunch of pieces of Zulip,
Inc.'s infrastructure that don't exist anymore and thus are no longer
useful (e.g. the old Trac configuration), with the goal of cleaning
the repository of as much unnecessary content as possible.
2016-10-16 19:23:27 -07:00