491 Commits

Author SHA1 Message Date
Tim Abbott f2f97dd335 [puppet] Increase maximum file descriptor count for zulip in limits.conf.
We ran into the 1024 file descriptor limit today for tornado.  The
limits.conf file descriptor limit doesn't apply to supervisor as we
currently start it, but it's a good idea to have this in place so that
if we move supervisor to run as the zulip user, we don't experience a
nasty surprise.

(imported from commit 2eb4805b5c129bc2684d151b77f295c2eaa9fc3e)
2013-10-25 11:48:35 -04:00
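
The limit described in this commit can be inspected from inside a running process. The following is a minimal sketch using Python's standard `resource` module; the helper names and the 80% threshold are illustrative assumptions and are not part of the commit.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) RLIMIT_NOFILE values for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def is_near_limit(open_fds, soft_limit, threshold=0.8):
    """Report whether open_fds has reached `threshold` of the soft limit.

    The helper name and the default threshold are hypothetical; they only
    illustrate the kind of check that would flag trouble before the limit
    (1024 in the commit message) is actually hit.
    """
    if soft_limit == resource.RLIM_INFINITY:
        return False  # no limit to run into
    return open_fds >= soft_limit * threshold


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print("soft limit:", soft, "hard limit:", hard)
```

A process whose soft limit still reads 1024 after the puppet change is most likely not picking up limits.conf, which is the situation the commit message describes for supervisor.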
Tim Abbott 836f313e69 [puppet] Increase maximum file descriptor count for supervisord.
We ran into the 1024 file descriptor limit today, causing some issues
for users.

(imported from commit 295e6e5cce46024a3668fb8cebc8669568844ab3)
2013-10-25 11:48:35 -04:00
Zev Benjamin f4323a6c51 [manual] Update configure-rabbitmq script
After this commit is deployed, run the following:
  # rabbitmqctl delete_user humbug

(imported from commit 38b72897b2f2e800cfdf23296a89c940d938fbb1)
2013-10-24 16:54:55 -04:00
Zev Benjamin 55caff871b configure-rabbitmq: Fetch the rabbitmq password from the Django settings
(imported from commit b4c3f1a7ce99f40e4cb72094b2bc646ce50bc743)
2013-10-24 16:54:15 -04:00
Leo Franchi 5144ab4c85 Add python-apns-client to puppet and note about pem
(imported from commit 761c92f95d419fc3cb580e5df07ce174a56d59e0)
2013-10-24 14:54:31 -04:00
Leo Franchi 7c8ea27fe9 Add a cron.d entry for the apns token checker
(imported from commit 429957d7b1ad1617c27b3bcbecd083b2223a65e5)
2013-10-24 14:54:31 -04:00
Tim Abbott 996cb40c27 check_personal_zephyr_mirrors: Fix too-strict check for being current.
(imported from commit 3ce3c53ed6c52cabd09cd89c6543e74bf2180e33)
2013-10-24 11:15:47 -04:00
Zev Benjamin c97278ee8f [manual] puppet: Make RabbitMQ and epmd only listen on localhost
To apply this change, we must not only do a puppet apply, but also
restart rabbitmq and epmd.  Rabbitmq is easy to restart, but epmd is
a little more annoying.  epmd is run as a side effect of starting up
rabbitmq-server, but is not stopped when rabbitmq-server is stopped.
Therefore, the correct procedure is to stop rabbitmq-server, kill
epmd (by running `epmd -kill`), and then start rabbitmq-server again.

(imported from commit a651e5363a8b9a04b713c31baef379c566d5dbfc)
2013-10-24 11:12:59 -04:00
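
The restart order described in this commit can be sketched as a small script. Only `epmd -kill` comes from the commit message; the use of the `service` wrapper to stop and start rabbitmq-server, and the function and variable names, are assumptions for illustration.

```python
import subprocess

# Assumption: rabbitmq-server is managed through the `service` wrapper.
# Only `epmd -kill` is taken from the commit message itself.
RESTART_STEPS = [
    ["service", "rabbitmq-server", "stop"],
    ["epmd", "-kill"],
    ["service", "rabbitmq-server", "start"],
]


def restart_rabbitmq_and_epmd(run=subprocess.check_call, dry_run=False):
    """Run the restart steps in order and return the steps handled.

    With the default runner, a failing command raises CalledProcessError,
    so later steps are not attempted after a failure.
    """
    executed = []
    for step in RESTART_STEPS:
        if not dry_run:
            run(step)
        executed.append(step)
    return executed


if __name__ == "__main__":
    # Dry run: print the commands without executing anything.
    for step in restart_rabbitmq_and_epmd(dry_run=True):
        print(" ".join(step))
```

The ordering matters because epmd outlives rabbitmq-server: killing it only after the server has stopped lets the next start bring epmd back up with the new configuration.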
Zev Benjamin ff38b5fd95 Run 5 message sender queue processors
(imported from commit 176cfc07832fde42741202930dd5a88bca6685b8)
2013-10-22 18:45:11 -04:00
Zev Benjamin d527389b1f puppet: Add supervisor config for message sender worker
(imported from commit b8092cfd7e0c87240f19bf98a23d258260133614)
2013-10-22 18:45:11 -04:00
Zev Benjamin b348174f00 puppet: Add nginx configuration for websockets support
(imported from commit 9c64a5835d25701c7e1bacca64b3900b68cb138c)
2013-10-22 18:45:11 -04:00
Zev Benjamin 5979af3a45 [manual] Add asynchronous message sender via sockjs-tornado
New dependency: sockjs-tornado

One known limitation is that we don't clean up sessions for
non-websockets transports.  This is a bug in Tornado so I'm going to
look at upgrading us to the latest version:
https://github.com/mrjoes/sockjs-tornado/issues/47

(imported from commit 31cdb7596dd5ee094ab006c31757db17dca8899b)
2013-10-22 18:45:11 -04:00
Tim Abbott d7a887afe9 nagios: Use latest names for check intervals.
(imported from commit 8fe40c7e45db3c4334716381b0de82364ed670b8)
2013-10-22 13:34:14 -04:00
Steve Howell e0fa6e427b Tone down postgres lock warnings
(imported from commit 11cad022a15b3269294c59b9acd2a8fab2a52a5f)
2013-10-18 17:05:45 -04:00
Steve Howell 4a61ea38b2 Adjust warn/error parameters for postgres locks (more aggressive).
Setting values to 15 (warn) and 100 (error).

(imported from commit efda53fe04c7739a368aec449de7d43f181e61fb)
2013-10-17 13:21:54 -04:00
Zev Benjamin 890bfe1e95 puppet: Up the Postgres autovacuum_freeze_max_age and related parameters
These parameters collectively determine when very old transaction ids
are replaced with the special FrozenXID transaction id, which is
older than any other transaction id.  Old transaction ids must be
periodically replaced like this to prevent transaction id wraparound.

The only disadvantage of increasing autovacuum_freeze_max_age is
increased disk usage, but a value of 2 billion should only require
500MB, which is pretty trivial.

Since changing autovacuum_freeze_max_age requires a restart, we will
still have to issue a manual VACUUM, with vacuum_freeze_min_age and
vacuum_freeze_table_age temporarily set to lower values.

(imported from commit 22f9ecdfc5b6a07918771d541192aa6d6369878a)
2013-10-16 21:44:17 -04:00
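
The 500MB figure in this commit can be checked with a short calculation. PostgreSQL keeps the commit status of every transaction back to the autovacuum_freeze_max_age horizon, at two bits per transaction. The function name below is illustrative only.

```python
BITS_PER_TRANSACTION = 2  # commit status bits PostgreSQL stores per transaction id


def commit_status_bytes(freeze_max_age):
    """Approximate bytes of commit-status data kept for freeze_max_age transactions."""
    return freeze_max_age * BITS_PER_TRANSACTION // 8


if __name__ == "__main__":
    size = commit_status_bytes(2_000_000_000)
    print(size / 1_000_000, "MB")
```

Two billion transactions at two bits each come to 500,000,000 bytes, which matches the estimate in the commit message.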
Leo Franchi 15695ebfa3 Add zulip-events-slowqueries to our zulip-workers group in supervisor so it is restarted
(imported from commit bcb4037c3323f0be6e671365b6ef7eec47030915)
2013-10-16 15:33:35 -04:00
Zev Benjamin 82cacf97fe check_pg_replication_lag: Specify the DB username
The psql command was previously failing because it was trying to use
the nonexistent DB user "zulip".

(imported from commit 68ad5c979ce33cc54e9d55f1822cba9fac648944)
2013-10-15 16:46:37 -04:00
Zev Benjamin 08e6b26d5b check_pg_replication_lag: Better error reporting when executing remote commands
(imported from commit 2430a9271aace68046d7162dcbdea5001d88d616)
2013-10-15 16:46:36 -04:00
acrefoot 48026384cf [manual] Add python-mandrill dependency
Make sure to run `apt-get update` and then `tools/zulip-puppet-apply`
on staging and prod.

(imported from commit 8d1e6295c7c395333a05cd664aa28cab6496bdaa)
2013-10-10 19:31:22 -04:00
Luke Faraone 5c0bcffc97 Depend on nodejs only on local_dev for now.
We need to make sure it's totally drop-in functional on the app servers
before I'd feel comfortable modifying their config.

(imported from commit e42d6e37732d65f827982aabaff9399ec1bda0f2)
2013-10-10 14:14:14 -04:00
Luke Faraone 1d9391e867 Initial local server configuration.
(imported from commit ac9b9896b74b78c6ca03af7f411d0788ae402cff)
2013-10-10 14:14:14 -04:00
Luke Faraone dda89debb2 Move base inclusion to child puppet classes.
(imported from commit 30a55c0ccf835aa6bfc708208914ee1f20352d75)
2013-10-10 14:14:14 -04:00
Tim Abbott 50a4b194af check_user_zephyr_mirror_liveness: Include older zephyr mirror API version.
(imported from commit e96f25e35709ee3ee85cc0c4522f98148cb31926)
2013-10-08 17:21:58 -04:00
Tim Abbott 2b5d036e34 Use events API in our Nagios monitoring scripts
(imported from commit 9b370e420095f17fbb7e9d1e466d51dd2e145de1)
2013-10-08 17:21:57 -04:00
Tim Abbott 0f2fa7e59a puppet: Fixup some humbug => zulip rename issues.
(imported from commit 4d83dc2af380cfbae3a1958f98c671c7e8c58f05)
2013-10-08 08:57:30 -04:00
Tim Abbott f9c6b7f2aa Use the 'test: Zulip monitoring' client string for our Nagios stuff.
(imported from commit 02618a4724e1ab64c05f95f60b83b7593b3fe62b)
2013-10-08 08:57:30 -04:00
Tim Abbott c4d28ad76f install-server: Rename local variables to use zulip names.
(imported from commit c2b9409488be9461a4ae41cb0df208b9e3cb799e)
2013-10-08 08:57:30 -04:00
Tim Abbott f3fd1a2c44 [manual] puppet: Rename humbug user to zulip.
(imported from commit 90e517a4a657d2821b371c833e557c2003c9340c)
2013-10-08 08:57:30 -04:00
Tim Abbott 6920ba0ac3 Add trac cgi-bin files to puppet.
(imported from commit c9cdba17d0091996f1d44d404a42c400d27bd84d)
2013-10-08 08:57:29 -04:00
Tim Abbott e11ae77ba6 [manual] Rename /home/humbug to /home/zulip.
This may require just doing an mv on the home directory, plus changing
the home directory in /etc/passwd.  It should of course be done carefully.

(imported from commit 660997d897ee6d33563af74f0fc5d4267a911755)
2013-10-08 08:57:29 -04:00
Tim Abbott b76bbc4982 [manual] Rename /root/humbug to /root/zulip.
(imported from commit be64a226cfb90c2ec2e8f4f17ae0218643573535)
2013-10-08 08:57:28 -04:00
Tim Abbott 9677ce8920 [manual] Move git checkouts from /home/humbug/humbug to /home/humbug/zulip.
(imported from commit d58be28e57fcb3b5585c0018f1dbb53adf5067df)
2013-10-08 08:57:28 -04:00
Tim Abbott bb902f5296 [manual] Rename humbug=>zulip in supervisor configuration.
This will probably require shutting down all the supervisor processes
and restarting them to deploy properly.

(imported from commit ce687078980b43be495c49f52ed6aa0e25cdad00)
2013-10-08 08:57:28 -04:00
Tim Abbott 31a445e42e [manual] Rename /home/humbug/humbug-deployments to /home/humbug/deployments.
This requires doing an `mv` and then `puppet apply` on each of staging
and prod as part of the deployment process.

(imported from commit 5d0be64a3846f7151d2036d2e0b31049bc1c2dd2)
2013-10-08 08:57:28 -04:00
Luke Faraone 80c2eb0367 Repoint to new repository
(imported from commit bc90453bf9776b1e3d05c222f78cc66383278c32)
2013-10-07 13:43:23 -04:00
Zev Benjamin 8edbd64bb8 Monitor the queue processors for the missedmessage_emails and slow_queries queues
(imported from commit 266b8f19b87a025ab35bd6dd4017bdf8a7694b49)
2013-10-04 17:58:44 -04:00
Tim Abbott d188d829d7 Update UserActivity queries for monitoring Zephyr mirroring.
(imported from commit 04a9536da2891e905c6e14e0d452ca62d632641d)
2013-10-04 16:15:53 -04:00
Zev Benjamin 2547e0768f puppet: Remove rabbitmq consumer checks based on check_procs
These have been superseded by checks for the existence of consumers
of the relevant queues.

(imported from commit 68a0e79734366411e39e9e4346b5a61bdd34144b)
2013-10-04 14:19:16 -04:00
Zev Benjamin dc082cd96d puppet: Add nagios notifications for the rest of our rabbitmq queues
(imported from commit 9d21a0ca3662396c436b482c574113d0cbc714a0)
2013-10-04 14:19:16 -04:00
Zev Benjamin 61ca14b400 [manual] puppet: Consolidate check_rabbitmq_*_consumers commands
This temporarily breaks the rabbitmq consumer checks for
user_activity and notify_tornado on prod.  This should be deployed in
such a way as to minimize the time that the alert needs to be ignored.

(imported from commit 08fa2f0e7d78fca1346c62824573263e42339a45)
2013-10-04 14:19:16 -04:00
Zev Benjamin 6e54ca3045 puppet: Factor out writing the rabbitmq consumer check state file into its own script
This temporarily breaks the rabbitmq consumer checks for the
user_activity and notify_tornado queues because their state files
were renamed to match their queue names.  It will be fixed for
staging in the next commit.

(imported from commit a6aaa330a1134d8ddffe8f4959deb12b219f241a)
2013-10-04 14:19:16 -04:00
Tim Abbott dd3281fea8 Rename some staging nginx config variables humbug => zulip.
(imported from commit 7937e0ee2b1ebbdf184be3ceec74afc206a56c83)
2013-10-04 11:45:40 -04:00
Leo Franchi 2614716fca Log slow queries to zulip so we notice them
(imported from commit 23f311ad881edda4c4495089ea3b55213470a059)
2013-09-30 17:41:56 -04:00
Zev Benjamin a906890b4d install-server: Run resize2fs
This allows us to have larger root filesystems than the AMI image.

(imported from commit 4e9698432b0c154a0bc635df07abd278c08a4905)
2013-09-30 11:09:26 -04:00
Jessica McKellar 03fe84aa6a nagios: use last Received date to determine message age.
If there are delays while routing the email, we don't want to get a
spurious alert.

(imported from commit 3a9e3abf0a4db2b026f797c929f1b46978f1e5e4)
2013-09-27 11:39:42 -04:00
Jessica McKellar 4acddabe10 nagios: parse dates using timezone-aware functions in the email mirror check.
Why does email.utils.parsedate also exist? To put bugs in people's software.

(imported from commit a3dca741e5274027ef177388b49061b9b3c5d29e)
2013-09-27 11:39:42 -04:00
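
The difference this commit points at can be illustrated with the standard library. `email.utils.parsedate` discards the UTC offset, while `email.utils.parsedate_tz` keeps it so that `email.utils.mktime_tz` can produce a correct UTC timestamp. This is a sketch only; the helper name and the sample date strings are assumptions, and the actual Nagios check is not shown in this log.

```python
import email.utils


def header_date_to_timestamp(date_value):
    """Convert an RFC 2822 date string to a UTC epoch timestamp.

    Uses the timezone-aware pair parsedate_tz/mktime_tz so that the
    offset in the string (for example -0400) is honored.
    """
    parsed = email.utils.parsedate_tz(date_value)
    if parsed is None:
        raise ValueError("unparseable date: %r" % (date_value,))
    return email.utils.mktime_tz(parsed)


if __name__ == "__main__":
    eastern = header_date_to_timestamp("Fri, 27 Sep 2013 11:39:42 -0400")
    utc = header_date_to_timestamp("Fri, 27 Sep 2013 15:39:42 +0000")
    # Both strings name the same instant, so the timestamps agree.
    print(eastern == utc)
```

With `email.utils.parsedate`, the two sample strings would yield different time tuples with no record of the offset, which is how a message age computed from them can be off by hours.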
Tim Abbott 4be5d81af1 [manual] Write logs to /var/log/zulip rather than /var/log/humbug.
This requires a puppet apply to update the supervisor configuration.

(imported from commit f2836b6d9c53791af6f6ceb1650d0e0740df70ab)
2013-09-25 16:52:41 -04:00
Tim Abbott 0a4a53211c [manual] Rename /var/log/humbug to /var/log/zulip.
This requires a "puppet apply" to be done to create /var/log/zulip
before we deploy anything using the new directory.

(imported from commit 2d7baedbf923df9f01b152cf0bda6494f0eac936)
2013-09-25 16:52:39 -04:00
Tim Abbott 0b9e54416d [manual] puppet: Rename humbug=>zulip in nginx configuration filenames.
We need to manually remove the old humbug and humbug-staging sites-*
files when we deploy this via puppet.

(imported from commit d25e0172a14032c5acf1501668602d34b1b13b85)
2013-09-25 15:40:21 -04:00