zulip

Commit Graph

Author	SHA1	Message	Date
Leo Franchi	23322a791d	puppet: Add sparkle configuration files (imported from commit e36efd64584d946bb13fb5b44af817e85345e197)	2013-06-18 16:12:14 -04:00
Tim Abbott	261300d10e	puppet: Add Nagios crontab to puppet. (imported from commit 353b167b303b27ccbfc0cd0130665399faab80dc)	2013-06-17 13:48:06 -04:00
Tim Abbott	d3d5334a55	puppet: Import pagerduty_nagios.pl into puppet. (imported from commit 1b91524498372d3e69f07468e4635c4d66c44d85)	2013-06-17 13:48:06 -04:00
Tim Abbott	5c388ed28e	puppet: Run our wiki out of supervisord. (imported from commit a8f6d14ce55de0e7458496f9debb15529120deaf)	2013-06-17 13:48:06 -04:00
Tim Abbott	4d31e5d79e	puppet: Increase memcached memory limit to 512MB. (imported from commit 152c2545a3337fb1d6794a41c63c4d0b148adecc)	2013-06-17 13:48:05 -04:00
Zev Benjamin	a9e4441bee	[manual] Serve static files from the same location across prod deploys This only affects DEPLOYED installations. This does not take care of removing old versions of static files from that directory. The problem is that staticfiles is clever and doesn't copy files that are already there, so we can't depend on mtime for detecting which files we no longer need. Hopefully that won't be too much of a problem for now. (imported from commit 4341460dd5bc6544086fd445014ebdac58192910)	2013-06-12 17:46:38 -04:00
Leo Franchi	113180b7b7	nagios: Don't page about load/disk/ levels on non-critical servers. Add a pageable_servers and not_pageable_servers hostgroup, and only page for app/postgres/zmirror. (imported from commit 15c286324e942bd38e2a600a3b9091044f117e28)	2013-06-05 10:20:56 -04:00
Tim Abbott	efcf88a707	puppet: Fix paths in feedback-bot configuration. (imported from commit e9407af884dc75490de5168e067453e77aa612d7)	2013-06-04 19:48:13 -04:00
Tim Abbott	cd65aea287	Add our trac configuration to puppet. (imported from commit 8a9cf825344cdf83e8233f15ba66bbf050c920e4)	2013-06-04 19:48:13 -04:00
Leo Franchi	8cc0a9b4f9	[manual] Require redis-server to be installed on our servers This requires `redis-server` to be installed. Check it is installed before deploying this commit. It also requires 'python-redis' to be installed. (imported from commit e3434a04456e596f6c84c1a3c289a00aa7cbb2ed)	2013-06-04 09:43:09 -04:00
Leo Franchi	f9a99192df	Add supervisor conf file for stats (imported from commit e9104676e714dc36050fef50cabe8386b6c52e4d)	2013-06-03 16:16:22 -04:00
Luke Faraone	742d3bb511	Move check_send_receive.py to the nagios plugins directory, renaming it. For consistency, and because nobody could think of a reason to have it live in bots/ with a symlink. (imported from commit def372653fcdde2805729134fec9d4bc3ce294ec)	2013-05-29 15:36:47 -04:00
Luke Faraone	8570f5fe55	[manual] Configure prod to use our wildcard cert. These changes can be applied with "puppet apply". (imported from commit 999611539e81f452dd605bb98f70436737747c29)	2013-05-29 15:36:47 -04:00
Zev Benjamin	d92d62412f	[manual] Use humbug-deployments/current as the CWD for supervisor processes Some of our code uses the CWD, so we have to set it. The config file needs to be copied over. (imported from commit cec991ccbffddf7ea4d1ec8471377221ddd7c669)	2013-05-29 14:13:39 -04:00
Zev Benjamin	6824c94b7e	[manual] Remove dependence on /home/humbug/humbug git checkout on app frontends Modified files need to be copied into the right place. The checkout on git.humbughq.com also needs to be updated. (imported from commit dbe9e05a0512e1f59c7819dd8d44c2c4e9c83bcf)	2013-05-29 12:00:03 -04:00
Luke Faraone	08ad49184a	Switch memcached user to "nobody" to match production. (imported from commit 849ac9c1d7d6f06447b22e1c1ed2495f8c59943c)	2013-05-28 18:39:08 -04:00
Michael McCanna	0e77082873	[manual] Bump Nginx buffers, don't use fastcgi temp files Nginx's fastcgi buffers default to 8 pages (32KB). I've bumped it to 4MB, as queries like get_old_messages take something like 130KB, and were being ferried off to disk. In case this change to the buffers parameters isn't enough, we explicitly set the maximum temporary file size to 0; if the fastcgi request goes over the buffers allocated, the request will be handled synchronously, and never go out to disk on nginx's fastcgi requests. The manual step that must be done is to apply changes to /etc/nginx/humbug-include/app from servers/puppet/modules/humbug/files/nginx/humbug-include/app. The nginx process can be reloaded with `/etc/init.d/nginx restart`. This must be done for both staging and prod. (imported from commit 99c1bd6989c54b7e230b7c04f2fdf09be7423352)	2013-05-28 18:13:45 -04:00
Zev Benjamin	cce8dfab84	[manual] Use the same socket across server restarts We let supervisor create the socket for us by making humbug-django a fcgi-program. Unfortunately, supervisor doesn't support putting fcgi-programs in groups (see https://github.com/Supervisor/supervisor/issues/148), so we have to restart tornado and django separately. To deploy, copy the config files over and restart nginx and supervisor (via stopping and then starting it because restart is broken). I believe the automated restart as part of update-deployment will fail because of the way supervisor treats programs in groups. If so, after restarting supervisor, you will also need to run restart-server manually to fill the caches and then delete the lock directory in humbug-deployments. (imported from commit bfb5db7dd42dcbc4bfefa2944355b3cbb2ef9104)	2013-05-23 00:19:17 -04:00
Zev Benjamin	8fd72a09bc	Restart Django and Tornado separately from the other worker processes The amount of process downtime during a supervisord-mediated restart appears to be linear in the number of processes that are being restarted. Therefore, restarting just Django and Tornado causes less downtime than doing them at the same time as the other worker processes. (imported from commit 1fa9ef547bcd88caeec49800664e37d5f2fcb7a8)	2013-05-21 16:13:39 -04:00
Zev Benjamin	de3ba5a038	puppet: Replace postgres2 with postgres1 in pg_hba.conf (imported from commit 2d8654f9382df7473ec12caf2067ef0af5fef791)	2013-05-20 23:55:03 -04:00
Leo Franchi	2fcc7c0c5c	Fix aggregation rules to sum at correct frequency (imported from commit a8a27c417ae6e9cc8a6c383313da27ff6d2e875f)	2013-05-20 23:55:03 -04:00
acrefoot	9d8f847fed	[manual] Run server using supervisord This change will make it so that processes related to the app.humbughq.com server are run under supervisord, which uses a state machine model to ensure that programs are running. It also ensures process startup order. We will need to manually switch the old way of running the server (in screen) into this new way of doing things, on both staging and prod (app_frontend.pp has been updated appropriately). This means: 1) cp servers/puppet/modules/humbug/files/supervisord/conf.d/humbug.conf /etc/supervisord/conf.d 2) installing the supervisor package. 3) killing those while loops in that screen session 4) mkdir /var/log/humbug (as root) 5) /etc/init.d/supervisord start 6) check that nothing broke (imported from commit 055269a70973db89acd69049e01b185fabdc8f90)	2013-05-20 23:42:28 -04:00
Leo Franchi	25b915fa6a	Enable rabbitmq consumer checks on app (imported from commit e3df8bc849dc0e1ae2e7782c0c9be5c08d4818c2)	2013-05-20 23:29:54 -04:00
Leo Franchi	3d4e239247	Check rabbitmq consumers for all important queues (imported from commit 1279d33e3e1c36ee8da01859875d24b54e14e2e6)	2013-05-17 01:02:35 -04:00
Luke Faraone	c3421b31b9	Include certificate configuration for www.humbughq.com via Comodo This expires on Aug 11 23:59:59 2013 GMT. I've set a calendar event for this :) (imported from commit fb426b703c88dd255536e10285375dc997e47b01)	2013-05-17 01:02:32 -04:00
Tim Abbott	0a36340216	check_user_zephyr_mirror_liveness: Fix query for new API. (imported from commit f6c477a1d5f0237109be339d099c41c7db5186cc)	2013-05-10 10:46:49 -04:00
Tim Abbott	d0540efa6a	nagios check_disk: check inode disk usage too. (imported from commit e920c4a11c2797904f0ca397ebdcd8b0a9fef8cf)	2013-05-09 10:35:47 -04:00
Leo Franchi	5a5ed28ab0	Create aggregate all-active-users data (imported from commit 4009a4eb15a3efb1c05e1e80151db7d1074f0617)	2013-05-01 17:24:38 -04:00
Leo Franchi	52f6c720d9	Add new stats server to logging (imported from commit b3647ab039c902d09a92082c3e98b5b066e6a5c8)	2013-04-29 16:44:41 -04:00
Zev Benjamin	2aadf6fc6e	[schema] [manual] Create a Postgres text search configuration for use with Humbug Text search was not that great partially because Postgres wasn't using an ispell dictionary (Postgres term) before. We now pull in Hunspell and use its dictionary and affix rules. It is OK to run with this new configuration before updating our full text column and index that will be coming in the next few commits. Manual steps for deploy: 1) On both postgres0 and postgres1 (both before moving on to step 2), install the hunspell-en-us package 2) On staging, run migration 0022 3) On both postgres0 and postgres1, copy the appropriate postgresql.conf file over 4) On both postgres0 and postgres1, run `pg_ctlcluster 9.1 main reload` (imported from commit 706bf0f6ecc46c712cea10b73c34fd9d1dfd4767)	2013-04-27 20:06:26 -04:00
Leo Franchi	5c0cfc44e7	Add iptables rule for statsd (imported from commit 5311be29fd63151fb9d5a5c0f80ed34f8e8b76f5)	2013-04-26 17:47:00 -04:00
Zev Benjamin	af3ef8636c	puppet: Add Postgres recovery.conf Note that this file needs to be copied over manually as part of the process of starting up a new replica. (imported from commit a9f14b695ef2b6b4d48b6180d187c3babf5a667c)	2013-04-22 16:36:09 -04:00
Zev Benjamin	986ca06c44	puppet: Add wal-e to Postgres config (imported from commit 55727a95cc51afb69f14c27df89a6ae287ec0f3f)	2013-04-22 16:36:09 -04:00
Zev Benjamin	f280e7cdfa	puppet: Use deadline scheduler for disks on Postgres master (imported from commit 41061cb4535b94b4afea8c3a2228120073bf06ee)	2013-04-22 16:36:09 -04:00
Zev Benjamin	092cdff061	puppet: Log Postgres checkpoint information (imported from commit 41603ad1c3cf8419d315b44d5679e0817062ced0)	2013-04-22 16:36:09 -04:00
Zev Benjamin	387f63deaa	puppet: Add vm sysctl settings to Postgres configs (imported from commit e557815f490a603da635fb60d39569346a72aa85)	2013-04-22 16:36:09 -04:00
Zev Benjamin	a13b929d1f	puppet: Add script to configure Postgres master disks (imported from commit 61004aa839df8f3fa82ba0c4ea9e2a01ae43464c)	2013-04-22 16:36:09 -04:00
Zev Benjamin	e7cdea1c43	puppet: Tweak Postgres master tunables for its hardware (imported from commit 8644e82d00944203728a3214b2141f778e1c54ed)	2013-04-22 16:36:09 -04:00
Zev Benjamin	336db5c709	puppet: Split Postgres puppet config into master and slave versions (imported from commit adb02cc1904875eb8f56fe272b44dd51bb7d939d)	2013-04-22 16:36:09 -04:00
Leo Franchi	55449fb724	Fix carbon aggregation by sending to aggregator daemon not cache (imported from commit 1f96a6edd019d8be2844b33588fcdc2ebd61fff6)	2013-04-22 11:07:41 -04:00
Leo Franchi	b3a3054f64	Slightly raise thresholds for load on nagios (imported from commit 2dbc06c8ba204c10f6d6b590bc4858e07692540b)	2013-04-22 10:22:35 -04:00
Leo Franchi	499ef75c26	Add configuration files for graphite and statsd (imported from commit bb2c14d816f9ead54bed9da1f227c5e35c9a36bb)	2013-04-18 18:05:51 -04:00
Zev Benjamin	e9f6d9ceff	puppet: Fix location of stats directory (imported from commit b482d6c22e5c1844a65cbee41d1e39378500a9c7)	2013-04-18 17:14:32 -04:00
Leo Franchi	350cf79ba0	Add a nagios check for a notify_tornado consumer (imported from commit 050536bb4ac7384d5b98d5cf6cb7430b2b00dbd5)	2013-04-17 09:24:28 -04:00
Tim Abbott	99ce1ce9ac	munin: run the humbug_send_receive plugin against the current site. (imported from commit 594e77dd32b9ab0db0002e7dc357ebe93b3ca9cd)	2013-04-16 12:02:42 -04:00
Tim Abbott	5b1b2257bd	nagios: Commit Luke's testing contact. (imported from commit d88951f42ad7753777b8e0ab2d47b9bb61ff3f76)	2013-04-16 12:02:42 -04:00
Tim Abbott	bb3b63206a	nagios: Comment out the postgres time checks (they're too noisy). (imported from commit c9569cdbd2909ea7fb8c8c14a681201ee033c62b)	2013-04-16 12:02:42 -04:00
Tim Abbott	b73ac39a25	nagios: Run check_send_receive_time check against both staging and prod. (imported from commit 749c5f04fba4832debe8a4e702914fa47d1fbeaa)	2013-04-16 12:02:42 -04:00
Tim Abbott	73886a95fd	nagios: Update app.humbughq.com to use its primary hostname. (imported from commit 39d291e06b0fa223ae4bb76022b26464b969a505)	2013-04-16 12:02:42 -04:00
Tim Abbott	1b8cf16988	[manual] Update deployment process to run atomically. This requires manual steps on deploy to each of staging and prod: (1) Run the new update-deployment code to set up the initial deployment directory. (2) Restart all the programs running in screen sessions. (3) Deploy the nginx changes and restart nginx. (imported from commit 1ffe27933ee79274dc0a93d35c9938712de0ef36)	2013-04-12 11:54:50 -04:00


85 Commits