Note to the future: run this command to validate configs before deploying:
puppet parser validate servers/puppet/modules/*/manifests/*.pp
Maybe we want to add this to check-all...
(imported from commit e0eb6502380ff361b783830d45e8422bc0f76c02)
For consistency, and because nobody could think of a reason to have it live
in bots/ with a symlink.
(imported from commit def372653fcdde2805729134fec9d4bc3ce294ec)
Some of our code uses the CWD, so we have to set it.
The config file needs to be copied over.
(imported from commit cec991ccbffddf7ea4d1ec8471377221ddd7c669)
Modified files need to be copied into the right place. The checkout
on git.humbughq.com also needs to be updated.
(imported from commit dbe9e05a0512e1f59c7819dd8d44c2c4e9c83bcf)
Nginx's fastcgi buffers default to 8 pages (32KB). I've bumped it to 4MB,
as queries like get_old_messages take something like 130KB, and was
being ferried off to disk. In case this change to the buffers parameters isn't
enough, we explicitly set the maximum temporary file size to 0; if the fastcgi
request goes over the buffers allocated, the request will be handled synchronously,
and never go out to disk on nginx's fastcgi requests.
The manual step that must be done is to apply changes to /etc/nginx/humbug-include/app
from servers/puppet/modules/humbug/files/nginx/humbug-include/app.
The nginx process can be reloaded with `/etc/init.d/nginx restart`.
This must be done for both staging and prod.
(imported from commit 99c1bd6989c54b7e230b7c04f2fdf09be7423352)
This directory is needed for the event_queues.pickle file
that gets created as part of dumping the tornado queues.
(imported from commit 7c1bde0ecae59d2174327a981582b55a199c5b57)
We let supervisor create the socket for us by making humbug-django a
fcig-program. Unfortunately, supevisor doesn't support putting
fcgi-programs in groups (see
https://github.com/Supervisor/supervisor/issues/148), so we have to
restart tornado and django separately.
To deploy, copy the config files over and restart nginx and
supervisor (via stopping and then starting it because restart is
broken). I believe the automated restart as part of
update-deployment will fail because of the way supervisor treats
programs in groups. If so, after restarting supervisor, you will
also need to run restart-server manually to fill the caches and then
delete the lock directory in humbug-deployments.
(imported from commit bfb5db7dd42dcbc4bfefa2944355b3cbb2ef9104)
The amount of process downtime during a supervisord-mediated restart
appears to be linear in the number of processes that are being
restarted. Therefore, restarting just Django and Tornado causes less
downtime than doing them at the same time as the other worker
processes.
(imported from commit 1fa9ef547bcd88caeec49800664e37d5f2fcb7a8)
This change will make it so that processes related to the app.humbughq.com
server are run under supervisord, which uses a state machine model to ensure
that programs are running. It also ensure process startup order.
We will need to manually switch the old way of running server (in screen) into
this new way of doing things, on both staging and prod (app_frontend.pp has been
updated appropriately). This means:
1) cp servers/puppet/modules/humbug/files/supervisord/conf.d/humbug.conf /etc/supervisord/conf.d
2) installing the supervisor package.
3) killing those while loops in that screen session
4) mkdir /var/log/humbug (as root)
5) /etc/init.d/supervisord start
6) check that nothing broke
(imported from commit 055269a70973db89acd69049e01b185fabdc8f90)
Text search was not that great partially because Postgres wasn't
using a ispell dictionary (Postgres term) before. We now pull in
Hunspell and use its dictionary and affix rules.
It is Ok to run with this new configuration before updating our full
text column and index that will be coming in the next few commits.
Manual steps for deploy:
1) On both postgres0 and postgres1 (both before moving on to step 2),
install the hunspell-en-us package
2) On staging, run migration 0022
3) On both postgres0 and postgres1, copy the appropriate postgresql.conf
file over
4) On both postgres0 and postgres1, run `pg_ctlcluster 9.1 main reload`
(imported from commit 706bf0f6ecc46c712cea10b73c34fd9d1dfd4767)
Note that this file needs to be copied over manually as part of the
process of starting up a new replica.
(imported from commit a9f14b695ef2b6b4d48b6180d187c3babf5a667c)
This requires manual steps on deploy to each of staging and prod:
(1) Run the new update-deployment code to setup the initial deployment directory.
(2) Restart all the programs running in screen sessions.
(3) Deploy the nginx changes and restart nginx.
(imported from commit 1ffe27933ee79274dc0a93d35c9938712de0ef36)
I've already installed these packages on staging, and will install them on prod
when the commit hits master. So there is no manual prod deployment step here.
(imported from commit 1b77e771f938305dfbeb797c3ea2a7e3897e78a7)
The new nginx configuration file needs to be copied to
/etc/nginx/humbug-include and nginx needs to be restarted when this
commit is deployed.
(imported from commit 6c43f3c2c7a6acee6a852c672c96a38bda01dd0d)
Because we're storing 25,000+ message content objects in memcached on
server restart, we were sometimes exceeding the 64MB cap even during
the restart process. 256MB should be safely large enough to not have
the issue but not so large as to seriously consume resources on our
app frontends (which currently have 7GB of RAM).
(imported from commit 1a64319e50c9dadf0bc65e2e4dbf08f4cc1b9c38)
The name of the archive directory doesn't actually matter, but this
is what we're running in prod, so our config in git should match.
(imported from commit c3fbba4f0c988811b11f2c21cf4a2a32327575aa)
This code adds a dependency on python-django-auth-openid, installable as
django-openid-auth from PyPI.
On prod, one needs to run a syncdb in order to create the required
tables. A database *migration* is not required, as these are new tables
only.
(imported from commit c902a0df8d589d93743b27e480154a04402b2c41)
The defaults are quite large for a small site like ours where on
server down means an outage (e.g. only check every 5 minutes and then
require 4 failures before we alert the admins).
(imported from commit 3b2f436bbb716262f4ee939434749be535ffd6d3)
When deploying to prod, this will require a manual install of the
rabbitmq_* files to /etc/munin/plugins/ and an edit of
/etc/munin/plugin-conf.d/munin-node.conf
(imported from commit 4c10e634b04200dda1c4f4989e37fe232143240f)
Note: When deploying, restarting the process-user-activity-commandline script is needed
(imported from commit 63ee795c9c7a7db4a40170cff5636dc1dd0b46a8)