We were previously running it against the 'postgres' database, which
meant we weren't actually checking the non-clusterwide statistics.
(imported from commit a6be529b16d5f1927463e49a7f7f4cf0b5299213)
We only want to change cases where we're talking about the hostname; HTTP
requests should still go to staging.humbughq.com for now.
Before this commit is deployed the hostname of staging.humbughq.com should
be changed to staging.zulip.net on the VM.
(the same for prod)
(imported from commit 7412530773f720ac227f40061c9ddb1a851e19bb)
lb0.zulip.net will proxy connections to the relevant backend servers.
Depressingly, SSL certificate verification of the backend servers is not
performed at this time, see:
<http://trac.nginx.org/nginx/ticket/13>
The above-mentioned bug has existed since 2011, but a CVE was not
allocated until January. The nginx developers don't seem to care. Sigh.
In any case, this is of somewhat limited impact at Humbug, since we can
have reasonable confidence that communications within AWS are not
subject to active MITMs. Passive MITM is not a concern, because the
traffic *is* in fact encrypted.
(imported from commit c96e1235fc17192c7452e0417a1309cfcda62de2)
On EC2-VPC we have the ability to attach multiple addresses to one
interface, and multiple interfaces to one machine.
We should configure those interfaces whenever our system boots, and
ideally whenever networking is restarted.
This commit adds a script that is executed once eth0 is brought up that
proceeds to configure all subsequent interfaces, real and virtual.
The script is configured to be installed (along with the helper script
that calls it) on all systems via Puppet.
(imported from commit fdc153ef649edbb8fedd40ff4d77262aae593c39)
We switch to always specifying HostKeyAlgorithms=ssh-rsa because of an ECDSA
key bug in the Debian images which results in the fingerprint not being
printed to the console. Our config later forces RSA after we do a puppet
apply, so we might as well start using RSA from the beginning.
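
For illustration only, here is a minimal sketch of how the initial connection could be assembled with the RSA-only option. The helper name and the example hostname are hypothetical; the real provisioning script may build its command quite differently.

```python
def build_ssh_command(host, user="admin", remote_command=None):
    """Return an argv list for ssh that only accepts an RSA host key."""
    # -o HostKeyAlgorithms=ssh-rsa restricts host key negotiation to RSA,
    # so the ECDSA key on the image is never offered or accepted.
    argv = ["ssh", "-o", "HostKeyAlgorithms=ssh-rsa", f"{user}@{host}"]
    if remote_command is not None:
        argv.append(remote_command)
    return argv


if __name__ == "__main__":
    # Hypothetical hostname, used only for illustration.
    print(" ".join(build_ssh_command("new-server.example.com")))
```

The default user is "admin" because that is the account we start out with before the keys are moved over to "root".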
We start out sshing in as "admin", and delete the user (moving keys over to
"root") at the beginning.
We switch to the ops repo instead of backports, and drop the installation
of puppet from backports.
We no longer install humbug-self-signed.key on our servers; instead real
certificates must be installed manually.
(imported from commit cbabe65a4e0ef37df1fece6eaec053a2368f6ef5)
Unlike other directories, we explicitly enumerate the files we want to be
present in sites-available, so the previous commit series did not actually
instruct puppet to make the zulip-staging files accessible.
(imported from commit 22efc4d272eba8d6c869edbaa9114c50e1988288)
We create a new sites-available entry which is essentially a duplicate of
sites-available/humbug-staging with s/humbug/zulip, and add the associated
symlink directive in Puppet.
(imported from commit febcb585ce93c21c6849d96458cc2bd096b30538)
This must be deployed after we update our running nginx configuration
to serve api.humbughq.com.
(imported from commit b5c34ebdd595f55eecd6dca6a18a37f105107bd5)
This stop words file is just the default Postgres english stop file
with all the rest of the letters of the alphabet added. Adding the
extra letters ensures that, e.g., "bed" doesn't get transformed into
"bed | b".
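
A rough sketch of how such a file could be generated, assuming the stop file is a plain list with one word per line. The function name is hypothetical, and the sample list is a tiny stand-in rather than the real default file.

```python
import string


def extend_stop_words(default_words):
    """Return the default stop words plus any single letters not already listed."""
    seen = set(default_words)
    # Only append letters that the default list does not already contain,
    # so nothing ends up in the file twice.
    extra = [c for c in string.ascii_lowercase if c not in seen]
    return list(default_words) + extra


if __name__ == "__main__":
    sample = ["i", "a", "the", "s", "t"]  # stand-in, not the real english stop file
    print("\n".join(extend_stop_words(sample)))
```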
(imported from commit 0be3ef9a43eb524ed4f081d5081a786cf602c487)
This saves something like 15ms on our 1000 message get_old_messages
queries, and will save even more when we start sending JSON dumps into
our memcached system.
We need to install python-ujson on servers and dev instances before
pushing this to prod.
(imported from commit 373690b7c056d00d2299a7588a33f025104bfbca)
The old bucket was versioned and didn't allow deletes. This was
great for paranoia, but not so great for being able to delete old
backups.
(imported from commit be79b5c582ca5ee466cdfea6d3093b6d5ba0e23d)
I hadn't changed it previously out of paranoia, in case we had a
faulty failover and ended up with two masters both uploading to the same place.
However, I now don't think this can happen, as recovery completion
will cause Postgres to start a new timeline.
(imported from commit d58f1aa306eff4f6fd950664ff658539c1249bdf)
That's where it is supposed to be, and besides, that's what a Nagios
server is going to expect it to be.
(imported from commit c273f18533909fa8eac182246dbbe498a5381f6c)
It turns out that setting an explicit UID of 1000 for one user, and not
setting UIDs for other users, is a disaster: puppet might create them
in the wrong order, giving UID 1000 to another user and thus breaking
creation of the 'humbug' user later on. The same issue applies to groups.
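
The ordering problem can be illustrated with a toy simulation. This is not how puppet or useradd are implemented; it only assumes that a user without an explicit UID gets the first free one starting at 1000, and the user name "other" is hypothetical.

```python
def create_users(users, first_free_uid=1000):
    """Simulate creating users in the given order.

    `users` is a list of (name, uid) pairs; a uid of None means
    "take the next free UID". Returns a dict of name -> uid and raises
    ValueError if an explicitly requested UID is already taken.
    """
    assigned = {}
    taken = set()
    for name, uid in users:
        if uid is None:
            # No explicit UID: pick the lowest free one (assumed behaviour).
            uid = first_free_uid
            while uid in taken:
                uid += 1
        elif uid in taken:
            raise ValueError(f"UID {uid} requested for {name!r} is already in use")
        taken.add(uid)
        assigned[name] = uid
    return assigned
```

Creating 'humbug' first works, with the other user landing on 1001. Creating the other user first hands it UID 1000, and the later 'humbug' creation fails. Giving every user an explicit UID makes the result independent of creation order.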
(imported from commit 02b4700278e5c495bd514802f41ae238e6b051ac)
This only affects DEPLOYED installations.
This does not take care of removing old versions of static files from
that directory. The problem is that staticfiles is clever and
doesn't copy files that are already there, so we can't depend on
mtime for detecting which files we no longer need. Hopefully that
won't be too much of a problem for now.
(imported from commit 4341460dd5bc6544086fd445014ebdac58192910)
Add pageable_servers and not_pageable_servers hostgroups, and only page for
app/postgres/zmirror.
(imported from commit 15c286324e942bd38e2a600a3b9091044f117e28)
This requires `redis-server` to be installed. Check it is installed before
deploying this commit. It also requires 'python-redis' to be installed.
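
A possible pre-deploy check, sketched under the assumption that `redis-server` is on the PATH when installed and that the python-redis package provides an importable `redis` module. The function name is hypothetical and the lookup functions are injectable only to make the sketch easy to test.

```python
import importlib.util
import shutil


def missing_prerequisites(find_binary=shutil.which,
                          find_module=importlib.util.find_spec):
    """Return the names of the prerequisites that do not appear to be installed."""
    missing = []
    # shutil.which returns None when the executable is not on the PATH.
    if find_binary("redis-server") is None:
        missing.append("redis-server")
    # find_spec returns None when the top-level module cannot be found.
    if find_module("redis") is None:
        missing.append("python-redis")
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    if problems:
        print("Missing: " + ", ".join(problems))
    else:
        print("All prerequisites found")
```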
(imported from commit e3434a04456e596f6c84c1a3c289a00aa7cbb2ed)
So we can use the 'sponge' command in update-prod-static.
I've already installed it on app and staging.
(imported from commit 1527b1c0108d7a95b471dea82e8dedc88f944f70)
Note to the future: run this command to validate configs before deploying:
puppet parser validate servers/puppet/modules/*/manifests/*.pp
Maybe we want to add this to check-all...
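
If this does get added to check-all, one way to wrap it might look like the sketch below. The function names are hypothetical, and it assumes the script is run from the repository root with `puppet` on the PATH.

```python
import glob
import subprocess

MANIFEST_GLOB = "servers/puppet/modules/*/manifests/*.pp"


def validate_command(pattern=MANIFEST_GLOB):
    """Return the argv for `puppet parser validate`, or None if no manifests match."""
    manifests = sorted(glob.glob(pattern))
    if not manifests:
        return None
    return ["puppet", "parser", "validate"] + manifests


def run_validation(pattern=MANIFEST_GLOB):
    """Run the validation and return puppet's exit status (0 if nothing to check)."""
    argv = validate_command(pattern)
    if argv is None:
        return 0
    return subprocess.call(argv)
```

Expanding the glob in Python rather than in the shell means the check does not depend on how the caller's shell handles the pattern.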
(imported from commit e0eb6502380ff361b783830d45e8422bc0f76c02)
For consistency, and because nobody could think of a reason to have it live
in bots/ with a symlink.
(imported from commit def372653fcdde2805729134fec9d4bc3ce294ec)
Some of our code uses the CWD, so we have to set it.
The config file needs to be copied over.
(imported from commit cec991ccbffddf7ea4d1ec8471377221ddd7c669)
Modified files need to be copied into the right place. The checkout
on git.humbughq.com also needs to be updated.
(imported from commit dbe9e05a0512e1f59c7819dd8d44c2c4e9c83bcf)
Nginx's fastcgi buffers default to 8 pages (32KB). I've bumped them to 4MB,
as queries like get_old_messages take something like 130KB, and were
being ferried off to disk. In case this change to the buffer parameters isn't
enough, we explicitly set the maximum temporary file size to 0; if a fastcgi
request goes over the allocated buffers, it will be handled synchronously
and never go out to disk.
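
The arithmetic behind the buffer sizes, as a quick sketch. It assumes a 4KB page size (which is what makes 8 pages come to 32KB) and uses the rough 130KB figure quoted above; the names are illustrative only.

```python
KB = 1024
MB = 1024 * KB


def fits_in_buffers(response_bytes, buffer_bytes):
    """Return True if a response of this size fits entirely in the buffers."""
    return response_bytes <= buffer_bytes


default_buffers = 8 * 4 * KB   # 8 pages of 4KB each = 32KB (page size assumed)
bumped_buffers = 4 * MB        # the new total
typical_response = 130 * KB    # rough get_old_messages size

if __name__ == "__main__":
    print("fits in default buffers:", fits_in_buffers(typical_response, default_buffers))
    print("fits in bumped buffers:", fits_in_buffers(typical_response, bumped_buffers))
```

At roughly 130KB, a get_old_messages result is about four times the default buffer space, but well within 4MB.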
The manual step that must be done is to apply changes to /etc/nginx/humbug-include/app
from servers/puppet/modules/humbug/files/nginx/humbug-include/app.
The nginx process can be restarted with `/etc/init.d/nginx restart`.
This must be done for both staging and prod.
(imported from commit 99c1bd6989c54b7e230b7c04f2fdf09be7423352)
This directory is needed for the event_queues.pickle file
that gets created as part of dumping the tornado queues.
(imported from commit 7c1bde0ecae59d2174327a981582b55a199c5b57)
We let supervisor create the socket for us by making humbug-django an
fcgi-program. Unfortunately, supervisor doesn't support putting
fcgi-programs in groups (see
https://github.com/Supervisor/supervisor/issues/148), so we have to
restart tornado and django separately.
To deploy, copy the config files over and restart nginx and
supervisor (via stopping and then starting it because restart is
broken). I believe the automated restart as part of
update-deployment will fail because of the way supervisor treats
programs in groups. If so, after restarting supervisor, you will
also need to run restart-server manually to fill the caches and then
delete the lock directory in humbug-deployments.
(imported from commit bfb5db7dd42dcbc4bfefa2944355b3cbb2ef9104)
The amount of process downtime during a supervisord-mediated restart
appears to be linear in the number of processes that are being
restarted. Therefore, restarting just Django and Tornado causes less
downtime than doing them at the same time as the other worker
processes.
(imported from commit 1fa9ef547bcd88caeec49800664e37d5f2fcb7a8)
This change will make it so that processes related to the app.humbughq.com
server are run under supervisord, which uses a state machine model to ensure
that programs are running. It also ensures process startup order.
We will need to manually switch from the old way of running the server (in screen)
to this new way of doing things, on both staging and prod (app_frontend.pp has been
updated appropriately). This means:
1) cp servers/puppet/modules/humbug/files/supervisord/conf.d/humbug.conf /etc/supervisord/conf.d
2) install the supervisor package
3) kill those while loops in that screen session
4) mkdir /var/log/humbug (as root)
5) /etc/init.d/supervisord start
6) check that nothing broke
(imported from commit 055269a70973db89acd69049e01b185fabdc8f90)