We apparently still have some process that occationally sits idle in a
transaction for a while, which makes this alert super noisy.
(imported from commit 074b04ad746bac0da1b8714763538d1ce22da64e)
Doing so requires superuser privileges because check_postgres.pl only connects
to one database for that action. We could theoretically work around this, but I
don't think it's worthwhile for non-production DBs.
(imported from commit 3ab06e4dd6f844c81128b81709cdc3cdfbe37c47)
We believe these will generally no longer be disruptive now that we have
autocommit enabled.
(imported from commit c8c1301e0d4b188d6708173cd8c8b16279e3d910)
`/usr/bin/env python` is almost always preferred over specifying the
specific python to run (and this script doesn't work for me on OSX
with /usr/bin/python specified).
(imported from commit 531e6062ba0ac1f25e3c681bb5cf83a918d0e3e7)
This requires a puppet apply on prod, as well as manually
updating the symlinks of Zulip-latest and Humbug-latest on
prod0
(imported from commit c5ef8cd0e2d156144531b35af9a8c5226f5bf750)
To deploy this, the zulip_internal::base and zulip_internal::munin classes must
be added to nagios.zulip.net.
(imported from commit 50d6a4ed19fcc9c62c7104977d69043bf5b9bbf9)
Before we deploy this commit, we must migrate the data from the staging redis
server to the new, dedicated redis server. The steps for doing so are the
following:
* Remove the zulip::redis puppet class from staging's zulip.conf
* ssh once from staging to redis-staging.zulip.net so that the host key is known
* Create a tunnel from redis0.zulip.net to staging.zulip.net
* zulip@redis0:~$ ssh -N -L 127.0.0.1:6380:127.0.0.1:6379 -o ServerAliveInterval=30 -o ServerAliveCountMax=3 staging.zulip.net
* Set the redis instance on redis0.zulip.net to replicate the one on staging.zulip.net
* redis 127.0.0.1:6379> slaveof 127.0.0.1 6380
* Stop the app on staging
* Stop redis-server on staging
* Promote the redis server on redis0.zulip.net to a master
* redis 127.0.0.1:6379> slaveof no one
* Do a puppet apply at this commit on staging (this will bring up the tunnel to redis0)
* Deploy this commit to staging (start the app on staging)
* Kill the tunnel from redis0.zulip.net to staging.zulip.net
* Uninstall redis-server on staging
The steps for migrating prod will be the same modulo s/staging/prod0/.
(imported from commit 546d258883ac299d65e896710edd0974b6bd60f8)
The zulip::redis puppet class should be added to all our frontends' zulip.conf
after this is deployed. No puppet apply is required.
(imported from commit ccea89f4779c6c49c0cbe837adcb5be21bfe55ab)
Otherwise, we will enable the postfix config on all frontends,
regardless of whether Enterprise deployments requested it.
(imported from commit 9592be3706adcee7547f6795f32fe7b8d85e71ee)
This removed the cronjob from all app_frontend servers and enables the
local Postfix mail server on the same.
This is a no-op on staging if the parent commit has already been
applied.
To deploy this commit, run a puppet-apply on prod.
(imported from commit 6d3977fd12088abcd33418279e9fa28f9b2a2006)
This will cause us to recieve messages sent to streams.staging.zulip.com
via the local Postfix daemon running on staging.
This commit does not impact prod. To deploy, a puppet-apply is needed on
staging.
(imported from commit 9eaedc28359f55a65b672a2e078c57362897c0de)
This will allow us to roll out the Postfix-based mirror on staging in
the future without impacting production mirroring.
This branch should be puppet-deployed first on prod, then staging.
(imported from commit eceaa6c02a06f7074cacc19c6439e5928eef3ae4)
This allows the utility to run on trac.zulip.net, which doesn't have a 'zulip'
database.
(imported from commit c8eabb89e5e161191d6f2c92ca2b1428b17a9aa0)
We don't do the sequence check because that requires read access to the database
itself, which the zulip user doesn't have.
(imported from commit fba7604826353b2974e9757f01dcb426297993b3)
We cannot use SNI for these legacy domains because old plugins still
connect to them.
This commit (along with the three previous commits) requires a lb0 nginx
deployment to function.
(imported from commit f47f3d7b597666508b3817d965fe8ce19d50c2c0)
To deploy, the certs need to manually be copied to lb0's /etc/ssl/certs
directory, the nginx config updated, and the server restarted
(imported from commit c70c7678cd010a1b2b0aba830ab3d862005bd627)
This replaces the old www.humbughq.com cert.
Contains these hostnames:
* www.humbughq.com
* api.humbughq.com
* humbughq.com
Generated per 9d674d6a0.
(imported from commit 0ef3f0ff2a02996246868466b5e634ebf45439a2)
These are redirect hosts, so they don't need their own IP. Supporting
non-SNI clients isn't a priority for us.
(imported from commit b1a8de8763ab944885518c868e4e30307d84c11d)
This will let us do normal puppet applies on our postgres hosts again.
Crudini is already installed and /etc/zulip/zulip.conf has already been edited
on the relevant hosts.
(imported from commit 8e2b88d2fe2f7b2367ecb73a50a299200fe381a0)
Note that this change can not currently be applied on postgres hosts due to the
postgres puppet config currently being slightly broken.
(imported from commit 5d8ddeabfd9612d469a048256d22949c0bfa6aba)
This is used by the Android app to authenticate without prompting for a
password.
To do so, we implement a custom authentication backend that validates
the ID token provided by Google and then tries to see if we have a
corresponding UserProfile on file for them.
If the attestation is valid but the user is unregistered, we return that
fact by modifying a dictionary passed in as a parameter. We then return
the appropriate error message via the API.
This commit adds a dependency on the "googleapi" module. On Debian-based
systems with the Zulip APT repository:
sudo apt-get install python-googleapi
For OS X and other platforms:
pip install googleapi
(imported from commit dbda4e657e5228f081c39af95f956bd32dd20139)
MIT implemented NTP rate-limiting to defend against on-going reflection attacks,
which was causing our nagios checks to fail intermittently. When the attacks
die down or when external sites fix their NTP configurations, checking against
time.mit.edu will stop failing. However, there also isn't much of a reason to
stick with checking against a single server.
(imported from commit 2c2a1a04646b880b010cbb4b6d94016b1eccd1a0)
Manual instructions:
This commit requires a puppet apply after deployment on both staging
and prod.
(imported from commit 2d10e33c6db2f5e9cc1204cdd5f2c91833da2a8e)
The manual step here is that we need to do the `puppet apply` before
pushing this commit, or `restart-server` will crash.
Previously we shut down everything in one group, which performed
poorly with supervisor's bad performance on restarting many daemons at
once. Now we shut down the unimportant stuff, then the important
stuff, bring back the important stuff, and then bring back the
unimportant stuff.
This new model has a little over 5s of downtime for the core
user-facing daemons -- which is still far more than would be ideal,
but a lot less than the 13s or so that we had before.
Here's some logs with the current setup for the tornado/django downtime:
2013-12-19 20:16:51,995 restart-server: Stopping daemons
2013-12-19 20:16:53,461 restart-server: Starting daemons
2013-12-19 20:16:57,146 restart-server: Starting workers
Compare with the behavior on master today:
2013-12-19 20:21:45,281 restart-server: Stopping daemons
2013-12-19 20:21:49,225 restart-server: Starting daemons
2013-12-19 20:21:58,463 restart-server: Done!
(imported from commit b2c1ba77f3dc989551d0939779208465a8410435)
We also move uploads.types to zulip-include-frontend since its only
needed on the frontends.
(imported from commit cfdf15c0c537f7ea4c239b0f882aeaa561929777)
This reqires a puppet apply as well as a manual move of the installed
files and symlink switch. Leo will do it when it hits master.
(imported from commit e58e52087ad38f1cb8e0e606b82266a93cf91e53)
It's confusing to have our log data on different files on different
systems (e.g. loadbalancer vs. app).
(imported from commit be701072ee05e2659f146b226a39f33cb4707180)
This tool is a little crude; it runs out of a cron job and will
forward to staging a notice about any new lines in the declared log
files, truncating if there are more than 10 lines.
(imported from commit 6748ddff1def0907b061dc278a3a848bd2e933f1)
Manual deployment instructions:
On staging, do a puppet apply.
No action needs to be taken for the prod deploy.
(imported from commit 0f6e5ab22aaeacfcc69d57de12f2bb6fac6f0635)
They were being installed as executable anyway, but this will make
running them manually a bit easier.
(imported from commit a1181d2c90770af5aa44b0f65a47a460efdcf2d7)
There were a few recently introduced bugs, and this also cuts down on
our having to review diffs that don't actually affect the relevant
server when doing updates.
(imported from commit 43f3cff9a414bc1632f45a8222012846353e8501)
The trailing "/" actually means "replace the location with /", which
is either useless or actively harmful, depending on the location.
(imported from commit 58b9c4c9e55e3a162ffce49c954bc2182ec57dde)
Previously we sometimes set it to $proxy_add_x_forwarded_for and other
times to $remote_addr, but according to
http://wiki.nginx.org/HttpProxyModule#.24proxy_add_x_forwarded_for
$proxy_add_x_forwarded_for handles this for us -- it will be
$remote_addr if there was no X-Forwarded-For header anyway.
(imported from commit 67dc52250e3e7751b1bf375d1a71d0272475435c)
We now have 2 variablse:
EXTERNAL_API_PATH: e.g. staging.zulip.com/api
EXTERNAL_API_URI: e.g. https://staging.zulip.com/api
The former is primarily needed for certain integrations.
(imported from commit 3878b99a4d835c5fcc2a2c6001bc7eeeaf4c9363)
Now that we've debugged the memory leak, I don't think we need this
anymore.
This reverts commit 1bdc7ee2f72bdebb1cdc94601247834a434614d6.
Conflicts:
puppet/zulip/files/cron.d/rabbitmq-numconsumers
puppet/zulip/files/supervisor/conf.d/zulip.conf
(imported from commit ff87f2aebcbc71013fa7a05aedb24e2dcad82ae6)
This is something we forgot to do in the VPC migration, so our IPs
have all been the lb0 IP in our logs :(.
(imported from commit 9d3fc69cf72a84f7bd7c54e50fb1e776a67d971f)
This requires a puppet apply on prod0, and an update of the
Zulip-latest.dmg and Humbug-latest.dmg symlinks in
/src/www/dist/apps/mac and /srv/www/dist/apps/sso/mac
(imported from commit e83170a19ac2de6458a0fd43140068fab4135483)
This requires a puppet apply, and also a manual update of
the Zulip-latest.* symlinks in /srv/www/dist/apps
(imported from commit 991dd6924ba33d81f486e914bcbadfec5b350660)
You must run
autossh -2 -fN -M 20018 -L 5009:localhost:4949 nagios@postgres2.zulip.net
as nagios on nagios.zulip.net after deploying this commit.
(imported from commit bd8a61f99555ccf0a0010d79dbd89017aaafbb8f)
The /etc/init.d/iptables-persistent initfile changed to expect there to be two
files in /etc/iptables (rules.v4 and rules.v6) instead of a single rules file.
Several of our machines are currently running without iptables rules as a
result.
(imported from commit 266c2ff26b77f7c9ae793690b0d544ee4cfa5020)
The latter doesn't depend on the former; we can still fill in your full
name even if you didn't authenticate via LDAP.
This commit requires django_auth_ldap to be installed. On Debian
systems, you can do so via APT:
sudo apt-get install python-django-auth-ldap
On OS X, use your favourite package manager. For pip, I believe this
will work:
pip install django_auth_ldap
django_auth_ldap depends on the "ldap" Python package, which should be
installed automatically on your system.
(imported from commit 43967754285990b06b5a920abe95b8bce44e2053)
This is for the interval while staging is running in VPC and postgres
is not; we can clean up these changes once that's no longer the case.
This also updates test1's IP, which apparently someone forgot to
commit previously.
We're currently running this on prod.
(imported from commit 3feced750f643bb218d4240e9a3d5cd7116963ee)
This is to ensure that if we have an interval where we're not doing
prod deploys, we don't have to worry about worker memory leaks killing
us.
(imported from commit 0b0180b0751f6c618d877b9c9ffc2b8287254e4d)
This requires a puppet apply on each of staging and prod0 to update
the nginx configuration to support the new URL when it is deployed.
(imported from commit a35a71a563fd1daca0d3ea4ec6874c5719a8564f)
This makes us not blow away a customer's ports.conf configuration on
upgrade if they needed to change it while setting up their SSO.
Also we change the NameVirtualHost line to better match the
VirtualHost line.
(imported from commit fd52e00c35afa8982e0377859ad794085ec2af80)
Now app.d is something that any app frontend will read, and we just
have secondary manifests add additional files to the app.d directory
for custom stuff.
This fixes the issue that we were incorrectly including the
lb0-related app configuration in the enterprise version.
(imported from commit dec8dcdf2506b82e51186ff936c26dc1cd6cf61b)
CUSTOMER13 doesn't want it, and there's currently no nginx config
or configurable Camo URI, so it wouldn't work if image preview
were enabled.
(imported from commit 615d4a32acbc4d4d590f88cf4e7d45d8f49db1d3)
Errors are sent to a queue processor that posts them to staging,
just like the feedback bot.
(imported from commit 4a8d099672a1b3e48a8bc94148d8b53db73d2c64)
We didn't remove python-argparse from the requirements when that was
removed, and we also still need python-pip to install wal-e :(.
(imported from commit b82d3b429cffe0a3993819358511e11268ee2fef)
Some sites may want to have small modifications to the base app config. This
directory helps support that, but does not need to exist if it is unused.
(imported from commit d23b19dc59bb56d00e69ff03af3279b66af9466d)
These are things that don't make sense to require on our local server
appliance systems.
(imported from commit 66a3ab750b0d27fa011b55c8f7ef9b22511de56c)
Run the following commands as root before deploying this branch:
# /root/zulip/tools/migrate-server-config
# rm /etc/zulip/machinetype /etc/zulip/server /etc/zulip/local /etc/humbug-machinetype /etc/humbug-server /etc/humbug-local
(imported from commit aa7dcc50d2f4792ce33834f14761e76512fca252)
This moves the list of removed files from .gitattributes to
tools/build-local-server-tarball because static/ and tools/ are
necessary for update-prod-static, and it seemed best to keep the
entire list in one place.
(imported from commit 2a447cbde29e90d776da43bb333650a40d4d363c)