The previous model for these Nagios checks was kinda crazy -- every
minute, we'd run a full `rabbitmctl list_consumers` for each of the
dozen+ consumers that we have, and then do the exact same parsing
logic for each to determine whether the target queue has a running
consumer to write out a state file.
Because `rabbitmctl list_consumers` takes a small amount of resources,
on systems where CPU is very limited (e.g. t2 style AWS instances),
this minor CPU wastage could be problematic.
Now we just do that `rabbitmqctl list_consumers` once per minute, and
output all the state files from a single command.
Further TODO items on this front include removing the hardcoded list
of queues.
Because rabbitmq doesn't support changing the nodename of a running
rabbitmq node, Zulip installations suffered a plague of issues where
e.g. a Zulip server would reboot, the hostname would change, and
suddenly the local rabbitmq instance being used by Zulip would stop
working.
We address this problem by using, by default, a fixed rabbitmq
nodename, but providing server administrators the option to set the
rabbitmq nodename used by Zulip however they choose.
To upgrade an existing server to use this new configuration, one will
need to add something like the following to /etc/zulip/zulip.conf:
[rabbitmq]
nodename = zulip@localhost
However, I don't believe we have the puppet code in place to make this
work correctly at initial installation without rabbitmq-server being
already installed (but off), as we can easily setup in Travis CI but I
haven't been willing to do for the installer. So for now, this just
fixes our Travis CI problems.
Fixes: #1579.
Previously, we used a fixed memcached memory allocation of 512MB,
regardless of the size of the server. While that is a good allocation
for a server with 4GB of RAM, for servers with less, we should
decrease the allocation, and for a large server with much more RAM, we
should increase it. We still support the user overriding the
configuration setting, but this produces more sensible defaults.
Zulip had only patches the redis configuration in one small way, which
resulted in unnecessary portability issues for using Redis on
different versions of Linux. We replace this with just a adding an
include mechanism to the redis config.
While we're at it, we configure this to take advantage of the
new REDIS_PASSWORD secret to automatically configure redis passwords.
* Fixes passing a string argument rather than an actual Python
argument.
* Switches to hardcoding the database to connect to rather than the
user, so this check can be run as an arbitrary user.
All other zulip management command names have underscores, so
rename email-mirror to email_mirror.
This will also make it possible to import this module, which will
help in writing tests for it.
runtornado unbuffers its output using
sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0).
This is not python 3 compatible since we can't specify
buffering on a text stream in python 3. So use the '-u'
option of python when calling runtornado.py to make output
unbuffered.
The manage.py change effectively switches the Zulip production server
to use the virtualenv, since all of our supervisord commands for the
various Python services go through manage.py.
Additionally, this migrates the production scripts and Nagios plugins
to use the virtualenv as well.
This results in a substantial performance improvement for all of
Zulip's backend templates.
Changes in templates:
- Change `block.super` to `super()`.
- Remove `load` tag because Jinja2 doesn't support it.
- Use `minified_js()|safe` instead of `{% minified_js %}`.
- Use `compressed_css()|safe` instead of `{% compressed_css %}`.
- `forloop.first` -> `loop.first`.
- Use `{{ csrf_input }}` instead of `{% csrf_token %}`.
- Use `{# ... #}` instead of `{% comment %}`.
- Use `url()` instead of `{% url %}`.
- Use `_()` instead of `{% trans %}` because in Jinja `trans` is a block tag.
- Use `{% trans %}` instead of `{% blocktrans %}`.
- Use `{% raw %}` instead of `{% verbatim %}`.
Changes in tools:
- Check for `trans` block in `check-templates` instead of `blocktrans`
Changes in backend:
- Create custom `render_to_response` function which takes `request` objects
instead of `RequestContext` object. There are two reasons to do this:
1. `RequestContext` is not compatible with Jinja2
2. `RequestContext` in `render_to_response` is deprecated.
- Add Jinja2 related support files in zproject/jinja2 directory. It
includes a custom backend and a template renderer, compressors for js
and css and Jinja2 environment handler.
- Enable `slugify` and `pluralize` filters in Jinja2 environment.
Fixes#620.
In theory these should be the same, but in misconfigured environments
(such at Travis CI) where /etc/hosts has multiple entries for
"localhost", 127.0.0.1 is safer than "localhost".
Camo is a caching image proxy, used in Zulip to avoid mixed-content
warnings by proxying HTTP image content over HTTPS. We've been using
it in zulip.com production for years; this change makes it available
in standalone Zulip deployments.
This fixes an issue where this worker wasn't even being installed
properly in a way that sets us up for doing further reorganization of
the Zulip Nagios plugins.
cd2348e9ae broke installing Zulip in
production since it didn't correctly update the puppet configuration
to call the process_queue script using the new argument format.
This commit isn't ideal in that I'd prefer to not require updating
puppet in sync with the actual running code, but we don't have a great
mechanism for doing that.
Fixes#586.
Previously, even though the Zulip digest emails were documented in the
settings, the cron job to run the script that actually sends the daily
digest emails wasn't included in the non-zulip.com part of the Zulip
production distribution. The overall consequence is that digest
emails didn't work for non-zulip.com users. This fixes that issue by
moving that cron job into the zulip manifests.
[commit message details expanded by tabbott]
Apparently, previously nginx was only compressing text/html content.
This should result in a substantial savings in network traffic -- some
quick testing I did found it cut the total data transferred for
loading a logged-in zulip.com instance from 3MB to 1.2MB.
If running on Django 1.8, running these plugins would die with the below. A fix
for this is to run `django.setup()` before interacting with Django.
Refs:
https://docs.djangoproject.com/en/1.8/ref/applications/#troubleshooting
```
Traceback (most recent call last):
File "/usr/lib/nagios/plugins/check_send_receive_time", line 103, in <module>
sender = get_user_profile_by_email(settings.NAGIOS_SEND_BOT)
File "/home/zulip/deployments/current/zerver/lib/cache.py", line 113, in func_with_caching
val = func(*args, **kwargs)
File "/home/zulip/deployments/current/zerver/models.py", line 1073, in get_user_profile_by_email
return UserProfile.objects.select_related().get(email__iexact=email.strip())
File "/usr/lib/python2.7/dist-packages/django/db/models/query.py", line 328, in get
num = len(clone)
File "/usr/lib/python2.7/dist-packages/django/db/models/query.py", line 144, in __len__
self._fetch_all()
File "/usr/lib/python2.7/dist-packages/django/db/models/query.py", line 977, in _fetch_all
self._result_cache = list(self.iterator())
File "/usr/lib/python2.7/dist-packages/django/db/models/query.py", line 238, in iterator
results = compiler.execute_sql()
File "/usr/lib/python2.7/dist-packages/django/db/models/sql/compiler.py", line 829, in execute_sql
sql, params = self.as_sql()
File "/usr/lib/python2.7/dist-packages/django/db/models/sql/compiler.py", line 378, in as_sql
extra_select, order_by, group_by = self.pre_sql_setup()
File "/usr/lib/python2.7/dist-packages/django/db/models/sql/compiler.py", line 48, in pre_sql_setup
self.setup_query()
File "/usr/lib/python2.7/dist-packages/django/db/models/sql/compiler.py", line 39, in setup_query
self.select, self.klass_info, self.annotation_col_map = self.get_select()
File "/usr/lib/python2.7/dist-packages/django/db/models/sql/compiler.py", line 206, in get_select
related_klass_infos = self.get_related_selections(select)
File "/usr/lib/python2.7/dist-packages/django/db/models/sql/compiler.py", line 700, in get_related_selections
[f.name], opts, root_alias)
File "/usr/lib/python2.7/dist-packages/django/db/models/sql/query.py", line 1471, in setup_joins
names, opts, allow_many, fail_on_missing=True)
File "/usr/lib/python2.7/dist-packages/django/db/models/sql/query.py", line 1372, in names_to_path
if field.is_relation and not field.related_model:
File "/usr/lib/python2.7/dist-packages/django/utils/functional.py", line 60, in __get__
res = instance.__dict__[self.name] = self.func(instance)
File "/usr/lib/python2.7/dist-packages/django/db/models/fields/related.py", line 110, in related_model
apps.check_models_ready()
File "/usr/lib/python2.7/dist-packages/django/apps/registry.py", line 131, in check_models_ready
raise AppRegistryNotReady("Models aren't loaded yet.")
django.core.exceptions.AppRegistryNotReady: Models aren't loaded yet.
```
Ideally some of these templates should really point to the
local installation's support email address, but this is a
good start.
Exceptions:
* Where to report security incidents
* MIT Zephyr-related pages
* zulip.com terms and conditions
Previously, in Zulip voyager, the cron jobs would spew error emails
every time they ran, due to this directory not existing.
This also tightens the permissions for the folder and avoids needing
to create a nagios user for Zulip voyager; it should be writeable by
both root and the zulip user and world-readable (and thus readable by
the Nagios user on zulip.com systems).
Previously our redis config was built for precise.
Synced from redis-server 2:2.8.4-2 plus our one change, which is
disabling saving to disk, so just put that at the bottom for maximum
obviousness.
I wish there was a better way to represent the fact that this is all
we're doing, since this will make life more difficult for running on
precise as well.
Fixes#28.
This is in some ways a regression, but because we don't have
python-postmonkey packaged right now, this is required to make the
Zulip production installation process work on Trusty.
(imported from commit 539d253eb7fedc20bf02cc1f0674e9345beebf48)
This needs to be deployed on both prod and lb0 to be functional
DEPLOY INSTRUCTIONS: restart carefully
(imported from commit d97a450754608357418c80e5b3c7b3bbcd1d09fb)
This is safe because we have the wildcard-all cert.
DEPLOY INSTRUCTIONS: Change the CNAME in R53 for external-content.zulipcdn.net
to the same as www.zulip.com
(imported from commit 075984943ce3a3b17518b913ea650992e45f705e)
The one time use email addresses are prefixed with mm and need be sent
to the local zulip user to be picked up by the email mirror.
(imported from commit e17cfe6855ab7886f25ded52790b8f31df955ef2)
Thanks Tom Cook for getting these through Digicert!
We no longer need separate wildcard certificates, etc, because we have SAN star
certs.
(imported from commit 40a8961da51b6a0ae90c68b40b2af6d59cb5cf9f)
This allows us to specify different rules for the zmirror machines, which need
ports open for Zephyr.
(imported from commit f3c061e9492cbb99783f156debccf03161347e47)
This removes "X-Frame-Options DENY" from our nginx config. We need to be able
to load Zulip in an iframe for embedding and we decided that it doesn't actually
provide much protection.
(imported from commit 5bc363693db949010f6163cb3000c12229618a83)
We apparently still have some process that occationally sits idle in a
transaction for a while, which makes this alert super noisy.
(imported from commit 074b04ad746bac0da1b8714763538d1ce22da64e)
Doing so requires superuser privileges because check_postgres.pl only connects
to one database for that action. We could theoretically work around this, but I
don't think it's worthwhile for non-production DBs.
(imported from commit 3ab06e4dd6f844c81128b81709cdc3cdfbe37c47)
We believe these will generally no longer be disruptive now that we have
autocommit enabled.
(imported from commit c8c1301e0d4b188d6708173cd8c8b16279e3d910)
`/usr/bin/env python` is almost always preferred over specifying the
specific python to run (and this script doesn't work for me on OSX
with /usr/bin/python specified).
(imported from commit 531e6062ba0ac1f25e3c681bb5cf83a918d0e3e7)
This requires a puppet apply on prod, as well as manually
updating the symlinks of Zulip-latest and Humbug-latest on
prod0
(imported from commit c5ef8cd0e2d156144531b35af9a8c5226f5bf750)
To deploy this, the zulip_internal::base and zulip_internal::munin classes must
be added to nagios.zulip.net.
(imported from commit 50d6a4ed19fcc9c62c7104977d69043bf5b9bbf9)
Before we deploy this commit, we must migrate the data from the staging redis
server to the new, dedicated redis server. The steps for doing so are the
following:
* Remove the zulip::redis puppet class from staging's zulip.conf
* ssh once from staging to redis-staging.zulip.net so that the host key is known
* Create a tunnel from redis0.zulip.net to staging.zulip.net
* zulip@redis0:~$ ssh -N -L 127.0.0.1:6380:127.0.0.1:6379 -o ServerAliveInterval=30 -o ServerAliveCountMax=3 staging.zulip.net
* Set the redis instance on redis0.zulip.net to replicate the one on staging.zulip.net
* redis 127.0.0.1:6379> slaveof 127.0.0.1 6380
* Stop the app on staging
* Stop redis-server on staging
* Promote the redis server on redis0.zulip.net to a master
* redis 127.0.0.1:6379> slaveof no one
* Do a puppet apply at this commit on staging (this will bring up the tunnel to redis0)
* Deploy this commit to staging (start the app on staging)
* Kill the tunnel from redis0.zulip.net to staging.zulip.net
* Uninstall redis-server on staging
The steps for migrating prod will be the same modulo s/staging/prod0/.
(imported from commit 546d258883ac299d65e896710edd0974b6bd60f8)
The zulip::redis puppet class should be added to all our frontends' zulip.conf
after this is deployed. No puppet apply is required.
(imported from commit ccea89f4779c6c49c0cbe837adcb5be21bfe55ab)