This is preferred, since we don't currently have a way to run Django
logic on the postgres hosts with the Docker implementation.
This is a necessary part of removing the need for the docker-zulip
package to patch this file to make Zulip work with Docker.
The main purpose of this change is to make it guaranteed that
`manage.py register_server --rotate-key` can edit the
/etc/zulip/zulip-secrets.conf configuration via crudini.
But it also adds value by ensuring zulip-secrets.conf is not readable
by other users.
With modern apt-key, the fingerprints are displayed in the more fully
written-out format with spaces, and so `apt-key add` was being run
every time.
This fixes some unnecessary work being done on each puppet run on
Debian stretch.
I would have preferred to not need to do this by upgrading to
upstream, but see #7423 for notes on why that isn't going to work
(basically they broke support for puppet older than 4).
Apparently, these confused the puppet template parser, since they are
somewhat similar to its syntax, resulting in errors trying to use
these templates. It's easy enough to just remove the example
content from the base postgres config file.
We can't really do this in the zulip manifests (since it's sorta a
sysadmin policy decision), but these scripts can cause significant
load when Nagios logs into a server (because many of them take 50ms or
more of work to run). So we just get rid of them.
It seems unlikely we're going to add support for additional older
Debian-based distributions, so it makes sense to just use an else
statement. This should save a bit of busywork every time we add a new
distro.
Mostly, this involves adding the big block at the bottom and making
10 a variable so that it's easier to compare different versions of
these.
I did an audit of the configuration changes between 9.6 and 10, so
this should be fine, but it hasn't been tested yet.
Our recent addition of Content-Security-Policy to the file uploads
backend broke in-browser previews of PDFs.
The content-types change in the last commit fixed loading PDFs for
most users; but the result was ugly, because e.g. Chrome would put the
PDF previewer into a frame (so there were 2 left scrollbars).
There were two changes needed to fix this:
* Loading the style to use the plugin. We corrected this by adding
`style-src 'self' 'unsafe-inline';`
* Loading the plugin. Our CSP blocked loading the PDf viewer plugin.
To correct this, we add object-src 'self', and then limit the
plugin-type to just the one for application/pdf.
We verified this new CSP using https://csp-evaluator.withgoogle.com/
in addition to manual testing.
Previously, user-uploaded PDF files were not properly rendered by
browsers with the local uploads backend, because we weren't setting
the correct content-type.
This adds a basic Content-Security-Policy for user-uploaded avatars
served by the LOCAL_UPLOADS backend.
I think this is for now an unnecessary follow-up to
d608a9d315, but is worth doing because
we may later change what can be uploaded in the avatars directory.
This adds a basic Content-Security-Policy for user-uploaded files with local uploads.
While over time, we plan to add CSP for the main site as well, this CSP is particularly
important for the local-uploads backend, which often shares a domain with the main site.
Running this on additional machines would be redundant; additionally,
the FillState checker cron job runs only on cron systems, so this will
crash on other app frontends.
While this is a different system than I'd written up in #8004, I think
this is a better solution to the general problem of cron jobs to run
on just one server.
Fixes#8004.
Revert c8f034e9a "queue: Remove missedmessage_email_senders code."
As the comment in the code says, it ensures a smooth upgrade path
from 1.7.x; we can delete it in master after 1.8.0 is released.
The removal commit was merged early due to a communication failure.
From here on we start to authenticate uploaded file request before
serving this files in production. This involves allowing NGINX to
pass on these file requests to Django for authentication and then
serve these files by making use on internal redirect requests having
x-accel-redirect field. The redirection on requests and loading
of x-accel-redirect param is handled by django-sendfile.
NOTE: This commit starts to authenticate these requests for Zulip
servers running platforms either Ubuntu Xenial (16.04) or above.
Fixes: #320 and #291 partially.
This should make it possible to use the zulip_ops base rules
successfully on chat.zulip.org. Many of the changes in this commit
are hacks and probably can be cleaned up later, but given that we plan
to drop trusty support soon, it's likely that most of them will simply
be deleted then.
We've been running this change on zulipchat.com for a couple of months
now. Before then, we used to regularly get exceptions like this:
File "./zerver/views/messages.py", line 749, in get_messages_backend
setter=stringify_message_dict)
File "./zerver/lib/cache.py", line 275, in generic_bulk_cached_fetch
cache_set_many(items_for_remote_cache)
File "./zerver/lib/cache.py", line 215, in cache_set_many
get_cache_backend(cache_name).set_many(items, timeout=timeout)
File "/home/zulip/deployments/2017-09-28-21-04-12/zulip-py3-venv/lib/python3.5/site-packages/django/core/cache/backends/memcached.py", line 150, in set_many
self._cache.set_multi(safe_data, self.get_backend_timeout(timeout))
pylibmc.Error: error 48 from memcached_set_multi
This error means memcached was unable to find space for the new value.
You might think that because memcached provides an LRU cache, this
shouldn't happen because it would just evict something... but in fact
* memcached splits its data into "slabs" by object size, and
* until recently, once a 1MiB "chunk" is allocated to a given "slab"
i.e. size class, it wouldn't be reclaimed to allocate to another.
So once the cache has been filled up with objects of some distribution
of sizes, if some objects come in that would go in a different size
class, we have no chunks for that size class / slab, and can't get one.
And that's exactly what was happening on zulipchat.com.
Useful background can be found in
https://github.com/memcached/memcached/wiki/ServerMaint#slab-imbalancehttps://github.com/memcached/memcached/wiki/ReleaseNotes1411https://github.com/memcached/memcached/wiki/ReleaseNotes1425https://github.com/memcached/memcached/wiki/ReleaseNotes150
We're already running v1.4.25, which provides an "automover" that should
be well equipped to fix this; v1.5.0 turns it on by default.
With this commit, adopt the "modern start line" recommended in the
release notes for our v1.4.25, including turning on the automover.
This doesn't yet pass all Nagios checks correctly, and still has a few
flaws:
* The ideal setup code for the `nagios` user in the database isn't included.
* Some of the other details are a bit off; we need to split some host roles.
But it's better than nothing, and we can iterate from here.
This commit just copies all the code from MissedMessageSendingWorker
class to a new EmailSendingWorker class. All the logic to send an email
through a queue was already there. This commit only makes the logic
generic. It does so by creating a special purpose queue called
'email_senders' to send any type of email. To make
MissedMessageSendingWorker still work we derive it from
EmailSendingWorker. All the tests that were testing
MissedMessageSendingWorker now run against EmailSendingWorker.
This fixes a bug where, when a user is unsubscribed from a stream,
they might have unread messages on that stream leak. While it might
seem to be a minor problem, it can cause significant problems for
computing the `unread_msgs` data structures, since it means we need to
add an extra filter for whether the user is still subscribed, either
in the backend or in the UI.
Fixes#7095.
This causes the cron job to run only when a Zulip-managed certbot
install is actually set up.
Inside `install`, zulip.conf doesn't yet exist when we run
setup-certbot, so we write the setting later. But we also give
setup-certbot the ability to write the setting itself, so that we
can recommend it in instructions for adopting certbot in an
existing Zulip installation.
If we were making an old-fashioned webroot where hand-written static
HTML files went, somewhere under `/srv` would be most appropriate.
Here, this webroot is really more of an implementation detail of the
certbot set up by the Zulip installer/packaging, containing transient
state. So someplace under `/var` is appropriate, and specifically
under `/var/lib/zulip` in order to properly namespace it.
For background on `/var/www` and friends, see the top couple of answers
on
https://unix.stackexchange.com/questions/47436/why-web-server-var-www
For some reason, we have the USING_PGROONGA setting on in development
right now. I'm going to disable that in another commit to match what
we're doing in production, but we'll still want that setting to work
in development.
The problem here was that process_fts_updates only attempted to read
the USING_PGROONGA setting from a /etc/zulip/zulip.conf source, and
thus would just not be updating the index in development.
We weren't compressing SVG, while at the same time were incorrectly
compressing octet-stream (Which meant downloading .tar.gz files in
Chrome would get double-compressed).
Sparkle was the auto-update system used by the legacy desktop app. We
haven't been capable of using it for auto-update in years, so there's
no reason to keep around the configuration.
The new Electron app uses a different system anyway.
Whatever dist/ functionality this had in 2014 is now served by
zulip.org, and since this serves as a sample, it should be as simple
as possible.
Previously, this was more cluttered than it needed to be.
The old limits were such that these would sometimes oscillated too
high and page erroneously. The purpose of this check is to prevent
large memory leaks, and will still achieve that with a higher limit.
This allows the Nagios user to access redis without having full access
to the redis system. Ideally, this would eventually use a password
that only has statistics read access, but I'm not sure redis supports
that.
This old puppet configuration was never really used, and regardless
hardcoded an ancient zulip.net hostname. We fix this to use the
zulipconf system to get the host domain (though not, at present, the
hostname).
If a machine is configured with no swap intentationally, that
shouldn't be a Nagios problem. This alert is intended to flag
machines which are swapping.
Arguably, we should make this a symlink, but it's probably a good idea
to have every change in the production Nagios configuration go through
the zulip-puppet-apply diff experience.
Since we've found that it's fairly frequent that we want to recommend
to developers that they upgrade to a version of Zulip from Git, it
makes sense to include that by default.
It's needed by scripts/install-yarn.sh. This hadn't been discovered
because most systems end up having curl installed even though it isn't
technically a required package.
This code empirically doesn't work. It's not entirely clear why, even
having done quite a bit of debugging; partly because the code is quite
convoluted, and because it shows the symptoms of people making changes
over time without really understanding how it was supposed to work.
Moreover, this code targets an old version of the APNs provider API.
Apple deprecated that in 2015, in favor of a shiny new one which uses
HTTP/2 to meet the same needs for concurrency and scale that the old
one had to do a bunch of ad-hoc protocol design for.
So, rip this code out. We'll build a pathway to the new API from
scratch; it's not that complicated.
Whenever you restarted supervisord services, we'd end up leaking one
process from the process_queue group, eventually resulting in running
out of memory.
Fixes#6184.
This causes `upgrade-zulip-from-git`, as well as a no-option run of
`tools/build-release-tarball`, to produce a Zulip install running
Python 3, rather than Python 2. In particular this means that the
virtualenv we create, in which all application code runs, is Python 3.
One shebang line, on `zulip-ec2-configure-interfaces`, explicitly
keeps Python 2, and at least one external ops script, `wal-e`, also
still runs on Python 2. See discussion on the respective previous
commits that made those explicit. There may also be some other
third-party scripts we use, outside of this source tree and running
outside our virtualenv, that still run on Python 2.
The Zulip server's settings are only available if process-fts-updates
is running is on the same server as a Zulip production deployment. So
we instead check whether we have pgroonga configured in
/etc/zulip/zulip.conf.
On `trusty` there is no package for `boto` or `gevent` on Python 3, both
of which are dependencies of `wal-e` (at the version we've pinned.) This
is something used only on database servers and only in a replication
scenario, and it doesn't involve any of our code outside the wal-e repo,
so the Python version it uses is quite independent of the Zulip
application server itself and the rest of our code. For now, keep it
explicitly on Python 2 while we move forward for most everything else.
This script in `zulip_ops` is handy for managing EC2 instances. It uses
`boto`, which isn't available in `trusty` for Python 3. The use of
`boto` here isn't particularly deep, so we could replace it with some
more manual HTTP calls if it comes to that. For now, just mark it to
stay on Python 2 while we move the app and all the rest of the ops code
(except this and another straggler or two) to Python 3.
Also make a comment on this package in the Puppet manifest clearer
about what it specifically refers to.
This consists of the `zulip_ops::stats` Puppet class, which has apparently
not been used since 2014, and a number of files that I believe were
only used for that. Also a couple of tiny loose ends in other files.
This is only actually used in our `wal-e` setup, which is in
zulip_ops::postgres_common. (In fact the only mentions of `gevent` in
our whole Git history are for `wal-e`.) So remove where we mention it
on the broader zulip::postgres_common module, and move it where it's
needed.
This follows up on 98cef0ab4 by eliminating the only dependency
outside of the `zulip_ops` Puppet tree on a system Python-library
package which isn't available in `trusty` for Python 3.
This follows up on 207cf6302 from last year to clean up cases that
have apparently popped up since then. Invoking the scripts directly
makes a cleaner command line in any case, and moreover is essential
to how we control running a Zulip install as either Python 2 or 3
(soon, how we always ensure it runs as Python 3.)
One exception: we're currently forcing `provision` in dev to run
Python 3, while still running both Python 2 and Python 3 jobs in CI.
We use a non-shebang invocation to do the forcing of Python 3.
In some of these contexts, we may still be *using* the Python 2
version, but at least this should eliminate running into
`ImportError`s one by one in scripts that run outside a virtualenv,
as we update their shebangs to refer to Python 3.
Several Python libraries we use don't come in Python 3 versions on
trusty: gevent, boto, twisted, django, django-tagging, whisper.
The latter two don't come in Python 3 versions even on xenial.
So some work required before we can actually switch the code that
relies on those libraries to run as Python 3 -- probably the best
solution will be to backport them all in our apt repo. (All but
`whisper` are packaged in zesty; `whisper` upstream just grew Python 3
support this year.)
These are no longer useful, with our spiffy new analytics framework,
and we haven't in fact been using them for some time, while the
`active-user-stats` cron job does cause regular mail from cron.
Just delete them.
It's rare that there's value in having the log files get this big, and
these changes mean these log files should never consume more than a
few gigabytes.
And in particular, the server.log is far more important than the other
log files, and grows much faster, so we might as well spend most of
the space we are spending on that.
I estimate that the total size of log files from this is going to be
under 1-2GB, since 75MB (compressed size) * 10 (compressed logs) +
500MB (uncompressed size) = 1.25GB from server.log, and the rest is
negligible.
Fixes part of #5724.
Most of these log files are useless except a few minutes after an
event happens, and the aggregate effect of the originals size limits
meant that Zulip's logs could consume many gigabytes of disk.
The new logging strategy should limit our usage from supervisor logs
to at most 3 Gigabytes:
* 20 * 3 = 60MB per queue worker => <1GB.
* 100 * 10 = 1GB for Django and Tornado logs.
Fixes part of #5724.
While running queue processors multithreaded will limit the
performance available to very small systems, it's easy to fix that by
adding more RAM, and previously, Zulip didn't work on such systems at
all, so this is unambiguously an improvement there.
Fixes#32.
Fixes#34.
(Commit message expanded significantly by tabbott.)
Also puts them into a processing queue, though the queue processor
does nothing.
Rewritten by tabbott to avoid unnecessary database queries in
do_send_messages.
This fixes a performance problem where we were previously starting up
a full Django process (~0.7s even on a fast machine) every time a new
email came in, potentially allowing users to accidentally DoS a Zulip
server. Now, we just post over HTTPS, allowing the existing thread
pool support to do its job.
- Add script wrapper to communicate postfix pipe with django web server
over HTTP(S). It uses shared_secret authentication mode.
- Add django view to process messages from email mirror server.
- Clean management command `email-mirror`. Left just functional
for cron email processing.
- Add routes for new tornado view.
- Change pipe script in master process postfix config template
based on updated script.
- Add tests.
Tweaked by tabbott to adjust the directory and set better defaults.
Fixes#2421.
- Enable `master` parameter for `uswgi` configuration.
It allows cleaning leaked processes if the parent
process is closed unexpectedly or with SIGKILL command.
Child processes follow to the master and kill themselves
after the main process.
Fixes#3855
- Add new 'missedmessage_email_senders' queue for sending missed messages emails.
- Add the new worker to process 'missedmessage_email_senders' queue.
- Split aggregation missed messages and sending missed messages email
to separate queue workers.
- Adapt tests for sending missed emails to the new logic.
Fixes#2607
Using `supervisorctl restart all` carried longer downtime (since it
just restarts everything at the same moment) and was less under our
control; I'm not sure it had any advantages.
Since browser clients send messages via websockets and not the API,
this is an important element in making sure mission-critical Zulip
functionality is working.
I'm not altogether happy with this (a better solution would be
database-level locking), but I think it solves the immediate problem
of folks with 2 servers being very likely to run analytics on both of
them.
This results in a brief service interruption (not a graceful restart),
but fixes a bug where on a `supervisorctl restart zulip-django`, we'd
end up leaking a bunch of uwsgi processes.
The mechanism was that sending SIGHUP to uwsgi was a command for it to
gracefully restart, so it'd start doing that (whereas supervisor
expected it to be dying)... and then supervisor would start up the new
uwsgi process group, resulting in 2 uwsgi process groups running.
This, in turn, led to a memory leak that could eventually result in
OOM kills.
The old zulip_ops Nagios configuration depended on Nagios having the
ability to login as the zulip user (with essentially full write
access); this configuration is helpful for limiting nagios to special
"nagios" user with more limited credentials.
Previously, the CRITICAL state would never fire (because x > 6 =>
x > 3). Additionally, 6s is not so unusually high as to deserve being
immediately pageable.
- Add websocket client to create connection with SockJS websocket server.
It contains callback method to launch after connection setup.
- Add '--websocket' parameter to 'check_send_receive_time' script to
check websocket connection.
- Add testing websocket connection to production installation checking.
- Add cronjob to launch websocket connection nagios test.
This makes it possible for Zulip Nagios monitoring to check for
problems impacting the websockets sending code path, which is what all
web users use.
This change adds support for displaying inline open graph previews for
links posted into Zulip.
It is designed to interact correctly with message editing.
This adds the new settings.INLINE_URL_EMBED_PREVIEW setting to control
whether this feature is enabled.
By default, this setting is currently disabled, so that we can burn it
in for a bit before it impacts users more broadly.
Eventually, we may want to make this manageable via a (set of?)
per-realm settings. E.g. I can imagine a realm wanting to be able to
enable/disable it for certain URLs.
(Why is -u needed at all? I’m not sure, but test-run-dev spins forever
“Polling run-dev...” without it.)
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
This allows the actual nagios work involved with
check_send_receive_time nagios checks to be done by an unprivileged
"nagios" user rather than the "zulip" user.
There's no longer a reason to have copies of forked postgres
configuration files in our repository, since some time ago we merged
the features of these configuration files into the main
postgres_appdb_tuned.pp.
The old "zulip_internal" name was from back when Zulip, Inc. had two
distributions of Zulip, the enterprise distribution in puppet/zulip/
and the "internal" SAAS distribution in puppet/zulip_internal. I
think the name is a bit confusing in the new fully open-source Zulip
work, so we're replacing it with "zulip_ops". I don't think the new
name is perfect, but it's better.
In the following commits, we'll delete a bunch of pieces of Zulip,
Inc.'s infrastructure that don't exist anymore and thus are no longer
useful (e.g. the old Trac configuration), with the goal of cleaning
the repository of as much unnecessary content as possible.
This adds support for using PGroonga to back the Zulip full-text
search feature. Because built-in PostgreSQL full text search doesn't
support languages that don't put space between terms such as Japanese,
Chinese and so on. PGroonga supports all languages including Japanese
and Chinese.
Developers will need to re-provision when rebasing past this patch for
the tests to pass, since provision is what installs the PGroonga
package and extension.
PGroonga is enabled by default in development but not in production;
the hope is that after the PGroonga support is tested further, we can
enable it by default.
Fixes#615.
[docs and tests tweaked by tabbott]
The previous model for these Nagios checks was kinda crazy -- every
minute, we'd run a full `rabbitmctl list_consumers` for each of the
dozen+ consumers that we have, and then do the exact same parsing
logic for each to determine whether the target queue has a running
consumer to write out a state file.
Because `rabbitmctl list_consumers` takes a small amount of resources,
on systems where CPU is very limited (e.g. t2 style AWS instances),
this minor CPU wastage could be problematic.
Now we just do that `rabbitmqctl list_consumers` once per minute, and
output all the state files from a single command.
Further TODO items on this front include removing the hardcoded list
of queues.
Because rabbitmq doesn't support changing the nodename of a running
rabbitmq node, Zulip installations suffered a plague of issues where
e.g. a Zulip server would reboot, the hostname would change, and
suddenly the local rabbitmq instance being used by Zulip would stop
working.
We address this problem by using, by default, a fixed rabbitmq
nodename, but providing server administrators the option to set the
rabbitmq nodename used by Zulip however they choose.
To upgrade an existing server to use this new configuration, one will
need to add something like the following to /etc/zulip/zulip.conf:
[rabbitmq]
nodename = zulip@localhost
However, I don't believe we have the puppet code in place to make this
work correctly at initial installation without rabbitmq-server being
already installed (but off), as we can easily setup in Travis CI but I
haven't been willing to do for the installer. So for now, this just
fixes our Travis CI problems.
Fixes: #1579.
Previously, we used a fixed memcached memory allocation of 512MB,
regardless of the size of the server. While that is a good allocation
for a server with 4GB of RAM, for servers with less, we should
decrease the allocation, and for a large server with much more RAM, we
should increase it. We still support the user overriding the
configuration setting, but this produces more sensible defaults.
Zulip had only patches the redis configuration in one small way, which
resulted in unnecessary portability issues for using Redis on
different versions of Linux. We replace this with just a adding an
include mechanism to the redis config.
While we're at it, we configure this to take advantage of the
new REDIS_PASSWORD secret to automatically configure redis passwords.
* Fixes passing a string argument rather than an actual Python
argument.
* Switches to hardcoding the database to connect to rather than the
user, so this check can be run as an arbitrary user.
All other zulip management command names have underscores, so
rename email-mirror to email_mirror.
This will also make it possible to import this module, which will
help in writing tests for it.
runtornado unbuffers its output using
sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0).
This is not python 3 compatible since we can't specify
buffering on a text stream in python 3. So use the '-u'
option of python when calling runtornado.py to make output
unbuffered.
The manage.py change effectively switches the Zulip production server
to use the virtualenv, since all of our supervisord commands for the
various Python services go through manage.py.
Additionally, this migrates the production scripts and Nagios plugins
to use the virtualenv as well.
This results in a substantial performance improvement for all of
Zulip's backend templates.
Changes in templates:
- Change `block.super` to `super()`.
- Remove `load` tag because Jinja2 doesn't support it.
- Use `minified_js()|safe` instead of `{% minified_js %}`.
- Use `compressed_css()|safe` instead of `{% compressed_css %}`.
- `forloop.first` -> `loop.first`.
- Use `{{ csrf_input }}` instead of `{% csrf_token %}`.
- Use `{# ... #}` instead of `{% comment %}`.
- Use `url()` instead of `{% url %}`.
- Use `_()` instead of `{% trans %}` because in Jinja `trans` is a block tag.
- Use `{% trans %}` instead of `{% blocktrans %}`.
- Use `{% raw %}` instead of `{% verbatim %}`.
Changes in tools:
- Check for `trans` block in `check-templates` instead of `blocktrans`
Changes in backend:
- Create custom `render_to_response` function which takes `request` objects
instead of `RequestContext` object. There are two reasons to do this:
1. `RequestContext` is not compatible with Jinja2
2. `RequestContext` in `render_to_response` is deprecated.
- Add Jinja2 related support files in zproject/jinja2 directory. It
includes a custom backend and a template renderer, compressors for js
and css and Jinja2 environment handler.
- Enable `slugify` and `pluralize` filters in Jinja2 environment.
Fixes#620.
In theory these should be the same, but in misconfigured environments
(such at Travis CI) where /etc/hosts has multiple entries for
"localhost", 127.0.0.1 is safer than "localhost".
Camo is a caching image proxy, used in Zulip to avoid mixed-content
warnings by proxying HTTP image content over HTTPS. We've been using
it in zulip.com production for years; this change makes it available
in standalone Zulip deployments.
This fixes an issue where this worker wasn't even being installed
properly in a way that sets us up for doing further reorganization of
the Zulip Nagios plugins.
cd2348e9ae broke installing Zulip in
production since it didn't correctly update the puppet configuration
to call the process_queue script using the new argument format.
This commit isn't ideal in that I'd prefer to not require updating
puppet in sync with the actual running code, but we don't have a great
mechanism for doing that.
Fixes#586.
Previously, even though the Zulip digest emails were documented in the
settings, the cron job to run the script that actually sends the daily
digest emails wasn't included in the non-zulip.com part of the Zulip
production distribution. The overall consequence is that digest
emails didn't work for non-zulip.com users. This fixes that issue by
moving that cron job into the zulip manifests.
[commit message details expanded by tabbott]
Apparently, previously nginx was only compressing text/html content.
This should result in a substantial savings in network traffic -- some
quick testing I did found it cut the total data transferred for
loading a logged-in zulip.com instance from 3MB to 1.2MB.
If running on Django 1.8, running these plugins would die with the below. A fix
for this is to run `django.setup()` before interacting with Django.
Refs:
https://docs.djangoproject.com/en/1.8/ref/applications/#troubleshooting
```
Traceback (most recent call last):
File "/usr/lib/nagios/plugins/check_send_receive_time", line 103, in <module>
sender = get_user_profile_by_email(settings.NAGIOS_SEND_BOT)
File "/home/zulip/deployments/current/zerver/lib/cache.py", line 113, in func_with_caching
val = func(*args, **kwargs)
File "/home/zulip/deployments/current/zerver/models.py", line 1073, in get_user_profile_by_email
return UserProfile.objects.select_related().get(email__iexact=email.strip())
File "/usr/lib/python2.7/dist-packages/django/db/models/query.py", line 328, in get
num = len(clone)
File "/usr/lib/python2.7/dist-packages/django/db/models/query.py", line 144, in __len__
self._fetch_all()
File "/usr/lib/python2.7/dist-packages/django/db/models/query.py", line 977, in _fetch_all
self._result_cache = list(self.iterator())
File "/usr/lib/python2.7/dist-packages/django/db/models/query.py", line 238, in iterator
results = compiler.execute_sql()
File "/usr/lib/python2.7/dist-packages/django/db/models/sql/compiler.py", line 829, in execute_sql
sql, params = self.as_sql()
File "/usr/lib/python2.7/dist-packages/django/db/models/sql/compiler.py", line 378, in as_sql
extra_select, order_by, group_by = self.pre_sql_setup()
File "/usr/lib/python2.7/dist-packages/django/db/models/sql/compiler.py", line 48, in pre_sql_setup
self.setup_query()
File "/usr/lib/python2.7/dist-packages/django/db/models/sql/compiler.py", line 39, in setup_query
self.select, self.klass_info, self.annotation_col_map = self.get_select()
File "/usr/lib/python2.7/dist-packages/django/db/models/sql/compiler.py", line 206, in get_select
related_klass_infos = self.get_related_selections(select)
File "/usr/lib/python2.7/dist-packages/django/db/models/sql/compiler.py", line 700, in get_related_selections
[f.name], opts, root_alias)
File "/usr/lib/python2.7/dist-packages/django/db/models/sql/query.py", line 1471, in setup_joins
names, opts, allow_many, fail_on_missing=True)
File "/usr/lib/python2.7/dist-packages/django/db/models/sql/query.py", line 1372, in names_to_path
if field.is_relation and not field.related_model:
File "/usr/lib/python2.7/dist-packages/django/utils/functional.py", line 60, in __get__
res = instance.__dict__[self.name] = self.func(instance)
File "/usr/lib/python2.7/dist-packages/django/db/models/fields/related.py", line 110, in related_model
apps.check_models_ready()
File "/usr/lib/python2.7/dist-packages/django/apps/registry.py", line 131, in check_models_ready
raise AppRegistryNotReady("Models aren't loaded yet.")
django.core.exceptions.AppRegistryNotReady: Models aren't loaded yet.
```
Ideally some of these templates should really point to the
local installation's support email address, but this is a
good start.
Exceptions:
* Where to report security incidents
* MIT Zephyr-related pages
* zulip.com terms and conditions
Previously, in Zulip voyager, the cron jobs would spew error emails
every time they ran, due to this directory not existing.
This also tightens the permissions for the folder and avoids needing
to create a nagios user for Zulip voyager; it should be writeable by
both root and the zulip user and world-readable (and thus readable by
the Nagios user on zulip.com systems).
Previously our redis config was built for precise.
Synced from redis-server 2:2.8.4-2 plus our one change, which is
disabling saving to disk, so just put that at the bottom for maximum
obviousness.
I wish there was a better way to represent the fact that this is all
we're doing, since this will make life more difficult for running on
precise as well.
Fixes#28.
This is in some ways a regression, but because we don't have
python-postmonkey packaged right now, this is required to make the
Zulip production installation process work on Trusty.
(imported from commit 539d253eb7fedc20bf02cc1f0674e9345beebf48)
This needs to be deployed on both prod and lb0 to be functional
DEPLOY INSTRUCTIONS: restart carefully
(imported from commit d97a450754608357418c80e5b3c7b3bbcd1d09fb)
This is safe because we have the wildcard-all cert.
DEPLOY INSTRUCTIONS: Change the CNAME in R53 for external-content.zulipcdn.net
to the same as www.zulip.com
(imported from commit 075984943ce3a3b17518b913ea650992e45f705e)
The one time use email addresses are prefixed with mm and need be sent
to the local zulip user to be picked up by the email mirror.
(imported from commit e17cfe6855ab7886f25ded52790b8f31df955ef2)
Thanks Tom Cook for getting these through Digicert!
We no longer need separate wildcard certificates, etc, because we have SAN star
certs.
(imported from commit 40a8961da51b6a0ae90c68b40b2af6d59cb5cf9f)
This allows us to specify different rules for the zmirror machines, which need
ports open for Zephyr.
(imported from commit f3c061e9492cbb99783f156debccf03161347e47)
This removes "X-Frame-Options DENY" from our nginx config. We need to be able
to load Zulip in an iframe for embedding and we decided that it doesn't actually
provide much protection.
(imported from commit 5bc363693db949010f6163cb3000c12229618a83)
We apparently still have some process that occationally sits idle in a
transaction for a while, which makes this alert super noisy.
(imported from commit 074b04ad746bac0da1b8714763538d1ce22da64e)
Doing so requires superuser privileges because check_postgres.pl only connects
to one database for that action. We could theoretically work around this, but I
don't think it's worthwhile for non-production DBs.
(imported from commit 3ab06e4dd6f844c81128b81709cdc3cdfbe37c47)
We believe these will generally no longer be disruptive now that we have
autocommit enabled.
(imported from commit c8c1301e0d4b188d6708173cd8c8b16279e3d910)
`/usr/bin/env python` is almost always preferred over specifying the
specific python to run (and this script doesn't work for me on OSX
with /usr/bin/python specified).
(imported from commit 531e6062ba0ac1f25e3c681bb5cf83a918d0e3e7)
This requires a puppet apply on prod, as well as manually
updating the symlinks of Zulip-latest and Humbug-latest on
prod0
(imported from commit c5ef8cd0e2d156144531b35af9a8c5226f5bf750)
To deploy this, the zulip_internal::base and zulip_internal::munin classes must
be added to nagios.zulip.net.
(imported from commit 50d6a4ed19fcc9c62c7104977d69043bf5b9bbf9)
Before we deploy this commit, we must migrate the data from the staging redis
server to the new, dedicated redis server. The steps for doing so are the
following:
* Remove the zulip::redis puppet class from staging's zulip.conf
* ssh once from staging to redis-staging.zulip.net so that the host key is known
* Create a tunnel from redis0.zulip.net to staging.zulip.net
* zulip@redis0:~$ ssh -N -L 127.0.0.1:6380:127.0.0.1:6379 -o ServerAliveInterval=30 -o ServerAliveCountMax=3 staging.zulip.net
* Set the redis instance on redis0.zulip.net to replicate the one on staging.zulip.net
* redis 127.0.0.1:6379> slaveof 127.0.0.1 6380
* Stop the app on staging
* Stop redis-server on staging
* Promote the redis server on redis0.zulip.net to a master
* redis 127.0.0.1:6379> slaveof no one
* Do a puppet apply at this commit on staging (this will bring up the tunnel to redis0)
* Deploy this commit to staging (start the app on staging)
* Kill the tunnel from redis0.zulip.net to staging.zulip.net
* Uninstall redis-server on staging
The steps for migrating prod will be the same modulo s/staging/prod0/.
(imported from commit 546d258883ac299d65e896710edd0974b6bd60f8)
The zulip::redis puppet class should be added to all our frontends' zulip.conf
after this is deployed. No puppet apply is required.
(imported from commit ccea89f4779c6c49c0cbe837adcb5be21bfe55ab)
Otherwise, we will enable the postfix config on all frontends,
regardless of whether Enterprise deployments requested it.
(imported from commit 9592be3706adcee7547f6795f32fe7b8d85e71ee)
This removed the cronjob from all app_frontend servers and enables the
local Postfix mail server on the same.
This is a no-op on staging if the parent commit has already been
applied.
To deploy this commit, run a puppet-apply on prod.
(imported from commit 6d3977fd12088abcd33418279e9fa28f9b2a2006)
This will cause us to recieve messages sent to streams.staging.zulip.com
via the local Postfix daemon running on staging.
This commit does not impact prod. To deploy, a puppet-apply is needed on
staging.
(imported from commit 9eaedc28359f55a65b672a2e078c57362897c0de)
This will allow us to roll out the Postfix-based mirror on staging in
the future without impacting production mirroring.
This branch should be puppet-deployed first on prod, then staging.
(imported from commit eceaa6c02a06f7074cacc19c6439e5928eef3ae4)
This allows the utility to run on trac.zulip.net, which doesn't have a 'zulip'
database.
(imported from commit c8eabb89e5e161191d6f2c92ca2b1428b17a9aa0)