Now, unless you specify `--fill-cache`, memcached caches will not be
pre-filled after a server restart. When someone is in a hurry (e.g. if
the server is down right now, or if they are testing a configuration
change on a newly set up server), it's best to just restart without
pre-filling the cache.
Fixes: #10900.
Update the list of ciphers that nginx will use to the current
Mozilla recommended ones.
These are Intermediate compatibility ones suitable for clients
running anything newer than Firefox 1, Chrome 1, IE 7, Opera 5
and Safari 1. Modern compatibility is not suitable as it excludes
Android 4, which is still seen on ~1% of traffic.
More info: https://wiki.mozilla.org/Security/Server_Side_TLS
It hasn't been working for years, but more importantly, it spams up
root's mail queue so that one can't find important things in there
(e.g. the fact that the long-term-idle cron job was failing).
This now checks if the user is zulip, and if not, switches to the
zulip user, making it possible to run it as root.
Significantly modified by tabbott to not break existing behavior.
/bin/sh and /usr/bin/env are the only two binaries that NixOS provides
at a fixed path (outside a buildFHSUserEnv sandbox).
This discussion was split from #11004.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
This is a common bug that users might be tempted to introduce.
And also fix two instances of this bug that were present in our
codebase, including an important one in our upgrade code path.
This commit works by vendoring the couple functions we still use from
puppetlabs stdlib (join and range), but removing the rest of the
puppetlabs codebase, and of course cleaning up our linter rules in the
process.
Fixes #7423.
This should be a nice performance improvement for browsers that
support it.
We can't yet enable this in the Zulip on-premise nginx configuration,
because that still has to support Trusty.
This isn't strictly required, since we add these repositories via
`setup-apt-repo` in any case, but the previous code was wrong and
worth fixing regardless.
This fixes a bug where our API routes for uploaded files (where we
need to use a consistent URL between session auth and API auth) were
not properly configured to pass through the API authentication headers
(and otherwise provide REST endpoint settings).
In particular, this prevented the Zulip mobile apps from being able to
access authenticated image files using these URLs.
Apparently, we can use the process group naming style of having dashes
in the names without using the explicit `numprocs` feature of
supervisord configuration.
The new configuration is perfectly satisfactory, so there's no real
reason to prefer the old approach.
Previously, this script needed access to Django settings, which in
turn required access to /etc/zulip/zulip-secrets.conf. Since that
isn't world-readable, this meant that this couldn't run as an
unprivileged `nagios` user.
Fix that by just hardcoding the appropriate path under /var/log/.
When using the Python 3 typing style, Python scripts can't import from
typing inside an `if False` (in contrast, one needs to import inside
an `if False` to support the Python 3 syntax without needing
python-typing installed). So this was just incorrectly half-converted
from the Python 2 style to the Python 3 style.
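A minimal sketch of the failure mode described above, under stated
assumptions: the function `first` and the helper `runs_cleanly` are
hypothetical names for illustration and are not taken from the script
in question. It executes a half-converted module (import hidden inside
`if False`, annotations in Python 3 syntax) and a fully converted one.

```python
# Hypothetical sources illustrating the two styles; not the actual script.
HALF_CONVERTED = """
if False:
    from typing import List  # never executed at runtime

def first(items: List[int]) -> int:
    return items[0]

first.__annotations__  # force evaluation of the annotations
"""

FULLY_CONVERTED = """
from typing import List

def first(items: List[int]) -> int:
    return items[0]

first.__annotations__
"""


def runs_cleanly(source: str) -> bool:
    """Return True if the source executes without a NameError."""
    try:
        exec(source, {})
    except NameError:
        # Depending on the Python version, this is raised either when the
        # function is defined or when its annotations are first inspected.
        return False
    return True


if __name__ == "__main__":
    print("half-converted runs cleanly:", runs_cleanly(HALF_CONVERTED))
    print("fully converted runs cleanly:", runs_cleanly(FULLY_CONVERTED))
```

With comment-style (Python 2 compatible) annotations, the `if False`
import is harmless because nothing ever looks up `List` at runtime.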
Apparently, `puppet-lint` on Ubuntu trusty throws warnings for certain
quoting patterns that are OK in modern `puppet-lint`. I believe the
old Zulip code was actually correct (i.e. the old `puppet-lint`
implementation was the problem), but it seems worth changing anyway to
suppress the warnings.
We also exclude more of puppet-apt from linting, since it's
third-party code.
This was converted to Python 3 incorrectly, in a way that actually
completely broke the script (the .decode() that this adds is critical,
since 'f' != b'f').
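For illustration, a small sketch of why the `.decode()` matters. The
function names and the sample output are assumptions made up for this
example; they do not reproduce the script's actual parsing code.

```python
def parse_flag_broken(raw_output: bytes) -> bool:
    """Compare undecoded bytes against a str: never matches on Python 3."""
    fields = raw_output.split()
    return bool(fields) and fields[0] == "f"  # b"f" == "f" is False


def parse_flag(raw_output: bytes) -> bool:
    """Decode first, then compare str against str."""
    # Guard against a future refactor passing already-decoded text
    # (this particular assert is an assumption, not the one in the commit).
    assert isinstance(raw_output, bytes), "expected undecoded process output"
    fields = raw_output.decode().split()
    return bool(fields) and fields[0] == "f"


if __name__ == "__main__":
    sample = b"f\n"  # hypothetical command output
    print("without decode:", parse_flag_broken(sample))
    print("with decode:", parse_flag(sample))
```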
We fix this, and also add an assert that makes the parsing code
safer against future refactors.
We fix the "ERROR: safepackage not in autoload module layout" error,
which was caused by the definition of the defined type "safepackage"
living in the wrong place. We refactor to create the defined type
according to the puppet guidelines. Link below:
https://docs.puppet.com/puppet/2.7/lang_defined_types.html
We fix these by adding ignore statements in a bunch of files
where this error popped up. We target only specific lines using
the ignore statements and not the entire files.
In puppet/zulip_ops/files/postgresql/setup_disks.sh line 15:
array_name=$(mdadm --examine --scan | sed 's/.*name=//')
^-- SC2034: array_name appears unused. Verify use (or export if used externally).
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
In puppet/zulip_ops/files/munin-plugins/rabbitmq_connections line 66:
echo "connections.value $(HOME=$HOME rabbitmqctl list_connections | grep -v "^Listing" | grep -v "done.$" | wc -l)"
^-- SC2126: Consider using grep -c instead of grep|wc -l.
In puppet/zulip_ops/files/munin-plugins/rabbitmq_consumers line 32:
VHOST=${vhost:-"/"}
^-- SC2034: VHOST appears unused. Verify use (or export if used externally).
In puppet/zulip_ops/files/munin-plugins/rabbitmq_messages line 32:
VHOST=${vhost:-"/"}
^-- SC2034: VHOST appears unused. Verify use (or export if used externally).
In puppet/zulip_ops/files/munin-plugins/rabbitmq_messages_unacknowledged line 32:
VHOST=${vhost:-"/"}
^-- SC2034: VHOST appears unused. Verify use (or export if used externally).
In puppet/zulip_ops/files/munin-plugins/rabbitmq_messages_uncommitted line 32:
VHOST=${vhost:-"/"}
^-- SC2034: VHOST appears unused. Verify use (or export if used externally).
In puppet/zulip_ops/files/munin-plugins/rabbitmq_queue_memory line 32:
VHOST=${vhost:-"/"}
^-- SC2034: VHOST appears unused. Verify use (or export if used externally).
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
In puppet/zulip/files/postgresql/env-wal-e line 6:
export AWS_ACCESS_KEY_ID=$(crudini --get "$ZULIP_SECRETS_CONF" secrets s3_backups_key)
^-- SC2155: Declare and assign separately to avoid masking return values.
In puppet/zulip/files/postgresql/env-wal-e line 7:
export AWS_SECRET_ACCESS_KEY=$(crudini --get "$ZULIP_SECRETS_CONF" secrets s3_backups_secret_key)
^-- SC2155: Declare and assign separately to avoid masking return values.
In puppet/zulip/files/postgresql/env-wal-e line 9:
if [ $? -ne 0 ]; then
^-- SC2181: Check exit code directly with e.g. 'if mycmd;', not indirectly with $?.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
In puppet/zulip/files/nagios_plugins/zulip_app_frontend/check_email_deliverer_process line 16:
elif [ "$(echo "$STATUS" | egrep '(STOPPED)|(STARTING)|(BACKOFF)|(STOPPING)|(EXITED)|(FATAL)|(UNKNOWN)$')" ]
^-- SC2143: Use egrep -q instead of comparing output with [ -n .. ].
^-- SC2196: egrep is non-standard and deprecated. Use grep -E instead.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
In puppet/zulip/files/nagios_plugins/zulip_app_frontend/check_email_deliverer_backlog line 8:
cd /home/zulip/deployments/current
^-- SC2164: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
With this change, all that one needs to do to start using thumbor in
production is to set the `THUMBOR_URL` setting.
Since without THUMBOR_URL enabled, the thumbor service doesn't
actually do anything, this is pretty safe.
As part of our effort to change the data model away from each user
having a single API key, we're eliminating the couple requests that
were made from Django to Tornado (as part of a /register or home
request) where we used the user's API key grabbed from the database
for authentication.
Instead, we use the (already existing) internal_notify_view
authentication mechanism, which uses the SHARED_SECRET setting for
security, for these requests, and just fetch the user object using
get_user_profile_by_id directly.
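The general shape of shared-secret authentication for internal
requests can be sketched as follows. This is not the
internal_notify_view implementation: the secret value, the `USERS`
table (standing in for a lookup like get_user_profile_by_id), and the
function name are all hypothetical.

```python
import hmac
from typing import Dict, Optional

# Hypothetical values for illustration only.
SHARED_SECRET = "example-shared-secret"
USERS: Dict[int, str] = {1: "iago", 2: "hamlet"}


def handle_internal_request(secret: str, user_id: int) -> Optional[str]:
    """Return the user for a trusted internal request, else None."""
    # Constant-time comparison avoids leaking the secret through timing.
    if not hmac.compare_digest(secret.encode(), SHARED_SECRET.encode()):
        return None
    # The caller is trusted, so look the user up by ID directly
    # instead of authenticating with that user's API key.
    return USERS.get(user_id)


if __name__ == "__main__":
    print(handle_internal_request("example-shared-secret", 1))
    print(handle_internal_request("wrong-secret", 1))
```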
Tweaked by Yago to include the new /api/v1/events/internal endpoint in
the exempt_patterns list in test_helpers, since it's an endpoint we call
through Tornado. Also added a couple missing return type annotations.
This commit adds the necessary puppet configuration and
installer/upgrade code for installing and managing the thumbor service
in production. This configuration is gated by the 'thumbor.pp'
manifest being enabled (which is not yet the default), and so this
commit should have no effect in a default Zulip production environment
(or in the long term, in any Zulip production server that isn't using
thumbor).
Credit for this effort is shared by @TigorC (who initiated the work on
this project), @joshland (who did a great deal of work on this and got
it working during PyCon 2017) and @adnrs96, who completed the work.
While there are legitimate use cases for embedding Zulip in an iframe,
they're rare, and it's more important to prevent this category of
attack by default.
Sysadmins can switch this to a whitelist when they want to use frames.
We no longer need or use these, since Zulip installs a pinned version
of node directly with the scripts/setup/install-node tool.
Noticed while adding Ubuntu bionic support, when we saw that the
package names had changed again.
Apparently, our nginx configuration's use of "localhost", combined
with the default in modern Linux of having localhost resolve to both
the IPv4 and IPv6 addresses on a given machine, resulted in `nginx`
load-balancing requests to a given Zulip server between the IPv4 and
IPv6 addresses. This, in turn, resulted in spurious 502 errors
every few minutes on the /events endpoints for some clients.
Disabling IPv6 on the server resolved the problem, as does simply
spelling localhost as 127.0.0.1 for the `nginx` upstreams that we
declare for proxying to non-Django services on localhost.
Now, one can just set `no_serve_uploads` in `zulip.conf` to prevent
`nginx` from serving locally uploaded files.
This should help simplify the S3 integration setup process.
This option is intended to support situations like a quick Docker
setup where doing HTTPS adds more setup overhead than it's worth.
It's not intended to be used in actual production environments.
This is preferred, since we don't currently have a way to run Django
logic on the postgres hosts with the Docker implementation.
This is a necessary part of removing the need for the docker-zulip
package to patch this file to make Zulip work with Docker.
The main purpose of this change is to make it guaranteed that
`manage.py register_server --rotate-key` can edit the
/etc/zulip/zulip-secrets.conf configuration via crudini.
But it also adds value by ensuring zulip-secrets.conf is not readable
by other users.
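As a rough way to check the second property, the sketch below tests
whether a file is readable by other users. It runs against a temporary
file rather than the real secrets file, and the 0o640 mode is an
assumption chosen for the example, not necessarily the mode puppet
sets.

```python
import os
import stat
import tempfile
from typing import Tuple


def readable_by_others(path: str) -> bool:
    """Return True if the 'other' read bit is set on the file."""
    return bool(os.stat(path).st_mode & stat.S_IROTH)


def demo() -> Tuple[bool, bool]:
    """Show the check before and after tightening permissions."""
    with tempfile.NamedTemporaryFile() as f:
        os.chmod(f.name, 0o644)  # world-readable
        before = readable_by_others(f.name)
        os.chmod(f.name, 0o640)  # assumed restrictive mode
        after = readable_by_others(f.name)
    return before, after


if __name__ == "__main__":
    print(demo())
```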
With modern apt-key, the fingerprints are displayed in the more fully
written-out format with spaces, and so `apt-key add` was being run
every time.
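The mismatch can be illustrated with a short sketch that normalizes
fingerprints before comparing them. The fingerprint below is made up,
and the helper names are hypothetical; this is not the code used by
the puppet module.

```python
from typing import Iterable


def normalize_fingerprint(fingerprint: str) -> str:
    """Strip all whitespace and upper-case the hex digits."""
    return "".join(fingerprint.split()).upper()


def key_already_present(wanted: str, listed: Iterable[str]) -> bool:
    """Compare fingerprints regardless of how they are spaced."""
    target = normalize_fingerprint(wanted)
    return any(normalize_fingerprint(item) == target for item in listed)


if __name__ == "__main__":
    # Made-up fingerprint in the compact and the spaced-out formats.
    compact = "0123456789ABCDEF0123456789ABCDEF01234567"
    spaced = "0123 4567 89AB CDEF 0123  4567 89AB CDEF 0123 4567"
    print("naive comparison:", compact == spaced)
    print("normalized comparison:", key_already_present(compact, [spaced]))
```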
This fixes some unnecessary work being done on each puppet run on
Debian stretch.
I would have preferred to not need to do this by upgrading to
upstream, but see #7423 for notes on why that isn't going to work
(basically they broke support for puppet older than 4).
Apparently, these confused the puppet template parser, since they are
somewhat similar to its syntax, resulting in errors trying to use
these templates. It's easy enough to just remove the example
content from the base postgres config file.
We can't really do this in the zulip manifests (since it's sorta a
sysadmin policy decision), but these scripts can cause significant
load when Nagios logs into a server (because many of them take 50ms or
more of work to run). So we just get rid of them.
It seems unlikely we're going to add support for additional older
Debian-based distributions, so it makes sense to just use an else
statement. This should save a bit of busywork every time we add a new
distro.
Mostly, this involves adding the big block at the bottom and making
10 a variable so that it's easier to compare different versions of
these.
I did an audit of the configuration changes between 9.6 and 10, so
this should be fine, but it hasn't been tested yet.
Our recent addition of Content-Security-Policy to the file uploads
backend broke in-browser previews of PDFs.
The content-types change in the last commit fixed loading PDFs for
most users; but the result was ugly, because e.g. Chrome would put the
PDF previewer into a frame (so there were 2 left scrollbars).
There were two changes needed to fix this:
* Loading the style to use the plugin. We corrected this by adding
`style-src 'self' 'unsafe-inline';`
* Loading the plugin. Our CSP blocked loading the PDF viewer plugin.
To correct this, we add object-src 'self', and then limit the
plugin-type to just the one for application/pdf.
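As a sketch, the directives named in the two items above can be
assembled into a header value like this. Only these three directives
come from the text; the helper name is hypothetical, the real policy
may contain further directives, and `plugin-types` is assumed to be
the directive meant by "plugin-type".

```python
from typing import Dict, List


def build_csp(directives: Dict[str, List[str]]) -> str:
    """Join directives into a Content-Security-Policy header value."""
    return " ".join(
        f"{name} {' '.join(values)};" for name, values in directives.items()
    )


# Directives taken from the description above; anything else in the
# deployed policy is intentionally left out of this sketch.
PDF_PREVIEW_DIRECTIVES = {
    "style-src": ["'self'", "'unsafe-inline'"],
    "object-src": ["'self'"],
    "plugin-types": ["application/pdf"],
}

if __name__ == "__main__":
    print(build_csp(PDF_PREVIEW_DIRECTIVES))
```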
We verified this new CSP using https://csp-evaluator.withgoogle.com/
in addition to manual testing.
Previously, user-uploaded PDF files were not properly rendered by
browsers with the local uploads backend, because we weren't setting
the correct content-type.
This adds a basic Content-Security-Policy for user-uploaded avatars
served by the LOCAL_UPLOADS backend.
I think this follow-up to d608a9d315 is not strictly necessary for
now, but it is worth doing because we may later change what can be
uploaded in the avatars directory.
This adds a basic Content-Security-Policy for user-uploaded files with local uploads.
While over time, we plan to add CSP for the main site as well, this CSP is particularly
important for the local-uploads backend, which often shares a domain with the main site.
Running this on additional machines would be redundant; additionally,
the FillState checker cron job runs only on cron systems, so this will
crash on other app frontends.
While this is a different system than I'd written up in #8004, I think
this is a better solution to the general problem of cron jobs to run
on just one server.
Fixes #8004.
Revert c8f034e9a "queue: Remove missedmessage_email_senders code."
As the comment in the code says, it ensures a smooth upgrade path
from 1.7.x; we can delete it in master after 1.8.0 is released.
The removal commit was merged early due to a communication failure.
From here on, we authenticate uploaded file requests before serving
the files in production. This involves having NGINX pass these file
requests on to Django for authentication, and then serving the files
via internal redirects that use the X-Accel-Redirect header. The
redirection of requests and the setting of the X-Accel-Redirect header
are handled by django-sendfile.
NOTE: This commit starts authenticating these requests on Zulip
servers running Ubuntu Xenial (16.04) or newer.
Fixes: #320 and #291 partially.
This should make it possible to use the zulip_ops base rules
successfully on chat.zulip.org. Many of the changes in this commit
are hacks and probably can be cleaned up later, but given that we plan
to drop trusty support soon, it's likely that most of them will simply
be deleted then.
We've been running this change on zulipchat.com for a couple of months
now. Before then, we used to regularly get exceptions like this:
  File "./zerver/views/messages.py", line 749, in get_messages_backend
    setter=stringify_message_dict)
  File "./zerver/lib/cache.py", line 275, in generic_bulk_cached_fetch
    cache_set_many(items_for_remote_cache)
  File "./zerver/lib/cache.py", line 215, in cache_set_many
    get_cache_backend(cache_name).set_many(items, timeout=timeout)
  File "/home/zulip/deployments/2017-09-28-21-04-12/zulip-py3-venv/lib/python3.5/site-packages/django/core/cache/backends/memcached.py", line 150, in set_many
    self._cache.set_multi(safe_data, self.get_backend_timeout(timeout))
pylibmc.Error: error 48 from memcached_set_multi
This error means memcached was unable to find space for the new value.
You might think that because memcached provides an LRU cache, this
shouldn't happen because it would just evict something... but in fact
* memcached splits its data into "slabs" by object size, and
* until recently, once a 1MiB "chunk" is allocated to a given "slab"
i.e. size class, it wouldn't be reclaimed to allocate to another.
So once the cache has been filled up with objects of some distribution
of sizes, if some objects come in that would go in a different size
class, we have no chunks for that size class / slab, and can't get one.
And that's exactly what was happening on zulipchat.com.
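The following toy model illustrates the mechanism. It is a deliberate
simplification and not memcached's real allocator: the class name, the
tiny chunk size, and the two size classes are all made up, and
"chunk" is used in the same sense as in the text above.

```python
from typing import List, Tuple


class ToySlabCache:
    """Chunks are handed to a size class once and never reclaimed."""

    def __init__(self, total_chunks: int, chunk_size: int, size_classes: List[int]):
        self.free_chunks = total_chunks
        self.chunk_size = chunk_size
        self.size_classes = sorted(size_classes)
        self.chunks = {c: 0 for c in self.size_classes}
        self.items = {c: 0 for c in self.size_classes}

    def size_class_for(self, size: int) -> int:
        for c in self.size_classes:
            if size <= c:
                return c
        raise ValueError("object larger than the largest size class")

    def set(self, size: int) -> bool:
        """Return True if the object could be stored."""
        cls = self.size_class_for(size)
        capacity = self.chunks[cls] * (self.chunk_size // cls)
        if self.items[cls] < capacity:
            self.items[cls] += 1  # free slot in a chunk this class owns
            return True
        if self.free_chunks > 0:
            self.free_chunks -= 1  # claim an unassigned chunk
            self.chunks[cls] += 1
            self.items[cls] += 1
            return True
        if self.chunks[cls] > 0:
            return True  # evict the LRU item of the same class, reuse its slot
        return False  # no chunk for this class, and none can be reclaimed


def demo() -> Tuple[bool, bool]:
    cache = ToySlabCache(total_chunks=2, chunk_size=1024, size_classes=[64, 1024])
    small_ok = all(cache.set(50) for _ in range(100))  # fills both chunks
    large_ok = cache.set(1000)  # different size class, no chunk left
    return small_ok, large_ok


if __name__ == "__main__":
    print(demo())
```

In this model, the small objects keep succeeding through LRU eviction
within their own class, while the first large object fails outright.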
Useful background can be found in
https://github.com/memcached/memcached/wiki/ServerMaint#slab-imbalance
https://github.com/memcached/memcached/wiki/ReleaseNotes1411
https://github.com/memcached/memcached/wiki/ReleaseNotes1425
https://github.com/memcached/memcached/wiki/ReleaseNotes150
We're already running v1.4.25, which provides an "automover" that should
be well equipped to fix this; v1.5.0 turns it on by default.
With this commit, adopt the "modern start line" recommended in the
release notes for our v1.4.25, including turning on the automover.