Our priority hierarchy is:
(1) Tornado and base services like memcached, redis, etc.
(2) Django and message sender queue workers.
(3) Everything else.
Ideally, we'd have something a bit more fine-grained (e.g. some queue
workers are potentially in the sending path, while others aren't), but
this should have a big impact on ensuring Tornado gets the resources
it needs during load spikes.
I think this has a good chance of causing some load spikes that would
previously have resulted in a user-facing delivery delays no longer
having any significant user-facing impact.
Previously, if process_fts_updates ended up very far behind
(e.g. 100,000s of messages), it was unable to recover without doing
some very expensive databsae operations to fetch and then delete the
list of message IDs needing updates. This change fixes that issue by
doing the catch-up work in batches.
Also use psql -e (--echo-queries) in scripts that use ‘set -x’, so
errors can be traced to a specific query from the output.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
The commit 87d1809657 changed the time when
digests are sent by 3 hours to account for moving from the US East Coast to the
West Coast, but didn't change the time period exception in the
`check-rabbitmq-queue` script.
Closes#5415
The construction `su postgres -c -- bash -c 'psql …'` didn’t behave the
way it reads, and only worked by accident:
1. `-c --` sets the command to `--`.
2. `bash` sets the first argument to `bash`.
3. `-c 'psql …'` replaces the command with `psql …`.
Thus, `su` ended up executing `<shell> -c 'psql …' bash`, where
`<shell>` is the `postgres` user’s login shell, usually also `bash`,
which then executed 'psql …' and ignored the extra `bash`.
Unconfuse this construction.
Note from tabbott: The old code didn't even work by accident, it was
just broken. The right fix is to move the quoting around properly.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
Lengthen the session timeout and enlarge the session cache. Upgrade
Diffie-Hellman parameters from fixed 1024-bit to custom 2048-bit.
Enable OCSP stapling.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
Apparently, our testing environment for this configuration was broken
and didn't test the code we thought it did; as a result, a variable
redefinition bug slipped through.
Fixes#11786.
The overall goal of this change is to fix an issue where on Ubuntu
Trusty, we were accidentally overriding the configuration to serve
uploads from disk with the regular expressions for adding access
control headers.
However, while investigating this, it became clear that we could
considerably simplify the mental energy required to understand this
system by making the uploads-route file be unconditionally available
and included from `zulip-include/app` (which means the zulip_ops code
can share behavior here).
We also move the Access-Control-Allow-* headers to a separate include
file, to avoid duplicating it in 5 places. Fixing this duplication
discovered a potential bug in the settings used for Tornado, where
DELETE was not allowed on a route that definitely expects DELETE.
Fixes#11758.
For users putting Zulip behind certain proxies (and potentially some
third-party API clients), buffer sizes can exceed the uwsgi default of
4096. Since we aren't doing such high-throughput APIs that a small
buffer size is valuable, we should just raise this for everyone.
Nowm unless you specify `--fill-cache`, memcached caches will not be
pre-filled after a server restart. This will be helpful when someone
is in a hurry (e.g. if the server is down right now, or if he/she
testing a configuration change in a newly setup server), it's best to
just restart without pre-filling the cache.
Fixes: #10900.
Update the list of ciphers that nginx will use to the current
Mozilla recommended ones.
These are Intermediate compatibility ones suitable for clients
running anything newer than Firefox 1, Chrome 1, IE 7, Opera 5
and Safari 1. Modern compatibility is not suitable as it excludes
Andriod 4 which is still seen on ~1% of traffic.
More info: https://wiki.mozilla.org/Security/Server_Side_TLS
It hasn't been working for years, but more importantly, it spams up
root's mail queue so that one can't find important things in there
(e.g. the fact that the long-term-idle cron job was failing).