This PR updates the puppet manifest to allow /etc/zulip to be a symlink. The current behaviour overwrites /etc/zulip if it is link to another directory, which is problematic with docker-zulip and
in particular the `LINK_SETTINGS_TO_DATA` setting.
Generated by `pyupgrade --py3-plus --keep-percent-format` on all our
Python code except `zthumbor` and `zulip-ec2-configure-interfaces`,
followed by manual indentation fixes.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
The nginx ‘add_header’ directive doesn’t inherit the way you’d
want (https://trac.nginx.org/nginx/ticket/854), so we need to manually
simulate inheritance using ‘include’, like we previously did with
api_headers.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
isort 5 knows not to reorder imports across function calls, so this
will stop isort from breaking our code.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
Puppet doesn’t re-run an exec blocks that’s declared as creating an
existing file, even if it’s notified. Remove the creates declaration.
Fixes#13730.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
"Zulip Voyager" was a name invented during the Hack Week to open
source Zulip for what a single-system Zulip server might be called, as
a Star Trek pun on the code it was based on, "Zulip Enterprise".
At the time, we just needed a name quickly, but it was never a good
name, just a placeholder. This removes that placeholder name from
much of the codebase. A bit more work will be required to transition
the `zulip::voyager` Puppet class, as that has some migration work
involved.
This legacy cross-realm bot hasn't been used in several years, as far
as I know. If we wanted to re-introduce it, I'd want to implement it
as an embedded bot using those common APIs, rather than the totally
custom hacky code used for it that involves unnecessary queue workers
and similar details.
Fixes#13533.
Zulip has had a small use of WebSockets (specifically, for the code
path of sending messages, via the webapp only) since ~2013. We
originally added this use of WebSockets in the hope that the latency
benefits of doing so would allow us to avoid implementing a markdown
local echo; they were not. Further, HTTP/2 may have eliminated the
latency difference we hoped to exploit by using WebSockets in any
case.
While we’d originally imagined using WebSockets for other endpoints,
there was never a good justification for moving more components to the
WebSockets system.
This WebSockets code path had a lot of downsides/complexity,
including:
* The messy hack involving constructing an emulated request object to
hook into doing Django requests.
* The `message_senders` queue processor system, which increases RAM
needs and must be provisioned independently from the rest of the
server).
* A duplicate check_send_receive_time Nagios test specific to
WebSockets.
* The requirement for users to have their firewalls/NATs allow
WebSocket connections, and a setting to disable them for networks
where WebSockets don’t work.
* Dependencies on the SockJS family of libraries, which has at times
been poorly maintained, and periodically throws random JavaScript
exceptions in our production environments without a deep enough
traceback to effectively investigate.
* A total of about 1600 lines of our code related to the feature.
* Increased load on the Tornado system, especially around a Zulip
server restart, and especially for large installations like
zulipchat.com, resulting in extra delay before messages can be sent
again.
As detailed in
https://github.com/zulip/zulip/pull/12862#issuecomment-536152397, it
appears that removing WebSockets moderately increases the time it
takes for the `send_message` API query to return from the server, but
does not significantly change the time between when a message is sent
and when it is received by clients. We don’t understand the reason
for that change (suggesting the possibility of a measurement error),
and even if it is a real change, we consider that potential small
latency regression to be acceptable.
If we later want WebSockets, we’ll likely want to just use Django
Channels.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
Resolves these warnings from puppet-lint.
puppet-lint| puppet/zulip/manifests/app_frontend_base.pp - WARNING: double quoted string containing no variables on line 14
puppet-lint| puppet/zulip/manifests/app_frontend_base.pp - WARNING: double quoted string containing no variables on line 19
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
This has been a spurious alert for a long time.
It's unclear that this check is useful at all, but if it spikes
dramatically above what's normal, there's perhaps still utility in
being alerted.
Since LoopQueueProcessingWorker jobs cannot be monitored by checking
for connected consumers (since they poll, rather than consuming as
events arrive), they can't be monitored with check_consumers. It's
OK, because that monitoring was redundant with monitoring for
potential growth in their queue that we have as well.
Also clean up the block comments for the two other similar queue
procesors.
Now that we're implemented tsearch_extras in pure postgres, we no
longer need a custom extension. This should help us considerably, as
it means we no longer need to ship custom apt packages at all.
Fixes#467.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
This commit finishes adding end-to-end support for the install script
on Debian Buster (making it production ready). Some support for this
was already added in prior commits such as
99414e2d96.
We plan to revert the postgres hunks of this once we've built
tsearch_extras for our packagecloud archive.
Fixes#9828.
The issue here was that the '.' character was unescaped and the
regex was not anchored with a terminal '$'. This was detected by
Anders Kaseorg.
Co-authored-by: Anders Kaseorg <anders@zulipchat.com>
`/etc/postfix/virtual` is of `regexp:` type, not `hash:` type, so
running `postmap` on it has no effect; we need to reload Postfix when
it changes.
http://www.postfix.org/DATABASE_README.html#detect
In the interest of forcing a reload now, optimize the regexes by
eliding the unanchored `.*`s at the beginnings and ends.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
Introduced by #12966.
puppet/zulip/manifests/base.pp - WARNING: double quoted string containing no variables on line 93
puppet/zulip/manifests/base.pp - WARNING: string containing only a variable on line 93
scanf doesn’t accept a number as input, so uh, add a dummy space
character.
What. You can’t give me a bad language and then complain when I write
bad programs in it.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
Fixes this warning:
Warning: The string '8167976' was automatically coerced to the numerical value 8167976 (file: /root/zulip/puppet/zulip/manifests/base.pp, line: 93, column: 19)
Fixes#9682.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
Delete trailing newlines from all files, except
tools/ci/success-http-headers.txt and tools/setup/dev-motd, where they
are significant, and static/third, where we want to stay close to
upstream.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
Previous cleanups (mostly the removals of Python __future__ imports)
were done in a way that introduced leading newlines. Delete leading
newlines from all files, except static/assets/zulip-emoji/NOTICE,
which is a verbatim copy of the Apache 2.0 license.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
This gives us access to typing_extensions.Deque, which was not added
to typing until 3.5.4.
(PROVISION_VERSION is not bumped because the transitive dependency set
in dev.txt hasn’t changed.)
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
As a result of dropping support for trusty, we can remove our old
pattern of putting `if False` before importing the typing module,
which was essential for Python 3.4 support, but not required and maybe
harmful on newer versions.
cron_file_helper
check_rabbitmq_consumers
hash_reqs
check_zephyr_mirror
check_personal_zephyr_mirrors
check_cron_file
zulip_tools
check_postgres_replication_lag
api_test_helpers
purge-old-deployments
setup_venv
node_cache
clean_venv_cache
clean_node_cache
clean_emoji_cache
pg_backup_and_purge
restore-backup
generate_secrets
zulip-ec2-configure-interfaces
diagnose
check_user_zephyr_mirror_liveness
Since 204 responses don’t contain a payload body, Content-Type is
neither required nor encouraged (RFC 7231 §3.1.1.5), and ours was
missing a semicolon to boot; Content-Length is expressly
forbidden (RFC 7230 §3.3.2).
Furthermore, these add_header directives were silencing the CORS
headers set in api_headers, because add_header inheritance doesn’t
work the way you think it does, as was known before commit
5614d51afc.
Fixes: #12902.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
Disable TLS 1.0 and TLS 1.1. (We no longer need to support IE8 on
Windows XP.)
Prefer client-selected cipher order. (Now that all enabled ciphers
provide good security, this allows mobile clients lacking AES hardware
acceleration to pick ChaCha20 for better performance.)
Disable session tickets. (Mozilla discourages them based on
https://www.imperialviolet.org/2013/06/27/botchingpfs.html.)
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
Since we no longer support Ubuntu Trusty, we no longer need this
backwards-compatibility cruft (which we only kept around to avoid
randomizing configuration for existing systems).
This was only used in Ubuntu 14.04 Trusty.
Removing this also finally lets us simplify our security model
discussion of uploaded files.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
One of smtpd_relay_restrictions or smtpd_recipient_restrictions is
required by postfix ≥ 2.10 (see
http://www.postfix.org/SMTPD_ACCESS_README.html).
This is important for using the email mirror on Ubuntu Bionic.
Our priority hierarchy is:
(1) Tornado and base services like memcached, redis, etc.
(2) Django and message sender queue workers.
(3) Everything else.
Ideally, we'd have something a bit more fine-grained (e.g. some queue
workers are potentially in the sending path, while others aren't), but
this should have a big impact on ensuring Tornado gets the resources
it needs during load spikes.
I think this has a good chance of causing some load spikes that would
previously have resulted in a user-facing delivery delays no longer
having any significant user-facing impact.
Previously, if process_fts_updates ended up very far behind
(e.g. 100,000s of messages), it was unable to recover without doing
some very expensive databsae operations to fetch and then delete the
list of message IDs needing updates. This change fixes that issue by
doing the catch-up work in batches.
Also use psql -e (--echo-queries) in scripts that use ‘set -x’, so
errors can be traced to a specific query from the output.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
The commit 87d1809657 changed the time when
digests are sent by 3 hours to account for moving from the US East Coast to the
West Coast, but didn't change the time period exception in the
`check-rabbitmq-queue` script.
Closes#5415
The construction `su postgres -c -- bash -c 'psql …'` didn’t behave the
way it reads, and only worked by accident:
1. `-c --` sets the command to `--`.
2. `bash` sets the first argument to `bash`.
3. `-c 'psql …'` replaces the command with `psql …`.
Thus, `su` ended up executing `<shell> -c 'psql …' bash`, where
`<shell>` is the `postgres` user’s login shell, usually also `bash`,
which then executed 'psql …' and ignored the extra `bash`.
Unconfuse this construction.
Note from tabbott: The old code didn't even work by accident, it was
just broken. The right fix is to move the quoting around properly.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
Lengthen the session timeout and enlarge the session cache. Upgrade
Diffie-Hellman parameters from fixed 1024-bit to custom 2048-bit.
Enable OCSP stapling.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
Apparently, our testing environment for this configuration was broken
and didn't test the code we thought it did; as a result, a variable
redefinition bug slipped through.
Fixes#11786.
The overall goal of this change is to fix an issue where on Ubuntu
Trusty, we were accidentally overriding the configuration to serve
uploads from disk with the regular expressions for adding access
control headers.
However, while investigating this, it became clear that we could
considerably simplify the mental energy required to understand this
system by making the uploads-route file be unconditionally available
and included from `zulip-include/app` (which means the zulip_ops code
can share behavior here).
We also move the Access-Control-Allow-* headers to a separate include
file, to avoid duplicating it in 5 places. Fixing this duplication
discovered a potential bug in the settings used for Tornado, where
DELETE was not allowed on a route that definitely expects DELETE.
Fixes#11758.
For users putting Zulip behind certain proxies (and potentially some
third-party API clients), buffer sizes can exceed the uwsgi default of
4096. Since we aren't doing such high-throughput APIs that a small
buffer size is valuable, we should just raise this for everyone.
Nowm unless you specify `--fill-cache`, memcached caches will not be
pre-filled after a server restart. This will be helpful when someone
is in a hurry (e.g. if the server is down right now, or if he/she
testing a configuration change in a newly setup server), it's best to
just restart without pre-filling the cache.
Fixes: #10900.
Update the list of ciphers that nginx will use to the current
Mozilla recommended ones.
These are Intermediate compatibility ones suitable for clients
running anything newer than Firefox 1, Chrome 1, IE 7, Opera 5
and Safari 1. Modern compatibility is not suitable as it excludes
Andriod 4 which is still seen on ~1% of traffic.
More info: https://wiki.mozilla.org/Security/Server_Side_TLS
It hasn't been working for years, but more importantly, it spams up
root's mail queue so that one can't find important things in there
(e.g. the fact that the long-term-idle cron job was failing).
This now checks if the user is zulip, and if not, switches to the
zulip user, making it possible to run it as root.
Significantly modified by tabbott to not break existing behavior.
/bin/sh and /usr/bin/env are the only two binaries that NixOS provides
at a fixed path (outside a buildFHSUserEnv sandbox).
This discussion was split from #11004.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
This is a common bug that users might be tempated to introduce.
And also fix two instances of this bug that were present in our
codebase, including an important one in our upgrade code path.
This commit works by vendoring the couple functions we still use from
puppetlabs stdlib (join and range), but removing the rest of the
puppetlabs codebase, and of course cleaning up our linter rules in the
process.
Fixes#7423.
This should be a nice performance improvement for browsers that
support it.
We can't yet enabled this in the Zulip on-premise nginx configuration,
because that still has to support Trusty.
This isn't super required, in that we add these repositories via
`setup-apt-repo` in any case, but the previous code was wrong and
worth fixing in any case.
This fixes a bug where our API routes for uploaded files (where we
need to use a consistent URL between session auth and API auth) were
not properly configured to pass through the API authentication headers
(and otherwise provide REST endpoint settings).
In particular, this prevented the Zulip mobile apps from being able to
access authenticated image files using these URLs.
Apparently, we can use the process group naming style of having dashes
in the names without using the explicit nun_procs feature of
supervisord configuration.
The new configuration is perfectly satisfactory, so there's no real
reason to prefer the old approach.
Previously, this script needed access to Django settings, which in
turn required access to /etc/zulip/zulip-secrets.conf. Since that
isn't world-readable, this meant that this couldn't run as an
unprivileged `nagios` user.
Fix that by just hardcoding the appropriate path under /var/log/.
When using the Python 3 typing style, Python scripts can't import from
typing inside an `if False` (in contrast, one needs to import inside
an `if False` to support the Python 3 syntax without needing
python-typing installed). So this was just incorrectly half-converted
from the Python 2 style to the Python 3 style.
Apparently, `puppet-lint` on Ubuntu trusty throws warnings for certain
quoting patterns that are OK in modern `puppet-lint`. I believe the
old Zulip code was actually correct (i.e. the old `puppet-lint`
implementation was the problem), but it seems worth changing anyway to
suppress the warnings.
We also exclude more of puppet-apt from linting, since it's
third-party code.
This was converted to Python 3 incorrectly, in a way that actually
completely broke the script (the .decode() that this adds is critical,
since 'f' != b'f').
We fix this, and also add an assert that makes the parsing code
safer against future refactors.
We fix "ERROR: safepackage not in autoload module layout" error
which was caused by a defined type "safepackage" definitation
lying in the wrong place. We refactor to create the defined type
according to puppet guidelines. Link below:
https://docs.puppet.com/puppet/2.7/lang_defined_types.html
We fix these by adding ignore statements in a bunch of files
where this error popped up. We target only specific lines using
the ignore statements and not the entire files.
In puppet/zulip_ops/files/postgresql/setup_disks.sh line 15:
array_name=$(mdadm --examine --scan | sed 's/.*name=//')
^-- SC2034: array_name appears unused. Verify use (or export if used externally).
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
In puppet/zulip_ops/files/munin-plugins/rabbitmq_connections line 66:
echo "connections.value $(HOME=$HOME rabbitmqctl list_connections | grep -v "^Listing" | grep -v "done.$" | wc -l)"
^-- SC2126: Consider using grep -c instead of grep|wc -l.
In puppet/zulip_ops/files/munin-plugins/rabbitmq_consumers line 32:
VHOST=${vhost:-"/"}
^-- SC2034: VHOST appears unused. Verify use (or export if used externally).
In puppet/zulip_ops/files/munin-plugins/rabbitmq_messages line 32:
VHOST=${vhost:-"/"}
^-- SC2034: VHOST appears unused. Verify use (or export if used externally).
In puppet/zulip_ops/files/munin-plugins/rabbitmq_messages_unacknowledged line 32:
VHOST=${vhost:-"/"}
^-- SC2034: VHOST appears unused. Verify use (or export if used externally).
In puppet/zulip_ops/files/munin-plugins/rabbitmq_messages_uncommitted line 32:
VHOST=${vhost:-"/"}
^-- SC2034: VHOST appears unused. Verify use (or export if used externally).
In puppet/zulip_ops/files/munin-plugins/rabbitmq_queue_memory line 32:
VHOST=${vhost:-"/"}
^-- SC2034: VHOST appears unused. Verify use (or export if used externally).
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
In puppet/zulip/files/postgresql/env-wal-e line 6:
export AWS_ACCESS_KEY_ID=$(crudini --get "$ZULIP_SECRETS_CONF" secrets s3_backups_key)
^-- SC2155: Declare and assign separately to avoid masking return values.
In puppet/zulip/files/postgresql/env-wal-e line 7:
export AWS_SECRET_ACCESS_KEY=$(crudini --get "$ZULIP_SECRETS_CONF" secrets s3_backups_secret_key)
^-- SC2155: Declare and assign separately to avoid masking return values.
In puppet/zulip/files/postgresql/env-wal-e line 9:
if [ $? -ne 0 ]; then
^-- SC2181: Check exit code directly with e.g. 'if mycmd;', not indirectly with $?.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
In puppet/zulip/files/nagios_plugins/zulip_app_frontend/check_email_deliverer_process line 16:
elif [ "$(echo "$STATUS" | egrep '(STOPPED)|(STARTING)|(BACKOFF)|(STOPPING)|(EXITED)|(FATAL)|(UNKNOWN)$')" ]
^-- SC2143: Use egrep -q instead of comparing output with [ -n .. ].
^-- SC2196: egrep is non-standard and deprecated. Use grep -E instead.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
In puppet/zulip/files/nagios_plugins/zulip_app_frontend/check_email_deliverer_backlog line 8:
cd /home/zulip/deployments/current
^-- SC2164: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
With this change, all that one needs to do to start using thumbor in
production is to set the `THUMBOR_URL` setting.
Since without THUMBOR_URL enabled, the thumbor service doesn't
actually do anything, this is pretty safe.
As part of our effort to change the data model away from each user
having a single API key, we're eliminating the couple requests that
were made from Django to Tornado (as part of a /register or home
request) where we used the user's API key grabbed from the database
for authentication.
Instead, we use the (already existing) internal_notify_view
authentication mechanism, which uses the SHARED_SECRET setting for
security, for these requests, and just fetch the user object using
get_user_profile_by_id directly.
Tweaked by Yago to include the new /api/v1/events/internal endpoint in
the exempt_patterns list in test_helpers, since it's an endpoint we call
through Tornado. Also added a couple missing return type annotations.
This commits adds the necessary puppet configuration and
installer/upgrade code for installing and managing the thumbor service
in production. This configuration is gated by the 'thumbor.pp'
manifest being enabled (which is not yet the default), and so this
commit should have no effect in a default Zulip production environment
(or in the long term, in any Zulip production server that isn't using
thumbor).
Credit for this effort is shared by @TigorC (who initiated the work on
this project), @joshland (who did a great deal of work on this and got
it working during PyCon 2017) and @adnrs96, who completed the work.
While there are legitimate use cases for embedded Zulip in an iFrame,
they're rare, and it's more important to prevent this category of
attack by default.
Sysadmins can switch this to a whitelist when they want to use frames.
We no longer need or use these, since Zulip installs a pinned version
of node directly with the scripts/setup/install-node tool.
Noticed because in the effort of adding Ubuntu bionic support, we
noticed the package names changed again.
Apparently, our nginx configuration's use of "localhost", combined
with the default in modern Linux of having localhost resolve to both
the IPv4 and IPv6 addresses on a given machine, resulted in `nginx`
load-balancing requests to a given Zulip server between the IPv4 and
IPv6 addresses. This, in turn, resulted in irrelevant 502 errors
problems every few minutes on the /events endpoints for some clients.
Disabling IPv6 on the server resolved the problem, as does simply
spelling localhost as 127.0.0.1 for the `nginx` upstreams that we
declare for proxying to non-Django services on localhost.
Now, one can just set `no_serve_uploads` in `zulip.conf` to prevent
`nginx` from serving locally uploaded files.
This should help simplify the S3 integration setup process.
This option is intended to support situations like a quick Docker
setup where doing HTTPS adds more setup overhead than it's worth.
It's not intended to be used in actual production environments.
This is preferred, since we don't currently have a way to run Django
logic on the postgres hosts with the Docker implementation.
This is a necessary part of removing the need for the docker-zulip
package to patch this file to make Zulip work with Docker.
The main purpose of this change is to make it guaranteed that
`manage.py register_server --rotate-key` can edit the
/etc/zulip/zulip-secrets.conf configuration via crudini.
But it also adds value by ensuring zulip-secrets.conf is not readable
by other users.