We now have two functions related to digests
for processes:
is_digest_obsolete
write_digest_file
In most cases we now **wait** to write the
digest file until after we've successfully
run a process with its new inputs.
In one place, for database migrations, we
continue to write the digest optimistically.
We'll want to fix this, but it requires a
little more code cleanup.
Here is the typical sequence of events:
NEVER RUN -
is_digest_obsolete returns True
quickly (we don't compute a hash)
write_digest_file does a write (duh)
AFTER NO CHANGES -
is_digest_obsolete returns False
after reading one file for old
hash and multiple files to compute
hash
most callers skip write_digest_file
(no files are changed)
AFTER SOME CHANGES -
is_digest_obsolete returns False
after doing full checks
most callers call write_digest_file
*after* running a process
I remove `is_force` from `file_or_package_hash_updated`
and modernize its mypy annotations.
If `is_force` is `True`, we just now run the thing
we want to force-run without having to call
`file_or_package_hash_updated` to expensively
and riskily return `True`.
Another nice outcome of this change is that if
`file_or_package_hash_updated` returns `True`,
you can know that the file or package has
indeed been updated.
For the case of `build_pygments_data` we also
skip an `os.path.exists` check when `is_force`
is `True`.
We will short-circuit more logic in the next
few commits, as well as cleaning up some of
the long/wrapper lines in the `if` statements.
We stopped using tsearch-extras in Zulip 2.1.0 after Anders figured
out how to achieve its goals with native postgres. However, we never
did a `DROP EXTENSION` on systems thta had upgraded, which meant that
backups created on systems originally installed with Zulip 2.0.x and
older, and later upgraded to Zulip 2.1.x, could not be restored on
Zulip servers created with a fresh install of Zulip 2.1.x.
We can't do this with a normal database migration, because DROP
EXTENSION has to be done as the postgres user, so we add some custom
migration code in the upgrade-zulip-stage-2 tool.
It's safe to run this whenever tsearch_extras.control is installed because:
* Zulip is AFAIK the only software that ever used tsearch_extras.
* The package was only installed via puppet on production servers configured to
run a local Zulip database.
* We'll only run this code once per system, because it removes the
package and thus the control files.
Fixes#13612.
Generated by `pyupgrade --py3-plus --keep-percent-format` on all our
Python code except `zthumbor` and `zulip-ec2-configure-interfaces`,
followed by manual indentation fixes.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
After some testing, I've confirmed that this seems to behave
significantly better in terms of the number of failed requests due to
Tornado being the process of restarting compared with the previous
version, as each individual process is only down for a short time,
rather than all of them being down at once.
Missing commas in the definition of all the queues to check meant that it would be looking for queues with concatenated names, rather than the correct ones. Added the commas.
Used get_venv_dependencies function to return the correct dependencies
for RHEL, Centos, Fedora rather than importing them as separate
COMMON_YUM_DEPENDENCIES in provision and create-production-venv.
In virtualenv ≥ 20, the site_packages variable was removed from
activate_this.py. To avoid a KeyError, replace
activate_locals['site_packages'] with os.path.join(venv, 'lib',
python_version), where python_version is the 'pythonX.Y' name of the
directory where site-packages resides in the virtualenv.
Fixes#14025.
Added a get_venv_dependencies() function in setup_venv.py which
returns VENV_DEPENDENCIES according to the vendor and os_version.
The reason for adding this function was because python-dev will be
depreciated in Focal but can be used as python2-dev so when adding
support for Focal VENV_DEPENDENCIES should to be os_version dependent.
isort 5 knows not to reorder imports across function calls, so this
will stop isort from breaking our code.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
This adds Ubuntu 19.10 as a valid provisioning target.
The release test in setup-apt-repo was changed from a list of values to
a regex check for brevity.
The “Smileys & People” category has been split into “Smilys & Emotion”
and “People & Body”.
Also, fix generate_sha1sum_emoji to read the emoji-datasource-google
version from yarn.lock, since package.json only gives a version range.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
"Zulip Voyager" was a name invented during the Hack Week to open
source Zulip for what a single-system Zulip server might be called, as
a Star Trek pun on the code it was based on, "Zulip Enterprise".
At the time, we just needed a name quickly, but it was never a good
name, just a placeholder. This removes that placeholder name from
much of the codebase. A bit more work will be required to transition
the `zulip::voyager` Puppet class, as that has some migration work
involved.
These docstrings hadn't been properly updated in years, and bad an
awkward mix of a bad version of the user-facing documentation and
details that are no longer true (e.g. references to "Voyager").
(One important detail is that we have real documentation for this
system now).
This legacy cross-realm bot hasn't been used in several years, as far
as I know. If we wanted to re-introduce it, I'd want to implement it
as an embedded bot using those common APIs, rather than the totally
custom hacky code used for it that involves unnecessary queue workers
and similar details.
Fixes#13533.
At some point the PostgreSQL Docker image started creating the zulip
database for us, which caused our CREATE DATABASE to fail.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
Zulip has had a small use of WebSockets (specifically, for the code
path of sending messages, via the webapp only) since ~2013. We
originally added this use of WebSockets in the hope that the latency
benefits of doing so would allow us to avoid implementing a markdown
local echo; they were not. Further, HTTP/2 may have eliminated the
latency difference we hoped to exploit by using WebSockets in any
case.
While we’d originally imagined using WebSockets for other endpoints,
there was never a good justification for moving more components to the
WebSockets system.
This WebSockets code path had a lot of downsides/complexity,
including:
* The messy hack involving constructing an emulated request object to
hook into doing Django requests.
* The `message_senders` queue processor system, which increases RAM
needs and must be provisioned independently from the rest of the
server).
* A duplicate check_send_receive_time Nagios test specific to
WebSockets.
* The requirement for users to have their firewalls/NATs allow
WebSocket connections, and a setting to disable them for networks
where WebSockets don’t work.
* Dependencies on the SockJS family of libraries, which has at times
been poorly maintained, and periodically throws random JavaScript
exceptions in our production environments without a deep enough
traceback to effectively investigate.
* A total of about 1600 lines of our code related to the feature.
* Increased load on the Tornado system, especially around a Zulip
server restart, and especially for large installations like
zulipchat.com, resulting in extra delay before messages can be sent
again.
As detailed in
https://github.com/zulip/zulip/pull/12862#issuecomment-536152397, it
appears that removing WebSockets moderately increases the time it
takes for the `send_message` API query to return from the server, but
does not significantly change the time between when a message is sent
and when it is received by clients. We don’t understand the reason
for that change (suggesting the possibility of a measurement error),
and even if it is a real change, we consider that potential small
latency regression to be acceptable.
If we later want WebSockets, we’ll likely want to just use Django
Channels.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
Our recent fixes to using the system's configured memcached settings
broke populate_db, because its hacky clear_database helper is called
with a hacked-up settings module.
We fix this by first moving this out-of-place code from models.py into
populate_db, and then saving the settings required to access memcached
so that we can use them in clear_database.
We also fix a mypy erorr in flush-memcached that matches the same
issue fixed in clear_database.
This simplifies the RDS installation process to avoid awkwardly
requiring running the installer twice, and also is significantly more
robust in handling issues around rerunning the installer.
Finally, the answer for whether dictionaries are missing is available
to Django for future use in warnings/etc. around full-text search not
being great with this configuration, should they be required.
We'll be soon documenting a production workflow that involves using
it, and that means it needs to live under scripts/ (since tools/ isn't
present in release tarballs).
`copytree` throws an error if the target already exists, and we don’t
really want to rerun the copy anyway.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
Ultimately, this isn't an effective way to monitor this queue; we want
time-based monitoring, not count-based monitoring. Doing that
properly will likely involve modifying the queue processor to write
something about its status.
But until we add the monitoring we want, it makes sense to leave this
active with low limits.
This is needed on at least Debian 10, otherwise xmlsec fails to
install: `Could not find xmlsec1 config. Are libxmlsec1-dev and
pkg-config installed?`
Also remove libxmlsec1-openssl, which libxmlsec1-dev already depends.
(No changes are needed on RHEL, where libxml2-devel and xmlsec1-devel
already declare a requirement on /usr/bin/pkg-config.)
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
The output log from running clean_unused_caches was too verbose as
part of the `upgrade-zulip` overall output. While this output is
potentially helpful when running it directly for debugging, it's
certainly redundant for the main production use case.
So a new flag --no-print-headers is introduced. It suppresses the
header outputs for the subtools.
Fixes#13214.
This allows the system to get updates to the Groonga repository
signing key, so `apt update` doesn’t start failing when the key
changes (like it recently did).
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
debian-archive-keyring is a dependency of the essential package apt,
so it is present in every Debian system.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
virtualenv on Ubuntu 16.04, when creating a new environment, downloads
the current version of setuptools, then replaces its pkg_resources
with an old copy from
/usr/share/python-wheels/pkg_resources-0.0.0-py2.py3-none-any.whl.
This causes problems, a simple example of which is reproducible from
the ubuntu:16.04 Docker base image as follows:
apt-get update
apt-get -y install python3-virtualenv
python3 -m virtualenv -p python3 /ve
/ve/bin/pip install sockjs-tornado
/ve/bin/pip download sockjs-tornado
→ `AttributeError: '_NamespacePath' object has no attribute 'sort'`
More relevantly, it breaks pip-compile in the same way. To fix this,
we need to force setuptools to be reinstalled, even if we’re asking
for the same version.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
This should dramatically improve the queue processor's performance in
cases where there's a very high volume of requests on a given endpoint
by a given user, as described in the new docstring.
Until we test this more broadly in production, we won't know if this
is a full solution to the problem, but I think it's likely. We've
never seen the UserActivityInterval worker end up backlogged without a
total queue processor outage, and it should have a similar workload.
Fixes#13180.
To replace DISTRIB_FAMILY, there’s now an os_families function using
the standard ID and ID_LIKE information in /etc/os-release.
Fixes#13070; fixes#13071.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
We no longer use tsearch_extras, and the camo patch is irrelevant on
systemd systems (Xenial and newer). So we no longer need to
provide/install a PPA at all.
Closes#13027.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
Now that we're implemented tsearch_extras in pure postgres, we no
longer need a custom extension. This should help us considerably, as
it means we no longer need to ship custom apt packages at all.
Fixes#467.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
As predicted in https://www.kb.cert.org/vuls/id/319816/, a malicious
worm is beginning to spread across the npm ecosystem through package
postinstall scripts. Only instead of direct self-replicating code,
the replication vector is the temptation to monetize postinstall
scripts by polluting the console logs with paid advertisements. The
effect will be the same unless we all put a stop to this while we
still can.
Apply the recommended VU#319816 workaround, which is to disable
lifecycle scripts when installing npm packages. The only fallout is:
* node-sass can’t run because it uses compiled native code; we replace
it with Dart Sass.
* phantomjs-prebuilt doesn’t download the binary at install time; we
tell it to download it in run-casper.
* ttf2woff2 transparently falls back from native code to an Emscripten
build.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>