There's no need for sharding, but this allows one to spend a bit of
extra memory to reduce image-processing latency when bursts of images
are uploaded at once.
Fixes “E713 Test for membership should be `not in`” found by ruff (now
that I’ve fixed it not to ignore scripts lacking a .py extension).
Signed-off-by: Anders Kaseorg <anders@zulip.com>
This has never actually been used -- and does not make sense with the
check-all-queues-at-once model switched to in 88a123d5e0. The
Tornado processes are the only ones we expect to be non-1, and since
they were added in 3f03dcdf5e the right number has been read from
config, not passed as an argument.
We can compute the intended number of processes from the sharding
configuration. In doing so, also validate that all of the ports are
contiguous.
This removes a discrepancy between `scripts/lib/sharding.py` and other
parts of the codebase about if merely having a `[tornado_sharding]`
section is sufficient to enable sharding. Having behaviour which
changes merely based on if an empty section exists is surprising.
This does require that a (presumably empty) `9800` configuration line
exist, but making that default explicit is useful.
After this commit, configuring sharding can be done by adding to
`zulip.conf`:
```
[tornado_sharding]
9800 = # default
9801 = other_realm
```
Followed by running `./scripts/refresh-sharding-and-restart`.
Fixes#2665.
Regenerated by tabbott with `lint --fix` after a rebase and change in
parameters.
Note from tabbott: In a few cases, this converts technical debt in the
form of unsorted imports into different technical debt in the form of
our largest files having very long, ugly import sequences at the
start. I expect this change will increase pressure for us to split
those files, which isn't a bad thing.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Automatically generated by the following script, based on the output
of lint with flake8-comma:
import re
import sys
last_filename = None
last_row = None
lines = []
for msg in sys.stdin:
m = re.match(
r"\x1b\[35mflake8 \|\x1b\[0m \x1b\[1;31m(.+):(\d+):(\d+): (\w+)", msg
)
if m:
filename, row_str, col_str, err = m.groups()
row, col = int(row_str), int(col_str)
if filename == last_filename:
assert last_row != row
else:
if last_filename is not None:
with open(last_filename, "w") as f:
f.writelines(lines)
with open(filename) as f:
lines = f.readlines()
last_filename = filename
last_row = row
line = lines[row - 1]
if err in ["C812", "C815"]:
lines[row - 1] = line[: col - 1] + "," + line[col - 1 :]
elif err in ["C819"]:
assert line[col - 2] == ","
lines[row - 1] = line[: col - 2] + line[col - 1 :].lstrip(" ")
if last_filename is not None:
with open(last_filename, "w") as f:
f.writelines(lines)
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
Missing commas in the definition of all the queues to check meant that it would be looking for queues with concatenated names, rather than the correct ones. Added the commas.
This legacy cross-realm bot hasn't been used in several years, as far
as I know. If we wanted to re-introduce it, I'd want to implement it
as an embedded bot using those common APIs, rather than the totally
custom hacky code used for it that involves unnecessary queue workers
and similar details.
Fixes#13533.
Zulip has had a small use of WebSockets (specifically, for the code
path of sending messages, via the webapp only) since ~2013. We
originally added this use of WebSockets in the hope that the latency
benefits of doing so would allow us to avoid implementing a markdown
local echo; they were not. Further, HTTP/2 may have eliminated the
latency difference we hoped to exploit by using WebSockets in any
case.
While we’d originally imagined using WebSockets for other endpoints,
there was never a good justification for moving more components to the
WebSockets system.
This WebSockets code path had a lot of downsides/complexity,
including:
* The messy hack involving constructing an emulated request object to
hook into doing Django requests.
* The `message_senders` queue processor system, which increases RAM
needs and must be provisioned independently from the rest of the
server).
* A duplicate check_send_receive_time Nagios test specific to
WebSockets.
* The requirement for users to have their firewalls/NATs allow
WebSocket connections, and a setting to disable them for networks
where WebSockets don’t work.
* Dependencies on the SockJS family of libraries, which has at times
been poorly maintained, and periodically throws random JavaScript
exceptions in our production environments without a deep enough
traceback to effectively investigate.
* A total of about 1600 lines of our code related to the feature.
* Increased load on the Tornado system, especially around a Zulip
server restart, and especially for large installations like
zulipchat.com, resulting in extra delay before messages can be sent
again.
As detailed in
https://github.com/zulip/zulip/pull/12862#issuecomment-536152397, it
appears that removing WebSockets moderately increases the time it
takes for the `send_message` API query to return from the server, but
does not significantly change the time between when a message is sent
and when it is received by clients. We don’t understand the reason
for that change (suggesting the possibility of a measurement error),
and even if it is a real change, we consider that potential small
latency regression to be acceptable.
If we later want WebSockets, we’ll likely want to just use Django
Channels.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
Ultimately, this isn't an effective way to monitor this queue; we want
time-based monitoring, not count-based monitoring. Doing that
properly will likely involve modifying the queue processor to write
something about its status.
But until we add the monitoring we want, it makes sense to leave this
active with low limits.
This should dramatically improve the queue processor's performance in
cases where there's a very high volume of requests on a given endpoint
by a given user, as described in the new docstring.
Until we test this more broadly in production, we won't know if this
is a full solution to the problem, but I think it's likely. We've
never seen the UserActivityInterval worker end up backlogged without a
total queue processor outage, and it should have a similar workload.
Fixes#13180.
As a result of dropping support for trusty, we can remove our old
pattern of putting `if False` before importing the typing module,
which was essential for Python 3.4 support, but not required and maybe
harmful on newer versions.
cron_file_helper
check_rabbitmq_consumers
hash_reqs
check_zephyr_mirror
check_personal_zephyr_mirrors
check_cron_file
zulip_tools
check_postgres_replication_lag
api_test_helpers
purge-old-deployments
setup_venv
node_cache
clean_venv_cache
clean_node_cache
clean_emoji_cache
pg_backup_and_purge
restore-backup
generate_secrets
zulip-ec2-configure-interfaces
diagnose
check_user_zephyr_mirror_liveness
The commit 87d1809657 changed the time when
digests are sent by 3 hours to account for moving from the US East Coast to the
West Coast, but didn't change the time period exception in the
`check-rabbitmq-queue` script.
Closes#5415
This allows our Tornado monitoring to correctly report whether
multiple configured Tornado processes are running.
This setup isn't ideal, in that it can't detect cases where the wrong
set of Tornado processes are running, but it's nice and simple and
should catch most actual problems.
Structurally, this queue has the same property as the missed_message
one, namely that it accumulates things and processes them only every
few minutes.
This should stop Zulip from paging in response to slow queries
accumulating when a server restart happens.
Revert c8f034e9a "queue: Remove missedmessage_email_senders code."
As the comment in the code says, it ensures a smooth upgrade path
from 1.7.x; we can delete it in master after 1.8.0 is released.
The removal commit was merged early due to a communication failure.
This commit just copies all the code from MissedMessageSendingWorker
class to a new EmailSendingWorker class. All the logic to send an email
through a queue was already there. This commit only makes the logic
generic. It does so by creating a special purpose queue called
'email_senders' to send any type of email. To make
MissedMessageSendingWorker still work we derive it from
EmailSendingWorker. All the tests that were testing
MissedMessageSendingWorker now run against EmailSendingWorker.
This fixes a bug where, when a user is unsubscribed from a stream,
they might have unread messages on that stream leak. While it might
seem to be a minor problem, it can cause significant problems for
computing the `unread_msgs` data structures, since it means we need to
add an extra filter for whether the user is still subscribed, either
in the backend or in the UI.
Fixes#7095.