Commit Graph

78 Commits

Author SHA1 Message Date
Tim Abbott 54c2c02011 thumbnail: Add support for multiple queue workers.
There's no need for sharding, but this allows one to spend a bit of
extra memory to reduce image-processing latency when bursts of images
are uploaded at once.
2024-07-21 19:15:43 -07:00
Anders Kaseorg e08a24e47f ruff: Fix UP006 Use `list` instead of `List` for type annotation.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-07-13 22:28:22 -07:00
Alex Vandiver 1b10bf1921 check-rabbitmq-consumers: Add missing f on format string. 2024-06-05 08:59:50 -07:00
Alex Vandiver f246b82f67 puppet: Factor out pattern of writing a nagios state file atomically. 2024-05-24 11:31:25 -07:00
Tim Abbott 0a756c652c push_notifications: Shard mobile push notifications. 2024-05-02 14:25:10 -07:00
Anders Kaseorg e1ed44907b ruff: Fix SIM118 Use `key in dict` instead of `key in dict.keys()`.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-01-04 16:25:07 -08:00
Anders Kaseorg b267b17677 python: Use ‘not in’ for more negated membership tests.
Fixes “E713 Test for membership should be `not in`” found by ruff (now
that I’ve fixed it not to ignore scripts lacking a .py extension).

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-09-26 12:09:46 -07:00
Alex Vandiver 41deef40cf nagios: Switch to generic check_cron_file for queues and consumers.
These share a common root; 91da4bd59b duplicated the code, but
didn't move the existing uses to the new utility.
2022-06-22 12:07:38 -07:00
Alex Vandiver 27b63d0baf check-rabbitmq-consumers: Fix a misleading comment. 2022-06-22 12:07:38 -07:00
Alex Vandiver 4e06ee45c7 check-rabbitmq-consumers: Remove unused --min-threshold.
This has never actually been used  -- and does not make sense with the
check-all-queues-at-once model switched to in 88a123d5e0.  The
Tornado processes are the only ones we expect to be non-1, and since
they were added in 3f03dcdf5e the right number has been read from
config, not passed as an argument.
2022-06-22 12:07:38 -07:00
Alex Vandiver 53c01aa299 check-rabbitmq-consumers: Remove --queue argument from help.
This has not been accepted since 88a123d5e0.
2022-06-22 12:07:38 -07:00
Anders Kaseorg 97e4e9886c python: Replace universal_newlines with text.
This is supported in Python ≥ 3.7.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-01-23 22:16:01 -08:00
Anders Kaseorg 6e4c3e41dc python: Normalize quotes with Black.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-02-12 13:11:19 -08:00
Anders Kaseorg 11741543da python: Reformat with Black, except quotes.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-02-12 13:11:19 -08:00
Alex Vandiver 2a12fedcf1 tornado: Remove explicit tornado_processes setting; compute it.
We can compute the intended number of processes from the sharding
configuration.  In doing so, also validate that all of the ports are
contiguous.

This removes a discrepancy between `scripts/lib/sharding.py` and other
parts of the codebase about if merely having a `[tornado_sharding]`
section is sufficient to enable sharding.  Having behaviour which
changes merely based on if an empty section exists is surprising.

This does require that a (presumably empty) `9800` configuration line
exist, but making that default explicit is useful.

After this commit, configuring sharding can be done by adding to
`zulip.conf`:

```
[tornado_sharding]
9800 =              # default
9801 = other_realm
```

Followed by running `./scripts/refresh-sharding-and-restart`.
2020-09-18 15:13:40 -07:00
Alex Vandiver 13fb7875e2 nagios: Remove an unnecessary path.append. 2020-09-14 18:20:12 -07:00
Anders Kaseorg f91d287447 python: Pre-fix a few spots for better Black formatting.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-09-03 17:51:09 -07:00
Anders Kaseorg fbfd4b399d python: Elide action="store" for argparse arguments.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-09-03 16:17:14 -07:00
Anders Kaseorg 5050fb19f6 nagios: Don’t crash on missing cron file.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-13 16:49:32 -07:00
Anders Kaseorg 365fe0b3d5 python: Sort imports with isort.
Fixes #2665.

Regenerated by tabbott with `lint --fix` after a rebase and change in
parameters.

Note from tabbott: In a few cases, this converts technical debt in the
form of unsorted imports into different technical debt in the form of
our largest files having very long, ugly import sequences at the
start.  I expect this change will increase pressure for us to split
those files, which isn't a bad thing.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-11 16:45:32 -07:00
Anders Kaseorg 69730a78cc python: Use trailing commas consistently.
Automatically generated by the following script, based on the output
of lint with flake8-comma:

import re
import sys

last_filename = None
last_row = None
lines = []

for msg in sys.stdin:
    m = re.match(
        r"\x1b\[35mflake8    \|\x1b\[0m \x1b\[1;31m(.+):(\d+):(\d+): (\w+)", msg
    )
    if m:
        filename, row_str, col_str, err = m.groups()
        row, col = int(row_str), int(col_str)

        if filename == last_filename:
            assert last_row != row
        else:
            if last_filename is not None:
                with open(last_filename, "w") as f:
                    f.writelines(lines)

            with open(filename) as f:
                lines = f.readlines()
            last_filename = filename
        last_row = row

        line = lines[row - 1]
        if err in ["C812", "C815"]:
            lines[row - 1] = line[: col - 1] + "," + line[col - 1 :]
        elif err in ["C819"]:
            assert line[col - 2] == ","
            lines[row - 1] = line[: col - 2] + line[col - 1 :].lstrip(" ")

if last_filename is not None:
    with open(last_filename, "w") as f:
        f.writelines(lines)

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-06-11 16:04:12 -07:00
Anders Kaseorg 67e7a3631d python: Convert percent formatting to Python 3.6 f-strings.
Generated by pyupgrade --py36-plus.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-10 15:02:09 -07:00
Anders Kaseorg f8339f019d python: Convert assignment type annotations to Python 3.6 style.
Commit split by tabbott; this has changes to scripts/, tools/, and
puppet/.

scripts/lib/hash_reqs.py, scripts/lib/setup_venv.py,
scripts/lib/zulip_tools.py, and tools/lib/provision.py are excluded so
tools/provision still gives the right error message on Ubuntu 16.04
with Python 3.5.

Generated by com2ann, with whitespace fixes and various manual fixes
for runtime issues:

-shebang_rules: List[Rule] = [
+shebang_rules: List["Rule"] = [

-trailing_whitespace_rule: Rule = {
+trailing_whitespace_rule: "Rule" = {

-whitespace_rules: List[Rule] = [
+whitespace_rules: List["Rule"] = [

-comma_whitespace_rule: List[Rule] = [
+comma_whitespace_rule: List["Rule"] = [

-prose_style_rules: List[Rule] = [
+prose_style_rules: List["Rule"] = [

-html_rules: List[Rule] = whitespace_rules + prose_style_rules + [
+html_rules: List["Rule"] = whitespace_rules + prose_style_rules + [

-    target_port: int = None
+    target_port: int

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-04-24 13:06:54 -07:00
Anders Kaseorg 5901e7ba7e python: Convert function type annotations to Python 3 style.
Generated by com2ann (slightly patched to avoid also converting
assignment type annotations, which require Python 3.6), followed by
some manual whitespace adjustment, and six fixes for runtime issues:

-    def __init__(self, token: Token, parent: Optional[Node]) -> None:
+    def __init__(self, token: Token, parent: "Optional[Node]") -> None:

-def main(options: argparse.Namespace) -> NoReturn:
+def main(options: argparse.Namespace) -> "NoReturn":

-def fetch_request(url: str, callback: Any, **kwargs: Any) -> Generator[Callable[..., Any], Any, None]:
+def fetch_request(url: str, callback: Any, **kwargs: Any) -> "Generator[Callable[..., Any], Any, None]":

-def assert_server_running(server: subprocess.Popen[bytes], log_file: Optional[str]) -> None:
+def assert_server_running(server: "subprocess.Popen[bytes]", log_file: Optional[str]) -> None:

-def server_is_up(server: subprocess.Popen[bytes], log_file: Optional[str]) -> bool:
+def server_is_up(server: "subprocess.Popen[bytes]", log_file: Optional[str]) -> bool:

-    method_kwarg_pairs: List[FuncKwargPair],
+    method_kwarg_pairs: "List[FuncKwargPair]",

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-04-18 20:42:48 -07:00
Mateusz Mandera 96fe2e5a42 nagios: Deduplicate queue list between check-rabbitmq scripts. 2020-04-09 13:41:01 -07:00
Mateusz Mandera 122d0bca83 check-rabbitmq-queue: Add a simple algorithm to analyze queue stats.
This new algorithm is designed to avoid monitoring paging when a queue
simply has bursty behavior.
2020-04-09 13:41:01 -07:00
Tom Daff 2f213f7c8e
monitoring: Fix check-rabbitmq-consumers.
Missing commas in the definition of all the queues to check meant that it would be looking for queues with concatenated names, rather than the correct ones. Added the commas.
2020-03-25 17:19:16 -07:00
Mateusz Mandera ea93810d9a check-rabbitmq-queue: Put user check before rabbitmqctl call. 2020-03-22 18:46:28 -07:00
Mateusz Mandera 4c5a8e6f0c queue: Remove missedmessage_email_senders. 2020-01-31 12:13:51 -08:00
Tim Abbott d70e799466 bots: Remove FEEDBACK_BOT implementation.
This legacy cross-realm bot hasn't been used in several years, as far
as I know.  If we wanted to re-introduce it, I'd want to implement it
as an embedded bot using those common APIs, rather than the totally
custom hacky code used for it that involves unnecessary queue workers
and similar details.

Fixes #13533.
2020-01-25 22:41:39 -08:00
Anders Kaseorg ea6934c26d dependencies: Remove WebSockets system for sending messages.
Zulip has had a small use of WebSockets (specifically, for the code
path of sending messages, via the webapp only) since ~2013.  We
originally added this use of WebSockets in the hope that the latency
benefits of doing so would allow us to avoid implementing a markdown
local echo; they were not.  Further, HTTP/2 may have eliminated the
latency difference we hoped to exploit by using WebSockets in any
case.

While we’d originally imagined using WebSockets for other endpoints,
there was never a good justification for moving more components to the
WebSockets system.

This WebSockets code path had a lot of downsides/complexity,
including:

* The messy hack involving constructing an emulated request object to
  hook into doing Django requests.
* The `message_senders` queue processor system, which increases RAM
  needs and must be provisioned independently from the rest of the
  server).
* A duplicate check_send_receive_time Nagios test specific to
  WebSockets.
* The requirement for users to have their firewalls/NATs allow
  WebSocket connections, and a setting to disable them for networks
  where WebSockets don’t work.
* Dependencies on the SockJS family of libraries, which has at times
  been poorly maintained, and periodically throws random JavaScript
  exceptions in our production environments without a deep enough
  traceback to effectively investigate.
* A total of about 1600 lines of our code related to the feature.
* Increased load on the Tornado system, especially around a Zulip
  server restart, and especially for large installations like
  zulipchat.com, resulting in extra delay before messages can be sent
  again.

As detailed in
https://github.com/zulip/zulip/pull/12862#issuecomment-536152397, it
appears that removing WebSockets moderately increases the time it
takes for the `send_message` API query to return from the server, but
does not significantly change the time between when a message is sent
and when it is received by clients.  We don’t understand the reason
for that change (suggesting the possibility of a measurement error),
and even if it is a real change, we consider that potential small
latency regression to be acceptable.

If we later want WebSockets, we’ll likely want to just use Django
Channels.

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-01-14 22:34:00 -08:00
Tim Abbott bbc1484253 check-rabbitmq-queue: Adjust threshholds for paging.
Ultimately, this isn't an effective way to monitor this queue; we want
time-based monitoring, not count-based monitoring.  Doing that
properly will likely involve modifying the queue processor to write
something about its status.

But until we add the monitoring we want, it makes sense to leave this
active with low limits.
2019-10-13 22:39:52 -07:00
Tim Abbott 1c73ce2450 user_activity: Use LoopQueueProcessingWorker strategy.
This should dramatically improve the queue processor's performance in
cases where there's a very high volume of requests on a given endpoint
by a given user, as described in the new docstring.

Until we test this more broadly in production, we won't know if this
is a full solution to the problem, but I think it's likely.  We've
never seen the UserActivityInterval worker end up backlogged without a
total queue processor outage, and it should have a similar workload.

Fixes #13180.
2019-09-21 11:48:24 -07:00
Wyatt Hoodes a109508e34 typing: Remove now-unnecessary conditional import.
As a result of dropping support for trusty, we can remove our old
pattern of putting `if False` before importing the typing module,
which was essential for Python 3.4 support, but not required and maybe
harmful on newer versions.

cron_file_helper
check_rabbitmq_consumers
hash_reqs
check_zephyr_mirror
check_personal_zephyr_mirrors
check_cron_file
zulip_tools
check_postgres_replication_lag
api_test_helpers
purge-old-deployments
setup_venv
node_cache
clean_venv_cache
clean_node_cache
clean_emoji_cache
pg_backup_and_purge
restore-backup
generate_secrets
zulip-ec2-configure-interfaces
diagnose
check_user_zephyr_mirror_liveness
2019-07-29 15:18:22 -07:00
Wyatt Hoodes e331a758c3 python: Migrate open statements to use with.
This is low priority, but it's nice to be consistently using the best
practice pattern.

Fixes: #12419.
2019-07-20 15:48:52 -07:00
Tim Abbott ad81f700a1 scripts: Remove nagios overrides for missedmessage_emails.
Since 5cec566cb9, the
missedmessage_emails queue no longer is expected to grow a backlog
over time.
2019-04-13 20:43:07 -07:00
Puneeth Chaganti 9876f1b14e check_rabbitmq_queue: Fix the time period when we ignore long queues.
The commit 87d1809657 changed the time when
digests are sent by 3 hours to account for moving from the US East Coast to the
West Coast, but didn't change the time period exception in the
`check-rabbitmq-queue` script.

Closes #5415
2019-04-13 20:43:07 -07:00
Anders Kaseorg e984107966 scripts: Remove unused imports.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
2019-02-02 17:02:58 -08:00
Tim Abbott 2558f101af docs: Add documentation for `if False` mypy pattern in scripts.
This should help make it clear what's going on with these scripts.
2018-12-17 11:12:53 -08:00
Tim Abbott 3f03dcdf5e nagios: Support multiple tornado processes.
This allows our Tornado monitoring to correctly report whether
multiple configured Tornado processes are running.

This setup isn't ideal, in that it can't detect cases where the wrong
set of Tornado processes are running, but it's nice and simple and
should catch most actual problems.
2018-11-06 16:50:03 -08:00
Tim Abbott 0cac7e1cd3 tornado: Extract functions for Tornado queue names.
This moves all control for what queue to use for which realm in our
Tornado system to just the sharding.py file; no actual sharding is
done yet.
2018-11-02 17:00:10 -07:00
Anders Kaseorg 09b8ccd510 scripts/nagios/check-rabbitmq-consumers: Avoid shelling out for mv.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
2018-07-19 10:43:37 -07:00
Tim Abbott 999f264ad3 check_rabbitmq_queue: Exclude slow_queries queue from alerting.
Structurally, this queue has the same property as the missed_message
one, namely that it accumulates things and processes them only every
few minutes.

This should stop Zulip from paging in response to slow queries
accumulating when a server restart happens.
2018-05-25 13:06:50 -07:00
Greg Price 4475950ddf queue: Restore prematurely-cut upgrade path.
Revert c8f034e9a "queue: Remove missedmessage_email_senders code."
As the comment in the code says, it ensures a smooth upgrade path
from 1.7.x; we can delete it in master after 1.8.0 is released.
The removal commit was merged early due to a communication failure.
2018-02-28 11:15:53 -08:00
Umair Khan c8f034e9a0 queue: Remove missedmessage_email_senders code.
After 68513952fb, all emails are sent through email_senders queue.
This commit removes code related to the legacy queue.
2018-02-21 16:43:56 -08:00
Umair Khan 68513952fb email-worker: Create EmailSendingWorker.
This commit just copies all the code from MissedMessageSendingWorker
class to a new EmailSendingWorker class. All the logic to send an email
through a queue was already there. This commit only makes the logic
generic. It does so by creating a special purpose queue called
'email_senders' to send any type of email. To make
MissedMessageSendingWorker still work we derive it from
EmailSendingWorker. All the tests that were testing
MissedMessageSendingWorker now run against EmailSendingWorker.
2017-12-20 19:36:27 -08:00
rht 54fb88f331 scripts: Replace optparse with argparse. 2017-11-21 21:23:41 -08:00
Vishnu Ks 766511e519 actions: Mark all messages as read when user unsubscribes from stream.
This fixes a bug where, when a user is unsubscribed from a stream,
they might have unread messages on that stream leak.  While it might
seem to be a minor problem, it can cause significant problems for
computing the `unread_msgs` data structures, since it means we need to
add an extra filter for whether the user is still subscribed, either
in the backend or in the UI.

Fixes #7095.
2017-11-21 20:09:17 -08:00
rht 53e37aa511 scripts: Text-wrap long lines exceeding 110. 2017-11-10 16:22:26 -08:00
rht 71188d7b0a scripts: Remove import print_function. 2017-09-29 15:43:30 -07:00