Commit Graph

41 Commits

Author SHA1 Message Date
Alex Vandiver e1acd7b974 process_queue: For threaded workers, create them when they start.
Creating the QueueProcessingWorker objects when the ThreadedWorker is
created can lead to a race which caused confusing error messages:

1. A thread tries to call `self.worker = get_worker()`
2. This call raises an exception, which is caught by
   `log_and_exit_if_exception`
3. `log_and_exit_if_exception` sends our process a SIGUSR1, _but
    otherwise swallows the error_.
4. The thread's `.run()` is called, which tries to access
   `self.worker`, which was never set, and throws another exception.
5. The process handles the SIGUSR1, restarting.

Move the creation of the worker to when it is started, so the worker
object does not need to be stored, and possibly have a decoupled
failure.
2024-01-12 08:38:46 -08:00
Anders Kaseorg a50eb2e809 mypy: Enable new error explicit-override.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-10-12 12:28:41 -07:00
Alex Vandiver 9f231322c9 workers: Pass down if they are running multi-threaded.
This allows them to decide for themselves if they should enable
timeouts.
2023-05-16 14:05:01 -07:00
Anders Kaseorg a7f9c4f958 logging: Pass more format arguments to logging.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-06-03 12:27:23 -07:00
Anders Kaseorg 702ce071f4 python: Accept Optional[FrameType] in signal handlers.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-12-28 09:31:55 -08:00
Mateusz Mandera 2d3d0f862a process_queue: Rename Threaded_worker to ThreadedWorker.
Threaded_worker doesn't fit the python naming convention we rely on.
2021-11-16 11:21:05 -08:00
Mateusz Mandera 7cc345d7b1 process_queue: Improve handling of exceptions in process_queue.
Unhandled exceptions propagating to process_queue were not caught there,
causing improper logging - errors didn't land in errors.log as expected.
Exceptions should be caught and explicitly logged by the process_queue
logger. Exceptions occurring during consuming events are caught and
handled inside the worker's logic - however those that happen while
setting up the worker were not addressed at all, and that's the core bug
we mean to address here.

Furthermore, in multi-threaded mode we want the autoreload mechanism to
be working - which it doesn't without catching the exceptions. The
correct approach is to - again - catch the exception, log it and then
send SIGUSR1 signal to trigger exit and autoreload.
2021-11-16 11:21:05 -08:00
Anders Kaseorg 6e4c3e41dc python: Normalize quotes with Black.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-02-12 13:11:19 -08:00
Anders Kaseorg 11741543da python: Reformat with Black, except quotes.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-02-12 13:11:19 -08:00
Alex Vandiver d47637fa40 queue: Set a max consume timeout with SIGALRM.
SIGALRM is the simplest way to set a specific maximum duration that
queue workers can take to handle a specific message.  This only works
in non-threaded environments, however, as signal handlers are
per-process, not per-thread.

The MAX_CONSUME_SECONDS is set quite high, at 10s -- the longest
average worker consume time is embed_links, which hovers near 1s.
Since just knowing the recent mean does not give much information[1],
it is difficult to know how much variance is expected.  As such, we
set the threshold to be such that only events which are significant
outliers will be timed out.  This can be tuned downwards as more
statistics are gathered on the runtime of the workers.

The exception to this is DeferredWorker, which deals with quite-long
requests, and thus has no enforceable SLO.

[1] https://www.autodesk.com/research/publications/same-stats-different-graphs
2020-10-06 17:26:14 -07:00
Alex Vandiver de1db2c838 sentry: Provide more metadata in queue processors.
This allows aggregation by queue, makes the event data more readily
accessible, and clears out the breadcrumbs upon every batch that is
serviced.
2020-09-18 15:13:08 -07:00
Anders Kaseorg a50fae89e2 python: Elide type=str from argparse arguments.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-09-03 16:17:14 -07:00
Anders Kaseorg 3c5b39da9c python: Elide nargs for argparse flag arguments.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-09-03 16:17:14 -07:00
Anders Kaseorg b4597a8ca8 python: Elide default for store_{true,false} argparse arguments.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-09-03 16:17:14 -07:00
Anders Kaseorg a5dbab8fb0 python: Remove redundant dest for argparse arguments.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-09-02 11:04:10 -07:00
Anders Kaseorg bdc365d0fe logging: Pass format arguments to logging.
https://docs.python.org/3/howto/logging.html#optimization

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-05-02 10:18:02 -07:00
rht 41e3db81be dependencies: Upgrade to Django 2.2.10.
Django 2.2.x is the next LTS release after Django 1.11.x; I expect
we'll be on it for a while, as Django 3.x won't have an LTS release
series out for a while.

Because of upstream API changes in Django, this commit includes
several changes beyond requirements and:

* urls: django.urls.resolvers.RegexURLPattern has been replaced by
  django.urls.resolvers.URLPattern; affects OpenAPI code and related
  features which re-parse Django's internals.
  https://code.djangoproject.com/ticket/28593
* test_runner: Change number to suffix. Django changed the name in this
  ticket: https://code.djangoproject.com/ticket/28578
* Delete now-unnecessary SameSite cookie code (it's now the default).
* forms: urlsafe_base64_encode returns string in Django 2.2.
  https://docs.djangoproject.com/en/2.2/ref/utils/#django.utils.http.urlsafe_base64_encode
* upload: Django's File.size property replaces _get_size().
  https://docs.djangoproject.com/en/2.2/_modules/django/core/files/base/
* process_queue: Migrate to new autoreload API.
* test_messages: Add an extra query caused by .refresh_from_db() losing
  the .select_related() on the Realm object.
* session: Sync SessionHostDomainMiddleware with Django 2.2.

There's a lot more we can do to take advantage of the new release;
this is tracked in #11341.

Many changes by Tim Abbott, Umair Waheed, and Mateusz Mandera squashed
are squashed into this commit.

Fixes #10835.
2020-02-13 16:27:26 -08:00
Tim Abbott 8e7ce7cc79 python: Sort migrations/management command imports with isort.
This is a preparatory commit for using isort for sorting all of our
imports, merging changes to files where we can easily review the
changes as something we're happy with.

These are also files with relatively little active development, which
means we don't expect much merge conflict risk from these changes.
2020-01-14 13:07:47 -08:00
Anders Kaseorg becef760bf cleanup: Delete leading newlines.
Previous cleanups (mostly the removals of Python __future__ imports)
were done in a way that introduced leading newlines.  Delete leading
newlines from all files, except static/assets/zulip-emoji/NOTICE,
which is a verbatim copy of the Apache 2.0 license.

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2019-08-06 23:29:11 -07:00
Vishnu Ks 123bcea518 management: Don't use sys.exit(1).
Using sys.exit in a management command makes it impossible
to unit test the code in question.  The correct approach to do the same
thing in Django management commands is to raise CommandError.

Followup of b570c0dafa
2019-05-03 14:20:39 -07:00
Tim Abbott f04d6ed19e python: Sort imports in management commands. 2017-11-15 15:43:47 -08:00
rht a311678190 zerver/management: Use python 3 syntax for typing. 2017-10-26 15:24:56 -07:00
derAnfaenger d1afab7199 Replace deprecated Logging.warn calls with Logging.warning. 2017-10-02 11:11:42 +02:00
rht e239e97351 zerver/management: Remove absolute_import. 2017-09-27 10:00:39 -07:00
Vishnu Ks 1b0b135bfc management: Remove unused imports from process_queue. 2017-08-08 14:13:19 -07:00
Umair Khan 0e8231d0f1 process_queue: Recover gracefully after PostgreSQL restart.
- For threaded workers:
Django's autoreloader catches SIGQUIT(3) to reload the program. If
a process being watched by autoreloader exits with status code 3,
reloader will restart the process. To reload, we send SIGUSR1(10)
signal from consumers to a handler in process_queue which then
exits with status code 3.

- For single worker per process:
Catch the SIGUSR1 and quit; supervisorctl will restart the worker
automatically.

Fixes #5512
2017-07-07 16:33:15 -07:00
rht 940cf9db3b Run queue processors multithreaded in production if system memory <3.5GB.
While running queue processors multithreaded will limit the
performance available to very small systems, it's easy to fix that by
adding more RAM, and previously, Zulip didn't work on such systems at
all, so this is unambiguously an improvement there.

Fixes #32.
Fixes #34.

(Commit message expanded significantly by tabbott.)
2017-06-03 12:19:58 -07:00
Rafid Aslam 41bd88d5ed pep8: Fix E301 pep8 violations.
Fix "E301: expected (1 or 2) blank line" pep8 violations.
2016-11-29 08:51:44 -08:00
Steve Howell ca43cbc654 logging: Reducing logging in run-dev.py for queue workers.
In dev, we no longer log that individual queue workers were launched.

Instead, in dev (and prod as well), we log a message with the total
count.
2016-11-17 11:12:02 -08:00
Tim Abbott 86e933a4a1 process_queue: Suppress USING_RABBITMQ warnings in test suite. 2016-10-27 12:36:06 -07:00
Tim Abbott 1e54897ca7 process_queue: Add missing type annotation. 2016-08-04 15:57:03 -07:00
Tim Abbott 92062b9526 process_queue: Add missing annotation. 2016-08-04 15:57:02 -07:00
rahuldeve a3745178e5 Use django.utils.autoreload to restart queue workers at code change.
Fixes #621, #1045.
2016-06-26 20:12:11 -07:00
Tim Abbott a1a27b1789 Annotate most Zulip management commands. 2016-06-04 10:12:06 -07:00
Tim Abbott 06b33da709 process_queue: Fix missing worker.setup() in single-threaded codepath. 2016-03-27 23:17:16 -07:00
Tim Abbott cd2348e9ae Run queue processers multithreaded in development.
This change drops the memory used for Python processes run by Zulip in
development from about 1GB to 300MB on my laptop.

On the front of safety, http://pika.readthedocs.org/en/latest/faq.html
explains "Pika does not have any notion of threading in the code. If
you want to use Pika with threading, make sure you have a Pika
connection per thread, created in that thread. It is not safe to share
one Pika connection across threads.".  Since this code only connects
to rabbitmq inside the individual threads, I believe this should be
safe.

Progress towards #32.
2016-03-20 18:04:24 -07:00
Tim Abbott 7595e4b05f process_queue: Fix worker variable being accessed before initialization. 2016-02-03 19:29:44 -08:00
Reid Barton ae0ae3dde8 Django 1.8: declare positional arguments in management commands
(imported from commit d9efca1376de92c8187d25f546c79fece8d2d8c6)
2015-08-20 23:35:40 -07:00
Zev Benjamin ffb6266319 Refuse to run a queue processor if USING_RABBITMQ is False
(imported from commit 39beff47cdbb18ba39756989e6f07facbd16864f)
2013-10-28 14:30:53 -04:00
Zev Benjamin 871afde142 Prepare process_queue to be used with supervisor's numprocs
(imported from commit 5a652b93f1e8b32b5ed89d622035161abaedfb11)
2013-09-24 20:44:15 -04:00
Zev Benjamin 6cf5ebc63d Add new system for defining and running queue-processing workers
(imported from commit 7e77da57b8ad0b70837785c85f546601ba5b1957)
2013-09-24 20:44:15 -04:00