Commit Graph

479 Commits

Author SHA1 Message Date
Alex Vandiver 142de0f670 queue: Increase default timeout to 30s, from 10s.
Not all of the workers are known to be safe to interrupt; they might
leave inconsistent state.  As such, terminating them with timeouts
should currently only be a last-resort against stalled queues, not a
regular occurrence.
2020-10-27 16:39:31 -07:00
Alex Vandiver c73dd194f0 sentry: Group all worker timeouts together, by queue.
Since the exception can be triggered at arbitrary places in the stack
based on whenever the alarm happens to fire, they do not often group
together.

Explicitly group them together, grouped only by which queue the work
is in.
2020-10-27 16:39:31 -07:00
Alex Vandiver 7cf737988d queue: Be more explicit about test/real queue division. 2020-10-26 12:32:47 -07:00
Mateusz Mandera 716df658fa queue_processors: Don't run test queues with run-dev.py. 2020-10-18 14:07:31 -07:00
Steve Howell 378062cc83 performance: Avoid call to access_stream_by_id.
We already trust ids that are put on our queue
for deferred work. For example, see the code for
"mark_stream_messages_as_read_for_everyone"

We now pass stream_recipient_id when we queue
up work for do_mark_stream_messages_as_read.

This generally saves about 3 queries per
user when we unsubscribe them from a stream.
2020-10-16 12:58:11 -07:00
Steve Howell 31eb97ddde performance: Fix do_mark_stream_messages_as_read.
This function no longer asks for data that it
doesn't need.
2020-10-16 12:58:11 -07:00
Anders Kaseorg 6564540d15 docs: Fix some spelling errors.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-13 15:47:13 -07:00
Alex Vandiver f0b23b0752 queue: Switch non-batch consumer to also use start_json_consumer.
This has no effect on consumption rate, but unifies the codepaths.
Before:
```
$ ./manage.py queue_rate --count 50000
Purging queue...
Enqueue rate: 11187 / sec
Dequeue rate: 4158 / sec
```

After:
```
$ ./manage.py queue_rate --count 50000
Purging queue...
Enqueue rate: 11010 / sec
Dequeue rate: 4113 / sec
```
2020-10-11 14:19:42 -07:00
Alex Vandiver 45c9c3cc30 queue: Monitor user_activity queue, now that it has a consumer.
Since this was using repead individual get() calls previously, it
could not be monitored for having a consumer.  Add it in, by marking
it of queue type "consumer" (the default), and adding Nagios lines for
it.

Also adjust missedmessage_emails to be monitored; it stopped using
LoopQueueProcessingWorker in 5cec566cb9, but was never added back
into the set of monitored consumers.
2020-10-11 14:19:42 -07:00
Alex Vandiver f9358d5330 queue: Switch batch interface to use the channel.consume iterator.
This low-level interface allows consuming from a queue with timeouts.
This can be used to either consume in batches (with an upper timeout),
or one-at-a-time.  This is notably more performant than calling
`.get()` repeatedly (what json_drain_queue does under the hood), which
is "*highly discouraged* as it is *very inefficient*"[1].

Before this change:
```
$ ./manage.py queue_rate --count 10000 --batch
Purging queue...
Enqueue rate: 11158 / sec
Dequeue rate: 3075 / sec
```

After:
```
$ ./manage.py queue_rate --count 10000 --batch
Purging queue...
Enqueue rate: 11511 / sec
Dequeue rate: 19938 / sec
```

[1] https://www.rabbitmq.com/consumers.html#fetching
2020-10-11 14:19:40 -07:00
Alex Vandiver 2547bdbf4a queue: Rename consume_wrapper to a better name. 2020-10-09 20:40:51 -07:00
Alex Vandiver d5a6b0f99a queue: Rename queue_size, and update for all local queues.
Despite its name, the `queue_size` method does not return the number
of items in the queue; it returns the number of items that the local
consumer has delivered but unprocessed.  These are often, but not
always, the same.

RabbitMQ's queues maintain the queue of unacknowledged messages; when
a consumer connects, it sends to the consumer some number of messages
to handle, known as the "prefetch."  This is a performance
optimization, to ensure the consumer code does not need to wait for a
network round-trip before having new data to consume.

The default prefetch is 0, which means that RabbitMQ immediately dumps
all outstanding messages to the consumer, which slowly processes and
acknowledges them.  If a second consumer were to connect to the same
queue, they would receive no messages to process, as the first
consumer has already been allocated them.  If the first consumer
disconnects or crashes, all prior events sent to it are then made
available for other consumers on the queue.

The consumer does not know the total size of the queue -- merely how
many messages it has been handed.

No change is made to the prefetch here; however, future changes may
wish to limit the prefetch, either for memory-saving, or to allow
multiple consumers to work the same queue.

Rename the method to make clear that it only contains information
about the local queue in the consumer, not the full RabbitMQ queue.
Also include the waiting message count, which is used by the
`consume()` iterator for similar purpose to the pending events list.
2020-10-09 20:40:39 -07:00
Anders Kaseorg 9bfbb29763 queue_processors: Use try…finally to prevent leaking an alarm.
Otherwise, if consume_func raised an exception for any reason *other*
than the alarm being fired, the still-pending alarm would have fired
later at some arbitrary point in the calling code.

We need two try…finally blocks in case the signal arrives just before
signal.alarm(0).

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-07 15:37:46 -07:00
Alex Vandiver d47637fa40 queue: Set a max consume timeout with SIGALRM.
SIGALRM is the simplest way to set a specific maximum duration that
queue workers can take to handle a specific message.  This only works
in non-threaded environments, however, as signal handlers are
per-process, not per-thread.

The MAX_CONSUME_SECONDS is set quite high, at 10s -- the longest
average worker consume time is embed_links, which hovers near 1s.
Since just knowing the recent mean does not give much information[1],
it is difficult to know how much variance is expected.  As such, we
set the threshold to be such that only events which are significant
outliers will be timed out.  This can be tuned downwards as more
statistics are gathered on the runtime of the workers.

The exception to this is DeferredWorker, which deals with quite-long
requests, and thus has no enforceable SLO.

[1] https://www.autodesk.com/research/publications/same-stats-different-graphs
2020-10-06 17:26:14 -07:00
Alex Vandiver baf882a133 queue: Only ACK drain_queue once it has completed work on the list.
Currently, drain_queue and json_drain_queue ack every message as it is
pulled off of the queue, until the queue is empty.  This means that if
the consumer crashes between pulling a batch of messages off the
queue, and actually processing them, those messages will be
permanently lost.  Sending an ACK on every message also results in a
significant amount lot of traffic to rabbitmq, with notable
performance implications.

Send a singular ACK after the processing has completed, by making
`drain_queue` into a contextmanager.  Additionally, use the `multiple`
flag to ACK all of the messages at once -- or explicitly NACK the
messages if processing failed.  Sending a NACK will re-queue them at
the front of the queue.

Performance of a no-op dequeue before this change:
```
$ ./manage.py queue_rate --count 50000 --batch
Purging queue...
Enqueue rate: 10847 / sec
Dequeue rate: 2479 / sec
```
Performance of a no-op dequeue after this change (a 25% increase):
```
$ ./manage.py queue_rate --count 50000 --batch
Purging queue...
Enqueue rate: 10752 / sec
Dequeue rate: 3079 / sec
```
2020-10-06 17:26:14 -07:00
Alex Vandiver df86a564dc queue: Let stop() work with LoopQueueProcessingWorker. 2020-10-06 17:26:14 -07:00
Alex Vandiver 8cf37a0d4b queue: Add a tool to profile no-op enqueue and dequeue actions. 2020-10-06 17:26:14 -07:00
Tim Abbott 7fa8bafe81 lint: Fix type of initial 0 in queue monitoring. 2020-09-21 15:47:30 -07:00
Mateusz Mandera 810514dd9d queue: Update stats file every 30 seconds.
This system can't update stats while the queue is idle, without using
threads for this, but at least we ensure to update the file after
consuming an event if more than MAX_SECONDS_BEFORE_UPDATE_STATS passed
since the last update, regardless of the number of iterations done so
far.
2020-09-21 15:24:02 -07:00
Mateusz Mandera 40c4511a9c queue: Fix misspelled consume_iteration_counter variable. 2020-09-21 15:22:58 -07:00
Mateusz Mandera 2365a53496 queue: Fix a race condition in monitoring after queue stops being idle.
The race condition is described in the comment block removed by this
commit. This leaves room for another, remaining race condition
that should be virtually impossible, but nevertheless it seems
worthwhile to have it documented in the code, so we put a new comment
describing it.
As a final note, this is not a new race condition,
it was hypothetically possible with the old code as well.
2020-09-21 15:22:56 -07:00
Alex Vandiver de1db2c838 sentry: Provide more metadata in queue processors.
This allows aggregation by queue, makes the event data more readily
accessible, and clears out the breadcrumbs upon every batch that is
serviced.
2020-09-18 15:13:08 -07:00
Mateusz Mandera bb4567f57e queue: Extract get_remaining_queue_size method. 2020-09-11 15:51:07 -07:00
Mateusz Mandera 1d466a4fc5 queue: Make embed_link updates stats on every iteration. 2020-09-11 15:51:07 -07:00
Anders Kaseorg bef46dab3c python: Prefer kwargs form of dict.update.
For less inflation by Black.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-09-03 17:51:09 -07:00
Anders Kaseorg ab120a03bc python: Replace unnecessary intermediate lists with generators.
Mostly suggested by the flake8-comprehension plugin.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-09-02 11:15:41 -07:00
Mateusz Mandera 9b50c49ea7 streams: Mark all messages as read when deactivating a stream.
The query to finds and marks all unread UserMessages in the stream as read
can be quite expensive, so we'll move that work to the deferred_work
queue and split it into batches.

Fixes #15770.
2020-09-01 11:24:27 -07:00
Mateusz Mandera 06151672ba
queue: Use locking to avoid race conditions in missedmessage_emails.
This queue had a race condition with creation of another Timer while
maybe_send_batched_emails is still doing its work, which may cause
two or more threads to be running maybe_send_batched_emails
at the same time, mutating the shared data simultaneously.

Another less likely potential race condition was that
maybe_send_batched_emails after sending out its email, can call
ensure_timer(). If the consume function is run simultaneously
in the main thread, it will call ensure_timer() too, which,
given unfortunate timings, might lead to both calls setting a new Timer.

We add locking to the queue to avoid such race conditions.

Tested manually, by print debugging with the following setup:
1. Making handle_missedmessage_emails sleep 2 seconds for each email,
   and changed BATCH_DURATION to 1s to make the queue start working
   right after launching.
2. Putting a bunch of events in the queue.
3. ./manage.py process_queue --queue_name missedmessage_emails
4. Once maybe_send_batched_emails is called and while it's processing
the events, I pushed more events to the queue. That triggers the
consume() function and ensure_timer().

Before implementing the locking mechanism, this causes two threads
to run maybe_send_batched_emails at the same time, mutating each other's
shared data, causing a traceback such as

Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 1182, in run
    self.function(*self.args, **self.kwargs)
  File "/srv/zulip/zerver/worker/queue_processors.py", line 507, in maybe_send_batched_emails
    del self.events_by_recipient[user_profile_id]
KeyError: '5'

With the locking mechanism, things get handled as expected, and
ensure_timer() exits if it can't obtain the lock due to
maybe_send_batched_emails still working.

Co-authored-by: Tim Abbott <tabbott@zulip.com>
2020-08-26 12:40:59 -07:00
Anders Kaseorg 61d0417e75 python: Replace ujson with orjson.
Fixes #6507.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-08-11 10:55:12 -07:00
Anders Kaseorg 60a25b2721 docs: Fix spelling errors caught by codespell.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-08-11 10:23:06 -07:00
Alex Vandiver 2928bbc8bd logging: Report stack_info on logging.exception calls.
The exception trace only goes from where the exception was thrown up
to where the `logging.exception` call is; any context as to where
_that_ was called from is lost, unless `stack_info` is passed as well.
Having the stack is particularly useful for Sentry exceptions, which
gain the full stack trace.

Add `stack_info=True` on all `logging.exception` calls with a
non-trivial stack; we omit `wsgi.py`.  Adjusts tests to match.
2020-08-11 10:16:54 -07:00
Mateusz Mandera a7039c815e queue_processors: Fix UnboundLocalError in QueueProcessingWorker.
consume_time_seconds wasn't properly defined at the beginning, so when
a BaseException that isn't a subclass of Exception is thrown, the
finally: block could be entered with it still undefined.
2020-08-11 10:09:42 -07:00
Anders Kaseorg 8e6a439529 queue_processors: Fix strict_optional errors.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-07-06 11:25:48 -07:00
Tim Abbott 52676c0670 lint: Work around a pyflakes bug.
Without this change, pyflakes reports this exception:

pyflakes  | zerver/worker/queue_processors.py:152:9 local variable 'e' is assigned to but never used
pyflakes  | zerver/worker/queue_processors.py:155:81 undefined name 'e'
2020-07-03 17:24:36 -07:00
Mateusz Mandera d51afcf485 emails: Improve handling of timeouts when sending.
We use the EMAIL_TIMEOUT django setting to timeout after 15s of trying
to send an email. This will nicely lead to retries in the email_senders
queue, due to the retry_send_email_failures decorator.

smtlib documentation suggests that socket.timeout can be raised as the
result of timing out, so in attempts I'm getting
smtplib.SMTPServerDisconnected. Either way, seems appropriate to add
socket.timeout to the exception that we catch.
2020-07-03 16:52:50 -07:00
Vishnu KS 0a36f04c20 i18n: Mark notification bot message in queue_processors for translation. 2020-06-26 14:57:18 -07:00
Mateusz Mandera 85d4536486 docs: Update some comments for the new release versioning scheme.
With the new scheme, the equivalent of 2.3 is 4.0.
2020-06-25 10:33:03 -07:00
Anders Kaseorg 579f05f3ed queue_processors: Avoid unchecked casts.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-22 17:18:19 -07:00
Tim Abbott 4d7550d705 views: Extract message_edit.py for message editing views.
This is a pretty clean extraction of files that lets us shrink one of
our largest files.
2020-06-22 15:08:34 -07:00
Mateusz Mandera 8d2d64c100 CVE-2020-14215: Fix validation in PreregistrationUser queries.
The most import change here is the one in maybe_send_to_registration
codepath, as the insufficient validation there could lead to fetching
an expired PreregistrationUser that was invited as an administrator
admin even years ago, leading to this registration ending up in the
new user being a realm administrator.

Combined with the buggy migration in
0198_preregistrationuser_invited_as.py, this led to users incorrectly
joining as organizations administrators by accident.  But even without
that bug, this issue could have allowed a user who was invited as an
administrator but then had that invitation expire and then joined via
social authentication incorrectly join as an organization administrator.

The second change is in ConfirmationEmailWorker, where this wasn't a
security problem, but if the server was stopped for long enough, with
some invites to send out email for in the queue, then after starting it
up again, the queue worker would send out emails for invites that
had already expired.
2020-06-16 23:35:39 -07:00
Anders Kaseorg 5dc9b55c43 python: Manually convert more percent-formatting to f-strings.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-14 23:27:22 -07:00
Anders Kaseorg 1ed2d9b4a0 logging: Use logging.exception and exc_info for unexpected exceptions.
logging.exception() and logging.debug(exc_info=True),
etc. automatically include a traceback.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-14 23:27:22 -07:00
Anders Kaseorg a803e68528 email-mirror-postfix: Handle 8-bit messages correctly.
Since JSON can’t represent bytes, we encode them with base64.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-14 20:24:06 -07:00
Anders Kaseorg bff3dcadc8 email: Migrate to new Python ≥ 3.3 email API.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-14 20:24:06 -07:00
Anders Kaseorg 365fe0b3d5 python: Sort imports with isort.
Fixes #2665.

Regenerated by tabbott with `lint --fix` after a rebase and change in
parameters.

Note from tabbott: In a few cases, this converts technical debt in the
form of unsorted imports into different technical debt in the form of
our largest files having very long, ugly import sequences at the
start.  I expect this change will increase pressure for us to split
those files, which isn't a bad thing.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-11 16:45:32 -07:00
Anders Kaseorg 69730a78cc python: Use trailing commas consistently.
Automatically generated by the following script, based on the output
of lint with flake8-comma:

import re
import sys

last_filename = None
last_row = None
lines = []

for msg in sys.stdin:
    m = re.match(
        r"\x1b\[35mflake8    \|\x1b\[0m \x1b\[1;31m(.+):(\d+):(\d+): (\w+)", msg
    )
    if m:
        filename, row_str, col_str, err = m.groups()
        row, col = int(row_str), int(col_str)

        if filename == last_filename:
            assert last_row != row
        else:
            if last_filename is not None:
                with open(last_filename, "w") as f:
                    f.writelines(lines)

            with open(filename) as f:
                lines = f.readlines()
            last_filename = filename
        last_row = row

        line = lines[row - 1]
        if err in ["C812", "C815"]:
            lines[row - 1] = line[: col - 1] + "," + line[col - 1 :]
        elif err in ["C819"]:
            assert line[col - 2] == ","
            lines[row - 1] = line[: col - 2] + line[col - 1 :].lstrip(" ")

if last_filename is not None:
    with open(last_filename, "w") as f:
        f.writelines(lines)

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-06-11 16:04:12 -07:00
Graham Bleaney 461d5b1a3e pysa: Introduce sanitizers, models, and inline marking safe.
This commit adds three `.pysa` model files: `false_positives.pysa`
for ruling out false positive flows with `Sanitize` annotations,
`req_lib.pysa` for educating pysa about Zulip's `REQ()` pattern for
extracting user input, and `redirects.pysa` for capturing the risk
of open redirects within Zulip code. Additionally, this commit
introduces `mark_sanitized`, an identity function which can be used
to selectively clear taint in cases where `Sanitize` models will not
work. This commit also puts `mark_sanitized` to work removing known
false postive flows.
2020-06-11 12:57:49 -07:00
Anders Kaseorg 67e7a3631d python: Convert percent formatting to Python 3.6 f-strings.
Generated by pyupgrade --py36-plus.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-10 15:02:09 -07:00
Anders Kaseorg 8dd83228e7 python: Convert "".format to Python 3.6 f-strings.
Generated by pyupgrade --py36-plus --keep-percent-format, but with the
NamedTuple changes reverted (see commit
ba7906a3c6, #15132).

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-08 15:31:20 -07:00
Anders Kaseorg 19cc22e5ab queue: Fix types to reflect that Pika channels transmit bytes.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-07 11:09:24 -07:00
Mateusz Mandera 200ce821a2 user_activity: Put client id instead of name in event dicts.
This saves the completely unnecessary work of mapping the Client name
to its ID.  Because we had in-process caching of the immutable Client
objects, this isn't a material performance win, but it will eventually
let us delete that caching logic and have a simpler system.
2020-05-29 15:19:55 -07:00
Mateusz Mandera e2262b0b64 queue_processors: Log time spent getting data for url in embed_links. 2020-05-21 12:13:46 -07:00
Mateusz Mandera dd40649e04 queue_processors: Remove the slow_queries queue.
While this functionality to post slow queries to a Zulip stream was
very useful in the early days of Zulip, when there were only a few
hundred accounts, it's long since been useless since (1) the total
request volume on larger Zulip servers run by Zulip developers, and
(2) other server operators don't want real-time notifications of slow
backend queries.  The right structure for this is just a log file.

We get rid of the queue and replace it with a "zulip.slow_queries"
logger, which will still log to /var/log/zulip/slow_queries.log for
ease of access to this information and propagate to the other logging
handlers.  Reducing the amount of queues is good for lowering zulip's
memory footprint and restart performance, since we run at least one
dedicated queue worker process for each one in most configurations.
2020-05-11 00:45:13 -07:00
Anders Kaseorg bdc365d0fe logging: Pass format arguments to logging.
https://docs.python.org/3/howto/logging.html#optimization

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-05-02 10:18:02 -07:00
Wyatt Hoodes 82e7ad8e25 data exports: Handle pending and failed exports.
Prior to this change, there were reports of 500s in
production due to `export.extra_data` being a
Nonetype.  This was reproducible using the s3
backend in development when a row was created in
the `RealmAuditLog` table, but the export failed in
the `DeferredWorker`.  This left an entry lying
about that was never updated with an `extra_data`
field.

To fix this, we catch any exceptions in the
`DeferredWorker`, and then update `extra_data` to
encode the failure.  We also fix the fact that we
never updated the export UI table with pending exports.

These changes also negated the use for the somewhat
hacky `clear_success_banner` logic.
2020-04-30 13:00:59 -07:00
Anders Kaseorg fead14951c python: Convert assignment type annotations to Python 3.6 style.
This commit was split by tabbott; this piece covers the vast majority
of files in Zulip, but excludes scripts/, tools/, and puppet/ to help
ensure we at least show the right error messages for Xenial systems.

We can likely further refine the remaining pieces with some testing.

Generated by com2ann, with whitespace fixes and various manual fixes
for runtime issues:

-    invoiced_through: Optional[LicenseLedger] = models.ForeignKey(
+    invoiced_through: Optional["LicenseLedger"] = models.ForeignKey(

-_apns_client: Optional[APNsClient] = None
+_apns_client: Optional["APNsClient"] = None

-    notifications_stream: Optional[Stream] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE)
-    signup_notifications_stream: Optional[Stream] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE)
+    notifications_stream: Optional["Stream"] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE)
+    signup_notifications_stream: Optional["Stream"] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE)

-    author: Optional[UserProfile] = models.ForeignKey('UserProfile', blank=True, null=True, on_delete=CASCADE)
+    author: Optional["UserProfile"] = models.ForeignKey('UserProfile', blank=True, null=True, on_delete=CASCADE)

-    bot_owner: Optional[UserProfile] = models.ForeignKey('self', null=True, on_delete=models.SET_NULL)
+    bot_owner: Optional["UserProfile"] = models.ForeignKey('self', null=True, on_delete=models.SET_NULL)

-    default_sending_stream: Optional[Stream] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE)
-    default_events_register_stream: Optional[Stream] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE)
+    default_sending_stream: Optional["Stream"] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE)
+    default_events_register_stream: Optional["Stream"] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE)

-descriptors_by_handler_id: Dict[int, ClientDescriptor] = {}
+descriptors_by_handler_id: Dict[int, "ClientDescriptor"] = {}

-worker_classes: Dict[str, Type[QueueProcessingWorker]] = {}
-queues: Dict[str, Dict[str, Type[QueueProcessingWorker]]] = {}
+worker_classes: Dict[str, Type["QueueProcessingWorker"]] = {}
+queues: Dict[str, Dict[str, Type["QueueProcessingWorker"]]] = {}

-AUTH_LDAP_REVERSE_EMAIL_SEARCH: Optional[LDAPSearch] = None
+AUTH_LDAP_REVERSE_EMAIL_SEARCH: Optional["LDAPSearch"] = None

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-04-22 11:02:32 -07:00
Mateusz Mandera fe8f57b8b7 queue_processors: Write a newline char at the end of stats files. 2020-04-10 13:48:16 -07:00
Mateusz Mandera 5252b081bd queue_processors: Gather statistics on queue worker operations. 2020-04-01 16:44:06 -07:00
arpit551 8f7733cb20 emails: Added placeholders strings in FormAddress.
We've had a bug for a while that if any ScheduledEmail objects get
created with the wrong email sender address, even after the sysadmin
corrects the problem, they'll still get errors because of the objects
stored with the wrong format.

We solve this by using FromAddress placeholders strings in
send_future_email function, so that ScheduledEmail objects end up
setting the final `from_address` value when mail is actually sent
using the setting in effect at that time.

Fixes #11008.
2020-03-27 16:41:02 -07:00
Mateusz Mandera 5da2f80140 queue_processors: Extract a duplicated logic block into do_consume. 2020-03-22 18:45:46 -07:00
Tim Abbott 783a77c532 queue processors: Flush per-request caches after each item.
Several of our queues are capable of doing work that includes
rendering markdown (outgoing_webhook, embedded_bots, embed_links, and
email_mirror).  As a result, it's essential that these don't cache
per-request data (specifically, realm filters) longer than they
should, making editing/deleting linkifiers potentially use old
settings until the relevant process was restarted.

Flushing these caches is extremely cheap (just clearing two
dictionaries) and thus is reasonable to do after every queue event,
rather than trying to do it only the ~1/3 of queues that specifically
do markdown processing.  We do the same in our middleware for
reset_queries.

It's not worth writing a test for this because it's very difficult to
create the test setup situation for this bug with a single test worker
process; one needs to edit the linkifier configuration in a different
process than the one sending the message in order to see the bug.

This was a much larger visible bug on Zulip 2.1.x, where the presence
of the message_sender queue meant that this would apply to messages
sent via a browser.

Fixes #14095.
2020-03-03 15:29:11 -08:00
Steve Howell 2e8dec233e slow queries: Use internal_send_stream_message().
Note that while the test mocks the actual message
send, we now have a `get_stream` call in the queue
worker, so we have to set up a real stream for
testing (or we could have mocked that as well, but
it didn't seem necessary).  The setup queries add
to the amount of queries reported by the test,
plus the `get_stream` call.  I just made the
query count a digits regex, which is a little bit
lame, but I don't think it's worth risking test
flakes for this.
2020-02-11 12:20:54 -08:00
Mateusz Mandera 4c5a8e6f0c queue: Remove missedmessage_email_senders. 2020-01-31 12:13:51 -08:00
Tim Abbott d70e799466 bots: Remove FEEDBACK_BOT implementation.
This legacy cross-realm bot hasn't been used in several years, as far
as I know.  If we wanted to re-introduce it, I'd want to implement it
as an embedded bot using those common APIs, rather than the totally
custom hacky code used for it that involves unnecessary queue workers
and similar details.

Fixes #13533.
2020-01-25 22:41:39 -08:00
Anders Kaseorg ea6934c26d dependencies: Remove WebSockets system for sending messages.
Zulip has had a small use of WebSockets (specifically, for the code
path of sending messages, via the webapp only) since ~2013.  We
originally added this use of WebSockets in the hope that the latency
benefits of doing so would allow us to avoid implementing a markdown
local echo; they were not.  Further, HTTP/2 may have eliminated the
latency difference we hoped to exploit by using WebSockets in any
case.

While we’d originally imagined using WebSockets for other endpoints,
there was never a good justification for moving more components to the
WebSockets system.

This WebSockets code path had a lot of downsides/complexity,
including:

* The messy hack involving constructing an emulated request object to
  hook into doing Django requests.
* The `message_senders` queue processor system, which increases RAM
  needs and must be provisioned independently from the rest of the
  server).
* A duplicate check_send_receive_time Nagios test specific to
  WebSockets.
* The requirement for users to have their firewalls/NATs allow
  WebSocket connections, and a setting to disable them for networks
  where WebSockets don’t work.
* Dependencies on the SockJS family of libraries, which has at times
  been poorly maintained, and periodically throws random JavaScript
  exceptions in our production environments without a deep enough
  traceback to effectively investigate.
* A total of about 1600 lines of our code related to the feature.
* Increased load on the Tornado system, especially around a Zulip
  server restart, and especially for large installations like
  zulipchat.com, resulting in extra delay before messages can be sent
  again.

As detailed in
https://github.com/zulip/zulip/pull/12862#issuecomment-536152397, it
appears that removing WebSockets moderately increases the time it
takes for the `send_message` API query to return from the server, but
does not significantly change the time between when a message is sent
and when it is received by clients.  We don’t understand the reason
for that change (suggesting the possibility of a measurement error),
and even if it is a real change, we consider that potential small
latency regression to be acceptable.

If we later want WebSockets, we’ll likely want to just use Django
Channels.

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-01-14 22:34:00 -08:00
Mateusz Mandera 89046ea1a9 email_mirror: Give extract_and_validate a more descriptive name. 2020-01-12 11:30:18 -08:00
Mateusz Mandera c011d2c6d3 email_mirror: Migrate missed message addresses from redis to database.
Addresses point 1 of #13533.

MissedMessageEmailAddress objects get tied to the specific that was
missed by the user. A useful benefit of that is that email message sent
to that address will handle topic changes - if the message that was
missed gets its topic changed, the email response will get posted under
the new topic, while in the old model it would get posted under the
old topic, which could potentially be confusing.

Migrating redis data to this new model is a bit tricky, so the migration
code has comments explaining some of the compromises made there, and
test_migrations.py tests handling of the various possible cases that
could arise.
2020-01-07 13:03:22 -08:00
Mateusz Mandera e90866876c queue: Take advantage of ABC for defining abstract worker base classes.
QueueProcessingWorker and LoopQueueProcessingWorker are abstract classes
meant to be subclassed by a class that will define its own consume()
or consume_batch() method. ABCs are suited for that and we can tag
consume/consume_batch with the @abstractmethod wrapper which will
prevent subclasses that don't define these methods properly to be
impossible to even instantiate (as opposed to only crashing once
consume() is called). It's also nicely detected by mypy, which will
throw errors such as this on invalid use:

error: Only concrete class can be given where "Type[TestWorker]" is
expected
error: Cannot instantiate abstract class 'TestWorker' with abstract
attribute 'consume'

Due to it being detected by mypy, we can remove the test
test_worker_noconsume which just tested the old version of this -
raising an exception when the unimplemented consume() gets called. Now
it can be handled already on the linter level.
2019-12-28 10:52:17 -08:00
Mateusz Mandera a54640fc68 queue: Share exception handling code between loop and normal workers.
LoopQueueProcessingWorker can handle exceptions inside consume_batch in
a similar manner to how QueueProcessingWorker handles exceptions inside
consume.
2019-12-28 10:47:36 -08:00
Tim Abbott 1465628c95 queue workers: Use self.queue_name in retry_event calls.
This just adds a bit of robustness if we ever end up renaming queues.
2019-12-04 10:08:48 -08:00
Mateusz Mandera 7d0444f903 push_notifs: Improve handling of errors when talking to the bouncer.
We use the plumbing introduced in a previous commit, to now raise
PushNotificationBouncerRetryLaterError in send_to_push_bouncer in case
of issues with talking to the bouncer server. That's a better way of
dealing with the errors than the previous approach of returning a
"failed" boolean, which generally wasn't checked in the code anyway and
did nothing.
The PushNotificationBouncerRetryLaterError exception will be nicely
handled by queue processors to retry sending again, and due to being a
JsonableError, it will also communicate the error to API users.
2019-12-04 09:58:22 -08:00
Mateusz Mandera 20b30e1503 push_notifs: Set up plumbing for retrying in case of bouncer error.
We add PushNotificationBouncerRetryLaterError as an exception to signal
an error occurred when trying to communicate with the bouncer and it
should be retried. We use JsonableError as the base class, because this
signal will need to work in two roles:
1. When the push notification was being issued by the queue worker
PushNotificationsWorker, it will signal to the worker to requeue the
event and try again later.
2. The exception will also possibly be raised (this will be added in the
next commit) on codepaths coming from a request to an API endpoint (for
example to add a token, to users/me/apns_device_token). In that case,
it'll be needed to provide a good error to the API user - and basing
this exception on JsonableError will allow that.
2019-12-04 09:58:22 -08:00
Tim Abbott 6407d0b1f9 push_notifications: Clear PushDeviceToken on API key change.
This includes adding a new endpoint to the push notification bouncer
interface, and code to call it appropriately after resetting a user's
personal API key.

When we add support for a user having multiple API keys, we may need
to add an additional key here to support removing keys associated with
just one client.
2019-11-19 15:37:43 -08:00
Tim Abbott bb64b0fa4d queue processors: Switch SignupWorker to logging user IDs.
This is a better setup than logging emails, especially with
EMAIL_ADDRESS_VISIBILITY_ADMINS.
2019-11-15 17:07:24 -08:00
Tim Abbott d2970a56c2 lint: Remove some unused imports.
These were introduced in ae5bc92602.
2019-10-10 18:06:30 -07:00
Vishnu KS ae5bc92602 queue: Don't create confirmation objects twice during invite.
A confirmation object is already created when
do_send_confirmation_email is called just above.

Tweaked by tabbott to remove an unnecessary somewhat hacky database
query.
2019-10-10 16:19:42 -07:00
Tim Abbott 1c73ce2450 user_activity: Use LoopQueueProcessingWorker strategy.
This should dramatically improve the queue processor's performance in
cases where there's a very high volume of requests on a given endpoint
by a given user, as described in the new docstring.

Until we test this more broadly in production, we won't know if this
is a full solution to the problem, but I think it's likely.  We've
never seen the UserActivityInterval worker end up backlogged without a
total queue processor outage, and it should have a similar workload.

Fixes #13180.
2019-09-21 11:48:24 -07:00
Tim Abbott f0d8951035 do_update_user_activity: Refactor to support passing a count.
We'll use this in upcoming commits.
2019-09-21 11:47:14 -07:00
Tim Abbott 5c960b3e0f user_activity: Make the queue processor a bit more efficient.
We don't actually need to go to the memcached (falling back to the
database) to fetch either user or client objects on every event.  For
user objects, we actually can just pass through the user ID
transparently; for client objects, we can use an in-process cache,
since the mapping of string to ID never changes.
2019-09-21 11:47:14 -07:00
Rishi Gupta e058558a52 emails: Send invitation reminder email two days before expiry.
Hopefully this does a better job of spurring people to action, and also
suggests a self-service fix if they don't (i.e. contacting the person that
invited them).
2019-08-23 12:53:11 -07:00
Rishi Gupta 2d260031ed emails: Use referrer.delivery_email in invitation emails. 2019-08-23 12:53:11 -07:00
Anders Kaseorg a5596011a0 queue_processors, python_examples: Fix mypy errors.
zerver/openapi/python_examples.py:105: error: Argument 1 to "get_user_presence" of "Client" has incompatible type "str"; expected "Dict[str, Any]"
    zerver/openapi/python_examples.py:563: error: Argument 1 to "add_reaction" of "Client" has incompatible type "Dict[str, object]"; expected "Dict[str, str]"
    zerver/openapi/python_examples.py:576: error: Argument 1 to "remove_reaction" of "Client" has incompatible type "Dict[str, object]"; expected "Dict[str, str]"
    zerver/worker/queue_processors.py:587: error: Argument "client" to "extract_query_without_mention" has incompatible type "EmbeddedBotHandler"; expected "ExternalBotHandler"

These were only missed because mypy daemon mode requires us to set
`follow_imports = skip` for the `zulip` package.

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2019-08-16 14:13:40 -07:00
Tim Abbott 1d63f129bf data export: Add summary logging of runtime.
This should help us investigate cases where this runs a very long
time.
2019-08-12 18:21:08 -07:00
Wyatt Hoodes 7d178bbb0f queue_processors: Clean up the extra_data dict code.
We don't want to add a `deleted_timestamp` key until
the export is actually deleted.
2019-08-12 17:51:46 -07:00
Wyatt Hoodes 22842dab34 events: Rename notify_export_completed.
notify_realm_export is more reasonable for the context of doing
deletion events as well.
2019-08-07 14:18:27 -07:00
Wyatt Hoodes 11db0c23fb exports: Update extra_data field to a JSON structure.
We add the `deleted_timestamp` key to the new `extra_data`
dictionary.
2019-08-07 12:04:28 -07:00
neiljp (Neil Pilgrim) accf4411f0 mypy: Remove type ignore on MissedMessageWorker.stop_timer. 2019-08-06 23:24:56 -07:00
Wyatt Hoodes bbbea9ec87 events: Rewrite system for managing realm exports.
This feature is intended to cover all of our ways of exporting a
realm, not just the initial "public export" feature, so we should name
things appropriately for that goal.

Additionally, we don't want to include data exports in page_params;
the original implementation was actually buggy and would have.
2019-07-26 16:38:52 -07:00
Wyatt Hoodes d070f27359 queue_processors: Change the extra_data field to a relative url path.
A better approach as compared to saving the full public url.
2019-07-26 15:50:02 -07:00
Wyatt Hoodes 5686821150 middleware: Change write_log_line to publish as a dict.
We were seeing errors when pubishing typical events in the form of
`Dict[str, Any]` as the expected type to be a `Union`.  So we instead
change the only non-dictionary call, to pass a dict instead of `str`.
2019-07-22 17:06:41 -07:00
Wyatt Hoodes db69cdbcde public_export: Add support for deleting export after access.
The RealmAuditLog object ID was stored in the event sent to the
deferred_work queue as a means to update the row's extra_data field.
The extra_data field then stores the location of the export.
2019-05-31 22:54:27 -07:00
Wyatt Hoodes c0ef6c2fc6 export: Add LOCAL_UPLOADS_DIR support to the export feature.
A unique path was created using the `LOCAL_UPLOADS_DIR` backend, similar
to the code used in `LocalUploadBackend`.  The exported tarball was
copied to the directory, and an nginx url was created to serve the file
publicly.

Tweaked by tabbott to output an actual URL.
2019-05-27 20:06:35 -07:00
Wyatt Hoodes 4dd8c133a9 export: Rename `--upload-to-s3` to be `--upload`.
The upload option will no longer be limited to strictly S3 uploads. This
commit serves as a preliminary step for supporting LOCAL_UPLOADS_DIR as
part of the public only export feature.
2019-05-20 19:59:57 -07:00
Wyatt Hoodes d4715f23d7 public_export: Add backend API endpoint for triggering export.
An endpoint was created in zerver/views.  Basic rate-limiting was
implemented using RealmAuditLog.  The idea here is to simply log each
export event as a realm_exported event.  The number of events
occurring in the time delta is checked to ensure that the weekly
limit is not exceeded.

The event is published to the 'deferred_work' queue processor to
prevent the export process from being killed after 60s.

Upon completion of the export the realm admin(s) are notified.
2019-04-26 17:24:29 -07:00
Anders Kaseorg 643bd18b9f lint: Fix code that evaded our lint checks for string % non-tuple.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2019-04-23 15:21:37 -07:00
Mateusz Mandera 1901775383 email_mirror: Add realm-based rate limiting.
Closes #2420

We add rate limiting (max X emails withing Y seconds per realm) to the
email mirror. By creating RateLimitedRealmMirror class, inheriting from
RateLimitedObject, and rate_limit_mirror_by_realm function, following a
mechanism used by rate_limit_user, we're able to have this
implementation mostly rely on the already existing, and proven over
time, rate_limiter.py code. The rules are configurable in settings.py in
RATE_LIMITING_MIRROR_REALM_RULES, analogically to RATE_LIMITING_RULES.

Rate limit verification happens in the MirrorWorker in
queue_processors.py. We don't rate limit missed message emails, as due
to using one time addresses, they're not a spam threat.

test_mirror_worker is adapted to the altered MirrorWorker code and a new
test - test_mirror_worker_rate_limiting is added in test_queue_worker.py
to provide coverage for these changes.
2019-03-18 11:16:58 -07:00
Tim Abbott 50dc317466 notifications: Rename notifications.py to email_notifications.py.
This library is entirely about email notifications specifically, and
this rename should help make the codebase more readable.
2019-03-15 11:02:17 -07:00
Greg Price 9869153ae8 push notif: Send a batch of message IDs in one `remove` payload.
When a bunch of messages with active notifications are all read at
once -- e.g. by the user choosing to mark all messages, or all in a
stream, as read, or just scrolling quickly through a PM conversation
-- there can be a large batch of this information to convey.  Doing it
in a single GCM/FCM message is better for server congestion, and for
the device's battery.

The corresponding client-side logic is in zulip/zulip-mobile#3343 .

Existing clients today only understand one message ID at a time; so
accommodate them by sending individual GCM/FCM messages up to an
arbitrary threshold, with the rest only as a batch.

Also add an explicit test for this logic.  The existing tests
that happen to cause this function to run don't exercise the
last condition, so without a new test `--coverage` complains.
2019-02-26 16:41:54 -08:00
Anders Kaseorg f0ecb93515 zerver core: Remove unused imports.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
2019-02-02 17:41:24 -08:00
Pragati Agrawal e1772b3b8f tools: Upgrade Pycodestyle and fix new linter errors.
Here, we are upgrading pycodestyle version from 2.4.0 to 2.5.0.

Fixes: #11396.
2019-01-31 12:21:41 -08:00
Raymond Akornor 254bf4c08f send_email: Add support for passing language into send_future_email.
This adds language paramater to send_future_email. As a result, this
properly internationalizes invitation reminder emails, by passing
correct language into send_future_email.

Fixes #11240.
2019-01-09 17:47:58 -08:00
Tim Abbott 02a79b677b send_email: Extract handle_email_format_changes and use.
Apparently, we have a second code path where we might try to call
send_email library functions on old data, namely in the
queue_processors codebase.  So we apply the same migration logic here.
2018-12-04 16:08:18 -08:00
Raymond Akornor 92dc3637df send_email: Add support for multiple recipients.
This adds a function that sends provided email to all administrators
of a realm, but in a single email. As a result, send_email now takes
arguments to_user_ids and to_emails instead of to_user_id and
to_email.

We adjust other APIs to match, but note that send_future_email does
not yet support the multiple recipients model for good reasons.

Tweaked by tabbott to modify `manage.py deliver_email` to handle
backwards-compatibily for any ScheduledEmail objects already in the
database.

Fixes #10896.
2018-12-03 15:12:11 -08:00
Tim Abbott adf27aae4c python: Remove now-unnecessary str_utils library.
This library was absolutely essential as part of our Python 2->3
migration process, but all of its calls should be either no-ops or
encode/decode operations.

Note also that the library has been wrong since the incorrect
refactoring in 1f9244e060.

Fixes #10807.
2018-11-27 11:57:54 -08:00
Tim Abbott e06668c7e8 queue_processors: Fix misleading copied comment.
This comment was clearly copied from the previous processor.
2018-11-27 11:44:09 -08:00
Tim Abbott 38a6003472 push notifications: Improve logging for missing configuration.
While it could make sense to print these logging statements at WARN
level on server startup, it doesn't make sense to do so on every
message (though it perhaps did make sense to do so before more recent
changes added good ways to discover you forgot to configure push
notifications).

Instead, we now just do a WARN log on queue processor startup, and
then at DEBUG level for individual messages.

Fixes #10894.
2018-11-27 09:37:57 -08:00
Tim Abbott 48810f43be queue_processors: Remove unnecessary spammy logging output.
This logging statement was incorrectly not removed before merging
5cec566cb9.
2018-10-31 16:31:35 -07:00
Tim Abbott 5cec566cb9 queue_processors: Rewrite MissedMessageWorker to always wait.
Previously, MissedMessageWorker used a batching strategy of just
grabbing all the events from the last 2 minutes, and then sending them
off as emails.  This suffered from the problem that you had a random
time, between 0s and 120s, to edit your message before it would be
sent out via an email.

Additionally, this made the queue had to monitor, because it was
expected to pile up large numbers of events, even if everything was
fine.

We fix this by batching together the events using a timer; the queue
processor itself just tracks the items, and then a timer-handler
process takes care of ensuring that the emails get sent at least 120s
(and at most 130s) after the first triggering message was sent in Zulip.

This introduces a new unpleasant bug, namely that when we restart a
Zulip server, we can now lose some missed_message email events;
further work is required on this point.

Fixes #6839.
2018-10-24 14:43:36 -07:00
Tim Abbott 9ed3fe3596 events: Improve logging for batched missed-message email handler. 2018-10-24 11:21:51 -07:00
Steve Howell 69ee84bb14 refactor: Extract build_bot_request().
This fixes a couple things:

    * process_event() is a pretty vague name
    * returning tuples should generally be avoided
    * we were producing the same REST parameters in both
      subclasses
    * relative_url_path was always blank
    * request_kwargs was always empty

Now process_event() is called build_bot_request(),
and it only returns request data,
not a tuple of `rest_operation` and `request_data`.

By no longer returning `rest_operation`, there are
fewer moving parts.  We just have `do_rest_call` make
a POST call.
2018-10-11 16:12:07 -07:00
Steve Howell 16eff75e49 refactor: Simplify how we use base_url.
Before this change, we instantiated base_url into a superclass
of subclasses that returned base_url into a dictionary that
gets returned to our caller.

Now we just pull base_url out of service when we need to make
the REST call.
2018-10-11 16:12:07 -07:00
Tim Abbott 165078b484 queue_processors: Fix bug in handling removed push notifications.
Apparently, we were falling through to the "add" case after correctly
processing the "remove" case, throwing a 500.
2018-09-20 17:36:54 -07:00
Tim Abbott da8f4bc0e9 push notifications: Add support for removing GCM push notifications.
This uses the recently introduced active_mobile_push_notification
flag; messages that have had a mobile push notification sent will have
a removal push notification sent as soon as they are marked as read.

Note that this feature is behind a setting,
SEND_REMOVE_PUSH_NOTIFICATIONS, since the notification format is not
supported by the mobile apps yet, and we want to give a grace period
before we start sending notifications that appear as (null) to
clients.  But the tracking logic to maintain the set of message IDs
with an active push notification runs unconditionally.

This is designed with at-least-once semantics; so mobile clients need
to handle the possibility that they receive duplicat requests to
remove a push notification.

We reuse the existing missedmessage_mobile_notifications queue
processor for the work, to avoid materially impacting the latency of
marking messages as read.

Fixes #7459, though we'll need to open a follow-up issue for
using these data on iOS.
2018-08-10 13:58:39 -07:00
Rhea Parekh cf60b8821d outgoing webhooks: Warn user that PMs are not supported in Slack-format webhook.
Private messages are not supported in Slack-format webhook.
Instead of raising a NotImplementedError, we warn the user
that PM service is not supported by sending a message to the
user.

Added tests for the same.

Fixes #9239
2018-08-09 17:44:26 -07:00
Tim Abbott c775be8ea4 do_mark_stream_messages_as_read: Accept a Client object.
We also fix an incorrect Optional in the type annotations.
2018-08-01 16:49:57 -07:00
Tim Abbott 7ea5987e5d errors: Use a setting to control the stream for slow-query logs.
We already had a setting for whether these logs were enabled; now it
also controls which stream the messages go to.

As part of this migration, we disable the feature in dev/production by
default; it's not useful for most environments.

Fixes the proximal data-export issue reported in #10078 (namely, a
stream with nobody ever subscribed to having been created).
2018-07-30 17:40:20 -07:00
Vishnu Ks e34fcf982f registration: Use tokenized noreply address in user invite. 2018-06-23 12:03:30 -07:00
Sampriti Panda 3f4200db3c tests: Disable slow query messages in test environment.
Slow queries during backend tests sends messages to Error Bot
which affects the database state causing the tests to fail.
This fixes the occasional flakes due to that.
2018-05-20 10:16:53 -07:00
Vishnu Ks 47b0a0d843 queue: Remove unused enqueue_welcome_emails import. 2018-05-17 07:45:37 -07:00
neiljp (Neil Pilgrim) e322c2161c mypy: Remove need for cast by using ConcreteQueueWorker TypeVar. 2018-03-11 15:34:11 -07:00
neiljp (Neil Pilgrim) cb8d574648 mypy: Replace Any by Type[QueueProcessingWorker] in queue_processors.py. 2018-03-11 15:34:11 -07:00
neiljp (Neil Pilgrim) fed51666d6 mypy: Migrate queue_processors.py to python3 syntax.
Forward-declarations of QueueProcessingWorker allow code order to remain
unchanged.

Note added re use of Dict vs Mapping.
2018-03-11 15:34:11 -07:00
neiljp (Neil Pilgrim) 6fc4d5bf40 mypy: Correct annotations in queue_processors.py. 2018-03-10 10:04:14 -08:00
Robert Hönig 81ba7a1e40 Mark DigestWorker and PushNotificationsWorker as nocoverage.
These two classes are tricky to test, and nocoverage-ing them
allows us to mark queue_processors.py as fully covered. We
still want to cover these two workers at some point, but for
now, it's nice to enforce full coverage for any future changes
to queue_processors.py.

Fixes (sort of) #6542.
2018-03-04 13:31:33 -08:00
Robert Hönig 93b2f0ea35 Mark MissedMessageSendingWorker as nocoverage.
The MissedMessageSendingWorker queue will be removed
after the 1.8 release, so covering it in tests is
useless.
2018-03-04 13:31:33 -08:00
Robert Hönig 4381ce74aa EmbeddedBotWorker: Remove @-mention check.
We already check in get_service_bot_events() if a bot is mentioned,
and then only pass on the call to the bot handler if it is. The
commit removes the additional check in the embedded bot queue
processor simply because it is impossible to obtain test coverage
for it (there is no meaningful way to trigger the content of the
if-clause, because there will never be a message reaching the bot
without @-mentioning it.

To alleviate the danger of a potential regression, the check is not
removed completely, but rather replaced by an assert statement.
2018-03-04 13:31:33 -08:00
Robert Hönig 9bee64de93 Exclude QueueProcessingWorker.stop() from coverage test.
This function is just a one-line wrapper, so adding
coverage isn't important.
2018-02-28 12:31:38 -08:00
Greg Price 4475950ddf queue: Restore prematurely-cut upgrade path.
Revert c8f034e9a "queue: Remove missedmessage_email_senders code."
As the comment in the code says, it ensures a smooth upgrade path
from 1.7.x; we can delete it in master after 1.8.0 is released.
The removal commit was merged early due to a communication failure.
2018-02-28 11:15:53 -08:00
Umair Khan c8f034e9a0 queue: Remove missedmessage_email_senders code.
After 68513952fb, all emails are sent through email_senders queue.
This commit removes code related to the legacy queue.
2018-02-21 16:43:56 -08:00
Robert Hönig a19a69bfe3 embedded bots: Log warning when bot quit()s.
External bots may call bot_handler.quit() when
they wish to terminate, e.g. due to a misconfiguration.
Currently, embedded bots ignore calls to quit(), even
though they signal a problem. This commit does the first
step in handling quit() calls by logging a warning.
2018-02-13 14:56:37 -08:00
rht 32f1523de2 zerver/worker: Remove u prefix from strings. 2018-02-05 12:12:58 -08:00
Tim Abbott c2ceb3c13b EmailSendingWorker: Fix retry for sending emails.
If an exception was thrown inside `send_email` resulting in a retry,
we would include the `failed_tries` data in the event, which turned
out to thrown an exception itself.

This fixes that flow, including deepening the test so that it would
fail if we didn't have the new logic.
2018-01-30 11:28:09 -08:00
XavierCooney 35dc203d58 queue_processors.py: Remove unecessary imports. 2018-01-16 08:16:43 -05:00
Robert Hönig 83d0c7be31 embedded bots: Run bot.initialize() if bot has this function. 2018-01-07 20:05:52 +01:00
derAnfaenger 94c8e8b8e7 embedded bots: Strip @-mention from message.
This is in order to comply with the most recent
code in the `zulip_bots` package.
2017-12-26 08:50:00 -05:00
Umair Khan 68513952fb email-worker: Create EmailSendingWorker.
This commit just copies all the code from MissedMessageSendingWorker
class to a new EmailSendingWorker class. All the logic to send an email
through a queue was already there. This commit only makes the logic
generic. It does so by creating a special purpose queue called
'email_senders' to send any type of email. To make
MissedMessageSendingWorker still work we derive it from
EmailSendingWorker. All the tests that were testing
MissedMessageSendingWorker now run against EmailSendingWorker.
2017-12-20 19:36:27 -08:00
Vishnu Ks 16d8244c0a tests: Eliminate 'Sending invitation' output spam in test_signup.
Fixes #7563
2017-12-20 18:50:01 -08:00
Rishi Gupta 869b4d41ef models: Add ScheduledEmail.realm.
The two extra queries in the test are due to the assert in
send_future_email.
2017-12-19 17:46:36 -08:00
Shreyansh Dwivedi 47fcb27e39 invitations: Remove custom_body.
Fixes #7672
2017-12-11 19:23:54 -08:00
Rishi Gupta 968aae167b invitations: Remove get_prereg_user_by_email.
The original logic is buggy now that emails can belong to (and be
invited to) multiple realms.

The new logic in the `invites` queue worker also avoids the bug where
when the PreregistrationUser was gone by the time the queue worker got
to the invite (e.g., because it'd been revoked), we threw an exception.

[greg: fix upgrade-compatibility logic; add test; explain
revoked-invite race above]
2017-12-06 20:35:50 -08:00
Tim Abbott b2cb443d24 subs: Fix clearing unread counts when leaving private streams.
Because we use access_stream_by_id here, and that checks for an active
subscription to interact with a private stream, this didn't work.

The correct fix to add an option to active_stream_by_id to accept an
argument indicating whether we need an active subscription; for this
use case, we definitely do not.
2017-11-29 14:40:08 -08:00
Eeshan Garg c45517f544 python-zulip-api: Upgrade to PyPI package release 0.3.8.
There's one migration required by this release:

* queue_processors: Stop passing state_handler to handle_message.

  state_handler is now a property of bot_handler and thus, does
  not need to be passed to bot_handler.handle_message().

  The commit responsible is:
  2a74ad11c5
2017-11-27 20:31:37 -08:00
Vishnu Ks 766511e519 actions: Mark all messages as read when user unsubscribes from stream.
This fixes a bug where, when a user is unsubscribed from a stream,
they might have unread messages on that stream leak.  While it might
seem to be a minor problem, it can cause significant problems for
computing the `unread_msgs` data structures, since it means we need to
add an extra filter for whether the user is still subscribed, either
in the backend or in the UI.

Fixes #7095.
2017-11-21 20:09:17 -08:00
Tim Abbott 638eb7a8e4 docs: Update links to ReadTheDocs to always use https.
This is better security practice.

We also add a lint rule to enforce this for the future.
2017-11-16 10:59:24 -08:00
Tim Abbott 054952a44a docs: Update links from codebase to point to ReadTheDocs. 2017-11-16 10:53:49 -08:00
Steve Howell dbcc76b996 Fix MissedMessageWorker.
This fixes a regression in 25c669df52.

We were draining the queue in both the superclass and the subclass,
so by the time the subclass started processing events, there were
no events to process.  Now the subclass properly uses the events
passed in from the superclass.
2017-11-15 10:26:28 -08:00
derAnfaenger d3bee4fcc5 queue_processors: Exclude various lines from coverage.
The os.mkdir call is straightforward and doesn't testing.
Workers relying on LoopQueueProcessingWorker are tested
with its consume method that exists solely for this purpose.
2017-11-10 13:23:53 -08:00
Tim Abbott 25c669df52 queue_processors: Extract LoopQueueProcessingWorker class.
This should help make it easier to add tests coverage for these queue
processors, since they now use a system more similar to the other
queue processors.
2017-11-09 15:20:40 -08:00
rht de319b4558 refactor: Remove six.moves.StringIO import. 2017-11-07 10:51:44 -08:00
derAnfaenger c8e87a322d queue processors: Exclude TestWorker from coverage. 2017-11-07 09:38:58 -08:00
rht 8990b1046d zerver: Remove inheritance from object. 2017-11-06 08:53:48 -08:00
Tim Abbott eade4d0052 tornado: Use call_consume_in_tests for message_sender queue.
This increases test coverage of queue_processors.py significantly,
with no real cost, achieving valuable progress torwards #6542.
2017-11-03 14:09:48 -07:00
rht c4fcff7178 refactor: Replace super(.*self) with Python 3-specific super().
We change all the instances except for the `test_helpers.py`
TimeTrackingCursor monkey-patching, which actually needs to specify
the base class.
2017-10-30 14:30:25 -07:00
Tim Abbott b936e8c24b lint: Fix lines in Python codebase longer than 125 characters. 2017-10-26 17:36:54 -07:00
derAnfaenger 1792dcbd09 tests: Call real consume method of queue processors.
This switches to more real tests for a first batch of
queue_json_publish() calls that don't cause trouble when
used with call_consume_tests=True.
2017-10-26 14:58:03 -07:00
Tim Abbott ad165a6f8f settings: Remove remaining DEPLOYMENT_ROLE_* code remnants.
These should have been removed when we removed Zilencer.
2017-10-23 21:15:03 -07:00
derAnfaenger c8a2702b9a embedded bots: Add functional state handler.
This replaces the former non-functional StateHandler
stub with a dictionary-like state object. Accessing it will
will read and store strings in the BotUserStateData model.

Each bot has a limited state size. To enforce this limit while
keeping data updates efficient, StateHandler caches the expensive
query for getting a bot's total state size. Assignments to a key
then only need to fetch that entry's previous size, if any, and
compare it to the new entry's size.
2017-10-19 13:09:23 -07:00
Tim Abbott 6176d0fbca json: Replace most use of simplejson with json.
This is progress towards removing simplejson as a dependency.
2017-10-11 22:55:35 -07:00
Umair Khan 435fe40199 worker: Retry signups queue event on 400. 2017-10-05 23:14:19 -07:00
Umair Khan 19e2551e82 mypy: Change type to Dict for SignupWorker. 2017-10-05 23:14:19 -07:00
Umair Khan d95d34a66a Retry email failures in missed-message emails queue.
Fixes #6518.
2017-10-03 10:35:07 -07:00
Umair Khan 7d6ddaad26 mypy: Use Dict instead of Mapping in queues. 2017-10-03 10:35:07 -07:00
Greg Price 2a832386dc queue: Move more received-event logs to DEBUG level.
These can spew a lot of noise when starting the dev server, and aren't
in a super helpful form for normal server administration anyway.
2017-09-28 18:32:42 -07:00
Tim Abbott 630ecf9669 queue: Move user presence log events to DEBUG log level.
These are pretty noisy, and generally of little interest, since
presence requests come with a server log entry anyway.
2017-09-28 14:35:55 -07:00
rht 2ae47525d3 zerver/worker: Remove `import six`. 2017-09-27 17:06:16 -07:00
rht 41d8db373a zerver/worker: Remove absolute_import. 2017-09-27 10:00:39 -07:00
Tim Abbott 84926ff321 slow_queries: Log slow queries even if not reported via Zulip.
This should help make these logs more useful for debugging on an
arbitary Zulip system.
2017-09-22 14:11:07 -07:00
Tim Abbott ec61a070b4 signups: Add basic logging for new signup processing. 2017-09-22 14:11:07 -07:00
Tim Abbott f78dd33037 invites: Add logging for newly sent invitation emails. 2017-09-22 14:11:07 -07:00
Rishi Gupta a7c8770f97 emails: Move enqueue_welcome_emails outside of signups queue.
The only thing this queue should do is sign you up for the newsletter, since
it is only populated if newsletter_data is not None.
2017-09-22 06:20:33 -07:00
Tim Abbott f706f657c0 signup: Fix invitation emails not being cleared properly.
Previously, invitation reminder emails were only being cleared after a
successful signup if newsletter_data was available, since that was the
circumstance in which we were calling the relevant queue processor
code.  Now, we (1) clear them when a human user finishes signing up
and (2) correctly clear them using the 'address' field of
ScheduleEmail, not user_id.
2017-09-21 06:15:11 -07:00
Steve Howell ba397b5109 Use user_ids, not full objects, in render path.
There is no reason for either render_incoming_message() or
render_markdown() to require full UserProfile objects just to
triage alert words.

By only asking for user_ids, we save extra queries in two
callpaths and we make it easier to start using user_ids in
do_send_messages().
2017-09-12 04:22:55 -07:00
neiljp (Neil Pilgrim) 116b8dd9c9 mypy: Set assign_queue() parameter queue_type to not be Optional. 2017-08-07 21:27:50 -07:00
Abhijeet Kaur 1deb58b178 bots: Add complete test-coverage for bot_lib.py file.
Also, add error handling for get_bot_handler instead of
throwing an assertion error.
2017-07-27 15:50:29 -07:00
Abhijeet Kaur 6f60c65a65 embedded bots: Add tests for verification of embedded bot services.
Add test to check if the embedded bot service being used is in the
registry or not.
Add test to check if the bot being added to the registry has a valid
bot corresponding to it.
Move 'get_bot_handler' to 'zerver/lib/bot_lib.py' as it is an independent
function, not related to the 'EmbeddedBotWorker' class that it was
previously a part of.
2017-07-24 17:14:14 -07:00
Elliott Jin 6a61a8a431 outgoing webhooks: Consolidate interfaces into lib/outgoing_webhook.py 2017-07-24 14:10:14 -07:00
vaibhav 87dcd5442a outgoing webhook system: Minor fixes.
This fixes some error message strings and skips converting request_data
into json. From now, conversion would be the responsibility of interface.
Also, base_url is now not passed into event structure.
2017-07-24 14:10:14 -07:00
Abhijeet Kaur 5980d420a8 Embedded bots: Fix minor errors to make embedded bots/service run.
Splitting bot_lib.py file into 2 files led to unnecessary
redirection of the code workflow. For an embedded bot/service to
send a reply, it was being redirected 3 times.

First, the code flow comes to "EmbeddedBotHandler" class to send
reply, then it goes to the common function in "zulip_bots/lib.py",
then it would come back to "EmbeddedBotHandler". Later on, if we
create an abstract class, from where the bot work flow would
directly hit and then from there it is classified into
EmbeddedBotHandler or ExternalBotHandler and accordingly it would
get redirected.

Now, first the bot flow goes to it's handler class External or
Embedded (where we pass that this is External or Embedded bot as
parameter) and then goes to a common point and then comes back to
the same class.
2017-07-20 10:22:52 -07:00
Eeshan Garg 148bb4db09 requirements: Update requirements/ to install bots/API packages.
This is required, since we just reorganized the python-zulip-api
repository into 3 packages.

A nice side effect is that we get to eliminate some now-unnecessary
code for editing sys.path.
2017-07-18 00:10:30 -07:00
Rishi Gupta 0f4b71b766 confirmation: Liberate get_link_for_object from ConfirmationManager. 2017-07-17 23:18:47 -07:00
Rishi Gupta 36dbb76516 emails: Rename clear_followup_emails_queue. 2017-07-17 16:05:38 -07:00
Rishi Gupta 5b3e6af2e5 emails: Remove only emails of the correct type when clearing queue. 2017-07-17 16:05:38 -07:00
Rishi Gupta f51bd898dc notifications: Change clear_followup_emails_queue to take a user_id. 2017-07-17 16:05:38 -07:00
Rishi Gupta 898269bbac email: Change send_email to raise exception on failure.
More in line with how we do error handling in the rest of Zulip.
2017-07-16 16:56:39 -07:00
Rishi Gupta eacdb0b302 emails: Change welcome emails to use to_user_id. 2017-07-16 16:56:39 -07:00
Rishi Gupta b0d325b8c5 emails: Change send_future_email to accept a to_user_id.
Also changes digest emails to use a to_user_id instead of a to_email.
2017-07-16 16:56:39 -07:00
Vishnu Ks ba4ea7dd8a notifications.py: Replace get_user_profile_by_email.
Replace with get_user.
2017-07-13 00:45:24 +05:30
Aditya Bansal d46cf59b0d pep8: Add compliance with rule E261 to worker/queue_processors.py. 2017-07-11 11:55:02 -07:00
Rishi Gupta 8fed9eeb75 confirmation: Make host a required argument in get_link_for_object.
Removes some lines of test from test_email_change.py. The relevant code path
was never utilized by the code itself, just by the tests.
2017-07-07 18:53:00 -07:00
Umair Khan ee6e07d13f message_sender: Use response_data variable.
response_data already contains the result of ujson.loads.
2017-07-07 16:33:36 -07:00
Umair Khan 0e8231d0f1 process_queue: Recover gracefully after PostgreSQL restart.
- For threaded workers:
Django's autoreloader catches SIGQUIT(3) to reload the program. If
a process being watched by autoreloader exits with status code 3,
reloader will restart the process. To reload, we send SIGUSR1(10)
signal from consumers to a handler in process_queue which then
exits with status code 3.

- For single worker per process:
Catch the SIGUSR1 and quit; supervisorctl will restart the worker
automatically.

Fixes #5512
2017-07-07 16:33:15 -07:00
James Rowan 0951666cbb emails: Confirmation emails should come from the NOREPLY address.
This prevents users from accidentally sending a confirmation link
specific to their account to their Zulip administrator if they reply
to the invitation, invitation reminder, account confirmation, or new
email confirmation emails.
2017-07-05 15:18:33 -07:00
James Rowan d88e7308bf emails: Add a FromAddress class to control access to certain settings emails.
No change in behavior.

Also makes the first step towards converting all uses of
settings.ZULIP_ADMINISTRATOR and settings.NOREPLY_EMAIL_ADDRESS to
FromAddress.*.

Once everything is converted, it will be easier to ensure that future
development doesn't break backwards compatibility with the old style of
settings emails.
2017-07-04 14:25:01 -07:00
James Rowan 368bd66d8b emails: Refactor send_email functions to take both a sender name and address.
This will allow for customized senders for emails, e.g. 'Zulip Digest' for
digest emails and 'Zulip Missed Messages' for missed message emails.

Also:
* Converts the sender name to always be "Zulip", if the from_email used to
  be settings.NOREPLY_EMAIL_ADDRESS or settings.ZULIP_ADMINISTRATOR.

* Changes the default value of settings.NOREPLY_EMAIL_ADDRESS in the
  prod_setting_template to no longer have a display name. The only use of
  that display name was in the email pathway.
2017-07-04 14:25:01 -07:00
vaibhav 7f9b4d3a94 Outgoing Webhook System: Add usage of Interfaces in DoRestCall. 2017-06-28 11:11:21 -04:00
Abhijeet Kaur cf0040c402 bots: De-duplicate removing at-mention pattern for calling bots.
Similar code appeared at two places (the code is meant to remove the
leading @-mention before passing the remainder of the message to the
bot handler)—both 'zerver/worker/queue_processors.py' and
'api/bots_api/bot_lib.py'.

Remove function from the queue_processors.py file and let the entire message
be processed (includes removing of @-mention) at bot_lib.py.
2017-06-26 08:33:25 -04:00
Abhijeet Kaur c42d935be8 bots: Move "EmbeddedBotHandler" class to "zerver/lib/bot_lib.py" file.
This would keep embedded classes for zulip at one place, that is, in
"zerver" directory. This also fixes break in PyPI package for bindings.
2017-06-21 16:01:16 -04:00
Abhijeet Kaur a45c224fda bots: Update EmbeddedBotWorker class in queue_processors.py.
Update EmbeddedBotWorker.get_bot_api_client in
'zerver/workers/queue_processors.py' to return an EmbeddedBotHandler
object instead of ExternalBotHandler object.

This change is as a followup for splitting the BotHandler class into
two separate classes for external and embedded bots.
2017-06-21 10:22:53 -04:00
Abhijeet Kaur 68777e93a0 bots: Rename BotHandlerApi class to ExternalBotHandler.
This change is done for better understanding of the class functionality
from its class name. Now there will be 3 different classes for bots,
namely 'ExternalBotHandler', 'FlaskBotHandler' and  'EmbeddedBotHandler'.
2017-06-21 10:22:53 -04:00
Rishi Gupta 15b967fc3e emails: Move support_email into a common context. 2017-06-10 01:25:44 -07:00