Commit Graph

292 Commits

Author SHA1 Message Date
Anders Kaseorg 579f05f3ed queue_processors: Avoid unchecked casts.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-22 17:18:19 -07:00
Tim Abbott 4d7550d705 views: Extract message_edit.py for message editing views.
This is a pretty clean extraction of files that lets us shrink one of
our largest files.
2020-06-22 15:08:34 -07:00
Mateusz Mandera 8d2d64c100 CVE-2020-14215: Fix validation in PreregistrationUser queries.
The most import change here is the one in maybe_send_to_registration
codepath, as the insufficient validation there could lead to fetching
an expired PreregistrationUser that was invited as an administrator
admin even years ago, leading to this registration ending up in the
new user being a realm administrator.

Combined with the buggy migration in
0198_preregistrationuser_invited_as.py, this led to users incorrectly
joining as organizations administrators by accident.  But even without
that bug, this issue could have allowed a user who was invited as an
administrator but then had that invitation expire and then joined via
social authentication incorrectly join as an organization administrator.

The second change is in ConfirmationEmailWorker, where this wasn't a
security problem, but if the server was stopped for long enough, with
some invites to send out email for in the queue, then after starting it
up again, the queue worker would send out emails for invites that
had already expired.
2020-06-16 23:35:39 -07:00
Anders Kaseorg 5dc9b55c43 python: Manually convert more percent-formatting to f-strings.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-14 23:27:22 -07:00
Anders Kaseorg 1ed2d9b4a0 logging: Use logging.exception and exc_info for unexpected exceptions.
logging.exception() and logging.debug(exc_info=True),
etc. automatically include a traceback.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-14 23:27:22 -07:00
Anders Kaseorg a803e68528 email-mirror-postfix: Handle 8-bit messages correctly.
Since JSON can’t represent bytes, we encode them with base64.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-14 20:24:06 -07:00
Anders Kaseorg bff3dcadc8 email: Migrate to new Python ≥ 3.3 email API.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-14 20:24:06 -07:00
Anders Kaseorg 365fe0b3d5 python: Sort imports with isort.
Fixes #2665.

Regenerated by tabbott with `lint --fix` after a rebase and change in
parameters.

Note from tabbott: In a few cases, this converts technical debt in the
form of unsorted imports into different technical debt in the form of
our largest files having very long, ugly import sequences at the
start.  I expect this change will increase pressure for us to split
those files, which isn't a bad thing.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-11 16:45:32 -07:00
Anders Kaseorg 69730a78cc python: Use trailing commas consistently.
Automatically generated by the following script, based on the output
of lint with flake8-comma:

import re
import sys

last_filename = None
last_row = None
lines = []

for msg in sys.stdin:
    m = re.match(
        r"\x1b\[35mflake8    \|\x1b\[0m \x1b\[1;31m(.+):(\d+):(\d+): (\w+)", msg
    )
    if m:
        filename, row_str, col_str, err = m.groups()
        row, col = int(row_str), int(col_str)

        if filename == last_filename:
            assert last_row != row
        else:
            if last_filename is not None:
                with open(last_filename, "w") as f:
                    f.writelines(lines)

            with open(filename) as f:
                lines = f.readlines()
            last_filename = filename
        last_row = row

        line = lines[row - 1]
        if err in ["C812", "C815"]:
            lines[row - 1] = line[: col - 1] + "," + line[col - 1 :]
        elif err in ["C819"]:
            assert line[col - 2] == ","
            lines[row - 1] = line[: col - 2] + line[col - 1 :].lstrip(" ")

if last_filename is not None:
    with open(last_filename, "w") as f:
        f.writelines(lines)

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-06-11 16:04:12 -07:00
Graham Bleaney 461d5b1a3e pysa: Introduce sanitizers, models, and inline marking safe.
This commit adds three `.pysa` model files: `false_positives.pysa`
for ruling out false positive flows with `Sanitize` annotations,
`req_lib.pysa` for educating pysa about Zulip's `REQ()` pattern for
extracting user input, and `redirects.pysa` for capturing the risk
of open redirects within Zulip code. Additionally, this commit
introduces `mark_sanitized`, an identity function which can be used
to selectively clear taint in cases where `Sanitize` models will not
work. This commit also puts `mark_sanitized` to work removing known
false postive flows.
2020-06-11 12:57:49 -07:00
Anders Kaseorg 67e7a3631d python: Convert percent formatting to Python 3.6 f-strings.
Generated by pyupgrade --py36-plus.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-10 15:02:09 -07:00
Anders Kaseorg 8dd83228e7 python: Convert "".format to Python 3.6 f-strings.
Generated by pyupgrade --py36-plus --keep-percent-format, but with the
NamedTuple changes reverted (see commit
ba7906a3c6, #15132).

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-08 15:31:20 -07:00
Anders Kaseorg 19cc22e5ab queue: Fix types to reflect that Pika channels transmit bytes.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-07 11:09:24 -07:00
Mateusz Mandera 200ce821a2 user_activity: Put client id instead of name in event dicts.
This saves the completely unnecessary work of mapping the Client name
to its ID.  Because we had in-process caching of the immutable Client
objects, this isn't a material performance win, but it will eventually
let us delete that caching logic and have a simpler system.
2020-05-29 15:19:55 -07:00
Mateusz Mandera e2262b0b64 queue_processors: Log time spent getting data for url in embed_links. 2020-05-21 12:13:46 -07:00
Mateusz Mandera dd40649e04 queue_processors: Remove the slow_queries queue.
While this functionality to post slow queries to a Zulip stream was
very useful in the early days of Zulip, when there were only a few
hundred accounts, it's long since been useless since (1) the total
request volume on larger Zulip servers run by Zulip developers, and
(2) other server operators don't want real-time notifications of slow
backend queries.  The right structure for this is just a log file.

We get rid of the queue and replace it with a "zulip.slow_queries"
logger, which will still log to /var/log/zulip/slow_queries.log for
ease of access to this information and propagate to the other logging
handlers.  Reducing the amount of queues is good for lowering zulip's
memory footprint and restart performance, since we run at least one
dedicated queue worker process for each one in most configurations.
2020-05-11 00:45:13 -07:00
Anders Kaseorg bdc365d0fe logging: Pass format arguments to logging.
https://docs.python.org/3/howto/logging.html#optimization

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-05-02 10:18:02 -07:00
Wyatt Hoodes 82e7ad8e25 data exports: Handle pending and failed exports.
Prior to this change, there were reports of 500s in
production due to `export.extra_data` being a
Nonetype.  This was reproducible using the s3
backend in development when a row was created in
the `RealmAuditLog` table, but the export failed in
the `DeferredWorker`.  This left an entry lying
about that was never updated with an `extra_data`
field.

To fix this, we catch any exceptions in the
`DeferredWorker`, and then update `extra_data` to
encode the failure.  We also fix the fact that we
never updated the export UI table with pending exports.

These changes also negated the use for the somewhat
hacky `clear_success_banner` logic.
2020-04-30 13:00:59 -07:00
Anders Kaseorg fead14951c python: Convert assignment type annotations to Python 3.6 style.
This commit was split by tabbott; this piece covers the vast majority
of files in Zulip, but excludes scripts/, tools/, and puppet/ to help
ensure we at least show the right error messages for Xenial systems.

We can likely further refine the remaining pieces with some testing.

Generated by com2ann, with whitespace fixes and various manual fixes
for runtime issues:

-    invoiced_through: Optional[LicenseLedger] = models.ForeignKey(
+    invoiced_through: Optional["LicenseLedger"] = models.ForeignKey(

-_apns_client: Optional[APNsClient] = None
+_apns_client: Optional["APNsClient"] = None

-    notifications_stream: Optional[Stream] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE)
-    signup_notifications_stream: Optional[Stream] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE)
+    notifications_stream: Optional["Stream"] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE)
+    signup_notifications_stream: Optional["Stream"] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE)

-    author: Optional[UserProfile] = models.ForeignKey('UserProfile', blank=True, null=True, on_delete=CASCADE)
+    author: Optional["UserProfile"] = models.ForeignKey('UserProfile', blank=True, null=True, on_delete=CASCADE)

-    bot_owner: Optional[UserProfile] = models.ForeignKey('self', null=True, on_delete=models.SET_NULL)
+    bot_owner: Optional["UserProfile"] = models.ForeignKey('self', null=True, on_delete=models.SET_NULL)

-    default_sending_stream: Optional[Stream] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE)
-    default_events_register_stream: Optional[Stream] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE)
+    default_sending_stream: Optional["Stream"] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE)
+    default_events_register_stream: Optional["Stream"] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE)

-descriptors_by_handler_id: Dict[int, ClientDescriptor] = {}
+descriptors_by_handler_id: Dict[int, "ClientDescriptor"] = {}

-worker_classes: Dict[str, Type[QueueProcessingWorker]] = {}
-queues: Dict[str, Dict[str, Type[QueueProcessingWorker]]] = {}
+worker_classes: Dict[str, Type["QueueProcessingWorker"]] = {}
+queues: Dict[str, Dict[str, Type["QueueProcessingWorker"]]] = {}

-AUTH_LDAP_REVERSE_EMAIL_SEARCH: Optional[LDAPSearch] = None
+AUTH_LDAP_REVERSE_EMAIL_SEARCH: Optional["LDAPSearch"] = None

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-04-22 11:02:32 -07:00
Mateusz Mandera fe8f57b8b7 queue_processors: Write a newline char at the end of stats files. 2020-04-10 13:48:16 -07:00
Mateusz Mandera 5252b081bd queue_processors: Gather statistics on queue worker operations. 2020-04-01 16:44:06 -07:00
arpit551 8f7733cb20 emails: Added placeholders strings in FormAddress.
We've had a bug for a while that if any ScheduledEmail objects get
created with the wrong email sender address, even after the sysadmin
corrects the problem, they'll still get errors because of the objects
stored with the wrong format.

We solve this by using FromAddress placeholders strings in
send_future_email function, so that ScheduledEmail objects end up
setting the final `from_address` value when mail is actually sent
using the setting in effect at that time.

Fixes #11008.
2020-03-27 16:41:02 -07:00
Mateusz Mandera 5da2f80140 queue_processors: Extract a duplicated logic block into do_consume. 2020-03-22 18:45:46 -07:00
Tim Abbott 783a77c532 queue processors: Flush per-request caches after each item.
Several of our queues are capable of doing work that includes
rendering markdown (outgoing_webhook, embedded_bots, embed_links, and
email_mirror).  As a result, it's essential that these don't cache
per-request data (specifically, realm filters) longer than they
should, making editing/deleting linkifiers potentially use old
settings until the relevant process was restarted.

Flushing these caches is extremely cheap (just clearing two
dictionaries) and thus is reasonable to do after every queue event,
rather than trying to do it only the ~1/3 of queues that specifically
do markdown processing.  We do the same in our middleware for
reset_queries.

It's not worth writing a test for this because it's very difficult to
create the test setup situation for this bug with a single test worker
process; one needs to edit the linkifier configuration in a different
process than the one sending the message in order to see the bug.

This was a much larger visible bug on Zulip 2.1.x, where the presence
of the message_sender queue meant that this would apply to messages
sent via a browser.

Fixes #14095.
2020-03-03 15:29:11 -08:00
Steve Howell 2e8dec233e slow queries: Use internal_send_stream_message().
Note that while the test mocks the actual message
send, we now have a `get_stream` call in the queue
worker, so we have to set up a real stream for
testing (or we could have mocked that as well, but
it didn't seem necessary).  The setup queries add
to the amount of queries reported by the test,
plus the `get_stream` call.  I just made the
query count a digits regex, which is a little bit
lame, but I don't think it's worth risking test
flakes for this.
2020-02-11 12:20:54 -08:00
Mateusz Mandera 4c5a8e6f0c queue: Remove missedmessage_email_senders. 2020-01-31 12:13:51 -08:00
Tim Abbott d70e799466 bots: Remove FEEDBACK_BOT implementation.
This legacy cross-realm bot hasn't been used in several years, as far
as I know.  If we wanted to re-introduce it, I'd want to implement it
as an embedded bot using those common APIs, rather than the totally
custom hacky code used for it that involves unnecessary queue workers
and similar details.

Fixes #13533.
2020-01-25 22:41:39 -08:00
Anders Kaseorg ea6934c26d dependencies: Remove WebSockets system for sending messages.
Zulip has had a small use of WebSockets (specifically, for the code
path of sending messages, via the webapp only) since ~2013.  We
originally added this use of WebSockets in the hope that the latency
benefits of doing so would allow us to avoid implementing a markdown
local echo; they were not.  Further, HTTP/2 may have eliminated the
latency difference we hoped to exploit by using WebSockets in any
case.

While we’d originally imagined using WebSockets for other endpoints,
there was never a good justification for moving more components to the
WebSockets system.

This WebSockets code path had a lot of downsides/complexity,
including:

* The messy hack involving constructing an emulated request object to
  hook into doing Django requests.
* The `message_senders` queue processor system, which increases RAM
  needs and must be provisioned independently from the rest of the
  server).
* A duplicate check_send_receive_time Nagios test specific to
  WebSockets.
* The requirement for users to have their firewalls/NATs allow
  WebSocket connections, and a setting to disable them for networks
  where WebSockets don’t work.
* Dependencies on the SockJS family of libraries, which has at times
  been poorly maintained, and periodically throws random JavaScript
  exceptions in our production environments without a deep enough
  traceback to effectively investigate.
* A total of about 1600 lines of our code related to the feature.
* Increased load on the Tornado system, especially around a Zulip
  server restart, and especially for large installations like
  zulipchat.com, resulting in extra delay before messages can be sent
  again.

As detailed in
https://github.com/zulip/zulip/pull/12862#issuecomment-536152397, it
appears that removing WebSockets moderately increases the time it
takes for the `send_message` API query to return from the server, but
does not significantly change the time between when a message is sent
and when it is received by clients.  We don’t understand the reason
for that change (suggesting the possibility of a measurement error),
and even if it is a real change, we consider that potential small
latency regression to be acceptable.

If we later want WebSockets, we’ll likely want to just use Django
Channels.

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-01-14 22:34:00 -08:00
Mateusz Mandera 89046ea1a9 email_mirror: Give extract_and_validate a more descriptive name. 2020-01-12 11:30:18 -08:00
Mateusz Mandera c011d2c6d3 email_mirror: Migrate missed message addresses from redis to database.
Addresses point 1 of #13533.

MissedMessageEmailAddress objects get tied to the specific that was
missed by the user. A useful benefit of that is that email message sent
to that address will handle topic changes - if the message that was
missed gets its topic changed, the email response will get posted under
the new topic, while in the old model it would get posted under the
old topic, which could potentially be confusing.

Migrating redis data to this new model is a bit tricky, so the migration
code has comments explaining some of the compromises made there, and
test_migrations.py tests handling of the various possible cases that
could arise.
2020-01-07 13:03:22 -08:00
Mateusz Mandera e90866876c queue: Take advantage of ABC for defining abstract worker base classes.
QueueProcessingWorker and LoopQueueProcessingWorker are abstract classes
meant to be subclassed by a class that will define its own consume()
or consume_batch() method. ABCs are suited for that and we can tag
consume/consume_batch with the @abstractmethod wrapper which will
prevent subclasses that don't define these methods properly to be
impossible to even instantiate (as opposed to only crashing once
consume() is called). It's also nicely detected by mypy, which will
throw errors such as this on invalid use:

error: Only concrete class can be given where "Type[TestWorker]" is
expected
error: Cannot instantiate abstract class 'TestWorker' with abstract
attribute 'consume'

Due to it being detected by mypy, we can remove the test
test_worker_noconsume which just tested the old version of this -
raising an exception when the unimplemented consume() gets called. Now
it can be handled already on the linter level.
2019-12-28 10:52:17 -08:00
Mateusz Mandera a54640fc68 queue: Share exception handling code between loop and normal workers.
LoopQueueProcessingWorker can handle exceptions inside consume_batch in
a similar manner to how QueueProcessingWorker handles exceptions inside
consume.
2019-12-28 10:47:36 -08:00
Tim Abbott 1465628c95 queue workers: Use self.queue_name in retry_event calls.
This just adds a bit of robustness if we ever end up renaming queues.
2019-12-04 10:08:48 -08:00
Mateusz Mandera 7d0444f903 push_notifs: Improve handling of errors when talking to the bouncer.
We use the plumbing introduced in a previous commit, to now raise
PushNotificationBouncerRetryLaterError in send_to_push_bouncer in case
of issues with talking to the bouncer server. That's a better way of
dealing with the errors than the previous approach of returning a
"failed" boolean, which generally wasn't checked in the code anyway and
did nothing.
The PushNotificationBouncerRetryLaterError exception will be nicely
handled by queue processors to retry sending again, and due to being a
JsonableError, it will also communicate the error to API users.
2019-12-04 09:58:22 -08:00
Mateusz Mandera 20b30e1503 push_notifs: Set up plumbing for retrying in case of bouncer error.
We add PushNotificationBouncerRetryLaterError as an exception to signal
an error occurred when trying to communicate with the bouncer and it
should be retried. We use JsonableError as the base class, because this
signal will need to work in two roles:
1. When the push notification was being issued by the queue worker
PushNotificationsWorker, it will signal to the worker to requeue the
event and try again later.
2. The exception will also possibly be raised (this will be added in the
next commit) on codepaths coming from a request to an API endpoint (for
example to add a token, to users/me/apns_device_token). In that case,
it'll be needed to provide a good error to the API user - and basing
this exception on JsonableError will allow that.
2019-12-04 09:58:22 -08:00
Tim Abbott 6407d0b1f9 push_notifications: Clear PushDeviceToken on API key change.
This includes adding a new endpoint to the push notification bouncer
interface, and code to call it appropriately after resetting a user's
personal API key.

When we add support for a user having multiple API keys, we may need
to add an additional key here to support removing keys associated with
just one client.
2019-11-19 15:37:43 -08:00
Tim Abbott bb64b0fa4d queue processors: Switch SignupWorker to logging user IDs.
This is a better setup than logging emails, especially with
EMAIL_ADDRESS_VISIBILITY_ADMINS.
2019-11-15 17:07:24 -08:00
Tim Abbott d2970a56c2 lint: Remove some unused imports.
These were introduced in ae5bc92602.
2019-10-10 18:06:30 -07:00
Vishnu KS ae5bc92602 queue: Don't create confirmation objects twice during invite.
A confirmation object is already created when
do_send_confirmation_email is called just above.

Tweaked by tabbott to remove an unnecessary somewhat hacky database
query.
2019-10-10 16:19:42 -07:00
Tim Abbott 1c73ce2450 user_activity: Use LoopQueueProcessingWorker strategy.
This should dramatically improve the queue processor's performance in
cases where there's a very high volume of requests on a given endpoint
by a given user, as described in the new docstring.

Until we test this more broadly in production, we won't know if this
is a full solution to the problem, but I think it's likely.  We've
never seen the UserActivityInterval worker end up backlogged without a
total queue processor outage, and it should have a similar workload.

Fixes #13180.
2019-09-21 11:48:24 -07:00
Tim Abbott f0d8951035 do_update_user_activity: Refactor to support passing a count.
We'll use this in upcoming commits.
2019-09-21 11:47:14 -07:00
Tim Abbott 5c960b3e0f user_activity: Make the queue processor a bit more efficient.
We don't actually need to go to the memcached (falling back to the
database) to fetch either user or client objects on every event.  For
user objects, we actually can just pass through the user ID
transparently; for client objects, we can use an in-process cache,
since the mapping of string to ID never changes.
2019-09-21 11:47:14 -07:00
Rishi Gupta e058558a52 emails: Send invitation reminder email two days before expiry.
Hopefully this does a better job of spurring people to action, and also
suggests a self-service fix if they don't (i.e. contacting the person that
invited them).
2019-08-23 12:53:11 -07:00
Rishi Gupta 2d260031ed emails: Use referrer.delivery_email in invitation emails. 2019-08-23 12:53:11 -07:00
Anders Kaseorg a5596011a0 queue_processors, python_examples: Fix mypy errors.
zerver/openapi/python_examples.py:105: error: Argument 1 to "get_user_presence" of "Client" has incompatible type "str"; expected "Dict[str, Any]"
    zerver/openapi/python_examples.py:563: error: Argument 1 to "add_reaction" of "Client" has incompatible type "Dict[str, object]"; expected "Dict[str, str]"
    zerver/openapi/python_examples.py:576: error: Argument 1 to "remove_reaction" of "Client" has incompatible type "Dict[str, object]"; expected "Dict[str, str]"
    zerver/worker/queue_processors.py:587: error: Argument "client" to "extract_query_without_mention" has incompatible type "EmbeddedBotHandler"; expected "ExternalBotHandler"

These were only missed because mypy daemon mode requires us to set
`follow_imports = skip` for the `zulip` package.

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2019-08-16 14:13:40 -07:00
Tim Abbott 1d63f129bf data export: Add summary logging of runtime.
This should help us investigate cases where this runs a very long
time.
2019-08-12 18:21:08 -07:00
Wyatt Hoodes 7d178bbb0f queue_processors: Clean up the extra_data dict code.
We don't want to add a `deleted_timestamp` key until
the export is actually deleted.
2019-08-12 17:51:46 -07:00
Wyatt Hoodes 22842dab34 events: Rename notify_export_completed.
notify_realm_export is more reasonable for the context of doing
deletion events as well.
2019-08-07 14:18:27 -07:00
Wyatt Hoodes 11db0c23fb exports: Update extra_data field to a JSON structure.
We add the `deleted_timestamp` key to the new `extra_data`
dictionary.
2019-08-07 12:04:28 -07:00
neiljp (Neil Pilgrim) accf4411f0 mypy: Remove type ignore on MissedMessageWorker.stop_timer. 2019-08-06 23:24:56 -07:00