Commit Graph

228 Commits

Author SHA1 Message Date
Tim Abbott 220620e7cf sharding: Add basic sharding configuration for Tornado.
This allows straight-forward configuration of realm-based Tornado
sharding through simply editing /etc/zulip/zulip.conf to configure
shards and running scripts/refresh-sharding-and-restart.

Co-Author-By: Mateusz Mandera <mateusz.mandera@zulip.com>
2020-05-20 13:47:20 -07:00
Pragati Agrawal bd9b74436c org settings: Enable message_retention_days in org settings UI.
Since production testing of `message_retention_days` is finished, we can
enable this feature in the organization settings page. We already had this
setting in frontend but it was bit rotten and not rendered in templates.

Here we replaced our past text-input based setting with a
dropdown-with-text-input setting approach which is more consistent with our
existing UI.

Along with frontend changes, we also incorporated a backend change to
handle making retention period forever. This change introduces a new
convertor `to_positive_or_allowed_int` which only allows positive integers
and an allowed value for settings like `message_retention_days` which can
be a positive integer or has the value `Realm.RETAIN_MESSAGE_FOREVER` when
we change the setting to retain message forever.

This change made `to_not_negative_int_or_none` redundant so removed it as
well.

Fixes: #14854
2020-05-08 14:09:31 -07:00
Anders Kaseorg bdc365d0fe logging: Pass format arguments to logging.
https://docs.python.org/3/howto/logging.html#optimization

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-05-02 10:18:02 -07:00
Anders Kaseorg fead14951c python: Convert assignment type annotations to Python 3.6 style.
This commit was split by tabbott; this piece covers the vast majority
of files in Zulip, but excludes scripts/, tools/, and puppet/ to help
ensure we at least show the right error messages for Xenial systems.

We can likely further refine the remaining pieces with some testing.

Generated by com2ann, with whitespace fixes and various manual fixes
for runtime issues:

-    invoiced_through: Optional[LicenseLedger] = models.ForeignKey(
+    invoiced_through: Optional["LicenseLedger"] = models.ForeignKey(

-_apns_client: Optional[APNsClient] = None
+_apns_client: Optional["APNsClient"] = None

-    notifications_stream: Optional[Stream] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE)
-    signup_notifications_stream: Optional[Stream] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE)
+    notifications_stream: Optional["Stream"] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE)
+    signup_notifications_stream: Optional["Stream"] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE)

-    author: Optional[UserProfile] = models.ForeignKey('UserProfile', blank=True, null=True, on_delete=CASCADE)
+    author: Optional["UserProfile"] = models.ForeignKey('UserProfile', blank=True, null=True, on_delete=CASCADE)

-    bot_owner: Optional[UserProfile] = models.ForeignKey('self', null=True, on_delete=models.SET_NULL)
+    bot_owner: Optional["UserProfile"] = models.ForeignKey('self', null=True, on_delete=models.SET_NULL)

-    default_sending_stream: Optional[Stream] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE)
-    default_events_register_stream: Optional[Stream] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE)
+    default_sending_stream: Optional["Stream"] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE)
+    default_events_register_stream: Optional["Stream"] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE)

-descriptors_by_handler_id: Dict[int, ClientDescriptor] = {}
+descriptors_by_handler_id: Dict[int, "ClientDescriptor"] = {}

-worker_classes: Dict[str, Type[QueueProcessingWorker]] = {}
-queues: Dict[str, Dict[str, Type[QueueProcessingWorker]]] = {}
+worker_classes: Dict[str, Type["QueueProcessingWorker"]] = {}
+queues: Dict[str, Dict[str, Type["QueueProcessingWorker"]]] = {}

-AUTH_LDAP_REVERSE_EMAIL_SEARCH: Optional[LDAPSearch] = None
+AUTH_LDAP_REVERSE_EMAIL_SEARCH: Optional["LDAPSearch"] = None

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-04-22 11:02:32 -07:00
Anders Kaseorg f8c95cda51 mypy: Add specific codes to type: ignore annotations.
https://mypy.readthedocs.io/en/stable/error_codes.html

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-04-22 10:46:33 -07:00
Anders Kaseorg 029bfb9fee mypy: Remove unnecessary type: ignore annotations.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-04-22 10:46:33 -07:00
Anders Kaseorg 1cf63eb5bf python: Whitespace fixes from autopep8.
Generated by autopep8, with the setup.cfg configuration from #14532.
I’m not sure why pycodestyle didn’t already flag these.

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-04-21 17:58:09 -07:00
Anders Kaseorg c734bbd95d python: Modernize legacy Python 2 syntax with pyupgrade.
Generated by `pyupgrade --py3-plus --keep-percent-format` on all our
Python code except `zthumbor` and `zulip-ec2-configure-interfaces`,
followed by manual indentation fixes.

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-04-09 16:43:22 -07:00
Mateusz Mandera 4283a513d4 tornado: Reuse retry_event functions for failures in tornado queues.
We use retry_event in queue_processors.py to handle trying on failures,
without getting stuck in permanent retry loops if the event ends up
leading to failure on every attempt and we just keep sending NACK to
rabbitmq forever (or until the channel crashes). Tornado queues haven't
been using this, but they should.
2020-04-09 12:43:38 -07:00
Tim Abbott a373387009 tornado: Fix parsing of delete_message events with no users.
The change in 180d8abed6, while correct
for the Django part of the codebase, had the nasty side effect of
exposing a failure mode in the process_notification logic if the users
list was empty.

This, in turn, could cause our process_notification code to fail with
an IndexError when trying to process the event, which would result in
that tornado process not automatically recovering, due to the outer
try/except handler for consume triggering a NACK and thus repeating
the event.
2020-04-09 05:39:47 -07:00
Udit107710 ef741bf317 messages: Return shallow copy of message object.
When more than one outgoing webhook is configured,
the message which is send to the webhook bot passes
through finalize_payload function multiple times,
which mutated the message dict in a way that many keys
were lost from the dict obj.

This commit fixes that problem by having
`finalize_payload` return a shallow copy of the
incoming dict, instead of mutating it.  We still
mutate dicts inside of `post_process_dicts`, though,
for performance reasons.

This was slightly modified by @showell to fix the
`test_both_codepaths` test that was added concurrently
to this work.  (I used a slightly verbose style in the
tests to emphasize the transformation from `wide_dict`
to `narrow_dict`.)

I also removed a deepcopy call inside
`get_client_payload`, since we now no longer mutate
in `finalize_payload`.

Finally, I added some comments here and there.

For testing, I mostly protect against the root
cause of the bug happening again, by adding a line
to make sure that `sender_realm_id` does not get
wiped out from the "wide" dictionary.

A better test would exercise the actual code that
exposed the bug here by sending a message to a bot
with two or more services attached to it.  I will
do that in a future commit.

Fixes #14384
2020-03-29 15:12:27 -07:00
Steve Howell 4c51a94bcd message: Move transitional shim for delivery email.
If we have an old event that's missing the field
`sender_delivery_email`, we now patch it at the top
of `process_message_event`, rather than for each call
to `get_client_payload`.  This will make an upcoming
commit a bit easier to reason about.  Basically, it's
simpler to shim the incoming event one time rather
than doing it up to four times.  We know that
`get_client_payload` is non-destructive, because it
does a deepcopy.
2020-03-29 15:12:27 -07:00
Anders Kaseorg 39f9abeb3f python: Convert json.loads(f.read()) to json.load(f).
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-03-24 10:46:32 -07:00
Mateusz Mandera 89394fc1eb middleware: Use request.user for logging when possible.
Instead of trying to set the _requestor_for_logs attribute in all the
relevant places, we try to use request.user when possible (that will be
when it's a UserProfile or RemoteZulipServer as of now). In other
places, we set _requestor_for_logs to avoid manually editing the
request.user attribute, as it should mostly be left for Django to manage
it.
In places where we remove the "request._requestor_for_logs = ..." line,
it is clearly implied by the previous code (or the current surrounding
code) that request.user is of the correct type.
2020-03-09 13:54:58 -07:00
Mateusz Mandera 0255ca9b6a middleware: Log user.id/realm.string_id instead of _email. 2020-03-09 13:54:58 -07:00
Mateusz Mandera 2d544250b7 events: Add block for compatibility with old delete_message events. 2020-03-03 15:52:42 -08:00
Mateusz Mandera 3922fb3a92 events: Clean up delete_message even processing code. 2020-03-03 15:52:42 -08:00
Steve Howell 862515b7a4 presence: Avoid failures with obsolete events.
We only recently added `user_id` to presence
events.
2020-03-03 11:45:45 -08:00
Tim Abbott 1ea2f188ce tornado: Rewrite Django integration to duplicate less code.
Since essentially the first use of Tornado in Zulip, we've been
maintaining our Tornado+Django system, AsyncDjangoHandler, with
several hundred lines of Django code copied into it.

The goal for that code was simple: We wanted a way to use our Django
middleware (for code sharing reasons) inside a Tornado process (since
we wanted to use Tornado for our async events system).

As part of the Django 2.2.x upgrade, I looked at upgrading this
implementation to be based off modern Django, and it's definitely
possible to do that:
* Continue forking load_middleware to save response middleware.
* Continue manually running the Django response middleware.
* Continue working out a hack involving copying all of _get_response
  to change a couple lines allowing us our Tornado code to not
  actually return the Django HttpResponse so we can long-poll.  The
  previous hack of returning None stopped being viable with the Django 2.2
  MiddlewareMixin.__call__ implementation.

But I decided to take this opportunity to look at trying to avoid
copying material Django code, and there is a way to do it:

* Replace RespondAsynchronously with a response.asynchronous attribute
  on the HttpResponse; this allows Django to run its normal plumbing
  happily in a way that should be stable over time, and then we
  proceed to discard the response inside the Tornado `get()` method to
  implement long-polling.  (Better yet might be raising an
  exception?).  This lets us eliminate maintaining a patched copy of
  _get_response.

* Removing the @asynchronous decorator, which didn't add anything now
  that we only have one API endpoint backend (with two frontend call
  points) that could call into this.  Combined with the last bullet,
  this lets us remove a significant hack from our
  never_cache_responses function.

* Calling the normal Django `get_response` method from zulip_finish
  after creating a duplicate request to process, rather than writing
  totally custom code to do that.  This lets us eliminate maintaining
  a patched copy of Django's load_middleware.

* Adding detailed comments explaining how this is supposed to work,
  what problems we encounter, and how we solve various problems, which
  is critical to being able to modify this code in the future.

A key advantage of these changes is that the exact same code should
work on Django 1.11, Django 2.2, and Django 3.x, because we're no
longer copying large blocks of core Django code and thus should be
much less vulnerable to refactors.

There may be a modest performance downside, in that we now run both
request and response middleware twice when longpolling (once for the
request we discard).  We may be able to avoid the expensive part of
it, Zulip's own request/response middleware, with a bit of additional
custom code to save work for requests where we're planning to discard
the response.  Profiling will be important to understanding what's
worth doing here.
2020-02-13 16:13:11 -08:00
Tim Abbott 986706c7e5 tornado: Use common code for copying headers.
This fixes a bug where our asynchronous requests were only copying the
Content-Type header (i.e. the one case where we're noticed) from the
Django HttpResponse.  I'm not sure what the impact of this would be;
the rate-limiting headers rarely come up when breaking a long-polled
request.  But it seems clearly an improvement to do this in a
consistent fashion.

Only the headers piece is a change; in Tornado

    self.finish(x)

is equivalent to:

    self.write(x)
    self.finish()
2020-02-07 16:14:19 -08:00
Tim Abbott 224a73a3ec tornado: Extract a function for writing Tornado responses.
This increases the readability of what's happening in our core Tornado
handlers code, as well as making this logic reusable.
2020-02-07 16:13:49 -08:00
Tim Abbott 5305e8af85 tornado: Extract convert_tornado_request_to_django_request. 2020-02-07 16:03:58 -08:00
Tim Abbott fc58ae117a handlers: Rename confusingly named response to result_dict.
This should somewhat increase the readability of zulip_finish.
2020-02-07 16:03:58 -08:00
Tim Abbott 2aab71e153 event_queue: Fix confusing event_queue.push interface.
In e3ad9baf1d, we introduced yet another
bug where we incorrectly shared event dictionaries between multiple
queues.

Fortunately, the logging that reports on "event was not in the queue"
issues worked and detected this on chat.zulip.org, but this is a clear
indication that the comments we have around this system were not
sufficient to produce correct behavior.

We fix this by changing event_queue.push, the code that mutates the
event dictionaries, to do the shallow copies itself.  The only
downside here is process_message_event, a relatively low-traffic code
path, does an extra per-queue dictionary copy.  Given that presence,
heartbeat, and message reading events are likely more traffic and
dealing with HTTP is likely much more expensive than a dictionary
copy, this probably doesn't matter performance-wise.

(And if profiling later finds it is, there are potential workarounds
like passing a skip_copy argument we can do).
2020-02-05 12:40:01 -08:00
Steve Howell e3ad9baf1d presence: Add process_presence_event.
This lets us conditionally remove the email
field from a presence event if the client
has registered with the slim_presence flag.
2020-02-04 12:30:36 -08:00
Steve Howell bf9144ff69 presence: Add slim_presence flag.
This flag affects page_params and the
payload you get back from POSTs to this
url:

    users/me/presence

The flag does not yet affect the
presence events that get sent to a
client.
2020-02-04 12:30:34 -08:00
Anders Kaseorg ea6934c26d dependencies: Remove WebSockets system for sending messages.
Zulip has had a small use of WebSockets (specifically, for the code
path of sending messages, via the webapp only) since ~2013.  We
originally added this use of WebSockets in the hope that the latency
benefits of doing so would allow us to avoid implementing a markdown
local echo; they were not.  Further, HTTP/2 may have eliminated the
latency difference we hoped to exploit by using WebSockets in any
case.

While we’d originally imagined using WebSockets for other endpoints,
there was never a good justification for moving more components to the
WebSockets system.

This WebSockets code path had a lot of downsides/complexity,
including:

* The messy hack involving constructing an emulated request object to
  hook into doing Django requests.
* The `message_senders` queue processor system, which increases RAM
  needs and must be provisioned independently from the rest of the
  server).
* A duplicate check_send_receive_time Nagios test specific to
  WebSockets.
* The requirement for users to have their firewalls/NATs allow
  WebSocket connections, and a setting to disable them for networks
  where WebSockets don’t work.
* Dependencies on the SockJS family of libraries, which has at times
  been poorly maintained, and periodically throws random JavaScript
  exceptions in our production environments without a deep enough
  traceback to effectively investigate.
* A total of about 1600 lines of our code related to the feature.
* Increased load on the Tornado system, especially around a Zulip
  server restart, and especially for large installations like
  zulipchat.com, resulting in extra delay before messages can be sent
  again.

As detailed in
https://github.com/zulip/zulip/pull/12862#issuecomment-536152397, it
appears that removing WebSockets moderately increases the time it
takes for the `send_message` API query to return from the server, but
does not significantly change the time between when a message is sent
and when it is received by clients.  We don’t understand the reason
for that change (suggesting the possibility of a measurement error),
and even if it is a real change, we consider that potential small
latency regression to be acceptable.

If we later want WebSockets, we’ll likely want to just use Django
Channels.

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-01-14 22:34:00 -08:00
Tim Abbott f0fd812cc5 tornado: Add transitional code for sender_delivery_email.
This issue was introduced in 54e357e154.
2019-11-20 17:31:11 -08:00
Tim Abbott 1fe4f795af settings: Add notification settings checkboxes for wildcard mentions.
This change makes it possible for users to control the notification
settings for wildcard mentions as a separate control from PMs and
direct @-mentions.
2019-11-20 16:58:46 -08:00
Tim Abbott b85c9b0810 tornado: Use delivery_email in logging.
Eventually, we'll want to replace emails with user IDs here entirely,
but until we make that happen, we should at least use the same email
address present in our other logging.

I think we won't miss updating these in a future migration thanks to
mypy types.
2019-11-15 17:16:05 -08:00
Tim Abbott 993ed9c2b1 tornado: Remove stale user_profile_email field.
Since years ago, this field hasn't been used for anything other than
some logging that would be better off logging the user ID anyway.

It existed in the first place simply because we weren't passing the
user_profile_id to Tornado at all.
2019-11-15 17:07:52 -08:00
Anders Kaseorg 0d20145b93 mypy: Upgrade from 0.730 to 0.740.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2019-11-13 12:38:45 -08:00
Anders Kaseorg cafac83676 request: Tighten type checking on REQ.
Then, find and fix a predictable number of previous misuses.

With a small change by tabbott to preserve backwards compatibility for
sending `yes` for the `forged` field.

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2019-11-13 12:35:55 -08:00
Tim Abbott b12d3d54c6 events: Fix documentation testing for /events.
Most of the failures were due to parameters that are not intended to
be used by third-party code, so the correct fix for those was the set
intentionally_undocumented=True.

Fixes #12969.
2019-10-21 16:50:10 -07:00
Tim Abbott 0ed0bb6828 messages: Add email/push notifications for wildcard mentions.
Historically, Zulip's implementation of wildcard mentions never
triggered either email or push notifications, instead being limited to
desktop notifications and the "mentions" counter.

We fix this just by plumbing the "wildcard_mentioned" flag through our
system.

Implements much of
https://github.com/zulip/zulip/issues/6040#issuecomment-510157264.
We're also now ready to seriously work on #3750.
2019-08-26 14:39:53 -07:00
Tim Abbott 7953e23d49 event_queue: Actually fix missing copy for edit-message events.
Apparently, our edit-message events did not guarantee that the outer
wrapper dictionary, which is intended to be unique for each client,
was unique for every client (instead only ensuring it was unique for
each user).

This led to clients unexpectedly getting last_event_id validation
errors in this code path when a user had multiple connected clients,
because the linear ordering of event IDs within a given queue was
corrupted.

In fd2a63b049, we accidentally fixed
this issue with a different set of userdata events, without fixing the
edit-message event bug.  This commit fixes the remaining issue.
2019-08-12 15:17:10 -07:00
Anders Kaseorg 7bf09067d1 event_queue: Clean up type ignores.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2019-08-09 17:42:33 -07:00
Anders Kaseorg becef760bf cleanup: Delete leading newlines.
Previous cleanups (mostly the removals of Python __future__ imports)
were done in a way that introduced leading newlines.  Delete leading
newlines from all files, except static/assets/zulip-emoji/NOTICE,
which is a verbatim copy of the Apache 2.0 license.

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2019-08-06 23:29:11 -07:00
Tim Abbott fd2a63b049 event_queue: Fix missing copy for edit-message events.
Apparently, our edit-message events did not guarantee that the outer
wrapper dictionary, which is intended to be unique for each client,
was unique for every client (instead only ensuring it was unique for
each user).

This led to clients unexpectedly getting last_event_id validation
errors in this code path when a user had multiple connected clients,
because the linear ordering of event IDs within a given queue was
corrupted.
2019-08-06 13:40:30 -07:00
Anders Kaseorg 68dd8e4ec8 mypy: Migrate from mypy_extensions to typing_extensions.
This gives us access to typing_extensions.Deque, which was not added
to typing until 3.5.4.

(PROVISION_VERSION is not bumped because the transitive dependency set
in dev.txt hasn’t changed.)

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2019-08-05 17:24:09 -07:00
Anders Kaseorg 86a7fdddd7 events: Check last_event_id for validity, take 2.
This verifies that the client passed a last_event_id that actually
came from the queue instead of making up an ID from the future.  It
turns out one of our tests was making up such an ID, but legitimate
clients are expected not to do so.

The previous version of this commit (commit
e00d4be6d5, #12888) had to be reverted
(commit b86c5cc490) because it was
missing the `to_dict`/`from_dict` migration code.

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2019-08-05 17:18:49 -07:00
Tim Abbott b86c5cc490 Revert "events: Check last_event_id for validity."
This isn't correct without a proper migration for existing queues,
which may not be implementable.

This reverts commit e00d4be6d5.
2019-08-02 14:44:35 -07:00
Tim Abbott 8be3df0e29 tornado: Fix bugs in Tornado autoreload library.
This fixes two issues:

* The syntax check logic we had for zerver.tornado.autoreload would
  end up clearing _reload_hooks if one of the files that had changed
  was zerver.tornado.autoreload itself (because we'd had re-imported
  the current module), which could be incredibly confusing when trying
  to test the autoreload logic.  It seems better to just not run the
  syntax check for syntax errors in this file.

  Similarly, because reloading event_queue.py would destroy the state
  in the queues, we avoid that as well.

* We make sure to flush stdout after running and reload hooks, to make
  sure their output reaches the user.
2019-08-02 12:47:49 -07:00
Tim Abbott a43b2f7a43 tornado: Fix incorrect import of autoreload library.
We were apparently not running our own forked Tornado autoreload
library when adding reload hooks, which meant that our autoreload
hooks didn't run at all.

This fixes an issue that made dump_event_queues never run and thus the
local development environment difficult to use for testing event queues.
2019-08-02 12:47:49 -07:00
Wyatt Hoodes 4beec5c6b9 typing: Use TYPE_CHECKING when dealing with cyclic dependencies. 2019-07-31 12:19:39 -07:00
Anders Kaseorg e00d4be6d5 events: Check last_event_id for validity.
This verifies that the client passed a last_event_id that actually
came from the queue instead of making up an ID from the future.  It
turns out one of our tests was making up such an ID, but legitimate
clients are expected not to do so.

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2019-07-26 17:18:28 -07:00
Wyatt Hoodes a2fa1a6f25 handlers: Remove duplicate type annotation.
`self._request_middleware` is already typed in the `__init__` method.
2019-07-22 16:27:39 -07:00
Anders Kaseorg 3968fe9fdf tornado: Remove unused imports.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
2019-02-02 17:33:13 -08:00
Anders Kaseorg e5bf0c0a69 event_queue: Avoid hardcoded paths in /var/tmp.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
2019-01-15 16:12:05 -08:00
Tim Abbott 930e65d1be push: Include type in add-push-notification events.
This should make us able to clean up the logic for this in the future
(right now, we still need to do the .get() for backwards compatibility).
2018-12-15 13:58:52 -08:00