When `update_message` events were updated to have a consistent
format for both normal message updates/edits and special
rendering preview updates, the logic used in the tornado event
queue processor to identify the special events for sending
notifications no longer applied.
Updates that logic to use the `rendering_only` flag (if present)
that was added to the `update_message` event format to identify
if the event processor should potentially send notifications to
users.
For upgrade compatibility, if `rendering_only` flag is not present,
uses previous event structure and checks for the absence of the
`user_id` property, which indicated the special rendering preview
updates.
Fixes#16022.
`cachify` is essentially caching the return value of a function using only
the non-keyword-only arguments as the key.
The use case of the function in the backend can be sufficiently covered by
`functools.lru_cache` as an unbound cache. There is no signficant difference
apart from `cachify` overlooking keyword-only arguments, and
`functools.lru_cache` being conveniently typed.
Signed-off-by: Zixuan James Li <359101898@qq.com>
Adds request as a parameter to json_success as a refactor towards
making `ignored_parameters_unsupported` functionality available
for all API endpoints.
Also, removes any data parameters that are an empty dict or
a dict with the generic success response values.
This replaces the temporary (and testless) fix in
24b1439e93 with a more permanent
fix.
Instead of checking if the user is a bot just before
sending the notifications, we now just don't enqueue
notifications for bots. This is done by sending a list
of bot IDs to the event_queue code, just like other
lists which are used for creating NotificationData objects.
Credit @andersk for the test code in `test_notification_data.py`.
A SIGTERM can show up at any point in the ioloop, even in places which
are not prepared to handle it. This results in the process ignoring
the `sys.exit` which the SIGTERM handler calls, with an uncaught
SystemExit exception:
```
2021-11-09 15:37:49.368 ERR [tornado.application:9803] Uncaught exception
Traceback (most recent call last):
File "/home/zulip/deployments/2021-11-08-05-10-23/zulip-py3-venv/lib/python3.6/site-packages/tornado/http1connection.py", line 238, in _read_message
delegate.finish()
File "/home/zulip/deployments/2021-11-08-05-10-23/zulip-py3-venv/lib/python3.6/site-packages/tornado/httpserver.py", line 314, in finish
self.delegate.finish()
File "/home/zulip/deployments/2021-11-08-05-10-23/zulip-py3-venv/lib/python3.6/site-packages/tornado/routing.py", line 251, in finish
self.delegate.finish()
File "/home/zulip/deployments/2021-11-08-05-10-23/zulip-py3-venv/lib/python3.6/site-packages/tornado/web.py", line 2097, in finish
self.execute()
File "/home/zulip/deployments/2021-11-08-05-10-23/zulip-py3-venv/lib/python3.6/site-packages/tornado/web.py", line 2130, in execute
**self.path_kwargs)
File "/home/zulip/deployments/2021-11-08-05-10-23/zulip-py3-venv/lib/python3.6/site-packages/tornado/gen.py", line 307, in wrapper
yielded = next(result)
File "/home/zulip/deployments/2021-11-08-05-10-23/zulip-py3-venv/lib/python3.6/site-packages/tornado/web.py", line 1510, in _execute
result = method(*self.path_args, **self.path_kwargs)
File "/home/zulip/deployments/2021-11-08-05-10-23/zerver/tornado/handlers.py", line 150, in get
request = self.convert_tornado_request_to_django_request()
File "/home/zulip/deployments/2021-11-08-05-10-23/zerver/tornado/handlers.py", line 113, in convert_tornado_request_to_django_request
request = WSGIRequest(environ)
File "/home/zulip/deployments/2021-11-08-05-10-23/zulip-py3-venv/lib/python3.6/site-packages/django/core/handlers/wsgi.py", line 66, in __init__
script_name = get_script_name(environ)
File "/home/zulip/deployments/2021-11-08-05-10-23/zerver/tornado/event_queue.py", line 611, in <lambda>
signal.signal(signal.SIGTERM, lambda signum, stack: sys.exit(1))
SystemExit: 1
```
Supervisor then terminates the process with a SIGKILL, which results
in dropping data held in the tornado process, as it does not dump its
queue.
The only command which is safe to run in the signal handler is
`ioloop.add_callback_from_signal`, which schedules the callback to run
during the course of the normal ioloop. This callbacks does an
orderly shutdown of the server and the ioloop before exiting.
This fixes a problem where we could not import zerver.lib.streams from
zerver.lib.message, which would otherwise be reasonable, because the
former implicitly imported many modules due to this issue.
This utilizes the generic `BaseNotes` we added for multipurpose
patching. With this migration as an example, we can further support
more types of notes to replace the monkey-patching approach we have used
throughout the codebase for type safety.
These changes are all independent of each other; I just didn’t feel
like making dozens of commits for them.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
This fixes a bug where email notifications were sent for wildcard
mentions even if the `enable_offline_email_notifications` setting was
turned off.
This was because the `notification_data` class incorrectly considered
`wildcard_mentions_notify` as an indeoendent setting, instead of a wrapper
around `enable_offline_email_notifications` and `enable_offline_push_notifications`.
Also add a test for this case.
This commit adds "user_settings_object" field to
client_capabilities which will be used to determine
if the client needs 'update_display_settings' and
'update_global_notifications' event.
Return zulip_merge_base alongside zulip_version
in `/register`, `/event` and `/server_settings`
endpoint so that the value can be used by other
clients.
Previously, we checked for the `enable_offline_email_notifications` and
`enable_offline_push_notifications` settings (which determine whether the
user will receive notifications for PMs and mentions) just before sending
notifications. This has a few problem:
1. We do not have access to all the user settings in the notification
handlers (`handle_missedmessage_emails` and `handle_push_notifications`),
and therefore, we cannot correctly determine whether the notification should
be sent. Checks like the following which existed previously, will, for
example, incorrectly not send notifications even when stream email
notifications are enabled-
```
if not receives_offline_email_notifications(user_profile):
return
```
With this commit, we simply do not enqueue notifications if the "offline"
settings are disabled, which fixes that bug.
Additionally, this also fixes a bug with the "online push notifications"
feature, which was, if someone were to:
* turn off notifications for PMs and mentions (`enable_offline_push_notifications`)
* turn on stream push notifications (`enable_stream_push_notifications`)
* turn on "online push" (`enable_online_push_notifications`)
then, they would still receive notifications for PMs when online.
This isn't how the "online push enabled" feature is supposed to work;
it should only act as a wrapper around the other notification settings.
The buggy code was this in `handle_push_notifications`:
```
if not (
receives_offline_push_notifications(user_profile)
or receives_online_push_notifications(user_profile)
):
return
// send notifications
```
This commit removes that code, and extends our `notification_data.py` logic
to cover this case, along with tests.
2. The name for these settings is slightly misleading. They essentially
talk about "what to send notifications for" (PMs and mentions), and not
"when to send notifications" (offline). This commit improves this condition
by restricting the use of this term only to the database field, and using
clearer names everywhere else. This distinction will be important to have
non-confusing code when we implement multiple options for notifications
in the future as dropdown (never/when offline/when offline or online, etc).
3. We should ideally re-check all notification settings just before the
notifications are sent. This is especially important for email notifications,
which may be sent after a long time after the message was sent. We will
in the future add code to thoroughly re-check settings before sending
notifications in a clean manner, but temporarily not re-checking isn't
a terrible scenario either.
This is necessary to break the uncollectable reference cycle created
by our ‘request_notes.saved_response = json_response(…)’, Django’s
‘response._resource_closers.append(request.close)’, and Python’s
https://bugs.python.org/issue44680.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
This prevents a memory leak arising from Python’s inability to collect
a reference cycle from a WeakKeyDictionary value to its key
(https://bugs.python.org/issue44680).
Signed-off-by: Anders Kaseorg <anders@zulip.com>
This concludes the HttpRequest migration to eliminate arbitrary
attributes (except private ones that are belong to django) attached
to the request object during runtime and migrated them to a
separate data structure dedicated for the purpose of adding
information (so called notes) to a HttpRequest.
This includes the migration of fields that require trivial changes
to be migrated to be stored with ZulipRequestNotes.
Specifically _requestor_for_logs, _set_language, _query, error_format,
placeholder_open_graph_description, saveed_response, which were all
previously set on the HttpRequest object at some point. This migration
allows them to be typed.
We will no longer use the HttpRequest to store the rate limit data.
Using ZulipRequestNotes, we can access rate_limit and ratelimits_applied
with type hints support. We also save the process of initializing
ratelimits_applied by giving it a default value.
We create a class called ZulipRequestNotes as a new home to all the
additional attributes that we add to the Django HttpRequest object.
This allows mypy to do the typecheck and also enforces type safety.
Most of the attributes are added in the middleware, and thus it is
generally safe to assert that they are not None in a code path that
goes through the middleware. The caller is obligated to do manual
the type check otherwise.
This also resolves some cyclic dependencies that zerver.lib.request
have with zerver.lib.rate_limiter and zerver.tornado.handlers.
* `stream_name`: This field is actually redundant. The email/push
notifications handlers don't use that field from the dict, and they
anyways query for the message, so we're safe in deleting this field,
even if in the future we end up needing the stream name.
* `timestamp`: This is totally unused by the email/push notification
handlers, and aren't sent to push clients either.
* `type` is used only for the push notifications handler, since only
push notifications can be revoked, so we move them to only run there.
We will later use this data to include text like:
`<sender> mentioned @<user_group>` instead of the current
`<sender> mentioned you` when someone mentions a user group
the current user is a part of in email/push notification.
Part of #13080.
JsonableError has two major benefits over json_error:
* It can be raised from anywhere in the codebase, rather than
being a return value, which is much more convenient for refactoring,
as one doesn't potentially need to change error handling style when
extracting a bit of view code to a function.
* It is guaranteed to contain the `code` property, which is helpful
for API consistency.
Various stragglers are not updated because JsonableError requires
subclassing in order to specify custom data or HTTP status codes.
This removes some complexity from the event_queue module.
To avoid code duplication, we reduce the `is_notifiable` methods to
internally just call the `trigger` methods and check their return value.
* Modify `maybe_enqueue_notifications` to take in an instance of the
dataclass introduced in 951b49c048.
* The `check_notify` tests tested the "when to notify" logic in a way
which involved `maybe_enqueue_notifications`. To simplify things, we've
earlier extracted this logic in 8182632d7e.
So, we just kill off the `check_notify` test, and keep only those parts
which verify the queueing and return value behavior of that funtion.
* We retain the the missedmessage_hook and message
message_edit_notifications since they are more integration-style.
* There's a slightly subtle change with the missedmessage_hook tests.
Before this commit, we short-circuited the hook if the sender was muted
(5a642cea11).
With this commit, we delegate the check to our dataclass methods.
So, `maybe_enqueue_notifications` will be called even if the sender was
muted, and the test needs to be updated.
* In our test helper `get_maybe_enqueue_notifications_parameters` which
generates default values for testing `maybe_enqueue_notifications` calls,
we keep `message_id`, `sender_id`, and `user_id` as required arguments,
so that the tests are super-clear and avoid accidental false positives.
* Because `do_update_embedded_data` also sends `update_message` events,
we deal with that case with some hacky code for now. See the comment
there.
This mostly completes the extraction of the "when to notify" logic into
our new `notification_data` module.
We already have this data in the `flags` for each user, so no need to
send this set/list in the event dictionary.
The `flags` in the event dict represent the after-message-update state,
so we can't avoid sending `prior_mention_user_ids`.
Since `flags` here could be iterated through multiple times
(to check for push/email notifiability), we use `Collection`.
Inspired by 871e73ab8f.
The other change here in the `event_queue` code is prep for using
the `UserMessageNotificationsData` class there.
Before this commit, we used to pre-calculate flags for user data and send
it to Tornado, like so:
```
{
"id": 10,
"flags": ["mentioned"],
"mentioned": true,
"online_push_enabled": false,
"stream_push_notify": false,
"stream_email_notify": false,
"wildcard_mention_notify": false,
"sender_is_muted": false,
}
```
This has the benefit of simplifying the logic in the event_queue code a bit.
However, because we sent such an object for each user receiving the event,
the string keys (like "stream_email_notify") get duplicated in the JSON
blob that is sent to Tornado.
For 1000 users, this data may take up upto ~190KB of space, which can
cause performance degradation in large organisations.
Hence, as an alternative, we send just the list of user_ids fitting
each notification criteria, and then calculate the flags in Tornado.
This brings down the space to ~60KB for 1000 users.
This commit reverts parts of following commits:
- 2179275
- 40cd6b5
We will in the future, add helpers to create `UserMessageNotificationsData`
objects from these lists, so as to avoid code duplication.
This is separate from the next commit for ease of testing.
To verify that the compatibility code works correctly, all message send
and event_queue tests from our test suite should pass on just this commit.
Unlike `receiver_is_off_zulip`, fetching from a dict is pretty cheap,
so we can calculate `online_push_enabled` along with the other
variables.
This is a prep change to start using the dataclass introduced in the
earlier commit in this code.
This gives us a single place where all user data for the message
send event is calculated, and is a prep change for introducing
a TypedDict or dataclass to keep this data toghether.
We have already calculated these values, so storing them should not cause
significant performance degradation.
This is a prep chenge for sending a few more flags through internal_data,
namely if `sender_is_muted`.
* In `event_queue.py`, only the sender and recipient users who have muted
the sender will have the "read" flag set.
* We already skip enqueueing notifications for users who've muted the sender
after 58da384da3.
* The queue consume functions for email and push notifications already
check filter messages which have been read before sending notifications.
* So, the "read" logic in `event_queue.py` is unnecessary, and the
processing power saved from not enqueueing notifications for a single
user should be insignificant, so we remove these checks all toghether.
Earlier, the notification-blocking for messages from muted senders
was a side-effect of we never sending notifications for messages
with the "read" flag.
This commit decouples these two things, as a prep for having new
settings which will allow users to **always** receive email
notifications, including when/if they read the message during the
time the notifications is in the queue.
We still mark muted-sender messages as "read" when they are sent,
because that's desirable anyways.
The old name `push_notify_user_ids` was misleading, because
it does not contain user ids which should be notified for
the current message, but rather user ids who have the online
push notifications setting enabled.
When the Tornado server is restarted during an upgrade, if
server has old events with the `push_notify_user_ids` fields,
the server will throw error after this rename. Hence, we need
to explicitly handle such cases while processing the event.
This ensures it is present for all requests; while that was already
essentially true via process_client being called from every standard
decorator, this allows middleware and other code to rely on this
having been set.
This help mobile and terminal clients understand whether a server
restart changed API feature levels or not, which in turn determines
whether they will need to resynchronize their data.
Also add tests and documentation for this previously undocumented
event type.
Fixes: #18205.
This extends the /json/typing endpoint to also accept
stream_id and topic. With this change, the requests
sent to /json/typing should have these:
* `to`: a list set to
- recipients for a PM
- stream_id for a stream message
* `topic`, in case of stream message
along with `op`(start or stop).
On receiving a request with stream_id and topic, we send
typing events to clients with stream_typing_notifications set
to True for all users subscribed to that stream.
Since c3a8a15bae removed the last
instance of code using the dictionary code path, we actually need to
wait until one can no longer upgrade directly from 4.x to master in
order to avoid breakage should we remove this compatibility code,
since only today did we stop generating the old event format.
The bulk deletion codepath was using dicts instead of user ids in the
event, as opposed to the other codepath which was adjusted to pass just
user ids before. We make the bulk codepath consistent with the other
one. Due to the dict-type events happening in 3.*, we move the goal for
deleting the compat code in process_notification to 5.0.
django.utils.translation.ugettext is a deprecated alias of
django.utils.translation.gettext as of Django 3.0, and will be removed
in Django 4.0.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
A bug in the implementation of the all_public_streams API feature
resulted in guest users being able to receive message traffic to public
streams that should have been only accessible to members of the
organization.
This makes it much more clear that this feature does JSON encoding,
which previously was only indicated in the documentation.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
The Session middleware only adds `Vary: cookie` if it sees an access
to the from inside of it. Because we are effectively, from the Django
session middleware's point of view, returning the static content of
`request.saved_response` and never accessing the session, it does not
set `Vary: cookie` on longpoll requests.
Explicitly mark Tornado requests as varying by cookie.
A few internal fields used for tracking which types of notifications
have already been sent for a given message, like `hander_id` and the
`push_notified` bundle of fields were being incorrectly included in
message events delivered to clients clients.
One could argue these fields might be useful hints to clients, but
because notifications can be triggered later on via
`missedmessage_hook`, they have no useful purpose in the API.
This commit move these extended event field on a `internal_data`
object within the event object, and delete this field in `contents()`
for call points that would serve data to clients.
Tweaked by tabbott to provide a cleaner interface.
We're not bumping API_FEATURE_LEVEL because these fields have always
been documented as being present only due to a bug, so no clients
should be expecting or relying on them.
Fixes: #15947.
This logging is really only potentially interesting in a development
environment when the numbers are nonzero.
In production, it seems worth logging for consistency reasons.
Probably we'll eventually redo this block by change the log level, but
this is good enough to despam the development environment startup
output.
The `no_proxy` parameter does not work to remove proxying[1]; in this
case, since all requests with this adapter are to the internal Tornado
process, explicitly pass in an empty set of proxies to disable
proxying.
[1] https://github.com/psf/requests/issues/4600
Having both of these is confusing; TORNADO_SERVER is used only when
there is one TORNADO_PORT. Its primary use is actually to be _unset_,
and signal that in-process handling is to be done.
Rename to USING_TORNADO, to parallel the existing USING_RABBITMQ, and
switch the places that used it for its contents to using
TORNADO_PORTS.
In development and test, we keep the Tornado port at 9993 and 9983,
respectively; this allows tests to run while a dev instance is
running.
In production, moving to port 9800 consistently removes an odd edge
case, when just one worker is on an entirely different port than if
two workers are used.
tornado.web.Application does not share any inheritance with Django at
all; it has a similar router interface, but tornado.web.Application is
not an instance of Django anything.
Refold the long lines that follow it.
While urllib3 retries all connection errors, it only retries a subset
of read errors, since not all requests are safe to retry if they are
not idempotent, and the far side may have already processed them once.
By default, the only methods that are urllib3 retries read errors on
are GET, TRACE, DELETE, OPTIONS, HEAD, and PUT. However, all of the
requests into Tornado from Django are POST requests, which limits the
effectiveness of bb754e0902.
POST requests to `/api/v1/events/internal` are safe to retry; at worst,
they will result in another event queue, which is low cost and will be
GC'd in short order.
POST requests to `/notify_tornado` are _not_ safe to retry, but this
codepath is only used if USING_RABBITMQ is False, which only occurs
during testing.
Enable retries for read errors during all POSTs to Tornado, to better
handle Tornado restarts without 500's.
Without an explicit port number, the `stdout_logfile` values for each
port are identical. Supervisor apparently decides that it will
de-conflict this by appending an arbitrary number to the end:
```
/var/log/zulip/tornado.log
/var/log/zulip/tornado.log.1
/var/log/zulip/tornado.log.10
/var/log/zulip/tornado.log.2
/var/log/zulip/tornado.log.3
/var/log/zulip/tornado.log.7
/var/log/zulip/tornado.log.8
/var/log/zulip/tornado.log.9
```
This is quite confusing, since most other files in `/var/log/zulip/`
use `.1` to mean logrotate was used. Also note that these are not all
sequential -- 4, 5, and 6 are mysteriously missing, though they were
used in previous restarts. This can make it extremely hard to debug
logs from a particular Tornado shard.
Give the logfiles a consistent name, and set them up to logrotate.
These weren’t wrong since orjson.JSONDecodeError subclasses
json.JSONDecodeError which subclasses ValueError, but the more
specific ones express the intention more clearly.
(ujson raised ValueError directly, as did json in Python 2.)
Signed-off-by: Anders Kaseorg <anders@zulip.com>
The exception trace only goes from where the exception was thrown up
to where the `logging.exception` call is; any context as to where
_that_ was called from is lost, unless `stack_info` is passed as well.
Having the stack is particularly useful for Sentry exceptions, which
gain the full stack trace.
Add `stack_info=True` on all `logging.exception` calls with a
non-trivial stack; we omit `wsgi.py`. Adjusts tests to match.
By defaults, `requests` has no timeout on requests, which can lead to
waiting indefinitely. Add a half-second timeout on these; this is
applied _inside_ each retry, not overall -- that is, with retries any
of these functions may take a total of 1.5s.
Use the `no_proxy` proxy, which explicitly disables proxy usage for
particular hosts. This is a slightly cleaner solution than ignoring
all of the environment, as removing proxies is specifically what we
are attempting to accomplish.
The change in #2764 provided a better error message on one of the
three calls into Tornado, but left the other two with the old error
message. `raise_for_status` was used on two out of three.
Use a custom HTTPAdapter to apply this pattern to all requests from
Django to Tornado.
Apparently, `update_message` events unexpectedly contained what were
intended to be internal data structures about which users were
mentioned in a given message.
The bug has been present and accumulating new data structures for
years.
Fixing this should improve the performance of handling update_message
events as well as cleaning up this API's interface.
This was discovered by our automated API documentation schema checking
tooling detecting these unexpected elements in these event
definitions; that same logic should prevent future bugs like this from
being introduced in the future.
There is still some miscellaneous cleanup that
has to happen for things like analytics queries
and dead code in node tests, but this should
remove the main use of pointers in the backend.
(We will also still need to drop the DB field.)
This is designed to have no user-facing change unless the client
declares bulk_message_deletion in its client_capabilities.
Clients that do so will receive a single bulk event for bulk deletions
of messages within a single conversation (topic or PM thread).
Backend implementation of #15285.
Fixes#2665.
Regenerated by tabbott with `lint --fix` after a rebase and change in
parameters.
Note from tabbott: In a few cases, this converts technical debt in the
form of unsorted imports into different technical debt in the form of
our largest files having very long, ugly import sequences at the
start. I expect this change will increase pressure for us to split
those files, which isn't a bad thing.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Automatically generated by the following script, based on the output
of lint with flake8-comma:
import re
import sys
last_filename = None
last_row = None
lines = []
for msg in sys.stdin:
m = re.match(
r"\x1b\[35mflake8 \|\x1b\[0m \x1b\[1;31m(.+):(\d+):(\d+): (\w+)", msg
)
if m:
filename, row_str, col_str, err = m.groups()
row, col = int(row_str), int(col_str)
if filename == last_filename:
assert last_row != row
else:
if last_filename is not None:
with open(last_filename, "w") as f:
f.writelines(lines)
with open(filename) as f:
lines = f.readlines()
last_filename = filename
last_row = row
line = lines[row - 1]
if err in ["C812", "C815"]:
lines[row - 1] = line[: col - 1] + "," + line[col - 1 :]
elif err in ["C819"]:
assert line[col - 2] == ","
lines[row - 1] = line[: col - 2] + line[col - 1 :].lstrip(" ")
if last_filename is not None:
with open(last_filename, "w") as f:
f.writelines(lines)
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
This allows straight-forward configuration of realm-based Tornado
sharding through simply editing /etc/zulip/zulip.conf to configure
shards and running scripts/refresh-sharding-and-restart.
Co-Author-By: Mateusz Mandera <mateusz.mandera@zulip.com>
Since production testing of `message_retention_days` is finished, we can
enable this feature in the organization settings page. We already had this
setting in frontend but it was bit rotten and not rendered in templates.
Here we replaced our past text-input based setting with a
dropdown-with-text-input setting approach which is more consistent with our
existing UI.
Along with frontend changes, we also incorporated a backend change to
handle making retention period forever. This change introduces a new
convertor `to_positive_or_allowed_int` which only allows positive integers
and an allowed value for settings like `message_retention_days` which can
be a positive integer or has the value `Realm.RETAIN_MESSAGE_FOREVER` when
we change the setting to retain message forever.
This change made `to_not_negative_int_or_none` redundant so removed it as
well.
Fixes: #14854
Generated by autopep8, with the setup.cfg configuration from #14532.
I’m not sure why pycodestyle didn’t already flag these.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
Generated by `pyupgrade --py3-plus --keep-percent-format` on all our
Python code except `zthumbor` and `zulip-ec2-configure-interfaces`,
followed by manual indentation fixes.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
We use retry_event in queue_processors.py to handle trying on failures,
without getting stuck in permanent retry loops if the event ends up
leading to failure on every attempt and we just keep sending NACK to
rabbitmq forever (or until the channel crashes). Tornado queues haven't
been using this, but they should.
The change in 180d8abed6, while correct
for the Django part of the codebase, had the nasty side effect of
exposing a failure mode in the process_notification logic if the users
list was empty.
This, in turn, could cause our process_notification code to fail with
an IndexError when trying to process the event, which would result in
that tornado process not automatically recovering, due to the outer
try/except handler for consume triggering a NACK and thus repeating
the event.
When more than one outgoing webhook is configured,
the message which is send to the webhook bot passes
through finalize_payload function multiple times,
which mutated the message dict in a way that many keys
were lost from the dict obj.
This commit fixes that problem by having
`finalize_payload` return a shallow copy of the
incoming dict, instead of mutating it. We still
mutate dicts inside of `post_process_dicts`, though,
for performance reasons.
This was slightly modified by @showell to fix the
`test_both_codepaths` test that was added concurrently
to this work. (I used a slightly verbose style in the
tests to emphasize the transformation from `wide_dict`
to `narrow_dict`.)
I also removed a deepcopy call inside
`get_client_payload`, since we now no longer mutate
in `finalize_payload`.
Finally, I added some comments here and there.
For testing, I mostly protect against the root
cause of the bug happening again, by adding a line
to make sure that `sender_realm_id` does not get
wiped out from the "wide" dictionary.
A better test would exercise the actual code that
exposed the bug here by sending a message to a bot
with two or more services attached to it. I will
do that in a future commit.
Fixes#14384
If we have an old event that's missing the field
`sender_delivery_email`, we now patch it at the top
of `process_message_event`, rather than for each call
to `get_client_payload`. This will make an upcoming
commit a bit easier to reason about. Basically, it's
simpler to shim the incoming event one time rather
than doing it up to four times. We know that
`get_client_payload` is non-destructive, because it
does a deepcopy.
Instead of trying to set the _requestor_for_logs attribute in all the
relevant places, we try to use request.user when possible (that will be
when it's a UserProfile or RemoteZulipServer as of now). In other
places, we set _requestor_for_logs to avoid manually editing the
request.user attribute, as it should mostly be left for Django to manage
it.
In places where we remove the "request._requestor_for_logs = ..." line,
it is clearly implied by the previous code (or the current surrounding
code) that request.user is of the correct type.
Since essentially the first use of Tornado in Zulip, we've been
maintaining our Tornado+Django system, AsyncDjangoHandler, with
several hundred lines of Django code copied into it.
The goal for that code was simple: We wanted a way to use our Django
middleware (for code sharing reasons) inside a Tornado process (since
we wanted to use Tornado for our async events system).
As part of the Django 2.2.x upgrade, I looked at upgrading this
implementation to be based off modern Django, and it's definitely
possible to do that:
* Continue forking load_middleware to save response middleware.
* Continue manually running the Django response middleware.
* Continue working out a hack involving copying all of _get_response
to change a couple lines allowing us our Tornado code to not
actually return the Django HttpResponse so we can long-poll. The
previous hack of returning None stopped being viable with the Django 2.2
MiddlewareMixin.__call__ implementation.
But I decided to take this opportunity to look at trying to avoid
copying material Django code, and there is a way to do it:
* Replace RespondAsynchronously with a response.asynchronous attribute
on the HttpResponse; this allows Django to run its normal plumbing
happily in a way that should be stable over time, and then we
proceed to discard the response inside the Tornado `get()` method to
implement long-polling. (Better yet might be raising an
exception?). This lets us eliminate maintaining a patched copy of
_get_response.
* Removing the @asynchronous decorator, which didn't add anything now
that we only have one API endpoint backend (with two frontend call
points) that could call into this. Combined with the last bullet,
this lets us remove a significant hack from our
never_cache_responses function.
* Calling the normal Django `get_response` method from zulip_finish
after creating a duplicate request to process, rather than writing
totally custom code to do that. This lets us eliminate maintaining
a patched copy of Django's load_middleware.
* Adding detailed comments explaining how this is supposed to work,
what problems we encounter, and how we solve various problems, which
is critical to being able to modify this code in the future.
A key advantage of these changes is that the exact same code should
work on Django 1.11, Django 2.2, and Django 3.x, because we're no
longer copying large blocks of core Django code and thus should be
much less vulnerable to refactors.
There may be a modest performance downside, in that we now run both
request and response middleware twice when longpolling (once for the
request we discard). We may be able to avoid the expensive part of
it, Zulip's own request/response middleware, with a bit of additional
custom code to save work for requests where we're planning to discard
the response. Profiling will be important to understanding what's
worth doing here.
This fixes a bug where our asynchronous requests were only copying the
Content-Type header (i.e. the one case where we're noticed) from the
Django HttpResponse. I'm not sure what the impact of this would be;
the rate-limiting headers rarely come up when breaking a long-polled
request. But it seems clearly an improvement to do this in a
consistent fashion.
Only the headers piece is a change; in Tornado
self.finish(x)
is equivalent to:
self.write(x)
self.finish()