Servers that had upgraded from a Zulip server version that did not yet
support the user_uuid field to one that did could end up with some
mobile devices having two push notifications registrations, one with a
user_id and the other with a user_uuid.
Fix this issue by sending both user_id and user_uuid, and clearing
Black 23 enforces some slightly more specific rules about empty line
counts and redundant parenthesis removal, but the result is still
compatible with Black 22.
(This does not actually upgrade our Python environment to Black 23
yet.)
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Previously, an active production Zulip server would experience a class
of deadlocks caused by two or more concurrent bulk update operations
on the UserMessage table.
This is because UPDATE ... SET ... WHERE statements that execute in
parallel take row-level UPDATE locks as they get results; since the
query plans may result in getting rows in different orders between two
queries, this can result in deadlocks.
Some databases allow ORDER BY on their UPDATE ... WHERE statements;
PostgreSQL does not. In PostgreSQL, the answer is to do a sub-select
with an ORDER BY ... FOR UPDATE to ensure consistent ordering on row
locks.
We do this all code paths using bitand or bitor as part of bulk
editing message flags, which should ensure that these concurrent
operations obtain row level locks on the table in the same order.
Fixes#19054.
Previous behavior was logging only the uuid if it was provided by the
remote server, but that's insufficient, because the user may actually
have no devices registered with uuis and we (at the bouncer) end up
sending notifications to id-based registrations. Not having that id
logged makes it impossible to figure out what's going on.
Fixes#18017.
In previous commits, the change to the bouncer API was introduced to
support this and then a series of migrations added .uuid to
UserProfiles.
Now the code for self-hosted servers that makes requests
to the bouncer is changed to make use of it.
This is the first step to making the full switch to self-hosted servers
use user uuids, per issue #18017. The old id format is still supported
of course, for backward compatibility.
This commit is separate in order to allow deploying *just* the bouncer
API change to production first.
When removing notifications, we skip the access control on if the user
still can read them -- they should not have a notification of them,
both because they currently cannot read the message, as well as
because they have already done so.
In steady-state, requests to FCM take about a second; however, in
cases where the remote FCM server is unstable, the request has been
observed to block for more than a minute.
As noted in the previous commit, pushes must complete within 30s;
fail fast, and let the retries and exponential backoff handle errors.
The worst-case total time taken with timeouts and errors for an FCM
notification is 19.5s. Unfortunately, `aioapns` does not appear to
have any timeouts, and thus this commit cannot guarantee a total of
fewer than 30s.
This reverts bc15085098 (which provided
not justification for its change) and moves further, down to 2 retries
from the default of 5.
10 retries, with exponential backoff, is equivalent to sleeping 2^11
seconds, or just about 34 minutes (though the code uses a jitter which
may make this up to 51 minutes). This is an unreasonable amount of
time to spend in this codepath -- as only one worker is used, and it
is single-threaded, this could effectively block all missed message
notifications for half an hour or longer.
This is also necessary because messages sent through the push bouncer
are sent synchronously; the sending server uses a 30-second timeout,
set in PushBouncerSession. Having retries which linger longer than
this can cause duplicate messages; the sending server will time out
and re-queue the message in RabbitMQ, while the push bouncer's request
will continue, and may succeed.
Limit to 2 retries (APNS currently uses 3), and results expected max
of 4 seconds of sleep, potentially up to 6. If this fails, there
exists another retry loop above it, at the RabbitMQ layer (either
locally, or via the remote server's queue), which will result in up to
3 additional retries -- all told, the request will me made to FCM up
to 12 times.
Calling `get_apns_context` opens (and caches) an open connection to
the APNs servers. Since `apns_enabled` is called from Django
codepaths, this means that the Django processes hold unnecessary
connections open to the APNs servers.
Switch `apns_enabled` to checking what `get_apns_context` checks when
we're just returning True/False.
aioapns 2.1 removed the loop parameter from the aioapns.APNs
constructor, because Python 3.10 removed the loop parameter from the
asyncio.Lock constructor.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Stop using `access_user_group_by_id` in notifications codepaths, as it
is meant to be used to check for _write_ access, not read
access (which is not limited). In the notification codepaths, there
are no ACLs to apply, and the ID is known-good; just load it
directly. The `for_mention` flag is removed, as it was not used in the
mention codepaths at all, only the notification ones.
This replaces the temporary (and testless) fix in
24b1439e93 with a more permanent
fix.
Instead of checking if the user is a bot just before
sending the notifications, we now just don't enqueue
notifications for bots. This is done by sending a list
of bot IDs to the event_queue code, just like other
lists which are used for creating NotificationData objects.
Credit @andersk for the test code in `test_notification_data.py`.
This makes logging more consistent between FCM and APNs codepaths, and
makes clear which user-ids are for local users, and which are opaque
integers namespaced from some remote zulip server.
Being able to determine how many distinct users are getting push
notifications per remote host is useful, as is the distribution of
device counts. This parallels the log line in
handle_push_notification for push notifications from local realms,
handled via the event queue.
This parallels fe25517295, but for mobile notifications. It also
adds a test, which verifies that such content does not crash either
mobile or email notifications.
aioapns already has a retry loop. By default it retries forever on
ConnectionError and ConnectionClosed, so our own retry loop would
never be reached. Remove our retry loop, and configure aioapns to
retry APNS_MAX_RETRIES times on ConnectionError like the previous
version did. It still retries forever on ConnectionClosed; that’s not
configurable but probably fine.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
de04f0ad67 changed now notifications recipients were calculated, in
a manner that caused them to be sent when they should not have been.
ac70a2d2e1 was supposed to resolve this, but appears to have been
insufficient, as all three of these cases have been observed to still
happen.
Add safety checks immediately before notification, until the
underlying logic error can be sussed out.
We do not allow any user to edit the system user groups (including
renaming, deleting, adding or removing members, etc.) from the
API. These user groups will change only by the code when a new
user is added or role of a user is changed.
This is implemented by rejecting access_user_group_by_id always
except the case when it is use to get the user group for sending
email and push notifications, as we would need to send notifications
to the mentioned user group.
Previously, we checked for the `enable_offline_email_notifications` and
`enable_offline_push_notifications` settings (which determine whether the
user will receive notifications for PMs and mentions) just before sending
notifications. This has a few problem:
1. We do not have access to all the user settings in the notification
handlers (`handle_missedmessage_emails` and `handle_push_notifications`),
and therefore, we cannot correctly determine whether the notification should
be sent. Checks like the following which existed previously, will, for
example, incorrectly not send notifications even when stream email
notifications are enabled-
```
if not receives_offline_email_notifications(user_profile):
return
```
With this commit, we simply do not enqueue notifications if the "offline"
settings are disabled, which fixes that bug.
Additionally, this also fixes a bug with the "online push notifications"
feature, which was, if someone were to:
* turn off notifications for PMs and mentions (`enable_offline_push_notifications`)
* turn on stream push notifications (`enable_stream_push_notifications`)
* turn on "online push" (`enable_online_push_notifications`)
then, they would still receive notifications for PMs when online.
This isn't how the "online push enabled" feature is supposed to work;
it should only act as a wrapper around the other notification settings.
The buggy code was this in `handle_push_notifications`:
```
if not (
receives_offline_push_notifications(user_profile)
or receives_online_push_notifications(user_profile)
):
return
// send notifications
```
This commit removes that code, and extends our `notification_data.py` logic
to cover this case, along with tests.
2. The name for these settings is slightly misleading. They essentially
talk about "what to send notifications for" (PMs and mentions), and not
"when to send notifications" (offline). This commit improves this condition
by restricting the use of this term only to the database field, and using
clearer names everywhere else. This distinction will be important to have
non-confusing code when we implement multiple options for notifications
in the future as dropdown (never/when offline/when offline or online, etc).
3. We should ideally re-check all notification settings just before the
notifications are sent. This is especially important for email notifications,
which may be sent after a long time after the message was sent. We will
in the future add code to thoroughly re-check settings before sending
notifications in a clean manner, but temporarily not re-checking isn't
a terrible scenario either.
This reduces loose strings in the codebase, and allows us to not worry
about the exact naming (`stream_email_enabled` or `stream_emails_enabled`?)
and tense (`mentioned` or `mention`?).
Ideally this new class should have been in `lib/notification_data.py`,
which is our file for things like this. But, the next commit requires
using this data in `models.py`, and importing from `notification_data.py`
to `models.py` causes recursive imports.
Work around Fatal1ty/aioapns#15, by silencing error-level logging from
the aioapns logger. We deal with the results of failed
send_notification calls by examining the `result.description` and
handling them; the extra logging message merely clutters the Sentry
logs.
The code to also notify for wildcard mentions was added in
0ed0bb6828.
But that showed the same text for both the cases. This commit fixes
that.
This is more of change for correctness. The mobile app currently does
not rely on this text for notifications, but constructs the text by
itself from the data in the payload.
This also fixes the "stream_push_notify" case to consistently show
a `#` before the stream name.