Fixes#17922.
These two places fetch subscriptions for the sake of getting user ids to
send events to. Clearly deactivated users should be excluded from that.
get_active_subscriptions_for_stream_id should allow specifying whether
subscriptions of deactivated users should be included in the result.
Active subs of deactivated users are a subtlety that's easy to miss
when writing relevant code, so we make include_deactivated_users a
mandatory kwarg - this will force callers to definitely give thought to
whether such subs should be included or not.
This commit is just a refactoring, we keep original behavior everywhere
- there are places where subs of deactivates users should probably be
excluded but aren't - we don't fix that here, it'll be addressed in
follow-up commits.
The bulk deletion codepath was using dicts instead of user ids in the
event, as opposed to the other codepath which was adjusted to pass just
user ids before. We make the bulk codepath consistent with the other
one. Due to the dict-type events happening in 3.*, we move the goal for
deleting the compat code in process_notification to 5.0.
django.utils.translation.ugettext is a deprecated alias of
django.utils.translation.gettext as of Django 3.0, and will be removed
in Django 4.0.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
A bug in the implementation of replies to messages sent by outgoing
webhooks to private streams meant that an outgoing webhook bot could be
used to send messages to private streams that the user was not intended
to be able to send messages to.
Completely skipping stream access check in check_message whenever the
sender is an outgoing webhook bot is insecure, as it might allow someone
with access to the bot's API key to send arbitrary messages to all
streams in the organization. The check is only meant to be bypassed in
send_response_message, where the stream message is only being sent
because someone mentioned the bot in that stream (and thus the bot
posting there is the desired outcome). We get much better control over
what's going by passing an explicit argument to check_message when
skipping the access check is desirable.
* This introduces a new event type `realm_linkifiers` and
a new key for the initial data fetch of the same name.
Newer clients will be expected to use these.
* Backwards compatibility is ensured by changing neither
the current event nor the /register key. The data which
these hold is the same as before, but internally, it is
generated by processing the `realm_linkifiers` data.
We send both the old and the new event types to clients
whenever the linkifiers are changed.
Older clients will simply ignore the new event type, and
vice versa.
* The `realm/filters:GET` endpoint (which returns tuples)
is currently used by none of the official Zulip clients.
This commit replaces it with `realm/linkifiers:GET` which
returns data in the new dictionary format.
TODO: Update the `get_realm_filters` method in the API
bindings, to hit this new URL instead of the old one.
* This also updates the webapp frontend to use the newer
events and keys.
When a user is muted, in the same request,
we mark any existing unreads from that user
as read.
This is done for all types of messages
(PM/huddle/stream) and regardless of whether
the user was mentioned in them.
This will not break the unread count logic
of the web frontend, because that algorithm
decides which messages to mark as read based
only on the pointer location and the whitespace
at the bottom, not on what messages have already
been marked as read.
Messages sent by muted users are marked as read
as soon as they are sent (or, more accurately,
while creating the database entries itself), regardless
of type (stream/huddle/PM).
ede73ee4cd, makes it easy to
pass a list to `do_send_messages` containing user-ids for
whom the message should be marked as read.
We add the contents of this list to the set of muter IDs,
and then pass it on to `create_user_messages`.
This benefits from the caching behaviour of `get_muting_users`
and should not cause performance issues long term.
The consequence is that messages sent by muted users will
not contribute to unread counts and notifications.
This commit does not affect the unread messages
(if any) present just before muting, but only handles
subsequent messages. Old unreads will be handled in
further commits.
This makes it so that RealmAuditLog entries are
created when a user mutes/unmutes someone.
We don't really need to store the time, but we
do so anyways, because the `event_time` field
is currently a non-nullable one in the `RealmAuditLog`
model, and making it nullable would risk allowing
not specifying the time in other more important
code which also creates `RealmAuditLog` entries.
This also fixes an incorrect test of successfully
unmuting with the API. Earlier it did not mock
the time in the `views/muting.py` code to return
`mute_time`.
Previously, when unmuting a user, we used to make
two database fetches - one to verify that the user
is has been muted before, and one while actually
unmuting the user.
This reduces that to one, by passing around the
`MutedUser` object fetched in the first round.
Since the new function returns `Optional[MutedUser]`,
we need to use a hack for events tests, because
mypy does not yet use the type inferred from
`assert foo is not None` in nested functions like lambdas.
See python/mypy@8780d45507.
This commit implements a subtle optimization (described in more detail
in the comment) that can save a few hundred milliseconds in when the
sender sees that their message has sent when sending to very large
streams.
Fixes#17898.
We send the whole data set as a part of the event rather than
doing an add/remove operation for couple of reasons:
* This would make the client logic simpler.
* The playground data is small enough for us to not worry
about performance.
Tweaked both `fetch_initial_state_data` and `apply_events` to
handle the new playground event.
Tests added to validate the event matches the expected schema.
Documented realm_playgrounds sections inside /events and
/register to support our openapi validation system in test_events.
Tweaked other tests like test_event_system.py and test_home.py
to account for the new event being generated.
Lastly, documented the changes to the API endpoints in
api/changelog.md and bumped API_FEATURE_LEVEL.
Tweaked by tabbott to add an `id` field in RealmPlayground objects
sent to clients, which is essential to sending the API request to
remove one.
Similar to the previous commit, we have added a `do_*` function
which does the deletion from the DB. The next commit handles sending
the events when both adding and deleting a playground entry.
Added the openAPI format data to zulip.yaml for DELETE
/realm/playgrounds/{playground_id}. Also added python and curl
examples to remove-playground.md.
Tests added.
This endpoint will allow clients to create a playground entry
containing the name, pygments language and url_prefix for the
playground of their choice.
Introduced the `do_*` function in-charge of creating the entry in
the model. Handling the process of sending events which will be
done in a follow up commit.
Added the openAPI format data to zulip.yaml for POST
/realm/playgrounds. Also added python and curl examples for using
the endpoint in its markdown documented (add-playground.md).
Tests added.
Adds backend code for the mute users feature.
This is just infrastructure work (database
interactions, helpers, tests, events, API docs
etc) and does not involve any behavioral/semantic
aspects of muted users.
Adds POST and DELETE endpoints, to keep the
URL scheme mostly consistent in terms of `users/me`.
TODOs:
1. Add tests for exporting `zulip_muteduser` database table.
2. Add dedicated methods to python-zulip-api to be used
in place of the current `client.call_endpoint` implementation.
Previously, if a user subscribed to a stream with
history_public_to_subscribers, and then was looking at old messages in
the stream, they would not get live-updates for that stream, because
of the structure in how notify_reaction_update only looked at
UserMessage rows (we had a previous workaround involving the
`historical` field in `UserMessage` which had already made it work if
the user themselves added the reaction).
We fix this by including all subscribers with history access in the
set of recipients for update events.
Fixes a bug that was confused with #16942.
Adding an additional `!` to the stream name each time a stream is
deactivated, to a maximum of 21 times, effectively limits number of
times a stream with a given name can be deactivated. This is unlikely
to come up in common usage, but may be confusing when testing.
Change what we prepend to deactivated stream names to something with
more entropy than just `!`, by instead prepending a substring of hash
of the stream's ID. `!`s. Using 128 bits of the hash means that it
will require more than 10^18th renames to have a 1% chance of collision.
Because too-long stream names are also truncated at 60 characters,
having this entropy in the beginning of the name also helps address
potential issues from stream names that differed only in, e.g. the
60th character.
Fixes#17016.
Instead of validating `op` value later, this commit does that
in `REQ`.
Also helps avoiding duplication of this validation when
stream typing notifications feature is added.
With the previous two commits deployed, we're ready to use the
denormalization to optimize the query.
With dev environment db prepared using
./manage.py populate_db --extra-users=2000 --extra-streams=400
this takes the execution time of the query in
bulk_get_subscriber_user_ids from 1.5-1.6s to 0.4-0.5s on my machine.
The new comment explains the issue in some detail, but basically if we
deactivate the bots first, then an error partway through is corrected
by a retry; if we deactivate the user first, then we may leak
undeactivated bots if a failure occurs.
This adds the is_user_active with the appropriate code for setting the
value correctly in the future. In the following commit a migration to
backfill the value for existing Subscriptions will be added.
To ensure correct user_profile.is_active handling also in tests, we
replace all direct .is_active mutation with calls to appropriate
functions.
These procedures should be done atomically overall, with the exception
of the code that sends events to avoid block if there's a delay
communicating with Tornado.
We add the savepoint=False on underlying function that already
executes inside an atomic context - to avoid the overhead of creating
savepoints where they aren't needed.
* `op` (operation) field, added in f6fb88549f, was never intended for
`custom_profile_fields` event. This commit removes the `op` as it doesn't
have any use in the code.
* As a part of cleanup, this also eliminates the schema check warnings
for `custom_profile_fields` event, mentioned in #17568.
Emails are not unique, so we can only sensibly cache using keys formed
with both email and realm.
This requires adding a new cache key function for caching by delivery
email - user_profile_delivery_email_cache_key.
This add the schema checker, openapi schema, and also a test for
realm/deactivated event.
With several block comments by tabbott explaining the logic behind our
behavior here.
Part of #17568.
This is a prep commit which modifies the
`send_message_moved_breadcrumbs` function to take
message strings as input.
This is done to reuse the function in other places
like the /digress command.
Fixes#17238.
In process_new_human user, the queries were wrong, revoking all invites
sent to the email address, even in other realms than the one where the
new account just got created.
This commmit includes ROLE_MODERATOR in realm_user_count_by_role.
We also update test_change_role in test_audit_log.py to include
changes for moderator role as well.
This is a minor refactor which renames the
notify_topic_moved_streams function to
send_message_moved_breadcrumbs.
This is done because this function will be also used
for other things in the future, when moving streams
or when using the /digress command, for example.
c2526844e9 removed the `signups` queue
worker, and the command-line tool that enqueues to it -- but not the
automated process that enqueues during signups itself.
Remove the signup, since it is no longer in use.
We eliminate some redundant checks.
We also consistently provide a `subscribers` field
in our stream data with `[]`, even if our users
can't access subscribers. We therefore bump
the API version and tweak the docs. (See further
down for a detailed justification of the change.)
Even though it is sometimes fine to have redundant code
that is defensive in nature, some upcoming changes are gonna
move subscriber-related logic out of build_stream_dict_for_sub
for certain codepaths as part of our effort to streamline
the payload for subscribers within page_params.
So we can't rely on the code that I removed here
inside of build_stream_dict_for_sub.
Anyway, it makes more sense to do these checks explicitly
in the validate function.
The code in build_stream_dict_for_sub was almost effectively
a noop, since the validation function was already preventing
us from getting subscriber info. The only difference it
made was sometimes converting `[]` to `None`, and then
subsequently omitting the subscribers field.
Neither ZT nor the webapp make any distinction between
`[]` or <missing key> for the `subscribers` data in
`page_params`.
The webapp has had this code for a long time (and now
equivalent code elsewhere in this PR):
if (!Object.prototype.hasOwnProperty.call(sub, "subscribers")) {
sub.subscribers = new LazySet([]);
}
The webapp calculates access based on booleans, anyway:
sub.can_access_subscribers =
page_params.is_admin || sub.subscribed ||
(!page_params.is_guest && !sub.invite_only);
And ZT would choke if `subscribers` were missing, except that
it never gets to the relevant code due to other checks:
def get_other_subscribers_in_stream(<snip>):
assert stream_id is not None or stream_name is not None
if stream_id:
assert self.is_user_subscribed_to_stream(stream_id)
return [sub
for sub in self.stream_dict[stream_id]['subscribers']
if sub != self.user_id]
else:
return [sub
for _, stream in self.stream_dict.items()
for sub in stream['subscribers']
if stream['name'] == stream_name
if sub != self.user_id]
You could make a semantic argument that we should prefer
<missing key> to `[]` when subscribers aren't even available, but
we have precedent from the way that `bulk_get_subscriber_user_ids`
has traditionally populated its result:
result: Dict[int, List[int]] =
{stream["id"]: [] for stream in stream_dicts}
If we changed `stream_dicts` to `target_stream_dicts` we
would faciliate a move toward `None`, but it would just cause
headaches for other server code as well as the frontends
(which, to reiterate, already prefer the empty array
for convenience).
As my comment indicates, I would prefer to handle
this explicitly by raising JsonableError in an
else statement here, but it's not a big deal.
This function can probably be simplified with a
bit of work, mostly on the testing side to make
sure we are covering all edge cases, but that
is out of the scope of my current PR.
The codepath for moving a topic changes the message.recipient_id to the
id of the new recipient, but later, in update_messages_for_topic_edit,
it uses message.recipient when querying for messages with the matching
topic in the *old* stream (because those are the other messages that
need to be moved). This is a bug which happens to work fine, because in
Django 2, if message.recipient gets fetched first and then
message.recipient_id is mutated, message.recipient will not be altered
and thus will retain the outdated, previously fetched value.
In Django 3 changing .recipient_id causes .recipient to be updated to
the new Recipient objects, which is the Recipient of the *new* stream.
That will cause the bug to manifest.
This is a bugfix preparing for the upgrade to Django 3.
When changing the subdomain of a realm, create a deactivated realm with
the old subdomain of the realm, and set its deactivated_redirect to the
new subdomain.
Doing this will help us to do the following:
- When a user visits the old subdomain of a realm, we can tell the user
that the realm has been moved.
- During the registration process, we can assure that the old subdomain
of the realm is not used to create a new realm.
If the subdomain is changed multiple times, the deactivated_redirect
fields of all the deactivated realms are updated to point to the new
uri.
Instead of just storing the edit history in the message which
triggered the topic edit, we store the edit history in all
the messages that changed. This helps users track the edit history
of a message more reliably.
Allowing any admins to create arbitrary users is not ideal because it
can lead to abuse issues. We should require something stronger that
requires the server operator's approval and thus we add a new
can_create_users permission.
We change the return type of check_message to be dataclass instead of
Dict[str, Any]. This refactoring helps us to understand the context of the
data structure returned by check_message clearly which was not possible
when using Dict.
SendMessageRequest class is added in zerver/lib/message.py inspite of it
not being used in that file itself just to maintain consistency as other
TypedDicts and dataclasses are defined in that file and to avoid circular
dependency as SendMessageRequest is being used in lib/widget.py as well.
We also rename local variable to 'send_request' for accessing
SendMessageRequest objects.
We always want to do these at the same time. Previously, message
editing did too much stripping (fixes#16837) and failed to check for
NUL bytes.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Previously we were just returning a dict containing a message id when
trying to mirror a already sent message in 'zephyr_mirror' cases.
This commit changes this behaviour to raise an exception when trying
to mirror an already sent message by adding a new exception class
ZephyrMessageAlreadySentException and then the caller returns the
message_id directly, instead of calling do_send_messages which also
returns a list of size one containing the message_id only.
This is a prep commit for changing the return type of check_message to
be a dataclass instead of a Dict as now we have only single output for
check_message.
The message_dict['wildcard_mention_user_ids'] should be empty set instead
of empty list when there are no wildcard mentions similar to the case
when there are wildcard mentions, where it is equal to set of user ids and
not list of user ids.
We export a realm's data, and disable the realm, because the user
is moving from Zulip Cloud (e.g. https://example.zulipchat.com/) to
self-hosting or another platform (e.g. https://zulip.example.com/)
which we do not control. This commit adds a field in the realm object
called deactivated_redirect to store the url to which the realm has
moved.
While working on shifting toward native browser time zone APIs
(#16451), it was found that all but very recent Chrome and Node
versions reject certain legacy timezone aliases like US/Pacific
(https://crbug.com/364374).
For now, we only canonicalize the timezone property returned in user
objects and not the timezone setting itself.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
During the new user creation code path, there can be no existing
active clients for the user being created, so we can skip the code to
send events to that user's clients.
The tests here reflect that we need to send fewer events, and do fewer
queries that would have been spent computing data for these..
Fixes#16503, combined with the long series of recent changes by Steve
Howell to fix super-linear behavior in this code path.
We no bulk up peer_add/peer_remove events by user if the
same user has subscribed to multiple streams (and just
that single user).
This mostly optimizes the new-user codepath, but the
algorithm is a bit more general in nature.
We now can send an implied matrix of user/stream tuples
for peer_add and peer_remove events.
The client code basically does this:
for stream_id in event['stream_ids']:
for user_id in event['user_ids']:
update_sub(stream_id, user_id)
We used to send individual events, which gets real
expensive when you are creating new streams. For
the case of copy-to-stream case, we should see
events go from U to 1, where U is the number of users
added.
Note that we don't yet fully optimize the potential
of this schema. For adding a new user with lots
of default streams, we still send S peer_add events.
And if you subscribe a bunch of users to a bunch of
private streams, we only go from U * S to S; we can't
optimize it down to one event easily.
All the fields of a stream's recipient object can
be inferred from the Stream, so we just make a local
object. Django will create a Message object without
checking that the child Recipient object has been
saved. If that behavior changes in some upgrade,
we should see some pretty obvious symptom, including
query counts changing.
Tweaked by tabbott to add a longer explanatory comment, and delete a
useless old comment.
This saves us a query for edge cases like when
you try to unsubscribe from a public stream
that you have already unsubscribed from.
But this is mostly to prep for upcoming
optimizations.
This doesn't change anything yet, but the goal is
to eventually optimize events for the case where
one user (typically a new user) gets subscribed
to multiple public streams.
There was no need to put "stream_id" on the sub
dictionary here. It's kinda annoying to introduce
the little helper here, but I feel
that's better than crufting up the sub data
structure.
The is_web_public flag is already in Stream.API_FIELDS,
so there is no reason for all this complicated logic.
There's no reason to hack it on to the subscription
object.
We replace all_streams_id with a map.
We also use it to populate never_subscribed_streams.
And all_streams_map is a superset of stream_hash,
which we will soon kill off as well.
Apparently I put these parens in the code as
part of 73c30774cb
during 2017.
It looks like I extracted is_public during
the middle of my change and forgot to remove
the unnecessary parens. (The code was correct,
but it makes it look like a tuple if you're
skimming it too quickly.)
That class is an artifact of when Stream
didn't have recipient_id. Now it's simpler
to deal with stream subscriptions.
We also save a query during page load (and
other places where we get subscriber
info).
We already trust ids that are put on our queue
for deferred work. For example, see the code for
"mark_stream_messages_as_read_for_everyone"
We now pass stream_recipient_id when we queue
up work for do_mark_stream_messages_as_read.
This generally saves about 3 queries per
user when we unsubscribe them from a stream.
We get two speedups:
* The query to get existing subscribers only
gets the two fields we need. We no longer
need all the overhead of user_profile
and recipient data being returned in the
query.
* We avoid Django making extra hops to the
database to get user info.
Previously, the transaction.atomic() was not properly scoped to ensure
that RealmAuditLog entries were created in the same transaction,
making it possible for state changes to not be properly recorded in
RealmAuditLog.
When apps like mobile register for "streams", we
will now just use active streams as our baseline,
rather than "occupied" streams.
This means we will send a stream that is active,
even if it happens to have zero occupants. It's
actually pretty rare that a stream has zero occupants,
and it's not exactly clear that we want to exclude
a non-occupied but otherwise active stream from
our list of streams.
It also happens to be fairly expensive to compute
whether a stream is occupied.
This change only affects API clients (including
possibly our mobile app). The main webapp never
used the data from this codepath.
We replace get_peer_user_ids_for_stream_change
with two bulk functions to get peers and/or
subscribers.
Note that we have three codepaths that care about
peers:
subscribing existing users:
we need to tell peers about new subscribers
we need to tell subscribed user about old subscribers
unsubscribing existing users:
we only need to tell peers who unsubscribed
subscribing new user:
we only need to tell peers about the new user
(right now we generate send_event
calls to tell the new user about existing
subscribers, but this is a waste
of effort that we will fix soon)
The two bulk functions are this:
bulk_get_subscriber_peer_info
bulk_get_peers
They have some overlap in the implementation,
but there are some nuanced differences that are
described in the comments.
Looking up peers/subscribers in bulk leads to some
nice optimizations.
We will save some memchached traffic if you are
subscribing to multiple public streams.
We will save a query in the remove-subscriber
case if you are only dealing with private streams.
This will ensure that we always fully execute the database part of
modifying subscription objects. In particular, this should prevent
invariant failures like #16347 where Subscription objects were created
without corresponding RealmAuditLog entries.
Fixes#16347.
We don't need the select_related('user_profile')
optimization any more, because we just keep
track of user info in our own data structures.
In this codepath we are never actually modifying
users; we just occasionally need their ids or
emails.
This can be a pretty substantive improvement if
you are adding a bunch of users to a stream
who each have a bunch of their own subscriptions.
We could also limit the number of full rows in this
query by adding an extra hop to the DB just to
get colors (using values_list), and then only get
full sub info for the streams that we're adding, rather
than getting every single subscription, in full, for each user.
Apart from finding what colors the user has already
used, the only other reason we need all the columns
in Subscription here is to handle streams that
need to be reactivated. Otherwise we could do
only("id", "active", "recipient_id", "user_profile_id")
or similar. Fortunately, Subscription isn't
an overly wide table; it's mostly bool fields.
But by far the biggest thing to avoid is bringing
in all the extra user_profile data.
We have pretty good coverage on query counts here,
so I think this fix is pretty low risk.
This class removes a lot of the annoying tuples
we were passing around.
Also, by including the user everywhere, which
is easily available to us when we make instances
of SubInfo, it sets the stage to remove
select_related('user_profile').
We used to send occupy/vacate events when
either the first person entered a stream
or the last person exited.
It appears that our two main apps have never
looked at these events. Instead, it's
generally the case that clients handle
events related to stream creation/deactivation
and subscribe/unsubscribe.
Note that we removed the apply_events code
related to these events. This doesn't affect
the webapp, because the webapp doesn't care
about the "streams" field in do_events_register.
There is a theoretical situation where a
third party client could be the victim of
a race where the "streams" data includes
a stream where the last subscriber has left.
I suspect in most of those situations it
will be harmless, or possibly even helpful
to the extent that they'll learn about
streams that are in a "quasi" state where
they're activated but not occupied.
We could try to patch apply_event to
detect when subscriptions get added
or removed. Or we could just make the
"streams" piece of do_events_register
not care about occupy/vacate semantics.
I favor the latter, since it might
actually be what users what, and it will
also simplify the code and improve
performance.
If a user asks to be subscribed to a stream
that they are already subscribed to, then
that stream won't be in new_stream_user_ids,
and we won't need to send an event for it.
This change makes that happen more automatically.
Let
U = number of users to subscribe
S = number of streams to subscribe
We were technically doing N^3 amount of work
when we sent certain events, or to be more
precise, U * S * S amount of work. For each
stream, we were looping through a list of tuples
of size U * S to find the users for the stream.
In practice either U or S is usually 1, so the
performance gains here are probably negligible,
especially since the constant factors here
were just slinging around Python data.
But the code is actually more readable now, so
it's a double win.
We rename needs_new_sub (which sounds like
a boolean!) to new_recipient_ids, and we
calculate it explicitly within the loop, so
that we don't need to worry as much about
subsequent passes through the loop mutating it.
This allows us to also remove recipient_ids,
which in turn lets us remove recipients_map,
albeit with a small tweak for stream_map.
I also introduce the my_subs local, which
I use to more directly populate used_colors,
as well as using it as the loop var.
I think it's important that the callers understand
that bulk_add_subscriptions assumes all streams
are being created within a single realm, so I make
it an explicit parameter.
This may be overkill--I would also be happy if we
just included the assertions from this commit.
This function now does all the work that we used
to do with notify_subscriptions_added happening
inside a loop.
There's a small fine-tuning here, where we only
get recent traffic on streams that we're actually
sending events for.
We now just pass in all_subscribers_by_stream, rather
than a callback.
We also move sub_tuples_by_user closer to the
loop where we call notify_subscriptions_added.