Adding this field to the Stream model will prevent us from having
to look at realm data for several types of stream operations; those
lookups are prone to either doing extra database queries or bloating
our cached data.
Going forward, we'll set stream.is_zephyr to True whenever the
realm's string id is "zephyr".
Since subscribed_to_stream is only doing an id lookup
on the Stream model to find out if a user is subscribed to
a stream, there's no reason to require a full Stream object.
It's currently the case that all callers do have full Stream
objects handy to pass in to this function, but it's still a
good practice to have functions only ask for objects that they
need.
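Roughly, the narrowed helper looks like the sketch below (the model and
field names follow the usual Zulip schema, but this is an illustration,
not the literal diff):

    from zerver.models import Recipient, Subscription

    def subscribed_to_stream(user_profile, stream_id):
        # An existence check on Subscription is all we need; no Stream
        # object required.
        return Subscription.objects.filter(
            user_profile=user_profile,
            active=True,
            recipient__type=Recipient.STREAM,
            recipient__type_id=stream_id,
        ).exists()
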
We now return user_ids for subscribers to streams in add-stream
events. This allows us to eliminate the UserLite class for
both bulk adds and bulk removes. It also simplifies some JS
code that already wanted to use user_ids, not emails.
Fixes #6898
Using lightweight objects will speed up adding new users
to realms.
We also sort the query results, which lets us use itertools.groupby
to build the data structure more efficiently.
Profiling on a large data set shows about a 25x speedup for this
function; before the optimization, it accounted for most of the
time spent in bulk_add_subscriptions. There is also a lot less
memory to allocate, although I didn't measure the memory difference.
When we test-deployed this to chat.zulip.org, we got about a 6x
speedup.
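For illustration, the sort-then-group idiom looks roughly like this (the
row keys are made up for the example):

    from itertools import groupby
    from operator import itemgetter

    def group_rows_by_realm(rows):
        # rows are lightweight dicts from a .values() query, e.g.
        # [{'realm_id': 1, 'stream_id': 7}, ...]; sorting first means
        # groupby yields one contiguous group per realm_id.
        rows = sorted(rows, key=itemgetter('realm_id'))
        return {
            realm_id: [row['stream_id'] for row in group]
            for realm_id, group in groupby(rows, key=itemgetter('realm_id'))
        }
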
We now do push notifications and missed message emails
for offline users who are subscribed to the stream for
a message that has been edited, but we short-circuit
the offline-notification logic for any user who presumably
would have already received a notification on the original
message.
This effectively boils down to sending notifications to newly
mentioned users. The motivating use case here is that you
forget to mention somebody in a message, and then you edit
the message to mention the person. If they are offline, they
will now get pushed notifications and missed message emails,
with some minor caveats.
We try to mostly use the same techniques here as the
send-message code path, and we share common code with the
send-message path once we get to the Tornado layer and call
maybe_enqueue_notifications.
The major places where we differ are in a function called
maybe_enqueue_notifications_for_message_update, and the top
of that function short-circuits a bunch of cases where we
can mostly assume that the original message had an offline
notification.
We can expect a couple changes in the future:
* Requirements may change here, and it might make sense
to send offline notifications on the update side even
in circumstances where the original message had a
notification.
* We may track more notifications in a DB model, which
may simplify our short-circuit logic.
In the view/action layer, we already had two separate codepaths
for send-message and update-message, but this mostly echoes
what the send-message path does in terms of collecting data
about recipients.
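To sketch the short-circuit idea concretely (the names here are
illustrative, not the actual signature of
maybe_enqueue_notifications_for_message_update):

    def users_to_notify_for_update(prior_mention_user_ids,
                                   mention_user_ids,
                                   presence_idle_user_ids):
        # Anyone mentioned in the original message presumably already got
        # an offline notification, so only users newly mentioned by the
        # edit are candidates.
        newly_mentioned = mention_user_ids - prior_mention_user_ids
        # The same presence-based idle gating as the send-message path applies.
        return newly_mentioned & presence_idle_user_ids
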
Postgres doesn't like them, we don't have an obvious way to escape
them, and they tend to be sent by buggy tools where it'd be better for
the user to get an error.
This fixes a 500 we were getting occasionally.
We have two different concepts of "idle", and this function
is based on the "presence" aspect of idleness. There is also
idleness in terms of a user having no current client
descriptors accepting messages, and we check that later in
the process for things like sending missed message emails.
check_send_stream_message is a simpler version of
check_send_message for sending messages where the addressee is
a stream. Instead of relying on Addressee.legacy_build,
check_send_stream_message uses Addressee.for_stream. Consequently,
it eschews many of check_send_message's kwargs that aren't needed
when the intended recipient of a message is a stream.
Having Addressee take care of setting stream_name to
sender.default_sending_stream.name makes us able to have
the invariant that stream_name is never None when the
message type is 'stream', which will help for mypy, among
other things.
One thing to be aware of is that Addressee does do a little
bit of validation work, and this adds yet another JsonableError
exception. I don't view this as a bad thing, just something to
know.
The dictionary result for get_user_info_for_message_updates()
now has a `mention_user_ids` field that is a set of user ids
who were mentioned in a message.
There are several reasons to extract this function:
* It's easy to unit test without extensive mocking.
* It will show up when we profile code.
* It is something that you can mostly ignore for
most messages.
The main reason to extract this, though, is that we are about
to do some fairly complex splicing of data for the use case
of mentioning service bots on streams they are not subscribed to,
and we want to localize the complexity.
This fixes a bug where the internal_prep_message code path would
incorrectly ignore the `realm` that was passed into it. As a result,
attempts to send messages using the system bots with this code path
would crash.
As a sidenote, we really need to make our test system consistent with
production in terms of whether the user's realm is the same as the
system realm.
We don't access any attributes of the sender other than the realm, and
as it turns out, we in some cases want to use a different realm than
the sender's.
Previously, invitation reminder emails were only being cleared after a
successful signup if newsletter_data was available, since that was the
circumstance in which we were calling the relevant queue processor
code. Now, we (1) clear them when a human user finishes signing up
and (2) correctly clear them using the 'address' field of
ScheduledEmail, not user_id.
We don't need full Realm objects to find DefaultStream
objects for a realm. So now a few functions related to
adding/removing default streams use realm_id for lookups.
Similarly, we don't need a full Stream object to find
out if a stream exists in DefaultStream, so we do id
lookups there as well.
This sets us up to use thinner objects in callers.
We now have a dedicated cache for active_user_ids() that only
stores a list of user_ids.
Before this commit, active_user_ids() used a cache of UserProfile
dictionaries, so it incurred unnecessary deserialization costs for
all the user fields that it sliced away in a list comprehension.
Because the cache is skinnier here, we also need to invalidate it
less frequently. Basically, all we care about is new users, realm
deactivations, and user deactivations.
It's hard to measure how much this will improve performance, because
the speedup for any operation here is pretty minor, but we use this
function a lot, so hopefully it will make the overall system more
healthy.
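A minimal sketch of the skinnier cache, written against Django's cache
API directly rather than Zulip's cache_with_key decorator (the key
format is made up):

    from django.core.cache import cache
    from zerver.models import UserProfile

    def active_user_ids(realm_id):
        key = 'active_user_ids:%s' % (realm_id,)  # hypothetical key format
        user_ids = cache.get(key)
        if user_ids is None:
            user_ids = list(
                UserProfile.objects.filter(
                    realm_id=realm_id,
                    is_active=True,
                ).values_list('id', flat=True)
            )
            cache.set(key, user_ids)
        return user_ids
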
This is mostly a preparatory commit for an upcoming optimization
related to stream data, but it probably does save us an
occasional DB hop to the realm table.
This leads to more than a 2x speedup when tested with
20k+ total subscribers. (For large realms with lots of default
streams, this function deals with LOTS of data, so it is important
to optimize.)
This class encapsulates the mapping of stream ids to
recipient ids, and it is optimized for bulk use and
repeated use (i.e. it remembers values it already fetched).
This particular commit barely improves the performance
of gather_subscriptions_helper, but it sets us up for
further optimizations.
Long term, we may try to denormalize stream_id onto the
Subscription table or otherwise modify the database so we
don't have to jump through hoops to do this kind of mapping.
This commit will help enable those changes, because we
isolate the mapping to this one new class.
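The shape of the class is roughly as follows (method names are
approximations, not the committed API):

    from zerver.models import Recipient

    class StreamRecipientMap:
        def __init__(self):
            # Remember every mapping we have fetched so repeated/bulk calls
            # only hit the database for stream ids we haven't seen yet.
            self.stream_to_recip = {}
            self.recip_to_stream = {}

        def populate_for(self, stream_ids):
            stream_ids = [sid for sid in stream_ids
                          if sid not in self.stream_to_recip]
            if not stream_ids:
                return
            rows = Recipient.objects.filter(
                type=Recipient.STREAM,
                type_id__in=stream_ids,
            ).values('id', 'type_id')
            for row in rows:
                self.stream_to_recip[row['type_id']] = row['id']
                self.recip_to_stream[row['id']] = row['type_id']

        def recipient_id_for(self, stream_id):
            return self.stream_to_recip[stream_id]
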
We were mostly excluding inactive users before this fix, but
now we completely ignore them.
This potentially changes some of the data we return from
get_recipient_info(), but the extra user ids before this fix
were effectively ignored by the caller.
The prior code would queue up feedback messages even if the
feedback bot was deactivated, which was just due to oversight
most likely. (People probably rarely disable the feedback bot,
but they should have that option.)
This sets us up for a subsequent commit where we need more data
from the Subscription table to build recipient info, so the
function boundary doesn't work any more for get_recipient_info,
which is part of the heavily optimized send-message
path.
We used to share code here with typing notifications, but
typing notifications need a lot less data than the
send-message path, so it's useful to decouple these two
things. The idioms that are duplicated here are pretty simple
one-liners.
This commit makes get_recipient_info() faster by never creating
Django ORM objects. We use the ORM to create a values query
instead, and then we iterate over the rows to create various
collections of ids.
In order to avoid lots of code duplication, this commit unifies
how we query UserProfile for PMs and streams. Prior to this
commit we were getting "wide" UserProfile objects out of
our memcached cache. Now we just go to the database with our
list of user_ids. The new approach at worst adds one hop to the
database for PMs, which aren't really a performance bottleneck
(compared to streams). And the new approach actually saves a
hop when both partners aren't in cache (plus we don't pay the
penalty of hitting the cache itself).
The performance improvement here is easy to measure for messages
to streams with many users, even with all the other activity
that goes on inside do_send_messages(). I took test_performance()
in test_messages.py, set num_extra_users to 3000, and consistently
measured a ~20% speedup in do_send_messages().
This commit also eliminates fetching of emails. We probably
could have done that in a prior commit, but in this commit it
is very explicit that we don't need it. While removing email
from the query is a no-brainer, it actually had a negligible
impact on performance. Almost all the savings here come from
not creating UserProfile objects.
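The pattern is roughly the following (the field list is trimmed for
illustration; the real query pulls more columns):

    from zerver.models import UserProfile

    def get_recipient_info_rows(user_ids):
        rows = list(
            UserProfile.objects.filter(id__in=user_ids).values(
                'id',
                'is_active',
                'is_bot',
                'bot_type',
                'enable_online_push_notifications',
            )
        )
        # Build plain sets of ids from the rows; no ORM objects anywhere.
        active_user_ids = {row['id'] for row in rows if row['is_active']}
        push_notify_user_ids = {
            row['id'] for row in rows
            if row['enable_online_push_notifications']
        }
        return rows, active_user_ids, push_notify_user_ids
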
This function returns a summary of recipient data for a message
that's being sent. It's mostly just moving code into the
old function called get_recipient_user_profiles().
This commit is necessary to prevent bringing back emails from the
DB for all N recipients of a message just to see if the feedback
bot is being invoked.
We calculate `service_bot_tuples` earlier in the function, so that
we don't need "full" UserProfile objects later in the function.
This is part of consolidating code that basically just needs to
triage user_ids.
This starts to phase out the need for UserProfile objects in
do_send_messages(). UserProfile objects are expensive to create
for large streams with lots of users. The objects in the code
before this commit aren't even full UserProfile objects.
This change mostly sets up future performance improvements, but
we also get a minor speedup here when we run a test with 3000
stream subscribers.
There is no reason for either render_incoming_message() or
render_markdown() to require full UserProfile objects just to
triage alert words.
By only asking for user_ids, we save extra queries in two
callpaths and we make it easier to start using user_ids in
do_send_messages().
This function is essentially a copy of get_recipient_user_profiles,
which is about to go away. The new function enforces the contract of
typing indicators, which is that they don't apply to streams; this
lets us use a relatively simple approach for getting user profile
objects.
We are diverging this code, because the send-message path needs
more optimizations.
This change introduces an extra hop to the database, but it is
generally faster due to nuances of the DB and the ORM. It
also sets us up to optimize get_recipient_user_profiles() by
avoiding creating ORM objects.
I measured the impact of this using a stream with 3000
subscribers, half of whom were idle, and it speeds things up
by 10%.
Usually a small minority of users are eligible to receive missed
message emails or mobile notifications.
We now filter users first before hitting UserPresence to find idle
users. We also simply check for the existence of recent activity
rather than borrowing the more complicated data structures that we
use for the buddy list.
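A rough sketch of the filter-first approach (the threshold and status
constant are illustrative, not the committed values):

    from datetime import timedelta

    from django.utils import timezone

    from zerver.models import UserPresence

    def filter_presence_idle_user_ids(candidate_user_ids):
        # candidate_user_ids has already been narrowed to users who could
        # even be notified (mentioned users, PM recipients, etc.), so the
        # query stays small.
        if not candidate_user_ids:
            return []
        recent = timezone.now() - timedelta(seconds=140)  # illustrative cutoff
        active_ids = set(
            UserPresence.objects.filter(
                user_profile_id__in=candidate_user_ids,
                status=UserPresence.ACTIVE,
                timestamp__gte=recent,
            ).values_list('user_profile_id', flat=True)
        )
        return sorted(set(candidate_user_ids) - active_ids)
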
This commit completely switches us over to using a
dedicated model called MutedTopic to track which topics
a user has muted.
This includes the necessary migrations to create the
table and populate it from legacy data in UserProfile.
A subsequent commit will actually remove the old field
in UserProfile.
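The model is roughly of this shape (field names and lengths are
approximate, not the exact migration):

    from django.db import models

    from zerver.models import Recipient, Stream, UserProfile

    class MutedTopic(models.Model):
        user_profile = models.ForeignKey(UserProfile, on_delete=models.CASCADE)
        stream = models.ForeignKey(Stream, on_delete=models.CASCADE)
        recipient = models.ForeignKey(Recipient, on_delete=models.CASCADE)
        topic_name = models.CharField(max_length=60)

        class Meta:
            # One row per (user, stream, topic) that has been muted.
            unique_together = ('user_profile', 'stream', 'topic_name')
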
Admins need to know about private streams to delete them, even
if they are not subscribed. We send the minimal info possible
to the client to allow them to have a UI for that.
This never made sense to be a flag on the UserMessage table, since
it's not per-user state. And in fact it doesn't need to be in a
database at all, since it's easily computed from content anyway.
Fixes #1099.
This is mostly pure code extraction.
It also removes some dead code in update_muted_topic, where we
were updating muted_topics spuriously before calling
do_update_muted_topic.
Unlike creating a stream, there's really no reason one would want to
call the function to create a realm while uncertain whether that realm
already existed.
This change is mostly based on a similar commit from hackerkid
in a feature branch. It borrows both code and ideas. Some of
it's my own stuff, as I was working on a newer branch.
We now call get_user_including_cross_realm_email() inside of
user_profiles_from_unvalidated_emails(), instead of using
get_user_profile_by_email.
This requires a few of our callers to pass down sender into us.
One consequence of this change is that we change the symptoms
for trying to send to emails outside of your realm. In some
cases, we simply raise an error that an email is invalid to us
instead of getting into the deeper validate_recipient_user_profiles
check.
We are trying to convert emails to user_profiles earlier in
the codepath. This may cause subtle changes in which errors
appear, but it's probably generally good to report on bad
addressees sooner than later.
This class simplifies the calling sequence to methods like
check_message and _internal_prep_message, and it's also more
type safe.
Checking for message types is encapsulated with calls to is_stream()
and is_private(). There are also shortcut constructors for when you
already know the type of the addressee (stream vs. private), which is
often the case.
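To give a feel for the interface, a stripped-down sketch (the real class
carries more fields and validation):

    class Addressee:
        def __init__(self, msg_type, user_profiles=None,
                     stream_name=None, topic=None):
            self._msg_type = msg_type          # 'stream' or 'private'
            self._user_profiles = user_profiles
            self._stream_name = stream_name
            self._topic = topic

        def is_stream(self):
            return self._msg_type == 'stream'

        def is_private(self):
            return self._msg_type == 'private'

        @staticmethod
        def for_stream(stream_name, topic):
            return Addressee('stream', stream_name=stream_name, topic=topic)

        @staticmethod
        def for_user_profiles(user_profiles):
            return Addressee('private', user_profiles=user_profiles)
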
This function optimizes marking streams and topics as read,
by using UserMessage.where_unread(), which uses a partial
index on the "read" flag.
This also simplifies the code path for ordinary message
flag updates.
In order to keep 100% line coverage, I simplified the
logging in update_message_flags, so now all requests
will show the "actually" format.
This is an interim step toward creating dedicated endpoints
for marking streams/topics as read, so we do error checking
with asserts for flag/operation rather than introducing a
temporary translation string.
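The gist of the optimization, sketched (the bit value and query shape
are illustrative of how the partial index gets used, not the literal
code):

    from django.db.models import F

    from zerver.models import UserMessage

    READ_FLAG = 1  # the "read" bit of the flags bitfield, per this sketch

    def do_mark_stream_as_read(user_profile, recipient_id):
        # The WHERE clause matches the partial index on unread rows, so we
        # never touch UserMessage rows that are already read.
        unread = UserMessage.objects.filter(
            user_profile=user_profile,
            message__recipient_id=recipient_id,
        ).extra(where=['flags & %d = 0' % (READ_FLAG,)])
        return unread.update(flags=F('flags').bitor(READ_FLAG))
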
This is the first part of a larger migration to convert Zulip's
reactions storage to something based on the codepoint, not the emoji
name that the user typed in, so that we don't need to worry about
changes in the names we're using breaking the emoji storage.
We apparently were not correctly clearing the user_profile's email
address from caches when changing email addresses, which meant that
trying to look up the old email in the user_profile caches would still
work.
Fixes #6035.
The "all" option for 'message/flags' was dangerous, as it could
apply to any of our flags. The only flag it made sense for, the
"read" flag, now has a dedicated endpoint.
This change simplifies how we mark all messages as read. It also
speeds up the backend by taking advantage of our partial index
for unread messages. We also use a new statsd indicator.
We are adding a new list of unread message ids grouped by
conversation to the queue registration result. This will allow
clients to show accurate unread badges without needing to load an
unbound number of historic messages.
Jason started this commit, and then Steve Howell finished it.
We only identify conversations using stream_id/user_id info;
we may need a subsequent version that includes things like
stream names and user emails/names for API clients that don't
have data structures to map ids -> attributes.
In anticipation of having all unread message ids available to the
web app in page_params (via a separate effort), we are simplifying
the /topics endpoint to no longer return unread counts.
Instead we have a list of tiny dictionaries with these fields:
name - name of the topic
max_id - max message id for the topic (aka most recent)
The items in the list are ordered most-recent-topic-first.
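The new response can be built with a simple aggregate; a sketch (the
topic column is called "subject" in this era of the schema, and the
query shape is illustrative):

    from django.db.models import Max

    from zerver.models import Message

    def get_topic_history(recipient_id):
        rows = (
            Message.objects.filter(recipient_id=recipient_id)
            .values('subject')
            .annotate(max_id=Max('id'))
            .order_by('-max_id')
        )
        return [{'name': row['subject'], 'max_id': row['max_id']}
                for row in rows]
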
In most cases, we do have the data for which other user was
responsible for subscribing the target user to new streams.
The main case where we don't is when the user is created and gets the
default streams.
ScheduledJob was written for much more generality than it ended up being
used for. Currently it is used by send_future_email, and nothing
else. Tailoring the model to emails in particular will make it easier to do
things like selectively clear emails when people unsubscribe from particular
email types, or seamlessly handle using the same email on multiple realms.
Both the queue processor and ScheduledJob emails need to sometimes pass a
to_user_id and sometimes pass a to_email, and it's more convenient to just
have one function that they can call that can handle either.
Also removes the now redundant send_email_to_user.
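Roughly, the unified entry point behaves like this sketch
(build_and_send stands in for the actual email-building code and is not
a real helper):

    from zerver.models import UserProfile

    def send_email(template_prefix, to_user_id=None, to_email=None,
                   context=None):
        # Exactly one of to_user_id / to_email should be supplied.
        assert (to_user_id is None) != (to_email is None)
        if to_user_id is not None:
            to_email = UserProfile.objects.get(id=to_user_id).email
        build_and_send(template_prefix, to_email, context or {})  # hypothetical
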
This new setting controls whether or not users are allowed to see the
edit history in a Zulip organization. It controls access through 2
key mechanisms:
* For long-ago edited messages, get_messages removes the edit history
content from messages it sends to clients.
* For newly edited messages, clients are responsible for checking the
setting and not saving the edit history data. Since the webapp was
the only client displaying it before this change, this just required
some changes in message_events.js.
Significantly modified by tabbott to fix some logic bugs and add a
test.
I pushed a bunch of commits that attempted to introduce
the concept of `client_message_id` into our server, as
part of cleaning up our codepaths related to messages you
sent (both for the locally echoed case and for the host
case).
When we deployed this, we had some strange failures involving
double-echoed messages and issues advancing the pointer that appeared
related to #5779. We didn't get to the bottom of exactly why the PR
caused havoc, but I decided there was a cleaner approach, anyway.
We are deprecating local_id/local_message_id on the Python server.
Instead of the server knowing about the client's implementation of
local id, with the message id = 9999.01 scheme, we just send the
server an opaque id to send back to us.
This commit changes the name from local_id -> client_message_id,
but it doesn't change the actual values passed yet.
The goal for client_message_id in future commits will be to:
* Have it for all messages, not just locally rendered messages
* Not have it overlap with server-side message ids.
The history behind local_id having numbers like 9999.01 is that
they are actually interim message ids and the numerical value is
used for rendering the message list when we do client-side rendering.
This system hasn't been in active use for several years, and it had
some problems with its design. So it makes sense to just remove it
to declutter the codebase.
Fixes #5655.
This prevents users from accidentally sending a confirmation link
specific to their account to their Zulip administrator if they reply
to the invitation, invitation reminder, account confirmation, or new
email confirmation emails.
Instead of removing an emoji from the database, just mark them as
deactivated so that they can't be used further but can be rendered
properly in reactions and messages.
Fixes: #4750.
No change in behavior.
Also makes the first step towards converting all uses of
settings.ZULIP_ADMINISTRATOR and settings.NOREPLY_EMAIL_ADDRESS to
FromAddress.*.
Once everything is converted, it will be easier to ensure that future
development doesn't break backwards compatibility with the old style of
settings emails.
This will allow for customized senders for emails, e.g. 'Zulip Digest' for
digest emails and 'Zulip Missed Messages' for missed message emails.
Also:
* Converts the sender name to always be "Zulip", if the from_email used to
be settings.NOREPLY_EMAIL_ADDRESS or settings.ZULIP_ADMINISTRATOR.
* Changes the default value of settings.NOREPLY_EMAIL_ADDRESS in the
prod_setting_template to no longer have a display name. The only use of
that display name was in the email pathway.
I think it makes sense to wrest the email sending from confirmation, now
that we have a clean email-sending interface in send_email. A few other
reasons:
* send_confirmation is get_link_for_object followed by send_email, but those
two functions have no arguments in common.
* Sending email through confirmation obfuscates the context dict, and is a
relatively complicated piece of the codebase anyone trying to deal with
the email system has to understand.
* The three emails previously being sent through confirmation don't have
that much in common, other than that they happen to have a confirmation
link in them.
The .split('/')[-1] in registration.py is a hack, but a hack used several
places in the codebase, so maybe one day get_link_for_object will also
return the confirmation_key.
Server settings should just be added to the context in build_email, so that
the individual email pathways (and later, the email testing framework)
don't have to worry about them.
Realm.notifications_stream is not a boolean, Text, or integer field, and
thus doesn't fit into the do_set_realm_property framework. Added a function
to update it in actions.py and altered the view, realm.py, to accept a
stream id. Also, the notifications stream can be disabled by sending a
negative id.
When the last user on a private stream is removed, the stream is no
longer possible to administer, and thus should be marked as
deactivated, so that default streams entries are removed and it no
longer appears in the UI as a non-administerable broken stream.
If you deactivated a default stream, we would correctly remove it from
the list of default streams in the organization. However, we did not
call `send_event`, so browsers would still display it as a default
stream until the next reload.
This fixes that issue by calling do_remove_default_stream instead of
doing the database query directly.
This makes it possible for Zulip administrators to delete messages.
This is primarily intended for use in deleting early test messages,
but it can solve other problems as well.
Later we'll want to play with the permissions model for this, but for
now, the goal is just to integrate the feature.
Note that it saves the deleted messages for some time using the same
approach as Zulip's message retention policy feature.
Fixes #135.
Previously, all notification preference setting had a dedicated test
and setter. Now, all are handled through a modular function using the
property_types framework.
We now pre-populate the streams in DEFAULT_NEW_REALM_STREAMS
(social/general/zulip, unless somebody changes settings.py) with
welcome messages. This makes the streams appear to be active
right away, and it also gives the Zulip realm less of a
blank-slate feeling when you create it.
This change only affects the normal web-based create-realm flow.
It doesn't impact the management commands for creating realms
or setting default streams.
This makes the new user experience in an active community like
chat.zulip.org substantially nicer, since the new user will have the
same level of initial messages to populate topics (etc.) as an
existing user who is caught up.
Without this, there was an undue level of fading-for-inactivity in the
default streams.
Relies on the fact that all the email template names now follow the same
pattern.
Note that there was some template_prefix-like computation being done in
send_confirmation (conditioned on obj.realm.is_zephyr_mirror_realm); that
computation is now being done in the callers.
This increase in the list of needed fields carries a small performance
cost, but it should be very small, and this change is needed to
support outgoing webhooks without additional database queries.
Also puts them into a processing queue, though the queue processor
does nothing.
Rewritten by tabbott to avoid unnecessary database queries in
do_send_messages.
This is a better solution to the problem of how _pg_re_escape should
handle the null character. There's really no good reason to have a
null character in a stream name.
The new function takes a full UserProfile object for the sender,
which allows us to avoid O(N) calls when creating the stream to
find the user profile of the notification bot. (The calls were
already cached, so this won't necessarily be a huge performance
win.)
We also don't have to worry about sending a blank subject any more.
The new, more direct interface for prepping internal stream
messages circumvents the bug-prone extract_recipients() method,
which has the pitfall that it will try to parse a stream name
as JSON. It also takes a UserProfile object for the sender, so
it's a bit more type-safe.
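Its shape is roughly the following (parameter names approximate), leaning
on the Addressee helper and the existing _internal_prep_message:

    def internal_prep_stream_message(realm, sender, stream_name, topic,
                                     content):
        # Addressee and _internal_prep_message come from the surrounding
        # module; no extract_recipients() parsing is involved, and the
        # sender is already a UserProfile.
        addressee = Addressee.for_stream(stream_name, topic)
        return _internal_prep_message(
            realm=realm,
            sender=sender,
            addressee=addressee,
            content=content,
        )
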
- Add file_name field to `RealmEmoji` model and migration.
- Add emoji upload supporting to Upload backends.
- Add uploaded file processing to emoji views.
- Use emoji source url as the basis for the display url.
- Change emoji form for image uploading.
- Fix back-end tests.
- Fix front-end tests.
- Add tests for emoji uploading.
Fixes #1134
This moves the avatar_ fields in page_params to come from
register_ret. Unlike many fields, changing this had a bit of
complexity, because the avatar update events didn't actually contain
some of the details required for moving these into register_ret to
work correctly without races.
We fix that as part of this change.
Modified significantly by tabbott.
The function internal_prep_message is kind of awkward to
call, so I'm moving most of its implementation to
_internal_prep_message() for upcoming refactorings.
- Add aggregated info to real-time updated presence status.
- Update the `presence events` test case to include the aggregated
information in the presence event.
- Add a test case for updating the presence status of a user who
sends state from multiple clients.
Fixes#4282.
The previous logic was that anyone with a link to a file could send it
to other users, but only the owner could make a file realm-public.
This had some confusing corner cases.
The new logic is much simpler:
* Only the file's owner/uploader can include a file in a message for
the first time.
* Anyone with access to read a file can share it with others by
including it in messages they send.
* Once a file has been sent to a public stream, any user in the realm
can access it.
This replaces individual tests for realm properties with a generic
do_test_realm_update_api function to test each property in the
Realm.property_types attribute.
Addresses part of #3854.
Users editing messages or updating message flags are either already
recorded or not interesting from an audit perspective, and so there's
no need to use log_event with them.
This commit adds the backend support for a new style of tutorial which
allows for highlighting of multiple areas of the page with hotspots that
disappear when clicked by the user.
Modify `bot_owner_user_ids()` to return the user_ids of only
admins and bot owners instead of all the current active users.
This was causing a traceback on the frontend.
Fixes: #3391.
- Add message retention period field to organization settings form.
- Add css for retention period field.
- Add a converter to a non-negative int or None.
- Add retention period setting processing to back-end.
- Fix tests.
Modified by tabbott to hide the setting, since it doesn't work yet.
The goal of merging this setting code now is to avoid unnecessary
merge conflicts in the future.
Part of #106.
This fixes 2 issues:
* Being added to an invite_only stream did not correctly update the
"streams" key of the initial state.
* Once that's resolved, subscribe_to_stream when called on a
nonexistent stream would both send a "create" event (from
create_stream_if_needed) and an "occupy" event (from
bulk_add_subscriptions).
The second event should just be suppressed in that case, and this
implements that suppression.
This makes it possible for us to do some convenient validation for
developers, checking whether the correct types are passed for
each realm property.
zerver/lib/actions: removed do_set_realm_* functions and added
do_set_realm_property, which takes in a realm object and the name and
value of an attribute to update on that realm.
zerver/tests/test_events.py: refactored realm tests with
do_set_realm_property.
Kept the do_set_realm_authentication_methods and
do_set_realm_message_editing functions because their function
signatures are different.
Addresses part of issue #3854.
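A condensed sketch of the generic setter (the event plumbing is
simplified, and send_event/active_user_ids are assumed from the
surrounding module):

    from zerver.models import Realm

    def do_set_realm_property(realm, name, value):
        # property_types maps attribute name -> expected Python type, so
        # one assert replaces a per-property setter.
        property_type = Realm.property_types[name]
        assert isinstance(value, property_type), (
            'Cannot update %s: %s is not a %s' % (name, value, property_type)
        )
        setattr(realm, name, value)
        realm.save(update_fields=[name])
        event = dict(type='realm', op='update', property=name, value=value)
        send_event(event, active_user_ids(realm.id))  # plumbing simplified
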
This makes get_stream match get_realm, get_user_profile_by_email,
etc., in interface, and is more convenient for mypy annotations
because `get_stream` now doesn't return an Optional[Stream].
We use the same strategy Zulip already uses for starred messages,
namely, creating a new UserMessage row with the "historical" flag set
(which basically means Zulip can ignore this row for most purposes
that use UserMessage rows). The historical flag is ignored, however,
in determining which users' browsers to notify about new reactions,
and thus the user will get to see the reaction appear when they click
a message (and any reactions other users later add, as well!).
There's still something of a race here, in that if some users react to
a message while the user is looking at the unsubscribed stream but
before the user reacts to that message, those reactions will not be
displayed to that user (so counts will be a bit lower, or something).
This race feels small enough to ignore for now.
Fixes #3345.
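In code, the historical-row trick is roughly the following (flag names
per the bitfield; sketch only, not the committed implementation):

    from zerver.models import UserMessage

    def ensure_usermessage_for_reaction(user_profile, message):
        # If the reacting user never received the message (e.g. an
        # unsubscribed public stream), give them a UserMessage row flagged
        # as historical so the reaction can be stored and rendered, without
        # treating the row as a normally delivered message.
        UserMessage.objects.get_or_create(
            user_profile=user_profile,
            message=message,
            defaults={
                'flags': UserMessage.flags.historical | UserMessage.flags.read,
            },
        )
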
This adds an organization description field to the Realm model, as well as
an input field to the organization settings template. Added three tests.
Set the max length of the field to 100 characters.
Fixes#3962.
validate_user_access_to_subscribers_helper never uses
stream_dict['realm__domain']. I imagine it was there originally to do the
is_zephyr_mirror_realm check.
Previously we used the topic "Realm.domain" for new user signups, but topic
"Realm.string_id" for the realm creation. This changes the user signup
messages to be on the same topic thread as the realm creation.
All current calls to do_activate_user just use the default value of
timezone.now(). Having a date_joined other than timezone.now() raises an
interesting RealmAuditLog question (namely, which time should be used),
which we don't have to answer if we remove the argument.