After a message was reset in our caches cache via message editing or
adding/removing a reaction, we were sending corrupt data to the cache
because build_message_dict (and thus build_dict_from_raw_db_row) was
improperly being called before sewing in the reaction data.
As a result, we were sending raw database data in the reaction
dictionaries, rather than the reformatted version expected by the API.
Bug introduced in 2a4c62a326.
Fixing this correctly required moving the rendering_realm_id logic one
step higher in the call chain, which is a useful refactoring anyway
(since we're no longer passing a `Message` object down)
During events such as stream / topic name edit for a topic, we were
running queries to db in loop for each message for reactions,
submessages and realm_id. This commit reduces the queries to be
done only for realm_id, which is yet to be fixed.
This is accomplished by building messages with empty reactions
and submessages and then updating them in the messages using bulk
queries.
Previously, alert words were a JSON list of strings stored in a
TextField on user_profile. That hacky model reflected the fact that
they were an early prototype feature.
This commit migrates from that to a separate table, 'AlertWord'. The
new AlertWord has user_profile, word, id and realm(denormalization so
we can provide a nice index for fetching all the alert words in a
realm).
This transition requires moving the logic for flushing the Alert Words
caches to their own independent feature.
Note that this commit should not be cherry-picked without the
following commit, which fixes case-sensitivity issues with Alert Words.
Previously, the message and event APIs represented the user differently
for the same reaction data. To make this more consistent, I added a
user_id field to the reaction dict for both messages and events. I
updated the front end to use the user_id field rather than the user
dict. Lastly, I updated front end and back end tests that used user
info.
I primarily tested this by running my local Zulip build and
adding/removing reactions from messages.
Fixes#12049.
Generated by `pyupgrade --py3-plus --keep-percent-format` on all our
Python code except `zthumbor` and `zulip-ec2-configure-interfaces`,
followed by manual indentation fixes.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
When more than one outgoing webhook is configured,
the message which is send to the webhook bot passes
through finalize_payload function multiple times,
which mutated the message dict in a way that many keys
were lost from the dict obj.
This commit fixes that problem by having
`finalize_payload` return a shallow copy of the
incoming dict, instead of mutating it. We still
mutate dicts inside of `post_process_dicts`, though,
for performance reasons.
This was slightly modified by @showell to fix the
`test_both_codepaths` test that was added concurrently
to this work. (I used a slightly verbose style in the
tests to emphasize the transformation from `wide_dict`
to `narrow_dict`.)
I also removed a deepcopy call inside
`get_client_payload`, since we now no longer mutate
in `finalize_payload`.
Finally, I added some comments here and there.
For testing, I mostly protect against the root
cause of the bug happening again, by adding a line
to make sure that `sender_realm_id` does not get
wiped out from the "wide" dictionary.
A better test would exercise the actual code that
exposed the bug here by sending a message to a bot
with two or more services attached to it. I will
do that in a future commit.
Fixes#14384
If I send a message from a normal Zulip client, it is
considered to be "read" by me. But if I send it via
an API program (using my human account), the message
is not immediately "read" by me.
Now we handle this correctly in `get_raw_unread_data`.
The symptom of this was that these messages would get
"stuck" in "Private Messages" narrows until the next
time you reloaded your app.
This change should prevent test flakes, plus
it's more deterministic behavior for clients,
who will generally comma-join the ids into
a key for their internal data structures.
I was able to verify test coverage on this
by making the sort reversed, which would
cause test_huddle_send_message_events to
fail.
Previously, get_recent_private_messages could take 100ms-1s to run,
contributing a substantial portion of the total runtime of `/`.
We fix this by taking advantage of the recent denormalization of
personal_recipient into the UserProfile model, allowing us to avoid
the complex join with Recipient that was previously required.
The change that requires additional commentary is the change to the
main, big SQL query:
1. We eliminate UserMessage table from the query, because the condition
m.recipient_id=%(my_recipient_id)d
implies m is a personal message to the user being processed - so joining
with usermessage to check for user_profile_id and flags&2048 (which
checks the message is private) is redundant.
2. We only need to join the Message table with UserProfile
(on sender_id) and get the sender's personal_recipient_id from their
UserProfile row.
Fixes#13437.
Previously, we were using user_profile.email rather than
user_profile.delivery_email in all calculations involving Gravatar
URLs, which meant that all organizations with the new
EMAIL_ADDRESS_VISIBILITY_ADMINS setting enabled had useless gravatars
not based on the `user15@host.domain` type fake email addresses we
generate for the API to refer to users.
The fix is to convert these calculations to use the user's
delivery_email. Some refactoring is required to ensure the data is
passed through to the parts of the codebase that do the check;
fortunately, our automated tests of schemas are effective in verifying
that the new `sender_delivery_email` field isn't visible to the API.
Fixes#13369.
Apparently, the refactor months ago that introduced finalize_payload
wasn't applied to the outgoing webhook code path, resulting in message
dicts with an unexpected format with no avatar_url and some extra
values that were intended to be internal details not relevant to
external clients.
Because this API is not widely used, we expect there to be little to
no impact of converting this back to matching the `get_messages`
interface, as it once was and has always been intended to be.
The one somewhat tricky detail is that we include both the `content`
and `rendered_content` fields, rather than asking the client to pick
which they want via the `apply_markdown` flag, because there is no
place for the client to configure that setting.
Priviously, we rendered the topic links using the msg.sender.realm.
This resulted in issues with Zulip's internal bots not having access
to the realm_filters of the destination stream's realm. For example,
sending a message via the email gateway or notification would not
linkify any realm filters that a user would expect them to.
Fixes#1727.
With the server down, apply migrations 0245 and 0246. 0246 will remove
the pub_date column, so it's essential that the previous migrations
ran correctly to copy data before running this.
Previously, the unread_msgs data structure accounting (used for both
the web and mobile apps to determine the "Unread mentions" count
displayed in the UI) did not include wildcard mentions at all.
We fix this by adding the logic required to include properly that
data, with tests. As discussed in #6040, it makes sense to include
muted streams and topics for the purpose of this calculation.
Fixes part of #6040.
This was used as a helper to construct the final display_recipient when
fetching messages. With the new mechanism of constructing
display_recipient by fetching appropriate users/streams from the
database and cache, this shouldn't be needed anymore.
Instead of having the rather unclear type Union[str,
List[UserDisplayRecipient]] where display_recipient of message dicts was
involved, we use DisplayRecipientT (renamed from DisplayRecipientCacheT
- since there wasn't much reason to have the word Cache in there), which
makes it clearer what is the actual nature of the objects and gets rid
of this pretty big type declaration.
Since the display_recipients dictionaries corresponding to users are
always dictionaries with keys email, full_name, short_name, id,
is_mirror_dummy - instead of using the overly general Dict[str, Any]
type, we can define a UserDisplayRecipient type,
using an appropriate TypedDict.
The type definitions are moved from display_recipient.py to types.py, so
that they can be imported in models.py.
Appropriate type adjustments are made in various places in the code
where we operate on display_recipients.
The user information in display_recipient in cached message_dicts
becomes outdated if the information is changed in any way.
In particular, since we don't have a way to find all the message
objects that might contain PMs after an organization toggles the
setting to hide user email addresses from other users, we had a
situation where client might see inaccurate cached data from before
the transition for a period of up to hours.
We address this by using our generic_bulk_cached_fetch toolchain to
ensure we always are fetching display_recipient data from the database
(and/or a special recipient_id -> display_recipient cache, which we
can flush easily).
Fixes#12818.
The typing for generic_bulk_cached_fetch is complicated, and was
recorded incorrectly previously for the case where a cache_transformer
function is required. We fix this by adding the new CacheItemT, and
additionally add comments explaining what's going on with these types
for future reference.
Thanks to Mateusz Mandera for raising this issue.
Previous cleanups (mostly the removals of Python __future__ imports)
were done in a way that introduced leading newlines. Delete leading
newlines from all files, except static/assets/zulip-emoji/NOTICE,
which is a verbatim copy of the Apache 2.0 license.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
This gives us access to typing_extensions.Deque, which was not added
to typing until 3.5.4.
(PROVISION_VERSION is not bumped because the transitive dependency set
in dev.txt hasn’t changed.)
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
This renames Subscription.in_home_view field to is_muted, for greater
clarity as to what it does just from seeing the setting name, without
having to look it up.
Also disabled an obsolete test_migrations test.
Fixes#10042.
This adds experimental support in /register for sending key
statistical data on the last 1000 private messages that the user is a
participant in. Because it's experimental, we require developers to
request it explicitly in production (we don't use these data yet in
the webapp, and it likely carries some perf cost).
We expect this to be extremely helpful in initializing the mobile app
user experience for showing recent private message conversations.
See the code comments, but this has been heavily optimized to be very
efficient and do all the filtering work at the database layer so that
we minimize network transit with the database.
Fixes#11944.
This commit leverages the ahocorasick algorithm to build a set of user_ids
that have their alert_words present in the message. It runs in linear time
of the order of length of the input message as opposed to number of
alert_words. This is after building a ahocorasick Automaton which runs
in O(number of alert_words in entire realm) which is usually cached.
We'll use this in the push-notifications code, in a context where
there should definitely already be UserMessage rows if everything's
gone normally... but explicitly checking at the top seems like the
right pattern from a secure-coding perspective.
For the import-data codepath, we will call
the extracted function directly in a
subsequent commit.
The do_render_markdown() function has more
required parameters, which allows for more
explicit code and also allows us to flatten
out some logic related to alert words. (We
just pass in empty sets/dicts as needed).
After the messages have been imported, set the rendered_content of the
messages instead of leaving its value to be 'None'.
This is important to ensure that:
(1) Performance for users is good after completing the import.
(2) The database's full-text indexes have all of the imported messages
(which only happens properly when Message rows have their
rendered_content field edited).
Fixes#9168.
This doesn't seem to have a huge performance downside (less than 1s
extra time for loading / on chat.zulip.org), and it means the
possibility of users having so many unreads that we get weird/buggy
behavior is much more unlikely to exist.
We'll still want a better experience for users who somehow go over
this limit, but it can be pretty firmly "you need to go mark some
things as read".
This should have no effect for now, but it'll make things a bit
simpler in case we make future changes to support public streams
without history public to subscribers (and other organization
members).
Generally emails are not written with markdown in mind and hence
sometimes render in strange ways. This commit fixes a particular
issue that was causing whitespace before paragraphs to be treated
as code block due to which email content was being rendered in a
box that scrolls in right direction a lot.
Fixes: #7045.
This refactoring doesn't change behavior, but it sets us up
to more easily handle a register setting for `client_gravatar`,
which will allow clients to tell us they're going to compute
their own gravatar URLs.
The `client_gravatar` flag already exists in our code, but it
is only used for Django views (users/messages) but not for
Zulip events.
The main change is to move the call to `set_sender_avatar` into
`finalize_payload`, which adds the boolean `client_gravatar`
parameter to that function. And then we update various callers
to supply that flag.
One small performance benefit of this change is that we now
lazily compute the client message payloads in
`event_queue.process_message_event` now, so this will improve
performance if all interested clients have the same value of
`apply_markdown`. But the change here is really preparing us
for the additional boolean parameter, which will cause us to
have four variations of the payload.
This adds the data model and bugdown support for the new UserGroup
mention feature.
Before it'll be fully operational, we'll still need:
* A backend API for making these.
* A UI for interacting with that API.
* Typeahead on the frontend.
* CSS to make them look pretty and see who's in them.
We now have a MentionData class that encapsulates
the users who are possibly mentioned in a message.
Not that the rendering code may not keep all the mentions,
since things like backticks will suppress the mention.
We populate this now in do_send_messages, so that we can use
the info earlier in the message-sending process. This info
now gets passed down the call stack as an optional parameter.
Note that bugdown.convert() still populates the data when its
callers decline to pass in a MentionData object.
This is mostly a preparatory commit, as we don't take advantage
of the data yet in do_send_messages.
In do_send_messages, we only produce one dictionary for
the event queues, instead of different flavors for text
vs. html. This prevents two unnecessary queries to the
database.
It also means we only put one dictionary on the "message"
event queue instead of two, albeit a wider one that has
some values that won't be sent to the actual clients.
This wider dictionary from MessageDict.wide_dict is also
used for the `feedback_messages` queue and service bot
queues. Since the extra fields are possibly useful down
the road, and they'll just be ignored for now, we don't
bother to remove them. Also, those queue processors won't
have access to `content_type`, which they shouldn't need.
Fixes#6947
Before this change, we populated two cache entries for each
message that we sent. The entries were largely redundant,
with the only difference being whether we sent the content
as raw markdown or as the rendered HTML.
This commit makes it so we only have one cache entry per
message, and it includes both content and rendered_content.
One legacy source on confusion here is that `content`
changes meaning when you're on the front end. Here is the
situation going forward:
database:
content = raw
rendered_contented = rendered
cache entry:
content = raw
rendered_contented = rendered
payload for the frontend:
content = raw (for apply_markdown=False)
content = rendered (for apply_markdown=True)
Clients fetching messages can now specify that they are able
to compute their avatar, and if they set client_gratavar to
True in the request (w/our normal encoding scheme), then the
backend will not compute it, and the payload will be smaller.
The fix starts with get_messages_backend. The flag gets
passed down through these functions:
* MessageDict.post_process_dicts.
* MessageDict.set_sender_avatar.
We also fix up the callers for post_process_dicts to explicitly
pass in the client_gravatar path, but for now they all just hard
code the value to False.
This change makes the cache entries smaller for message
dictionaries. It also ensures we get valid data put into
message dictionaries if, for example, the sender's avatar
changes.
After this change, all of the attributes for a message
sender are only fetched during post-processing with two
exceptions:
* We get sender_id for "free" from the message,
and it's the primary key that we need to figure
out which data to fetch in post-processing.
* We need sender_realm_id to be able to cache topic
links, and a sender's realm id will never change,
so it's not a concern for invalidating cache rows.
All the other attributes are either likely to change (e.g.
sender avatar_version) and/or impact the size of cache
entries more severely than the two small id fields above.
This change should improve our overall system performance
by reducing the amount of memory used by every N message
rows we cache, and typically N will be in the thousands or
so on a large realm.
The other major implication of this change is that when
a user changes their avatar, and then later messages that
the user sent are fetched, all of the fields that go into
computing the avatar url will be pulled from the database,
not from cache.
Message.get_raw_db_rows is moved to MessageDict, since its
implementation details are highly coupled to other methods
in MessageDict.
And then sew_messages_and_reactions comes along for the
ride.
We eventually want to move Reaction.get_raw_db_rows to there
as well.
We now populate the avatar url as part of the post
processing step of building message dictionaries,
so that the avatar url is no longer in cache.
This change makes the cache slimmer, because instead
of caching the avatar url (which often includes a long
hash), we just cache the smaller fields that are used
to compute the url.
Note that this commit still has the problem that we're
essentially computing the avatar url from cached fields
that can be invalid. We will address that a few commits
later.
An immediate benefit of this change is that how we compute
avatar urls (or whether we compute them all) is now decoupled
from caching concerns. We will address this later as
well. (Some clients will be capable of computing their
own gravatar urls, for example.)
We're about to have multiple post-processing stages for building
message dictionaries. Rather than having individual "hydration"
methods remove intermediate values, we just wait until the end.
This decouples the hyrdration steps. The potentional problem
here is that we may have a field like sender_is_mirror_dummy
that isn't part of the final payload, but we need it for
calculating display recipients and avatars. We don't want to
delete it too early from the objects.
The `is_mentioned` flag in message events was buggy. We now
look directly at flags.
We will kill off `is_mentioned` in a subsequent commit.
We also remove some debugging code in the test that was failing
before this fix. The test would only fail when `is_mentioned`
was wrong, which never happened when you ran a single test, and
which would happen randomly when you ran multiple tests.
This removes sender names from the message cache, since
they aren't guaranteed to be valid, and they're inexpensive
to add.
This commit will make the message cache entries smaller
by removing sender___full_name and sender__short_name
fields.
Then we add in the sender fields to the message payloads
by doing a query against the unique sender ids of the
messages we are processing.
This change leads to 2 extra database hops for most of
our message-related codepaths. The reason there are 2 hops
instead of 1 is that we basically re-calculate way too
much data to get a no-markdown dictionary.
Introduce MessageDict.post_process_dicts() will allow us
the ability to do the following:
* use less memory in the cache for repeated data
* prevent cache invalidation
* format data according to different client needs
The first use of this function is pretty inconsequential, but
it sets us up for more consequential changes.
In this commit we defer the MessageDict.hydrate_recipient_info
step until after we pull data out of the cache. This impacts
cache size as follows:
* streams - negligibly bigger
* PMs/huddles - slimmer due to not needing to repeat
sender data like email/full_name
Again, the main point of this change is to start setting up
the infrastructure to do post-processing.
This is a first step to eventually slimming the message cache,
but there are still some moving parts there to be worked through.
The more immediate benefit of extracting this function is that
we can put tests on it. Also, it isolates some functionality
that may go away as our clients gets smarter.
The logic to apply events to page_params['unread_msgs'] was
complicated due to the aggregated data structures that we pass
down to the client.
Now we defer the aggregation logic until after we apply the
events. This leads to some simplifications in that codepath,
as well as some performance enhancements.
The intermediate data structure has sets and dictionaries that
generally are keyed by message_id, so most message-related
updates are O(1) in nature.
Also, by waiting to compute the counts until the end, it's a
bit less messy to try to keep track of increments/decrements.
Instead, we just update the dictionaries and sets during the
event-apply phase.
This change also fixes some corner cases:
* We now respect mutes when updating counts.
* For message updates, instead of bluntly updating
the whole topic bucket, we update individual
message ids.
Unfortunately, this change doesn't seem to address the pesky
test that fails sporadically on Travis, related to mention
updates. It will change the symptom, slightly, though.
We now have two helper functions:
* get_raw_unread_data
* aggregate_unread_data
Separating the concerns is nice. The first function does
all the data collection. The second function should be fast,
and it only re-organizes the data into an aggregated form
that makes the page_params payload smaller and easier for
clients to work with.
For the first function, we try to return data structures
that are easier to manipulate than the end result. This
will allow us to apply events more easily, in a subsequent
commit.
There is no reason for either render_incoming_message() or
render_markdown() to require full UserProfile objects just to
triage alert words.
By only asking for user_ids, we save extra queries in two
callpaths and we make it easier to start using user_ids in
do_send_messages().
This never made sense to be a flag on the UserMessage table, since
it's not per-user state. And in fact it doesn't need to be in a
database at all, since it's easily computed from content anyway.
Fixes#1099.