This makes tests of queue processors more realistic,
by adding a parameter to `queue_json_publish` that
calls a queue's consumer function if accessed in a test.
Fixes part of #6542.
Nobody has used this feature in years, and it causes certain types of
markdown issues in development to completely DoS the development
environment by making it possible for the "Bugdown timeout" exception
handler to timeout in bugdown.
Since we already send an email to the server administrators, there's
no need to replace this feature with anything.
This function is designed to replace avatar_url() and
avatar_url_from_dict() over time.
There are a few things new about it:
* We make the parameters more explicit, rather than
passing in an opaque dictionary or requiring a
UserProfile object. (A lot of our callers want
to use `values()` for efficiency sake, since we
are often doing bulk user operations.)
* We start to support the client_gravatar option.
The `is_mentioned` flag in message events was buggy. We now
look directly at flags.
We will kill off `is_mentioned` in a subsequent commit.
We also remove some debugging code in the test that was failing
before this fix. The test would only fail when `is_mentioned`
was wrong, which never happened when you ran a single test, and
which would happen randomly when you ran multiple tests.
Add this field to the Stream model will prevent us from having
to look at realm data for several types of stream operations, which
can be prone to either doing extra database lookups or making
our cached data bloated.
Going forward, we'll set stream.is_zephyr to True whenever the
realm's string id is "zephyr".
This removes sender names from the message cache, since
they aren't guaranteed to be valid, and they're inexpensive
to add.
This commit will make the message cache entries smaller
by removing sender___full_name and sender__short_name
fields.
Then we add in the sender fields to the message payloads
by doing a query against the unique sender ids of the
messages we are processing.
This change leads to 2 extra database hops for most of
our message-related codepaths. The reason there are 2 hops
instead of 1 is that we basically re-calculate way too
much data to get a no-markdown dictionary.
Introduce MessageDict.post_process_dicts() will allow us
the ability to do the following:
* use less memory in the cache for repeated data
* prevent cache invalidation
* format data according to different client needs
The first use of this function is pretty inconsequential, but
it sets us up for more consequential changes.
In this commit we defer the MessageDict.hydrate_recipient_info
step until after we pull data out of the cache. This impacts
cache size as follows:
* streams - negligibly bigger
* PMs/huddles - slimmer due to not needing to repeat
sender data like email/full_name
Again, the main point of this change is to start setting up
the infrastructure to do post-processing.
This is a first step to eventually slimming the message cache,
but there are still some moving parts there to be worked through.
The more immediate benefit of extracting this function is that
we can put tests on it. Also, it isolates some functionality
that may go away as our clients gets smarter.
Since subscribed_to_stream is only doing an id lookup
on the Stream model to find out if a user is subscribed to
a stream, there's no reason to require a full Stream object.
It's currently the case that all callers do have full Stream
objects handy to pass in to this function, but it's still a
good practice to have functions only ask for objects that they
need.
We now return user_ids for subscribers to streams in add-stream
events. This allows us to eliminate the UserLite class for
both bulk adds and bulk removes. It also simplifies some JS
code that already wanted to use user_ids, not emails.
Fixes#6898
This function truncates the textual content at correct length.
(It will be updated later to handle corner cases of unicode
combining characters and tags when we start supporting them.)
Using lightweight objects will speed up adding new users
to realms.
We also sort the query results, which lets us itertools.groupby
to more efficiently build the data structure.
Profiling on a large data set shows about a 25x speedup for this
function, and before the optimization, this function accounts
for most of the time spend in bulk_add_subscriptions.
There's a lot less memory to allocate. I didn't measure
the memory difference.
When we test-deployed this to chat.zulip.org, we got about a 6x
speedup.
Sort of a hacky hammer, but
* The original design of the analytics system mistakenly attempted to play
nicely with non-UTC datetimes.
* Timezone errors are really hard to find and debug, and don't jump out that
easily when reading code.
I don't know of any outstanding errors, but putting a few "assert this
timezone is in UTC" around will hopefully reduce the chance that there are
any current or future timezone errors.
Note that none of these functions are called outside of the analytics code
(and tests). This commit also doesn't change any current behavior, assuming
a database where all datetimes have been being stored in UTC.
Previously, entering a non-UTC end time for a daily stat would give you
incorrect results. This is because:
* All daily stats are collected at and have end_times in the database in
midnight UTC.
* For daily stats, time_range returns a list of datetimes at midnight in the
timezone of its end argument. These datetimes are the only ones we look
for when looking for rows corresponding to the stat in the database.
* Previously, we passed on the end argument from the API to time_range,
without modification.
The logic to apply events to page_params['unread_msgs'] was
complicated due to the aggregated data structures that we pass
down to the client.
Now we defer the aggregation logic until after we apply the
events. This leads to some simplifications in that codepath,
as well as some performance enhancements.
The intermediate data structure has sets and dictionaries that
generally are keyed by message_id, so most message-related
updates are O(1) in nature.
Also, by waiting to compute the counts until the end, it's a
bit less messy to try to keep track of increments/decrements.
Instead, we just update the dictionaries and sets during the
event-apply phase.
This change also fixes some corner cases:
* We now respect mutes when updating counts.
* For message updates, instead of bluntly updating
the whole topic bucket, we update individual
message ids.
Unfortunately, this change doesn't seem to address the pesky
test that fails sporadically on Travis, related to mention
updates. It will change the symptom, slightly, though.
We now have two helper functions:
* get_raw_unread_data
* aggregate_unread_data
Separating the concerns is nice. The first function does
all the data collection. The second function should be fast,
and it only re-organizes the data into an aggregated form
that makes the page_params payload smaller and easier for
clients to work with.
For the first function, we try to return data structures
that are easier to manipulate than the end result. This
will allow us to apply events more easily, in a subsequent
commit.
Instead of using `unified_reactions` mapping start using
`name_to_codepoint` mapping for converting emoji name to
codepoints. We were using `unified_reactions` mapping
because prior to emoji web PR `name_to_codepoint` mapping
was generated using emoji_map.json which contained old
codepoints but for reactions new codepoints were required
to display them using sprite sheets.