This implements a significant performance optimization for users
clicking the `Private messages` narrow in the Zulip UI, especially for
those users who do not have 50 recent private messages in an
organization with a lot of stream message traffic (because then
previously, postgres needed to scan through a huge amount of history
to find enough private messages).
The database index powering it can also support many other queries we
might want to do in the future to support "recent conversations" type
features.
Fixes#6896.
The use_first_unread_anchor parameter allows automatically setting the
anchor to the first message that hasn't been read in this narrow.
Therefore it isn't necessary to specify an anchor when this parameter is
enabled.
Note from Tim: Arguably, we should think about making
`use_first_unread_anchor` the default behavior when anchor is
unspecified, but that's for later consideration.
The only changes visible at the AST level, checked using
https://github.com/asottile/astpretty, are
zerver/lib/test_fixtures.py:
'\x1b\\[(1|0)m' ↦ '\\x1b\\[(1|0)m'
'\\[[X| ]\\] (\\d+_.+)\n' ↦ '\\[[X| ]\\] (\\d+_.+)\\n'
which is fine because re treats '\\x1b' and '\\n' the same way as
'\x1b' and '\n'.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
This commit adds a new field history_public_to_subscribers to the
Stream model, which serves a similar function to the old
settings.PRIVATE_STREAM_HISTORY_FOR_SUBSCRIBERS; we still use that
setting as the default value for new streams to avoid breaking
backwards-compatibility for those users before we are ready with an
actual UI for users to choose directly.
This also comes with a migration to set the value of the new field for
existing streams with an algorithm matching that used at runtime.
With significant changes by Tim Abbott.
This is an initial part of our efforts on #9232.
For certain queries where both include_history and
use_first_unread_anchor are set to True, we were excluding
historical rows. Now we only use the use_first_unread_anchor
flag to filter rows that we use to find the anchor, without
having it filter the actual search results.
The bug went unreported for a long time, because it only
affected mobile users who had newly subscribed to streams.
Note that we make a small change to the test called
test_use_first_unread_anchor_with_muted_topics, which has
a very scary comment about being "arcane" and "be
absolutely sure you know what you're doing." I think it's
fine.
Also, the new test code would fail before this fix, so it
should help prevent future regressions.
Fixes#8958
This is a bit more than a pure refactor, because we duplicate a
chunk of code to calculate a query inside of
find_first_unread_anchor(), so we're doing a bit more work
than before.
We need this refactoring to start decoupling find_first_unread_anchor
from get_messages_backend for the case where include_history is
True. This will happen in a subsequent commit.
The only test that changes here is a direct test on
find_first_unread_anchor(). All other tests pass without
modification, and we have decent coverage on get_messages_backend.
We don't indend for this server-level setting to exist in the long
term; the purpose of this is just to make it easy to test this code
path for development purposes.
This implements much of the Message side part of #2745.
We now consistently set our query limits so that we get at
least `num_after` rows such that id > anchor. (Obviously, the
caveat is that if there aren't enough rows that fulfill the
query, we'll return the full set of rows, but that may be less
than `num_after`.) Likewise for `num_before`.
Before this change, we would sometimes return one too few rows
for narrow queries.
Now, we're still a bit broken, but in a more consistent way. If
we have a query that does not match the anchor row (which could
be true even for a non-narrow query), but which does match lots
of rows after the anchor, we'll return `num_after + 1` rows
on the right hand side, whether or not the query has narrow
parameters.
The off-by-one semantics here have probably been moot all along,
since our windows are approximate to begin with. If we set
num_after to 100, its just a rough performance optimization to
begin with, so it doesn't matter whether we return 99 or 101 rows,
as long as we set the anchor correctly on the subsequent query.
We will make the results more rigorous in a follow up commit.
If anchor is 0, there is no sense doing a before_query.
Likewise, if anchor is `LARGER_THAN_MAX_MESSAGE_ID`, there is
no sense doing an after_query.
We introduce variables called `need_before_query` and
`need_after_query` to enforce those conditions.
This also adds some comments explaining the fallthrough case
where neither query makes sense.
If use_first_unread_anchor is set and we don't have any unread
messages, then our anchor is effectively "positive infinity" and
we can streamline queries.
In the past we'd have clauses like `message_id <= 999999999999999`
in the query that were harmless but crufty.
This field has been unused by clients for some time, and isn't great
for our public archive feature plans (where we'll not want to be
including email addresses in messages).
This requires updating one of the tests for the group_pm_with feature
in test_narrow to use the new style of tautology generated by SQLAlchemy.
Thanks to Sinwar for investigating this.
Fixes#8381.
We don't have our linter checking test files due to ultra-long strings
that are often present in test output that we verify. But it's worth
at least cleaning out all the ultra-long def lines.
This refactoring doesn't change behavior, but it sets us up
to more easily handle a register setting for `client_gravatar`,
which will allow clients to tell us they're going to compute
their own gravatar URLs.
The `client_gravatar` flag already exists in our code, but it
is only used for Django views (users/messages) but not for
Zulip events.
The main change is to move the call to `set_sender_avatar` into
`finalize_payload`, which adds the boolean `client_gravatar`
parameter to that function. And then we update various callers
to supply that flag.
One small performance benefit of this change is that we now
lazily compute the client message payloads in
`event_queue.process_message_event` now, so this will improve
performance if all interested clients have the same value of
`apply_markdown`. But the change here is really preparing us
for the additional boolean parameter, which will cause us to
have four variations of the payload.
tsearch_extras returns search offsets in bytes but our highlight
function treated them as character offsets. Added a check to subtract
extra bytes if the tsearch search backend is being used.
Fixes#4084.
Fixes#7021.
Do you call get_recipient(Recipient.STREAM, stream_id) or
get_recipient(stream_id, Recipient.STREAM)? I could never
remember, and it was not very type safe, since both parameters
are integers.