Commit Graph

34262 Commits

Author SHA1 Message Date
Steve Howell d227988519 tests: Split up sort_recipient tests. 2020-01-03 14:58:05 -08:00
Steve Howell cde01aeeb0 tests: Avoid list mutation.
To test dups we can just create a new list.
2020-01-03 14:58:05 -08:00
Steve Howell 5c43180a70 tests: Use names for test objects. 2020-01-03 14:58:05 -08:00
Steve Howell 49ba916be7 refactor: Rename *_for_at_mentioning functions.
This name was misleading, since this code is used
in sort_recipients, which happens when you, for
example, autocomplete persons in the "To:" box
when composing (and has nothing to do with
mentioning).
2020-01-03 14:58:05 -08:00
Steve Howell 1577662a67 refactor: Clean up exports.compose_matches_sorter. 2020-01-02 12:11:50 -08:00
Steve Howell c2c5878c3a refactor: Clean up compose_content_matcher.
The switch statement is easier to read, and
we also want to eventually remove the "this"
that couples us to the awkward typeahead
hacks.
2020-01-02 12:11:50 -08:00
Steve Howell ebf4195bf3 refactor: Extract clean_query_lowercase().
This makes it a bit easier to find common patterns,
plus it sets us up to pull the calls even further
up the stack.

The first rule of dealing with user data is sanitize
at the edges, not deep down in some function that
has many callers.  Putting this code so deep down
in the stack means it's more likely to be called in
a loop.
2020-01-02 12:11:48 -08:00
Steve Howell 4699710856 refactor: Move clean_query further up the stack.
This moves clean_query into all the callers
of query_matches_source_attrs.

This doesn't change anything performance-wise,
but it sets up future commits.
2020-01-02 12:10:10 -08:00
Steve Howell 8448832bfe refactor: Move clean_query up the stack.
This change is easy--we only had one caller.

This change means any query going against a
target with multiple `match_attrs`, such as
user names (first name, last) only has to
clean the query once per person.
2020-01-02 12:10:10 -08:00
Steve Howell 5b01efda7b typeahead: Extract clean_query helper. 2020-01-02 12:10:07 -08:00
Steve Howell b5d0eab0c6 dict: Add filter_values() method.
This method can help us avoid some memory
allocations.
2020-01-02 12:03:45 -08:00
Steve Howell 8b04cf1288 people: Use is_my_user_id in get_people_for_stream_create.
We want to get away from email-based checks.
2020-01-02 12:03:43 -08:00
Steve Howell 7229a943f0 tests: Use add_in_realm for "me" in people tests.
This is more realistic for testing.
2020-01-02 12:03:04 -08:00
Steve Howell 54cb857fee refactor: Rename people.get_rest_of_realm().
We want to mostly deprecate this function (see
the comment I added), so I gave it a more specific
name.

Ideally I'd just fix `stream_create`, but it does
use this function in a couple places, and it's helpful
to reuse the same sort here.  In one place stream_create
actually unshifts the "me" user back to the top of the
list, which makes sense for its use case.
2020-01-02 12:03:04 -08:00
Steve Howell 405a529340 server: Sort user_ids in recent PM conversations.
This change should prevent test flakes, plus
it's more deterministic behavior for clients,
who will generally comma-join the ids into
a key for their internal data structures.

I was able to verify test coverage on this
by making the sort reversed, which would
cause test_huddle_send_message_events to
fail.
2020-01-02 11:59:58 -08:00
Steve Howell 6e93f330c6 bug fix: Fix huddles in "Private Messages".
If two user_ids in a recent huddle have ids
that sort lexically differently than numerically,
such as 7 and 66, then we were creating two
different buckets in pm_conversations.

This regression was introduced in
263ac0eb45 on
November 21, 2019.
2020-01-02 11:59:58 -08:00
Steve Howell 0e68387975 refactor: Have pm_conversations take user_ids.
Instead of having our callers pass in a possibly
non-canonical version of a user_ids_string, just
have them pass in a list.

The next commit will canonicalize the sort.
2020-01-02 11:59:58 -08:00
Steve Howell ab6f4af33a tests: Use tricky server data in unit tests.
The server may send us ids in the order
[11, 2], instead of [2, 11].  We don't want
to rely on server behavior, regardless, for
the sort.

Our tests now show we process that data.

The current code is is still buggy and causes
us to show the same huddle two different times
for situations where the lexical sort doesn't
match the numerical sort.

This happens on czo often, where Tim is user
7, and his id sorts lexically after ids like
58, 622, 4444, etc.
2020-01-02 11:59:58 -08:00
Anders Kaseorg 8f281c4fc9 apply_event: Replace list comprehension with list.remove.
This should be about 4 times faster, saving something like half a
millisecond on each stream of 10000 subscribers.

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2019-12-31 10:06:09 -08:00
Mateusz Mandera bbafced254 api docs: Advertise "topic" argument instead of "subject" on /messages.
They have the same meaning but we're transitioning away from the
"subject" terminology, so we should advertise "topic" in docs.
2019-12-30 17:22:46 -08:00
Tim Abbott 86721c3be5 docs: Fix a few typos in Vagrant docs. 2019-12-30 10:34:48 -08:00
Jonas Svatos 9dd29438f5 base Zulip PostgreSQL Docker container on PGroonga official one 2019-12-30 10:20:25 -08:00
Steve Howell b3b83f223d minor: Avoid dict lookup for color.
The only thing get_color() does is look
up a sub:

    exports.get_color = function (stream_name) {
        const sub = exports.get_sub(stream_name);
        if (sub === undefined) {
            return stream_color.default_color;
        }
        return sub.color;
    };

So if we have a sub already, there's no point
calling the helper.

Obviously, this isn't a huge deal, but it happens
N times during page load.
2019-12-30 09:50:22 -08:00
Steve Howell 0711c7ea49 performance: Avoid dup calls to subscribed_streams().
In stream_sort.sort_groups, we now have the caller
pass us in the list of streams, since they are getting
them anyway.
2019-12-30 09:50:22 -08:00
Steve Howell 33246c5c49 streams: Simplify claim_colors.
This is about a millisecond faster for lots of streams,
since it does more work with native Set.
2019-12-30 09:50:22 -08:00
Steve Howell 631811e686 streams: Add BinaryDict for stream_data.
This should make any operation on subscribed
streams faster (we won't need to filter out
unsubscribed streams every time).

I started writing this before I realized we
had a bug where we call `subscribed_streams`
in a nested loop.

After fixing the bugs, this is not as much of
a bottleneck, but it's still a speedup in many
important places:

    * build left sidebar
    * every keystroke in search bar
    * first keystroke in making #stream_links
    * every keystroke in compose stream box

The streams settings code is kinda complicated.
It does a non-deterministic sort of the "others"
bucket when you add elements to the left panel.
They get hidden, anyway.  Our values() call now
puts subscribed streams first.  It never guaranteed
order, but putting subscribed streams first is
probably a good behavior for most situations.
2019-12-30 09:50:20 -08:00
Steve Howell a3512553a8 streams: Add LazySet for subscribers.
This defers O(N*S) operations, where

    N = number of streams
    S = number of subscribers per stream

In many cases we never do an O(N) operation on
a stream.  Exceptions include:

    - checking stream links from the compose box
    - editing a stream
    - adding members to a newly added stream

An operation that used to be O(N)--computing
the number of subscribers--is now O(1), and we
don't even pay O(N) on a one-time basis to
compute it (not counting the cost to build the
array from JSON, but we have to do that).
2019-12-30 09:47:55 -08:00
Steve Howell e804f39f0e performance: Avoid expensive call in stream_data.is_active.
Calling `set_filter_out_inactives` is expensive, since we
count up the number of subscribed streams, which iterates
through all your streams, creates a new list of subscribed
streams, then counts them.

In my dev setup, I created 700 streams, and this shaved
about 700ms off of the initial call to `build_stream_list`.
2019-12-30 09:45:46 -08:00
Steve Howell 70470dea1c settings: Use correct email when searching users.
If we aren't showing users emails, then we don't
want to use emails in the search.

And if we are showing users emails, we want to
search on the email that's displayed to them.
For admins this will be delivery_email.

For regular users we arguably shouldn't search
on emails either, since it mostly causes confusion,
but this commit just preserves the current
behavior for those users (unless `show_email` is
false).
2019-12-30 09:43:24 -08:00
Steve Howell 3e4326afda refactor: Extract email_for_user_settings.
We want to be able to unit test this value,
since it's conditional on several factors:

    - am I an admin?
    - can non-admins view emails?
    - do we have delivery_email for the user?

I'm mocking show_email in the tests, since the
show_email code is in `settings_org` and
kind of hard to unit test.  It's not impossible,
but it's too much for this commit.  (Either
we need to extract it out to a nice file or
deal with mocking jQuery.  That module is
mostly data-oriented, so it would be nice
to have something like `settings_config` that
is actually pure data.)
2019-12-28 11:22:24 -08:00
Steve Howell 3a95be2f2f refactor: Extract matches_user_settings_search.
This was duplicate code.  I'm moving it to people
for pragmatic reasons--it's hard to unit test stuff
in settings_users.js due to all the jQuery.

It's also nice to have all people-related search
code in one place, just for auditing purposes.
2019-12-28 11:22:24 -08:00
Steve Howell 5e0fc25f74 bug fix: Allow admins to filter users in settings.
It appears c28c3015 caused a regression where we
set `email` to undefined if a user does not have
`delivery_email` set, and this causes filtering
of users to fail for admins doing user settings.

This fixes only one of the issues reported in
issue #13554.

There's probably no easy fix to scrolling taking
long, but I think fixing search will mostly
address that complaint.

The Rust folks seem to agree with me that the
search results are too noisy.  If I search for
"s" I get:

    * names like Steve (good)
    * names like Jesse (noisy)
    * anybody with s in their email (super noisy)

Here is the relevant code:

    return (
        item.full_name.toLowerCase().indexOf(value) >= 0 ||
        email.toLowerCase().indexOf(value) >= 0
    );
2019-12-28 11:22:24 -08:00
Steve Howell 1df7a7280a Avoid unnecessary is_ascii checks on search termlets.
We now can call is_ascii only once per search termlet
when we are filtering multiple persons on the same
query.  (This requires the caller to use
`build_person_matcher` outside a loop or before
a `_.filter` call.)
2019-12-28 11:14:21 -08:00
Steve Howell 399e83aa70 minor: Tweak build_person_matcher.
This is not a major speedup, but we do a couple
simple things here:

    - trim the query outside the function we
      build (that might be called multiple times)

    - don't split names before we possibly
      early-exit with an email match
2019-12-28 11:14:21 -08:00
Steve Howell a718b47095 refactor: Speed up filter_people_by_search_terms.
We now call build_person_matcher outside the loop.
2019-12-28 11:14:21 -08:00
Steve Howell 9c525f8ecb refactor: Extract build_person_matcher().
This will allow use to change some O(N) behavior
to O(1) where we are performing the same query
on a bunch of people.  (Subsequent commits will
actually take advantage of this prefactoring.)
2019-12-28 11:14:21 -08:00
Steve Howell ab34ee0800 search performance: Stop at max_items.
Once we have max_items results, stop trying
to get more items.

This should really help large realms when
you do a search on streams that turns up
more than N streams (where N is about 12).
We won't even bother to find people.
2019-12-28 11:09:28 -08:00
Steve Howell 8406d34145 search: Extract make_attacher.
This class gives us more control over attaching
suggestions to our eventual result.  The main
thing we do now is remove duplicates as they're
encountered.

This will make sense in the follow up commit,
where we can short circuit actions as soon as
we get enough results.
2019-12-28 11:09:26 -08:00
Steve Howell 97293aef96 search: Simplify legacy search code.
We now have a list of filterers that we walk through.
2019-12-28 11:09:25 -08:00
Steve Howell 09326cb467 refactor: Extract finalize_results.
This has a few benefits:

    - we remove some duplicate code
    - we can see finalize_results in profiles

It turns out finalize_results is expensive
for some searches. If the search itself doesn't
do a ton of work but returns a lot of results,
we see it in finalize_results.  It brings to
attention that we should be truncating items
earlier instead of doing lots of unnecessary
work.
2019-12-28 11:09:25 -08:00
Steve Howell 4141abc171 search: Slightly speed up stream highlighting.
This isn't a huge speedup, but it's an easy
code change.

We remove the two-liner highlight_with_escaping,
which was only called in one place, and when
we inline it into the caller, we can pull the
first line, which builds the regex, out of the
loop.
2019-12-28 11:09:23 -08:00
Steve Howell 7a2d9a0579 refactor: Extract build_highlight_regex. 2019-12-28 10:57:53 -08:00
Steve Howell abfd39987c refactor: Remove duplicate code.
The code we removed in highlight_with_escaping
is exactly the same code as in
highlight_with_escaping_and_regex.

I actually copy/pasted this code five years
ago and am now removing the duplication. :)
2019-12-28 10:57:53 -08:00
Steve Howell abdd4b54f4 performance: Speed up search bar highlighting.
When we're highlighting all the people that show
up in a search from the search bar, we need
to fairly expensively build a regex from the
query:

    query = query.toLowerCase();
    query = query.replace(/[\-\[\]{}()*+?.,\\\^$|#\s]/g, '\\$&');
    const regex = new RegExp('(^' + query + ')', 'ig');

Even though the final regex is presumably cached, we
still needed to do that `query.replace` for every person.
Even for relatively small numbers of persons, this would
show up in profiles as expensive.

Now we just build the query once by using a pattern
where you call a function outside the loop to build
an inner function that's used in the loop that closes
on the `query` above.  The diff probably shows this
better than I explained it here.
2019-12-28 10:57:53 -08:00
Steve Howell 6a9eaebff2 populate_db: Make num_recips calculation more clear.
Extracting this calculation makes it easier to hack
it when you're trying to load lots of users.

We probably want a slightly more realistic calculation
here for stress testing.  And also fewer rows.  But
at least now it's a little more clear what it's doing.
2019-12-28 10:56:03 -08:00
Steve Howell 4a8c70593f populate_db: Add random names for large user batches.
If extra_users > 1000, add some names that are more
interesing than Extra111 User.
2019-12-28 10:56:03 -08:00
Mateusz Mandera e90866876c queue: Take advantage of ABC for defining abstract worker base classes.
QueueProcessingWorker and LoopQueueProcessingWorker are abstract classes
meant to be subclassed by a class that will define its own consume()
or consume_batch() method. ABCs are suited for that and we can tag
consume/consume_batch with the @abstractmethod wrapper which will
prevent subclasses that don't define these methods properly to be
impossible to even instantiate (as opposed to only crashing once
consume() is called). It's also nicely detected by mypy, which will
throw errors such as this on invalid use:

error: Only concrete class can be given where "Type[TestWorker]" is
expected
error: Cannot instantiate abstract class 'TestWorker' with abstract
attribute 'consume'

Due to it being detected by mypy, we can remove the test
test_worker_noconsume which just tested the old version of this -
raising an exception when the unimplemented consume() gets called. Now
it can be handled already on the linter level.
2019-12-28 10:52:17 -08:00
Mateusz Mandera ec209a9bc9 test_queue_worker: Extract a repetitive mock. 2019-12-28 10:52:13 -08:00
Mateusz Mandera a54640fc68 queue: Share exception handling code between loop and normal workers.
LoopQueueProcessingWorker can handle exceptions inside consume_batch in
a similar manner to how QueueProcessingWorker handles exceptions inside
consume.
2019-12-28 10:47:36 -08:00
Mateusz Mandera e559447f83 ldap: Improve logging.
Our ldap integration is quite sensitive to misconfigurations, so more
logging is better than less to help debug those issues.
Despite the following docstring on ZulipLDAPException:

"Since this inherits from _LDAPUser.AuthenticationFailed, these will
be caught and logged at debug level inside django-auth-ldap's
authenticate()"

We weren't actually logging anything, because debug level messages were
ignored due to our general logging settings. It is however desirable to
log these errors, as they can prove useful in debugging configuration
problems. The django_auth_ldap logger can get fairly spammy on debug
level, so we delegate ldap logging to a separate file
/var/log/zulip/ldap.log to avoid spamming server.log too much.
2019-12-28 10:47:08 -08:00