34343 Commits

Author SHA1 Message Date
Steve Howell d87c5d7b1f search: Use people.filter_all_persons() in search.
This should avoid some memory allocations.

We also use build_person_matcher to avoid
repeating the same logic over and over
again to process the query into termlets.

We also remove people.get_all_persons() and
people.person_matches_query().
2020-01-04 12:53:32 -08:00
Steve Howell d91a0ab9c7 typeahead: Remove diacritics on full names, not pieces.
This may actually be a slowdown for the worst case
scenario, but it sets us up to be able to easily
short circuit the removal of diacritic characters
for users that have pure ascii names.

For example, czo has lots of names like this:

    - Tim Abbott
    - Steve Howell

Since they're pure ascii, we can do a one-time
check.  A subsequent commit will show how we use
this.
2020-01-03 17:46:59 -08:00
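A minimal sketch of the one-time ASCII check described above. The commit does not show how the check is used, so all three function bodies here are assumptions for illustration, not the project's code; in particular this `remove_diacritics` is a stand-in built on `String.prototype.normalize`.

```javascript
// Hypothetical stand-in: decompose to NFD, then drop combining marks.
function remove_diacritics(s) {
    return s.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// One-time check: true when every character is in the ASCII range.
function is_pure_ascii(s) {
    return /^[\x00-\x7f]*$/.test(s);
}

// Only pay for diacritic removal when the full name could contain any.
function normalize_full_name(full_name) {
    if (is_pure_ascii(full_name)) {
        return full_name;
    }
    return remove_diacritics(full_name);
}
```

Running the check on the full name, rather than on each piece, means names like "Tim Abbott" need a single regex test.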
Steve Howell 7d7028b7d0 performance: Speed up PM lookaheads.
This looks like simple code cleanup, but it's more
than that.

The code cleanup here is that we don't have three
callbacks to get a list of typeaheads for bootstrap.
Instead, we just have one function that does all the
main work.

And then the speedup comes from the fact we no longer
need to remove diacritics from the query every
time through our loop of seeing if a person matches
the query.

It's a bit subtle to see in the diff, but these are
the relevant lines:

    const matcher = exports.get_person_or_user_group_matcher(query);
    const filtered_results = _.filter(people_and_groups, matcher);

Before this, bootstrap was doing $.grep, and we'd have
to reinitialize the matcher for every person.

If you profile this before and after, you'll see that
remove_diacritics gets called fewer times.

To profile this, you want to load lots of users into
your DB and try to autocomplete "Extra", as in "Extra1 User".

If you try to autocomplete something else, then my patch
won't really help, and `remove_diacritics` will still
show up as expensive.  Because it is that expensive a function.
2020-01-03 17:42:29 -08:00
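The pattern in this commit can be sketched as follows: build the matcher once per query so that query normalization happens outside the per-person loop. The names `normalize_query` and `build_matcher` and the counter are hypothetical, and `remove_diacritics` is a stand-in implementation, not the project's.

```javascript
// Hypothetical stand-in: decompose to NFD, then drop combining marks.
function remove_diacritics(s) {
    return s.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// Counter so we can observe how often the query gets normalized.
let query_normalizations = 0;

function normalize_query(query) {
    query_normalizations += 1;
    return remove_diacritics(query).toLowerCase();
}

// Normalize the query once, then return a closure that is
// applied to every candidate person.
function build_matcher(query) {
    const cleaned_query = normalize_query(query);
    return function (person) {
        const name = remove_diacritics(person.full_name).toLowerCase();
        return name.startsWith(cleaned_query);
    };
}

const people = [
    {full_name: "Extra1 User"},
    {full_name: "\u00c9xtra2 User"},
    {full_name: "Steve Howell"},
];

const matcher = build_matcher("Extra");
const filtered_results = people.filter(matcher);
```

With a per-item callback that rebuilds the matcher, the query would be normalized once per person instead of once per query.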
Steve Howell a0a94b54c9 refactor: Extract helpers for user/stream matching.
These had to be done in tandem, since they were
both kinda coupled to the function that is now
called query_matches_name_description.

(This commit slightly negatively impacts PM
lookups, but this is addressed in the subsequent
commit, which makes PMs much faster.  The impact
is super minimal--it's just an extra function
dispatch.)
2020-01-03 17:42:29 -08:00
Steve Howell 303ab00760 typeahead: Extract get_topic_matcher. 2020-01-03 17:42:27 -08:00
Steve Howell e9c2a7ef7c typeahead: Extract get_language_matcher. 2020-01-03 17:42:25 -08:00
Steve Howell b23df43c1f typeahead: Extract get_slash_matcher. 2020-01-03 17:42:22 -08:00
Steve Howell 676397a026 typeahead: Extract get_emoji_matcher. 2020-01-03 17:42:20 -08:00
Steve Howell ccf6640660 refactor: Have compose_content_matcher return a function.
This may seem silly now, since we are returning a function
that still dispatches over all flavors of search for
every item, but subsequent commits will make it obvious
why I'm doing this.
2020-01-03 17:39:50 -08:00
Steve Howell b65da7cbe9 compose typeahead: Do matching/sorting without callbacks.
We want to do our own matching of items, rather than
just giving a callback to bootstrap, which does $.grep
on all the items.

Doing our own matching gives us flexibility for future
improvements like custom data structures for searching
through big amounts of data.  Even in the short term
we can speed up searches by pulling expensive operations
outside the grep/filter call.

This architecture has been in place for our search
bar since ~2014.
2020-01-03 17:39:48 -08:00
Steve Howell 9afad9e054 node tests: Add commented-out benchmarks for Dict.
The benchmark is commented out.  It takes only a few
milliseconds to run, so there may be no reason not
to always run it.  It doesn't test correctness, so
it would arguably inflate line coverage, but set/get
are obviously covered elsewhere.
2020-01-03 17:19:59 -08:00
Steve Howell 30ad1b6f16 zjsunit: Remove Dict dependency.
We now require the actual tests to explicitly
zrequire Dict, rather than magically adding this.

In one case, the use of Dict was clearly just for
the test (not the app), so I converted that to an ordinary
JS object (see timerender.js).
2020-01-03 17:19:59 -08:00
Steve Howell d41f714eff comments: Update comment for zjsunit/i18n.js. 2020-01-03 17:19:59 -08:00
Steve Howell 897320b2c4 zjquery: Use Map instead of Dict.
This seems to speed up the whole test suite
by about 20%, although measurements are a bit
noisy.
2020-01-03 17:19:59 -08:00
Steve Howell ee3e488e02 js: Extract FoldDict class.
We have ~5 years of proof that we'll probably never
extend Dict with more options.

Breaking the classes in two makes both a little faster
(no options to check), and we remove some options
in FoldDict that are never used (from/from_array).

A possible next step is to fine-tune the Dict to use
Map internally.

Note that the TypeScript types for FoldDict are now
more specific (requiring string keys).  Of course,
this isn't really enforced until we convert other
modules to TS.
2020-01-03 17:19:50 -08:00
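A rough sketch of what a dedicated folding dictionary without options might look like. It assumes that "fold" means case-folding of string keys and it uses `Map` internally, which the commit mentions only as a possible next step; the class name and methods are hypothetical and are not the project's FoldDict.

```javascript
// Hypothetical case-insensitive dictionary with string keys only.
class CaseFoldingDict {
    constructor() {
        this._items = new Map();
    }

    // Fold a key so that lookups ignore case.
    _fold(key) {
        return key.toLowerCase();
    }

    set(key, value) {
        // Keep the original key alongside the value so the
        // caller's casing is not lost.
        this._items.set(this._fold(key), {k: key, v: value});
        return value;
    }

    get(key) {
        const item = this._items.get(this._fold(key));
        return item === undefined ? undefined : item.v;
    }

    has(key) {
        return this._items.has(this._fold(key));
    }
}
```

With no options to inspect, each method does a single fold and a single `Map` operation.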
Steve Howell 9cd075ffb1 people: Use Set() in track_duplicate_full_name().
This is more idiomatic and probably
faster for most browsers.  (This function
gets called for each name in page load,
so any slowness is magnified.)
2020-01-03 17:19:38 -08:00
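For illustration, duplicate tracking with `Set` might look like the sketch below. The body of `track_duplicate_full_name` is not shown in the commit, so the function and variable names here are assumptions.

```javascript
// Names seen at least once, and names seen more than once.
const seen_full_names = new Set();
const duplicate_full_names = new Set();

// Called once per name; Set membership checks avoid scanning a list.
function track_full_name(full_name) {
    if (seen_full_names.has(full_name)) {
        duplicate_full_names.add(full_name);
    } else {
        seen_full_names.add(full_name);
    }
}

function is_duplicate_full_name(full_name) {
    return duplicate_full_names.has(full_name);
}

for (const name of ["Tim Abbott", "Steve Howell", "Tim Abbott"]) {
    track_full_name(name);
}
```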
Mateusz Mandera 510bc60663 test_helpers: Set Recipient class attrs in use_db_models.
Model classes fetched through apps.get_model don't get methods or class
attributes. It's not feasible to add them to all these objects in
use_db_models, but Recipient.PERSONAL etc. are worth setting, since
doing that increases the range of functions that can successfully be
imported and called in test_migrations.py.
2020-01-03 16:56:58 -08:00
Mateusz Mandera a993604fae test_email_notifs: Clean up mocking.
These tests had a lot of very repetitive, identical mocking, in some
tests without even doing anything with the mocks. It's cleaner to put
the mock in the one relevant, common place for all the tests that need
it, and remove it from tests that had no use for the mocking.
2020-01-03 16:56:58 -08:00
Mateusz Mandera d691c249db api: Return a JsonableError if API key of invalid format is given. 2020-01-03 16:56:42 -08:00
Mateusz Mandera 72401b229f utils: Add a function to check if string can be an API key. 2020-01-03 16:56:42 -08:00
Mateusz Mandera 4f2897fafc cache: Validate keys before passing them to memcached.
Fixes #13504.

This commit is purely an improvement in error handling.

We used to not do any validation on keys before passing them to
memcached, which meant for invalid keys, memcached's own key
validation would throw an exception.  Unfortunately, the resulting
error messages are super hard to read; the traceback structure doesn't
even show where the call into memcached happened.

In this commit we add validation to all the basic cache_* functions, and
appropriate handling in their callers.

We also add a lot of tests for the new behavior, which has the nice
effect of giving us decent coverage of all these core caching
functions which previously had been primarily tested manually.
2020-01-03 16:56:42 -08:00
Mateusz Mandera 5bb84a2255 default_settings: Fix inaccurate "below" phrase in comments.
These are leftovers from where we had default settings in the
settings.py file. Now that the files are separate those references to
"below" are not correct.
2020-01-03 16:52:31 -08:00
Mateusz Mandera e477cae800 docs: Fix missing apostrophe in EMAIL_HOST_USER value. 2020-01-03 16:52:31 -08:00
Mateusz Mandera dc59850d15 docs: Fix incorrect path to get-django-setting script. 2020-01-03 16:52:31 -08:00
Mateusz Mandera d88494deae docs: Add some troubleshooting notes for ldap. 2020-01-03 16:52:30 -08:00
Mateusz Mandera e81aa740bc ldap: Protect against troublesome deactivations in ldap sync.
If ldap sync is run while ldap is misconfigured, it can end up causing
troublesome deactivations due to not finding users in ldap -
deactivating all users, or deactivating all administrators of a realm,
which then will require manual intervention to reactivate at least one
admin in django shell.
This change prevents such potentially troublesome situations, which are
overwhelmingly likely to be unintentional. If intentional, the --force
option can be used to remove the protection.
2020-01-03 16:46:07 -08:00
Mateusz Mandera bfb963b9aa docs: Include suggested USERNAME_ATTR in example AD ldap configs. 2020-01-03 16:46:07 -08:00
Steve Howell b3a69154a6 refactor: Export compare_for_relevance.
This future-proofs us a bit more for test coverage.
2020-01-03 14:58:05 -08:00
Steve Howell 0985842c62 Fix sorting for broadcast mentions.
We had a potentially nasty bug where we
weren't guaranteeing that all/stream/everyone
collated in consistent ways inside of
`compare_people_for_relevance`, which can
send certain types of sort algorithms into
an infinite loop. I doubt this ever happened
in practice, but it's obviously worth fixing.

Now we also have a clear tiebreaker between
any two all/everyone/stream mentions, which
is the idx field.

Finally, this should be a bit more efficient.
2020-01-03 14:58:05 -08:00
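A sketch of a comparator that collates broadcast mentions consistently and uses `idx` as the tiebreaker. The `is_broadcast` field, the choice to sort broadcast mentions first, and the plain name comparison are assumptions for illustration; the real `compare_people_for_relevance` is not shown in this commit.

```javascript
// Hypothetical comparator: every pair of items gets a consistent
// answer, so compare(a, b) and compare(b, a) never disagree.
function compare_mentions(a, b) {
    const a_broadcast = a.is_broadcast === true;
    const b_broadcast = b.is_broadcast === true;

    if (a_broadcast && b_broadcast) {
        // Clear tiebreaker between any two broadcast mentions.
        return a.idx - b.idx;
    }
    if (a_broadcast) {
        return -1;
    }
    if (b_broadcast) {
        return 1;
    }
    if (a.full_name < b.full_name) {
        return -1;
    }
    if (a.full_name > b.full_name) {
        return 1;
    }
    return 0;
}

const items = [
    {full_name: "stream", is_broadcast: true, idx: 2},
    {full_name: "Zoe"},
    {full_name: "all", is_broadcast: true, idx: 0},
    {full_name: "Alice"},
    {full_name: "everyone", is_broadcast: true, idx: 1},
];

const sorted_names = [...items].sort(compare_mentions).map((item) => item.full_name);
```

A comparator that returns contradictory answers for the same pair violates the contract that sort routines rely on, which is the class of bug the commit describes.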
Steve Howell 758786ab87 refactor: Extract broadcast_mentions.
This will be helpful for testing.
2020-01-03 14:58:05 -08:00
Steve Howell 773161cbb7 tests: Test "all" mentions more realistically.
We don't have people named "all".  Instead, we
create pseudo person objects with email/full_name
of "all" (along with some other fields).  The tests
now reflect this.
2020-01-03 14:58:05 -08:00
Steve Howell d227988519 tests: Split up sort_recipient tests. 2020-01-03 14:58:05 -08:00
Steve Howell cde01aeeb0 tests: Avoid list mutation.
To test dups we can just create a new list.
2020-01-03 14:58:05 -08:00
Steve Howell 5c43180a70 tests: Use names for test objects. 2020-01-03 14:58:05 -08:00
Steve Howell 49ba916be7 refactor: Rename *_for_at_mentioning functions.
This name was misleading, since this code is used
in sort_recipients, which happens when you, for
example, autocomplete persons in the "To:" box
when composing (and has nothing to do with
mentioning).
2020-01-03 14:58:05 -08:00
Steve Howell 1577662a67 refactor: Clean up exports.compose_matches_sorter. 2020-01-02 12:11:50 -08:00
Steve Howell c2c5878c3a refactor: Clean up compose_content_matcher.
The switch statement is easier to read, and
we also want to eventually remove the "this"
that couples us to the awkward typeahead
hacks.
2020-01-02 12:11:50 -08:00
Steve Howell ebf4195bf3 refactor: Extract clean_query_lowercase().
This makes it a bit easier to find common patterns,
plus it sets us up to pull the calls even further
up the stack.

The first rule of dealing with user data is sanitize
at the edges, not deep down in some function that
has many callers.  Putting this code so deep down
in the stack means it's more likely to be called in
a loop.
2020-01-02 12:11:48 -08:00
Steve Howell 4699710856 refactor: Move clean_query further up the stack.
This moves clean_query into all the callers
of query_matches_source_attrs.

This doesn't change anything performance-wise,
but it sets up future commits.
2020-01-02 12:10:10 -08:00
Steve Howell 8448832bfe refactor: Move clean_query up the stack.
This change is easy--we only had one caller.

This change means any query going against a
target with multiple `match_attrs`, such as
user names (first name, last) only has to
clean the query once per person.
2020-01-02 12:10:10 -08:00
Steve Howell 5b01efda7b typeahead: Extract clean_query helper. 2020-01-02 12:10:07 -08:00
Steve Howell b5d0eab0c6 dict: Add filter_values() method.
This method can help us avoid some memory
allocations.
2020-01-02 12:03:45 -08:00
Steve Howell 8b04cf1288 people: Use is_my_user_id in get_people_for_stream_create.
We want to get away from email-based checks.
2020-01-02 12:03:43 -08:00
Steve Howell 7229a943f0 tests: Use add_in_realm for "me" in people tests.
This is more realistic for testing.
2020-01-02 12:03:04 -08:00
Steve Howell 54cb857fee refactor: Rename people.get_rest_of_realm().
We want to mostly deprecate this function (see
the comment I added), so I gave it a more specific
name.

Ideally I'd just fix `stream_create`, but it does
use this function in a couple places, and it's helpful
to reuse the same sort here.  In one place stream_create
actually unshifts the "me" user back to the top of the
list, which makes sense for its use case.
2020-01-02 12:03:04 -08:00
Steve Howell 405a529340 server: Sort user_ids in recent PM conversations.
This change should prevent test flakes, plus
it's more deterministic behavior for clients,
who will generally comma-join the ids into
a key for their internal data structures.

I was able to verify test coverage on this
by making the sort reversed, which would
cause test_huddle_send_message_events to
fail.
2020-01-02 11:59:58 -08:00
Steve Howell 6e93f330c6 bug fix: Fix huddles in "Private Messages".
If two user_ids in a recent huddle have ids
that sort lexically differently than numerically,
such as 7 and 66, then we were creating two
different buckets in pm_conversations.

This regression was introduced in
263ac0eb45 on
November 21, 2019.
2020-01-02 11:59:58 -08:00
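The lexical-versus-numeric mismatch is easy to reproduce: JavaScript's default `Array.prototype.sort` compares elements as strings. The sketch below shows the difference and one way to build a canonical key; `canonical_key` is a hypothetical helper, not the function used in pm_conversations.

```javascript
// Default sort compares string forms, so "66" sorts before "7".
const lexical = [7, 66].sort();

// A numeric comparator gives the order we actually want.
const numeric = [7, 66].sort((a, b) => a - b);

// Hypothetical helper: sort a copy numerically, then comma-join,
// so every ordering of the same ids maps to the same bucket.
function canonical_key(user_ids) {
    return [...user_ids].sort((a, b) => a - b).join(",");
}
```

With a canonical key, `[66, 7]` and `[7, 66]` land in one bucket instead of two.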
Steve Howell 0e68387975 refactor: Have pm_conversations take user_ids.
Instead of having our callers pass in a possibly
non-canonical version of a user_ids_string, just
have them pass in a list.

The next commit will canonicalize the sort.
2020-01-02 11:59:58 -08:00
Steve Howell ab6f4af33a tests: Use tricky server data in unit tests.
The server may send us ids in the order
[11, 2], instead of [2, 11].  We don't want
to rely on server behavior, regardless, for
the sort.

Our tests now show we process that data.

The current code is still buggy and causes
us to show the same huddle two different times
for situations where the lexical sort doesn't
match the numerical sort.

This happens on czo often, where Tim is user
7, and his id sorts lexically after ids like
58, 622, 4444, etc.
2020-01-02 11:59:58 -08:00
Anders Kaseorg 8f281c4fc9 apply_event: Replace list comprehension with list.remove.
This should be about 4 times faster, saving something like half a
millisecond on each stream of 10000 subscribers.

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2019-12-31 10:06:09 -08:00