This should avoid some memory allocations.
We also use build_person_matcher to avoid
repeating the same logic over and over
again to process the query into termlets.
We also remove people.get_all_persons() and
people.person_matches_query().
This may actually be a slowdown for the worst case
scenario, but it sets us up to be able to easily
short circuit the removal of diacritic characters
for users that have pure ascii names.
For example, czo has lots of names like this:
- Tim Abbott
- Steve Howell
Since they're pure ascii, we can do a one-time
check. A subsequent commit will show how we use
this.
This looks like simple code cleanup, but it's more
than that.
The code cleanup here is that we don't have three
callbacks to get a list of typeaheads for bootstrap.
Instead, we just have one function that does all the
main work.
And then the speedup comes from the fact we no longer
need to remove diacritics from the query for every
time through our loop of seeing if a person matches
the query.
It's a bit subtle to see in the diff, but these are
the relevant lines:
const matcher = exports.get_person_or_user_group_matcher(query);
const filtered_results = _.filter(people_and_groups, matcher);
Before this, bootstrap was doing $.grep, and we'd have
to reinitialize the matcher for every person.
If you profile this before and after, you'll see that
remove_diacritics gets called fewer times.
To profile this, you want to loads lots of users into
your DB and try to autocomplete "Extra", as in "Extra1 User".
If you try to autocomplete something else, then my patch
won't really help, and `remove_diacritics` will still
show up as expensive. Because it is that expensive a function.
These had to be done in tandem, since they were
both kinda coupled to the function that is now
called query_matches_name_description.
(This commit slightly negatively impacts PM
lookups, but this is addressed in the subsequent
commit, which makes PMs much faster. The impact
is super minimal--it's just an extra function
dispatch.)
This may seem silly now, since we are returning a function
that still dispatches over all flavors of search for
every item, but subsequent commits will make it obvious
why I'm doing this.
We want to do our own matching of items, rather than
just giving a callback to bootstrap, which does $.grep
on all the items.
Doing our own matching gives us flexibility for future
improvements like custom data structures for searching
through big amounts of data. Even in the short term
we can speed up searches by pulling expensive operations
outside the grep/filter call.
This architecture has been in place for our search
bar since ~2014.
The benchmark is commented out. It takes only a few
milliseconds to run, so there may be no reason not
to always run it. It doesn't test correctness, so
it would arguably inflate line coverage, but set/get
are obviously covered elsewhere.
We now require the actual tests to explicitly
to zrequire Dict, rather than magically adding this.
In one case, the use of Dict was clearly just for
the test (not the app), so I converted that an ordinary
JS object (see timerender.js).
We have ~5 years of proof that we'll probably never
extend Dict with more options.
Breaking the classes into makes both a little faster
(no options to check), and we remove some options
in FoldDict that are never used (from/from_array).
A possible next step is to fine-tune the Dict to use
Map internally.
Note that the TypeScript types for FoldDict are now
more specific (requiring string keys). Of course,
this isn't really enforced until we convert other
modules to TS.
Model classes fetched through apps.get_model don't get methods or class
attributes. It's not feasible to add them to all these objects in
use_db_models, but Recipient.PERSONAL etc. are worth setting, since
doing that increases the range of functions that can successfully be
imported and called in test_migrations.py.
These tests had a lot of very repetetive, identical mocking, in some
tests without even doing anything with the mocks. It's cleaner to put
the mock in the one relevant, common place for all the tests that need
it, and remove it from tests who had no use for the mocking.
Fixes#13504.
This commit is purely an improvement in error handling.
We used to not do any validation on keys before passing them to
memcached, which meant for invalid keys, memcached's own key
validation would throw an exception. Unfortunately, the resulting
error messages are super hard to read; the traceback structure doesn't
even show where the call into memcached happened.
In this commit we add validation to all the basic cache_* functions, and
appropriate handling in their callers.
We also add a lot of tests for the new behavior, which has the nice
effect of giving us decent coverage of all these core caching
functions which previously had been primarily tested manually.
These are leftovers from where we had default settings in the
settings.py file. Now that the files are separate those references to
"below" are not correct.
If ldap sync is run while ldap is misconfigured, it can end up causing
troublesome deactivations due to not finding users in ldap -
deactivating all users, or deactivating all administrators of a realm,
which then will require manual intervention to reactivate at least one
admin in django shell.
This change prevents such potential troublesome situations which are
overwhelmingly likely to be unintentional. If intentional, --force
option can be used to remove the protection.
We had a potentially nasty bug where we
weren't guaranteeing that all/stream/everyone
collated in consistent ways inside of
`compare_people_for_relevance`, which can
send certain types of sort algorithms into
an infinite loop. I doubt this ever happened
in practice, but it's obviously worth fixing.
Now we also have a clear tiebreaker between
any two all/everyone/stream mentions, which
is the idx field.
Finally, this should be a bit more efficient.
We don't have people named "all". Instead, we
create pseudo person objects with email/full_name
of "all" (along with some other fields). The tests
now reflect this.
This name was misleading, since this code is used
in sort_recipients, which happens when you, for
example, autocomplete persons in the "To:" box
when composing (and has nothing to do with
mentioning).
This makes it a bit easier to find common patterns,
plus it sets us up to pull the calls even further
up the stack.
The first rule of dealing with user data is sanitize
at the edges, not deep down in some function that
has many callers. Putting this code so deep down
in the stack means it's more likely to be called in
a loop.
This moves clean_query into all the callers
of query_matches_source_attrs.
This doesn't change anything performance-wise,
but it sets up future commits.
This change is easy--we only had one caller.
This change means any query going against a
target with multiple `match_attrs`, such as
user names (first name, last) only has to
clean the query once per person.
We want to mostly deprecate this function (see
the comment I added), so I gave it a more specific
name.
Ideally I'd just fix `stream_create`, but it does
use this function in a couple places, and it's helpful
to reuse the same sort here. In one place stream_create
actually unshifts the "me" user back to the top of the
list, which makes sense for its use case.
This change should prevent test flakes, plus
it's more deterministic behavior for clients,
who will generally comma-join the ids into
a key for their internal data structures.
I was able to verify test coverage on this
by making the sort reversed, which would
cause test_huddle_send_message_events to
fail.
If two user_ids in a recent huddle have ids
that sort lexically differently than numerically,
such as 7 and 66, then we were creating two
different buckets in pm_conversations.
This regression was introduced in
263ac0eb45 on
November 21, 2019.
Instead of having our callers pass in a possibly
non-canonical version of a user_ids_string, just
have them pass in a list.
The next commit will canonicalize the sort.
The server may send us ids in the order
[11, 2], instead of [2, 11]. We don't want
to rely on server behavior, regardless, for
the sort.
Our tests now show we process that data.
The current code is is still buggy and causes
us to show the same huddle two different times
for situations where the lexical sort doesn't
match the numerical sort.
This happens on czo often, where Tim is user
7, and his id sorts lexically after ids like
58, 622, 4444, etc.
This should be about 4 times faster, saving something like half a
millisecond on each stream of 10000 subscribers.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>