zulip

Commit Graph

Author	SHA1	Message	Date
Steve Howell	4e59937632	js: Add IntDict class. We don't use this yet, but we will soon. We report errors if users pass in strings instead of ints, but we try to still use the key.	2020-01-05 12:27:26 -08:00
Steve Howell	26168eaa98	search: Optimize search bar suggestions for large realms. We only ever show 3 or 4 people in search suggestions (possibly w/a couple variations, like pm-with/sender/etc.), so we can try to search a smaller subset of people before going through the entire realm. We use message_store.user_ids() for this, since you typically want to search messages for people that have sent messages recently, and we already sort based on PM conversations.	2020-01-04 12:58:00 -08:00
Steve Howell	7016292558	search: Track user_ids in message_store. We'll use this for search.	2020-01-04 12:57:58 -08:00
Steve Howell	a5bf6984bc	search: Extract make_people_getter(). This helper lets us reduce the number of people queries down from 4 to either 0 or 1.	2020-01-04 12:55:40 -08:00
Steve Howell	d87c5d7b1f	search: Use people.filter_all_persons() in search. This should avoid some memory allocations. We also use build_person_matcher to avoid repeating the same logic over and over again to process the query into termlets. We also remove people.get_all_persons() and people.person_matches_query().	2020-01-04 12:53:32 -08:00
Steve Howell	d91a0ab9c7	typeahead: Remove diacritics on full names, not pieces. This may actually be a slowdown for the worst case scenario, but it sets us up to be able to easily short circuit the removal of diacritic characters for users that have pure ascii names. For example, czo has lots of names like this: - Tim Abbott - Steve Howell Since they're pure ascii, we can do a one-time check. A subsequent commit will show how we use this.	2020-01-03 17:46:59 -08:00
Steve Howell	7d7028b7d0	performance: Speed up PM lookaheads. This looks like simple code cleanup, but it's more than that. The code cleanup here is that we don't have three callbacks to get a list of typeaheads for bootstrap. Instead, we just have one function that does all the main work. And then the speedup comes from the fact we no longer need to remove diacritics from the query for every time through our loop of seeing if a person matches the query. It's a bit subtle to see in the diff, but these are the relevant lines: const matcher = exports.get_person_or_user_group_matcher(query); const filtered_results = _.filter(people_and_groups, matcher); Before this, bootstrap was doing $.grep, and we'd have to reinitialize the matcher for every person. If you profile this before and after, you'll see that remove_diacritics gets called fewer times. To profile this, you want to load lots of users into your DB and try to autocomplete "Extra", as in "Extra1 User". If you try to autocomplete something else, then my patch won't really help, and `remove_diacritics` will still show up as expensive. Because it is that expensive a function.	2020-01-03 17:42:29 -08:00
Steve Howell	a0a94b54c9	refactor: Extract helpers for user/stream matching. These had to be done in tandem, since they were both kinda coupled to the function that is now called query_matches_name_description. (This commit slightly negatively impacts PM lookups, but this is addressed in the subsequent commit, which makes PMs much faster. The impact is super minimal--it's just an extra function dispatch.)	2020-01-03 17:42:29 -08:00
Steve Howell	303ab00760	typeahead: Extract get_topic_matcher.	2020-01-03 17:42:27 -08:00
Steve Howell	e9c2a7ef7c	typeahead: Extract get_language_matcher.	2020-01-03 17:42:25 -08:00
Steve Howell	b23df43c1f	typeahead: Extract get_slash_matcher.	2020-01-03 17:42:22 -08:00
Steve Howell	676397a026	typeahead: Extract get_emoji_matcher.	2020-01-03 17:42:20 -08:00
Steve Howell	ccf6640660	refactor: Have compose_content_matcher return a function. This may seem silly now, since we are returning a function that still dispatches over all flavors of search for every item, but subsequent commits will make it obvious why I'm doing this.	2020-01-03 17:39:50 -08:00
Steve Howell	b65da7cbe9	compose typeahead: Do matching/sorting without callbacks. We want to do our own matching of items, rather than just giving a callback to bootstrap, which does $.grep on all the items. Doing our own matching gives us flexibility for future improvements like custom data structures for searching through big amounts of data. Even in the short term we can speed up searches by pulling expensive operations outside the grep/filter call. This architecture has been in place for our search bar since ~2014.	2020-01-03 17:39:48 -08:00
Steve Howell	ee3e488e02	js: Extract FoldDict class. We have ~5 years of proof that we'll probably never extend Dict with more options. Breaking the classes in two makes both a little faster (no options to check), and we remove some options in FoldDict that are never used (from/from_array). A possible next step is to fine-tune the Dict to use Map internally. Note that the TypeScript types for FoldDict are now more specific (requiring string keys). Of course, this isn't really enforced until we convert other modules to TS.	2020-01-03 17:19:50 -08:00
Steve Howell	9cd075ffb1	people: Use Set() in track_duplicate_full_name(). This is more idiomatic and probably faster for most browsers. (This function gets called for each name in page load, so any slowness is magnified.)	2020-01-03 17:19:38 -08:00
Steve Howell	b3a69154a6	refactor: Export compare_for_relevance. This future-proofs us a bit more for test coverage.	2020-01-03 14:58:05 -08:00
Steve Howell	0985842c62	Fix sorting for broadcast mentions. We had a potentially nasty bug where we weren't guaranteeing that all/stream/everyone collated in consistent ways inside of `compare_people_for_relevance`, which can send certain types of sort algorithms into an infinite loop. I doubt this ever happened in practice, but it's obviously worth fixing. Now we also have a clear tiebreaker between any two all/everyone/stream mentions, which is the idx field. Finally, this should be a bit more efficient.	2020-01-03 14:58:05 -08:00
Steve Howell	758786ab87	refactor: Extract broadcast_mentions. This will be helpful for testing.	2020-01-03 14:58:05 -08:00
Steve Howell	49ba916be7	refactor: Rename *_for_at_mentioning functions. This name was misleading, since this code is used in sort_recipients, which happens when you, for example, autocomplete persons in the "To:" box when composing (and has nothing to do with mentioning).	2020-01-03 14:58:05 -08:00
Steve Howell	1577662a67	refactor: Clean up exports.compose_matches_sorter.	2020-01-02 12:11:50 -08:00
Steve Howell	c2c5878c3a	refactor: Clean up compose_content_matcher. The switch statement is easier to read, and we also want to eventually remove the "this" that couples us to the awkward typeahead hacks.	2020-01-02 12:11:50 -08:00
Steve Howell	ebf4195bf3	refactor: Extract clean_query_lowercase(). This makes it a bit easier to find common patterns, plus it sets us up to pull the calls even further up the stack. The first rule of dealing with user data is sanitize at the edges, not deep down in some function that has many callers. Putting this code so deep down in the stack means it's more likely to be called in a loop.	2020-01-02 12:11:48 -08:00
Steve Howell	4699710856	refactor: Move clean_query further up the stack. This moves clean_query into all the callers of query_matches_source_attrs. This doesn't change anything performance-wise, but it sets up future commits.	2020-01-02 12:10:10 -08:00
Steve Howell	8448832bfe	refactor: Move clean_query up the stack. This change is easy--we only had one caller. This change means any query going against a target with multiple `match_attrs`, such as user names (first name, last) only has to clean the query once per person.	2020-01-02 12:10:10 -08:00
Steve Howell	5b01efda7b	typeahead: Extract clean_query helper.	2020-01-02 12:10:07 -08:00
Steve Howell	b5d0eab0c6	dict: Add filter_values() method. This method can help us avoid some memory allocations.	2020-01-02 12:03:45 -08:00
Steve Howell	8b04cf1288	people: Use is_my_user_id in get_people_for_stream_create. We want to get away from email-based checks.	2020-01-02 12:03:43 -08:00
Steve Howell	54cb857fee	refactor: Rename people.get_rest_of_realm(). We want to mostly deprecate this function (see the comment I added), so I gave it a more specific name. Ideally I'd just fix `stream_create`, but it does use this function in a couple places, and it's helpful to reuse the same sort here. In one place stream_create actually unshifts the "me" user back to the top of the list, which makes sense for its use case.	2020-01-02 12:03:04 -08:00
Steve Howell	6e93f330c6	bug fix: Fix huddles in "Private Messages". If two user_ids in a recent huddle have ids that sort lexically differently than numerically, such as 7 and 66, then we were creating two different buckets in pm_conversations. This regression was introduced in `263ac0eb45` on November 21, 2019.	2020-01-02 11:59:58 -08:00
Steve Howell	0e68387975	refactor: Have pm_conversations take user_ids. Instead of having our callers pass in a possibly non-canonical version of a user_ids_string, just have them pass in a list. The next commit will canonicalize the sort.	2020-01-02 11:59:58 -08:00
Steve Howell	b3b83f223d	minor: Avoid dict lookup for color. The only thing get_color() does is look up a sub: exports.get_color = function (stream_name) { const sub = exports.get_sub(stream_name); if (sub === undefined) { return stream_color.default_color; } return sub.color; }; So if we have a sub already, there's no point calling the helper. Obviously, this isn't a huge deal, but it happens N times during page load.	2019-12-30 09:50:22 -08:00
Steve Howell	0711c7ea49	performance: Avoid dup calls to subscribed_streams(). In stream_sort.sort_groups, we now have the caller pass us in the list of streams, since they are getting them anyway.	2019-12-30 09:50:22 -08:00
Steve Howell	33246c5c49	streams: Simplify claim_colors. This is about a millisecond faster for lots of streams, since it does more work with native Set.	2019-12-30 09:50:22 -08:00
Steve Howell	631811e686	streams: Add BinaryDict for stream_data. This should make any operation on subscribed streams faster (we won't need to filter out unsubscribed streams every time). I started writing this before I realized we had a bug where we call `subscribed_streams` in a nested loop. After fixing the bugs, this is not as much of a bottleneck, but it's still a speedup in many important places: * build left sidebar * every keystroke in search bar * first keystroke in making #stream_links * every keystroke in compose stream box The streams settings code is kinda complicated. It does a non-deterministic sort of the "others" bucket when you add elements to the left panel. They get hidden, anyway. Our values() call now puts subscribed streams first. It never guaranteed order, but putting subscribed streams first is probably a good behavior for most situations.	2019-12-30 09:50:20 -08:00
Steve Howell	a3512553a8	streams: Add LazySet for subscribers. This defers O(N*S) operations, where N = number of streams S = number of subscribers per stream In many cases we never do an O(N) operation on a stream. Exceptions include: - checking stream links from the compose box - editing a stream - adding members to a newly added stream An operation that used to be O(N)--computing the number of subscribers--is now O(1), and we don't even pay O(N) on a one-time basis to compute it (not counting the cost to build the array from JSON, but we have to do that).	2019-12-30 09:47:55 -08:00
Steve Howell	e804f39f0e	performance: Avoid expensive call in stream_data.is_active. Calling `set_filter_out_inactives` is expensive, since we count up the number of subscribed streams, which iterates through all your streams, creates a new list of subscribed streams, then counts them. In my dev setup, I created 700 streams, and this shaved about 700ms off of the initial call to `build_stream_list`.	2019-12-30 09:45:46 -08:00
Steve Howell	70470dea1c	settings: Use correct email when searching users. If we aren't showing users emails, then we don't want to use emails in the search. And if we are showing users emails, we want to search on the email that's displayed to them. For admins this will be delivery_email. For regular users we arguably shouldn't search on emails either, since it mostly causes confusion, but this commit just preserves the current behavior for those users (unless `show_email` is false).	2019-12-30 09:43:24 -08:00
Steve Howell	3e4326afda	refactor: Extract email_for_user_settings. We want to be able to unit test this value, since it's conditional on several factors: - am I an admin? - can non-admins view emails? - do we have delivery_email for the user? I'm mocking show_email in the tests, since the show_email code is in `settings_org` and kind of hard to unit test. It's not impossible, but it's too much for this commit. (Either we need to extract it out to a nice file or deal with mocking jQuery. That module is mostly data-oriented, so it would be nice to have something like `settings_config` that is actually pure data.)	2019-12-28 11:22:24 -08:00
Steve Howell	3a95be2f2f	refactor: Extract matches_user_settings_search. This was duplicate code. I'm moving it to people for pragmatic reasons--it's hard to unit test stuff in settings_users.js due to all the jQuery. It's also nice to have all people-related search code in one place, just for auditing purposes.	2019-12-28 11:22:24 -08:00
Steve Howell	5e0fc25f74	bug fix: Allow admins to filter users in settings. It appears `c28c3015` caused a regression where we set `email` to undefined if a user does not have `delivery_email` set, and this causes filtering of users to fail for admins doing user settings. This fixes only one of the issues reported in issue #13554. There's probably no easy fix to scrolling taking long, but I think fixing search will mostly address that complaint. The Rust folks seem to agree with me that the search results are too noisy. If I search for "s" I get: * names like Steve (good) * names like Jesse (noisy) * anybody with s in their email (super noisy) Here is the relevant code: return ( item.full_name.toLowerCase().indexOf(value) >= 0 || email.toLowerCase().indexOf(value) >= 0 );	2019-12-28 11:22:24 -08:00
Steve Howell	1df7a7280a	Avoid unnecessary is_ascii checks on search termlets. We now can call is_ascii only once per search termlet when we are filtering multiple persons on the same query. (This requires the caller to use `build_person_matcher` outside a loop or before a `_.filter` call.)	2019-12-28 11:14:21 -08:00
Steve Howell	399e83aa70	minor: Tweak build_person_matcher. This is not a major speedup, but we do a couple simple things here: - trim the query outside the function we build (that might be called multiple times) - don't split names before we possibly early-exit with an email match	2019-12-28 11:14:21 -08:00
Steve Howell	a718b47095	refactor: Speed up filter_people_by_search_terms. We now call build_person_matcher outside the loop.	2019-12-28 11:14:21 -08:00
Steve Howell	9c525f8ecb	refactor: Extract build_person_matcher(). This will allow us to change some O(N) behavior to O(1) where we are performing the same query on a bunch of people. (Subsequent commits will actually take advantage of this prefactoring.)	2019-12-28 11:14:21 -08:00
Steve Howell	ab34ee0800	search performance: Stop at max_items. Once we have max_items results, stop trying to get more items. This should really help large realms when you do a search on streams that turns up more than N streams (where N is about 12). We won't even bother to find people.	2019-12-28 11:09:28 -08:00
Steve Howell	8406d34145	search: Extract make_attacher. This class gives us more control over attaching suggestions to our eventual result. The main thing we do now is remove duplicates as they're encountered. This will make sense in the follow up commit, where we can short circuit actions as soon as we get enough results.	2019-12-28 11:09:26 -08:00
Steve Howell	97293aef96	search: Simplify legacy search code. We now have a list of filterers that we walk through.	2019-12-28 11:09:25 -08:00
Steve Howell	09326cb467	refactor: Extract finalize_results. This has a few benefits: - we remove some duplicate code - we can see finalize_results in profiles It turns out finalize_results is expensive for some searches. If the search itself doesn't do a ton of work but returns a lot of results, we see it in finalize_results. It brings to attention that we should be truncating items earlier instead of doing lots of unnecessary work.	2019-12-28 11:09:25 -08:00
Steve Howell	4141abc171	search: Slightly speed up stream highlighting. This isn't a huge speedup, but it's an easy code change. We remove the two-liner highlight_with_escaping, which was only called in one place, and when we inline it into the caller, we can pull the first line, which builds the regex, out of the loop.	2019-12-28 11:09:23 -08:00
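
Several of the commits above (7d7028b7d0, 1df7a7280a, a718b47095, 9c525f8ecb) describe the same idea: do the expensive query cleanup once, build a matcher function, and reuse it for every candidate. The sketch below illustrates that pattern only. It is not the Zulip code: `normalize`, `filter_naive`, `build_matcher`, and `filter_with_matcher` are hypothetical names, `normalize` is a simple stand-in for a diacritic-removal helper, and the call counter exists only to make the saved work visible.

```javascript
// Hypothetical stand-in for an expensive normalization step such as
// diacritic removal. The counter is only here to show how often it runs.
let normalize_calls = 0;

function normalize(s) {
    normalize_calls += 1;
    // Decompose accented characters, drop the combining marks, lowercase.
    return s.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase();
}

// Slow shape: the query is normalized again for every person.
function filter_naive(people, query) {
    return people.filter((person) =>
        normalize(person.full_name).includes(normalize(query))
    );
}

// Faster shape: normalize the query once and return a closure that
// only has to normalize each candidate name.
function build_matcher(query) {
    const clean_query = normalize(query.trim());
    return (person) => normalize(person.full_name).includes(clean_query);
}

function filter_with_matcher(people, query) {
    // Build the matcher outside the loop, then hand it to filter().
    const matcher = build_matcher(query);
    return people.filter(matcher);
}
```

With N people, the naive version normalizes the query N times and the matcher version normalizes it once, so the per-query work drops from 2N normalizations to N + 1. Skipping normalization for names already known to be ASCII, as commit d91a0ab9c7 suggests, would be a further step that this sketch leaves out.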


8733 Commits