Commit Graph

34237 Commits

Author SHA1 Message Date
Steve Howell 631811e686 streams: Add BinaryDict for stream_data.
This should make any operation on subscribed
streams faster (we won't need to filter out
unsubscribed streams every time).

I started writing this before I realized we
had a bug where we call `subscribed_streams`
in a nested loop.

After fixing the bugs, this is not as much of
a bottleneck, but it's still a speedup in many
important places:

    * build left sidebar
    * every keystroke in search bar
    * first keystroke in making #stream_links
    * every keystroke in compose stream box

The streams settings code is kinda complicated.
It does a non-deterministic sort of the "others"
bucket when you add elements to the left panel.
They get hidden, anyway.  Our values() call now
puts subscribed streams first.  It never guaranteed
order, but putting subscribed streams first is
probably a good behavior for most situations.
2019-12-30 09:50:20 -08:00
Steve Howell a3512553a8 streams: Add LazySet for subscribers.
This defers O(N*S) operations, where

    N = number of streams
    S = number of subscribers per stream

In many cases we never do an O(N) operation on
a stream.  Exceptions include:

    - checking stream links from the compose box
    - editing a stream
    - adding members to a newly added stream

An operation that used to be O(N)--computing
the number of subscribers--is now O(1), and we
don't even pay O(N) on a one-time basis to
compute it (not counting the cost to build the
array from JSON, but we have to do that).
2019-12-30 09:47:55 -08:00
Steve Howell e804f39f0e performance: Avoid expensive call in stream_data.is_active.
Calling `set_filter_out_inactives` is expensive, since we
count up the number of subscribed streams, which iterates
through all your streams, creates a new list of subscribed
streams, then counts them.

In my dev setup, I created 700 streams, and this shaved
about 700ms off of the initial call to `build_stream_list`.
2019-12-30 09:45:46 -08:00
Steve Howell 70470dea1c settings: Use correct email when searching users.
If we aren't showing users emails, then we don't
want to use emails in the search.

And if we are showing users emails, we want to
search on the email that's displayed to them.
For admins this will be delivery_email.

For regular users we arguably shouldn't search
on emails either, since it mostly causes confusion,
but this commit just preserves the current
behavior for those users (unless `show_email` is
false).
2019-12-30 09:43:24 -08:00
Steve Howell 3e4326afda refactor: Extract email_for_user_settings.
We want to be able to unit test this value,
since it's conditional on several factors:

    - am I an admin?
    - can non-admins view emails?
    - do we have delivery_email for the user?

I'm mocking show_email in the tests, since the
show_email code is in `settings_org` and
kind of hard to unit test.  It's not impossible,
but it's too much for this commit.  (Either
we need to extract it out to a nice file or
deal with mocking jQuery.  That module is
mostly data-oriented, so it would be nice
to have something like `settings_config` that
is actually pure data.)
2019-12-28 11:22:24 -08:00
Steve Howell 3a95be2f2f refactor: Extract matches_user_settings_search.
This was duplicate code.  I'm moving it to people
for pragmatic reasons--it's hard to unit test stuff
in settings_users.js due to all the jQuery.

It's also nice to have all people-related search
code in one place, just for auditing purposes.
2019-12-28 11:22:24 -08:00
Steve Howell 5e0fc25f74 bug fix: Allow admins to filter users in settings.
It appears c28c3015 caused a regression where we
set `email` to undefined if a user does not have
`delivery_email` set, and this causes filtering
of users to fail for admins doing user settings.

This fixes only one of the issues reported in
issue #13554.

There's probably no easy fix to scrolling taking
long, but I think fixing search will mostly
address that complaint.

The Rust folks seem to agree with me that the
search results are too noisy.  If I search for
"s" I get:

    * names like Steve (good)
    * names like Jesse (noisy)
    * anybody with s in their email (super noisy)

Here is the relevant code:

    return (
        item.full_name.toLowerCase().indexOf(value) >= 0 ||
        email.toLowerCase().indexOf(value) >= 0
    );
2019-12-28 11:22:24 -08:00
Steve Howell 1df7a7280a Avoid unnecessary is_ascii checks on search termlets.
We now can call is_ascii only once per search termlet
when we are filtering multiple persons on the same
query.  (This requires the caller to use
`build_person_matcher` outside a loop or before
a `_.filter` call.)
2019-12-28 11:14:21 -08:00
Steve Howell 399e83aa70 minor: Tweak build_person_matcher.
This is not a major speedup, but we do a couple
simple things here:

    - trim the query outside the function we
      build (that might be called multiple times)

    - don't split names before we possibly
      early-exit with an email match
2019-12-28 11:14:21 -08:00
Steve Howell a718b47095 refactor: Speed up filter_people_by_search_terms.
We now call build_person_matcher outside the loop.
2019-12-28 11:14:21 -08:00
Steve Howell 9c525f8ecb refactor: Extract build_person_matcher().
This will allow use to change some O(N) behavior
to O(1) where we are performing the same query
on a bunch of people.  (Subsequent commits will
actually take advantage of this prefactoring.)
2019-12-28 11:14:21 -08:00
Steve Howell ab34ee0800 search performance: Stop at max_items.
Once we have max_items results, stop trying
to get more items.

This should really help large realms when
you do a search on streams that turns up
more than N streams (where N is about 12).
We won't even bother to find people.
2019-12-28 11:09:28 -08:00
Steve Howell 8406d34145 search: Extract make_attacher.
This class gives us more control over attaching
suggestions to our eventual result.  The main
thing we do now is remove duplicates as they're
encountered.

This will make sense in the follow up commit,
where we can short circuit actions as soon as
we get enough results.
2019-12-28 11:09:26 -08:00
Steve Howell 97293aef96 search: Simplify legacy search code.
We now have a list of filterers that we walk through.
2019-12-28 11:09:25 -08:00
Steve Howell 09326cb467 refactor: Extract finalize_results.
This has a few benefits:

    - we remove some duplicate code
    - we can see finalize_results in profiles

It turns out finalize_results is expensive
for some searches. If the search itself doesn't
do a ton of work but returns a lot of results,
we see it in finalize_results.  It brings to
attention that we should be truncating items
earlier instead of doing lots of unnecessary
work.
2019-12-28 11:09:25 -08:00
Steve Howell 4141abc171 search: Slightly speed up stream highlighting.
This isn't a huge speedup, but it's an easy
code change.

We remove the two-liner highlight_with_escaping,
which was only called in one place, and when
we inline it into the caller, we can pull the
first line, which builds the regex, out of the
loop.
2019-12-28 11:09:23 -08:00
Steve Howell 7a2d9a0579 refactor: Extract build_highlight_regex. 2019-12-28 10:57:53 -08:00
Steve Howell abfd39987c refactor: Remove duplicate code.
The code we removed in highlight_with_escaping
is exactly the same code as in
highlight_with_escaping_and_regex.

I actually copy/pasted this code five years
ago and am now removing the duplication. :)
2019-12-28 10:57:53 -08:00
Steve Howell abdd4b54f4 performance: Speed up search bar highlighting.
When we're highlighting all the people that show
up in a search from the search bar, we need
to fairly expensively build a regex from the
query:

    query = query.toLowerCase();
    query = query.replace(/[\-\[\]{}()*+?.,\\\^$|#\s]/g, '\\$&');
    const regex = new RegExp('(^' + query + ')', 'ig');

Even though the final regex is presumably cached, we
still needed to do that `query.replace` for every person.
Even for relatively small numbers of persons, this would
show up in profiles as expensive.

Now we just build the query once by using a pattern
where you call a function outside the loop to build
an inner function that's used in the loop that closes
on the `query` above.  The diff probably shows this
better than I explained it here.
2019-12-28 10:57:53 -08:00
Steve Howell 6a9eaebff2 populate_db: Make num_recips calculation more clear.
Extracting this calculation makes it easier to hack
it when you're trying to load lots of users.

We probably want a slightly more realistic calculation
here for stress testing.  And also fewer rows.  But
at least now it's a little more clear what it's doing.
2019-12-28 10:56:03 -08:00
Steve Howell 4a8c70593f populate_db: Add random names for large user batches.
If extra_users > 1000, add some names that are more
interesing than Extra111 User.
2019-12-28 10:56:03 -08:00
Mateusz Mandera e90866876c queue: Take advantage of ABC for defining abstract worker base classes.
QueueProcessingWorker and LoopQueueProcessingWorker are abstract classes
meant to be subclassed by a class that will define its own consume()
or consume_batch() method. ABCs are suited for that and we can tag
consume/consume_batch with the @abstractmethod wrapper which will
prevent subclasses that don't define these methods properly to be
impossible to even instantiate (as opposed to only crashing once
consume() is called). It's also nicely detected by mypy, which will
throw errors such as this on invalid use:

error: Only concrete class can be given where "Type[TestWorker]" is
expected
error: Cannot instantiate abstract class 'TestWorker' with abstract
attribute 'consume'

Due to it being detected by mypy, we can remove the test
test_worker_noconsume which just tested the old version of this -
raising an exception when the unimplemented consume() gets called. Now
it can be handled already on the linter level.
2019-12-28 10:52:17 -08:00
Mateusz Mandera ec209a9bc9 test_queue_worker: Extract a repetitive mock. 2019-12-28 10:52:13 -08:00
Mateusz Mandera a54640fc68 queue: Share exception handling code between loop and normal workers.
LoopQueueProcessingWorker can handle exceptions inside consume_batch in
a similar manner to how QueueProcessingWorker handles exceptions inside
consume.
2019-12-28 10:47:36 -08:00
Mateusz Mandera e559447f83 ldap: Improve logging.
Our ldap integration is quite sensitive to misconfigurations, so more
logging is better than less to help debug those issues.
Despite the following docstring on ZulipLDAPException:

"Since this inherits from _LDAPUser.AuthenticationFailed, these will
be caught and logged at debug level inside django-auth-ldap's
authenticate()"

We weren't actually logging anything, because debug level messages were
ignored due to our general logging settings. It is however desirable to
log these errors, as they can prove useful in debugging configuration
problems. The django_auth_ldap logger can get fairly spammy on debug
level, so we delegate ldap logging to a separate file
/var/log/zulip/ldap.log to avoid spamming server.log too much.
2019-12-28 10:47:08 -08:00
Mateusz Mandera a180f01e6b ldap: Use a cleaner super().authenticate() call in ZulipLDAPAuthBackend. 2019-12-28 10:47:08 -08:00
Steve Howell 49cd719273 tests: Verify subscriptions more thoroughly.
When you subscribe/unsubscribe yourself, you want
to make sure these calls work too:

    subscribed_subs
    unsubscribed_subs
2019-12-28 07:29:51 -05:00
Steve Howell 32a1ef20d1 minor: Extract helper for search tests. 2019-12-28 05:43:04 -05:00
Steve Howell f3ebb5fbee test cleanup: Extract a sorter() helper. 2019-12-28 05:43:04 -05:00
Steve Howell 9ac2fe2826 test cleanup: Extract a matcher() helper. 2019-12-28 05:43:04 -05:00
Anders Kaseorg 4b590cc522 templates: Correct sample Google authorized redirect URI.
The required URI was changed in #11450.

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2019-12-21 20:08:31 -08:00
Anders Kaseorg 1f8d21af74 test-install: Use lxc-destroy -f instead of lxc-stop.
Fixes this error after rebooting the host:

$ sudo ./destroy-all  -f
zulip-install-bionic-41MM2
lxc-stop: zulip-install-bionic-41MM2: tools/lxc_stop.c: main: 191 zulip-install-bionic-41MM2 is not running

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2019-12-18 03:48:39 -08:00
Anders Kaseorg 9b5f9858fb test-install: Run lxc-attach with --clear-env.
The host environment variables (especially PATH) should not be allowed
to pollute the test and could interfere with it.

This allows test-install to run on a NixOS host.

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2019-12-18 03:48:39 -08:00
Anders Kaseorg ab211c7acf lint: Tell ShellCheck to look for sourced files at relative paths.
This uses the new -P option of ShellCheck 0.7.0.

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2019-12-18 03:48:02 -08:00
Tim Abbott 02169c48cf ldap: Fix bad interaction between EMAIL_ADDRESS_VISIBILITY and LDAP sync.
A block of LDAP integration code related to data synchronization did
not correctly handle EMAIL_ADDRESS_VISIBILITY_ADMINS, as it was
accessing .email, not .delivery_email, both for logging and doing the
mapping between email addresses and LDAP users.

Fixes #13539.
2019-12-15 22:59:02 -08:00
Alex Dehnert 998b52dd8c docs: Remove mention of Dropbox CLA.
Based on discussion in https://chat.zulip.org/#narrow/stream/19-documentation/topic/CLA and 
https://chat.zulip.org/#narrow/stream/105-zulipbot/topic/New.20contributor.20definition, signing 
the Dropbox CLA is no longer a requirement, and it's a essentially a documentation bug that it's 
still mentioned.
2019-12-14 22:07:38 -08:00
Tim Abbott 18de865daa docs: Simplify a few details in our release checklist. 2019-12-13 17:24:49 -08:00
Tim Abbott b68ff6446c version: Update version and changelog for Zulip 2.1.1 release. 2019-12-13 17:19:45 -08:00
Tim Abbott e38c58e7c7 docs: Rewrite LDAP discussion of AUTH_LDAP_REVERSE_EMAIL_SEARCH.
This moves the mandatory configuration for options A/B/C into a single
bulleted list for each option, rather than split across two steps; I
think the result is significantly more readable.

It also fixes a bug where we suggested setting
AUTH_LDAP_REVERSE_EMAIL_SEARCH = AUTH_LDAP_USER_SEARCH in some cases,
whereas in fact it will never work because the parameters are
`%(email)s`, not `%(user)s`.

Also, now that one needs to set AUTH_LDAP_REVERSE_EMAIL_SEARCH, it
seems worth adding values for that to the Active Directory
instructions.  Thanks to @alfonsrv for the suggestion.
2019-12-13 13:55:52 -08:00
Vishnu KS 6901087246 install: Use crudini for storing value of POSTGRES_MISSING_DICTIONARIES.
This simplifies the RDS installation process to avoid awkwardly
requiring running the installer twice, and also is significantly more
robust in handling issues around rerunning the installer.

Finally, the answer for whether dictionaries are missing is available
to Django for future use in warnings/etc. around full-text search not
being great with this configuration, should they be required.
2019-12-13 12:05:39 -08:00
Tim Abbott 851eb1a6ee generate_test_data: Remove some useless type annotations.
One of these caused a parser error trying to run pyre on Zulip; the
other is just useless as the type can be inferred.
2019-12-13 11:52:23 -08:00
Mateusz Mandera 1926649dae migrations: Avoid triggering backend initalization in migration 0209.
Fixes #13528.
The email_auth_enabled check caused all enabled backends to get
initialized, and thus if LDAP was enabled the check_ldap_config()
check would cause an error if LDAP was misconfigured
(for example missing the new settings).
2019-12-13 10:54:05 -08:00
Tim Abbott 9812c6d445 version: Update version strings following 2.1 release. 2019-12-12 22:53:52 -08:00
Rohitt Vashishtha dc4181beec minor: Fix typo in changelog. 2019-12-12 22:52:09 -08:00
Tim Abbott 03a3ae8b61 Release Zulip Server 2.1.0. 2019-12-12 22:23:22 -08:00
Tim Abbott 35959d43c4 docs: Clean up troubleshooting guide.
This article is definitely still below our polish goals, but this is
also definitely an improvement.
2019-12-12 22:19:12 -08:00
Tim Abbott bb6bf837ad docs: Update changelog in preparation for 2.1 release. 2019-12-12 21:11:37 -08:00
Tim Abbott 29ace8100f i18n: Update translation data from transifex. 2019-12-12 20:34:46 -08:00
Tim Abbott 7ccc8373e2 bugdown: Fix logic for extracting attachment path_id.
In 3892a8afd8, we restructured the
system for managing uploaded files to a much cleaner model where we
just do parsing inside bugdown.

That new model had potentially buggy handling of cases around both
relative URLs and URLS starting with `realm.host`.

We address this by further rewriting the handling of attachments to
avoid regular expressions entirely, instead relying on urllib for
parsing, and having bugdown output `path_id` values, so that there's
no need for any conversions between formats outside bugdowm.

The check_attachment_reference_change function for processing message
updates is significantly simplified in the process.

The new check on the hostname has the side effect of requiring us to
fix some previously weird/buggy test data.

Co-Author-By: Anders Kaseorg <anders@zulipchat.com>
Co-Author-By: Rohitt Vashishtha <aero31aero@gmail.com>
2019-12-12 20:30:26 -08:00
Tim Abbott 4adcd35698 version: Update version and changelog for Zulip 2.0.8 release. 2019-12-12 17:32:27 -08:00