Commit Graph

11015 Commits

Author SHA1 Message Date
Anders Kaseorg 39f9abeb3f python: Convert json.loads(f.read()) to json.load(f).
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-03-24 10:46:32 -07:00
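As an illustration of the conversion in 39f9abeb3f above, here is a minimal sketch showing that the two spellings parse the same data. The file name and contents are made up for the example; `json.load` reads from the open file object itself, so the separate `f.read()` call is unnecessary.

```python
import json
import os
import tempfile

# Write a small, made-up JSON file to compare the two ways of reading it.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.json")
    with open(path, "w") as f:
        json.dump({"name": "hamlet", "ids": [1, 2, 3]}, f)

    # Old style: read the whole file into a string, then parse the string.
    with open(path) as f:
        via_loads = json.loads(f.read())

    # New style: let json.load() read from the file object directly.
    with open(path) as f:
        via_load = json.load(f)

print(via_loads == via_load)
```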
Mateusz Mandera 5ae6f4f0dd tornado: Put port in logging_data before setup_event_queue in runtornado.
setup_event_queue() generates some logs about loaded event queues, and
it's good for the logging system to have access to the port at that
point already.
2020-03-24 10:25:01 -07:00
Mateusz Mandera eb0216c5a8 middleware: Log <user.id>@subdomain instead of subdomain/<user.id>.
It was decided that the new format is preferable.
2020-03-24 10:25:01 -07:00
Mateusz Mandera a1daf0cf83 middleware: Log 'root/<user.id>' when realm string_id is ''. 2020-03-24 10:25:01 -07:00
Tim Abbott 6df86dab3e jira: Handle comment_created events without issue details.
I'm not sure what causes some Jira webhook events to not include the
metadata that other events do, but it's definitely a format sent by
real installations of Jira (likely a very old version, since this has
fields missing from what modern Jira does) and we've seen it in
production.

The best we can do is encourage users to upgrade Jira for better data.
2020-03-22 21:43:21 -07:00
Tim Abbott 180d8abed6 messages: Fix unlikely exception when trying to delete a message. 2020-03-22 21:35:27 -07:00
Tim Abbott 481d351cee events: Fix buggy apply_events handling of starred_messages.
The previous starred_messages race handling did not correctly consider
the possibility that an event queue might have been registered without
starred_messages.
2020-03-22 21:30:23 -07:00
Mateusz Mandera 5da2f80140 queue_processors: Extract a duplicated logic block into do_consume. 2020-03-22 18:45:46 -07:00
Mateusz Mandera 27c19b081b rate_limit: Remove inaccurate docstring on clear_history methods. 2020-03-22 18:42:35 -07:00
Mateusz Mandera b9e5103d0c rate_limit: Refactor RateLimiterBackend to operate on keys and rules.
Previously it operated on RateLimitedObjects, which made the classes
depend on each other too strongly. This change also allows getting rid of the
get_keys() function from RateLimitedObject, which was a redis rate limiter
implementation detail. RateLimitedObjects should only define their own
key() function and the logic forming various necessary redis keys from
them should be in RedisRateLimiterBackend.
2020-03-22 18:42:35 -07:00
Mateusz Mandera 8069133f88 rate_limit: Remove __str__ methods of RateLimitedObjects.
These were clunky from the start and are no longer used, as keys are now
used directly for logging purposes.
2020-03-22 18:42:35 -07:00
Mateusz Mandera 4e9f77a6c4 rate_limit: Adjust keys() of some RateLimitedObjects.
type().__name__ is sufficient, and much more readable than type(), so it's
better to use the former for keys.
We also make the classes consistent in forming the keys in the format
type(self).__name__:identifier and adjust logger.warning and statsd to
take advantage of that and simply log the key().
2020-03-22 18:42:35 -07:00
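The key format described in 4e9f77a6c4 above can be sketched roughly as follows. The class names and the `identifier` attribute are hypothetical stand-ins, not the actual Zulip classes; the point is only that `type(self).__name__` gives a short, readable prefix, while the string form of `type(self)` includes the module path and angle brackets.

```python
class ExampleRateLimitedObject:
    """Hypothetical base class, used only to illustrate the key format."""

    def __init__(self, identifier: str) -> None:
        self.identifier = identifier

    def key(self) -> str:
        # Produces "ClassName:identifier", which is suitable for logging as-is.
        return "{}:{}".format(type(self).__name__, self.identifier)


class ExampleRateLimitedUser(ExampleRateLimitedObject):
    """Hypothetical subclass; it inherits key() unchanged."""


user = ExampleRateLimitedUser("42")
print(user.key())        # short and readable
print(str(type(user)))   # noisier: includes the module path
```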
Mateusz Mandera 2c6b1fd575 rate_limit: Rename key_fragment() method to key(). 2020-03-22 18:42:35 -07:00
Mateusz Mandera 9c9f8100e7 rate_limit: Add the concept of RateLimiterBackend.
This will allow easily swapping and using various implementations of
rate-limiting, and separate the implementation logic from
RateLimitedObjects.
2020-03-22 18:42:35 -07:00
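To make the idea of a swappable backend in 9c9f8100e7 above concrete, here is a minimal sketch under stated assumptions. The class names, the `rate_limit` signature, and the in-memory sliding-window logic are all hypothetical and chosen for illustration; they are not the interface introduced by the commit.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ExampleRateLimiterBackend(ABC):
    """Hypothetical interface: a backend only knows about keys and limits."""

    @abstractmethod
    def rate_limit(self, key: str, max_calls: int,
                   window_seconds: float, now: float) -> bool:
        """Return True if the call identified by `key` should be blocked."""


class ExampleInMemoryBackend(ExampleRateLimiterBackend):
    """Hypothetical implementation that keeps call timestamps in a dict."""

    def __init__(self) -> None:
        self.history: Dict[str, List[float]] = {}

    def rate_limit(self, key: str, max_calls: int,
                   window_seconds: float, now: float) -> bool:
        # Keep only the calls that are still inside the window.
        recent = [t for t in self.history.get(key, [])
                  if now - t < window_seconds]
        limited = len(recent) >= max_calls
        if not limited:
            recent.append(now)
        self.history[key] = recent
        return limited


backend = ExampleInMemoryBackend()
results = [backend.rate_limit("ExampleUser:42", 2, 10.0, now)
           for now in (0.0, 1.0, 2.0, 11.0)]
print(results)
```

Because callers only depend on the abstract interface, another implementation (for example one backed by redis) could be substituted without touching the objects being rate limited.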
Mateusz Mandera 85df6201f6 rate_limit: Move functions called by external code to RateLimitedObject. 2020-03-22 18:42:35 -07:00
Mateusz Mandera 2b51b3c6c5 middleware: Also log request subdomain when logging "unauth" request.
This returns us to a consistent logging format regardless of whether
the request is authenticated.

We also update some log examples in docs to be consistent with the new
style.
2020-03-22 18:32:04 -07:00
Mateusz Mandera 3b5b19fde8 tornado: Log shard id in all logs coming from tornado processes.
This will make it easier to investigate using logs which requests are
being processed by which Tornado process.
2020-03-22 18:26:35 -07:00
Dinesh 5cb476e03d auth: Handle confirm registration page in `stage_two_of_registration`.
When a user in login flow using github auth chooses an email that is
not associated with an existing account, it leads to a "continue to
registration" choice. This cannot be tested with the earlier version
of `stage_two_of_registration`.
Also added the test.
Thanks to Mateusz Mandera for the solution.

Co-authored-by: Mateusz Mandera <mateusz.mandera@protonmail.com>
2020-03-22 17:31:01 -07:00
Dinesh 3de646d2cf auth: Improve GitHub auth with multiple verified emails.
The previous model for GitHub authentication was as follows:

* If the user has only one verified email address, we'll generally just log them in to that account
* If the user has multiple verified email addresses, we will always
  prompt them to pick which one to use, with the one registered as
  "primary" in GitHub listed at the top.

This change fixes the situation for users going through a "login" flow
(not registration) where exactly one of the emails has an account in
the Zulip organization -- they should just be logged in.

Fixes part of #12638.
2020-03-22 17:31:01 -07:00
Dinesh 5888d7c0f5 auth: Change how config error URLs are configured.
URLs for config errors were configured separately for each error
which is better handled by having error name as argument in URL.
A new view `config_error_view` is added containing context for
each error that returns `config_error` page with the relevant
context.
Also fixed tests and some views in `auth.py` to be consistent with
changes.
2020-03-22 17:15:18 -07:00
Steve Howell a041d9e4aa minor: Clean up lstrip() for help article titles.
Saying `foo.lstrip('# ')` does more than just remove
a '# ' prefix.  It removes any combination of '#' and
spaces.

We now make the intention slightly more clear.

We would strip these as you'd expect:

    # foo
    ## foo
    ### foo

but for this we now only strip the first "#":

    # # # # # foo
2020-03-22 11:32:29 -07:00
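The `lstrip()` behavior described in a041d9e4aa above can be reproduced with the short sketch below. The helper name `strip_heading_marker` is hypothetical, and stripping the leading run of '#' followed by surrounding whitespace is one way to get the described result; it is not necessarily the exact code in the commit.

```python
def strip_heading_marker(line: str) -> str:
    # Remove only the leading run of '#' characters, then trim whitespace.
    return line.lstrip("#").strip()


# The argument to lstrip() is a set of characters, not a prefix, so every
# leading '#' and space is removed in any combination.
print(repr("# # # # # foo".lstrip("# ")))

# Ordinary headings are stripped as expected.
print(repr(strip_heading_marker("### foo")))

# Here only the first '#' is removed; the later ones survive.
print(repr(strip_heading_marker("# # # # # foo")))
```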
Steve Howell edf1b1e5e8 minor: Fix buggy lstrip() call in integrations dev panel.
Thanks to @minusworld for catching this--see #14264, which
points out that lstrip() doesn't do what your intuition
might tell you it does.

Now we properly remove the "HTTP_" prefix.

It's not clear to me why we need these prefixes for Django
purposes in the fixtures, but I didn't want to go down
the rabbit hole of fixing those.

To test:

    go to http://YOUR-DEV_SERVER/devtools/integrations/
    select "bitbucket3" for the integration.
    select "diagnostics_ping.json" for the fixture.
    see "X_EVENT_KEY" in "Custom HTTP Headers"

Fixes #14264
2020-03-22 11:32:29 -07:00
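The pitfall fixed in edf1b1e5e8 above is easy to demonstrate. In this sketch the helper name `strip_http_prefix` and the `HTTP_PATH_INFO` header name are made up for illustration; `lstrip("HTTP_")` removes any leading characters from the set {H, T, P, _}, so it can eat into the real header name, whereas an explicit prefix check removes exactly the prefix.

```python
PREFIX = "HTTP_"


def strip_http_prefix(name: str) -> str:
    # Remove the literal prefix once, and only if it is actually present.
    if name.startswith(PREFIX):
        return name[len(PREFIX):]
    return name


# lstrip() happens to work here, because 'X' is not in the character set.
print("HTTP_X_EVENT_KEY".lstrip("HTTP_"))

# Here it goes too far: the leading 'P' of the real name is stripped as well.
print("HTTP_PATH_INFO".lstrip("HTTP_"))

# The explicit prefix check behaves the same way in both cases.
print(strip_http_prefix("HTTP_X_EVENT_KEY"))
print(strip_http_prefix("HTTP_PATH_INFO"))
```

On Python 3.9 and later, `str.removeprefix` does the same job as the helper.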
Steve Howell 8c1244d0b4 tests: Kill off find_one() helper.
This was only recently added.  Using tuple
assignment raises the same errors, so the
indirection probably isn't worth it.
2020-03-20 13:40:20 -07:00
Steve Howell b5cba4aafe test_narrow: Use tuple unpacking to get messages.
This is a bit more rigorous than just
dereferencing the first element of
a list comprehension, as it will give a
ValueError if more matches are found than
the test was expecting.
2020-03-20 13:40:20 -07:00
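The tuple-unpacking idiom mentioned in b5cba4aafe and 8c1244d0b4 above can be sketched like this. The helper name `find_exactly_one` and the sample data are hypothetical; the technique is that unpacking into a one-element target raises ValueError unless the list has exactly one item, while indexing with `[0]` silently ignores extra matches.

```python
def find_exactly_one(items, predicate):
    # Unpacking into a single name fails with ValueError unless exactly
    # one item matches the predicate.
    (match,) = [item for item in items if predicate(item)]
    return match


messages = [
    {"id": 1, "content": "hello"},
    {"id": 2, "content": "world"},
    {"id": 3, "content": "world"},
]

# Exactly one match: the item is returned.
print(find_exactly_one(messages, lambda m: m["content"] == "hello"))

# Two matches: indexing would quietly return the first, unpacking raises.
first_only = [m for m in messages if m["content"] == "world"][0]
try:
    find_exactly_one(messages, lambda m: m["content"] == "world")
    too_many_raised = False
except ValueError:
    too_many_raised = True
print(first_only["id"], too_many_raised)
```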
Steve Howell ef772ee12f bot events: Prevent duplicate add-bot notifications.
We don't need `do_create_user` to send a partial
event here for bots.  The only caller to `do_create_user`
that actually creates bots (apart from some tests that
just need data setup) is `add_bot_backend`, which
sends the more complete event including bot "extras"
like service info.

The modified event tests show the simplification
here (2 events instead of 3).

Also, the bot tests now use tuple unpacking, which
will force a ValueError if we duplicate events
again.
2020-03-20 13:40:19 -07:00
Steve Howell eb9a252ec9 populate_db, tests: Restrict emails in zulip realm.
We now restrict emails on the zulip realm, and now
`email` and `delivery_email` will be different for
users.

This change should make it more likely to catch
errors where we leak delivery emails or use the
wrong field for lookups.
2020-03-19 16:21:31 -07:00
Steve Howell f647587675 bulk_create: Handle realms that hide delivery emails. 2020-03-19 16:04:05 -07:00
Steve Howell ecbbc3e365 performance: Simplify bulk_create_users().
We were going back to the database to get all
the users in the realm, when we had them right
there already.  I believe this is a legacy
of us running on a very old version of Django
(back in early days), where `bulk_create`
didn't give you back ids in a nice way.

In the interim we added the `RealmAuditLog`
code, which does take advantage of the
existing profiles (and proves we can rely
on them).

But meanwhile we were still
doing a query to get all N users in the
realm.  With `select_related`!

To be fair, bulk_create_users() is by
its very nature a pretty infrequent
operation.  This change is more motivated
by code cleanup.

Now we just loop through user_ids for
the Recipient/Subscriber foreign key rows.

I also removed some fairly convoluted code mapping
emails to user_ids and just work in user_id
space.
2020-03-19 16:04:05 -07:00
Steve Howell 1306239c16 tests: Use email/delivery_email more explicitly.
We try to use the correct variation of `email`
or `delivery_email`, even though in some
databases they are the same.

(To find the differences, I temporarily hacked
populate_db to use different values for email
and delivery_email, and reduced email visibility
in the zulip realm to admins only.)

In places where we want the "normal" realm
behavior of showing emails (and having `email`
be the same as `delivery_email`), we use
the new `reset_emails_in_zulip_realm` helper.

A couple random things:

    - I fixed any error messages that were leaking
      the wrong email

    - a test that claimed to rely on the order
      of emails no longer does (we sort user_ids
      instead)

    - we now use user_ids in some place where we used
      to use emails

    - for IRC mirrors I just punted and used
      `reset_emails_in_zulip_realm` in most places

    - for MIT-related tests, I didn't fix email
      vs. delivery_email unless it was obvious

I also explicitly reset the realm to a "normal"
realm for a couple tests that I frankly just didn't
have the energy to debug.  (Also, we do want some
coverage on the normal case, even though it is
"easier" for tests to pass if you mix up `email`
and `delivery_email`.)

In particular, I just reset data for the analytics
and corporate tests.
2020-03-19 16:04:03 -07:00
Steve Howell b1f8141200 tests: Prevent false positives for duplicate signups.
We specifically give the existing user different
delivery_email and email addresses, to prevent false
positives during the test that checks that users
signing up with an already-existing email get
an error message.

(We also rename the test.)
2020-03-19 14:32:18 -07:00
Steve Howell d71111f3dc presence api: Use email to look up presence.
We don't want to use delivery_email to look up
presence on email-restricted realms.
2020-03-19 14:32:18 -07:00
Steve Howell 42ee2f5e86 tests: Fix test coverage on recent commit.
I guess `test_classes` has 100% line coverage
enforcement, which is a bit tricky for error
handling.

This fixes that, as well as making the name
snake_case and improving the format of the
errors.
2020-03-19 11:37:31 -04:00
Steve Howell 80acbb9fdf Clean up `test_get_all_profiles_avatar_urls`.
This test was using the anti-pattern of doing an
assertion inside a conditional.

I added the `findOne` helper to make it easier
to write robust tests for scenarios like this.
2020-03-19 10:34:35 -04:00
Mateusz Mandera f5e95c4fc1 requirements: Bump python-social-auth version.
We had a bunch of ugly hacks to monkey patch things due to upstream
being temporarily unmaintained and not merging PRs. Now the project is
active again and the fixes have been merged and included in the latest
version - so we clean up all that code.
2020-03-18 12:14:31 -07:00
Steve Howell ca74cd6e37 bug fix: Fix unread counts for certain API messages.
If I send a message from a normal Zulip client, it is
considered to be "read" by me.  But if I send it via
an API program (using my human account), the message
is not immediately "read" by me.

Now we handle this correctly in `get_raw_unread_data`.

The symptom of this was that these messages would get
"stuck" in "Private Messages" narrows until the next
time you reloaded your app.
2020-03-17 16:26:42 -07:00
Tim Abbott 1b95a1dea7 hello: Focus on distributed teams as use case.
I've always thought of distributed teams as the place where Zulip
really shines over other tools, because chat is much more important in
that context.

And I've always been kinda unhappy with "most productive team chat" as
a line.

There's a lot more we should do here, but this is a start.
2020-03-17 14:49:17 -07:00
Mateusz Mandera 5e47f2975e actions: Optimize query in get_occupied_streams.
Using an Exists subquery to avoid scanning the entire Subscription
table seems to speed things up greatly.
Set up with:
 ./manage.py populate_db --extra_users 2000 --extra-streams 1000

Tested on my computer, the original function was taking ~1.2 seconds,
the optimized version only ~0.05-0.06 seconds.

Likely fixes #13874; we can re-open if after production testing we
feel more work is warranted.
2020-03-17 05:44:05 -07:00
Mateusz Mandera 884ff425da cache: Remove dead code for caching recipients.
With recipient column denormalized into all three of Stream, UserProfile
and Huddle, there is no more use for this caching.
2020-03-17 05:41:11 -07:00
Mateusz Mandera b4ce167a88 models: Add recipient foreign key to Huddle.
This follows the already tested approach from
8acfa17fe6.
2020-03-17 05:41:11 -07:00
Mateusz Mandera 08780fcb95 test_import_export: Fix how stream.recipient_id is verified. 2020-03-17 05:41:11 -07:00
Tim Abbott b064559652 zephyr: Add strict assertion about username format.
This ensures that even if it were possible to create an MIT Kerberos
account with a malicious username and/or hack webathena to pretend
that's the case, one couldn't do anything malicious.

This security improvement only impacts a single installation of Zulip
where Zephyr mirroring is in use that has already had the fix applied,
so there's no reason to do a security notice for it.

Found by Graham Bleaney using pysa.
2020-03-17 05:37:25 -07:00
Steve Howell ff4b5d8ce6 minor: Fix list/set test flake. 2020-03-15 09:11:14 -04:00
Steve Howell fcc5ae5247 invites: Fix regression w/email vs. delivery_email.
In 220c2a5ff3 I
introduced a query to find invites by delivery_email
but was still using email as the key.

For most realms `email` and `delivery_email` are
synonymous, so this temporary bug would not affect
them.  For realms that restrict emails, the invite
would have probably failed for other reasons, but
the symptom would have been less clear.
2020-03-12 10:13:08 -04:00
Steve Howell 1b16693526 tests: Limit email-based logins.
We now have this API...

If you really just need to log in
and not do anything with the actual
user:

    self.login('hamlet')

If you're gonna use the user in the
rest of the test:

    hamlet = self.example_user('hamlet')
    self.login_user(hamlet)

If you are specifically testing
email/password logins (used only in 4 places):

    self.login_by_email(email, password)

And for failures, use this (used twice):

    self.assert_login_failure(email)
2020-03-11 17:10:22 -07:00
Steve Howell c235333041 test performance: Pass in users to api_* helpers.
This reduces query counts in some cases, since
we no longer need to look up the user again. In
particular, it reduces some noise when we
count queries for O(N)-related tests.

The query count is usually reduced by 2 per
API call.  We no longer need to look up Realm
and UserProfile.  In most cases we are saving
these lookups for the whole tests, since we
usually already have the `user` objects for
other reasons.  In a few places we are simply
moving where that query happens within the
test.

In some places I shorten names like `test_user`
or `user_profile` to just be `user`.
2020-03-11 14:18:29 -07:00
Steve Howell 626ad0078d tests: Add uuid_get and uuid_post.
We want a clean codepath for the vast majority
of cases of using api_get/api_post, which now
uses email and which we'll soon convert to
accepting `user` as a parameter.

These apis that take two different types of
values for the same parameter make sweeps
like this kinda painful, and they're pretty
easy to avoid by extracting helpers to do
the actual common tasks.  So, for example,
here I still keep a common method to
actually encode the credentials (since
the whole encode/decode business is an
annoying detail that you don't want to fix
in two places):

    def encode_credentials(self, identifier: str, api_key: str) -> str:
        """
        identifier: Can be an email or a remote server uuid.
        """
        credentials = "%s:%s" % (identifier, api_key)
        return 'Basic ' + base64.b64encode(credentials.encode('utf-8')).decode('utf-8')

But then the rest of the code has two separate
codepaths.

And for the uuid functions, we no longer have
crufty references to realm.  (In fairness, realm
will also go away when we introduce users.)

For the `is_remote_server` helper, I just inlined
it, since it's now only needed in one place, and the
name didn't make total sense anyway, plus it wasn't
a super robust check.  In context, it's easier
just to use a comment now to say what we're doing:

    # If `role` doesn't look like an email, it might be a uuid.
    if settings.ZILENCER_ENABLED and role is not None and '@' not in role:
        # do stuff
2020-03-11 14:18:29 -07:00
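To show what the `encode_credentials` helper quoted in 626ad0078d above produces, here is a small round-trip sketch. The encoder mirrors the snippet in the commit message as a standalone function; `decode_credentials` is a hypothetical counterpart written only for this example, and the email and API key are made-up values.

```python
import base64
from typing import Tuple


def encode_credentials(identifier: str, api_key: str) -> str:
    """identifier: can be an email or a remote server uuid."""
    credentials = "%s:%s" % (identifier, api_key)
    return "Basic " + base64.b64encode(credentials.encode("utf-8")).decode("utf-8")


def decode_credentials(header: str) -> Tuple[str, str]:
    # Hypothetical inverse of encode_credentials, for illustration only.
    scheme, _, encoded = header.partition(" ")
    if scheme != "Basic":
        raise ValueError("expected a Basic authorization header")
    decoded = base64.b64decode(encoded).decode("utf-8")
    # Split on the first colon; the identifier is assumed not to contain one.
    identifier, _, api_key = decoded.partition(":")
    return identifier, api_key


header = encode_credentials("hamlet@example.com", "made-up-api-key")
print(header.startswith("Basic "))
print(decode_credentials(header))
```

The same encoder works for either kind of identifier, which is why it can stay shared while the email and uuid code paths are kept separate.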
Steve Howell 00dc976379 tests: Use users for common_subscribe_to_streams.
We also use users for get_streams().
2020-03-11 14:18:29 -07:00
Sourabh Singh 1b3cfecf2a webhooks: Add team reviewers support in github webhook.
The github webhook implementation previously ignored the "team reviewers"
part of pull_request events, resulting in inaccurate output.

Fixes: #14096.
2020-03-10 16:29:59 -07:00
Mateusz Mandera 2000608a9e report_error: Fix inaccurate docstring.
do_report_error isn't actually below.
2020-03-09 13:54:58 -07:00
Mateusz Mandera 89394fc1eb middleware: Use request.user for logging when possible.
Instead of trying to set the _requestor_for_logs attribute in all the
relevant places, we try to use request.user when possible (that will be
when it's a UserProfile or RemoteZulipServer as of now). In other
places, we set _requestor_for_logs to avoid manually editing the
request.user attribute, as it should mostly be left for Django to
manage.
In places where we remove the "request._requestor_for_logs = ..." line,
it is clearly implied by the previous code (or the current surrounding
code) that request.user is of the correct type.
2020-03-09 13:54:58 -07:00