Commit Graph

10857 Commits

Author SHA1 Message Date
Tim Abbott 1ea2f188ce tornado: Rewrite Django integration to duplicate less code.
Since essentially the first use of Tornado in Zulip, we've been
maintaining our Tornado+Django system, AsyncDjangoHandler, with
several hundred lines of Django code copied into it.

The goal for that code was simple: We wanted a way to use our Django
middleware (for code sharing reasons) inside a Tornado process (since
we wanted to use Tornado for our async events system).

As part of the Django 2.2.x upgrade, I looked at upgrading this
implementation to be based off modern Django, and it's definitely
possible to do that:
* Continue forking load_middleware to save response middleware.
* Continue manually running the Django response middleware.
* Continue working out a hack involving copying all of _get_response
  to change a couple lines allowing us our Tornado code to not
  actually return the Django HttpResponse so we can long-poll.  The
  previous hack of returning None stopped being viable with the Django 2.2
  MiddlewareMixin.__call__ implementation.

But I decided to take this opportunity to look at trying to avoid
copying material Django code, and there is a way to do it:

* Replace RespondAsynchronously with a response.asynchronous attribute
  on the HttpResponse; this allows Django to run its normal plumbing
  happily in a way that should be stable over time, and then we
  proceed to discard the response inside the Tornado `get()` method to
  implement long-polling.  (Better yet might be raising an
  exception?).  This lets us eliminate maintaining a patched copy of
  _get_response.

* Removing the @asynchronous decorator, which didn't add anything now
  that we only have one API endpoint backend (with two frontend call
  points) that could call into this.  Combined with the last bullet,
  this lets us remove a significant hack from our
  never_cache_responses function.

* Calling the normal Django `get_response` method from zulip_finish
  after creating a duplicate request to process, rather than writing
  totally custom code to do that.  This lets us eliminate maintaining
  a patched copy of Django's load_middleware.

* Adding detailed comments explaining how this is supposed to work,
  what problems we encounter, and how we solve various problems, which
  is critical to being able to modify this code in the future.

A key advantage of these changes is that the exact same code should
work on Django 1.11, Django 2.2, and Django 3.x, because we're no
longer copying large blocks of core Django code and thus should be
much less vulnerable to refactors.

There may be a modest performance downside, in that we now run both
request and response middleware twice when longpolling (once for the
request we discard).  We may be able to avoid the expensive part of
it, Zulip's own request/response middleware, with a bit of additional
custom code to save work for requests where we're planning to discard
the response.  Profiling will be important to understanding what's
worth doing here.
2020-02-13 16:13:11 -08:00
Chris Heald a91358e186 webhooks: Fix hellosign webhook.
Hellosign now posts their callback as form/multipart, which Django only
permits to be read once. Attempts to access request.body after the
initial read throw "django.http.request.RawPostDataException: You
cannot access body after reading from request's data stream".

Fixes #13847.
2020-02-12 22:36:11 -08:00
Mateusz Mandera 27b15a9722 install: Don't create internal realm in the installation process. 2020-02-12 12:00:10 -08:00
Mateusz Mandera bde495db87 registration: Add support for mobile and desktop flows.
This makes it possible to create a Zulip account from the mobile or
desktop apps and have the end result be that the user is logged in on
their mobile device.

We may need small changes in the desktop and/or mobile apps to support
this.

Closes #10859.
2020-02-12 11:22:16 -08:00
Mateusz Mandera fe33966642 sessions: Implement the concept of expirable session variables.
This can be useful in the future for various things, and right now it'll
specifically be used in the signup mobile/desktop flows.
2020-02-12 11:09:55 -08:00
Hashir Sarwar eb23c6fa6c test_fixtures: Clean up interface for `template_database_status()`.
1) Created a new class `DatabaseType` and access its objects inside
`template_database_status()` instead of sending five arguments with
default values.

2) Made `check_files` and `setting_name` local variables instead of
function parameters since they had same value(None) for every call.

Fixes #13845.
2020-02-12 11:07:10 -08:00
Tim Abbott 96b0ec705d email_notifications: Fix missing translation tags on sender. 2020-02-12 10:54:34 -08:00
Anders Kaseorg e257253e64 emoji_codes: Replace JS module with JSON module.
webpack optimizes JSON modules using JSON.parse("{…}"), which is
faster than the normal JavaScript parser.

Update the backend to use emoji_codes.json too instead of the three
separate JSON files.

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-02-12 10:09:12 -08:00
Tim Abbott cb2c96f736 test_templates: Remove shallow template rendering code.
This code was very useful when first implemented to help catch errors
where our backend templates didn't render, but has been superceded by
the success of our URL coverage testing (which ensures every URL
supported by Zulip's urls.py is accessed by our tests, with a few
exceptions) and other tests covering all of the emails Zulip sends.

It has a significant maintenance cost because it's a bit hacky and
involves generating fake context, so it makes sense to remove these.
Any future coverage issues with templates should be addressed with a
direct test that just accessing the relevant URL or sends the relevant
email.
2020-02-11 18:00:15 -08:00
Mateusz Mandera 2475adbf8a messages_for_topic: Use stream.recipient_id for more efficient query. 2020-02-11 17:39:43 -08:00
Chris Heald bddb370750 tests: Reorder python version logic to be more clear. 2020-02-11 17:34:56 -08:00
Chris Heald 3236483d0e tests: Fix type reflection for Python 3.7.
In python 3.5-3.6, generic types had an __origin__ attribute which
indicated which generic they originated from; the code was reflecting on
that value to check types against the openapi spec. In python3.7, this
changed, and there's no longer an immediately simple way to get this
information in all cases. __origin__ appears to be the implementing
class now, returning `list` or `collections.abc.Iterator` rather than
`typing.List` and `typing.Iterator`. This adds a sloppy-but-effective
mechanism for inferring if a type maps to the List/Dict/Iterator/Mapping
types and gets the test suite passing again.
2020-02-11 17:34:56 -08:00
Dinesh 4304d5f8db auth: Add support for GitLab authentication.
With some tweaks by tabbott to the documentation and comments.

Fixes #13694.
2020-02-11 13:54:17 -08:00
Steve Howell 900f98c0c5 presence: Use realm_id for UserPresence queries.
We now use realm_id for querying UserPresence
instead of building a big WHERE clause from the
list of user_ids.

This commit may be a bit hard to measure, since
we still get the list of user_ids for the PushToken
query in the same method.
2020-02-11 13:11:58 -08:00
Steve Howell d68052b68d presence: Add realm/timestamp index to UserPresence.
It adds this index:

    "zerver_userpresence_realm_id_timestamp_25f410da_idx" btree (realm_id, "timestamp")

We expect this index to provide a major performance improvement when
fetching presence data for the whole realm from the database on
servers like zulipchat.com hosting several realms.
2020-02-11 13:11:28 -08:00
Tim Abbott fcac3a4342 recipients: Rename extract_recipients to extract_private_recipients.
Recent changes mean this function is now only used for private
messages.
2020-02-11 12:28:14 -08:00
Steve Howell 1b6578cafd messages: Fix bug with commas in stream names.
We now validate streams with a separate
function from PM recipients.

It's confusing enough all the ways you can
encode a stream or encode the PM recipients,
but trying to do it all in one function was
hard to reason about and led to at least one
bug.

In particular, there was a bug where streams
with commas in them would get split.  Now
we just don't ever split on commas inside
of `extract_stream_indicator`.

Fixes #13836
2020-02-11 12:20:54 -08:00
Steve Howell 96132fe0e9 extract_recipients: Enforce str as incoming type.
After removing internal_send_message() in a recent
commit, we now have only two callers for
extract_recipients, and they are both related
to our REQ mechanism that always passes strings
to converters.  (If there are default values,
REQ does not call the converters.)

We therefore make two changes:

    - use the more strict annotation of "str"
      for the `s` parameter

    - don't bother with the isinstance check
2020-02-11 12:20:54 -08:00
Steve Howell 8c3eaeb872 Remove obsolete internal_send_messages().
We have been phasing this out for a couple years,
and I fixed the last stragglers over the last
couple days.
2020-02-11 12:20:54 -08:00
Steve Howell 2e8dec233e slow queries: Use internal_send_stream_message().
Note that while the test mocks the actual message
send, we now have a `get_stream` call in the queue
worker, so we have to set up a real stream for
testing (or we could have mocked that as well, but
it didn't seem necessary).  The setup queries add
to the amount of queries reported by the test,
plus the `get_stream` call.  I just made the
query count a digits regex, which is a little bit
lame, but I don't think it's worth risking test
flakes for this.
2020-02-11 12:20:54 -08:00
Steve Howell e37d660d19 error_notify: Use internal_send_stream_message(). 2020-02-11 12:20:53 -08:00
Steve Howell c4e3cfebb0 presence: Add realm_id to UserPresence.
This index is intended to optimize the performance of the very
frequently run query of "what is the presence status of all users in a
realm?".

Main changes:
    - add realm_id to UserPresence
    - add index for realm_id
    - backfill realm_id for old rows
    - change all writes to UserPresence to include
      realm_id

The index is of this form:

    "zerver_userpresence_realm_id_5c4ef5a9" btree (realm_id)

We will create an index on (realm_id, timestamp) in a
future commit, but I think it's a bit faster if you do
the backfill before the index.

There's also a minor tweak to the populate_db script.
2020-02-10 17:21:45 -08:00
Steve Howell 28a8ffbc4c email_mirror: Use internal_send_stream_message().
This is just a refactoring to the more modern API
for sending internal messages.

To make this work we now plumb the email_gateway
flag through `internal_send_stream_message` instead
of `internal_send_message`.

We also change `send_zulip` to have its callers
pass in a full UserProfile object (which one of
them already had).
2020-02-10 15:45:13 -08:00
Steve Howell 6922eef380 signups: Use internal_send_stream_message().
We prefer this to internal_send_message().

We are trying to deprecate `internal_send_message`,
which has extra moving parts related to
`extract_recipients` and `Addressee.legacy_build`.

There are two chunks of code that I touch here
that look pretty similar, but I'm not quite
sure they're worth de-duplicating, since they
use different topics and different message
content.
2020-02-10 15:45:13 -08:00
Steve Howell f1ac16973c tests: Create signups stream in RealmCreationTests. 2020-02-10 15:45:13 -08:00
Steve Howell b33552997e cross realm bots: Simplify notify_new_user.
Instead of having `notify_new_user` delegate
all the heavy lifting to `send_signup_message`,
we just rename `send_signup_message` to be
`notify_new_user` and remove the one-line
wrapper.

We remove a lot of obsolete complexity:

    - `internal` was no longer ever set to True
      by real code, so we kill it off as well
      as well as killing off the internal_blurb code
      and the now-obsolete test

    - the `sender` parameter was actually an
      email, not a UserProfile, but I think
      that got past mypy due to the caller
      passing in something from settings.py

    - we were only passing in NOTIFICATION_BOT
      for the sender, so we just hard code
      that now

    - we eliminate the verbose
      `admin_realm_signup_notifications_stream`
      parameter and just hard code it to
      "signups"

    - we weren't using the optional realm
      parameter

There's also a long ugly comment in
`get_recipient_info` related to this code
that I amended for now.
We should try to take action in a subsequent
commit.
2020-02-10 15:45:13 -08:00
Steve Howell 6e40db4b1f minor: Fix misleading comments.
These comments were naming the wrong function.
2020-02-10 15:45:13 -08:00
Hashir Sarwar dcbd3e486f stream_subscription: Remove unused TypedDict `SubInfo`. 2020-02-10 14:04:22 -08:00
Steve Howell 2ff41bf9e5 /json/users: Use field.realm for realm lookup.
This avoids an unnecessary join to UserProfile.

To verify this, you can do `print(queries)` in the
`test_get_custom_profile_fields_from_api` test.  It's
kinda noisy, so I excerpted them below...

Before:

    SELECT ...
    FROM "zerver_customprofilefieldvalue"
    INNER JOIN "zerver_userprofile" ON ("zerver_customprofilefieldvalue"."user_profile_id" = "zerver_userprofile"."id")
    INNER JOIN "zerver_customprofilefield" ON ("zerver_customprofilefieldvalue"."field_id" = "zerver_customprofilefield"."id")
    WHERE "zerver_userprofile"."realm_id" = 2

After:

    SELECT ...
    FROM "zerver_customprofilefieldvalue"
    INNER JOIN "zerver_customprofilefield" ON ("zerver_customprofilefieldvalue"."field_id" = "zerver_customprofilefield"."id")
    WHERE "zerver_customprofilefield"."realm_id" = 2'

I don't have any way to measure the two queries with
realistic data, but I would assume the second
query is significantly faster on most of our instances,
since CustomProfileField should be tiny.
2020-02-09 22:04:02 -08:00
Steve Howell 9303c386b8 tests: Count queries for /json/users.
I am trying to optimize a query in this endpoint.
I don't think I'll actually reduce the number of
queries, but I wanted to capture the query and
this was the easiest way to do it, so might as
well check in the code! :)
2020-02-09 22:04:02 -08:00
Steve Howell 01f180d042 minor: Remove unused line of code in get_raw_user_data().
The line removed here is a noop, as both sides of the
immediately following conditional reassign the
same variable.

This harmless cruft was the result of the recent commit
1ae5964ab8, which added
support for single-user GETs.
2020-02-09 22:04:02 -08:00
Tim Abbott 986706c7e5 tornado: Use common code for copying headers.
This fixes a bug where our asynchronous requests were only copying the
Content-Type header (i.e. the one case where we're noticed) from the
Django HttpResponse.  I'm not sure what the impact of this would be;
the rate-limiting headers rarely come up when breaking a long-polled
request.  But it seems clearly an improvement to do this in a
consistent fashion.

Only the headers piece is a change; in Tornado

    self.finish(x)

is equivalent to:

    self.write(x)
    self.finish()
2020-02-07 16:14:19 -08:00
Tim Abbott 224a73a3ec tornado: Extract a function for writing Tornado responses.
This increases the readability of what's happening in our core Tornado
handlers code, as well as making this logic reusable.
2020-02-07 16:13:49 -08:00
Tim Abbott 5305e8af85 tornado: Extract convert_tornado_request_to_django_request. 2020-02-07 16:03:58 -08:00
Tim Abbott fc58ae117a handlers: Rename confusingly named response to result_dict.
This should somewhat increase the readability of zulip_finish.
2020-02-07 16:03:58 -08:00
Vishnu KS 4572be8c27 api: Rename subject_links to topic_links.
Fixes #13588
2020-02-07 14:35:22 -08:00
Tim Abbott 84edb5c516 test_fixtures: Fix buggy reuse of status_dir between databases.
Apparently, the arguments passed to template_database_status were
incorrect for the manual testing development database, in that we
didn't pass a status_dir when calling into that code from provision.

The result was that provisioning before running `test-backend` would
ignore changes to the list of check_files (etc.) made after rebasing,
and vice versa.

The cleanest fix is to compute status_dir from other values passed in;
I'm also going to open a follow-up issue for creating a better overall
interface here.
2020-02-07 13:33:08 -08:00
akashaviator 1ae5964ab8 api: Add an api endpoint for GET /users/{id}
This adds a new API endpoint for querying basic data on a single other
user in the organization, reusing the existing infrastructure (and
view function!) for getting data on all users in an organization.

Fixes #12277.
2020-02-07 10:36:31 -08:00
Tim Abbott e39840c705 users: Add read-only mode for access_user_by_id.
We've be using this in the upcoming GET /users/{id} method.
2020-02-07 10:36:31 -08:00
Tim Abbott aa9286a1f9 users: Move query into caller of get_custom_profile_field_values.
This will be useful for supporting a smaller query for a single user.
2020-02-07 10:36:31 -08:00
Tim Abbott 79e5dd1374 users: Rename get_raw_user_data user parameter to acting_user.
This is for improved clarity as we extend this function to take
multiple user objects.
2020-02-07 10:36:31 -08:00
Steve Howell 7e99e7feb2 presence: Extract get_legacy_user_info.
This code is a bit flatter and just preps the data
for a single user.  There is never any interaction
between the data for user A and user B, so we can
mostly avoid complicated nested data structures
and do most of the data-crunching on a per-user basis.

We also do an explicit sort of the data before
running it through groupby.  The explicit sort
simplifies how we calculate `most_recent_info`
and also avoids needing to add `dt` to an intermediate
data structure.

Finally, when it comes to the individual client data,
the code has relied on the assumption that there is
only one row per client, which I believe to be true,
but now the code is more explicit about that.
2020-02-06 17:16:22 -08:00
Steve Howell bf3baa14ac presence: Rename get_status_dict_by_user(). 2020-02-06 17:16:22 -08:00
Steve Howell 675f8514e8 presence: Rename get_status_dict().
We renamed this to get_presences_for_realm(),
and we have the caller pass in realm, not
user_profile.
2020-02-06 17:16:22 -08:00
Steve Howell 363e6bf239 presence: Move get_status_dicts_for_rows(). 2020-02-06 17:16:22 -08:00
Steve Howell 36fba1076f presence: Move get_status_dict_by_user. 2020-02-06 17:16:22 -08:00
Steve Howell 6f027d84a9 presence: Move get_status_dict_by_realm. 2020-02-06 17:16:22 -08:00
Steve Howell 703338dfa3 presence: Extract lib/presence.py.
This will make more sense when we pull some
code out of the model.
2020-02-06 17:16:22 -08:00
Steve Howell a5093be867 presence: Rename get_status_list.
The word "status" is vague, and this isn't
actually returning a list, so we now name it
get_presence_response.

I originally was gonna rename this to
get_presence_dict, but there's a function
called get_status_dict that returns a subset
of the response, so I think it's a bit more
clear that this is the bigger dict that
actually gets sent back.
2020-02-06 17:16:22 -08:00
Steve Howell 8a1fb2dcd6 presence: Calculate server_timestamp slightly earlier.
We want to err on the side of server_timestamp being
old, since we may eventually use this to make responses
just include incremental changes, and we don't want a
time window (however small) when we miss presence rows.
The clients will be able to deal with duplicate data
to the extent that the time windows are overlapping.

Also, extracting the other local var here
(for `presences`) will set up a subsequent commit
where we re-format the data for clients with
slim_presence=True.
2020-02-06 17:16:22 -08:00