Commit Graph

6002 Commits

Author SHA1 Message Date
Steve Howell 1498b2ef69 apply_event: Fix broken deepcopy attempt for subs.
When we were getting an apply_event call for
a subscription/add event, we were trying not to
mutate the event itself, but this clumsy code
was still mutating the actual event:

    # Avoid letting 'subscribers' entries end up in the list
    for i, sub in enumerate(event['subscriptions']):
        event['subscriptions'][i] = \
            copy.deepcopy(event['subscriptions'][i])
        del event['subscriptions'][i]['subscribers']

This is only a theoretical bug.

The only person who receives a subscription/add
event is the current user.

And it wouldn't have affected the current user,
since the apply_event was correctly updating the
state, and we wouldn't actually deliver the event
to the client (because the whole point of apply_event
is to prevent us from having to piggyback the
super-recent events on to our payload or put
them into the event queue and possibly race).

The new code just cleanly makes a copy of each
sub, if necessary, as we add them to state["subscriptions"].

And I updated the event schemas to reflect that
subscribers is always present in subscription/add
event.

Long term we should probably avoid sending subscribers
on this event when the clients don't set something
like include_subscribers.  That's a fairly complicated
fix that involves passing in flags to ClientDescriptor.
Alternatively, we could just say that our policy is
that we never send subscribers there, but we instead
use peer_add events.  See issue #17089 for more
details.
2021-01-21 15:04:07 -08:00
Steve Howell c6acde9c63 apply_event: Use stream_ids, not names, for add/remove.
It's always cleaner to work in id space.  It probably
would have required a perfect storm to have broken
the existing code, but using ids is obviously more
robust in theory, and just as simple.
2021-01-21 15:04:07 -08:00
Steve Howell 0519f2d2b9 minor: Move include_subscribers guards in apply_event.
This sets us up for a cleaner diff in an
upcoming commit.
2021-01-21 15:04:07 -08:00
Steve Howell 3fa595ef85 minor: Clean up args for apply_event.
We now require keywords, so that there is no
pitfall for mixing up boolean parameters.
Positional parameters are basically evil
when you have a bunch of bools.

I also make user_profile the first argument.

Finally, the code is more diff-friendly.
2021-01-21 15:04:07 -08:00
Steve Howell e42baf9e13 minor: Clean up args for apply_events.
I eliminate the defaults, since the existing code
was already specificying values for most things.

I move all the booleans to the bottom for both
parameters and arguments.

I require explicit keywords for everything but
user_profile (which is now first).

And, finally, I format the code in a more
diff-friendly manner.
2021-01-21 15:04:07 -08:00
Steve Howell f2586d2f9b refactor: Introduce SubscriptionInfo dataclass.
We use this as the return type for
gather_subscriptions_helper and
get_web_public_subs, instead of tuples.
2021-01-21 15:04:07 -08:00
Steve Howell 768117f0ff refactor: Unify include_subscribers logic. 2021-01-21 15:04:07 -08:00
Steve Howell e735ce3f01 refactor: Move subscribers logic up to caller.
The gather_subscriptions_helper function now updates
subscribers instead of delegating.
2021-01-21 15:04:07 -08:00
Steve Howell d9740045a5 refactor: Eliminate checks in build_stream_dict_for_sub.
We eliminate some redundant checks.

We also consistently provide a `subscribers` field
in our stream data with `[]`, even if our users
can't access subscribers.  We therefore bump
the API version and tweak the docs.  (See further
down for a detailed justification of the change.)

Even though it is sometimes fine to have redundant code
that is defensive in nature, some upcoming changes are gonna
move subscriber-related logic out of build_stream_dict_for_sub
for certain codepaths as part of our effort to streamline
the payload for subscribers within page_params.

So we can't rely on the code that I removed here
inside of build_stream_dict_for_sub.

Anyway, it makes more sense to do these checks explicitly
in the validate function.

The code in build_stream_dict_for_sub was almost effectively
a noop, since the validation function was already preventing
us from getting subscriber info.  The only difference it
made was sometimes converting `[]` to `None`, and then
subsequently omitting the subscribers field.

Neither ZT nor the webapp make any distinction between
`[]` or <missing key> for the `subscribers` data in
`page_params`.

The webapp has had this code for a long time (and now
equivalent code elsewhere in this PR):

    if (!Object.prototype.hasOwnProperty.call(sub, "subscribers")) {
        sub.subscribers = new LazySet([]);
    }

The webapp calculates access based on booleans, anyway:

    sub.can_access_subscribers =
        page_params.is_admin || sub.subscribed ||
        (!page_params.is_guest && !sub.invite_only);

And ZT would choke if `subscribers` were missing, except that
it never gets to the relevant code due to other checks:

    def get_other_subscribers_in_stream(<snip>):
        assert stream_id is not None or stream_name is not None

        if stream_id:
            assert self.is_user_subscribed_to_stream(stream_id)

            return [sub
                    for sub in self.stream_dict[stream_id]['subscribers']
                    if sub != self.user_id]
        else:
            return [sub
                    for _, stream in self.stream_dict.items()
                    for sub in stream['subscribers']
                    if stream['name'] == stream_name
                    if sub != self.user_id]

You could make a semantic argument that we should prefer
<missing key> to `[]` when subscribers aren't even available, but
we have precedent from the way that `bulk_get_subscriber_user_ids`
has traditionally populated its result:

    result: Dict[int, List[int]] =
        {stream["id"]: [] for stream in stream_dicts}

If we changed `stream_dicts` to `target_stream_dicts` we
would faciliate a move toward `None`, but it would just cause
headaches for other server code as well as the frontends
(which, to reiterate, already prefer the empty array
for convenience).
2021-01-21 15:04:07 -08:00
Steve Howell 40b0c36d21 minor: Update comment for guest subscription access.
As my comment indicates, I would prefer to handle
this explicitly by raising JsonableError in an
else statement here, but it's not a big deal.

This function can probably be simplified with a
bit of work, mostly on the testing side to make
sure we are covering all edge cases, but that
is out of the scope of my current PR.
2021-01-21 15:04:07 -08:00
Mateusz Mandera fcc8debc3a users: Use realm.host in dummy user addresses without email visibility.
By moving the relevant logic from realm.get_bot_domain to
get_fake_email_domain we will make realm.host be used (if possible) for
dummy user addresses. That is, instead of user11@zulipchat.com, the
address will become user11@subdomain.zulipchat.com.
2021-01-21 13:04:38 -08:00
Mateusz Mandera b15dd9147d create_user: Remove redundant argument of get_display_email_address. 2021-01-21 13:04:38 -08:00
Steve Howell c693ae8982 event tests: Cover do_update_user_status better.
We often send only one field (away or status_text)
to be updated.

So we have to make our schema support optional
keys.

As a result of the more flexible schema, we no
longer need to exempt the node fixtures from
our schema checks.
2021-01-20 13:17:32 -08:00
Steve Howell 36b1794c1d user_status: Fix bug with resetting away status.
The fix is pretty simple here--if the client
doesn't send an away status, then don't change
it.

I improved the tests to cover this case.

Fixes #17071
2021-01-20 13:59:35 -05:00
Mateusz Mandera a9242d6dfc retention: Eliminate redundant recipient JOIN from cross-realm query.
Since recipient_id (id of the PERSONAL Recipient of the user) was
denormalized into the UserProfile model, this query can be simplified by
getting rid of the zerver_recipient JOIN.
2021-01-18 21:40:37 -08:00
Mateusz Mandera e3be6db73a retention: Eliminate redundant userprofile JOIN from cross-realm query. 2021-01-18 21:40:37 -08:00
Tim Abbott 5a02b33f2e digest: Add a large block comment on correctness. 2021-01-17 11:37:59 -08:00
Steve Howell 1040fb7219 email digests: Remove handle_digest_email shim.
The previous commit made it so we only call the
shim in tests, so now we completely remove it.
2021-01-17 11:28:30 -08:00
Steve Howell bfa0bdf3d6 email digests: Process users in chunks of 30.
This should make the queue empty more quickly,
because we do bulk queries to prevent database
hops.
2021-01-17 11:28:30 -08:00
Steve Howell e0b451730a email digests: Extract get_new_streams.
This makes us more efficient when handling
multiple users.  We don't have to keep
sending the same two queries to the database.

Note that as part of this we eliminated
a failure mode for the obscure population
of users from whom both `user.is_guest` and
`user.can_access_public_streams()` returns
False.  We know this would have only affected
Zephyr users (by looking at the code), and
we know we don't actually process Zephyr
users for email digests (or else we would
have raised exceptions in the old code).
2021-01-17 11:28:30 -08:00
Steve Howell 23de94504f email digests: Query streams for messages up front.
This should save us many hops to the database when
we process users in bulk.
2021-01-17 11:28:30 -08:00
Steve Howell 3662bf2dcb minor: Rename stream_map -> user_stream_map. 2021-01-17 11:28:30 -08:00
Steve Howell 11c93aced5 minor: Rename user_profile -> user and avoid shadowing. 2021-01-17 11:28:30 -08:00
Steve Howell f8bbb7fea9 email digests: Use select_related("realm").
We mostly need realm_id, but when we go to build
message lists, we need realm.uri.

We could probably be more aggresive about using
`only` here, but for now I am just trying to
reduce hops to the database.
2021-01-17 11:28:29 -08:00
Steve Howell bb56f0ec0e minor: Move get_stream_map to module level.
This is a pure code move.
2021-01-17 11:28:29 -08:00
Steve Howell 52e2d5a733 email digests: Avoid long_term_idle check.
We want to exclude users with recent subscription
activity from emails, regardless of whether
the long_term_idle flag is set.
2021-01-17 11:28:29 -08:00
Steve Howell 162b372b93 email digests: Do one query for recent streams.
This is another way to limit hops to the database
when we process users in bulk.
2021-01-17 11:28:29 -08:00
Alex Vandiver d688e18de2 errors: Remove references to "deployment", use "host".
The `deployment` key was only set in `do_report_error`, which is now
only used in one codepath (the queue worker).  The logging handlers on
staging call notify_server_error directly, which omits the
`deployment` key.

Remove the odd one-of key, and instead simply do dispatch in
`do_report_error`.
2021-01-17 11:08:12 -08:00
Mateusz Mandera 3623681d30 message_edit: Don't rely on .recipient_id change not affecting recipient.
The codepath for moving a topic changes the message.recipient_id to the
id of the new recipient, but later, in update_messages_for_topic_edit,
it uses message.recipient when querying for messages with the matching
topic in the *old* stream (because those are the other messages that
need to be moved). This is a bug which happens to work fine, because in
Django 2, if message.recipient gets fetched first and then
message.recipient_id is mutated, message.recipient will not be altered
and thus will retain the outdated, previously fetched value.

In Django 3 changing .recipient_id causes .recipient to be updated to
the new Recipient objects, which is the Recipient of the *new* stream.
That will cause the bug to manifest.

This is a bugfix preparing for the upgrade to Django 3.
2021-01-17 10:39:46 -08:00
Mateusz Mandera f76202dd59 django3: Save language preference in a cookie rather than the session.
Support for saving it in the session is dropped in django3, the cookie
is the mechanism that needs to be used. The relevant i18n code doesn't
have access to the response objects and thus needs to delegate setting
the cookie to LocaleMiddleware.

Fixes the LocaleMiddleware point of #16030.
2021-01-17 10:38:58 -08:00
Steve Howell 3df507be73 refactor: Clean up args for fetch_initial_state_data.
We now require explicit keywords for all arguments
to fetch_initial_state_data except user_profile.

We provide reasonable defaults to keep the test
code concise.
2021-01-17 12:31:04 -05:00
Siddharth Asthana 6c888977a6 change_subdomain: Create a deactivated realm on updating subdomain.
When changing the subdomain of a realm, create a deactivated realm with
the old subdomain of the realm, and set its deactivated_redirect to the
new subdomain.
Doing this will help us to do the following:
- When a user visits the old subdomain of a realm, we can tell the user
that the realm has been moved.
- During the registration process, we can assure that the old subdomain
of the realm is not used to create a new realm.

If the subdomain is changed multiple times, the deactivated_redirect
fields of all the deactivated realms are updated to point to the new
uri.
2021-01-07 14:15:22 -08:00
Aman Agrawal e566e985e4 topic_edit: Store edit history in all the message affected.
Instead of just storing the edit history in the message which
triggered the topic edit, we store the edit history in all
the messages that changed. This helps users track the edit history
of a message more reliably.
2021-01-04 18:18:05 -08:00
Aman Agrawal c685d36821 hipchat_import: Remove tool from codebase.
Remove functions and scripts used by HipChat import tool and
those which will no longer be required in future.
2020-12-23 08:28:49 -08:00
Mateusz Mandera 160cc5120a api: Require can_create_users permission to create users via API.
Allowing any admins to create arbitrary users is not ideal because it
can lead to abuse issues.  We should require something stronger that
requires the server operator's approval and thus we add a new
can_create_users permission.
2020-12-21 13:20:21 -08:00
Mateusz Mandera d0dc04a093 models: Rename is_api_super_user to can_forge_sender, 2020-12-21 13:15:39 -08:00
sahil839 2fa33be683 actions: Refactor check_message to change return dataclass instead of Dict.
We change the return type of check_message to be dataclass instead of
Dict[str, Any]. This refactoring helps us to understand the context of the
data structure returned by check_message clearly which was not possible
when using Dict.

SendMessageRequest class is added in zerver/lib/message.py inspite of it
not being used in that file itself just to maintain consistency as other
TypedDicts and dataclasses are defined in that file and to avoid circular
dependency as SendMessageRequest is being used in lib/widget.py as well.

We also rename local variable to 'send_request' for accessing
SendMessageRequest objects.
2020-12-21 12:55:30 -08:00
Anders Kaseorg a054f57af6 message: Bundle message stripping, validation, and truncation.
We always want to do these at the same time.  Previously, message
editing did too much stripping (fixes #16837) and failed to check for
NUL bytes.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-12-18 17:44:13 -08:00
sahil839 37c8505435 message: Raise exception when trying to mirror an already sent message.
Previously we were just returning a dict containing a message id when
trying to mirror a already sent message in 'zephyr_mirror' cases.

This commit changes this behaviour to raise an exception when trying
to mirror an already sent message by adding a new exception class
ZephyrMessageAlreadySentException and then the caller returns the
message_id directly, instead of calling do_send_messages which also
returns a list of size one containing the message_id only.

This is a prep commit for changing the return type of check_message to
be a dataclass instead of a Dict as now we have only single output for
check_message.
2020-12-18 16:40:11 -08:00
sahil839 4e99ec34a9 widget: Use different variable names for message and submessage content.
This commit renames the content variable in do_widget_post_save_actions
to message_content and is a prep commit for changing the return type of
check_message from Dict to dataclass.

This change is required because content variable is used two times in
this function - one for message content and other for submessage
content, so when we change the return type of check_message to
dataclass, the type of content variable is considered as str and then
when dict is assigned to content in the submessage case, mypy raises
'Incompatible types in assignment' error.

This issue is not faced before the dataclass migration because there is
no type checking for the values of dict returned by check_message as the
return type of check_message is 'Dict[str, Any]'.
2020-12-18 16:19:35 -08:00
sahil839 db85b8a236 actions: Change type of wildcard_mention_user_ids in message_dict to set.
The message_dict['wildcard_mention_user_ids'] should be empty set instead
of empty list when there are no wildcard mentions similar to the case
when there are wildcard mentions, where it is equal to set of user ids and
not list of user ids.
2020-12-18 16:17:26 -08:00
Anders Kaseorg 2ab0b3d4fc validator: Reject ISO 8601 dates missing leading zeros.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-12-15 16:36:50 -08:00
Max Zawisa 0e40cc72af newrelic: Added owner field and cleaned up code.
I reformatted the tests and view to include information about who
acknowledged and closed the alert. Only includes the information about
the owner if there was an owner.

Made a few small changes to the refactored bit as requested in review.
2020-12-15 12:04:46 -08:00
Max Zawisa 57e847ab89 newrelic: refactor of time input handling.
Moved time formatting check and conversion to
zerver/lib/webhooks/common.py. Updated tests slightly to match new
output. Removed duration from the calculation because the difference
is less than the precision of output and it complicated the error
handling.
2020-12-15 12:04:46 -08:00
Alex Vandiver 438d2aa632 digests: Ensure that the teaser_data can be JSON-serialized.
Leaving this as a set means that it fails in zerver.lib.send_email
when serializing into a ScheduledEmail object.
2020-12-15 11:44:50 -08:00
Anders Kaseorg bf45f921a7 url_preview: Allow Beautiful Soup to get the charset from <meta>.
An HTML document sent without a charset in the Content-Type header
needs to be scanned for a charset in <meta> tags.  We need to pass
bytes instead of str to Beautiful Soup to allow it to do this.

Fixes #16843.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-12-15 11:30:57 -08:00
Siddharth Asthana 82f5759299 Realm: Add a deactivated_redirect URLField to Realm object.
We export a realm's data, and disable the realm, because the user
is moving from Zulip Cloud (e.g. https://example.zulipchat.com/) to
self-hosting or another platform (e.g. https://zulip.example.com/)
which we do not control. This commit adds a field in the realm object
called deactivated_redirect to store the url to which the realm has
moved.
2020-12-14 21:04:52 -08:00
Mateusz Mandera 43a0c60e96 exceptions: Make RateLimited into a subclass of JsonableError.
This simplifies the code, as it allows using the mechanism of converting
JsonableErrors into a response instead of having separate, but
ultimately similar, logic in RateLimitMiddleware.
We don't touch tests here because "rate limited" error responses are
already verified in test_external.py.
2020-12-01 13:40:56 -08:00
Steve Howell 92ce2d0e31 events: Fix apply_event for streams.
In 1bcb8d8ee8 I made
it so the webapp doesn't include "streams" in its
state from `fetch_initial_state_data`, but I didn't
address all the places in apply_event.
2020-12-01 13:01:38 -08:00
Anders Kaseorg 13e35bfa94 mypy: Use sqlalchemy-stubs.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-11-16 18:17:41 -08:00
Steve Howell 99e725cbde populate_db: Simplify how we create reactions.
For 3000 messages and 400 users, this saved
about 30 seconds.

We only do two queries per batch of messages
now, and the algorithm is easier to analyze,
as it's just three nested loops.
2020-11-16 17:19:23 -08:00
Steve Howell e2e0f06b2a email digests: Call get_recent_topics once per batch.
Once we start processing digests in batch, this will
let us amortize the expense of the message query
over multiple users.
2020-11-16 08:59:29 -08:00
Steve Howell 428f0564a0 minor: Move context code down in the function.
This will make a subsequent diff a bit less noisy.
2020-11-16 08:59:29 -08:00
Steve Howell 1d1e45e9ec digests: Use UserActivityInterval for user activity.
Note that we are much more efficient about finding
active users here:

    - we do one query per realm (instead of per-user)
    - we pass the cutoff date to the database
    - we get back just a list of distinct ids
2020-11-16 08:59:29 -08:00
Steve Howell b52f56080e performance: Just get user_ids to queue digest emails. 2020-11-16 08:59:29 -08:00
Steve Howell e13e5d104d refactor: Only require user_id for inactive_since().
This function is going away completely soon.  It is
querying everybody's entire UserActivity history instead
of passing the cutoff date to the database!
2020-11-16 08:59:29 -08:00
Steve Howell d0260392f7 digests: Get user objects from the database.
The query counts increase here for somewhat
contrived reasons.  The tests before this
commit reflected a successful trip to the
UserProfile cache, but that's not actually
realistic in practice.
2020-11-16 08:59:29 -08:00
Steve Howell e49a482baf email digests: Make transactions atomic. 2020-11-16 08:59:28 -08:00
Steve Howell cf6bcfb84a digest emails: Exclude users who had recent digests.
This code protects us in case we ever need to re-run
email digests twice in the same day.
2020-11-16 08:59:28 -08:00
Steve Howell 4271442fba email digests: Write RealmAuditLog rows. 2020-11-16 08:59:28 -08:00
Anders Kaseorg 1275613812 requirements: Upgrade mypy to 0.790.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-11-12 15:44:30 -08:00
Anders Kaseorg e7e1fde6ec fenced_code: Use immutable type for codehilite_conf.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-11-10 15:54:28 -08:00
Anders Kaseorg fbf8ce0305 markdown: Add types for extra Markdown members.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-11-10 15:54:27 -08:00
Anders Kaseorg b48bdc65b9 markdown: Fix AlertWordNotificationProcessor.run type.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-11-10 15:54:27 -08:00
Anders Kaseorg 9573f6dc00 markdown: Fix build_block_parser type.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-11-10 15:54:27 -08:00
Anders Kaseorg 4398eecd2b markdown: Use immutable type for extension config.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-11-10 15:54:27 -08:00
Anders Kaseorg 060036dfd5 markdown: Merge build_engine into Markdown constructor.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-11-10 15:54:27 -08:00
Anders Kaseorg 08c64f5cfa markdown: Fix imports for compatibility with typeshed stubs.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-11-10 15:54:27 -08:00
Anders Kaseorg 3a8cf869db python: Convert os.open(…, O_EXCL) to open(…, "x").
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-11-09 14:31:01 -08:00
Mateusz Mandera 47228f3a95 actions: Implement do_delete_user.
To have a reasonable way of creating the dummy user without duplicating
code, we need change create_user to have the optional force_id argument.
2020-11-09 11:58:02 -08:00
Steve Howell 5da4332620 minor: Add order-by-id to digest message query.
The order-by-id is now explicit, and I add
comments to explain the select_related tables.
2020-11-06 10:05:46 -08:00
Steve Howell 936171d258 refactor: Extract DigestTopic class.
This gets us away from a lot of dictionary soup.
2020-11-06 10:05:46 -08:00
Steve Howell e8b6c56322 refactor: Simplify get_hot_topics().
The code we deleted here was no longer
doing anything.

Maybe the code was always dead, or maybe it
was written during a time when topics_by_diversity
and topics_by_length actually had different keys.

But now it's clearly cruft.

If we have 4 or more topics, then the code above
it would already have populated the list with 4
elements, and the `if num_convos < 4` condition
would evaluate to False.

And if we had 3 or fewer topics, then we would
have already put all possible topics into our
result, and the `topics_by_diversity[num_convos:4]`
slice would be empty.

It's possible that we should just have a simple
heuristic for topic hotness like `10*num_senders
+ messages`, so we don't have to maintain this
fiddly function, and we can just do something like
`topics_by_score[:4]`.
2020-11-06 10:05:46 -08:00
Steve Howell c5dc9d386f refactor: Use sets of stream_ids for email digests.
I now use sets for stream_ids in more of the digest
code.

As part of this I replaced exclude_subscription_modified_streams
with streams_recently_modified_for_user.

It's easier for the caller to just ask for ids
to delete from its callee than it is to pass
in a set/list to mutate.

The simpler boundary between the functions makes
the tests easier to write--you can see the
`filtered_streams` logic goes away in this diff.

I also make the tests a bit more thorough by using
combinations of Cordelia/Othello and Verona/Denmark
to try to find multiple possible flaws.

And I make the time intervals longer than 1s to
avoid false negatives from slow CI boxes.
2020-11-05 17:42:43 -08:00
Steve Howell 88a57ed4ac bulk digest: Get stream subscriptions in bulk.
If we have multiple users, this reduces the amount
of queries we need to do, because we get all
subscriptions for all users in a single query
to Subscription.

For the single-user case, we are introducing an
extra query hop, but the database is doing
roughly the same work, because we are just breaking
up this complex query into two hops:

    messages =
        select ...  from message
        where recipient__type_id in (
            select stream_id from subscription
            where ...
        )

Now it's more like:

    stream_ids =
        select stream_id from subscription
        where ...

    messages =
        select ... from message
        where recipient__type_id in stream_ids
2020-11-05 09:36:59 -08:00
Steve Howell c83db37161 email digests: Introduce bulk methods for digest.
Note that we are not changing anything semantically
or algorithmically yet.  The only overhead here
for the single-user case is boxing and unboxing
data into single-item dicts and lists.

The interfaces for callers in the view and the
queue processor remain the same for now.
2020-11-05 09:36:59 -08:00
Steve Howell 7c89e46731 minor: Clean up some code formatting. 2020-11-05 09:36:59 -08:00
Steve Howell 4bd02eea19 minor: Use user, not user_profile, in some digest code. 2020-11-05 09:36:59 -08:00
Steve Howell e31326c823 refactor: Extract get_digest_context.
This eliminates the union type and boolean parameter,
and it makes it a bit easier to migrate to a
bulk-get approach.
2020-11-05 09:36:59 -08:00
Steve Howell 217967f743 refactor: Extract get_hot_topics.
This extraction will make a bit more sense when
we start doing bulk operations on a realm to
get digests, but even now, it encapsulates the
slightly complex way we cherry-pick the top 4
topics for a user.
2020-11-05 09:36:59 -08:00
Steve Howell 5a6d6f81ff refactor: Extract get_recent_topic_activity. 2020-11-05 09:36:59 -08:00
Steve Howell f987b014b3 refactor: Rename conversation to topic.
Not only is topic shorter, but the name makes
it clear that we're not dealing with abstract
conversations here--we are truly bucketing by
topic.
2020-11-05 09:36:59 -08:00
Steve Howell 6ac3cd3534 refactor: Use list of topics, not tuples. 2020-11-05 09:36:59 -08:00
Steve Howell 878e938a89 minor: Rename conversation_diversity to conversation_senders. 2020-11-05 09:36:59 -08:00
Steve Howell 6dc8250e9a mypy: Add TopicKey type for digests. 2020-11-05 09:36:59 -08:00
Steve Howell 96f6064b18 refactor: Move Messages query down the digest stack.
This prep step is mostly for diff hygiene; the next
commit will make the code a bit nicer.

The original code here had the nice property that
most (but not all) of the DB work happened up
front in `handle_digest_email`, and none of the
DB work was delegated to the callers.  But I
prefer the tradeoff of making the helpers a bit
more cohesive--let them get the data they need.
And we have query-count coverage in our tests,
so there's no real danger of having helpers
down in the stack insidiously doing a bunch of
extra DB hops.
2020-11-05 09:36:59 -08:00
Steve Howell c1f134a3a4 performance: Use ORM to fetch sender in render_markdown.
In 709493cd75 (Feb 2017)
I added code to render_markdown that re-fetched the
sender of the message, to detect whether the message is
a bot.

It's better to just let the ORM fetch this.  The
message object should already have sender.

The diff makes it look like we are saving round trips
to the database, which is true in some cases.  For
the main message-send codepath, though, we are only
saving a trip to memcached, since the middleware
will have put our sender's user object into the
cache.  The test_message_send test calls internally
to check_send_stream_message, so it was actually
hitting the database in render_markdown (prior to
my change).
2020-11-05 09:35:15 -08:00
Steve Howell 637f596751 tests: Fix queries_captured to clear cache up front.
Before this change we were clearing the cache on
every SQL usage.

The code to do this was added in February 2017
in 6db4879f9c.

Now we clear the cache just one time, but before
the action/request under test.

Tests that want to count queries with a warm
cache now specify keep_cache_warm=True.  Those
tests were particularly flawed before this change.

In general, the old code both over-counted and
under-counted queries.

It under-counted SQL usage for requests that were
able to pull some data out of a warm cache before
they did any SQL.  Typically this would have bypassed
the initial query to get UserProfile, so you
will see several off-by-one fixes.

The old code over-counted SQL usage to the extent
that it's a rather extreme assumption that during
an action itself, the entries that you put into
the cache will get thrown away.  And that's essentially
what the prior code simulated.

Now, it's still bad if an action keeps hitting the
cache for no reason, but it's not as bad as hitting
the database.  There doesn't appear to be any evidence
of us doing something silly like fetching the same
data from the cache in a loop, but there are
opportunities to prevent second or third round
trips to the cache for the same object, if we
can re-structure the code so that the same caller
doesn't have two callees get the same data.

Note that for invites, we have some cache hits
that are due to the nature of how we serialize
data to our queue processor--we generally just
serialize ids, and then re-fetch objects when
we pop them off the queue.
2020-11-05 09:35:15 -08:00
YashRE42 967efc32d2 widgets: Remove tictactoe example widget.
Steve asked me to remove this, since the tictactoe game was always
intended as a proof of concept. Now that we have poll and todo
widgets, the sample code for tictactoe has much less value.

We replace the content and type in test_widgets.py to maintain
coverage.
2020-11-03 14:46:39 -08:00
Aman Agrawal 87cdd8433d home: Allow logged out user through home.
We allow user to load webapp without log-in. This is only
be enabled for developed purposes now. Production setups will
see no changes.
2020-11-02 17:07:12 -08:00
Anders Kaseorg ac5cbf7693 Revert "markdown: Escape lang when echoing back custom non-pygments languages."
This reverts commit 564b199fe6, which
was part of #16308.

Escaping is either required or incorrect; it is never “defensive”.
This escaping is incorrect.  lxml already escapes attributes during
serialization (any other behavior would be a serious bug), and
additional escaping just results in double escaping.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-11-02 16:23:48 -08:00
akshatdalton 620e9cbf72 markdown: Fix merging of separate quotations.
Initally, when writing two or more quotes, having
a blank line in between them, merges those quotes.
This created confusion especially in "quote and reply".

This commit fixes such issues. Now two or more quotes
having a blank line in between them, will not get merged.

This change is correct both for usability and for improving our
compatibility with CommonMark.

Fixes #14379.
2020-10-30 15:21:15 -07:00
Mateusz Mandera cbeeadab16 delete_realm: Register a post_delete Realm handler.
By registering a post_delete handler to clear appropriate caches in a
nicer way, we can get rid of the ugly flush-memcached call in the
delete_realm command.
2020-10-30 11:43:03 -07:00
Anders Kaseorg 3c663e48db url_encoding: Skip unnecessary encode before quote.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-30 11:36:38 -07:00
Anders Kaseorg df10b306a6 python: Remove force_bytes.
We are generally good enough at types to know whether a value is str
or bytes.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-30 11:36:38 -07:00
Anders Kaseorg 18d0e4664c python: Replace binascii with bytes.hex to skip some decode operations.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-30 11:36:38 -07:00
Anders Kaseorg aaa7b766d8 python: Use universal_newlines to get str from subprocess.
We can replace ‘universal_newlines’ with ‘text’ when we bump our
minimum Python version to 3.7.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-30 11:36:38 -07:00
Anders Kaseorg 9281dccae4 python: Serialize lxml elements directly to str.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-30 11:36:38 -07:00
Anders Kaseorg 7c4f68d9cf python: Skip unnecessary decode before BeautifulSoup parsing.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-30 11:36:38 -07:00
Anders Kaseorg 1802a50cc9 python: Use requests.Response.text instead of decoding content.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-30 11:36:38 -07:00
Tim Abbott 3b9c726fc6 outgoing_webhook: Avoid logging a bytes string.
This fixes the new assertLogs() tests failing in CI; we fixed the
weird use of bytes in the test, but not in the runtime code.
2020-10-29 15:55:11 -07:00
m-e-l-u-h-a-n cbfd6464a5 logging: replace mock.patch() for logging with assertLogs()
This commit removes mock.patch with assertLogs().

* Adds return value to do_rest_call() in outgoing_webhook.py, to
  support asserting log output in test_outgoing_webhook_system.py.

* Logs are not asserted in test_realm.py because it would require to users
  to be queried using users=User.objects.filter(realm=realm) and the order
  of resulting queryset varies for each run.

* In test_decorators.py, replacement of mock.patch is not done because
  I'm not sure if it's worth the effort to replace it as it's a return
  value of a function.

Tweaked by tabbott to set proper mypy types.
2020-10-29 15:37:45 -07:00
Vishnu KS fdea49742c apps: Use GitHub API for generating the web app download link. 2020-10-28 23:04:14 -07:00
ryanreh99 dfa7ce5637 uploads: Support non-AWS S3-compatible server.
Boto3 does not allow setting the endpoint url from
the config file. Thus we create a django setting
variable (`S3_ENDPOINT_URL`) which is passed to
service clients and resources of `boto3.Session`.

We also update the uploads-backend documentation
and remove the config environment variable as now
AWS supports the SIGv4 signature format by default.
And the region name is passed as a parameter instead
of creating a config file for just this value.

Fixes #16246.
2020-10-28 21:59:07 -07:00
ryanreh99 1c370a975c refactor: Access a bucket by calling `zerver.lib.uploads.get_bucket`. 2020-10-28 21:52:08 -07:00
Alex Vandiver 2b0bbbb882 tools: Rename postgres to postgresql in tool names. 2020-10-28 11:57:02 -07:00
Alex Vandiver 1f7132f50d docs: Standardize on PostgreSQL, not Postgres. 2020-10-28 11:55:16 -07:00
Anders Kaseorg 1352f2f233 python: Replace manual quote_plus usage with urlencode.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-27 13:47:02 -07:00
Anders Kaseorg 41f509170b users: Canonicalize the timezone identifier.
While working on shifting toward native browser time zone APIs
(#16451), it was found that all but very recent Chrome and Node
versions reject certain legacy timezone aliases like US/Pacific
(https://crbug.com/364374).

For now, we only canonicalize the timezone property returned in user
objects and not the timezone setting itself.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-27 13:42:54 -07:00
Anders Kaseorg a8b1691e97 timezone: Convert get_common_timezones cache to lru_cache.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-27 13:42:54 -07:00
Anders Kaseorg 0b288f92c9 timezone: Remove get_timezone wrapper.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-27 13:42:54 -07:00
Anders Kaseorg 0134112b51 timezone: Remove get_all_timezones wrapper.
Both callers want a set, and pytz already provides all_timezones_set.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-27 13:42:54 -07:00
Tim Abbott 6d7cd351a3 events: Optimize creating streams for new users.
During the new user creation code path, there can be no existing
active clients for the user being created, so we can skip the code to
send events to that user's clients.

The tests here reflect that we need to send fewer events, and do fewer
queries that would have been spent computing data for these..

Fixes #16503, combined with the long series of recent changes by Steve
Howell to fix super-linear behavior in this code path.
2020-10-26 12:47:15 -07:00
Steve Howell 88a7a1b002 events: Optimize peer_add/peer_remove for public streams.
We no bulk up peer_add/peer_remove events by user if the
same user has subscribed to multiple streams (and just
that single user).

This mostly optimizes the new-user codepath, but the
algorithm is a bit more general in nature.
2020-10-26 12:33:28 -07:00
Anders Kaseorg 31d0141a30 python: Close opened files.
Fixes various instances of ‘ResourceWarning: unclosed file’ with
python -Wd.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-26 12:31:30 -07:00
Anders Kaseorg 72d6ff3c3b docs: Fix more capitalization issues.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-23 11:46:55 -07:00
Anders Kaseorg e513b75e86 markdown: Remove handler for old bug with incompatible twitter library.
See commit 8b002040e0 and #86.  The
development environment bug that necessitated this handler has long
been irrelevant.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-23 11:30:26 -07:00
Anders Kaseorg b9fd49a2c6 mypy: Correct mistaken *args type annotations.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-23 11:29:13 -07:00
Anders Kaseorg 831d086110 i18n: Fix get_language_translation_data for zh_TW.
Fixes #16600.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-22 16:43:02 -07:00
sahil839 571bb62e3d events: Update subscriber list on peer_add for unsubscribed streams.
We update the subscriber list on peer_add event for unsubscribed
streams as well.
2020-10-22 15:12:32 -07:00
sahil839 733d26aef2 events: Update subscriber list on peer_remove for never subscribed stream.
We now update the subscriber list on peer_remove event for never
subscribed streams also.
2020-10-22 15:12:32 -07:00
sahil839 af9b153ee3 events: Update subscriber list on peer_remove for unsubscribed stream.
We update the subscriber list on peer_remove event for unsubscribed
streams also.
2020-10-22 15:12:32 -07:00
sahil839 d0f5537fb2 actions: Modify check_message for handling wildcard_mention_policy setting.
This commit adds enforcement for sending messages containing wildcard
mentions according to wildcard_mention_policy.
2020-10-22 14:46:32 -07:00
Steve Howell 7ff3859136 subscriber events: Change schema for peer_add/peer_remove.
We now can send an implied matrix of user/stream tuples
for peer_add and peer_remove events.

The client code basically does this:

    for stream_id in event['stream_ids']:
        for user_id in event['user_ids']:
            update_sub(stream_id, user_id)

We used to send individual events, which gets real
expensive when you are creating new streams. For
the case of copy-to-stream case, we should see
events go from U to 1, where U is the number of users
added.

Note that we don't yet fully optimize the potential
of this schema.  For adding a new user with lots
of default streams, we still send S peer_add events.

And if you subscribe a bunch of users to a bunch of
private streams, we only go from U * S to S; we can't
optimize it down to one event easily.
2020-10-22 11:19:53 -07:00
Steve Howell 85ed6f332a performance: Avoid Recipient lookup for stream messages.
All the fields of a stream's recipient object can
be inferred from the Stream, so we just make a local
object.  Django will create a Message object without
checking that the child Recipient object has been
saved.  If that behavior changes in some upgrade,
we should see some pretty obvious symptom, including
query counts changing.

Tweaked by tabbott to add a longer explanatory comment, and delete a
useless old comment.
2020-10-20 11:47:23 -07:00
Steve Howell 7bbcc2ac96 refactor: Compute peers for public streams later.
This saves us a query for edge cases like when
you try to unsubscribe from a public stream
that you have already unsubscribed from.

But this is mostly to prep for upcoming
optimizations.
2020-10-20 11:31:22 -07:00
Steve Howell 363e5d31a6 refactor: Split out public/private logic for peer events.
This doesn't change anything yet, but the goal is
to eventually optimize events for the case where
one user (typically a new user) gets subscribed
to multiple public streams.
2020-10-20 11:31:22 -07:00
Steve Howell 3961e69381 refactor: Extract send_peer_subscriber_events.
We now use the same basic code to send peer_add
and peer_remove events.
2020-10-20 11:31:22 -07:00
Anders Kaseorg 254b904965 markdown: Migrate off deprecated extension registration interface.
Fixes #15205.

https://python-markdown.github.io/change_log/release-3.0/#homegrown-ordereddict-has-been-replaced-with-a-purpose-built-registry
https://python-markdown.github.io/change_log/release-3.0/#md_globals-keyword-deprecated-from-extension-api

The priority numbers are arbitrarily chosen to preserve the existing
order.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-19 18:31:12 -07:00
akshatdalton 287c4ed2bb markdown: Fix Youtube and Vimeo preview overriding markdown link titles bug.
Initially markdown titles were overridden by Youtube and Vimeo preview titles.
But now it will check if any markdown title is present to replace Youtube or
Vimeo preview titles, if preview of linked websites is enabled.
Fixes #16100
2020-10-19 12:06:13 -07:00
Anders Kaseorg d81a93cdf3 requirements: Upgrade markdown to 3.3.1.
Upstream has slightly changed the whitespace around stashes.  Take
this opportunity to clean up the extra blank lines we were outputting.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-19 11:54:14 -07:00
Anders Kaseorg f461a64a6b i18n: Fix some ineffective calls to ugettext at top level.
Translation has no effect when we don’t yet know what language we’re
translating for.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-18 14:31:15 -07:00
Anders Kaseorg bba43f35ca i18n: Be deliberate about distinguishing ugettext and ugettext_lazy.
The early str conversions in zerver.models were defeating the point of
ugettext_lazy.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-18 14:31:15 -07:00
Steve Howell e6f6f8d45f refactor: Avoid "stream_id" on sub.
There was no need to put "stream_id" on the sub
dictionary here.  It's kinda annoying to introduce
the little helper here, but I feel
that's better than crufting up the sub data
structure.
2020-10-18 14:27:31 -07:00
Steve Howell 628a826aa2 minor: Move code and add comments about three lists. 2020-10-18 14:27:31 -07:00
Steve Howell ffee129a35 refactor: Clean up is_web_public flag.
The is_web_public flag is already in Stream.API_FIELDS,
so there is no reason for all this complicated logic.

There's no reason to hack it on to the subscription
object.
2020-10-18 14:27:31 -07:00
Steve Howell 4dce34ab8b refactor: Simplify call to bulk_get_subscriber_user_ids.
The way we were computing the dictionary was very
convoluted--all we need is a set of subscribed user
ids.
2020-10-18 14:27:31 -07:00
Steve Howell b58152abda refactor: Introduce all_streams_map.
We replace all_streams_id with a map.

We also use it to populate never_subscribed_streams.

And all_streams_map is a superset of stream_hash,
which we will soon kill off as well.
2020-10-18 14:27:31 -07:00
Steve Howell 78384ebf1b minor: Remove confusing parens.
Apparently I put these parens in the code as
part of 73c30774cb
during 2017.

It looks like I extracted is_public during
the middle of my change and forgot to remove
the unnecessary parens.  (The code was correct,
but it makes it look like a tuple if you're
skimming it too quickly.)
2020-10-18 14:27:31 -07:00
Steve Howell d60dd94168 refactor: Extract funcs from gather_subscriptions_helper.
This is a pure code move, apart from a little bit
of quote cleanup and renames:

    user_profile -> user
    stream_dict -> result
2020-10-18 14:27:31 -07:00
Steve Howell 79fcf78143 refactor: Exclude "active" from API_FIELDS.
We just need to make sure the relevant queries
get it for the triage process.
2020-10-18 14:27:31 -07:00
Steve Howell c5769d31f2 minor: Move code for web_public_stream_ids. 2020-10-18 14:27:31 -07:00
Steve Howell 0ca07ffd3c peformance: Eliminate StreamRecipientMap.
That class is an artifact of when Stream
didn't have recipient_id.  Now it's simpler
to deal with stream subscriptions.

We also save a query during page load (and
other places where we get subscriber
info).
2020-10-18 14:27:31 -07:00
Steve Howell 1951d75796 performance: Avoid select_related("realm").
We also move this query up in the function
for some future refactorings.
2020-10-18 14:27:31 -07:00
Steve Howell 57efe9d81a performance: Streamline list_to_streams.
We take advantage of stream.recipient to simplify
the query's where clause and avoid the need
for select_related("recipient").
2020-10-16 12:58:11 -07:00
Steve Howell e1bcf6124f refactor: Remove recipient from access_stream_by_name. 2020-10-16 12:58:11 -07:00
Steve Howell a51b483f1a performance: Remove recipient from access_stream_by_id.
The Recipient table is now kind of useless for
stream-related operations, since we have
recipient_id on Stream now.
2020-10-16 12:58:11 -07:00
Steve Howell 31622feb87 refactor: Only return sub from access_stream_common.
Let the callers access stream.recipient as needed.
It costs the same, and some of the callers can
actually stop caring about the actual Recipient
object.
2020-10-16 12:58:11 -07:00
Steve Howell bfd6e2b1fd refactor: Use recipient_id to get topic history. 2020-10-16 12:58:11 -07:00
Steve Howell 3685fcc701 refactor: Remove recipient arg for do_mute_topic. 2020-10-16 12:58:11 -07:00
Steve Howell 65dbee4837 minor: Ask for recipient_id, not recipient. 2020-10-16 12:58:11 -07:00
Steve Howell 378062cc83 performance: Avoid call to access_stream_by_id.
We already trust ids that are put on our queue
for deferred work. For example, see the code for
"mark_stream_messages_as_read_for_everyone"

We now pass stream_recipient_id when we queue
up work for do_mark_stream_messages_as_read.

This generally saves about 3 queries per
user when we unsubscribe them from a stream.
2020-10-16 12:58:11 -07:00
Steve Howell 31eb97ddde performance: Fix do_mark_stream_messages_as_read.
This function no longer asks for data that it
doesn't need.
2020-10-16 12:58:11 -07:00
Steve Howell 6d1f9de7d3 performance: Use SubInfo when removing subscribers.
We get two speedups:

    * The query to get existing subscribers only
      gets the two fields we need.  We no longer
      need all the overhead of user_profile
      and recipient data being returned in the
      query.

    * We avoid Django making extra hops to the
      database to get user info.
2020-10-16 12:58:11 -07:00
Steve Howell 73982f6cc9 refactor: Move SubInfo to stream_subscription.py. 2020-10-16 12:58:11 -07:00
Tim Abbott caa939d2d5 actions: Use transaction.atomic properly when removing subscriptions.
Previously, the transaction.atomic() was not properly scoped to ensure
that RealmAuditLog entries were created in the same transaction,
making it possible for state changes to not be properly recorded in
RealmAuditLog.
2020-10-15 15:12:05 -07:00
Steve Howell 0b91526f28 events: Remove "occupied" semantics for "streams".
When apps like mobile register for "streams", we
will now just use active streams as our baseline,
rather than "occupied" streams.

This means we will send a stream that is active,
even if it happens to have zero occupants.  It's
actually pretty rare that a stream has zero occupants,
and it's not exactly clear that we want to exclude
a non-occupied but otherwise active stream from
our list of streams.

It also happens to be fairly expensive to compute
whether a stream is occupied.

This change only affects API clients (including
possibly our mobile app).  The main webapp never
used the data from this codepath.
2020-10-15 15:12:01 -07:00
Steve Howell b4346d0276 performance: Extract subscribers/peers in bulk.
We replace get_peer_user_ids_for_stream_change
with two bulk functions to get peers and/or
subscribers.

Note that we have three codepaths that care about
peers:

    subscribing existing users:
        we need to tell peers about new subscribers
        we need to tell subscribed user about old subscribers

    unsubscribing existing users:
        we only need to tell peers who unsubscribed

    subscribing new user:
        we only need to tell peers about the new user
        (right now we generate send_event
        calls to tell the new user about existing
        subscribers, but this is a waste
        of effort that we will fix soon)

The two bulk functions are this:

    bulk_get_subscriber_peer_info
    bulk_get_peers

They have some overlap in the implementation,
but there are some nuanced differences that are
described in the comments.

Looking up peers/subscribers in bulk leads to some
nice optimizations.

We will save some memchached traffic if you are
subscribing to multiple public streams.

We will save a query in the remove-subscriber
case if you are only dealing with private streams.
2020-10-15 15:12:01 -07:00
Steve Howell 94e41c71f9 refactor: Use set of ids for altered users. 2020-10-15 15:12:01 -07:00
Steve Howell b894597fa3 refactor: Use sets of stream_ids for helper args. 2020-10-15 15:12:01 -07:00
Steve Howell 3889554977 refactor: Extract send_peer_remove_events. 2020-10-15 15:12:01 -07:00
Steve Howell f86823f82f tests: Add cache_tries_captured helper. 2020-10-15 15:12:01 -07:00
Steve Howell ce70d08cbf test_helpers: Use mock.patch.multiple. 2020-10-15 15:12:01 -07:00
Tim Abbott bf66e9c4ab actions: Add transaction.atomic to bulk_add_subs_to_db_with_logging.
This will ensure that we always fully execute the database part of
modifying subscription objects.  In particular, this should prevent
invariant failures like #16347 where Subscription objects were created
without corresponding RealmAuditLog entries.

Fixes #16347.
2020-10-14 11:06:00 -07:00
Steve Howell 5728149e94 performance: Streamline query to add subscribers.
We don't need the select_related('user_profile')
optimization any more, because we just keep
track of user info in our own data structures.

In this codepath we are never actually modifying
users; we just occasionally need their ids or
emails.

This can be a pretty substantive improvement if
you are adding a bunch of users to a stream
who each have a bunch of their own subscriptions.

We could also limit the number of full rows in this
query by adding an extra hop to the DB just to
get colors (using values_list), and then only get
full sub info for the streams that we're adding, rather
than getting every single subscription, in full, for each user.

Apart from finding what colors the user has already
used, the only other reason we need all the columns
in Subscription here is to handle streams that
need to be reactivated.  Otherwise we could do
only("id", "active", "recipient_id", "user_profile_id")
or similar.  Fortunately, Subscription isn't
an overly wide table; it's mostly bool fields.

But by far the biggest thing to avoid is bringing
in all the extra user_profile data.

We have pretty good coverage on query counts here,
so I think this fix is pretty low risk.
2020-10-14 11:03:07 -07:00
Steve Howell 116a441bc5 refactor: Introduce SubInfo class.
This class removes a lot of the annoying tuples
we were passing around.

Also, by including the user everywhere, which
is easily available to us when we make instances
of SubInfo, it sets the stage to remove
select_related('user_profile').
2020-10-14 10:53:10 -07:00
Steve Howell febef45e38 minor: Add comments to do_get_streams. 2020-10-14 10:53:10 -07:00
Steve Howell a9356508ca events: Stop sending occupy/vacate events.
We used to send occupy/vacate events when
either the first person entered a stream
or the last person exited.

It appears that our two main apps have never
looked at these events.  Instead, it's
generally the case that clients handle
events related to stream creation/deactivation
and subscribe/unsubscribe.

Note that we removed the apply_events code
related to these events.  This doesn't affect
the webapp, because the webapp doesn't care
about the "streams" field in do_events_register.

There is a theoretical situation where a
third party client could be the victim of
a race where the "streams" data includes
a stream where the last subscriber has left.
I suspect in most of those situations it
will be harmless, or possibly even helpful
to the extent that they'll learn about
streams that are in a "quasi" state where
they're activated but not occupied.

We could try to patch apply_event to
detect when subscriptions get added
or removed. Or we could just make the
"streams" piece of do_events_register
not care about occupy/vacate semantics.
I favor the latter, since it might
actually be what users what, and it will
also simplify the code and improve
performance.
2020-10-14 10:53:10 -07:00
Steve Howell 1bcb8d8ee8 performance: Avoid computing page_params.streams in webapp.
The query to get "occupied" streams has been expensive
in the past.  I'm not sure how much any recent attempts
to optimize that query have mitigated the issue, but
since we clearly aren't sending this data, there is no
reason to compute it.
2020-10-14 10:53:10 -07:00
Steve Howell 79803f01f4 minor: Format some code in events.py. 2020-10-14 10:53:10 -07:00
Aman Agrawal fbf7cb82a7 web_public_guest: Rename to web_public_visitor for clarity.
Using web_public_guest for anonymous users is confusing since
'guest' is actually a logged-in user compared to
web_public_guest which is not logged-in and has only
read access to messages. So, we rename it to
web_public_visitor.
2020-10-13 16:59:52 -07:00
Steve Howell 3b338ec32e performance: Optimize filter_stream_authorization.
We no longer do O(N) queries to get existing streams.

This is a somewhat contrived use case--generally, we
are not trying to re-subscribe a user to several
streams.  Still, we want to avoid this.

This commit also makes `test_bulk_subscribe_many`
do more work, and the change to the test helped
me discover this bug.
2020-10-13 18:54:55 -04:00
Anders Kaseorg 6564540d15 docs: Fix some spelling errors.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-13 15:47:13 -07:00
Anders Kaseorg dd48dbd912 docs: Add spaces to “check out”, “log in”, “set up”, “sign up” as verbs.
“Checkout”, “login”, “setup”, and “signup” are nouns, not verbs.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-13 15:47:13 -07:00
Steve Howell 598601e8fc stream events: Prevent spurious events.
If a user asks to be subscribed to a stream
that they are already subscribed to, then
that stream won't be in new_stream_user_ids,
and we won't need to send an event for it.

This change makes that happen more automatically.
2020-10-13 11:28:17 -07:00
Steve Howell 18771099e4 performance: Introduce new_stream_user_ids.
Let
    U = number of users to subscribe
    S = number of streams to subscribe

We were technically doing N^3 amount of work
when we sent certain events, or to be more
precise, U * S * S amount of work.  For each
stream, we were looping through a list of tuples
of size U * S to find the users for the stream.

In practice either U or S is usually 1, so the
performance gains here are probably negligible,
especially since the constant factors here
were just slinging around Python data.

But the code is actually more readable now, so
it's a double win.
2020-10-13 11:28:17 -07:00
Steve Howell ebb605319b refactor: Rename stream_map to recipient_id_to_stream.
I want to make a new dict called stream_id_to_stream,
and stream_map would be confusing.
2020-10-13 11:28:17 -07:00
Steve Howell b502957184 refactor: Extract new_recipient_ids local.
We rename needs_new_sub (which sounds like
a boolean!) to new_recipient_ids, and we
calculate it explicitly within the loop, so
that we don't need to worry as much about
subsequent passes through the loop mutating it.

This allows us to also remove recipient_ids,
which in turn lets us remove recipients_map,
albeit with a small tweak for stream_map.

I also introduce the my_subs local, which
I use to more directly populate used_colors,
as well as using it as the loop var.
2020-10-13 11:28:17 -07:00
Steve Howell 766892d8aa import: Reuse get_last_message_id() helper. 2020-10-13 11:28:17 -07:00
Steve Howell 9df9934ed6 refactor: Pass realm to bulk_add_subscriptions.
I think it's important that the callers understand
that bulk_add_subscriptions assumes all streams
are being created within a single realm, so I make
it an explicit parameter.

This may be overkill--I would also be happy if we
just included the assertions from this commit.
2020-10-13 11:28:17 -07:00
Steve Howell efc931a671 minor: Extract realm local. 2020-10-13 11:28:17 -07:00
Steve Howell b2d0a2efb9 refactor: Extract send_subscription_add_events.
This function now does all the work that we used
to do with notify_subscriptions_added happening
inside a loop.

There's a small fine-tuning here, where we only
get recent traffic on streams that we're actually
sending events for.
2020-10-13 11:28:17 -07:00
Steve Howell 223ce83a0a refactor: Clean up call to notify_subscriptions_added.
We now just pass in all_subscribers_by_stream, rather
than a callback.

We also move sub_tuples_by_user closer to the
loop where we call notify_subscriptions_added.
2020-10-13 11:28:17 -07:00
Steve Howell 811426b345 Extract send_stream_creation_events_for_private_streams.
We can probably avoid passing in users here.
2020-10-12 16:40:37 -07:00
Steve Howell 1cfaef0d1a refactor: Simplify pick_color logic.
This removes the need to jankily mutate
the active flag in the caller, and we don't
need to mutate our subs_by_user either.
2020-10-12 16:40:37 -07:00
Steve Howell 13569ff97a refactor: Eliminate new_subs.
We now just process new subs for a user immediately
within the loop.
2020-10-12 16:40:37 -07:00
Steve Howell 8c70fbde78 refactor: Use subs_to_add in return value.
The subs_to_add is directly related to a var
called new_subs, which I hope to eliminate
soon.
2020-10-12 16:40:37 -07:00
Steve Howell 1afca3d430 minor: Extract local for stream. 2020-10-12 16:40:37 -07:00
Steve Howell 84aa1389d8 Extract bulk_add_subs_to_db_with_logging.
This is a trivial code extraction.
2020-10-12 16:40:37 -07:00
Steve Howell 3ff9ce78ea refactor: Extract send_peer_add_events. 2020-10-12 16:40:37 -07:00
Cody Piersall 5dab6e9d31 emoji-upload: Fix transparency issues on GIF emoji upload.
This preserves the alpha layer on GIF images that need to be resized
before being uploaded.  Two important changes occur here:

1. The new frame is a *copy* of the original image, which preserves the
   GIF info.
2. The disposal method of the original GIF is preserved.  This
   essentially determines what state each frame of the GIF starts from
   when it is drawn; see PIL's docs:
   https://pillow.readthedocs.io/en/stable/handbook/image-file-formats.html#saving
   for more info.

This resolves some but not all of the test cases in #16370.
2020-10-11 16:23:07 -07:00
Anders Kaseorg b7a94be152 python: Catch BaseException when we need to clean something up.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-11 16:16:16 -07:00
Anders Kaseorg 7f69c1d3d5 python: Catch specific exceptions from requests.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-11 16:11:41 -07:00
Anders Kaseorg 17ac17286c python: Catch specific exceptions from subprocess.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-11 16:11:41 -07:00
Anders Kaseorg aabef3d9be python: Catch specific exceptions from orjson.
Followup to #16120.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-11 16:11:41 -07:00
Alex Vandiver c2132a4f9c queue: Drop register_json_consumer / json_drain_queue interface.
Now that all callsites use the same interface, drop the now-unused
ones, and their tests.
2020-10-11 14:19:42 -07:00
Alex Vandiver 179c387409 tornado: Switch to start_json_consumer interface. 2020-10-11 14:19:42 -07:00
Alex Vandiver f9358d5330 queue: Switch batch interface to use the channel.consume iterator.
This low-level interface allows consuming from a queue with timeouts.
This can be used to either consume in batches (with an upper timeout),
or one-at-a-time.  This is notably more performant than calling
`.get()` repeatedly (what json_drain_queue does under the hood), which
is "*highly discouraged* as it is *very inefficient*"[1].

Before this change:
```
$ ./manage.py queue_rate --count 10000 --batch
Purging queue...
Enqueue rate: 11158 / sec
Dequeue rate: 3075 / sec
```

After:
```
$ ./manage.py queue_rate --count 10000 --batch
Purging queue...
Enqueue rate: 11511 / sec
Dequeue rate: 19938 / sec
```

[1] https://www.rabbitmq.com/consumers.html#fetching
2020-10-11 14:19:40 -07:00
Alex Vandiver 2547bdbf4a queue: Rename consume_wrapper to a better name. 2020-10-09 20:40:51 -07:00
Alex Vandiver d5a6b0f99a queue: Rename queue_size, and update for all local queues.
Despite its name, the `queue_size` method does not return the number
of items in the queue; it returns the number of items that the local
consumer has delivered but unprocessed.  These are often, but not
always, the same.

RabbitMQ's queues maintain the queue of unacknowledged messages; when
a consumer connects, it sends to the consumer some number of messages
to handle, known as the "prefetch."  This is a performance
optimization, to ensure the consumer code does not need to wait for a
network round-trip before having new data to consume.

The default prefetch is 0, which means that RabbitMQ immediately dumps
all outstanding messages to the consumer, which slowly processes and
acknowledges them.  If a second consumer were to connect to the same
queue, they would receive no messages to process, as the first
consumer has already been allocated them.  If the first consumer
disconnects or crashes, all prior events sent to it are then made
available for other consumers on the queue.

The consumer does not know the total size of the queue -- merely how
many messages it has been handed.

No change is made to the prefetch here; however, future changes may
wish to limit the prefetch, either for memory-saving, or to allow
multiple consumers to work the same queue.

Rename the method to make clear that it only contains information
about the local queue in the consumer, not the full RabbitMQ queue.
Also include the waiting message count, which is used by the
`consume()` iterator for similar purpose to the pending events list.
2020-10-09 20:40:39 -07:00
Alex Vandiver a1ce1aca3b queue: Update comment to be more accurate about import errors. 2020-10-09 20:40:32 -07:00
Alex Vandiver 2d71ca1fb8 email: Remove unused `log_digest_event` function.
Its last callsite was removed in e46cbaffa2.

Also ref #6786.
2020-10-08 20:35:53 -07:00
sahil839 4e8f5b5b31 streams: Change access_stream_for_delete_or_update to also return sub.
We modify access_stream_for_delete_or_update function to return
Subscription object also along with stream. This change will be
helpful in avoiding an extra query to get subscription object in
code for updating subscription role.
2020-10-08 17:07:30 -07:00
Aman Agrawal 8b419c93e4
message_send: Fix old guests being treated as full members.
For streams in which only full members are allowed to post,
we block guest users from posting there.

Guests users were blocked from posting to admin only streams
already. So now, guest users can only post to
STREAM_POST_POLICY_EVERYONE streams.

This is not a new feature but a bugfix which should have
happened when implementing full member stream policy / guest users.
2020-10-08 11:30:11 -07:00
Anders Kaseorg 363374346c management: Use signal.pause to sleep forever.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-07 16:15:19 -07:00
akshatdalton 52c411df8a emoji: Add padding around the gif on GIF emoji upload.
Replaced ImageOps.fit by ImageOps.pad, in zerver/lib/upload.py, which
returns a sized and padded version of the image, expanded to fill the
requested aspect ratio and size.
Fixes part of #16370.
2020-10-06 17:28:02 -07:00
Alex Vandiver baf882a133 queue: Only ACK drain_queue once it has completed work on the list.
Currently, drain_queue and json_drain_queue ack every message as it is
pulled off of the queue, until the queue is empty.  This means that if
the consumer crashes between pulling a batch of messages off the
queue, and actually processing them, those messages will be
permanently lost.  Sending an ACK on every message also results in a
significant amount lot of traffic to rabbitmq, with notable
performance implications.

Send a singular ACK after the processing has completed, by making
`drain_queue` into a contextmanager.  Additionally, use the `multiple`
flag to ACK all of the messages at once -- or explicitly NACK the
messages if processing failed.  Sending a NACK will re-queue them at
the front of the queue.

Performance of a no-op dequeue before this change:
```
$ ./manage.py queue_rate --count 50000 --batch
Purging queue...
Enqueue rate: 10847 / sec
Dequeue rate: 2479 / sec
```
Performance of a no-op dequeue after this change (a 25% increase):
```
$ ./manage.py queue_rate --count 50000 --batch
Purging queue...
Enqueue rate: 10752 / sec
Dequeue rate: 3079 / sec
```
2020-10-06 17:26:14 -07:00
Abhijeet Prasad Bodas a20d22de43 i18n: Move locale select logic in home to i18n.py.
Part of #16094.
Moved the language selection preference logic from home.py to a new
function in i18n.py to avoid repetition in analytics views and home
views.
2020-10-02 14:56:20 -07:00
Tim Abbott 8c8f3ee13b test_classes: Extract home view helpers for reuse. 2020-10-01 15:14:25 -07:00
Tim Abbott 6d041a3b34 home: Include is_web_public_guest in page_params. 2020-10-01 15:07:19 -07:00
Tim Abbott 351d73ac5a home: Pass realm to build_page_params_for_home_page_load.
This is preparation for this needing to be a separate parameter from
the user.
2020-10-01 15:00:36 -07:00
Aman Agrawal 18d852de49 unreads: Add support for web public guests.
This handles the case of web public guests by returning
RawUnreadMessagesResult with empty initalized values.
2020-10-01 14:53:43 -07:00
Aman Agrawal e02f0fb1c4 get_raw_unread_data: Extract func to get unreads from ums. 2020-10-01 14:46:46 -07:00
Tim Abbott e8e876d54c events: Deduplicate anonymous user code.
This approach lets us deduplicate much of the fetch_initial_state_data
logic around logged-out users.
2020-10-01 14:43:51 -07:00
Aman Agrawal f46f251688 post_process_state: Allow web public guests.
Because the logic already works correctly, we just need to change mypy
types.
2020-10-01 14:41:49 -07:00
Aman Agrawal 190f481f49 stream_subscription: Mark notifications disabled for web public users.
Users without an account can't get notifications, so we might as well
ensure any UI displays them appropriately.
2020-10-01 14:40:48 -07:00
Anders Kaseorg 46babbe9e1 import_realm: Close the memcached connection before forking.
This prevents the memcached connection from being shared across
multiple processes, and hopefully addresses unexpected behavior from
cached functions like get_user_profile_by_id invoked inside the worker
processes.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-01 11:20:39 -07:00
sahil839 6c473ed75f message: Call build_message_send_dict from check_message.
We call build_message_send_dict from check_message instead of
do_send_messages.

This is a prep commit for adding a new setting for handling
wildcard mentions in large streams.
2020-09-29 17:18:04 -07:00
sahil839 f1a5fbaeb0 message: Extract build_message_send_dict function.
We extract the loop for building message dict in
do_send_messages in a separate function named
build_message_send_dict.

This is a prep commit for moving the code for building
of message dict in check_message.
2020-09-29 16:50:47 -07:00
sahil839 0514ba7ecb message: Add 'links_for_embed' to message_dict.
There is a bug where we send event for even
those messages which do not have embedded links
as we are using single set 'links_for_embed' to
check whether we have to send event for
embedded links or not.

This commit fixes the bug by adding 'links_for_embed'
in message dict itself and send the event only
if that message has embedded links.
2020-09-29 16:50:47 -07:00
Steve Howell c199571112 mypy: Add StreamDict.
This requires us to rework the view code a little
bit to explicitly assign fields.
2020-09-29 16:49:10 -07:00
Steve Howell bee18c70f0 mypy: Use str in statsd_key. 2020-09-29 16:49:10 -07:00
Steve Howell 2c496d9afd mypy: Fix do_send_user_group_update_event. 2020-09-29 16:49:10 -07:00
Steve Howell a37ef208dc mypy: Add RawReactionRow. 2020-09-29 16:49:10 -07:00
Vishnu KS 367c792968 actions: Downgrade realm before scrubbing. 2020-09-28 15:37:49 -07:00
Vishnu KS 0d30f59c97 billing: downgrade_now -> downgrade_now_without_creating_additional_invoice. 2020-09-28 15:37:49 -07:00
Steve Howell def3dac6ae event_schema: Add comments to top of the file.
The comments basically explain the common coding
patterns for making the checkers.
2020-09-28 12:19:28 -07:00
Tim Abbott 3242fc7388 soft_deactivation: Fix typo in logging output. 2020-09-28 12:12:04 -07:00
palash 0c18113910 soft_deactivation: Change root logger to zulip.soft_deactivation.
Update logger in the following files using this logger:
test_soft_deactivation, test_home, test_push_notifications
2020-09-28 12:12:00 -07:00
Tim Abbott 899cb41857 MessageDict: Remove _finalize_payload for simplicity.
finalize_payload already has a few options; there's little benefit to
this one being implemented as a separate helper function.
2020-09-28 12:00:18 -07:00
Tim Abbott 99396b25a6 MessageDict: Add a bit of docstring documentation. 2020-09-28 11:50:02 -07:00
Tim Abbott 90ff62aabc actions: Rename message local variable to message_dict.
This is a preparatory refactor to make it easy to see the changes
using `git show` in the next commit.
2020-09-28 11:14:59 -07:00
sahil839 ae74f8aafb actions: Remove unnecessary comment in do_send_messages function.
This commit removes the unnecessary comment which was added in
9454683108, when we were using message.get() for keys which
were also passed as args in do_send_messages, but there are no
such keys in the current code.
2020-09-28 10:58:35 -07:00
sahil839 76c75fea92 actions: Remove unnecessary line from do_send_messages.
This commit removes the unnecessary line of code to get
rendered_content from message dict sent by check_message
when it actually does not inlcude 'rendered_content' key.

This line was added in 9454683108, but now we do not send
rendered_content in the message dict as we render the message
in do_send_messages itself.
2020-09-28 10:58:35 -07:00
Dinesh acca870480 tests: Add a dummy request to self.client.login().
A later commit alters `authenticate` of EmailAuthBackend to
add a store `needs_to_change_password` variable to session
which is useful to insist users on changing their weak password.

The tests start failing with that change because client.login()
runs `authenticate` without a `request` object. So, this commit
sends a request object with `request.session=self.client.session`
to self.client.login() in tests wherever needed.
2020-09-25 16:24:18 -07:00
Abhijeet Prasad Bodas d9d51e32c1 i18n: Simplify logic for translation data in page_params.
This refactors the get_translation_data function to return an empty
dict when 'en' language is passed, to avoid repetition of code in
stats and home.
2020-09-25 16:21:37 -07:00
Clara Dantas 8674287192 digest: Support digest of web public streams for guest users.
This change requires some basic plumbing for test code creating
web-public streams.
2020-09-25 16:11:04 -07:00
Tim Abbott 94a9fa1891 event_schema: Add documentation and rename a few functions.
This should help make this revised subsystem readable for more new
contributors.  We still need to make updates to the high-level
documentation.
2020-09-25 12:53:00 -07:00
Steve Howell 5b7c9c4714 test_events: Add check_realm_user_remove. 2020-09-25 11:43:20 -07:00
Steve Howell 7bb7f2943f event_schema: Finish extraction with realm_emoji/update.
We now no longer define any schemas in test_events--all
of them are in event_schema, which helps our tooling
cross-check schemas for openapi and node tests.
2020-09-25 11:43:20 -07:00
Steve Howell ae4d083a5a event_schema: Extract check_realm_domains_*. 2020-09-25 11:43:20 -07:00
Steve Howell 298bed9fa1 event_schema: Split check_update_message_flags. 2020-09-25 11:43:20 -07:00
Steve Howell f6e0171d02 event_schema: Split check_reaction into add/remove.
It happens that whether you add a reaction or remove
a reaction, we send the exact same fields, just using
a different op code.

This sort of symmetry is actually kind of rare, as
usually "add" events have more fields, and "remove" events
might just send an id of something to remove.

Our openapi schema treats these as two seperate events,
so we are more consistent with it, and it helps our
schema-checking tooling for node fixtures, too.

Note that we now have to exempt the two events from
our openapi checks, due to the is_mirror_dummy field
in the deprecated user block.  We can decide how to
handle this later--one possibility is to just add it
as an optional field on the event_schema side.
2020-09-25 11:43:20 -07:00
Steve Howell b7b2546f44 event_schema: Extract check_subscription_update.
Note that we use value_type for value instead of
bool, since properties can be non-bool things
like color, which we just don't test now.  We
should test them.

We more than compensate for this by checking
the actual value of the value in
check_subscription_update.
2020-09-25 11:43:20 -07:00
Steve Howell b920ebce81 event_schema: Extract check_has_zoom_token. 2020-09-25 11:43:20 -07:00
Steve Howell 0c4286222f event_schema: Extract check_realm_update_dict. 2020-09-25 11:43:20 -07:00
Steve Howell 6ec6525624 event_schema: Extract check_delete_message.
There is a legacy format where we send
singular "message_id" instead of plural
"message_ids".

Then there are different fields for "private"
and "stream" message types.
2020-09-25 11:43:20 -07:00
Steve Howell 88165aee6b event_schema: Extract check_user_group_update. 2020-09-25 11:43:20 -07:00
Steve Howell aaaac11661 event_schema: Extract check_user_group_remove. 2020-09-25 11:43:20 -07:00
Steve Howell 1b7af13f37 event_schema: Extract check_user_group_remove_members. 2020-09-25 11:43:20 -07:00