performance: Avoid monster query for existing subs.

Part of our codepath for subscribing users involves fetching the users' existing subscriptions to make sure we can do things like properly report to the clients that the users were already subscribed. This codepath used to be coupled to code that helped users maintain unique stream colors. Suppose you are creating a new stream, and you are importing users from an older stream with 15k subscribers, and each of your users is subscribed to about 20 streams. The prior code, instead of filtering on recipient_id, would literally look at every subscription for every user, which was kind of crazy if you didn't understand the pick-stream-color complications. Before this commit, we would fetch 300k rows with 15 columns each (granted, all but one of the columns are bool/int). That's a total of 4.5 million tiny objects that we had to glom into Django ORM objects and slice and dice. After this commit, we would fetch exactly zero rows for the are-they-already-subscribed logic. Yes, ZERO. If we were to mistakenly try to re-add the same 15k subscribers to the new stream (under the new code), we will now fetch 15k Sub rows instead of 300k. It is worth looking at the prior commit. We go through great pains to ensure that users get new stream colors when we invite them to a stream, and we still fetch a bunch of data for that. Instead of 4.5 million cells, it's more like 600k cells (2 columns per row), and it's less than that insofar as some users may only have 24 distinct colors among their many streams. It's a lot of work.
2021-12-24 17:09:54 +00:00 · 2021-12-24 17:09:54 +00:00 · fe3295d395
parent f638fd6f72
commit fe3295d395
2 changed files with 8 additions and 11 deletions
--- a/zerver/lib/actions.py
+++ b/zerver/lib/actions.py
@ -135,7 +135,6 @@ from zerver.lib.stream_subscription import (
    get_active_subscriptions_for_stream_id,
    get_bulk_stream_subscriber_info,
    get_stream_subscriptions_for_user,
    get_stream_subscriptions_for_users,
    get_subscribed_stream_ids_for_user,
    get_subscriptions_for_send_message,
    get_used_colors_for_user_ids,
@ -3856,13 +3855,19 @@ def bulk_add_subscriptions(
    for user in users:
        assert user.realm_id == realm.id
    recipient_ids = [stream.recipient_id for stream in streams]
    recipient_id_to_stream = {stream.recipient_id: stream for stream in streams}
    used_colors_for_user_ids: Dict[int, Set[str]] = get_used_colors_for_user_ids(user_ids)
    existing_subs = Subscription.objects.filter(
        user_profile_id__in=user_ids,
        recipient__type=Recipient.STREAM,
        recipient_id__in=recipient_ids,
    )
    subs_by_user: Dict[int, List[Subscription]] = defaultdict(list)
-    all_subs_query = get_stream_subscriptions_for_users(users)
+    for sub in existing_subs:
    for sub in all_subs_query:
        subs_by_user[sub.user_profile_id].append(sub)
    already_subscribed: List[SubInfo] = []
--- a/zerver/lib/stream_subscription.py
+++ b/zerver/lib/stream_subscription.py
@ -74,14 +74,6 @@ def get_stream_subscriptions_for_user(user_profile: UserProfile) -> QuerySet:
    )
 def get_stream_subscriptions_for_users(user_profiles: List[UserProfile]) -> QuerySet:
    # TODO: Change return type to QuerySet[Subscription]
    return Subscription.objects.filter(
        user_profile__in=user_profiles,
        recipient__type=Recipient.STREAM,
    )
 def get_used_colors_for_user_ids(user_ids: List[int]) -> Dict[int, Set[str]]:
    """Fetch which stream colors have already been used for each user in
    user_ids. Uses an optimized query designed to support picking