zulip/zerver/lib/cache_helpers.py

99 lines
5.0 KiB
Python
Raw Normal View History

from __future__ import absolute_import
from six import text_type, binary_type
2016-06-03 23:33:25 +02:00
from typing import Any, Dict, Callable, Tuple
# This file needs to be different from cache.py because cache.py
# cannot import anything from zerver.models or we'd have an import
# loop
from django.conf import settings
from zerver.models import Message, UserProfile, Stream, get_stream_cache_key, \
Recipient, get_recipient_cache_key, Client, get_client_cache_key, \
Huddle, huddle_hash_cache_key, to_dict_cache_key_id
from zerver.lib.cache import cache_with_key, cache_set, \
user_profile_by_email_cache_key, user_profile_by_id_cache_key, \
get_remote_cache_time, get_remote_cache_requests, cache_set_many
from django.utils.importlib import import_module
from django.contrib.sessions.models import Session
import logging
from django.db.models import Q
MESSAGE_CACHE_SIZE = 75000
Increase efficiency of initial message cache query In repeated trials, the initial data fetch used to take about 1100ms. In practice, it was often taking >2000ms, probably due to caching effects. This commit cuts the time down to about 300ms in repeated trials. Note that the semantics are changed slightly in that we may no longer get exactly 25000 messages. However, holes in the message_id sequence are currently very rare or non-existent so this shouldn't be a problem and we don't care about the exact number of messages anyway. I believe the problem was that the query planner was unable to effectively use the LIMIT clause to figure out that only a small subset of zephyr_message was going to be needed. Thus, it planned for operating on the entire table and decided it could not use a more efficient plan because work_mem, although large, would not be large enough to execute the query over all of zephyr_message. The original query was: SELECT "zephyr_message"."id", "zephyr_message"."sender_id", "zephyr_message"."recipient_id", "zephyr_message"."subject", "zephyr_message"."content", "zephyr_message"."rendered_content", "zephyr_message"."rendered_content_version", "zephyr_message"."pub_date", "zephyr_message"."sending_client_id", "zephyr_userprofile"."id", "zephyr_userprofile"."password", "zephyr_userprofile"."last_login", "zephyr_userprofile"."email", "zephyr_userprofile"."is_staff", "zephyr_userprofile"."is_active", "zephyr_userprofile"."date_joined", "zephyr_userprofile"."full_name", "zephyr_userprofile"."short_name", "zephyr_userprofile"."pointer", "zephyr_userprofile"."last_pointer_updater", "zephyr_userprofile"."realm_id", "zephyr_userprofile"."api_key", "zephyr_userprofile"."enable_desktop_notifications", "zephyr_userprofile"."enter_sends", "zephyr_userprofile"."tutorial_status", "zephyr_realm"."id", "zephyr_realm"."domain", "zephyr_realm"."restricted_to_domain", "zephyr_recipient"."id", "zephyr_recipient"."type_id", "zephyr_recipient"."type", "zephyr_client"."id", "zephyr_client"."name" FROM "zephyr_message" INNER JOIN "zephyr_userprofile" ON ( "zephyr_message"."sender_id" = "zephyr_userprofile"."id" ) INNER JOIN "zephyr_realm" ON ( "zephyr_userprofile"."realm_id" = "zephyr_realm"."id" ) INNER JOIN "zephyr_recipient" ON ( "zephyr_message"."recipient_id" = "zephyr_recipient"."id" ) INNER JOIN "zephyr_client" ON ( "zephyr_message"."sending_client_id" = "zephyr_client"."id" ) ORDER BY "zephyr_message"."id" DESC LIMIT 25000; with query plan: Limit (cost=0.00..27120.95 rows=25000 width=362) (actual time=0.051..1121.282 rows=25000 loops=1) -> Nested Loop (cost=0.00..5330872.99 rows=4913981 width=362) (actual time=0.048..1081.014 rows=25000 loops=1) -> Nested Loop (cost=0.00..3932643.31 rows=4913981 width=344) (actual time=0.042..926.398 rows=25000 loops=1) -> Nested Loop (cost=0.00..2550275.29 rows=4913981 width=334) (actual time=0.035..752.524 rows=25000 loops=1) Join Filter: (zephyr_message.sending_client_id = zephyr_client.id) -> Nested Loop (cost=0.00..1739467.29 rows=4913981 width=320) (actual time=0.024..217.348 rows=25000 loops=1) -> Index Scan Backward using zephyr_message_pkey on zephyr_message (cost=0.00..362510.09 rows=4913981 width=156) (actual time=0.014..42.097 rows=25000 loops=1) -> Index Scan using zephyr_userprofile_pkey on zephyr_userprofile (cost=0.00..0.27 rows=1 width=164) (actual time=0.003..0.004 rows=1 loops=25000) Index Cond: (id = zephyr_message.sender_id) -> Materialize (cost=0.00..1.17 rows=11 width=14) (actual time=0.001..0.010 rows=11 loops=25000) -> Seq Scan on zephyr_client (cost=0.00..1.11 rows=11 width=14) (actual time=0.002..0.010 rows=11 loops=1) -> Index Scan using zephyr_recipient_pkey on zephyr_recipient (cost=0.00..0.27 rows=1 width=10) (actual time=0.002..0.003 rows=1 loops=25000) Index Cond: (id = zephyr_message.recipient_id) -> Index Scan using zephyr_realm_pkey on zephyr_realm (cost=0.00..0.27 rows=1 width=18) (actual time=0.002..0.003 rows=1 loops=25000) Index Cond: (id = zephyr_userprofile.realm_id) Total runtime: 1141.408 ms In the new code, we do two queries: SELECT "zephyr_message"."id" FROM "zephyr_message" ORDER BY "zephyr_message"."id" DESC LIMIT 1 followed by: SELECT "zephyr_message"."id", "zephyr_message"."sender_id", "zephyr_message"."recipient_id", "zephyr_message"."subject", "zephyr_message"."content", "zephyr_message"."rendered_content", "zephyr_message"."rendered_content_version", "zephyr_message"."pub_date", "zephyr_message"."sending_client_id", "zephyr_userprofile"."id", "zephyr_userprofile"."password", "zephyr_userprofile"."last_login", "zephyr_userprofile"."email", "zephyr_userprofile"."is_staff", "zephyr_userprofile"."is_active", "zephyr_userprofile"."date_joined", "zephyr_userprofile"."full_name", "zephyr_userprofile"."short_name", "zephyr_userprofile"."pointer", "zephyr_userprofile"."last_pointer_updater", "zephyr_userprofile"."realm_id", "zephyr_userprofile"."api_key", "zephyr_userprofile"."enable_desktop_notifications", "zephyr_userprofile"."enter_sends", "zephyr_userprofile"."tutorial_status", "zephyr_realm"."id", "zephyr_realm"."domain", "zephyr_realm"."restricted_to_domain", "zephyr_recipient"."id", "zephyr_recipient"."type_id", "zephyr_recipient"."type", "zephyr_client"."id", "zephyr_client"."name" FROM "zephyr_message" INNER JOIN "zephyr_userprofile" ON ( "zephyr_message"."sender_id" = "zephyr_userprofile"."id" ) INNER JOIN "zephyr_realm" ON ( "zephyr_userprofile"."realm_id" = "zephyr_realm"."id" ) INNER JOIN "zephyr_recipient" ON ( "zephyr_message"."recipient_id" = "zephyr_recipient"."id" ) INNER JOIN "zephyr_client" ON ( "zephyr_message"."sending_client_id" = "zephyr_client"."id" ) WHERE "zephyr_message"."id" > 4941883 with the message id filled in as the result of the first query. The new query differs from the original only in that its ORDER BY and LIMIT clauses are replaced by a WHERE clause. The second query has query plan: Hash Join (cost=709.30..28048.18 rows=20544 width=365) (actual time=41.678..279.261 rows=25041 loops=1) Hash Cond: (zephyr_message.recipient_id = zephyr_recipient.id) -> Hash Join (cost=102.98..27056.66 rows=20544 width=355) (actual time=3.686..190.730 rows=25041 loops=1) Hash Cond: (zephyr_message.sending_client_id = zephyr_client.id) -> Hash Join (cost=101.73..26772.94 rows=20544 width=341) (actual time=3.649..143.695 rows=25041 loops=1) Hash Cond: (zephyr_userprofile.realm_id = zephyr_realm.id) -> Hash Join (cost=99.99..26488.71 rows=20544 width=323) (actual time=3.578..96.746 rows=25041 loops=1) Hash Cond: (zephyr_message.sender_id = zephyr_userprofile.id) -> Index Scan using zephyr_message_pkey on zephyr_message (cost=0.00..26106.24 rows=20544 width=159) (actual time=0.017..41.980 rows=25041 loops=1) Index Cond: (id > 4941883) -> Hash (cost=83.33..83.33 rows=1333 width=164) (actual time=3.548..3.548 rows=1333 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 275kB -> Seq Scan on zephyr_userprofile (cost=0.00..83.33 rows=1333 width=164) (actual time=0.006..1.646 rows=1333 loops=1) -> Hash (cost=1.33..1.33 rows=33 width=18) (actual time=0.064..0.064 rows=33 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 2kB -> Seq Scan on zephyr_realm (cost=0.00..1.33 rows=33 width=18) (actual time=0.003..0.033 rows=33 loops=1) -> Hash (cost=1.11..1.11 rows=11 width=14) (actual time=0.027..0.027 rows=11 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 1kB -> Seq Scan on zephyr_client (cost=0.00..1.11 rows=11 width=14) (actual time=0.003..0.013 rows=11 loops=1) -> Hash (cost=335.03..335.03 rows=21703 width=10) (actual time=37.974..37.974 rows=21761 loops=1) Buckets: 4096 Batches: 1 Memory Usage: 893kB -> Seq Scan on zephyr_recipient (cost=0.00..335.03 rows=21703 width=10) (actual time=0.004..18.443 rows=21761 loops=1) Total runtime: 299.300 ms (imported from commit b2a70cccc47be7970df407c6be00eccd2e8be82a)
2013-04-25 01:30:19 +02:00
def message_fetch_objects():
2016-06-03 23:33:25 +02:00
# type: () -> List[Any]
try:
max_id = Message.objects.only('id').order_by("-id")[0].id
except IndexError:
return []
return Message.objects.select_related().filter(~Q(sender__email='tabbott/extra@mit.edu'),
id__gt=max_id - MESSAGE_CACHE_SIZE)
Increase efficiency of initial message cache query In repeated trials, the initial data fetch used to take about 1100ms. In practice, it was often taking >2000ms, probably due to caching effects. This commit cuts the time down to about 300ms in repeated trials. Note that the semantics are changed slightly in that we may no longer get exactly 25000 messages. However, holes in the message_id sequence are currently very rare or non-existent so this shouldn't be a problem and we don't care about the exact number of messages anyway. I believe the problem was that the query planner was unable to effectively use the LIMIT clause to figure out that only a small subset of zephyr_message was going to be needed. Thus, it planned for operating on the entire table and decided it could not use a more efficient plan because work_mem, although large, would not be large enough to execute the query over all of zephyr_message. The original query was: SELECT "zephyr_message"."id", "zephyr_message"."sender_id", "zephyr_message"."recipient_id", "zephyr_message"."subject", "zephyr_message"."content", "zephyr_message"."rendered_content", "zephyr_message"."rendered_content_version", "zephyr_message"."pub_date", "zephyr_message"."sending_client_id", "zephyr_userprofile"."id", "zephyr_userprofile"."password", "zephyr_userprofile"."last_login", "zephyr_userprofile"."email", "zephyr_userprofile"."is_staff", "zephyr_userprofile"."is_active", "zephyr_userprofile"."date_joined", "zephyr_userprofile"."full_name", "zephyr_userprofile"."short_name", "zephyr_userprofile"."pointer", "zephyr_userprofile"."last_pointer_updater", "zephyr_userprofile"."realm_id", "zephyr_userprofile"."api_key", "zephyr_userprofile"."enable_desktop_notifications", "zephyr_userprofile"."enter_sends", "zephyr_userprofile"."tutorial_status", "zephyr_realm"."id", "zephyr_realm"."domain", "zephyr_realm"."restricted_to_domain", "zephyr_recipient"."id", "zephyr_recipient"."type_id", "zephyr_recipient"."type", "zephyr_client"."id", "zephyr_client"."name" FROM "zephyr_message" INNER JOIN "zephyr_userprofile" ON ( "zephyr_message"."sender_id" = "zephyr_userprofile"."id" ) INNER JOIN "zephyr_realm" ON ( "zephyr_userprofile"."realm_id" = "zephyr_realm"."id" ) INNER JOIN "zephyr_recipient" ON ( "zephyr_message"."recipient_id" = "zephyr_recipient"."id" ) INNER JOIN "zephyr_client" ON ( "zephyr_message"."sending_client_id" = "zephyr_client"."id" ) ORDER BY "zephyr_message"."id" DESC LIMIT 25000; with query plan: Limit (cost=0.00..27120.95 rows=25000 width=362) (actual time=0.051..1121.282 rows=25000 loops=1) -> Nested Loop (cost=0.00..5330872.99 rows=4913981 width=362) (actual time=0.048..1081.014 rows=25000 loops=1) -> Nested Loop (cost=0.00..3932643.31 rows=4913981 width=344) (actual time=0.042..926.398 rows=25000 loops=1) -> Nested Loop (cost=0.00..2550275.29 rows=4913981 width=334) (actual time=0.035..752.524 rows=25000 loops=1) Join Filter: (zephyr_message.sending_client_id = zephyr_client.id) -> Nested Loop (cost=0.00..1739467.29 rows=4913981 width=320) (actual time=0.024..217.348 rows=25000 loops=1) -> Index Scan Backward using zephyr_message_pkey on zephyr_message (cost=0.00..362510.09 rows=4913981 width=156) (actual time=0.014..42.097 rows=25000 loops=1) -> Index Scan using zephyr_userprofile_pkey on zephyr_userprofile (cost=0.00..0.27 rows=1 width=164) (actual time=0.003..0.004 rows=1 loops=25000) Index Cond: (id = zephyr_message.sender_id) -> Materialize (cost=0.00..1.17 rows=11 width=14) (actual time=0.001..0.010 rows=11 loops=25000) -> Seq Scan on zephyr_client (cost=0.00..1.11 rows=11 width=14) (actual time=0.002..0.010 rows=11 loops=1) -> Index Scan using zephyr_recipient_pkey on zephyr_recipient (cost=0.00..0.27 rows=1 width=10) (actual time=0.002..0.003 rows=1 loops=25000) Index Cond: (id = zephyr_message.recipient_id) -> Index Scan using zephyr_realm_pkey on zephyr_realm (cost=0.00..0.27 rows=1 width=18) (actual time=0.002..0.003 rows=1 loops=25000) Index Cond: (id = zephyr_userprofile.realm_id) Total runtime: 1141.408 ms In the new code, we do two queries: SELECT "zephyr_message"."id" FROM "zephyr_message" ORDER BY "zephyr_message"."id" DESC LIMIT 1 followed by: SELECT "zephyr_message"."id", "zephyr_message"."sender_id", "zephyr_message"."recipient_id", "zephyr_message"."subject", "zephyr_message"."content", "zephyr_message"."rendered_content", "zephyr_message"."rendered_content_version", "zephyr_message"."pub_date", "zephyr_message"."sending_client_id", "zephyr_userprofile"."id", "zephyr_userprofile"."password", "zephyr_userprofile"."last_login", "zephyr_userprofile"."email", "zephyr_userprofile"."is_staff", "zephyr_userprofile"."is_active", "zephyr_userprofile"."date_joined", "zephyr_userprofile"."full_name", "zephyr_userprofile"."short_name", "zephyr_userprofile"."pointer", "zephyr_userprofile"."last_pointer_updater", "zephyr_userprofile"."realm_id", "zephyr_userprofile"."api_key", "zephyr_userprofile"."enable_desktop_notifications", "zephyr_userprofile"."enter_sends", "zephyr_userprofile"."tutorial_status", "zephyr_realm"."id", "zephyr_realm"."domain", "zephyr_realm"."restricted_to_domain", "zephyr_recipient"."id", "zephyr_recipient"."type_id", "zephyr_recipient"."type", "zephyr_client"."id", "zephyr_client"."name" FROM "zephyr_message" INNER JOIN "zephyr_userprofile" ON ( "zephyr_message"."sender_id" = "zephyr_userprofile"."id" ) INNER JOIN "zephyr_realm" ON ( "zephyr_userprofile"."realm_id" = "zephyr_realm"."id" ) INNER JOIN "zephyr_recipient" ON ( "zephyr_message"."recipient_id" = "zephyr_recipient"."id" ) INNER JOIN "zephyr_client" ON ( "zephyr_message"."sending_client_id" = "zephyr_client"."id" ) WHERE "zephyr_message"."id" > 4941883 with the message id filled in as the result of the first query. The new query differs from the original only in that its ORDER BY and LIMIT clauses are replaced by a WHERE clause. The second query has query plan: Hash Join (cost=709.30..28048.18 rows=20544 width=365) (actual time=41.678..279.261 rows=25041 loops=1) Hash Cond: (zephyr_message.recipient_id = zephyr_recipient.id) -> Hash Join (cost=102.98..27056.66 rows=20544 width=355) (actual time=3.686..190.730 rows=25041 loops=1) Hash Cond: (zephyr_message.sending_client_id = zephyr_client.id) -> Hash Join (cost=101.73..26772.94 rows=20544 width=341) (actual time=3.649..143.695 rows=25041 loops=1) Hash Cond: (zephyr_userprofile.realm_id = zephyr_realm.id) -> Hash Join (cost=99.99..26488.71 rows=20544 width=323) (actual time=3.578..96.746 rows=25041 loops=1) Hash Cond: (zephyr_message.sender_id = zephyr_userprofile.id) -> Index Scan using zephyr_message_pkey on zephyr_message (cost=0.00..26106.24 rows=20544 width=159) (actual time=0.017..41.980 rows=25041 loops=1) Index Cond: (id > 4941883) -> Hash (cost=83.33..83.33 rows=1333 width=164) (actual time=3.548..3.548 rows=1333 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 275kB -> Seq Scan on zephyr_userprofile (cost=0.00..83.33 rows=1333 width=164) (actual time=0.006..1.646 rows=1333 loops=1) -> Hash (cost=1.33..1.33 rows=33 width=18) (actual time=0.064..0.064 rows=33 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 2kB -> Seq Scan on zephyr_realm (cost=0.00..1.33 rows=33 width=18) (actual time=0.003..0.033 rows=33 loops=1) -> Hash (cost=1.11..1.11 rows=11 width=14) (actual time=0.027..0.027 rows=11 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 1kB -> Seq Scan on zephyr_client (cost=0.00..1.11 rows=11 width=14) (actual time=0.003..0.013 rows=11 loops=1) -> Hash (cost=335.03..335.03 rows=21703 width=10) (actual time=37.974..37.974 rows=21761 loops=1) Buckets: 4096 Batches: 1 Memory Usage: 893kB -> Seq Scan on zephyr_recipient (cost=0.00..335.03 rows=21703 width=10) (actual time=0.004..18.443 rows=21761 loops=1) Total runtime: 299.300 ms (imported from commit b2a70cccc47be7970df407c6be00eccd2e8be82a)
2013-04-25 01:30:19 +02:00
def message_cache_items(items_for_remote_cache, message):
# type: (Dict[text_type, Tuple[binary_type]], Message) -> None
items_for_remote_cache[to_dict_cache_key_id(message.id, True)] = (message.to_dict_uncached(True),)
def user_cache_items(items_for_remote_cache, user_profile):
# type: (Dict[text_type, Tuple[UserProfile]], UserProfile) -> None
items_for_remote_cache[user_profile_by_email_cache_key(user_profile.email)] = (user_profile,)
items_for_remote_cache[user_profile_by_id_cache_key(user_profile.id)] = (user_profile,)
def stream_cache_items(items_for_remote_cache, stream):
# type: (Dict[text_type, Tuple[Stream]], Stream) -> None
items_for_remote_cache[get_stream_cache_key(stream.name, stream.realm_id)] = (stream,)
def client_cache_items(items_for_remote_cache, client):
# type: (Dict[text_type, Tuple[Client]], Client) -> None
items_for_remote_cache[get_client_cache_key(client.name)] = (client,)
def huddle_cache_items(items_for_remote_cache, huddle):
# type: (Dict[text_type, Tuple[Huddle]], Huddle) -> None
items_for_remote_cache[huddle_hash_cache_key(huddle.huddle_hash)] = (huddle,)
def recipient_cache_items(items_for_remote_cache, recipient):
# type: (Dict[text_type, Tuple[Recipient]], Recipient) -> None
items_for_remote_cache[get_recipient_cache_key(recipient.type, recipient.type_id)] = (recipient,)
session_engine = import_module(settings.SESSION_ENGINE)
def session_cache_items(items_for_remote_cache, session):
# type: (Dict[text_type, text_type], Session) -> None
store = session_engine.SessionStore(session_key=session.session_key)
items_for_remote_cache[store.cache_key] = store.decode(session.session_data)
# Format is (objects query, items filler function, timeout, batch size)
#
# The objects queries are put inside lambdas to prevent Django from
# doing any setup for things we're unlikely to use (without the lambda
# wrapper the below adds an extra 3ms or so to startup time for
# anything importing this file).
cache_fillers = {
'user': (lambda: UserProfile.objects.select_related().all(), user_cache_items, 3600*24*7, 10000),
'client': (lambda: Client.objects.select_related().all(), client_cache_items, 3600*24*7, 10000),
'recipient': (lambda: Recipient.objects.select_related().all(), recipient_cache_items, 3600*24*7, 10000),
'stream': (lambda: Stream.objects.select_related().all(), stream_cache_items, 3600*24*7, 10000),
### Message cache fetching disabled until we can fix the fact that it
### does a bunch of inefficient memcached queries as part of filling
### the display_recipient cache
# 'message': (message_fetch_objects, message_cache_items, 3600 * 24, 1000),
'huddle': (lambda: Huddle.objects.select_related().all(), huddle_cache_items, 3600*24*7, 10000),
'session': (lambda: Session.objects.all(), session_cache_items, 3600*24*7, 10000),
} # type: Dict[str, Tuple[Callable[[], List[Any]], Callable[[Dict[text_type, Any], Any], None], int, int]]
def fill_remote_cache(cache):
2016-06-03 23:33:25 +02:00
# type: (str) -> None
2016-03-31 03:23:21 +02:00
remote_cache_time_start = get_remote_cache_time()
remote_cache_requests_start = get_remote_cache_requests()
items_for_remote_cache = {} # type: Dict[text_type, Any]
(objects, items_filler, timeout, batch_size) = cache_fillers[cache]
count = 0
for obj in objects():
items_filler(items_for_remote_cache, obj)
count += 1
if (count % batch_size == 0):
cache_set_many(items_for_remote_cache, timeout=3600*24)
items_for_remote_cache = {}
cache_set_many(items_for_remote_cache, timeout=3600*24*7)
logging.info("Succesfully populated %s cache! Consumed %s remote cache queries (%s time)" % \
(cache, get_remote_cache_requests() - remote_cache_requests_start,
2016-03-31 03:23:21 +02:00
round(get_remote_cache_time() - remote_cache_time_start, 2)))