We solved the problem the TODO raised by using a different type
annotation syntax, and I'm not sure whether that refactor would
actually improve the code.
This is a wrapper over lru_cache function. It adds following features on
top of lru_cache:
* It will not cache result of functions with unhashable arguments.
* It will clear cache whenever zerver.lib.cache.KEY_PREFIX changes.
The previous implementation had a subtle caching bug: because it was
sharing its cache with the `get_user_profile_by_email` cache, if a
user happened to have an email in that cache, we'd return it, even
though that user didn't match `base_query`.
This causes `get_cross_realm_users` to no longer have a problematic
caching bug.
Along with fixing some minor bugs, this requires extracting out the
default functions so that we can do type: ignores on them properly.
While we're at it, we switch to the Python 3 syntax.
Before this change, we populated two cache entries for each
message that we sent. The entries were largely redundant,
with the only difference being whether we sent the content
as raw markdown or as the rendered HTML.
This commit makes it so we only have one cache entry per
message, and it includes both content and rendered_content.
One legacy source on confusion here is that `content`
changes meaning when you're on the front end. Here is the
situation going forward:
database:
content = raw
rendered_contented = rendered
cache entry:
content = raw
rendered_contented = rendered
payload for the frontend:
content = raw (for apply_markdown=False)
content = rendered (for apply_markdown=True)
Every time we updated a UserProfile object, we were calling
delete_display_recipient_cache(), which churns the cache and
does an extra database hop to find subscriptions. This was
due to saying `updated_fields` instead of `update_fields`.
This made us prone to cache churn for fields like UserProfile.pointer
that are fairly volatile.
Now we use the helper function changed(). To prevent the
opposite problem, we use all the fields that could invalidate
the cache.
This is a prepatory commit that adds non-active users to
the realm user cache. It mostly involves name changes and
removing an `is_active` filter from the relevant DB query.
The only consumer of this cache is `get_raw_user_data`, which
now filters on `is_active` in a dictionary comprehension (but
this will get moved around a bit in a subsequent commit).
We now have a dedicated cache for active_user_ids() that only
stores a list of user_ids.
Before this commit, active_user_ids() used a cache of UserProfile
dictionaries, so it incurred unnecessary deserialization costs for
all the user fields that it sliced away in a list comprehension.
Because the cache is skinnier here, we also need to invalidate it
less frequently. Basically, all we care about is new users, realm
deactivations, and user deactivations.
It's hard to measure how much this will improve performance, because
the speedup for any operation here is pretty minor, but we use this
function a lot, so hopefully it will make the overall system more
healthy.
Due to the refactoring of the avatar URL codepath that added realm IDs
to the URLs, we ended up calling `get_user_profile_by_email` inside
`get_avatar_url`, which in turns was called in a loop over all users
in a realm.
Needless to say, this resulted in a significant performance problem.
We fix this issue by passing in the data needed to compute the avatar
URL, rather than looking it up by email address.
Modify the `bot_list` to hold all the bots owned by an user
irrespective of whether the bot is active or inactive. Also
include the `is_active` field in `active_bot_dict_fields` to
distinguish between inactive and active bots.
Our client code will now receive avatar_url in
page_params.people_list during page load, so it will be
able to use more current urls for old messages (the client
already had some logic for that and was just missing the
data).
We also add avatar_url to the realm_user/add event.
When we change the avatar, we make sure to always send a
realm_user/update event (even for bots).
We also needed to add avatar_version and
avatar_source to our active users cache.
There is a change in Django 1.10 due to which whenever the password
of the user is changed the session hash changes. This change affects
us because we cache user profile objects and these cached objects need
to be refreshed. However, the signal sent by Django in which objects are
refreshed fails to refresh the cache for Tornado because it uses a
different cache prefix.
Note: Backend tests are not affected because they don't rely on Tornado.
This change adds support for displaying inline open graph previews for
links posted into Zulip.
It is designed to interact correctly with message editing.
This adds the new settings.INLINE_URL_EMBED_PREVIEW setting to control
whether this feature is enabled.
By default, this setting is currently disabled, so that we can burn it
in for a bit before it impacts users more broadly.
Eventually, we may want to make this manageable via a (set of?)
per-realm settings. E.g. I can imagine a realm wanting to be able to
enable/disable it for certain URLs.
Previously, the key prefix was based on the process id due to which
the JS tests couldn't properly flush user profiles from the cache as
our application spans over multiple processes. This problem becomes
apparent when in json_change_settings view after changing the user_profile
the tornado views continue to get the cached user profile corresponding
to their process id.
I move these three functions to lib/cache.py:
to_dict_cache_key_id
to_dict_cache_key
flush_message
This will prepare us for a more significant refactoring that
eventually breaks down some circular dependencies with
Message and bugdown.
Our flush functions update user profile cache entries which can cause
confusing race conditions (see e.g. #1257). To resolve this, we move
all the user_profile flush functions to delete the entry instead of
updating it -- it will then be fetched as part of the next request
that needs to access the user object.
There are still races here, and there is perhaps an argument that a
better fix for this would be to re-fetch the object and then put it
into the cache, but this resolves the main cache correctness problem
we had with the previous implementation.
Fixes: #1322.
Originally this cache was used to transmit data from Django to Tornado
(and also for general message caching purposes), but now nothing
actually reads from this cache, so we can eliminate it.
Due to a cyclic dependency issue, functions having models as parameters
were annotated as Any.
That issue is fixed by importing models inside an `if False:` block,
so that mypy sees them but they are not imported at runtime.
In update_user_profile_caches, the return type in annotation was
marked as Any. Change that to None because, nothing is being returned
in that function.
This changes the type annotations for the cache keys in Zulip to be
consistently text_type, and updates the annotations for values that
are used as cache keys across the codebase.
[Substantially revised by tabbott]
This probably still has some bugs in it, but having mostly complete
annotations for models.py will help a lot for the annotations folks
are adding to other files.
The old code for this lookup was unnecessarily complicated because we
were working around Guardian, where the `is_realm_admin` check was
extremely expensive.
Previously we relied on having two matching list of fields for the
get_active_user_dicts_in_realm, one in the actual code and the other
in the caching system. By unifying these lists to have a single
source, we eliminate a class of caching bugs we might otherwise
regularly introduce.