For the below payloads we want `owner_id` instead
of `owner`, which we should deprecate. (The
`owner` field is actually an email, which is
not a stable key.)
page_params.realm_bots
realm_bot/add
realm_bot/update
IMPORTANT NOTE: Some of the data served in
these payloads is cached with the key
`bot_dicts_in_realm_cache_key`.
For page_params, we get the new field
via `get_owned_bot_dicts`.
For realm_bot/add, we modified
`created_bot_event`.
For realm_bot/update, we modified
`do_change_bot_owner`.
On the JS side, we no longer
look up the bot's owner directly in
`server_events_dispatch` when we get
a realm_bot/update event. Instead, we
delegate that job to `bot_data.js`.
I modified the tests accordingly.
Previously, alert words were a JSON list of strings stored in a
TextField on user_profile. That hacky model reflected the fact that
they were an early prototype feature.
This commit migrates from that to a separate table, 'AlertWord'. The
new AlertWord has user_profile, word, id and realm(denormalization so
we can provide a nice index for fetching all the alert words in a
realm).
This transition requires moving the logic for flushing the Alert Words
caches to their own independent feature.
Note that this commit should not be cherry-picked without the
following commit, which fixes case-sensitivity issues with Alert Words.
Generated by `pyupgrade --py3-plus --keep-percent-format` on all our
Python code except `zthumbor` and `zulip-ec2-configure-interfaces`,
followed by manual indentation fixes.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
Fixes#13504.
This commit is purely an improvement in error handling.
We used to not do any validation on keys before passing them to
memcached, which meant for invalid keys, memcached's own key
validation would throw an exception. Unfortunately, the resulting
error messages are super hard to read; the traceback structure doesn't
even show where the call into memcached happened.
In this commit we add validation to all the basic cache_* functions, and
appropriate handling in their callers.
We also add a lot of tests for the new behavior, which has the nice
effect of giving us decent coverage of all these core caching
functions which previously had been primarily tested manually.
The user information in display_recipient in cached message_dicts
becomes outdated if the information is changed in any way.
In particular, since we don't have a way to find all the message
objects that might contain PMs after an organization toggles the
setting to hide user email addresses from other users, we had a
situation where client might see inaccurate cached data from before
the transition for a period of up to hours.
We address this by using our generic_bulk_cached_fetch toolchain to
ensure we always are fetching display_recipient data from the database
(and/or a special recipient_id -> display_recipient cache, which we
can flush easily).
Fixes#12818.
The typing for generic_bulk_cached_fetch is complicated, and was
recorded incorrectly previously for the case where a cache_transformer
function is required. We fix this by adding the new CacheItemT, and
additionally add comments explaining what's going on with these types
for future reference.
Thanks to Mateusz Mandera for raising this issue.
In the unlikely event that someone edited the properties of a system
bot and then saved the result, we were still caching the old version
indefinitely in the get_system_bot cache.
This led to a confusing case where a newly installed Zulip server
didn't have is_api_super_user properly set on its EMAIL_GATEWAY_BOT in
memcached.
Co-authored-by: Mateusz Mandera <mateusz.mandera@protonmail.com>
Modifies the dict with the user info to include the key `bot_owner_id`
so it can be displayed in the user info popover.
Tests concerned with changing bot owner have been modified to have
number of events=2 because while updating the bot info, two events
are fired -- updating the `realm_bot` and `realm_user` since the
key `bot_owner_id` is a part of realm user info.
Calls to `render_markdown_path` weren't getting cached since the context
argument is unhashable, and the `ignore_unhashable_lru_cache` decorator ignores
such calls. This commit adds a couple of more decorators - one which converts
dict arguments to the function to a dict items tuple, and another which converts
dict items tuple arguments back to dicts. These two decorators used along with
the `ignore_unhashable_lru_cache` decorator ensure that the calls to
`render_markdown_path` with the context dict argument are also cached.
The time to run zerver.tests.test_urls.PublicURLTest.test_public_urls drops by
about 50% from 8.4s to 4.1s with this commit. The time to run
zerver.tests.test_docs.DocPageTest.test_doc_endpoints drops by about 20% from
3.2s to 2.5s.
Previously, these cache keys looked like:
:1:9c26164d3a393e316e0f8210efe270e08710d45astream_by_realm_and_name:...
Now, they look like this:
:1:9c26164d3a393e316e0f8210efe270e08710d45a:stream_by_realm_and_name:...
This commit leverages the ahocorasick algorithm to build a set of user_ids
that have their alert_words present in the message. It runs in linear time
of the order of length of the input message as opposed to number of
alert_words. This is after building a ahocorasick Automaton which runs
in O(number of alert_words in entire realm) which is usually cached.
Refactor the potentially expensive work done by Beautiful Soup into a
function that is called by the alter_content function, so that we can
cache the result. Saves a significant portion of the runtime of
loading of all of our /help/ and /api/ documentation pages (e.g. 12ms
for /api).
Fixes#11088.
Tweaked by tabbott to use the URL path as the cache key, clean up
argument structure, and use a clearer name for the function.
A key part of this is the new helper, get_user_by_delivery_email. Its
verbose name is important for clarity; it should help avoid blind
copy-pasting of get_user (which we'll also want to rename).
Unfortunately, it requires detailed understanding of the context to
figure out which one to use; each is used in about half of call sites.
Another important note is that this PR doesn't migrate get_user calls
in the tests except where not doing so would cause the tests to fail.
This probably deserves a follow-up refactor to avoid bugs here.
This is initial work, which will help us establish habits of using a
well-tested approach for renaming a Zulip organization (since as part
of https://github.com/zulip/zulip-mobile/issues/3142, we'll likely
need to make this function do more).
This supports guest user in the user-info-form-modal as well as in the
role section of the admin-user-table.
With some fixes by Tim Abbott and Shubham Dhama.
We don't want really long urls to lead to truncated
keys, or we could theoretically have two different
urls get mixed up previews.
Also, this suppresses warnings about exceeding the
250 char limit.
Finally, this gives the key a proper prefix.
Now reading API keys from a user is done with the get_api_key wrapper
method, rather than directly fetching it from the user object.
Also, every place where an action should be done for each API key is now
using get_all_api_keys. This method returns for the moment a single-item
list, containing the specified user's API key.
This commit is the first step towards allowing users have multiple API
keys.
We solved the problem the TODO raised by using a different type
annotation syntax, and I'm not sure whether that refactor would
actually improve the code.
This is a wrapper over lru_cache function. It adds following features on
top of lru_cache:
* It will not cache result of functions with unhashable arguments.
* It will clear cache whenever zerver.lib.cache.KEY_PREFIX changes.
The previous implementation had a subtle caching bug: because it was
sharing its cache with the `get_user_profile_by_email` cache, if a
user happened to have an email in that cache, we'd return it, even
though that user didn't match `base_query`.
This causes `get_cross_realm_users` to no longer have a problematic
caching bug.
Along with fixing some minor bugs, this requires extracting out the
default functions so that we can do type: ignores on them properly.
While we're at it, we switch to the Python 3 syntax.
Before this change, we populated two cache entries for each
message that we sent. The entries were largely redundant,
with the only difference being whether we sent the content
as raw markdown or as the rendered HTML.
This commit makes it so we only have one cache entry per
message, and it includes both content and rendered_content.
One legacy source on confusion here is that `content`
changes meaning when you're on the front end. Here is the
situation going forward:
database:
content = raw
rendered_contented = rendered
cache entry:
content = raw
rendered_contented = rendered
payload for the frontend:
content = raw (for apply_markdown=False)
content = rendered (for apply_markdown=True)
Every time we updated a UserProfile object, we were calling
delete_display_recipient_cache(), which churns the cache and
does an extra database hop to find subscriptions. This was
due to saying `updated_fields` instead of `update_fields`.
This made us prone to cache churn for fields like UserProfile.pointer
that are fairly volatile.
Now we use the helper function changed(). To prevent the
opposite problem, we use all the fields that could invalidate
the cache.
This is a prepatory commit that adds non-active users to
the realm user cache. It mostly involves name changes and
removing an `is_active` filter from the relevant DB query.
The only consumer of this cache is `get_raw_user_data`, which
now filters on `is_active` in a dictionary comprehension (but
this will get moved around a bit in a subsequent commit).
We now have a dedicated cache for active_user_ids() that only
stores a list of user_ids.
Before this commit, active_user_ids() used a cache of UserProfile
dictionaries, so it incurred unnecessary deserialization costs for
all the user fields that it sliced away in a list comprehension.
Because the cache is skinnier here, we also need to invalidate it
less frequently. Basically, all we care about is new users, realm
deactivations, and user deactivations.
It's hard to measure how much this will improve performance, because
the speedup for any operation here is pretty minor, but we use this
function a lot, so hopefully it will make the overall system more
healthy.
Due to the refactoring of the avatar URL codepath that added realm IDs
to the URLs, we ended up calling `get_user_profile_by_email` inside
`get_avatar_url`, which in turns was called in a loop over all users
in a realm.
Needless to say, this resulted in a significant performance problem.
We fix this issue by passing in the data needed to compute the avatar
URL, rather than looking it up by email address.
Modify the `bot_list` to hold all the bots owned by an user
irrespective of whether the bot is active or inactive. Also
include the `is_active` field in `active_bot_dict_fields` to
distinguish between inactive and active bots.
Our client code will now receive avatar_url in
page_params.people_list during page load, so it will be
able to use more current urls for old messages (the client
already had some logic for that and was just missing the
data).
We also add avatar_url to the realm_user/add event.
When we change the avatar, we make sure to always send a
realm_user/update event (even for bots).
We also needed to add avatar_version and
avatar_source to our active users cache.
There is a change in Django 1.10 due to which whenever the password
of the user is changed the session hash changes. This change affects
us because we cache user profile objects and these cached objects need
to be refreshed. However, the signal sent by Django in which objects are
refreshed fails to refresh the cache for Tornado because it uses a
different cache prefix.
Note: Backend tests are not affected because they don't rely on Tornado.
This change adds support for displaying inline open graph previews for
links posted into Zulip.
It is designed to interact correctly with message editing.
This adds the new settings.INLINE_URL_EMBED_PREVIEW setting to control
whether this feature is enabled.
By default, this setting is currently disabled, so that we can burn it
in for a bit before it impacts users more broadly.
Eventually, we may want to make this manageable via a (set of?)
per-realm settings. E.g. I can imagine a realm wanting to be able to
enable/disable it for certain URLs.
Previously, the key prefix was based on the process id due to which
the JS tests couldn't properly flush user profiles from the cache as
our application spans over multiple processes. This problem becomes
apparent when in json_change_settings view after changing the user_profile
the tornado views continue to get the cached user profile corresponding
to their process id.
I move these three functions to lib/cache.py:
to_dict_cache_key_id
to_dict_cache_key
flush_message
This will prepare us for a more significant refactoring that
eventually breaks down some circular dependencies with
Message and bugdown.