This migration applies under the assumption that extra_data_json has
been populated for all existing and coming audit log entries.
- This removes the manual conversions back and forth for extra_data
throughout the codebase including the orjson.loads(), orjson.dumps(),
and str() calls.
- The custom handler used for converting Decimal is removed since
DjangoJSONEncoder handles that for extra_data.
- We remove None-checks for extra_data because it is now no longer
nullable.
- Meanwhile, we want the bouncer to support processing RealmAuditLog entries for
remote servers before and after the JSONField migration on extra_data.
- Since now extra_data should always be a dict for the newer remote
server, which is now migrated, the test cases are updated to create
RealmAuditLog objects by passing a dict for extra_data before
sending over the analytics data. Note that while JSONField allows for
non-dict values, a proper remote server always passes a dict for
extra_data.
- We still test out the legacy extra_data format because not all
remote servers have migrated to use JSONField extra_data.
This verifies that support for extra_data being a string or None has not
been dropped.
Co-authored-by: Siddharth Asthana <siddharthasthana31@gmail.com>
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
We remove bot's subscriptions for private streams to which the
new owner is not subscribed and keep the ones to which the new
owner is subscribed on changing owner.
This commit also changes the code for sending subscription
remove events to use transaction.on_commit since we call
the function inside a transactopn in do_change_bot_owner and
this also requires some changes in tests in test_events.
For topic wildcard mentions, the 'wildcard_mentioned' flag is set
for those user messages having 'user_profile_id' in
'topic_participant_user_ids', i.e. all topic participants.
Earlier, the flag was set if the 'user_profile_id' exists in
'all_topic_wildcard_mention_user_ids'.
'all_topic_wildcard_mention_user_ids' contains the ids of those
users who are topic participants and have enabled notifications
for '@topic' mentions.
The earlier approach was incorrect, as it would set the
'wildcard_mentioned' flag only for those topic participants
who have enabled the notifications for '@topic' mention instead
of setting the flag for all the topic participants.
The bug was introduced in 4c9d26c.
This commit addresses the issue where the topic highlighting
in search results was offset by one character when an
apostrophe was present. The problem stemmed from the disparity
in HTML escaping generated by the function `func.escape_html` which
is used to obtain `topic_matches` differs from the escaping performed
by the function `django.utils.html.escape` for apostrophes (').
func.escape_html | django.utils.html.escape
-----------------+--------------------------
' | '
To fix this SQL query is changed to return the HTML-escaped
topic name generated by the function `func.escape_html`.
Fixes: #25633.
Currently, we are displaying the "Complete the organization profile"
banner immediately after the organization was created. It's important to
strongly encourage orgs to configure their profile, so we should delay
showing the banner if the profile has not been configured after 15 days.
Thus also allows the users to check out Zulip and see how it works before
configuring the organization settings.
Fixes: #24122.
The number of affected objects may be quite high, and they are
selected by `id IN (...)` query, and updated with a giant `CASE`.
This turns out to be quadratic, and can cause large queries to take
hours, in a state where they cannot be terminated, when PostgreSQL >11
tries to JIT the query.
Set a batch_size as a stopgap performance fix before moving to
`.update()` as a real fix.
We have historically cached two types of values
on a per-request basis inside of memory:
* linkifiers
* display recipients
Both of these caches were hand-written, and they
both actually cache values that are also in memcached,
so the per-request cache essentially only saves us
from a few memcached hits.
I think the linkifier per-request cache is a necessary
evil. It's an important part of message rendering, and
it's not super easy to structure the code to just get
a single value up front and pass it down the stack.
I'm not so sure we even need the display recipient
per-request cache any more, as we are generally pretty
smart now about hydrating recipient data in terms of
how the code is organized. But I haven't done thorough
research on that hypotheseis.
Fortunately, it's not rocket science to just write
a glorified memoize decorator and tie it into key
places in the code:
* middleware
* tests (e.g. asserting db counts)
* queue processors
That's what I did in this commit.
This commit definitely reduces the amount of code
to maintain. I think it also gets us closer to
possibly phasing out this whole technique, but that
effort is beyond the scope of this PR. We could
add some instrumentation to the decorator to see
how often we get a non-trivial number of saved
round trips to memcached.
Note that when we flush linkifiers, we just use
a big hammer and flush the entire per-request
cache for linkifiers, since there is only ever
one realm in the cache.
We want to phase out the use of get_display_recipient
for streams, and this is the last place that I
eliminate it. The next commit will eliminate the
dead code and make mypy types tighter.
This change will make push notifications slightly
slower in some situations, but we avoid all the
complexity of a cache, and this code tends to run
offline.
We could always make this code a bit more efficient
by being a little smarter about what data we fetch
up front. For example, get_apns_alert_title gets
called by a function that already has the stream
name. It's just a bit of a pain to refactor when
you have all the DM codepath mucked up with the
stream codepath.
There's no need for the complexity and extra round
trips to call get_display_recipient in a testing
context.
We also eliminate the unnecessary call to check_string.
This function is poorly named, but that's a sweep
for another day.
The get_display_recipient helper is a clumsy way to get
stream names, and it's not even representative of how
most of our code retrieves stream names.
The new helper also double-checks that the Stream
object has the correct recipient id.
We no longer have to reason about the 12 possible
ways of invoking get_narrow_url. We also avoid
double computation in a couple places.
Finally, we get stricter type checks by just inlining
the calls.
We don't need to call get_display_recipient for
non-stream messages.
I will rename display_recipient in the next commit;
if I were to combine the steps the diff would be too
hard to read.
This commit renames the keyword 'pm' to 'dm' in the
'pm_mention_email_disabled_user_ids' and
'pm_mention_push_disabled_user_ids' attributes of the
'RecipientInfoResult' dataclass.
'pm' and 'dm' are the acronyms for 'private message' and
'direct message' respectively.
It includes 'TODO/compatibility' code to support the old format
fields in the tornado queues during the Zulip server upgrades.
This commit renames the 'PRIVATE_MESSAGE' attribute of the
'NotificationTriggers' class to 'DIRECT_MESSAGE'.
Custom migration to update the existing value in the database.
It includes 'TODO/compatibility' code to support the old
notification trigger value 'private_message' in the
push notification queue during the Zulip server upgrades.
Earlier 'private_message' was one of the possible values for the
'trigger' property of the '[`POST /zulip-outgoing-webhook`]' response;
Update the docs to reflect the change in the above-mentioned trigger
value.
This commit adds 'TODO/compatibility' code to support the
old notification trigger values in the push notification queue
during the Zulip server upgrades.
In f4fa82e, we renamed the following notification triggers:
* 'wildcard_mentioned' to 'stream_wildcard_mentioned'
* 'followed_topic_wildcard_mentioned' to
'stream_wildcard_mentioned_in_followed_topic'.
This should have been added in f4fa82e.
This commit adds code to pass all the required arguments to
select_related call for Message objects such that only the
required related fields are fetched from the database.
Previously, we did not pass any arguments to select_related,
so all the directly and indirectly related fields were fetched
when many of them were actually not being used and made the
query unnecessarily complex.
This commit adds code to pass all the required arguments to
select_related call for Message objects such that only the
required related fields are fetched from the database.
Previously, we did not pass any arguments to select_related,
so all the directly and indirectly related fields were fetched
when many of them were actually not being used and made the
query unnecessarily complex.
This commit adds code to pass all the required arguments to
select_related call for Subscription objects such that only
the required related fields are fetched from the database.
Previously, we did not pass any arguments to select_related,
so all the directly and indirectly related fields were fetched
when many of them were actually not being used and made the
query unnecessarily complex.
This commit adds code to pass all the required arguments to
select_related call for MissedMessageEmailAddress such that
only the required related fields are fetched from the database.
Previously, we did not pass any arguments to select_related,
so all the directly and indirectly related fields were fetched
when many of them were actually not being used and made the
query unnecessarily complex.
We only need ID of the recipient and not the full object, so we
directly access ID using "stream.recipient_id" instead of using
the complete recipient object.
This commit removes get_huddle_recipient function and we now use
get_or_create_huddle in get_recipient_from_user_profiles.
As a result of this change, we do not fetch the recipient from
Huddle object but instead get it using the "id" and "recipient_id"
fields available from Huddle object like we do for a personal
message. This change allows us to not fetch recipient object
using select_related when querying the Huddle object.
We now fetch recipient object when querying "Huddle" object in
get_or_create_huddle_backend as this query is eventually used
to get the recipient object only in get_huddle_recipient.
This commit also updates the select_related call in the code to
populate Huddle objects in cache to pass "Recipient" as argument.
Previously no argument was passed to select_related and thus no
related objects were being fetched, with no non-null related fields
being present.
- Adds instructions block and relative link to Starred messages.
- Adds "Toggle starred messages counter" subheading.
- Adds "Searching for messages" as a related article.
The set of objects in the `users` object can be very large (in some
cases, literally every object in the database) and making them into a
giant `id in (...)` to handle the one tiny corner case which we never
use is silly.
Switch the `--users` codepath to returning a QuerySet as well, so it
can be composed. We pass a QuerySet into send_custom_email as well,
so it can ensure that the realm is `select_related` in as well, no
matter how the QuerySet was generated.
Substituting the rendered body via Jinja2 means that it cannot
perform any interpolation itself. While the string replacement is
hacky, it is the only solution which avoids running Jinja2 more than
once, and also allows the user-supplied content to have Jinja2
substitutions in it.
In this commit, we introduce a new option in the stream creation
UI - a 'Default stream for new users' checkbox. By default, the
checkbox is set to 'off' and is only visible to admins. This
allow admins to easily designate a stream as the default stream
for new users during stream creation.
Fixes#24048.
Adds a test for when a value for a user's custom profile field is
removed and not set to a new value. The omission of this event in
the tests was noted as a possibility in #22103, which updated the
API documentation for these events having `null` for the field
value.
When adding the test discovered that the events logic was not
deleting the field from the user object and instead setting it to
`None`, so fixes that logic as well. There was a similar bug fixed
in commit 96c61a1a41 for when custom profile fields are removed
from a realm.
When applying realm_user update events, some of the event fields
for the person object were being updated to the same value in a
loop. Unnests those calls from the loop over the existing fields
so that they are only updated once.
The original nesting was introduced in commit 649fccde6b and
was expanded in other additions to the logic for these events.
This commit adds code to pass stream traffic data using
the "stream_weekly_traffic" field in stream objects.
We already include the traffic data in Subscription objects,
but the traffic data does not depend on the user to stream
relationship and is stream-only information, so it's better
to include it in Stream objects. We may remove the traffic
data and other stream information fields for Subscription
objects in future.
This will help clients to correctly display the stream
traffic data in case where client receives a stream
creation event and no subscription event, for an already
existing stream which the user did not have access to before.
This commit changes the code to not use get_client_data
function and instead use `stream_to_dict` function to
get the stream data in a dictionary form. This is a
prep commit add stream traffic data to Stream objects.
This commit adds stream_to_dict method which is same as
Stream.to_dict method as of now. This is a prep commit
to include stream traffic data in stream objects.
I move the helper user_ids_to_users to the only
place that it's used, and then I simplify it to
do a direct database query.
These endpoints aren't hit often enough to justify
caching complexity, and for really large user groups,
hitting the cache can actually be counterproductive.
Particularly when you add new users to an existing
group, the bulk of the cost is sending out
notification messages to users.
The only change to the test is that I added an
assertion on the query count.
The cross-realm bots rarely change, and there are only
a few of them, so we just query them all at once and
put them in the cache.
Also, we put the dictionaries in the cache, instead of
the user objects, since there is nothing time-sensitive
about the dictionaries, and they are small. This saves
us a little time computing the avatar url and things
like that, not to mention marshalling costs.
This commit also fixes a theoretical bug where we would
have stale cache entries if somebody somehow modified
the cross-realm bots without bumping KEY_PREFIX.
Internally we no longer pre-fetch the realm objects for
the bots, but we don't get overly precise about picking
individual fields from UserProfile, since we rarely hit
the database and since we don't store raw ORM objects
in the cache.
The test diffs make it look like we are hitting the
cache an extra time, but the tests weren't counting
bulk fetches. Now we only use a single key for all
bots rather a key per bot.
The bulk_get_users() function was only being used to
get cross-realm bots.
It appears that it was introduced in
f02e5b90f6 for that
specific use case.
Now we make the function more specific and test it more
accurately.
We also eliminate a lot of janky code and comments,
including some code that never had test coverage.
Incidentally, it appears that we did not have any code
to invalidate the cache keys here, and that is still
the case. In practice I assume people rarely
re-configure their cross-realm bots unless they are
upgrading the server, and then KEY_PREFIX comes into
play. 25fd4c5508 seems
to have caused that hopefully harmless regression.
A further step will be to make this cache more coarse,
since there are only a few cross-realm bots. The next
commit will hopefully simplify the code and address the
validation pitfall.
Earlier the API endpoints related to user_group accepts and returns a
field `can_mention_group_id` which represents the ID
of user_group whose members can mention the group.
This commit renames this field to `can_mention_group`.
Earlier the API endpoints related to streams accepts and returns a
field `can_remove_subscribers_group_id` which represents the ID
of user_group whose members can remove subscribers from stream.
This commit renames this field to `can_remove_subscribers_group`.
Dropping support for url_prefix for RealmPlayground, the server now uses
url_template instead only for playground creation, retrieval and audit
logging upon removal.
This does the necessary handling so that url_template is expanded with
the extracted code.
Fixes#25723.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
This commit removes the stray strings used to refer to
various types of notification triggers.
We use the attributes of the 'NotificationTriggers' class instead.
Updates the realm field `default_code_block_language` to have a default
value of an empty string instead of None. Also updates the web-app to
check for the empty string and not `null` to indicate no default is set.
This means that both new realms and existing realms that have no default
set will have the same value for this setting: an empty string.
Previously, new realms would have None if no default was set, while realms
that had set and then unset a value for this field would have an empty
string when no default was set.
Expands support for the message ID operand for id" operator to be either
a string or an integer. Previously, this operand was always validated as
a string.
Restore the default django.utils.log.AdminEmailHandler when
ERROR_REPORTING is enabled. Those with more sophisticated needs can
turn it off and use Sentry or a Sentry-compatible system.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
We used to access the complete objects for UserProfile foreign
keys like "bot_owner" and "default_sending_stream", where we only
needed ID of them.
This commit fixes some of such instances and now we directly get
the id using "bot_owner_id" and "default_sending_stream_id" so
that we can avoid the unnecessary complexity of accessing the
complete object.
This commit updates code to pass "realm" and "bot_owner" args to
select_related call in get_users. We pass "realm" and "bot_owner"
args to get_users because the caches which this function is used
to populate are used for get_user and get_user_profile_by_api_key
functions and they also select both these fields when querying for
UserProfile objects.
This commit updates the select_related calls in queries to
get UserProfile objects in get_user function called in
management commands to pass "realm" as argument to
select_related call.
There are some management commands like deactivate_user,
change_full_name, etc. which might need fields like
"default_sending_stream" when changing full name of a bot
or something similar, but we don't think that would happen
often and we can afford to have a DB round trip to get
these fields if needed.
Also, note that "realm" is the only non-null foreign key
field in UserProfile object, so select_related() was only
fetching realm object previously as well. But we should
still pass "realm" as argument in select_related call so
that we can make sure that only required fields are
selected in case we add more foreign keys to UserProfile
in future.
This commit updates select_related call to pass "realm" as
argument in select_related call in fetch_users_by_id function
as we only require realm for the UserProfile objects fetched
using fetch_users_by_id.
Also, note that "realm" is the only non-null foreign key field
in UserProfile object, so select_related() was only fetching
realm object previously as well. But we should still pass "realm"
as argument in select_related call so that we can make sure that
only required fields are selected in case we add more foreign
keys to UserProfile in future.
We do not use any related fields for the UserProfile objects
fetched by get_active_users, so we can simply remove the
select_related call.
The user object from get_active_users was used to get realm
but since get_active_users called from a realm object we can
directly use that realm object. This change also leads to
some changes in the cache code where we now pass the realm
to the function instead of selecting it from UserProfile object.
This commit removes select_related call from
get_soft_deactivated_users_for_catch_up as
we do not use any related fields for the
UserProfile objects fetched using this call.
Uploads are well-positioned to use S3's "intelligent tiering" storage
class. Add a setting to let uploaded files to declare their desired
storage class at upload time, and document how to move existing files
to the same storage class.
‘blocklist’ was added in 0.0.35 (with backwards compatibility for the
old name), and type annotations were added in 0.0.91 (with only the
new name).
Signed-off-by: Anders Kaseorg <anders@zulip.com>
This code removes a lot of complexity with very likely
positive overall impact on system performance and
negligible downside.
We already cache display recipients on a per-user
level, so there's no need for another cache layer on
top of that that keys them with recipient ids.
We avoid strange things where Alice/Bob and Bob/Charlie
get put into the top layer cache and then we still have
a cache miss on Alice/Charlie despite the lower level
cache being able to support per-user lookups.
This change does introduce an extra database round trip
if any of our messages have a huddle, but the query is
extremely cheap, and we can always try to cache that
function more directly or try to re-use some of our
other huddle-based caches.
As part of this, we clean up the names for the
lower-level per-user cache of display recipients, and
we simplify the cache keys.
We also stop passing in a full Recipient object to the
`bulk_get_huddle_user_ids` functions.
The local impact of this change should be easy to
measure (at least approximately), since we use this
function every time a user gets messages via the
/messages endpoint.
The only overlap between how we fetched streams and
users was to share some really complicated data
structures.
We can also short-circuit some logic if a message
batch is either all-stream or all-DM.
We restrict the columns, avoid quadratic looping,
and don't bother with order_by.
We also return the user ids (per recipient) as
sets, since that's how the only caller uses the
info (albeit implicitly via set.union accepting
a list).
It’s unclear what was supposed to be “safe” about this wrapper. The
hashlib API is fine without it, and we don’t want to encourage further
use of SHA-1.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Translators benefit from the extra information in the field names, and
need the reordering freedom that isn’t available with multiple
positional fields.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
- Adds `message_handle_match` function to handle new pattern for
relative help links to "Drafts" and "Scheduled messages" for logged-in
users: `{relative|message|drafts}` and `{relative|message|scheduled}`.
`remove_denormalized_recipient_column_from_data` removes the
`recipient` data from `zerver_userprofile`, but did not remove it from
`zerver_userprofile_mirrordummy`, which was later appended to the list
of `zerver_userprofile` objects. This led to failure when inserting,
as the mirrordummy objects still tried to reference their previous
`recipient_id`s.
Move the merging of the two sets earlier, before we call
`remove_denormalized_recipient_column_from_data`.