The query plan for fetching recent messages from the arbitrary set of
streams formed by the intersection of 30 random users can be quite
bad, and can descend into a sequential scan on `zerver_recipient`.
Worse, this work of pulling recent messages out is redone if the
stream appears in the next batch of 30 users.
Instead, pull the recent messages for a stream on a one-by-one basis,
but cache them in an in-memory cache. Since digests are enqueued in
30-user batches but still one-realm-at-a-time, work will be saved both
in terms of faster query plans whose results can also be reused across
batches.
This requires that we pull the stream-id to stream-name mapping for
_all_ streams in the realm at once, but that is well-indexed and
unlikely to cause performance issues -- in fact, it may be faster
than pulling a random subset of the streams in the realm.
The type annotation for functools.partial uses unchecked Any for all
the function parameters (both early and late). returns.curry.partial
uses a mypy plugin to check the parameters safely.
https://returns.readthedocs.io/en/latest/pages/curry.html
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Matching the topic exactly, as opposed to case-insensitively, is not a
common operation, and one that we want to make difficult to do
accidentally. Inline the single use case of it.
This algorithm existed in multiple places, with different queries.
Since we only access properties in the UserMessage table, we
standardize on the much simpler and faster Index Only Scan, rather
than a merge join.
When searching for links inside a topic name, the question mark (?)
was used to split the topic. If a URL had a query after the URL
(e.g., "?foo=bar"), then the query was trimmed from the URL.
Removing the question mark from `basic_link_splitter` is sufficient
to fix this issue. The `get_web_link_regex` function then removes
the trailing punctuation if any, including literal question marks.
Fixes#26368.
When there was no space right after `/todo` but there was content on a
new line, the message would be rendered plainly, not as a todo widget.
This was because we split on only the space character to then check if
the first token was a valid widget.
Now we split on both spaces and newlines to extract the widget name,
irrespective of whether it is followed by a space or a newline. This
results in the message being rendered as a todo widget as expected.
Rename existing shortened references to demo organizations, like
`is_demo_org` or `demo-org-warning`, that have been used in the
codebase so far and replace them to be like the `models.py`
variable: `Realm.demo_organization_scheduled_deletion_date`.
This REDOS was not exploitable, as its content is only read from
checked-in files; regardless, simplify it to not backtrack. We also
do not actually have any location which use leading or trailing
whitespace, so remove those optional bits.
This function is used by almost all webhooks.
To support it, we use the "api_ignore_parameter" flag so that positional
arguments like topic and body that are not intended to be parsed from
the request can be ignored.
This demonstrates the use of BaseModel to replace a check_dict_only
validator.
We also add support to referring to $defs in the OpenAPI tests. In the
future, we can descend down each object instead of mapping them to dict
for more accurate checks.
We want to reject ambiguous type annotations that set ApiParamConfig
inside a Union. If a parameter is Optional and has a default of None, we
prefer Annotated[Optional[T], ...] over Optional[Annotated[T, ...]].
This implements a check that detects Optional[Annotated[T, ...]] and
raise an assertion error if ApiParamConfig is in the annotation. It also
checks if the type annotation contains any ApiParamConfig objects that
are ignored, which can happen if the Annotated type is nested inside
another type like List, Union, etc.
Note that because
param: Annotated[Optional[T], ...] = None
and
param: Optional[Annotated[Optional[T], ...]] = None
are equivalent in runtime prior to Python 3.11, there is no way for us
to distinguish the two. So we cannot detect that in runtime.
See also: https://github.com/python/cpython/issues/90353
The goal of typed_endpoint is to replicate most features supported by
has_request_variables, and to improve on top of it. There are some
unresolved issues that we don't plan to work on currently. For example,
typed_endpoint does not support ignored_parameters_supported for 400
responses, and it does not run validators on path-only arguments.
Unlike has_request_variables, typed_endpoint supports error handling by
processing validation errors from Pydantic.
Most features supported by has_request_variables are supported by
typed_endpoint in various ways.
To define a function, use a syntax like this with Annotated if there is
any metadata you want to associate with a parameter, do note that
parameters that are not keyword-only are ignored from the request:
```
@typed_endpoint
def view(
request: HttpRequest,
user_profile: UserProfile,
*,
foo: Annotated[int, ApiParamConfig(path_only=True)],
bar: Json[int],
other: Annotated[
Json[int],
ApiParamConfig(
whence="lorem",
documentation_status=NTENTIONALLY_UNDOCUMENTED
)
] = 10,
) -> HttpResponse:
....
```
There are also some shorthands for the commonly used annotated types,
which are encouraged when applicable for better readability and less
typing:
```
WebhookPayload = Annotated[Json[T], ApiParamConfig(argument_type_is_body=True)]
PathOnly = Annotated[T, ApiParamConfig(path_only=True)]
```
Then the view function above can be rewritten as:
```
@typed_endpoint
def view(
request: HttpRequest,
user_profile: UserProfile,
*,
foo: PathOnly[int],
bar: Json[int],
other: Annotated[
Json[int],
ApiParamConfig(
whence="lorem",
documentation_status=INTENTIONALLY_UNDOCUMENTED
)
] = 10,
) -> HttpResponse:
....
```
There are some intentional restrictions:
- A single parameter cannot have more than one ApiParamConfig
- Path-only parameters cannot have default values
- argument_type_is_body is incompatible with whence
- Arguments of name "request", "user_profile", "args", and "kwargs" and
etc. are ignored by typed_endpoint.
- positional-only arguments are not supported by typed_endpoint. Only
keyword-only parameters are expected to be parsed from the request.
- Pydantic's strict mode is always enabled, because we don't want to
coerce input parsed from JSON into other types unnecessarily.
- Using strict mode all the time also means that we should always use
Json[int] instead of int, because it is only possible for the request
to have data of type str, and a type annotation of int will always
reject such data.
typed_endpoint's handling of ignored_parameters_unsupported is mostly
identical to that of has_request_variables.
_default_manager is the same as objects on most of our models. But
when a model class is stored in a variable, the type system doesn’t
know which model the variable is referring to, so it can’t know that
objects even exists (Django doesn’t add it if the user added a custom
manager of a different name). django-stubs used to incorrectly assume
it exists unconditionally, but it no longer does.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
This is important because the "guests" value isn't one that we'd
expect anyone to pick intentionally, and in particular isn't an
available option for the similar/adjacent "email invitations" setting.
This commit rename the existing setting `Who can invite users to this
organization` to `Who can send email invitations to new users` and
also renames all the variables related to this setting that do not
require a change to the API.
This was done for better code readability as a new setting
`Who can create invite links` will be added in future commits.
This commit does the backend changes required for adding a realm
setting based on groups permission model and does the API changes
required for the new setting `Who can create multiuse invite link`.
This commit adds id_field_name field to GroupPermissionSetting
type which will be used to store the string formed by concatenation
of setting_name and `_id`.
This commit makes the database changes while creating internal_realm
to be done in a single transaction.
This is needed for deferring the foreign key constraints
to the end of transaction.
This commit removes the 'alert' field from the payload for
Android via GCM/FCM.
The alert strings generated do not get used at all and have
not been used since at least 2019. On Android, we construct
the notification UI ourselves in the client, and we ignore
the alert string.
To make creation of demo organizations feel lightweight for users,
we do not want to require an email address at sign-up. Instead an
empty string will used for the new realm owner's email. Currently
implements that for new demo organizations in the development
environment.
Because the user's email address does not exist, we don't enqueue
any of the welcome emails upon account/realm creation, and we
don't create/send new login emails.
This is a part of #19523.
Co-authored by: Tim Abbott <tabbott@zulip.com>
Co-authored by: Lauryn Menard <lauryn@zulip.com>
The 'startinline' option is utilized in the `CodeHilite` instantiation
to indicate that the provided PHP code snippet should be highlighted
even if it doesn't start with the opening tag or marker of the
associated programming language (which will rarely be the case in
Zulip, since one discusses a section of a file much more often than a
whole file).
Fixes: https://chat.zulip.org/#narrow/stream/137-feedback/topic/php.20syntax.20highlighting.20should.20not.20require.20.60.3C.3Fphp.60.
Signed-off-by: Akshat <akshat25iiit@gmail.com>
This commit updates the 'get_apns_alert_subtitle' function to
return a common subtitle, i.e., "{full_name} mentioned everyone:"
for wildcard mentions.
The triggers for the stream or topic wildcard mentions include:
* NotificationTriggers.TOPIC_WILDCARD_MENTION_IN_FOLLOWED_TOPIC
* NotificationTriggers.STREAM_WILDCARD_MENTION_IN_FOLLOWED_TOPIC
* NotificationTriggers.TOPIC_WILDCARD_MENTION
* NotificationTriggers.STREAM_WILDCARD_MENTION
We now send stream creation and stream deletion events on
changing a user's role because a user can gain or lose
access to some streams on changing their role.
This commit extracts the code which queries the required streams
to a new function "get_user_streams". The new functions returns
the list of "Stream" object and not dictionaries and then
do_get_streams function converts it into list of dictionaries.
This change is important because we would use the new function
in further commit where we want list of "Stream" objects and
not list of dictionaries.
There was a bug in apply_event code where only a stream which
is not private is added to the "never_subscribed" data after
a stream creation event. Instead, it should be added to the
"never_subscribed" data irrespective of permission policy of
the stream as we already send stream creation events only to
those users who can access the stream. Due to the current
bug, private streams were not being added to "never_subscribed"
data in apply_event for admins as well. This commit fixes it
and also makes sure the "never_subscribed" list is sorted
which was not done before and was also a bug.
The bugs mentioned above were unnoticed as the tests did not
cover these cases and this commit also adds tests for those
cases.
The "streams" field in "/register" response did not include web-public
streams for non-admin users but the data for those are eventually
included in the subscriptions data sent using "subscriptions",
"unsubscribed" and "never_subscribed" fields.
This commit adds code to include the web-public streams in "streams"
field as well as everyone can access those and will make the "streams"
data complete.
The name and docstring were just wrong, having a UserMessage row isn't
sufficient for having message access and is actually only relevant in a
private stream with private history. The function is only used in a
single place anyway, in bulk_access_messages.
The comment mentioning this function in handle_remove_push_notification
can be tweaked to just not mention any function specifically and just
say why we're not checking message access.
Users who used to be subscribed to a private stream and have been
removed from it since retain the ability to edit messages/topics, and
delete messages that they used to have access to, if other relevant
organization permissions allow these actions. For example, a user may be
able to edit or delete their old messages they posted in such a private
stream. An administrator will be able to delete old messages (that they
had access to) from the private stream.
We fix this by fixing the logic in has_message_access (which lies at the
core of our message access checks - access_message() and
bulk_access_messages())
to not rely on only a UserMessage row for checking access but also
verify stream type and subscription status.
**Background**
User groups are expected to comply with the DAG constraint for the
many-to-many inter-group membership. The check for this constraint has
to be performed recursively so that we can find all direct and indirect
subgroups of the user group to be added.
This kind of check is vulnerable to phantom reads which is possible at
the default read committed isolation level because we cannot guarantee
that the check is still valid when we are adding the subgroups to the
user group.
**Solution**
To avoid having another transaction concurrently update one of the
to-be-subgroup after the recursive check is done, and before the subgroup
is added, we use SELECT FOR UPDATE to lock the user group rows.
The lock needs to be acquired before a group membership change is about
to occur before any check has been conducted.
Suppose that we are adding subgroup B to supergroup A, the locking protocol
is specified as follows:
1. Acquire a lock for B and all its direct and indirect subgroups.
2. Acquire a lock for A.
For the removal of user groups, we acquire a lock for the user group to
be removed with all its direct and indirect subgroups. This is the special
case A=B, which is still complaint with the protocol.
**Error handling**
We currently rely on Postgres' deadlock detection to abort transactions
and show an error for the users. In the future, we might need some
recovery mechanism or at least better error handling.
**Notes**
An important note is that we need to reuse the recursive CTE query that
finds the direct and indirect subgroups when applying the lock on the
rows. And the lock needs to be acquired the same way for the addition and
removal of direct subgroups.
User membership change (as opposed to user group membership) is not
affected. Read-only queries aren't either. The locks only protect
critical regions where the user group dependency graph might violate
the DAG constraint, where users are not participating.
**Testing**
We implement a transaction test case targeting some typical scenarios
when an internal server error is expected to happen (this means that the
user group view makes the correct decision to abort the transaction when
something goes wrong with locks).
To achieve this, we add a development view intended only for unit tests.
It has a global BARRIER that can be shared across threads, so that we
can synchronize them to consistently reproduce certain potential race
conditions prevented by the database locks.
The transaction test case lanuches pairs of threads initiating possibly
conflicting requests at the same time. The tests are set up such that exactly N
of them are expected to succeed with a certain error message (while we don't
know each one).
**Security notes**
get_recursive_subgroups_for_groups will no longer fetch user groups from
other realms. As a result, trying to add/remove a subgroup from another
realm results in a UserGroup not found error response.
We also implement subgroup-specific checks in has_user_group_access to
keep permission managing in a single place. Do note that the API
currently don't have a way to violate that check because we are only
checking the realm ID now.
We want to make the callers be more explicit about the use of the
user group being accessed, so that the later implemented database lock
can be benefited from the visibility.
Adds typing notification constants to the response given by
`POST /register`. Until now, these were hardcoded by clients
based on the documentation for implementing typing notifications
in the main endpoint description for `api/set-typing-status`.
This change also reflects updating the web-app frontend code
to use the new constants from the register response.
Co-authored-by: Samuel Kabuya <samuel.mwangikabuya@kibo.school>
Co-authored-by: Wilhelmina Asante <wilhelmina.asante@kibo.school>
Fixes#11767.
Previously multi-character emoji sequences weren't matched in the
emoji regex, so we'd convert the characters to separate images,
breaking the intended display.
This change allows us to match the full emoji sequence, and
therefore show the correct image.
We have modified the code to directly fetch realm from Message
object instead of "sender" field and thus we no longer need to
fetch "sender__realm" using select_related.
We do not want to access realm from "sender" field so that
we do not need to pass "sender__realm" argument to
select_related call when querying messages. We can instead
pass realm as argument to wildcard_mention_allowed.
We can directly get the realm object from Message object now
and there is no need to get the realm object from "sender"
field of Message object.
After this change, we would not need to fetch "sender__realm"
field using "select_related" and instead only passing "realm"
to select_related when querying Message objects would be enough.
This commit also updates a couple of cases to directly access
realm ID from message object and not message.sender. Although
we have fetched sender object already, so accessing realm_id
from message directly or from message.sender should not matter,
but we can be consistent to directly get realm from Message
object whenever possible.
We set stream_weekly_traffic field to "null" for Subscription
objects in zephyr mirror realm as we do not need stream traffic
data in zephyr mirror realm. This makes the subscription data
consistent with steams data.
This commit also udpates test to check never_subscribed data for
zephyr mirror realm.
Instead of having a "realm.is_zephyr_mirror_realm" check for every
get_streams_traffic call, this commit udpates get_streams_traffic to
accept realm as parameter and return "None" for zephyr mirror realm.
The "followup_day2" email template name is not clear or descriptive
about the purpose of the email. Creates a duplicate of those email
template files with the template name "zulip_onboarding_topics".
Because any existing scheduled emails that use the "followup_day2"
templates will need to be updated before the current templates can
be removed, we don't do a simple file rename here.
The "followup_day1" email template name is not clear or descriptive
about the purpose of the email. Creates a duplicate of those email
template files with the template name "account_registered".
Because any existing scheduled emails that use the "followup_day1"
templates will need to be updated before the current templates can
be removed, we don't do a simple file rename here.
This migration applies under the assumption that extra_data_json has
been populated for all existing and coming audit log entries.
- This removes the manual conversions back and forth for extra_data
throughout the codebase including the orjson.loads(), orjson.dumps(),
and str() calls.
- The custom handler used for converting Decimal is removed since
DjangoJSONEncoder handles that for extra_data.
- We remove None-checks for extra_data because it is now no longer
nullable.
- Meanwhile, we want the bouncer to support processing RealmAuditLog entries for
remote servers before and after the JSONField migration on extra_data.
- Since now extra_data should always be a dict for the newer remote
server, which is now migrated, the test cases are updated to create
RealmAuditLog objects by passing a dict for extra_data before
sending over the analytics data. Note that while JSONField allows for
non-dict values, a proper remote server always passes a dict for
extra_data.
- We still test out the legacy extra_data format because not all
remote servers have migrated to use JSONField extra_data.
This verifies that support for extra_data being a string or None has not
been dropped.
Co-authored-by: Siddharth Asthana <siddharthasthana31@gmail.com>
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
We remove bot's subscriptions for private streams to which the
new owner is not subscribed and keep the ones to which the new
owner is subscribed on changing owner.
This commit also changes the code for sending subscription
remove events to use transaction.on_commit since we call
the function inside a transactopn in do_change_bot_owner and
this also requires some changes in tests in test_events.
For topic wildcard mentions, the 'wildcard_mentioned' flag is set
for those user messages having 'user_profile_id' in
'topic_participant_user_ids', i.e. all topic participants.
Earlier, the flag was set if the 'user_profile_id' exists in
'all_topic_wildcard_mention_user_ids'.
'all_topic_wildcard_mention_user_ids' contains the ids of those
users who are topic participants and have enabled notifications
for '@topic' mentions.
The earlier approach was incorrect, as it would set the
'wildcard_mentioned' flag only for those topic participants
who have enabled the notifications for '@topic' mention instead
of setting the flag for all the topic participants.
The bug was introduced in 4c9d26c.
This commit addresses the issue where the topic highlighting
in search results was offset by one character when an
apostrophe was present. The problem stemmed from the disparity
in HTML escaping generated by the function `func.escape_html` which
is used to obtain `topic_matches` differs from the escaping performed
by the function `django.utils.html.escape` for apostrophes (').
func.escape_html | django.utils.html.escape
-----------------+--------------------------
' | '
To fix this SQL query is changed to return the HTML-escaped
topic name generated by the function `func.escape_html`.
Fixes: #25633.
Currently, we are displaying the "Complete the organization profile"
banner immediately after the organization was created. It's important to
strongly encourage orgs to configure their profile, so we should delay
showing the banner if the profile has not been configured after 15 days.
Thus also allows the users to check out Zulip and see how it works before
configuring the organization settings.
Fixes: #24122.
The number of affected objects may be quite high, and they are
selected by `id IN (...)` query, and updated with a giant `CASE`.
This turns out to be quadratic, and can cause large queries to take
hours, in a state where they cannot be terminated, when PostgreSQL >11
tries to JIT the query.
Set a batch_size as a stopgap performance fix before moving to
`.update()` as a real fix.
We have historically cached two types of values
on a per-request basis inside of memory:
* linkifiers
* display recipients
Both of these caches were hand-written, and they
both actually cache values that are also in memcached,
so the per-request cache essentially only saves us
from a few memcached hits.
I think the linkifier per-request cache is a necessary
evil. It's an important part of message rendering, and
it's not super easy to structure the code to just get
a single value up front and pass it down the stack.
I'm not so sure we even need the display recipient
per-request cache any more, as we are generally pretty
smart now about hydrating recipient data in terms of
how the code is organized. But I haven't done thorough
research on that hypotheseis.
Fortunately, it's not rocket science to just write
a glorified memoize decorator and tie it into key
places in the code:
* middleware
* tests (e.g. asserting db counts)
* queue processors
That's what I did in this commit.
This commit definitely reduces the amount of code
to maintain. I think it also gets us closer to
possibly phasing out this whole technique, but that
effort is beyond the scope of this PR. We could
add some instrumentation to the decorator to see
how often we get a non-trivial number of saved
round trips to memcached.
Note that when we flush linkifiers, we just use
a big hammer and flush the entire per-request
cache for linkifiers, since there is only ever
one realm in the cache.
We want to phase out the use of get_display_recipient
for streams, and this is the last place that I
eliminate it. The next commit will eliminate the
dead code and make mypy types tighter.
This change will make push notifications slightly
slower in some situations, but we avoid all the
complexity of a cache, and this code tends to run
offline.
We could always make this code a bit more efficient
by being a little smarter about what data we fetch
up front. For example, get_apns_alert_title gets
called by a function that already has the stream
name. It's just a bit of a pain to refactor when
you have all the DM codepath mucked up with the
stream codepath.
There's no need for the complexity and extra round
trips to call get_display_recipient in a testing
context.
We also eliminate the unnecessary call to check_string.
This function is poorly named, but that's a sweep
for another day.
The get_display_recipient helper is a clumsy way to get
stream names, and it's not even representative of how
most of our code retrieves stream names.
The new helper also double-checks that the Stream
object has the correct recipient id.
We no longer have to reason about the 12 possible
ways of invoking get_narrow_url. We also avoid
double computation in a couple places.
Finally, we get stricter type checks by just inlining
the calls.
We don't need to call get_display_recipient for
non-stream messages.
I will rename display_recipient in the next commit;
if I were to combine the steps the diff would be too
hard to read.
This commit renames the keyword 'pm' to 'dm' in the
'pm_mention_email_disabled_user_ids' and
'pm_mention_push_disabled_user_ids' attributes of the
'RecipientInfoResult' dataclass.
'pm' and 'dm' are the acronyms for 'private message' and
'direct message' respectively.
It includes 'TODO/compatibility' code to support the old format
fields in the tornado queues during the Zulip server upgrades.