Rather than fetch all UserMessage rows for all streams, and subtract
those out in Python-space from the list of all Message rows the user
may have received -- do this via a "NOT EXISTS" subquery. This is
much better indexed (performing in fractions of milliseconds rather
than hundreds), and also consumes much less memory.
This kind of payload that's loaded from json in the body of the request
is not only used for webhooks, but also in the push bouncer, and may get
used elsewhere too - so a general name is better.
Earlier, 'is_row_muted' returned 'true' if the message was in
a muted stream or muted topic.
If the message is in an unmuted or followed topic in a muted
stream, such topics should be treated as not muted topics
in an unmuted stream.
This commit fixes the incorrect behavior.
Now, for wildcard mentions, 'unread_msgs.mentions' exclude
the IDs in muted streams only if the message is in default or
muted topic.
Also, 'unread_msgs.count' takes into account the unreads in unmuted
or followed topics in muted streams too.
Documents that this bug was fixed in the API changelog.
Update 'get_muted_stream_ids' to return a set of IDs
instead of a list.
This will help to avoid linear time search operations later
while using 'if stream_id in muted_streams_ids'.
This prep commit renames the 'build_topic_mute_checker' function
to 'build_get_topic_visibility_policy' and updates it to support
all the visibility policies.
The function prefetches the visibility policies the user has
configured for various topics and prepares a dict named
'topic_to_visibility_policy' to be used later on.
These queries benefit from the increased specificity of using the
realm / recipient / sender indexes. The argument from 11a1cb9630
does not apply in these cases, since there are only 2 usermessage rows
for each matching message row for DMs, and few more than that for
huddles.
This query has two halves; messages set by the user, and messages
received by the user. The former uses the already-specific
usermessage privatemessage flag index; the latter relies on the
recipient index on messages.
Add the realm_id to the latter half, so that the recipient_id is
paired with the realm_id.
Clarifies that the `all` field in the `op: "add"` event is only
relevant for the `"read"` message flag, and that it will be false
for all other specified flags in theses events.
Deprecates the `all` field in the `op: "remove"` event and document
that it is false for all specified flags.
Updates the deprecated `operation` field description and makes
a few other small revisions to the event text for clarity and
accuracy.
This commit adds a `jitsi_server_url` field to the Realm model, which
will be used to save the URL of the custom Jitsi Meet server. In
the database, `None` will encode the server-level default. We can't
readily use `None` in the API, as it could be confused with "field not
sent". Therefore, we will use the string "default" for this purpose.
We have also introduced `server_jitsi_server_url` in the `/register`
API. This will be used to display the server's default Jitsi server
URL in the settings UI.
The existing `jitsi_server_url` will now be calculated as
`realm_jitsi_server_url || server_jitsi_server_url`.
Fixes a part of #17914.
Co-authored-by: Gaurav Pandey <gauravguitarrocks@gmail.com>
The unique index on `(user_id, message_id)` that is the
`zerver_usermessage` table is rather specific, and even the PostgreSQL
extended statistics are not enough for it to realize there is a
correlation between the `realm_id` in the message table and the
`user_id` in the usermessage table. This means that adding the
`realm_id` limit when there is a join to `zerver_usermessage` flips
the query plan from a nested loop of unique usermessage index-only
scan, with an index scan of the messages pkey -- to a parallel hash
join of the messages limit with a index scan of just the user_id limit
on usermessages. It thinks this is necessary because it thinks that
the `realm_id` limit may remove a large number of messages from the
usermessage set -- which is totally untrue.
Remove the `realm_id` limit if we have a usermessage join.
This endpoint verifies that the services that Zulip needs to function
are running, and Django can talk to them. It is designed to be used
as a readiness probe[^1] for Zulip, either by Kubernetes, or some other
reverse-proxy load-balancer in front of Zulip. Because of this, it
limits access to only localhost and the IP addresses of configured
reverse proxies.
Tests are limited because we cannot stop running services (which would
impact other concurrent tests) and there would be extremely limited
utility to mocking the very specific methods we're calling to raising
the exceptions that we're looking for.
[^1]: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
The `expected` flag was incredibly confusing, as you
couldn't tell from the calling code what you were
actually expecting to happen.
I avoid the context manager idiom in order to force
the callers to create simple helper functions, and
I de-duplicate some code in some places.
I also force the caller to explicitly soft-deactivate
the user with one simple line of code, so that the
person reading the test doesn't have to research
the side effects of the helper. (And I make it
very easy for new authors to follow the practice
going forward.)
This is also somewhat of a prep commit to avoid
the obfuscated use of refresh_from_db.
The get_user function is poorly named, but I don't want to
sweep the entire codebase yet.
It's also nice to have a test wrapper for little experiments
like profiling tests or hunting down calls to refresh_from_db.
It's possible that we would also just change the new wrapper
to more directly call Django. The `get_user` function isn't
used in a ton of real-world places, so we might want the test
code to just bypass the cache.
I add a bunch of cute helper methods to make
the test a bit more readable.
And then I make sure to get clean objects,
which precludes the need for our callback
functions to refresh the user objects.
And finally I make sure that our validation
functions don't cause any round trips (assuming
we have fetched objects using a standard
Zulip helper, which example_user ensures.)
In feature levels 153 and 154, a new value of "partially_completed"
for `result` in a success (HTTP status code 200) was added for two
endpoints that process messages in batches: /api/delete-topic and
/api/mark-all-as-read.
Prior to these changes, `result` was either "success" or "error" for
all responses, which was a useful API invariant to have for clients.
So, here we remove "partially_completed" as a potential value for
"result" in a response. And instead, for the two endpoints noted
above, we return a boolean field "complete" to indicate if the
response successfully deleted/marked as read all the targeted
messages (complete: true) or if only some of the targeted messages
were processed (complete: false).
The "code" field for an error string that was also returned as part
of a partially completed response is removed in these changes as
well.
The web app does not currently use the /api/mark-all-as-read
endpoint, but it does use the /api/delete-topic endpoint, so these
changes update that to check the `complete` boolean instead of the
string value for `result`.
This adds support for syncing user role via the newly added "role"
attribute, which can be set to either of
['owner', 'administrator', 'moderator', 'member', 'guest'].
Removes durable=True from the atomic decorator of do_change_user_role,
as django-scim2 runs PATCH operations in an atomic block.
Since the cache is flushed when the cutoff or realm changes, the
maximum size of the cache should cap out at the number of streams in
the realm. Raise the max cache size, now that this will not simply
lead to useless cache space for smaller servers.
There is now no longer any reason to have the scheduled_email
enqueuing wait until all of the users' contexts have been generated.
Switch to returning the contexts as an iterator, and send them as we
compute them.
The query plan for fetching recent messages from the arbitrary set of
streams formed by the intersection of 30 random users can be quite
bad, and can descend into a sequential scan on `zerver_recipient`.
Worse, this work of pulling recent messages out is redone if the
stream appears in the next batch of 30 users.
Instead, pull the recent messages for a stream on a one-by-one basis,
but cache them in an in-memory cache. Since digests are enqueued in
30-user batches but still one-realm-at-a-time, work will be saved both
in terms of faster query plans whose results can also be reused across
batches.
This requires that we pull the stream-id to stream-name mapping for
_all_ streams in the realm at once, but that is well-indexed and
unlikely to cause performance issues -- in fact, it may be faster
than pulling a random subset of the streams in the realm.
The type annotation for functools.partial uses unchecked Any for all
the function parameters (both early and late). returns.curry.partial
uses a mypy plugin to check the parameters safely.
https://returns.readthedocs.io/en/latest/pages/curry.html
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Matching the topic exactly, as opposed to case-insensitively, is not a
common operation, and one that we want to make difficult to do
accidentally. Inline the single use case of it.
This algorithm existed in multiple places, with different queries.
Since we only access properties in the UserMessage table, we
standardize on the much simpler and faster Index Only Scan, rather
than a merge join.
When searching for links inside a topic name, the question mark (?)
was used to split the topic. If a URL had a query after the URL
(e.g., "?foo=bar"), then the query was trimmed from the URL.
Removing the question mark from `basic_link_splitter` is sufficient
to fix this issue. The `get_web_link_regex` function then removes
the trailing punctuation if any, including literal question marks.
Fixes#26368.
When there was no space right after `/todo` but there was content on a
new line, the message would be rendered plainly, not as a todo widget.
This was because we split on only the space character to then check if
the first token was a valid widget.
Now we split on both spaces and newlines to extract the widget name,
irrespective of whether it is followed by a space or a newline. This
results in the message being rendered as a todo widget as expected.
Rename existing shortened references to demo organizations, like
`is_demo_org` or `demo-org-warning`, that have been used in the
codebase so far and replace them to be like the `models.py`
variable: `Realm.demo_organization_scheduled_deletion_date`.
This REDOS was not exploitable, as its content is only read from
checked-in files; regardless, simplify it to not backtrack. We also
do not actually have any location which use leading or trailing
whitespace, so remove those optional bits.
This function is used by almost all webhooks.
To support it, we use the "api_ignore_parameter" flag so that positional
arguments like topic and body that are not intended to be parsed from
the request can be ignored.
This demonstrates the use of BaseModel to replace a check_dict_only
validator.
We also add support to referring to $defs in the OpenAPI tests. In the
future, we can descend down each object instead of mapping them to dict
for more accurate checks.
We want to reject ambiguous type annotations that set ApiParamConfig
inside a Union. If a parameter is Optional and has a default of None, we
prefer Annotated[Optional[T], ...] over Optional[Annotated[T, ...]].
This implements a check that detects Optional[Annotated[T, ...]] and
raise an assertion error if ApiParamConfig is in the annotation. It also
checks if the type annotation contains any ApiParamConfig objects that
are ignored, which can happen if the Annotated type is nested inside
another type like List, Union, etc.
Note that because
param: Annotated[Optional[T], ...] = None
and
param: Optional[Annotated[Optional[T], ...]] = None
are equivalent in runtime prior to Python 3.11, there is no way for us
to distinguish the two. So we cannot detect that in runtime.
See also: https://github.com/python/cpython/issues/90353
The goal of typed_endpoint is to replicate most features supported by
has_request_variables, and to improve on top of it. There are some
unresolved issues that we don't plan to work on currently. For example,
typed_endpoint does not support ignored_parameters_supported for 400
responses, and it does not run validators on path-only arguments.
Unlike has_request_variables, typed_endpoint supports error handling by
processing validation errors from Pydantic.
Most features supported by has_request_variables are supported by
typed_endpoint in various ways.
To define a function, use a syntax like this with Annotated if there is
any metadata you want to associate with a parameter, do note that
parameters that are not keyword-only are ignored from the request:
```
@typed_endpoint
def view(
request: HttpRequest,
user_profile: UserProfile,
*,
foo: Annotated[int, ApiParamConfig(path_only=True)],
bar: Json[int],
other: Annotated[
Json[int],
ApiParamConfig(
whence="lorem",
documentation_status=NTENTIONALLY_UNDOCUMENTED
)
] = 10,
) -> HttpResponse:
....
```
There are also some shorthands for the commonly used annotated types,
which are encouraged when applicable for better readability and less
typing:
```
WebhookPayload = Annotated[Json[T], ApiParamConfig(argument_type_is_body=True)]
PathOnly = Annotated[T, ApiParamConfig(path_only=True)]
```
Then the view function above can be rewritten as:
```
@typed_endpoint
def view(
request: HttpRequest,
user_profile: UserProfile,
*,
foo: PathOnly[int],
bar: Json[int],
other: Annotated[
Json[int],
ApiParamConfig(
whence="lorem",
documentation_status=INTENTIONALLY_UNDOCUMENTED
)
] = 10,
) -> HttpResponse:
....
```
There are some intentional restrictions:
- A single parameter cannot have more than one ApiParamConfig
- Path-only parameters cannot have default values
- argument_type_is_body is incompatible with whence
- Arguments of name "request", "user_profile", "args", and "kwargs" and
etc. are ignored by typed_endpoint.
- positional-only arguments are not supported by typed_endpoint. Only
keyword-only parameters are expected to be parsed from the request.
- Pydantic's strict mode is always enabled, because we don't want to
coerce input parsed from JSON into other types unnecessarily.
- Using strict mode all the time also means that we should always use
Json[int] instead of int, because it is only possible for the request
to have data of type str, and a type annotation of int will always
reject such data.
typed_endpoint's handling of ignored_parameters_unsupported is mostly
identical to that of has_request_variables.
_default_manager is the same as objects on most of our models. But
when a model class is stored in a variable, the type system doesn’t
know which model the variable is referring to, so it can’t know that
objects even exists (Django doesn’t add it if the user added a custom
manager of a different name). django-stubs used to incorrectly assume
it exists unconditionally, but it no longer does.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
This is important because the "guests" value isn't one that we'd
expect anyone to pick intentionally, and in particular isn't an
available option for the similar/adjacent "email invitations" setting.
This commit rename the existing setting `Who can invite users to this
organization` to `Who can send email invitations to new users` and
also renames all the variables related to this setting that do not
require a change to the API.
This was done for better code readability as a new setting
`Who can create invite links` will be added in future commits.
This commit does the backend changes required for adding a realm
setting based on groups permission model and does the API changes
required for the new setting `Who can create multiuse invite link`.
This commit adds id_field_name field to GroupPermissionSetting
type which will be used to store the string formed by concatenation
of setting_name and `_id`.
This commit makes the database changes while creating internal_realm
to be done in a single transaction.
This is needed for deferring the foreign key constraints
to the end of transaction.
This commit removes the 'alert' field from the payload for
Android via GCM/FCM.
The alert strings generated do not get used at all and have
not been used since at least 2019. On Android, we construct
the notification UI ourselves in the client, and we ignore
the alert string.
To make creation of demo organizations feel lightweight for users,
we do not want to require an email address at sign-up. Instead an
empty string will used for the new realm owner's email. Currently
implements that for new demo organizations in the development
environment.
Because the user's email address does not exist, we don't enqueue
any of the welcome emails upon account/realm creation, and we
don't create/send new login emails.
This is a part of #19523.
Co-authored by: Tim Abbott <tabbott@zulip.com>
Co-authored by: Lauryn Menard <lauryn@zulip.com>
The 'startinline' option is utilized in the `CodeHilite` instantiation
to indicate that the provided PHP code snippet should be highlighted
even if it doesn't start with the opening tag or marker of the
associated programming language (which will rarely be the case in
Zulip, since one discusses a section of a file much more often than a
whole file).
Fixes: https://chat.zulip.org/#narrow/stream/137-feedback/topic/php.20syntax.20highlighting.20should.20not.20require.20.60.3C.3Fphp.60.
Signed-off-by: Akshat <akshat25iiit@gmail.com>
This commit updates the 'get_apns_alert_subtitle' function to
return a common subtitle, i.e., "{full_name} mentioned everyone:"
for wildcard mentions.
The triggers for the stream or topic wildcard mentions include:
* NotificationTriggers.TOPIC_WILDCARD_MENTION_IN_FOLLOWED_TOPIC
* NotificationTriggers.STREAM_WILDCARD_MENTION_IN_FOLLOWED_TOPIC
* NotificationTriggers.TOPIC_WILDCARD_MENTION
* NotificationTriggers.STREAM_WILDCARD_MENTION
We now send stream creation and stream deletion events on
changing a user's role because a user can gain or lose
access to some streams on changing their role.
This commit extracts the code which queries the required streams
to a new function "get_user_streams". The new functions returns
the list of "Stream" object and not dictionaries and then
do_get_streams function converts it into list of dictionaries.
This change is important because we would use the new function
in further commit where we want list of "Stream" objects and
not list of dictionaries.
There was a bug in apply_event code where only a stream which
is not private is added to the "never_subscribed" data after
a stream creation event. Instead, it should be added to the
"never_subscribed" data irrespective of permission policy of
the stream as we already send stream creation events only to
those users who can access the stream. Due to the current
bug, private streams were not being added to "never_subscribed"
data in apply_event for admins as well. This commit fixes it
and also makes sure the "never_subscribed" list is sorted
which was not done before and was also a bug.
The bugs mentioned above were unnoticed as the tests did not
cover these cases and this commit also adds tests for those
cases.
The "streams" field in "/register" response did not include web-public
streams for non-admin users but the data for those are eventually
included in the subscriptions data sent using "subscriptions",
"unsubscribed" and "never_subscribed" fields.
This commit adds code to include the web-public streams in "streams"
field as well as everyone can access those and will make the "streams"
data complete.
The name and docstring were just wrong, having a UserMessage row isn't
sufficient for having message access and is actually only relevant in a
private stream with private history. The function is only used in a
single place anyway, in bulk_access_messages.
The comment mentioning this function in handle_remove_push_notification
can be tweaked to just not mention any function specifically and just
say why we're not checking message access.
Users who used to be subscribed to a private stream and have been
removed from it since retain the ability to edit messages/topics, and
delete messages that they used to have access to, if other relevant
organization permissions allow these actions. For example, a user may be
able to edit or delete their old messages they posted in such a private
stream. An administrator will be able to delete old messages (that they
had access to) from the private stream.
We fix this by fixing the logic in has_message_access (which lies at the
core of our message access checks - access_message() and
bulk_access_messages())
to not rely on only a UserMessage row for checking access but also
verify stream type and subscription status.
**Background**
User groups are expected to comply with the DAG constraint for the
many-to-many inter-group membership. The check for this constraint has
to be performed recursively so that we can find all direct and indirect
subgroups of the user group to be added.
This kind of check is vulnerable to phantom reads which is possible at
the default read committed isolation level because we cannot guarantee
that the check is still valid when we are adding the subgroups to the
user group.
**Solution**
To avoid having another transaction concurrently update one of the
to-be-subgroup after the recursive check is done, and before the subgroup
is added, we use SELECT FOR UPDATE to lock the user group rows.
The lock needs to be acquired before a group membership change is about
to occur before any check has been conducted.
Suppose that we are adding subgroup B to supergroup A, the locking protocol
is specified as follows:
1. Acquire a lock for B and all its direct and indirect subgroups.
2. Acquire a lock for A.
For the removal of user groups, we acquire a lock for the user group to
be removed with all its direct and indirect subgroups. This is the special
case A=B, which is still complaint with the protocol.
**Error handling**
We currently rely on Postgres' deadlock detection to abort transactions
and show an error for the users. In the future, we might need some
recovery mechanism or at least better error handling.
**Notes**
An important note is that we need to reuse the recursive CTE query that
finds the direct and indirect subgroups when applying the lock on the
rows. And the lock needs to be acquired the same way for the addition and
removal of direct subgroups.
User membership change (as opposed to user group membership) is not
affected. Read-only queries aren't either. The locks only protect
critical regions where the user group dependency graph might violate
the DAG constraint, where users are not participating.
**Testing**
We implement a transaction test case targeting some typical scenarios
when an internal server error is expected to happen (this means that the
user group view makes the correct decision to abort the transaction when
something goes wrong with locks).
To achieve this, we add a development view intended only for unit tests.
It has a global BARRIER that can be shared across threads, so that we
can synchronize them to consistently reproduce certain potential race
conditions prevented by the database locks.
The transaction test case lanuches pairs of threads initiating possibly
conflicting requests at the same time. The tests are set up such that exactly N
of them are expected to succeed with a certain error message (while we don't
know each one).
**Security notes**
get_recursive_subgroups_for_groups will no longer fetch user groups from
other realms. As a result, trying to add/remove a subgroup from another
realm results in a UserGroup not found error response.
We also implement subgroup-specific checks in has_user_group_access to
keep permission managing in a single place. Do note that the API
currently don't have a way to violate that check because we are only
checking the realm ID now.
We want to make the callers be more explicit about the use of the
user group being accessed, so that the later implemented database lock
can be benefited from the visibility.
Adds typing notification constants to the response given by
`POST /register`. Until now, these were hardcoded by clients
based on the documentation for implementing typing notifications
in the main endpoint description for `api/set-typing-status`.
This change also reflects updating the web-app frontend code
to use the new constants from the register response.
Co-authored-by: Samuel Kabuya <samuel.mwangikabuya@kibo.school>
Co-authored-by: Wilhelmina Asante <wilhelmina.asante@kibo.school>
Fixes#11767.
Previously multi-character emoji sequences weren't matched in the
emoji regex, so we'd convert the characters to separate images,
breaking the intended display.
This change allows us to match the full emoji sequence, and
therefore show the correct image.
We have modified the code to directly fetch realm from Message
object instead of "sender" field and thus we no longer need to
fetch "sender__realm" using select_related.
We do not want to access realm from "sender" field so that
we do not need to pass "sender__realm" argument to
select_related call when querying messages. We can instead
pass realm as argument to wildcard_mention_allowed.
We can directly get the realm object from Message object now
and there is no need to get the realm object from "sender"
field of Message object.
After this change, we would not need to fetch "sender__realm"
field using "select_related" and instead only passing "realm"
to select_related when querying Message objects would be enough.
This commit also updates a couple of cases to directly access
realm ID from message object and not message.sender. Although
we have fetched sender object already, so accessing realm_id
from message directly or from message.sender should not matter,
but we can be consistent to directly get realm from Message
object whenever possible.