The `user_activity_interval` worker calls:
```python3
last = UserActivityInterval.objects.filter(user_profile=user_profile).order_by("-end")[0]
`````
Which results in a query like:
```sql
SELECT "zerver_useractivityinterval"."id", "zerver_useractivityinterval"."user_profile_id", "zerver_useractivityinterval"."start", "zerver_useractivityinterval"."end" FROM "zerver_useractivityinterval" WHERE "zerver_useractivityinterval"."user_profile_id" = 12345 ORDER BY "zerver_useractivityinterval"."end" DESC LIMIT 1
```
For users which have at least one matching row, this results in a
query plan like:
```
Limit (cost=0.56..711.38 rows=1 width=24) (actual time=0.078..0.078 rows=1 loops=1)
-> Index Scan Backward using zerver_useractivityinterval_7f021a14 on zerver_useractivityinterval (cost=0.56..1031399.46 rows=1451 width=24) (actual time=0.077..0.078 rows=1 loops=1)
Filter: (user_profile_id = 12345)
Rows Removed by Filter: 98
Planning Time: 0.059 ms
Execution Time: 0.088 ms
```
But for users that have just been created, with no matching rows, this
is considerably more expensive:
```
Limit (cost=0.56..711.38 rows=1 width=24) (actual time=10798.146..10798.146 rows=0 loops=1)
-> Index Scan Backward using zerver_useractivityinterval_7f021a14 on zerver_useractivityinterval (cost=0.56..1031399.46 rows=1451 width=24) (actual time=10798.145..10798.145 rows=0 loops=1)
Filter: (user_profile_id = 12345)
Rows Removed by Filter: (count of every single row in the table, redacted)
Planning Time: 0.053 ms
Execution Time: 10798.158 ms
```
Regular vacuuming can force the use of the index on `user_profile_id`
as long as there are few enough users, which is fast -- however, at
some point, the query planner decides that is insufficiently specific,
always chooses the effective-whole-table-scan.
Add an index on `(user_profile_id, end)`, which is expected to be
sufficiently specific that it is used even with large numbers of user
profiles.
Ref #19250.
The transforms called from `build_message_payload` use
`lxml.html.fromstring` to parse (and stringify, and re-parse) the HTML
generated by Markdown. However, this function fails if it is passed
an empty document. "empty" is broader than just the empty string; it
also includes any document made entirely out of control characters,
spaces, unpaired surrogates, U+FFFE, or U+FFFF, and so forth. These
documents would fail to parse, and raise a ParserError.
Using `lxml.html.fragment_fromstring` handles these cases, but does by
wrapping the contents in a <div> every time it is called. As such,
replacing each `fromstring` with `fragment_fromstring` would nest
another layer of `<div>`.
Instead of each of the helper functions re-parsing, modifying, and
stringifying the HTML, parse it once with `fragment_fromstring` and
pass around the parsed document to each helper, which modifies it
in-place. This adds one outer `<div>`, requiring minor changes to
tests and the prepend-sender functions.
The modification to add the sender is left using BeautifulSoup, as
that sort of transform is much less readable, and more fiddly, in raw
lxml.
Partial fix for #19559.
This is a roundabout way to appease a semgrep complaint about
‘error_msg = error_msg % (string_id,)’ while also improving the code.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
9ac55a8cf6 introduced support for
batch updates to stories. However, that commit didn't skip label
removals, as we already do in non-batch story payloads. This led
to an exception for batch story update payloads where labels were
removed but none were added.
maybe_send_batched_emails handles batches of emails from different
users at once; as it processes each user's batch, it enqueues messages
onto the `email_senders` queue. If `handle_missedmessage_emails`
raises an exception when processing a single user's email, no events
are marked as handled -- including those that were already handled and
enqueued onto `email_senders`. This results in an increasing number
of users being sent repeated emails about the same missed messages.
Catch and log any exceptions when handling an individual user's
events. This guarantees forward progress, and that notifications are
sent at-most-once, not at-least-once.
This commit indicates that the realm_message_retention_days field can have
a special value, similar to its stream counterpart, and also explains how
the special value changed over different server versions.
With an extension from tabbott to double-enter the changelog entry.
Related discussion: https://chat.zulip.org/#narrow/stream/378-api-design/topic/realm_message_retention_days
The ability to use multiple ports has been removed a long time ago.
And the "optional" note in the help message is in fact incorrect
since `addrport` being `None` is not supported.
We do not allow any user to edit the system user groups (including
renaming, deleting, adding or removing members, etc.) from the
API. These user groups will change only by the code when a new
user is added or role of a user is changed.
This is implemented by rejecting access_user_group_by_id always
except the case when it is use to get the user group for sending
email and push notifications, as we would need to send notifications
to the mentioned user group.
We make the description parameter in create_user_group as keyword-only
to improve readability. We would also keep the is_system_group
parameter which will be added in future keyword-only.
Tuples cannot be deserialized from JSON.
While we do use these validators for other things, like event
dictionaries, we have migrated the API away from using those. The
last use was removed in 4f3d5f2d87
Signed-off-by: Anders Kaseorg <anders@zulip.com>