
8151 Commits

Author SHA1 Message Date
Alex Vandiver 548bb5362e message_cache: Rename "to_dict" functions which deal with bytes. 2024-02-14 17:31:31 +00:00
Alex Vandiver 96119e45b9 message_cache: Rename update_to_dict_cache to update_message_cache.
This better describes what it does.
2024-02-14 17:31:31 +00:00
Alex Vandiver 93a071a1f8 message: Split MessageDict and friends into its own file. 2024-02-14 17:31:30 +00:00
Alex Vandiver 11bde84580 message: Move render_markdown into zerver.lib.markdown. 2024-02-14 17:31:30 +00:00
Alex Vandiver eaf58438ec message_edit: Carry the QuerySet through as much as possible.
Rather than pass around a list of message objects in-memory, we
instead keep the same constructed QuerySet which includes the later
propagated messages (if any), and use that same query to pick out
affected Attachment objects, rather than limiting to the set of ids.
This is not necessarily a win -- the list of message-ids *may* be very
long, and thus the query may be more concise, easier to send to
PostgreSQL, and faster for PostgreSQL to parse.  However, the list of
ids is almost certainly better-indexed.

After processing the move, the QuerySet must be re-defined as a search
of ids (and possibly a very long list of such), since there is no
other way which is guaranteed to correctly single out the moved
messages.  At this point, it is mostly equivalent to the list of
Message objects, and certainly takes no less memory.
2024-02-14 17:31:30 +00:00
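A minimal sketch of the trade-off described above. It uses an in-memory SQLite database as a stand-in for PostgreSQL and for Django's QuerySet. The table and column names (`message`, `attachment_messages`, `topic`) are hypothetical. The first helper reuses the message-selecting query as a subquery. The second sends an explicit list of ids, which grows with the size of the move. After the move the original predicate no longer matches, which is why the query has to be re-defined as a search of ids.

```python
import sqlite3

# Hypothetical schema: messages, plus a join table linking attachments to messages.
SCHEMA = """
CREATE TABLE message (id INTEGER PRIMARY KEY, stream_id INTEGER, topic TEXT);
CREATE TABLE attachment_messages (attachment_id INTEGER, message_id INTEGER);
INSERT INTO message VALUES (1, 10, 'a'), (2, 10, 'a'), (3, 10, 'b');
INSERT INTO attachment_messages VALUES (100, 1), (101, 2), (102, 3);
"""

# The query carried through the move, playing the role of the QuerySet.
MOVED_MESSAGES = "SELECT id FROM message WHERE stream_id = ? AND topic = ?"


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def attachments_via_subquery(conn, stream_id, topic):
    # Reuse the selecting query as a subquery: the statement stays short however
    # many messages match, but it only works while the predicate still holds.
    rows = conn.execute(
        "SELECT attachment_id FROM attachment_messages "
        f"WHERE message_id IN ({MOVED_MESSAGES})",
        (stream_id, topic),
    )
    return sorted(row[0] for row in rows)


def attachments_via_id_list(conn, message_ids):
    # Send explicit ids: one placeholder per message, so the SQL grows with the move.
    placeholders = ", ".join("?" for _ in message_ids)
    rows = conn.execute(
        "SELECT attachment_id FROM attachment_messages "
        f"WHERE message_id IN ({placeholders})",
        message_ids,
    )
    return sorted(row[0] for row in rows)


conn = make_db()
moved_ids = [row[0] for row in conn.execute(MOVED_MESSAGES, (10, "a"))]
print(attachments_via_subquery(conn, 10, "a"))  # [100, 101]
conn.execute("UPDATE message SET topic = 'c' WHERE id IN (1, 2)")
# The original predicate no longer matches the moved messages...
print(attachments_via_subquery(conn, 10, "a"))  # []
# ...so only the explicit id list still singles them out.
print(attachments_via_id_list(conn, moved_ids))  # [100, 101]
```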
Alex Vandiver a2657b843c topic: Use a single SQL statement to propagate message moves.
Rather than use `bulk_update()` to batch-move chunks of messages, use
a single SQL query to move the messages.  This is much more efficient
for large topic moves.  Since the `edit_history` field is not yet
JSON (see #26496) this requires that PostgreSQL cast the current data
into `jsonb`, append the new data (also cast to `jsonb`), and then
re-cast that as text.

For single-message moves, this _increases_ the SQL query count by one,
since we have to re-query for the updated data from the database after
the bulk update.  However, this is overall still a performance
improvement, which improves to 2x or 3x for larger topic moves.  Below
is a table of duration in seconds to run `do_update_message` to move a
topic to a new stream, based on messages in the topic, for before and
after this change:

| Topic size |  Before  |  After  |
| ---------- | -------- | ------- |
| 1          |   0.1036 |  0.0868 |
| 2          |   0.1108 |  0.0925 |
| 5          |   0.1139 |  0.0959 |
| 10         |   0.1218 |  0.0972 |
| 20         |   0.1310 |  0.1098 |
| 50         |   0.1759 |  0.1366 |
| 100        |   0.2307 |  0.1662 |
| 200        |   0.3880 |  0.2229 |
| 500        |   0.7676 |  0.4052 |
| 1000       |   1.3990 |  0.6848 |
| 2000       |   2.9706 |  1.3370 |
| 5000       |   7.5218 |  3.2882 |
| 10000      |  14.0272 |  5.4434 |
2024-02-14 17:25:06 +00:00
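A rough model of the single-statement move, assuming a simplified `message` table with a text `edit_history` column. SQLite has no `jsonb` type, so a Python function registered with `sqlite3.Connection.create_function` stands in for PostgreSQL's cast, append, and re-cast. The function name `jsonb_append_as_text` and the edit-history entry shapes are hypothetical. This is not the SQL the commit uses.

```python
import json
import sqlite3


def jsonb_append_as_text(old_text, new_text):
    # Models (COALESCE(old, '[]')::jsonb || new::jsonb)::text for JSON arrays:
    # parse both sides, concatenate the arrays, serialize back to text.
    old = json.loads(old_text) if old_text is not None else []
    return json.dumps(old + json.loads(new_text))


def make_db():
    conn = sqlite3.connect(":memory:")
    # Expose the helper to SQL, standing in for PostgreSQL's jsonb casts.
    conn.create_function("jsonb_append_as_text", 2, jsonb_append_as_text)
    conn.executescript(
        """
        CREATE TABLE message (id INTEGER PRIMARY KEY, stream_id INTEGER,
                              topic TEXT, edit_history TEXT);
        INSERT INTO message VALUES
            (1, 10, 'greetings', NULL),
            (2, 10, 'greetings', '[{"prev_topic": "hello"}]'),
            (3, 10, 'other', NULL);
        """
    )
    return conn


def move_topic(conn, old_stream_id, topic_name, new_stream_id, edit_entry):
    # One UPDATE covers every message in the topic, rather than issuing
    # bulk_update() in chunks; returns how many rows were moved.
    cursor = conn.execute(
        "UPDATE message SET stream_id = ?, "
        "edit_history = jsonb_append_as_text(edit_history, ?) "
        "WHERE stream_id = ? AND topic = ?",
        (new_stream_id, json.dumps([edit_entry]), old_stream_id, topic_name),
    )
    return cursor.rowcount


conn = make_db()
print(move_topic(conn, 10, "greetings", 20, {"prev_stream": 10}))  # 2
```

As the commit notes, the updated rows then have to be re-read from the database, which accounts for the one extra query on single-message moves.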
Alex Vandiver 7dcc7540f9 message: Add a bulk_access_stream_messages_query method.
This applies access restrictions in SQL, so that individual messages
do not need to be walked one-by-one.  It only functions for stream
messages.

Use of this method significantly speeds up checks of whether we moved "all
visible messages" in a topic, since we no longer need to walk every
remaining message in the old topic to determine that at least one was
visible to the user.  Similarly, it significantly speeds up merging
into existing topics, since it no longer must walk every message in
the new topic to determine if the user could see at least one.

Finally, it unlocks the ability to bulk-update only messages the user
has access to, in a single query (see subsequent commit).
2024-02-13 16:39:52 +00:00
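A sketch of applying access checks in SQL instead of walking messages one by one. It uses SQLite and a hypothetical, heavily simplified rule: a stream message is accessible if its stream is public, or if the user is subscribed. The real rules in `bulk_access_stream_messages_query` are more involved. The same filter answers "can the user see at least one message?" with `EXISTS`, and restricts a bulk update to accessible messages.

```python
import sqlite3

# Hypothetical, simplified access rule: public stream, or the user is subscribed.
ACCESSIBLE_IN_TOPIC = """
    SELECT m.id FROM message m JOIN stream s ON s.id = m.stream_id
    WHERE m.stream_id = ? AND m.topic = ?
      AND (s.invite_only = 0
           OR EXISTS (SELECT 1 FROM subscription sub
                      WHERE sub.stream_id = s.id AND sub.user_id = ?))
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE stream (id INTEGER PRIMARY KEY, invite_only INTEGER);
        CREATE TABLE subscription (user_id INTEGER, stream_id INTEGER);
        CREATE TABLE message (id INTEGER PRIMARY KEY, stream_id INTEGER, topic TEXT);
        INSERT INTO stream VALUES (1, 0), (2, 1);
        INSERT INTO subscription VALUES (7, 2);
        INSERT INTO message VALUES (1, 1, 'lunch'), (2, 2, 'plans'), (3, 2, 'plans');
        """
    )
    return conn


def can_see_any(conn, user_id, stream_id, topic):
    # The database answers "is at least one message visible?" directly,
    # instead of Python loading and checking each message in turn.
    row = conn.execute(
        f"SELECT EXISTS ({ACCESSIBLE_IN_TOPIC})", (stream_id, topic, user_id)
    ).fetchone()
    return bool(row[0])


def rename_accessible(conn, user_id, stream_id, topic, new_topic):
    # The same filter lets a single UPDATE touch only messages the user can access.
    cursor = conn.execute(
        f"UPDATE message SET topic = ? WHERE id IN ({ACCESSIBLE_IN_TOPIC})",
        (new_topic, stream_id, topic, user_id),
    )
    return cursor.rowcount
```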
Alex Vandiver c118f1874e streams: Remove a lie from the docstring. 2024-02-13 15:39:45 +00:00
Alex Vandiver 13b9c87f93 tests: Reserve "Internal" client, used by email gateway and topic moves. 2024-02-13 04:06:35 +00:00
Alex Vandiver a84de411a9 tests: Clear in-memory Client caches before testing query counts.
This makes counts more apples-to-apples comparable when run
back-to-back.
2024-02-13 03:57:43 +00:00
Anders Kaseorg e79572d0d5 page_params: Remove unused first_in_realm.
It has been unused since commit e1843dd1b9
(#5819).

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-02-08 10:08:15 -08:00
Anders Kaseorg b59faf540f page_params: Remove unused prompt_for_invites.
It has been unused since commit ebe959f2b0.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-02-08 10:08:15 -08:00
Alya Abbott e9b0c7f2c0 name_restrictions: Reserve additional subdomains. 2024-02-07 12:10:00 -08:00
Mateusz Mandera 5672595c2a push_notifs: Gracefully handle exception when server can't push.
The problem was that earlier this was just an uncaught JsonableError,
leading to a full traceback getting spammed to the admins.
The prior commit introduced a clear .code for this error on the bouncer
side, meaning the self-hosted server can now detect it and handle it
nicely, by just logging an error with logging.error and also taking the
opportunity to adjust the realm.push_notifications_... flags.
2024-02-07 10:36:33 -08:00
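A sketch of the handling pattern these two commits describe. The server checks the error's `.code`, logs a single line for the expected case, and records the state on the realm. Any other error still propagates. Every name here is hypothetical, including `BouncerError`, the code string, and the `push_notifications_enabled` flag. The commit messages name neither the code nor the exact flags.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("push_notifications")

# Hypothetical error code; the real code name is not given in the commit message.
PUSH_NOT_ALLOWED_CODE = "PUSH_NOTIFICATIONS_DISALLOWED"


class BouncerError(Exception):
    """Hypothetical stand-in for an error returned by the bouncer, carrying a .code."""

    def __init__(self, msg, code):
        super().__init__(msg)
        self.code = code


@dataclass
class Realm:
    # Hypothetical flag standing in for the realm.push_notifications_... flags.
    push_notifications_enabled: bool = True


def send_push(realm, send):
    try:
        send()
        return True
    except BouncerError as e:
        if e.code != PUSH_NOT_ALLOWED_CODE:
            raise  # Unexpected errors still surface with a full traceback.
        # Expected condition: log one line and record that push is unavailable.
        logger.error("Push notifications not allowed by the bouncer: %s", e)
        realm.push_notifications_enabled = False
        return False
```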
Mateusz Mandera 3bda31c48c zilencer: Improve json error when plan doesn't allow push notifs.
This allows the self-hosted server to explicitly test for that exception
and catch and log it nicely.
2024-02-07 10:36:33 -08:00
David Rosa d29cd04387 integrations: Create incoming webhook for GitHub Sponsors.
Creates an incoming webhook integration for GitHub Sponsors. The
main use case is getting notifications when new sponsors sign up.

Fixes #18320.
2024-02-07 09:52:03 -08:00
Anders Kaseorg 029e765e20 openapi: Validate real requests and responses, not fictional mocks.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-02-05 19:57:21 -05:00
Anders Kaseorg 131b230e2b openapi: Represent OpenAPI parameters with a Parameter class.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-02-05 19:57:21 -05:00
Anders Kaseorg 0dd92d2116 test_classes: Add Content-Type header to empty DELETE/POST bodies.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-02-05 19:57:21 -05:00
Anders Kaseorg a356ec7011 test_classes: Default client_post to application/x-www-form-urlencoded.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-02-05 19:57:21 -05:00
Anders Kaseorg 53e80c41ea ruff: Fix SIM113 Use `enumerate()` for index variable in `for` loop.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-02-02 10:30:45 -08:00
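An illustrative before/after of the kind of loop that ruff's SIM113 flags. This is generic code, not the code the commit changed. A hand-maintained counter is replaced by the index that `enumerate()` already provides.

```python
names = ["alice", "bob", "carol"]


# Before: a manually maintained index variable, the pattern SIM113 flags.
def label_manual(items):
    labels = []
    i = 0
    for item in items:
        labels.append(f"{i}: {item}")
        i += 1
    return labels


# After: enumerate() yields the index alongside each item.
def label_enumerate(items):
    labels = []
    for i, item in enumerate(items):
        labels.append(f"{i}: {item}")
    return labels


print(label_enumerate(names))  # ['0: alice', '1: bob', '2: carol']
```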
Anders Kaseorg 712917b2c9 ruff: Fix RUF019 Unnecessary key check before dictionary access.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-02-02 10:30:45 -08:00
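An illustrative before/after for RUF019, again generic code rather than the commit's. The membership test only guards the lookup. `dict.get()` returns `None` for a missing key, and `None` is falsy, so the check can be dropped.

```python
settings = {"debug": True, "verbose": False}


# Before: RUF019 flags the key check that merely guards the access.
def is_on_checked(config, key):
    return bool(key in config and config[key])


# After: dict.get() returns None for a missing key, which is falsy.
def is_on(config, key):
    return bool(config.get(key))
```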
David Rosa fe0d4db153 help: Improve integrations documentation.
- Renames "Bots and integrations" to "Bots overview" everywhere
  (sidebar, page title, page URL).
- Adds a copy of /api/integrations-overview (symbolic link) as the
  second page in the Bots & integrations section, titled
  "Integrations overview".

Fixes #28758.
2024-02-01 09:45:56 -08:00
David Rosa 1e4f5c6433 integrations: Create incoming webhook for Patreon.
Creates an incoming webhook integration for Patreon. The main
use case is getting notifications when new patrons sign up.

Fixes #18321.

Co-authored-by: Hari Prashant Bhimaraju <haripb01@gmail.com>
Co-authored-by: Sudipto Mondal <sudipto.mondal1997@gmail.com>
2024-01-30 13:13:19 -08:00
Anders Kaseorg 93198a19ed requirements: Upgrade Python requirements.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-01-29 10:41:54 -08:00
Alya Abbott 10d8d4578e help: Change "All older versions" tab to "All versions".
The instructions actually work on 8.0+ as well, not just older versions.
2024-01-25 18:18:04 -08:00
Alex Vandiver d80b063b61 import: Rewrite "delivered_message" column of scheduled messages.
This also requires shuffling the message import to before the
scheduled messages.

Fixes: #28690.
2024-01-24 13:29:47 -08:00
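A sketch of why the import order matters, using hypothetical dict-shaped rows and helper names rather than Zulip's import code. `delivered_message` points at a Message row. Its exported id can only be translated once messages have been imported and the old-to-new id map exists.

```python
def import_messages(exported_messages, first_new_id):
    # Assign fresh ids and remember the mapping from exported id to new id.
    id_map = {}
    for offset, message in enumerate(exported_messages):
        id_map[message["id"]] = first_new_id + offset
    return id_map


def import_scheduled_messages(exported_scheduled, message_id_map):
    # delivered_message references a Message row, so it can only be rewritten
    # after the messages themselves have been imported and given new ids.
    imported = []
    for scheduled in exported_scheduled:
        row = dict(scheduled)
        if row["delivered_message"] is not None:
            row["delivered_message"] = message_id_map[row["delivered_message"]]
        imported.append(row)
    return imported


message_id_map = import_messages([{"id": 5}, {"id": 9}], first_new_id=1000)
print(import_scheduled_messages(
    [{"id": 1, "delivered_message": 9}, {"id": 2, "delivered_message": None}],
    message_id_map,
))
# [{'id': 1, 'delivered_message': 1001}, {'id': 2, 'delivered_message': None}]
```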
Alex Vandiver 07c4291749 message: Rewrite personals query to be more performant and accurate.
The previous query suffered from bad corner cases when the user had
received a large number of direct messages but sent very few,
comparatively.  This meant that the first half of the UNION would
retrieve a very large number of UserMessage rows, requiring fetching a
large number of Message rows, merely to throw them away upon
determining that the recipient was the current user.

Instead of merging two queries of "last 1k received" + "last 1k sent",
we instead make better use of the UserMessage rows to find "last 1k
sent or received."  This may change the list of recipients, as large
disparities in sent/received messages may result in pushing the
most-recently-sent users off of the list.  These are likely uncommon
edge cases, however -- and the disparity is the whole reason for the
performance problem.

This also provides more correct answers.  In the case where my 1001st
message sent was to person A today, but my most recent message
received from them was yesterday, the previous plan would show the
message-id of the message I received yesterday as the max, and not
that of the more recent message I sent today.

While we could theoretically raise the `RECENT_CONVERSATIONS_LIMIT` to
more frequently match the same recipient list as previously, this
increases the cost of the most common cases unreasonably.  With a
1000-message limit, the common cases are slightly faster, and the tail
latencies are very much improved; raising `RECENT_CONVERSATIONS_LIMIT`
would increase the result similarity to the old algorithm, at the cost
of the p50 and p75.

|        |   Old   |   New   |
| ------ | ------- | ------- |
| Mean   | 0.05287 | 0.02520 |
| p50    | 0.00695 | 0.00556 |
| p75    | 0.05592 | 0.03351 |
| p90    | 0.14645 | 0.08026 |
| p95    | 0.20181 | 0.10906 |
| p99    | 0.30691 | 0.16014 |
| p99.9  | 0.57894 | 0.19521 |
| max    | 22.0610 | 0.22184 |

On the whole, however, the much more bounded worst cases are worth the
small changes to the resultset.
2024-01-18 09:30:20 -08:00
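A Python model of the new query's logic, not the PostgreSQL query itself. It takes a single "last N sent or received" window over the user's direct messages, then keeps the newest message id per conversation partner. The `(message_id, other_user_id)` row shape is a hypothetical flattening of UserMessage rows. The second call shows how a small limit can push a less recent partner off the list.

```python
RECENT_CONVERSATIONS_LIMIT = 1000


def recent_conversations(rows, limit=RECENT_CONVERSATIONS_LIMIT):
    """Map each conversation partner to the newest message id exchanged with them.

    `rows` is a hypothetical flattened view of the user's UserMessage rows for
    direct messages, as (message_id, other_user_id) pairs, covering both
    messages the user sent and messages the user received.
    """
    # One combined window of the most recent messages, newest first...
    newest_first = sorted(rows, key=lambda row: row[0], reverse=True)[:limit]
    latest = {}
    for message_id, other_user_id in newest_first:
        # ...so the first id seen for each partner is that partner's maximum.
        latest.setdefault(other_user_id, message_id)
    return latest


rows = [(1, "A"), (2, "B"), (5, "A"), (3, "C")]
print(recent_conversations(rows))           # {'A': 5, 'C': 3, 'B': 2}
print(recent_conversations(rows, limit=2))  # {'A': 5, 'C': 3}
```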
Mateusz Mandera 80f5963bbc auth: Add a configurable wrapper around authenticate calls. 2024-01-15 12:18:48 -08:00
Prakhar Pratyush b7e56ccbdc lib: Rename *topic local variables to *topic_name.
This is preparatory work towards adding a Topic model.
We plan to use 'topic' as the local variable name for
the Topic model objects.

Currently, we use *topic as the local variable name for
topic names.

We rename local variables of the form *topic to *topic_name
so that we don't need to think about type collisions in
individual code paths where we might want to talk about both
Topic objects and strings for the topic name.
2024-01-15 09:40:43 -08:00
Prakhar Pratyush bc66eaee7d views: Rename *topic local variables to *topic_name.
This is preparatory work towards adding a Topic model.
We plan to use 'topic' as the local variable name for
the Topic model objects.

Currently, we use *topic as the local variable name for
topic names.

We rename local variables of the form *topic to *topic_name
so that we don't need to think about type collisions in
individual code paths where we might want to talk about both
Topic objects and strings for the topic name.
2024-01-15 09:40:43 -08:00
Prakhar Pratyush 1eef052bd1 actions: Rename *topic local variables to *topic_name.
This is preparatory work towards adding a Topic model.
We plan to use 'topic' as the local variable name for
the Topic model objects.

Currently, we use *topic as the local variable name for
topic names.

We rename local variables of the form *topic to *topic_name
so that we don't need to think about type collisions in
individual code paths where we might want to talk about both
Topic objects and strings for the topic name.
2024-01-15 09:40:43 -08:00
Sahil Batra c0c9623ae4 message: Allow system bots to mention group if everyone else can.
We now allow system bots to mention a group if the can_mention_group
setting is set to the "role:everyone" group, but not when it is set
to some other group.
2024-01-10 14:57:21 -08:00
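A sketch of the rule as the commit message states it. System bots may mention a group only when its `can_mention_group` setting is the "role:everyone" group. Other senders fall through to the usual group-membership check. The function and parameter names are hypothetical, and the real permission check has more inputs.

```python
EVERYONE = "role:everyone"


def can_mention_group(sender_is_system_bot, can_mention_group_setting, sender_in_allowed_group):
    # Hypothetical, simplified rule from this commit: a system bot may mention
    # the group only if everyone is allowed to.
    if sender_is_system_bot:
        return can_mention_group_setting == EVERYONE
    # Ordinary senders: membership in the configured group decides.
    return sender_in_allowed_group
```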
Evgenii 3f06596cf0 dev_ldap_directory: Use f-strings for better readability. 2024-01-09 12:09:09 -08:00
Mateusz Mandera 3ec3ac63f2 zilencer: Have server send realm_uuid to remaining bouncer endpoints.
Requests to these endpoints are about a specified user, and therefore
also have a notion of the RemoteRealm for these requests. Until now
these endpoints weren't getting the realm_uuid value, because it wasn't
used; now it is needed for updating .last_request_datetime on the
RemoteRealm.
2024-01-05 13:09:09 -08:00
Alex Vandiver 4ab9cd7cf2 markdown: Prevent OverflowError with large time integers.
`<time:1234567890123>` causes a "signed integer is greater than
maximum" exception from dateutil.parser; datetime also cannot handle
it ("year 41091 is out of range") but that is a ValueError which is
already caught.

Catch the OverflowError thrown by dateutil.
2024-01-05 12:01:06 -08:00
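A sketch of the defensive pattern, using the standard library's `datetime.fromtimestamp` as a stand-in for dateutil. The commit itself catches dateutil's OverflowError. `fromtimestamp` can raise ValueError (for example "year 41091 is out of range"), OverflowError (the value does not fit the platform's `time_t`), or OSError, so the sketch treats all three as "not a renderable time". The helper name is hypothetical.

```python
from datetime import datetime, timezone


def parse_unix_time(text):
    """Return an aware UTC datetime for a <time:...> value, or None if unrepresentable."""
    try:
        return datetime.fromtimestamp(int(text), tz=timezone.utc)
    except (ValueError, OverflowError, OSError):
        # ValueError: non-numeric text, or a year outside datetime's range;
        # OverflowError/OSError: the platform's time conversion rejects the value.
        return None


print(parse_unix_time("1234567890"))  # 2009-02-13 23:31:30+00:00
print(parse_unix_time("1234567890123"))  # None
```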
Alex Vandiver 75d6f35069 s3: Add a setting for S3 addressing style.
This controls whether boto3 attempts to use
`https://bucketname.endpointname/` or `https://endpointname/bucket/`
as its prefix.  See
https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html

Fixes: #28424.
2024-01-05 11:12:18 -08:00
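In botocore the setting is passed as `botocore.config.Config(s3={"addressing_style": ...})`, with "auto", "virtual", or "path" as values. The hypothetical helper below does not call boto3. It only illustrates the two URL shapes the commit message describes, so the difference between the styles is concrete.

```python
def s3_url_prefix(endpoint_host, bucket, addressing_style):
    """Hypothetical helper showing the URL prefix for each S3 addressing style."""
    if addressing_style == "virtual":
        # Virtual-hosted style: the bucket name becomes part of the hostname.
        return f"https://{bucket}.{endpoint_host}/"
    if addressing_style == "path":
        # Path style: the bucket name becomes the first path component.
        return f"https://{endpoint_host}/{bucket}/"
    raise ValueError(f"unknown addressing style: {addressing_style!r}")


print(s3_url_prefix("s3.example.com", "uploads", "virtual"))  # https://uploads.s3.example.com/
print(s3_url_prefix("s3.example.com", "uploads", "path"))  # https://s3.example.com/uploads/
```

Path style is often the one that works with self-hosted S3-compatible endpoints, where per-bucket hostnames may not resolve.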
Alex Vandiver 3aea67a8ed s3: Only use get_bucket to get to boto3 clients and resources.
boto3 has two different modalities of making API calls -- through
resources, and through clients.  Resources are a higher-level
abstraction, and thus more generally useful, but some APIs are only
accessible through clients.  It is possible to get to a client object
from a resource, but not vice versa.

Use `get_bucket(...).meta.client` when we need direct access to the
client object for more complex API calls; this lets all of the
configuration for how to access S3 sit within `get_bucket`.  Client
objects are not bound to only one bucket, but we get to them based on
the bucket we will be interacting with, for clarity.

We removed the cached session object, as it serves no real purpose.
2024-01-05 11:12:18 -08:00
Alex Vandiver 214bd4ed88 s3: Stop caching get_boto_client, which is only ever called once.
e883ab057f started caching the boto client, which we had identified
as a slow call.  e883ab057f went further, calling
`get_boto_client().generate_presigned_url()` once and caching that
result.

This makes the inner cache on the client useless.  Remove it.
2024-01-05 11:12:18 -08:00
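A sketch of why the inner cache became redundant, using `functools.lru_cache` and a hypothetical `FakeClient` in place of a boto3 client. Its `presign` method is a stand-in for `generate_presigned_url()`. Once the outer result is cached, the client constructor runs only once, whether or not `get_boto_client` is itself cached.

```python
from functools import lru_cache

client_constructions = 0


class FakeClient:
    """Hypothetical stand-in for a boto3 client that is slow to construct."""

    def __init__(self):
        global client_constructions
        client_constructions += 1

    def presign(self, key):
        # Hypothetical method standing in for generate_presigned_url().
        return f"https://uploads.example.invalid/{key}?signature=placeholder"


def get_boto_client():
    # No cache here: the only caller below is itself cached.
    return FakeClient()


@lru_cache(maxsize=None)
def get_public_url_base():
    # Computed once; later calls return the cached string without a client.
    return get_boto_client().presign("base")


for _ in range(3):
    get_public_url_base()
print(client_constructions)  # 1
```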
Alex Vandiver bd38e6cb69 send_email: Distinct emails means distinct, case-insensitively. 2024-01-04 10:46:53 -08:00
Alex Vandiver 8d9ead0f6d send_custom_email: Order by delivery_email if necessary.
If we `.distinct("delivery_email")` then we must also
`.order_by("delivery_email")`; adc987dc43 added the `.order_by`
call, which broke the newsletter codepath, since it did not contain
the `delivery_email` in the ordering fields.

Add a flag to distinct on emails in `send_custom_email`.
2024-01-04 10:46:53 -08:00
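A pure-Python model of the constraint behind these two commits. It is not Zulip's query. PostgreSQL's `DISTINCT ON` keeps the first row of each group in `ORDER BY` order, so the `ORDER BY` must start with the `DISTINCT ON` expressions. That is why Django's `.distinct("delivery_email")` needs `.order_by("delivery_email")`. The sketch also compares emails case-insensitively, per the first commit. `itertools.groupby` has the analogous requirement: it only merges adjacent rows, so the input must be sorted by the same key.

```python
from itertools import groupby


def distinct_on_email(rows):
    """Keep one row per delivery email, comparing emails case-insensitively.

    Models SQL of the shape
        SELECT DISTINCT ON (lower(delivery_email)) ...
        ORDER BY lower(delivery_email), id
    """

    def email_key(row):
        return row["delivery_email"].lower()

    # Sort by the distinct key first (mirroring the required ORDER BY), then by id.
    ordered = sorted(rows, key=lambda row: (email_key(row), row["id"]))
    # groupby only merges adjacent rows, so the sort above is what makes this correct.
    return [next(group) for _, group in groupby(ordered, key=email_key)]


users = [
    {"id": 3, "delivery_email": "Alice@example.com"},
    {"id": 1, "delivery_email": "alice@example.com"},
    {"id": 2, "delivery_email": "bob@example.com"},
]
print([row["id"] for row in distinct_on_email(users)])  # [1, 2]
```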
Anders Kaseorg c343d7c30e models: Move query_for_ids to zerver.lib.query_helpers.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-12-16 22:08:44 -08:00
Anders Kaseorg 33d140c8dc models: Extract zerver.models.alert_words.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-12-16 22:08:44 -08:00
Anders Kaseorg 1f1b2f9a68 models: Extract zerver.models.bots.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-12-16 22:08:44 -08:00
Anders Kaseorg 27c0b507af models: Extract zerver.models.custom_profile_fields.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-12-16 22:08:44 -08:00
Anders Kaseorg c9c819e1d7 models: Extract zerver.models.scheduled_jobs.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-12-16 22:08:44 -08:00
Anders Kaseorg cff0b78771 models: Move some functions to zerver.lib.attachments.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-12-16 22:08:44 -08:00
Anders Kaseorg b15999c799 models: Extract zerver.models.messages.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-12-16 22:08:44 -08:00
Anders Kaseorg bac027962f models: Extract zerver.models.clients.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-12-16 22:08:44 -08:00
Anders Kaseorg 4aa2d76bea models: Extract zerver.models.streams.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-12-16 22:08:44 -08:00