Commit Graph

78 Commits

Author SHA1 Message Date
Alex Vandiver 57ff573535 topic: Move sqlalchemy methods into their own file.
Loading sqlalchemy can take a significant amount of time, so splitting
these into these own file can be a significant startup-time savings.
2024-04-16 09:48:11 -07:00
Alex Vandiver b747ea285f topic: Fix history order for topic moves.
5c96f94206 mistakenly appended, rather than prepended, the edit to
the history.  This caused AssertionErrors when attempting to view the
history of moved messages, which check that the `last_edit_time`
matches the timestamp of the first edit in the list.

Fix the ordering, and update the `edit_history` for messages that were
affected.  We limit to only messages edited since the commit was
merged, since that helps bound the affected messages somewhat.
2024-02-20 21:30:32 -08:00
Alex Vandiver 22837fc1b4 message_edit: Carry the QuerySet through as much as possible.
Rather than pass around a list of message objects in-memory, we
instead keep the same constructed QuerySet which includes the later
propagated messages (if any), and use that same query to pick out
affected Attachment objects, rather than limiting to the set of ids.
This is not necessarily a win -- the list of message-ids *may* be very
long, and thus the query may be more concise, easier to send to
PostgreSQL, and faster for PostgreSQL to parse.  However, the list of
ids is almost certainly better-indexed.

After processing the move, the QuerySet must be re-defined as a search
of ids (and possibly a very long list of such), since there is no
other way which is guaranteed to correctly single out the moved
messages.  At this point, it is mostly equivalent to the list of
Message objects, and certainly takes no less memory.
2024-02-14 12:27:03 -08:00
Alex Vandiver 5c96f94206 topic: Use a single SQL statement to propagate message moves.
Rather than use `bulk_update()` to batch-move chunks of messages, use
a single SQL query to move the messages.  This is much more efficient
for large topic moves.  Since the `edit_history` field is not yet
JSON (see #26496) this requires that PostgreSQL cast the current data
into `jsonb`, append the new data (also cast to `jsonb`), and then
re-cast that as text.

For single-message moves, this _increases_ the SQL query count by one,
since we have to re-query for the updated data from the database after
the bulk update.  However, this is overall still a performance
improvement, which improves to 2x or 3x for larger topic moves.  Below
is a table of duration in seconds to run `do_update_message` to move a
topic to a new stream, based on messages in the topic, for before and
after this change:

| Topic size |  Before  |  After  |
| ---------- | -------- | ------- |
| 1          |   0.1036 |  0.0868 |
| 2          |   0.1108 |  0.0925 |
| 5          |   0.1139 |  0.0959 |
| 10         |   0.1218 |  0.0972 |
| 20         |   0.1310 |  0.1098 |
| 50         |   0.1759 |  0.1366 |
| 100        |   0.2307 |  0.1662 |
| 200        |   0.3880 |  0.2229 |
| 500        |   0.7676 |  0.4052 |
| 1000       |   1.3990 |  0.6848 |
| 2000       |   2.9706 |  1.3370 |
| 5000       |   7.5218 |  3.2882 |
| 10000      |  14.0272 |  5.4434 |
2024-02-14 12:27:03 -08:00
Alex Vandiver 822131fef4 message: Add a bulk_access_stream_messages_query method.
This applies access restrictions in SQL, so that individual messages
do not need to be walked one-by-one.  It only functions for stream
messages.

Use of this method significantly speeds up checks if we moved "all
visible messages" in a topic, since we no longer need to walk every
remaining message in the old topic to determine that at least one was
visible to the user.  Similarly, it significantly speeds up merging
into existing topics, since it no longer must walk every message in
the new topic to determine if the user could see at least one.

Finally, it unlocks the ability to bulk-update only messages the user
has access to, in a single query (see subsequent commit).
2024-02-14 12:27:03 -08:00
Alex Vandiver d55240e543 topic: Add comments calling out case-sensitive index usage. 2023-09-27 10:22:42 -07:00
Alex Vandiver 436e9c8a0c topic: Add realm limits to topic history queries. 2023-09-27 10:22:42 -07:00
Alex Vandiver b94402152d models: Always search Messages with a realm_id or id limit.
Unless there is a limit on `id`, always provide a `realm_id` limit as
well.  We also notate which index is expected to be used in each
query.
2023-09-11 15:00:37 -07:00
Alex Vandiver 9d3d57e786 message_send: Inline single use of filter_by_exact_message_topic.
Matching the topic exactly, as opposed to case-insensitively, is not a
common operation, and one that we want to make difficult to do
accidentally.  Inline the single use case of it.
2023-09-11 15:00:37 -07:00
Alex Vandiver 570ff08fde topic: Set a max batch_size on bulk_upate call.
The number of affected objects may be quite high, and they are
selected by `id IN (...)` query, and updated with a giant `CASE`.
This turns out to be quadratic, and can cause large queries to take
hours, in a state where they cannot be terminated, when PostgreSQL >11
tries to JIT the query.

Set a batch_size as a stopgap performance fix before moving to
`.update()` as a real fix.
2023-08-14 13:33:20 -07:00
Sahil Batra 7ff5423e21 topic: Pass args to select_related call for Message objects.
This commit adds code to pass all the required arguments to
select_related call for Message objects such that only the
required related fields are fetched from the database.

Previously, we did not pass any arguments to select_related,
so all the directly and indirectly related fields were fetched
when many of them were actually not being used and made the
query unnecessarily complex.
2023-08-10 17:35:43 -07:00
Prakhar Pratyush c0c30bc5f7 topic_mentions: Fetch users to be notified of @topic mentions.
This commit adds the 'topic_wildcard_mention_user_ids' and
'topic_wildcard_mention_in_followed_topic_user_ids'
attributes to the 'RecipientInfoResult' dataclass.

Only topic participants are notified of @topic mentions.

Topic participants are anyone who sent a message to a topic
or reacted to a message on the topic.

'topic_wildcard_mention_in_followed_topic_user_ids' stores the
ids of the topic participants who follow the topic and have
enabled the wildcard mention notifications for followed topics.

'topic_wildcard_mention_user_ids' stores the ids of the topic
participants for whom 'user_allows_notifications_in_StreamTopic'
with setting 'wildcard_mentions_notify' returns True.
2023-07-13 11:34:48 -07:00
Anders Kaseorg d3efd4c095 python: Import F, Q, QuerySet from their canonical module.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-03-05 14:46:28 -08:00
Anders Kaseorg df001db1a9 black: Reformat with Black 23.
Black 23 enforces some slightly more specific rules about empty line
counts and redundant parenthesis removal, but the result is still
compatible with Black 22.

(This does not actually upgrade our Python environment to Black 23
yet.)

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-02-02 10:40:13 -08:00
Josh Klar c15d066bf5 email-notifs: Use bracketed prefix to indicate a resolved topic.
Some email clients (notably, Gmail Web) support automatically threading
emails together if recipients and subjects match[1]. Manual testing
indicated that prefixing a subject with "[bracketed content]" does not
break this threading behavior, but the added checkmark in a resolved
topic's title does. Before sending an email notification, determine
whether the topic is resolved, and pass this information to the Jinja
template to properly format a threadable email subject.

Fixes: #22538

[1]: https://support.google.com/mail/answer/5900
2022-12-15 23:56:48 -08:00
Zixuan James Li e382cec015 topic: Add a None check with an assertion.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
2022-08-12 17:08:04 -07:00
Zixuan James Li 07cc859120 topic: Tighten function signatures with generic QuerySet.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
2022-07-07 11:28:13 -07:00
Anders Kaseorg 3bf8ee2156 python: Unquote some unnecessarily quoted type annotations.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-06-26 17:37:41 -07:00
Anders Kaseorg 5f92078d07 request: Add a var_name parameter to converter.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-03-15 13:02:02 -07:00
Lauryn Menard 58f21fc748 edit_history: Remove `LEGACY_PREV_TOPIC` constant from code base.
Removes `LEGACY_PREV_TOPIC` which is no longer needed due to the
message edit history migration.

Also remove additions to the linter exclude list that were added
earlier in this commit series.
2022-03-04 10:25:48 -08:00
Tim Abbott f1e5ed91a1 types: Add EditHistoryEvent and APIEditHistoryEvent types.
These types will help make iteration on this code easier.

Note that `user_id` can be null due to the fact that
edit history entries before March 2017 did not log
the user that made the edit, which was years after
supporting topic edits (discovered in test deployment
of migration on chat.zulip.org).

Co-authored-by: Lauryn Menard <lauryn.menard@gmail.com>
2022-03-04 10:25:48 -08:00
Anders Kaseorg 817146c28b python: Upgrade SQLAlchemy from 1.3.24 to 1.4.23.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-08-31 06:47:39 -07:00
Anders Kaseorg 4206e5f00b python: Remove locally dead code.
These changes are all independent of each other; I just didn’t feel
like making dozens of commits for them.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-08-19 01:51:37 -07:00
akshatdalton f5c4d51ed2 resolve topic: Add `is:resolved` search keyword/filtering support.
This commit adds the backend support for `is:resolved` search keyword.
In the next commit, I will add the frontend support for the same.
2021-07-13 23:18:41 -07:00
akshatdalton 7ec406f39d refactor: Extract `RESOLVED_TOPIC_PREFIX` in topic.py.
This is a prep commit for #18990.
2021-07-13 23:18:41 -07:00
Tim Abbott 41d499d44c message_edit: Require access to messages to move between streams.
Currently, moving messages between streams is an action limited to
organization administrators. A big part of the motivation for that
restriction was to prevent users from moving messages from a private
stream without shared history as a way to access messages they should
not have access to.

Organization administrators can already just make the stream have
shared history if they want to access its messages, but allowing
non-administrators to move messages between would have
introduced a security bug without this change.
2021-05-12 16:23:22 -07:00
Tim Abbott 7ef0d21fc2 message_edit: Pass old_stream to update_messages_for_topic_edit.
We'll need this for checking access to moved messages.
2021-05-12 16:23:22 -07:00
Tim Abbott f78e604868 message_edit: Pass acting_user to update_messages_for_topic_edit.
We'll need for checking access if non-administrators can move topics.
2021-05-12 16:23:22 -07:00
Aman Agrawal de50f4ae25 message_edit: Extract update_edit_history. 2021-04-23 15:12:09 -07:00
Aman Agrawal 736fdcda49 update_messages_for_topic_edit: Remame `message` variable. 2021-04-23 15:12:09 -07:00
Anders Kaseorg 6e4c3e41dc python: Normalize quotes with Black.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-02-12 13:11:19 -08:00
Anders Kaseorg 11741543da python: Reformat with Black, except quotes.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-02-12 13:11:19 -08:00
Mateusz Mandera 3623681d30 message_edit: Don't rely on .recipient_id change not affecting recipient.
The codepath for moving a topic changes the message.recipient_id to the
id of the new recipient, but later, in update_messages_for_topic_edit,
it uses message.recipient when querying for messages with the matching
topic in the *old* stream (because those are the other messages that
need to be moved). This is a bug which happens to work fine, because in
Django 2, if message.recipient gets fetched first and then
message.recipient_id is mutated, message.recipient will not be altered
and thus will retain the outdated, previously fetched value.

In Django 3 changing .recipient_id causes .recipient to be updated to
the new Recipient objects, which is the Recipient of the *new* stream.
That will cause the bug to manifest.

This is a bugfix preparing for the upgrade to Django 3.
2021-01-17 10:39:46 -08:00
Aman Agrawal e566e985e4 topic_edit: Store edit history in all the message affected.
Instead of just storing the edit history in the message which
triggered the topic edit, we store the edit history in all
the messages that changed. This helps users track the edit history
of a message more reliably.
2021-01-04 18:18:05 -08:00
Anders Kaseorg 13e35bfa94 mypy: Use sqlalchemy-stubs.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-11-16 18:17:41 -08:00
Anders Kaseorg 72d6ff3c3b docs: Fix more capitalization issues.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-23 11:46:55 -07:00
Steve Howell bfd6e2b1fd refactor: Use recipient_id to get topic history. 2020-10-16 12:58:11 -07:00
Steve Howell 65dbee4837 minor: Ask for recipient_id, not recipient. 2020-10-16 12:58:11 -07:00
Tim Abbott 3b2a262b6f topic: Reorder topic history functions. 2020-08-25 17:03:48 -07:00
Tim Abbott 88a28d5470 topic: Refactor get_topic_history_for_stream.
This now uses get_topic_history_for_public_stream as a subroutine, to
avoid duplicating that large section of SQL.
2020-08-25 17:03:13 -07:00
Aman c3a8492697 topic: Rename get_topic_history_for_web_public_stream. 2020-08-25 17:01:12 -07:00
Anders Kaseorg 9b7c6828ec models: Annotate most field types.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-07-02 13:28:10 -07:00
Aman Agrawal 2e5f860d41 message_edit: Do case-insensitive exact match when editing topics.
When doing query for same topic names in a stream, we should do a
case-insensitive exact match for the topic, since that's the data
model for topics in Zulip.
2020-06-13 16:36:29 -07:00
Anders Kaseorg 365fe0b3d5 python: Sort imports with isort.
Fixes #2665.

Regenerated by tabbott with `lint --fix` after a rebase and change in
parameters.

Note from tabbott: In a few cases, this converts technical debt in the
form of unsorted imports into different technical debt in the form of
our largest files having very long, ugly import sequences at the
start.  I expect this change will increase pressure for us to split
those files, which isn't a bad thing.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-11 16:45:32 -07:00
Anders Kaseorg 69730a78cc python: Use trailing commas consistently.
Automatically generated by the following script, based on the output
of lint with flake8-comma:

import re
import sys

last_filename = None
last_row = None
lines = []

for msg in sys.stdin:
    m = re.match(
        r"\x1b\[35mflake8    \|\x1b\[0m \x1b\[1;31m(.+):(\d+):(\d+): (\w+)", msg
    )
    if m:
        filename, row_str, col_str, err = m.groups()
        row, col = int(row_str), int(col_str)

        if filename == last_filename:
            assert last_row != row
        else:
            if last_filename is not None:
                with open(last_filename, "w") as f:
                    f.writelines(lines)

            with open(filename) as f:
                lines = f.readlines()
            last_filename = filename
        last_row = row

        line = lines[row - 1]
        if err in ["C812", "C815"]:
            lines[row - 1] = line[: col - 1] + "," + line[col - 1 :]
        elif err in ["C819"]:
            assert line[col - 2] == ","
            lines[row - 1] = line[: col - 2] + line[col - 1 :].lstrip(" ")

if last_filename is not None:
    with open(last_filename, "w") as f:
        f.writelines(lines)

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-06-11 16:04:12 -07:00
Hashir Sarwar 6364d27ed5 topic: Remove 7 days restriction for editing & moving topics.
Previously, we had a restriction that we could only
edit and move the topics of 7 days old messages.
This buggy behaviour is now removed as in this
commit.

Fixes #14492.
Part of #13912.
2020-05-08 12:57:50 -07:00
Anders Kaseorg fead14951c python: Convert assignment type annotations to Python 3.6 style.
This commit was split by tabbott; this piece covers the vast majority
of files in Zulip, but excludes scripts/, tools/, and puppet/ to help
ensure we at least show the right error messages for Xenial systems.

We can likely further refine the remaining pieces with some testing.

Generated by com2ann, with whitespace fixes and various manual fixes
for runtime issues:

-    invoiced_through: Optional[LicenseLedger] = models.ForeignKey(
+    invoiced_through: Optional["LicenseLedger"] = models.ForeignKey(

-_apns_client: Optional[APNsClient] = None
+_apns_client: Optional["APNsClient"] = None

-    notifications_stream: Optional[Stream] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE)
-    signup_notifications_stream: Optional[Stream] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE)
+    notifications_stream: Optional["Stream"] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE)
+    signup_notifications_stream: Optional["Stream"] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE)

-    author: Optional[UserProfile] = models.ForeignKey('UserProfile', blank=True, null=True, on_delete=CASCADE)
+    author: Optional["UserProfile"] = models.ForeignKey('UserProfile', blank=True, null=True, on_delete=CASCADE)

-    bot_owner: Optional[UserProfile] = models.ForeignKey('self', null=True, on_delete=models.SET_NULL)
+    bot_owner: Optional["UserProfile"] = models.ForeignKey('self', null=True, on_delete=models.SET_NULL)

-    default_sending_stream: Optional[Stream] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE)
-    default_events_register_stream: Optional[Stream] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE)
+    default_sending_stream: Optional["Stream"] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE)
+    default_events_register_stream: Optional["Stream"] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE)

-descriptors_by_handler_id: Dict[int, ClientDescriptor] = {}
+descriptors_by_handler_id: Dict[int, "ClientDescriptor"] = {}

-worker_classes: Dict[str, Type[QueueProcessingWorker]] = {}
-queues: Dict[str, Dict[str, Type[QueueProcessingWorker]]] = {}
+worker_classes: Dict[str, Type["QueueProcessingWorker"]] = {}
+queues: Dict[str, Dict[str, Type["QueueProcessingWorker"]]] = {}

-AUTH_LDAP_REVERSE_EMAIL_SEARCH: Optional[LDAPSearch] = None
+AUTH_LDAP_REVERSE_EMAIL_SEARCH: Optional["LDAPSearch"] = None

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-04-22 11:02:32 -07:00
Tim Abbott 843345dfee message_edit: Add backend for moving a topic to another stream.
This commit reuses the existing infrastructure for moving a topic
within a stream to add support for moving topics from one stream to
another.

Split from the original full-feature commit so that we can merge just
the backend, which is finished, at this time.

This is a large part of #6427.

The feature is incomplete, in that we don't have real-time update of
the frontend to handle the event, documentation, etc., but this commit
is a good mergable checkpoint that we can do further work on top of.
We also still ideally would have a test_events test for the backend,
but I'm willing to leave that for follow-up work.

This appears to have switched to tabbott as the author during commit
squashing sometime ago, but this commit is certainly:

Co-Authored-By: Wbert Adrián Castro Vera <wbertc@gmail.com>
2020-04-07 14:19:19 -07:00
Tim Abbott 49ca7cf717 topic: Add recipient_id to fields for message edit saves.
This is preparation for supporting moving messages between streams in
some cases.

It doesn't actually have any functional effect, since flush_message
clears the message unconditionally anyway.
2020-02-26 16:12:07 -08:00
Mateusz Mandera 2475adbf8a messages_for_topic: Use stream.recipient_id for more efficient query. 2020-02-11 17:39:43 -08:00