5c96f94206 mistakenly appended, rather than prepended, the edit to
the history. This caused AssertionErrors when attempting to view the
history of moved messages, which check that the `last_edit_time`
matches the timestamp of the first edit in the list.
Fix the ordering, and update the `edit_history` for messages that were
affected. We limit to only messages edited since the commit was
merged, since that helps bound the affected messages somewhat.
Rather than pass around a list of message objects in-memory, we
instead keep the same constructed QuerySet which includes the later
propagated messages (if any), and use that same query to pick out
affected Attachment objects, rather than limiting to the set of ids.
This is not necessarily a win -- the list of message-ids *may* be very
long, and thus the query may be more concise, easier to send to
PostgreSQL, and faster for PostgreSQL to parse. However, the list of
ids is almost certainly better-indexed.
After processing the move, the QuerySet must be re-defined as a search
of ids (and possibly a very long list of such), since there is no
other way which is guaranteed to correctly single out the moved
messages. At this point, it is mostly equivalent to the list of
Message objects, and certainly takes no less memory.
Rather than use `bulk_update()` to batch-move chunks of messages, use
a single SQL query to move the messages. This is much more efficient
for large topic moves. Since the `edit_history` field is not yet
JSON (see #26496) this requires that PostgreSQL cast the current data
into `jsonb`, append the new data (also cast to `jsonb`), and then
re-cast that as text.
For single-message moves, this _increases_ the SQL query count by one,
since we have to re-query for the updated data from the database after
the bulk update. However, this is overall still a performance
improvement, which improves to 2x or 3x for larger topic moves. Below
is a table of duration in seconds to run `do_update_message` to move a
topic to a new stream, based on messages in the topic, for before and
after this change:
| Topic size | Before | After |
| ---------- | -------- | ------- |
| 1 | 0.1036 | 0.0868 |
| 2 | 0.1108 | 0.0925 |
| 5 | 0.1139 | 0.0959 |
| 10 | 0.1218 | 0.0972 |
| 20 | 0.1310 | 0.1098 |
| 50 | 0.1759 | 0.1366 |
| 100 | 0.2307 | 0.1662 |
| 200 | 0.3880 | 0.2229 |
| 500 | 0.7676 | 0.4052 |
| 1000 | 1.3990 | 0.6848 |
| 2000 | 2.9706 | 1.3370 |
| 5000 | 7.5218 | 3.2882 |
| 10000 | 14.0272 | 5.4434 |
This applies access restrictions in SQL, so that individual messages
do not need to be walked one-by-one. It only functions for stream
messages.
Use of this method significantly speeds up checks if we moved "all
visible messages" in a topic, since we no longer need to walk every
remaining message in the old topic to determine that at least one was
visible to the user. Similarly, it significantly speeds up merging
into existing topics, since it no longer must walk every message in
the new topic to determine if the user could see at least one.
Finally, it unlocks the ability to bulk-update only messages the user
has access to, in a single query (see subsequent commit).
Matching the topic exactly, as opposed to case-insensitively, is not a
common operation, and one that we want to make difficult to do
accidentally. Inline the single use case of it.
The number of affected objects may be quite high, and they are
selected by `id IN (...)` query, and updated with a giant `CASE`.
This turns out to be quadratic, and can cause large queries to take
hours, in a state where they cannot be terminated, when PostgreSQL >11
tries to JIT the query.
Set a batch_size as a stopgap performance fix before moving to
`.update()` as a real fix.
This commit adds code to pass all the required arguments to
select_related call for Message objects such that only the
required related fields are fetched from the database.
Previously, we did not pass any arguments to select_related,
so all the directly and indirectly related fields were fetched
when many of them were actually not being used and made the
query unnecessarily complex.
This commit adds the 'topic_wildcard_mention_user_ids' and
'topic_wildcard_mention_in_followed_topic_user_ids'
attributes to the 'RecipientInfoResult' dataclass.
Only topic participants are notified of @topic mentions.
Topic participants are anyone who sent a message to a topic
or reacted to a message on the topic.
'topic_wildcard_mention_in_followed_topic_user_ids' stores the
ids of the topic participants who follow the topic and have
enabled the wildcard mention notifications for followed topics.
'topic_wildcard_mention_user_ids' stores the ids of the topic
participants for whom 'user_allows_notifications_in_StreamTopic'
with setting 'wildcard_mentions_notify' returns True.
Black 23 enforces some slightly more specific rules about empty line
counts and redundant parenthesis removal, but the result is still
compatible with Black 22.
(This does not actually upgrade our Python environment to Black 23
yet.)
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Some email clients (notably, Gmail Web) support automatically threading
emails together if recipients and subjects match[1]. Manual testing
indicated that prefixing a subject with "[bracketed content]" does not
break this threading behavior, but the added checkmark in a resolved
topic's title does. Before sending an email notification, determine
whether the topic is resolved, and pass this information to the Jinja
template to properly format a threadable email subject.
Fixes: #22538
[1]: https://support.google.com/mail/answer/5900
Removes `LEGACY_PREV_TOPIC` which is no longer needed due to the
message edit history migration.
Also remove additions to the linter exclude list that were added
earlier in this commit series.
These types will help make iteration on this code easier.
Note that `user_id` can be null due to the fact that
edit history entries before March 2017 did not log
the user that made the edit, which was years after
supporting topic edits (discovered in test deployment
of migration on chat.zulip.org).
Co-authored-by: Lauryn Menard <lauryn.menard@gmail.com>
These changes are all independent of each other; I just didn’t feel
like making dozens of commits for them.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Currently, moving messages between streams is an action limited to
organization administrators. A big part of the motivation for that
restriction was to prevent users from moving messages from a private
stream without shared history as a way to access messages they should
not have access to.
Organization administrators can already just make the stream have
shared history if they want to access its messages, but allowing
non-administrators to move messages between would have
introduced a security bug without this change.
The codepath for moving a topic changes the message.recipient_id to the
id of the new recipient, but later, in update_messages_for_topic_edit,
it uses message.recipient when querying for messages with the matching
topic in the *old* stream (because those are the other messages that
need to be moved). This is a bug which happens to work fine, because in
Django 2, if message.recipient gets fetched first and then
message.recipient_id is mutated, message.recipient will not be altered
and thus will retain the outdated, previously fetched value.
In Django 3 changing .recipient_id causes .recipient to be updated to
the new Recipient objects, which is the Recipient of the *new* stream.
That will cause the bug to manifest.
This is a bugfix preparing for the upgrade to Django 3.
Instead of just storing the edit history in the message which
triggered the topic edit, we store the edit history in all
the messages that changed. This helps users track the edit history
of a message more reliably.
When doing query for same topic names in a stream, we should do a
case-insensitive exact match for the topic, since that's the data
model for topics in Zulip.
Fixes#2665.
Regenerated by tabbott with `lint --fix` after a rebase and change in
parameters.
Note from tabbott: In a few cases, this converts technical debt in the
form of unsorted imports into different technical debt in the form of
our largest files having very long, ugly import sequences at the
start. I expect this change will increase pressure for us to split
those files, which isn't a bad thing.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Automatically generated by the following script, based on the output
of lint with flake8-comma:
import re
import sys
last_filename = None
last_row = None
lines = []
for msg in sys.stdin:
m = re.match(
r"\x1b\[35mflake8 \|\x1b\[0m \x1b\[1;31m(.+):(\d+):(\d+): (\w+)", msg
)
if m:
filename, row_str, col_str, err = m.groups()
row, col = int(row_str), int(col_str)
if filename == last_filename:
assert last_row != row
else:
if last_filename is not None:
with open(last_filename, "w") as f:
f.writelines(lines)
with open(filename) as f:
lines = f.readlines()
last_filename = filename
last_row = row
line = lines[row - 1]
if err in ["C812", "C815"]:
lines[row - 1] = line[: col - 1] + "," + line[col - 1 :]
elif err in ["C819"]:
assert line[col - 2] == ","
lines[row - 1] = line[: col - 2] + line[col - 1 :].lstrip(" ")
if last_filename is not None:
with open(last_filename, "w") as f:
f.writelines(lines)
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>