If do_delete_messages (and friends) are called for a massive number of
messages, the giant list of message ids is passed to Postgres even
though chunk_size makes all but the first chunk_size of message ids
useless.
This commit performs a sweep on the first batch of non API
files to rename "huddle" to "direct_message_group`.
It also renames variables and methods of type -
"huddle_message" to "group_direct_message".
This is a part of #28640
Replaced HUDDLE attribute with DIRECT_MESSAGE_GROUP using VS Code search,
part of a general renaming of the object class.
Fixes part of #28640.
Co-authored-by: JohnLu2004 <JohnLu10212004@gmail.com>
Fixes#25414.
We add Attachment.scheduled_messages relation to track ScheduledMessages
which reference the attachment.
The import bits can be done after merging this, by updating #25345.
This commit updates the pattern for dealing with tuples
returned by the delete() query.
The '(num_deleted, ignored) = ModelName.objects.filter().delete()'
pattern is preferred due to better readability.
We avoid the pattern '(num_deleted, _)' because Django uses _
for translation, which may lead to future bugs.
We no longer need to do the inner joins to figure out the message's
realm and split up the cross-realm and regular case - now we just look
at zerver_message.realm directly.
To explain the rationale of this change, for example, there is
`get_user_activity_summary` which accepts either a `Collection[UserActivity]`,
where `QuerySet[T]` is not strictly `Sequence[T]` because its slicing behavior
is different from the `Protocol`, making `Collection` necessary.
Similarily, we should have `Iterable[T]` instead of `List[T]` so that
`QuerySet[T]` will also be an acceptable subtype, or `Sequence[T]` when we
also expect it to be indexed.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
This ensures that all the keyword arguments in `move_rows`
have the correct types. Note that `returning_id` is supposed to be a
flag instead of a `Composable` `Literal`.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
Since recipient_id (id of the PERSONAL Recipient of the user) was
denormalized into the UserProfile model, this query can be simplified by
getting rid of the zerver_recipient JOIN.
Streams can have lots of subscribers, meaning that the archiving process
will be moving tons of UserMessages per message. For that reason, using
a smaller batch size for stream messages is justified.
Some personal messages need to be added in test_scrub_realm to have
coverage of do_delete_messages_by_sender after these changes.
Currently, we use -1 as the Realm.message_retention_days value to retain
message forever unless specified at stream level for a particular stream,
that is, no policy set at the realm level. But this is incoherent with what
we use for Stream.message_retention_days where -1 means
> disable retention policy for this stream unconditionally
that can be confusing from an API standpoint.
So instead of trying some hack to reset the value to NULL or using some
other value like -2 for RETAIN_MESSAGE_FOREVER and use that for API. It is
much more intuitive to use a string like 'forever' that can be mapped to
RETAIN_MESSAGE_FOREVER at the backend. And this is similar to what we use
for streams settings as well.
To be more consistent with the meaning in the Stream model, and to make
it easier to have a reasonable settings API, we get rid of the None
value for Realm.message_retention_days in favor of the value -1 to
represent the "don't delete messages" default policy.
Fixes#2665.
Regenerated by tabbott with `lint --fix` after a rebase and change in
parameters.
Note from tabbott: In a few cases, this converts technical debt in the
form of unsorted imports into different technical debt in the form of
our largest files having very long, ugly import sequences at the
start. I expect this change will increase pressure for us to split
those files, which isn't a bad thing.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Automatically generated by the following script, based on the output
of lint with flake8-comma:
import re
import sys
last_filename = None
last_row = None
lines = []
for msg in sys.stdin:
m = re.match(
r"\x1b\[35mflake8 \|\x1b\[0m \x1b\[1;31m(.+):(\d+):(\d+): (\w+)", msg
)
if m:
filename, row_str, col_str, err = m.groups()
row, col = int(row_str), int(col_str)
if filename == last_filename:
assert last_row != row
else:
if last_filename is not None:
with open(last_filename, "w") as f:
f.writelines(lines)
with open(filename) as f:
lines = f.readlines()
last_filename = filename
last_row = row
line = lines[row - 1]
if err in ["C812", "C815"]:
lines[row - 1] = line[: col - 1] + "," + line[col - 1 :]
elif err in ["C819"]:
assert line[col - 2] == ","
lines[row - 1] = line[: col - 2] + line[col - 1 :].lstrip(" ")
if last_filename is not None:
with open(last_filename, "w") as f:
f.writelines(lines)
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
For unknown reasons, deleting 10,000s of ArchiveTransaction objects
results in rapidly growing memory in the job making the request in the
Django process, eventually leading to an OOM kill.
I don't understand why Django behaves that way; I would have expected
the failure mode to instead be a serious load problem on the database
server, but perhaps the way Django's internal deletion logic handles
cascading the deletes to many millions of ArchiveMessages and other
ForeignKey objects requires tracking a lot of data in memory.
The solution is the same in any case, which is to batch the deletions
to execute a reasonable number of them at once. Doing a single
ArchiveTransaction at a time would likely result in huge numbers of
database queries in a loop, which performs very poorly. So we balance
by batching deletions in groups of 100 ArchiveTransactions; testing
this in production, I saw no spike of memory usage materially beyond
that of a normal Django process, and each bulk-deletion transaction
takes several seconds to process (meaning per-transaction overhead is
negligible).
For realms with no retention policy on themselves or any of their
streams, no archiving happens, but 3 lines of logs would be generated.
That's redundant and we make changes in this commit to avoid logging
those lines if nothing of interest is happening.
In https://github.com/zulip/zulip/pull/12823 some changes to the realms
structure have been made, so now both in production and development
cross-realm bots live in the realm with string_id "zulipinternal".
There was a TODO in retention code to eliminate a conditional in a query
that became redundant with this change, and also the zulipinternal realm
should be omitted from the archiving process in archive_messages().
With the recipient field being denormalized into the UserProfile and
Streams models, all current uses of get_stream_recipients can be done
more efficiently, by simply checking the .recipient_id attribute on the
appropriate objects.
Fixes#1727.
With the server down, apply migrations 0245 and 0246. 0246 will remove
the pub_date column, so it's essential that the previous migrations
ran correctly to copy data before running this.