Commit Graph

102 Commits

Author SHA1 Message Date
Mateusz Mandera 18fbb5d146 retention: Limit number of ids passed to db in delete messages query.
If do_delete_messages (and friends) are called for a massive number of
messages, the giant list of message ids is passed to Postgres even
though chunk_size makes all but the first chunk_size of message ids
useless.
2024-09-20 09:31:21 -07:00
Mateusz Mandera ed7c330548 retention: Rename run_archiving_in_chunks to run_archiving. 2024-09-20 09:31:21 -07:00
Anders Kaseorg 0fa5e7f629 ruff: Fix UP035 Import from `collections.abc`, `typing` instead.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-07-13 22:28:22 -07:00
Anders Kaseorg 531b34cb4c ruff: Fix UP007 Use `X | Y` for type annotations.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-07-13 22:28:22 -07:00
Anders Kaseorg e08a24e47f ruff: Fix UP006 Use `list` instead of `List` for type annotation.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-07-13 22:28:22 -07:00
roanster007 52692a6448 refactor: Rename `huddle` to `direct_message_group` in non API.
This commit performs a sweep on the first batch of non API
files to rename "huddle" to "direct_message_group`.

It also renames variables and methods of type -
"huddle_message" to "group_direct_message".

This is a part of #28640
2024-07-04 07:56:31 -07:00
Mateusz Mandera 224ea3aaed retention: Add .restored_timestamp to ArchiveTransaction.
This is just generally useful for tracking and debugging.
2024-05-09 10:54:44 -07:00
John Lu a5cf0ec526
refactor: Replace HUDDLE with DIRECT_MESSAGE_GROUP.
Replaced HUDDLE attribute with DIRECT_MESSAGE_GROUP using VS Code search,
part of a general renaming of the object class.

Fixes part of #28640.

Co-authored-by: JohnLu2004 <JohnLu10212004@gmail.com>
2024-03-21 16:39:33 -07:00
Alex Vandiver b94402152d models: Always search Messages with a realm_id or id limit.
Unless there is a limit on `id`, always provide a `realm_id` limit as
well.  We also notate which index is expected to be used in each
query.
2023-09-11 15:00:37 -07:00
Alex Vandiver 0f918d9071 retention: Do not archive attachments with scheduled messages. 2023-08-06 13:40:02 -07:00
Mateusz Mandera 414658fc8e scheduled_message: Handle attachments properly.
Fixes #25414.

We add Attachment.scheduled_messages relation to track ScheduledMessages
which reference the attachment.

The import bits can be done after merging this, by updating #25345.
2023-05-08 09:56:02 -07:00
Prakhar Pratyush e45623fccc python: Update tuple handling pattern; returned by a delete() query.
This commit updates the pattern for dealing with tuples
returned by the delete() query.

The '(num_deleted, ignored) = ModelName.objects.filter().delete()'
pattern is preferred due to better readability.

We avoid the pattern '(num_deleted, _)' because Django uses _
for translation, which may lead to future bugs.
2023-03-27 16:18:23 -07:00
Anders Kaseorg e1ed44907b ruff: Fix SIM118 Use `key in dict` instead of `key in dict.keys()`.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-01-04 16:25:07 -08:00
Mateusz Mandera 760587450e retention: Remove two redundant comments.
These two identical comments don't contribute anything useful and seem
just out of place at this point.
2022-10-31 10:23:57 -07:00
Mateusz Mandera 7b13204e8f retention: Use Message.realm to simplify private message query.
We no longer need to do the inner joins to figure out the message's
realm and split up the cross-realm and regular case - now we just look
at zerver_message.realm directly.
2022-10-31 10:23:57 -07:00
Anders Kaseorg b4b8691239 retention: Inline move_rows query arguments.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-07-30 06:46:34 -07:00
Zixuan James Li ab1bbdda65 typing: Broaden type annotations for QuerySet compatibility.
To explain the rationale of this change, for example, there is
`get_user_activity_summary` which accepts either a `Collection[UserActivity]`,
where `QuerySet[T]` is not strictly `Sequence[T]` because its slicing behavior
is different from the `Protocol`, making `Collection` necessary.

Similarily, we should have `Iterable[T]` instead of `List[T]` so that
`QuerySet[T]` will also be an acceptable subtype, or `Sequence[T]` when we
also expect it to be indexed.

Signed-off-by: Zixuan James Li <p359101898@gmail.com>
2022-07-07 11:27:42 -07:00
Zixuan James Li f0e505d557 retention: Guarantee type-safeness when calling move_rows.
This ensures that all the keyword arguments in `move_rows`
have the correct types. Note that `returning_id` is supposed to be a
flag instead of a `Composable` `Literal`.

Signed-off-by: Zixuan James Li <p359101898@gmail.com>
2022-06-28 16:01:35 -07:00
Mateusz Mandera acfa55138e retention: Add docstring info on how archive cleaning works.
In particular, it's important to record the special treatment around
ArchivedAttachment rows not being deleted in this step.
2022-06-08 15:12:36 -07:00
Zixuan James Li e338ada66c typing: Add none-checks for Recipient objects.
Signed-off-by: Zixuan James Li <359101898@qq.com>
2022-05-31 09:43:55 -07:00
Anders Kaseorg 80def8d2c2 models: Manage indexes from migration 0001 with Django.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-02-23 11:59:45 -08:00
PIG208 254f706465 typing: Fix argument type for models in function signatures. 2021-08-20 05:54:19 -07:00
Anders Kaseorg 09564e95ac mypy: Add types-psycopg2.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-08-09 20:32:19 -07:00
Mateusz Mandera 8f588dcbab models: Pass realm to get_user_including_cross_realm calls. 2021-07-26 15:33:13 -07:00
Vishnu KS acffc0ae0a populate_db: Use do_create_realm for creating zephyr realm. 2021-07-06 17:22:00 -07:00
Anders Kaseorg 6e4c3e41dc python: Normalize quotes with Black.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-02-12 13:11:19 -08:00
Anders Kaseorg 11741543da python: Reformat with Black, except quotes.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-02-12 13:11:19 -08:00
Mateusz Mandera a9242d6dfc retention: Eliminate redundant recipient JOIN from cross-realm query.
Since recipient_id (id of the PERSONAL Recipient of the user) was
denormalized into the UserProfile model, this query can be simplified by
getting rid of the zerver_recipient JOIN.
2021-01-18 21:40:37 -08:00
Mateusz Mandera e3be6db73a retention: Eliminate redundant userprofile JOIN from cross-realm query. 2021-01-18 21:40:37 -08:00
Anders Kaseorg 3885fdadce realm: Fix strict_optional errors.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-07-06 11:25:48 -07:00
Mateusz Mandera 890cafac11 retention: Use batch size of 100 for stream messages.
Streams can have lots of subscribers, meaning that the archiving process
will be moving tons of UserMessages per message. For that reason, using
a smaller batch size for stream messages is justified.

Some personal messages need to be added in test_scrub_realm to have
coverage of do_delete_messages_by_sender after these changes.
2020-06-24 10:41:00 -07:00
Mateusz Mandera 0c6497d43a retention: Add restore_retention_policy_deletions_for_stream function. 2020-06-24 10:40:38 -07:00
Mateusz Mandera 468f8cf488 retention: Improve logging of transactions. 2020-06-24 10:40:38 -07:00
Pragati Agrawal 1562ec758e org settings: Use 'forever' value instead of -1 for message_retention_days.
Currently, we use -1 as the Realm.message_retention_days value to retain
message forever unless specified at stream level for a particular stream,
that is, no policy set at the realm level. But this is incoherent with what
we use for Stream.message_retention_days where -1 means

> disable retention policy for this stream unconditionally

that can be confusing from an API standpoint.

So instead of trying some hack to reset the value to NULL or using some
other value like -2 for RETAIN_MESSAGE_FOREVER and use that for API. It is
much more intuitive to use a string like 'forever' that can be mapped to
RETAIN_MESSAGE_FOREVER at the backend. And this is similar to what we use
for streams settings as well.
2020-06-24 10:38:58 -07:00
Mateusz Mandera 7a03e2a7fe retention: Replace Realm.message_retention_days None value with -1.
To be more consistent with the meaning in the Stream model, and to make
it easier to have a reasonable settings API, we get rid of the None
value for Realm.message_retention_days in favor of the value -1 to
represent the "don't delete messages" default policy.
2020-06-24 10:33:21 -07:00
Tim Abbott d503549f0b retention: Add some more system documentation. 2020-06-20 17:35:07 -07:00
Anders Kaseorg 365fe0b3d5 python: Sort imports with isort.
Fixes #2665.

Regenerated by tabbott with `lint --fix` after a rebase and change in
parameters.

Note from tabbott: In a few cases, this converts technical debt in the
form of unsorted imports into different technical debt in the form of
our largest files having very long, ugly import sequences at the
start.  I expect this change will increase pressure for us to split
those files, which isn't a bad thing.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-11 16:45:32 -07:00
Anders Kaseorg 69730a78cc python: Use trailing commas consistently.
Automatically generated by the following script, based on the output
of lint with flake8-comma:

import re
import sys

last_filename = None
last_row = None
lines = []

for msg in sys.stdin:
    m = re.match(
        r"\x1b\[35mflake8    \|\x1b\[0m \x1b\[1;31m(.+):(\d+):(\d+): (\w+)", msg
    )
    if m:
        filename, row_str, col_str, err = m.groups()
        row, col = int(row_str), int(col_str)

        if filename == last_filename:
            assert last_row != row
        else:
            if last_filename is not None:
                with open(last_filename, "w") as f:
                    f.writelines(lines)

            with open(filename) as f:
                lines = f.readlines()
            last_filename = filename
        last_row = row

        line = lines[row - 1]
        if err in ["C812", "C815"]:
            lines[row - 1] = line[: col - 1] + "," + line[col - 1 :]
        elif err in ["C819"]:
            assert line[col - 2] == ","
            lines[row - 1] = line[: col - 2] + line[col - 1 :].lstrip(" ")

if last_filename is not None:
    with open(last_filename, "w") as f:
        f.writelines(lines)

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-06-11 16:04:12 -07:00
Mateusz Mandera b234fe8ccb retention: Pass optional realm argument to move_messages_to_archive.
This allows having the realm field of ArchiveTransaction set instead of
NULL when using move_messages_to_archive.
2020-05-16 14:46:56 -07:00
Mateusz Mandera 7d8a3581a5 retention: Clarify the status of cross-realm huddles in a comment. 2020-05-16 14:42:40 -07:00
Tim Abbott f10f2600e0 retention: Fix OOM issues when deleting large numbers of transactions.
For unknown reasons, deleting 10,000s of ArchiveTransaction objects
results in rapidly growing memory in the job making the request in the
Django process, eventually leading to an OOM kill.

I don't understand why Django behaves that way; I would have expected
the failure mode to instead be a serious load problem on the database
server, but perhaps the way Django's internal deletion logic handles
cascading the deletes to many millions of ArchiveMessages and other
ForeignKey objects requires tracking a lot of data in memory.

The solution is the same in any case, which is to batch the deletions
to execute a reasonable number of them at once.  Doing a single
ArchiveTransaction at a time would likely result in huge numbers of
database queries in a loop, which performs very poorly.  So we balance
by batching deletions in groups of 100 ArchiveTransactions; testing
this in production, I saw no spike of memory usage materially beyond
that of a normal Django process, and each bulk-deletion transaction
takes several seconds to process (meaning per-transaction overhead is
negligible).
2020-05-15 17:10:19 -07:00
Mateusz Mandera 812ac4714f retention: Optimize fetching of realms and streams with retention policy. 2020-05-07 16:28:05 -07:00
Anders Kaseorg fd65511fe9 retention: Improve move_rows escaping correctness with psycopg2.sql.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-05-04 09:35:30 -07:00
Tim Abbott 341787a5e0 retention: Use logging API in a more standard way. 2020-05-03 10:57:23 -07:00
Mateusz Mandera 0d7cbc71dd retention: Make logging less unnecessarily verbose.
For realms with no retention policy on themselves or any of their
streams, no archiving happens, but 3 lines of logs would be generated.
That's redundant and we make changes in this commit to avoid logging
those lines if nothing of interest is happening.
2020-05-03 19:24:00 +02:00
Anders Kaseorg bdc365d0fe logging: Pass format arguments to logging.
https://docs.python.org/3/howto/logging.html#optimization

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-05-02 10:18:02 -07:00
Anders Kaseorg fead14951c python: Convert assignment type annotations to Python 3.6 style.
This commit was split by tabbott; this piece covers the vast majority
of files in Zulip, but excludes scripts/, tools/, and puppet/ to help
ensure we at least show the right error messages for Xenial systems.

We can likely further refine the remaining pieces with some testing.

Generated by com2ann, with whitespace fixes and various manual fixes
for runtime issues:

-    invoiced_through: Optional[LicenseLedger] = models.ForeignKey(
+    invoiced_through: Optional["LicenseLedger"] = models.ForeignKey(

-_apns_client: Optional[APNsClient] = None
+_apns_client: Optional["APNsClient"] = None

-    notifications_stream: Optional[Stream] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE)
-    signup_notifications_stream: Optional[Stream] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE)
+    notifications_stream: Optional["Stream"] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE)
+    signup_notifications_stream: Optional["Stream"] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE)

-    author: Optional[UserProfile] = models.ForeignKey('UserProfile', blank=True, null=True, on_delete=CASCADE)
+    author: Optional["UserProfile"] = models.ForeignKey('UserProfile', blank=True, null=True, on_delete=CASCADE)

-    bot_owner: Optional[UserProfile] = models.ForeignKey('self', null=True, on_delete=models.SET_NULL)
+    bot_owner: Optional["UserProfile"] = models.ForeignKey('self', null=True, on_delete=models.SET_NULL)

-    default_sending_stream: Optional[Stream] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE)
-    default_events_register_stream: Optional[Stream] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE)
+    default_sending_stream: Optional["Stream"] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE)
+    default_events_register_stream: Optional["Stream"] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE)

-descriptors_by_handler_id: Dict[int, ClientDescriptor] = {}
+descriptors_by_handler_id: Dict[int, "ClientDescriptor"] = {}

-worker_classes: Dict[str, Type[QueueProcessingWorker]] = {}
-queues: Dict[str, Dict[str, Type[QueueProcessingWorker]]] = {}
+worker_classes: Dict[str, Type["QueueProcessingWorker"]] = {}
+queues: Dict[str, Dict[str, Type["QueueProcessingWorker"]]] = {}

-AUTH_LDAP_REVERSE_EMAIL_SEARCH: Optional[LDAPSearch] = None
+AUTH_LDAP_REVERSE_EMAIL_SEARCH: Optional["LDAPSearch"] = None

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-04-22 11:02:32 -07:00
Mateusz Mandera cbdfef28a8 retention: Update to account for the zulipinternal realm.
In https://github.com/zulip/zulip/pull/12823 some changes to the realms
structure have been made, so now both in production and development
cross-realm bots live in the realm with string_id "zulipinternal".
There was a TODO in retention code to eliminate a conditional in a query
that became redundant with this change, and also the zulipinternal realm
should be omitted from the archiving process in archive_messages().
2020-02-14 17:15:26 -08:00
Mateusz Mandera 9a42a83e15 streams: Remove get_stream_recipients function and its uses.
With the recipient field being denormalized into the UserProfile and
Streams models, all current uses of get_stream_recipients can be done
more efficiently, by simply checking the .recipient_id attribute on the
appropriate objects.
2019-12-12 12:05:42 -08:00
Mateusz Mandera dbe508bb91 models: Migration of Message.pub_date to date_sent, part 2.
Fixes #1727.

With the server down, apply migrations 0245 and 0246. 0246 will remove
the pub_date column, so it's essential that the previous migrations
ran correctly to copy data before running this.
2019-10-05 19:01:34 -07:00