zulip

Commit Graph

Author	SHA1	Message	Date
Mateusz Mandera	224ea3aaed	retention: Add .restored_timestamp to ArchiveTransaction. This is just generally useful for tracking and debugging.	2024-05-09 10:54:44 -07:00
John Lu	a5cf0ec526	refactor: Replace HUDDLE with DIRECT_MESSAGE_GROUP. Replaced HUDDLE attribute with DIRECT_MESSAGE_GROUP using VS Code search, part of a general renaming of the object class. Fixes part of #28640. Co-authored-by: JohnLu2004 <JohnLu10212004@gmail.com>	2024-03-21 16:39:33 -07:00
Alex Vandiver	b94402152d	models: Always search Messages with a realm_id or id limit. Unless there is a limit on `id`, always provide a `realm_id` limit as well. We also notate which index is expected to be used in each query.	2023-09-11 15:00:37 -07:00
Alex Vandiver	0f918d9071	retention: Do not archive attachments with scheduled messages.	2023-08-06 13:40:02 -07:00
Mateusz Mandera	414658fc8e	scheduled_message: Handle attachments properly. Fixes #25414. We add Attachment.scheduled_messages relation to track ScheduledMessages which reference the attachment. The import bits can be done after merging this, by updating #25345.	2023-05-08 09:56:02 -07:00
Prakhar Pratyush	e45623fccc	python: Update tuple handling pattern; returned by a delete() query. This commit updates the pattern for dealing with tuples returned by the delete() query. The '(num_deleted, ignored) = ModelName.objects.filter().delete()' pattern is preferred due to better readability. We avoid the pattern '(num_deleted, _)' because Django uses _ for translation, which may lead to future bugs.	2023-03-27 16:18:23 -07:00
Anders Kaseorg	e1ed44907b	ruff: Fix SIM118 Use `key in dict` instead of `key in dict.keys()`. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2023-01-04 16:25:07 -08:00
Mateusz Mandera	760587450e	retention: Remove two redundant comments. These two identical comments don't contribute anything useful and seem just out of place at this point.	2022-10-31 10:23:57 -07:00
Mateusz Mandera	7b13204e8f	retention: Use Message.realm to simplify private message query. We no longer need to do the inner joins to figure out the message's realm and split up the cross-realm and regular case - now we just look at zerver_message.realm directly.	2022-10-31 10:23:57 -07:00
Anders Kaseorg	b4b8691239	retention: Inline move_rows query arguments. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2022-07-30 06:46:34 -07:00
Zixuan James Li	ab1bbdda65	typing: Broaden type annotations for QuerySet compatibility. To explain the rationale of this change, for example, there is `get_user_activity_summary` which accepts either a `Collection[UserActivity]`, where `QuerySet[T]` is not strictly `Sequence[T]` because its slicing behavior is different from the `Protocol`, making `Collection` necessary. Similarily, we should have `Iterable[T]` instead of `List[T]` so that `QuerySet[T]` will also be an acceptable subtype, or `Sequence[T]` when we also expect it to be indexed. Signed-off-by: Zixuan James Li <p359101898@gmail.com>	2022-07-07 11:27:42 -07:00
Zixuan James Li	f0e505d557	retention: Guarantee type-safeness when calling move_rows. This ensures that all the keyword arguments in `move_rows` have the correct types. Note that `returning_id` is supposed to be a flag instead of a `Composable` `Literal`. Signed-off-by: Zixuan James Li <p359101898@gmail.com>	2022-06-28 16:01:35 -07:00
Mateusz Mandera	acfa55138e	retention: Add docstring info on how archive cleaning works. In particular, it's important to record the special treatment around ArchivedAttachment rows not being deleted in this step.	2022-06-08 15:12:36 -07:00
Zixuan James Li	e338ada66c	typing: Add none-checks for Recipient objects. Signed-off-by: Zixuan James Li <359101898@qq.com>	2022-05-31 09:43:55 -07:00
Anders Kaseorg	80def8d2c2	models: Manage indexes from migration 0001 with Django. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2022-02-23 11:59:45 -08:00
PIG208	254f706465	typing: Fix argument type for models in function signatures.	2021-08-20 05:54:19 -07:00
Anders Kaseorg	09564e95ac	mypy: Add types-psycopg2. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-08-09 20:32:19 -07:00
Mateusz Mandera	8f588dcbab	models: Pass realm to get_user_including_cross_realm calls.	2021-07-26 15:33:13 -07:00
Vishnu KS	acffc0ae0a	populate_db: Use do_create_realm for creating zephyr realm.	2021-07-06 17:22:00 -07:00
Anders Kaseorg	6e4c3e41dc	python: Normalize quotes with Black. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-02-12 13:11:19 -08:00
Anders Kaseorg	11741543da	python: Reformat with Black, except quotes. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-02-12 13:11:19 -08:00
Mateusz Mandera	a9242d6dfc	retention: Eliminate redundant recipient JOIN from cross-realm query. Since recipient_id (id of the PERSONAL Recipient of the user) was denormalized into the UserProfile model, this query can be simplified by getting rid of the zerver_recipient JOIN.	2021-01-18 21:40:37 -08:00
Mateusz Mandera	e3be6db73a	retention: Eliminate redundant userprofile JOIN from cross-realm query.	2021-01-18 21:40:37 -08:00
Anders Kaseorg	3885fdadce	realm: Fix strict_optional errors. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-07-06 11:25:48 -07:00
Mateusz Mandera	890cafac11	retention: Use batch size of 100 for stream messages. Streams can have lots of subscribers, meaning that the archiving process will be moving tons of UserMessages per message. For that reason, using a smaller batch size for stream messages is justified. Some personal messages need to be added in test_scrub_realm to have coverage of do_delete_messages_by_sender after these changes.	2020-06-24 10:41:00 -07:00
Mateusz Mandera	0c6497d43a	retention: Add restore_retention_policy_deletions_for_stream function.	2020-06-24 10:40:38 -07:00
Mateusz Mandera	468f8cf488	retention: Improve logging of transactions.	2020-06-24 10:40:38 -07:00
Pragati Agrawal	1562ec758e	org settings: Use 'forever' value instead of -1 for message_retention_days. Currently, we use -1 as the Realm.message_retention_days value to retain message forever unless specified at stream level for a particular stream, that is, no policy set at the realm level. But this is incoherent with what we use for Stream.message_retention_days where -1 means > disable retention policy for this stream unconditionally that can be confusing from an API standpoint. So instead of trying some hack to reset the value to NULL or using some other value like -2 for RETAIN_MESSAGE_FOREVER and use that for API. It is much more intuitive to use a string like 'forever' that can be mapped to RETAIN_MESSAGE_FOREVER at the backend. And this is similar to what we use for streams settings as well.	2020-06-24 10:38:58 -07:00
Mateusz Mandera	7a03e2a7fe	retention: Replace Realm.message_retention_days None value with -1. To be more consistent with the meaning in the Stream model, and to make it easier to have a reasonable settings API, we get rid of the None value for Realm.message_retention_days in favor of the value -1 to represent the "don't delete messages" default policy.	2020-06-24 10:33:21 -07:00
Tim Abbott	d503549f0b	retention: Add some more system documentation.	2020-06-20 17:35:07 -07:00
Anders Kaseorg	365fe0b3d5	python: Sort imports with isort. Fixes #2665. Regenerated by tabbott with `lint --fix` after a rebase and change in parameters. Note from tabbott: In a few cases, this converts technical debt in the form of unsorted imports into different technical debt in the form of our largest files having very long, ugly import sequences at the start. I expect this change will increase pressure for us to split those files, which isn't a bad thing. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-11 16:45:32 -07:00
Anders Kaseorg	69730a78cc	python: Use trailing commas consistently. Automatically generated by the following script, based on the output of lint with flake8-comma: import re import sys last_filename = None last_row = None lines = [] for msg in sys.stdin: m = re.match( r"\x1b\[35mflake8 \\|\x1b\[0m \x1b\[1;31m(.+):(\d+):(\d+): (\w+)", msg ) if m: filename, row_str, col_str, err = m.groups() row, col = int(row_str), int(col_str) if filename == last_filename: assert last_row != row else: if last_filename is not None: with open(last_filename, "w") as f: f.writelines(lines) with open(filename) as f: lines = f.readlines() last_filename = filename last_row = row line = lines[row - 1] if err in ["C812", "C815"]: lines[row - 1] = line[: col - 1] + "," + line[col - 1 :] elif err in ["C819"]: assert line[col - 2] == "," lines[row - 1] = line[: col - 2] + line[col - 1 :].lstrip(" ") if last_filename is not None: with open(last_filename, "w") as f: f.writelines(lines) Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2020-06-11 16:04:12 -07:00
Mateusz Mandera	b234fe8ccb	retention: Pass optional realm argument to move_messages_to_archive. This allows having the realm field of ArchiveTransaction set instead of NULL when using move_messages_to_archive.	2020-05-16 14:46:56 -07:00
Mateusz Mandera	7d8a3581a5	retention: Clarify the status of cross-realm huddles in a comment.	2020-05-16 14:42:40 -07:00
Tim Abbott	f10f2600e0	retention: Fix OOM issues when deleting large numbers of transactions. For unknown reasons, deleting 10,000s of ArchiveTransaction objects results in rapidly growing memory in the job making the request in the Django process, eventually leading to an OOM kill. I don't understand why Django behaves that way; I would have expected the failure mode to instead be a serious load problem on the database server, but perhaps the way Django's internal deletion logic handles cascading the deletes to many millions of ArchiveMessages and other ForeignKey objects requires tracking a lot of data in memory. The solution is the same in any case, which is to batch the deletions to execute a reasonable number of them at once. Doing a single ArchiveTransaction at a time would likely result in huge numbers of database queries in a loop, which performs very poorly. So we balance by batching deletions in groups of 100 ArchiveTransactions; testing this in production, I saw no spike of memory usage materially beyond that of a normal Django process, and each bulk-deletion transaction takes several seconds to process (meaning per-transaction overhead is negligible).	2020-05-15 17:10:19 -07:00
Mateusz Mandera	812ac4714f	retention: Optimize fetching of realms and streams with retention policy.	2020-05-07 16:28:05 -07:00
Anders Kaseorg	fd65511fe9	retention: Improve move_rows escaping correctness with psycopg2.sql. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-05-04 09:35:30 -07:00
Tim Abbott	341787a5e0	retention: Use logging API in a more standard way.	2020-05-03 10:57:23 -07:00
Mateusz Mandera	0d7cbc71dd	retention: Make logging less unnecessarily verbose. For realms with no retention policy on themselves or any of their streams, no archiving happens, but 3 lines of logs would be generated. That's redundant and we make changes in this commit to avoid logging those lines if nothing of interest is happening.	2020-05-03 19:24:00 +02:00
Anders Kaseorg	bdc365d0fe	logging: Pass format arguments to logging. https://docs.python.org/3/howto/logging.html#optimization Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-05-02 10:18:02 -07:00
Anders Kaseorg	fead14951c	python: Convert assignment type annotations to Python 3.6 style. This commit was split by tabbott; this piece covers the vast majority of files in Zulip, but excludes scripts/, tools/, and puppet/ to help ensure we at least show the right error messages for Xenial systems. We can likely further refine the remaining pieces with some testing. Generated by com2ann, with whitespace fixes and various manual fixes for runtime issues: - invoiced_through: Optional[LicenseLedger] = models.ForeignKey( + invoiced_through: Optional["LicenseLedger"] = models.ForeignKey( -_apns_client: Optional[APNsClient] = None +_apns_client: Optional["APNsClient"] = None - notifications_stream: Optional[Stream] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE) - signup_notifications_stream: Optional[Stream] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE) + notifications_stream: Optional["Stream"] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE) + signup_notifications_stream: Optional["Stream"] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE) - author: Optional[UserProfile] = models.ForeignKey('UserProfile', blank=True, null=True, on_delete=CASCADE) + author: Optional["UserProfile"] = models.ForeignKey('UserProfile', blank=True, null=True, on_delete=CASCADE) - bot_owner: Optional[UserProfile] = models.ForeignKey('self', null=True, on_delete=models.SET_NULL) + bot_owner: Optional["UserProfile"] = models.ForeignKey('self', null=True, on_delete=models.SET_NULL) - default_sending_stream: Optional[Stream] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE) - default_events_register_stream: Optional[Stream] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE) + default_sending_stream: Optional["Stream"] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE) + default_events_register_stream: Optional["Stream"] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE) -descriptors_by_handler_id: Dict[int, ClientDescriptor] = {} +descriptors_by_handler_id: Dict[int, "ClientDescriptor"] = {} -worker_classes: Dict[str, Type[QueueProcessingWorker]] = {} -queues: Dict[str, Dict[str, Type[QueueProcessingWorker]]] = {} +worker_classes: Dict[str, Type["QueueProcessingWorker"]] = {} +queues: Dict[str, Dict[str, Type["QueueProcessingWorker"]]] = {} -AUTH_LDAP_REVERSE_EMAIL_SEARCH: Optional[LDAPSearch] = None +AUTH_LDAP_REVERSE_EMAIL_SEARCH: Optional["LDAPSearch"] = None Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2020-04-22 11:02:32 -07:00
Mateusz Mandera	cbdfef28a8	retention: Update to account for the zulipinternal realm. In https://github.com/zulip/zulip/pull/12823 some changes to the realms structure have been made, so now both in production and development cross-realm bots live in the realm with string_id "zulipinternal". There was a TODO in retention code to eliminate a conditional in a query that became redundant with this change, and also the zulipinternal realm should be omitted from the archiving process in archive_messages().	2020-02-14 17:15:26 -08:00
Mateusz Mandera	9a42a83e15	streams: Remove get_stream_recipients function and its uses. With the recipient field being denormalized into the UserProfile and Streams models, all current uses of get_stream_recipients can be done more efficiently, by simply checking the .recipient_id attribute on the appropriate objects.	2019-12-12 12:05:42 -08:00
Mateusz Mandera	dbe508bb91	models: Migration of Message.pub_date to date_sent, part 2. Fixes #1727. With the server down, apply migrations 0245 and 0246. 0246 will remove the pub_date column, so it's essential that the previous migrations ran correctly to copy data before running this.	2019-10-05 19:01:34 -07:00
Anders Kaseorg	becef760bf	cleanup: Delete leading newlines. Previous cleanups (mostly the removals of Python __future__ imports) were done in a way that introduced leading newlines. Delete leading newlines from all files, except static/assets/zulip-emoji/NOTICE, which is a verbatim copy of the Apache 2.0 license. Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2019-08-06 23:29:11 -07:00
Mateusz Mandera	d1c2185c81	retention: Archive cross-realm personal messages. We can simply archive cross-realm personal messages according to the retention policy of the recipient's realm. It requires adding another message-archiving query for this case however. What remains is to figure out how to treat cross-realm huddle messages.	2019-07-08 20:03:20 -07:00
Mateusz Mandera	24ec1c7aa1	Revert "retention: Delete objects tied to a Message in one query with archiving." This reverts commit `8f15884c7d`. Using the WITH ( ) ... DELETE method leads to a small performance drop, while probably not offering many positives, so it seems appropriate to go to the simpler case of just letting things get cleaned up by CASCADE.	2019-07-08 16:35:53 -07:00
Mateusz Mandera	b55f24e07c	retention: Avoid "SELECT realm" being run by Django ORM for each stream. The way the code changed in this commit was written caused Django to fetch stream.realm from the database for every stream, leading to redundant, identical queries. Each stream's realm is already known, so we use that information.	2019-07-08 16:32:38 -07:00
Mateusz Mandera	89ba6d7941	retention: Improve the placement of logging in run_archiving_in_chunks.	2019-07-02 17:33:53 -07:00
Mateusz Mandera	17c5398703	retention: Filter also by transaction type when restoring by realm.	2019-07-02 17:31:07 -07:00

1 2

96 Commits