zulip

Commit Graph

Author	SHA1	Message	Date
Mateusz Mandera	3ac11a3fc5	retention: Use ON CONFLICT DO UPDATE to handle re-archiving properly. When archiving Messages, we stop relying on LEFT JOIN ... IS NULL to avoid duplicates when INSERTing. Instead we use ON CONFLICT DO UPDATE (added in postgresql 9.5) so that, when archiving a Message that already has corresponding archived objects (this happens if a Message gets archived, restored and then archived again), the existing ArchivedMessage is re-assigned to the new transaction. This also allows us to fix test_archiving_messages_second_time, which was temporarily disabled a few commits before.	2019-06-26 12:05:59 -07:00
Mateusz Mandera	7b2b4435ed	retention: Combine run_message_batch_query and run_archiving_in_chunks. We combine run_message_batch_query and run_archiving_in_chunks functions, which makes the code simpler and more readable - we get rid of hacky generator usage, for example. In the process, move_expired_messages_* functions are adjusted, and now they archive Messages as well as their related objects. Appropriate adjustments in reaction to this are made in the main archiving functions which call move_expired_messages_* (they no longer need to call move_related_objects_to_archive).	2019-06-26 12:05:59 -07:00
Mateusz Mandera	6e46c6d752	retention: Add functions for restoring archived data. Functions for restoring archived data are added and existing tests are expanded to restore data they archived and check correctness.	2019-06-26 12:05:59 -07:00
Mateusz Mandera	9acd3b0f46	retention: Rewrite move_messages_to_archive to use existing functions. Instead of having a bunch of custom code in the function, we make it use run_message_batch_query and run_archiving_in_chunks to do the necessary operations in a consistent way, using the same codepaths as the rest of the archiving system. This breaks test_archiving_messages_second_time temporarily, but we will fix it and re-enable the test in the next commits, where we'll address various other issues with re-archiving of messages. We also remove the @transaction.atomic wrapper, because atomicity is handled by the logic inside run_archiving_in_chunks.	2019-06-26 12:05:59 -07:00
Mateusz Mandera	80b834dd1b	retention: Update move_rows() function code. We make minor changes to the move_rows() function to allow its use in the code for restoring from the archive.	2019-06-26 12:05:59 -07:00
Mateusz Mandera	e3fe66a084	retention: Set savepoint=False on atomic wrapper on move_rows(). Savepoints create unnecessary overhead, and there's no benefit from them, with the way we use this function.	2019-06-26 12:05:59 -07:00
Mateusz Mandera	5d8d5910a8	retention: Log archive_transaction id and information.	2019-06-26 12:05:59 -07:00
Mateusz Mandera	a2cce62c1c	retention: Use new ArchiveTransaction model. We add a new model, ArchiveTransaction, to tie archived objects together in a coherent way, according to the batches in which they are archived. This enables making a better system for restoring from archive, and it seems more sensible to tie the archived objects in this way, rather than the somewhat vague setting of archive_timestamp to each object using timezone_now().	2019-06-26 12:05:59 -07:00
Mateusz Mandera	8f15884c7d	retention: Delete objects tied to a Message in one query with archiving. Rather than relying on the CASCADING property of the ForeignKey to the Message table to clean up these objects, we delete them in the same query as we archive them - since it's guaranteed that any of these objects that we archive will be deleted due to their Message being deleted later. We don't have this guarantee for Attachment objects, which is why we can't apply this scheme to them.	2019-06-13 11:18:11 -07:00
Mateusz Mandera	25810752fe	retention: Fully process each Message chunk in a transaction. To ensure the database retains a consistent state if archiving gets interrupted, we process each Messages chunk together with related objects in a single atomic transaction.	2019-06-13 11:17:54 -07:00
Mateusz Mandera	55eb46433b	retention: Use yield when batching instead of returning a list of lists. This generator architecture will be cleaner for supporting the transactionality model we want.	2019-06-13 11:11:34 -07:00
Mateusz Mandera	37a22844b9	retention: Clean up code of move_messages_to_archive().	2019-06-13 11:02:11 -07:00
Mateusz Mandera	a68c460a14	retention: Clean up code for archiving attachment_messages. We had two duplicate functions for archiving zerver_attachment_messages rows, doing the same thing - archiving by message_id. One of them had a redundant INNER JOIN, so we get rid of that too.	2019-06-13 11:02:11 -07:00
Mateusz Mandera	cbee5beeac	retention: Log progress through the archiving process.	2019-06-13 11:02:11 -07:00
Mateusz Mandera	e3c7a5d896	retention: Loop over realms in archive_messages. Since we loop over realms in the functions for archiving stream messages and then personal+huddle messages, and also want to split cleaning up attachments by realm - it makes sense to do it all in one single loop.	2019-06-13 11:02:11 -07:00
Mateusz Mandera	5b8140cf75	retention: Group stream message archiving by realm. We group the process of archiving stream messages by realm, to allow logging and keeping track of time taken per realm.	2019-06-11 09:25:25 -07:00
Mateusz Mandera	f06a4b4eab	retention: Batch Message archiving queries. We batch queries that archive Messages, to limit the maximum amount of Message objects archived in a single query. This leads to the archiving of other related objects being batched as well, because we loop over chunks of archived messages and archive their related objects per-chunk.	2019-06-11 09:25:25 -07:00
Tim Abbott	065575debf	retention: Add a quick comment explaining how deletion works.	2019-06-06 11:41:07 -07:00
Mateusz Mandera	323be57151	retention: If stream has no retention policy set, use realm policy. We add the following behavior: If stream has message_retention_days set to -1, archiving for it is disabled. If stream has message_retention_days set to null, use the realm's policy. If the realm has no policy, we don't archive for this stream.	2019-06-06 11:17:42 -07:00
Mateusz Mandera	8bef82c7f9	retention: Clean up redundant code for special handling of UserMessages. UserMessages no longer need special handling, they can be archived by move_models_with_message_key_to_archive and automatically cleaned up like the other models with a message key with CASCADING=True.	2019-06-06 11:17:42 -07:00
Mateusz Mandera	0e9fa4f028	retention: Support stream-based retention policies. We change the archiving scheme to allow having stream based retention policies. In the first step of the archiving process, we loop over streams and archive their expired messages and related objects. Then we separately archive all expired personal and huddle messages and related objects. As the last step, we scan for redundant attachments which can now be deleted. To achieve this, we have to rewrite a significant portion of the retention code and rework some of the database queries. For the sake of simplicity, we neither archive nor delete cross-realm messages, except cross-realm stream messages – in their case they can be processed in the same manner as ordinary stream messages. In the query for archiving personal and huddle messages we simply exclude those sent by cross-realm bots. We change the tests to adapt to these modifications.	2019-06-06 11:17:42 -07:00
Mateusz Mandera	aa45325b5f	retention: Rename move_expired_rows to move_rows.	2019-06-06 11:17:42 -07:00
Mateusz Mandera	d373a16910	retention: Remove realm_id check when archiving attachments. Since we archive attachments and attachment_messages tied to a list of ids of Messages that we just archived (so from the current realm), it's unnecessary to check their realm in the queries. This could potentially cause archiving of an attachment with realm_id of another realm, but this isn't an issue, as long as we make sure we don't end up deleting the original Attachment object incorrectly - but realm_id check is included in delete_expired_attachments() to ensure that.	2019-06-06 11:17:42 -07:00
Mateusz Mandera	6c3ba25474	retention: Use RETURNING to speed up database queries. We add RETURNING to fetch relevant message and usermessage ids in archiving queries and use them to make other queries faster. A side-effect of this implementation is that with cross-realm messages, the UserMessage of the recipient and the Message will not be deleted - but cross-realm messages are rare, will still get correctly put in the archive tables and so failing to delete should not be a problem for now. They will be fully handled later.	2019-06-02 14:55:14 -07:00
Mateusz Mandera	426e3bbbd9	retention: Remove redundant LEFT JOIN in archiving UserMessages. zerver_archivedmessage is already INNER JOIN-ed earlier in the query, so we check the pub_date in it, instead of joining zerver_message, which would just redundantly join the analogical rows.	2019-06-02 14:55:14 -07:00
Mateusz Mandera	4facc93670	retention: Add archiving of SubMessages.	2019-05-30 11:40:20 -07:00
Mateusz Mandera	37c42a09e5	retention: Archiving of models tied to a Message, applied to Reactions. We add general code that will archive models that are tied to a specific Message (such as Reactions and SubMessages). Certain details of the model are grabbed from a list models_with_message_key, and then used to create queries that will archive these database tables. We put Reaction in that list in this commit, and add appropriate tests. To have archiving of other analogical models (for example SubMessage), one only needs to make an appropriate entry in the models_with_message_key list.	2019-05-30 11:40:20 -07:00
Mateusz Mandera	2bc6d52c72	retention: Fix name of move_attachment_message_to_archive_by_message. The first instance of the word "message" should be in plural. We rename to move_attachment_messages_to_archive_by_message.	2019-05-29 16:26:11 -07:00
Mateusz Mandera	2ca650be4d	retention: Clean up move_messages_to_archive() for more clarity.	2019-05-29 16:26:11 -07:00
Mateusz Mandera	c5ac66b9c8	retention: Split archive_messages code into two functions. We split archive_messages code into two functions: moving to archive and cleanup. This allows cleaning up the tests - they can call these functions directly instead of copying several lines of archive_messages here and there in multiple tests.	2019-05-27 12:53:32 -07:00
Tim Abbott	3d1aa98b2e	retention: Use a consistent ordering for processing realms. This is probably a good idea for the production use case, since then there's some consistency of behavior, and if we extend logging, one knows exactly which realms were or were not executed before a logged failure. This fixes the nondeterministic test failures we've been seeing in CI: if you use `-id` in that order_by, it happens consistently.	2019-05-22 10:48:53 -07:00
K.Kanakhin	e930851d16	retention-period: Add more core code for retention policy. This is a very old commit for #106, which has been on hiatus for a few years. It was significantly modified by tabbott to: * Improve coding style and variable names * Update mypy annotations style * Clean up the testing logic * Update for API changes elsewhere in our system But the actual runtime code is essentially unmodified from the original work by Kirill. It contains basic support for archiving Messages, UserMessages, and Attachments with a nice test suite. It's still not usable in production (e.g. it will probably break Reactions, SubMessages, etc.), but upcoming commits will address that.	2019-05-19 20:22:47 -07:00
Anders Kaseorg	f0ecb93515	zerver core: Remove unused imports. Signed-off-by: Anders Kaseorg <andersk@mit.edu>	2019-02-02 17:41:24 -08:00
Vishnu Ks	962d72b58b	retention: move_messages_to_archive should accept multiple message ids. This will speed up the scrub realm management command. Calling the function with a single message_id in a loop was extremely inefficient.	2018-10-11 15:31:12 -07:00
rht	ee546a33a3	zerver/lib: Use python 3 syntax for typing. Edited by tabbott to improve various line-wrapping decisions.	2017-11-28 17:15:14 -08:00
rht	2e12fe5e2e	zerver/lib: Remove print_function.	2017-09-27 18:05:45 -07:00
rht	f43e54d352	zerver/lib: Remove absolute_import.	2017-09-27 10:00:39 -07:00
K.Kanakhin	2434f2d96c	messages: Add support for admins deleting messages. This makes it possible for Zulip administrators to delete messages. This is primarily intended for use in deleting early test messages, but it can solve other problems as well. Later we'll want to play with the permissions model for this, but for now, the goal is just to integrate the feature. Note that it saves the deleted messages for some time using the same approach as Zulip's message retention policy feature. Fixes #135.	2017-05-29 21:59:38 -07:00
hackerkid	b2504084ab	Replace timezone.now with timezone_now.	2017-04-16 12:28:56 -07:00
Rishi Gupta	af4718c50c	retention.py: Remove use of domain from get_expired_messages. get_expired_messages seems to only be used by tests anyway.	2017-03-14 17:17:42 -07:00
Raghav Jajodia	a3a03bd6a5	mypy: Added Dict, List and Set imports. Fixed mypy errors associated with the upgrade.	2017-03-04 14:33:44 -08:00
Bickio	e009383460	pep8: Fix E231.	2016-11-30 19:59:25 -08:00
K.Kanakhin	39e0886361	retention-policy: Add tool to determine expired messages. This is a first step towards implementing a message retention policy feature. - Add Realm model message_retention_days field to set the message expiration period for a realm. - Add migration. - Add tool to get expired messages for each Realm. - Add tests to cover tool for getting expired messages.	2016-10-25 15:38:08 -07:00


93 Commits