To ensure the database retains a consistent state if archiving gets
interrupted, we process each Messages chunk together with related
objects in a single atomic transaction.
We batch queries that archive Messages, to limit the maximum amount of
Message objects archived in a single query. This leads to the archiving
of other related objects being batched as well, because we loop over
chunks of archived messages and archive their related objects per-chunk.
We add the following behavior:
If stream has message_retention_days set to -1, archiving for it is
disabled.
If stream has message_retention_days set to null, use the realm's
policy. If the realm has no policy, we don't archive for this stream.
We change the archiving scheme to allow having stream based retention
policies. In the first step of the archiving process, we loop over
streams and archive their expired messages and related objects.
Then we separately archive all expired personal and huddle messages and
related objects. As the last step, we scan for redundant attachments
which can now be deleted.
To achieve this, we have to rewrite a significant portion of the
retention code and rework some of the database queries.
For the sake of simplicity, we neither archive nor delete cross-realm
messages, except cross-realm stream messages – in their case they can
be processed in the same manner as ordinary stream messages.
In the query for archiving personal and huddle messages we simply
exclude those sent by cross-realm bots.
We change the tests to adapt to these modifications.
We add RETURNING to fetch relevant message and usermessage ids in
archiving queries and use them to make other queries faster and slower.
A side-effect of this implementation is that with cross-realm messages,
the UserMessage of the recipient and the Message will not be deleted -
but cross-realm messages are rare, will still get correctly put in the
archive tables and so failing to delete should not be a problem for now.
They will be fully handled later.
We add general code that will archive models that are tied to a specific
Message (such as Reactions and SubMessages). Certain details of the
model are grabbed from a list models_with_message_key, and then used to
create queries that will archive these database tables.
We put Reaction in that list in this commit, and add appropriate tests.
To have archiving of other analogical models (for example SubMessage),
one only needs to make an appropriate entry in the
models_with_message_key list.
test_retention.py had various issues - we opt for keeping its essence
(what should the tests do and verify), but rewriting a lot of it in
order to have more clarity in what's happening there.
We split archive_messages code into two functions: moving to archive and
cleanup. This allows cleaning up the tests - they can call
these functions directly instead of copying several lines of
archive_messages here and there in multiple tests.
test_cross_realm_messages_archiving_two_realm_expired doesn't run the
code path patched in commit 3d1aa98b2ea344fba7fbb2373a37d4cf30f53e08i,
so it can still fail. We apply the analogical change in the test as
in the cited commit.
This is probably a good idea for the production use case, since then
there's some consistency of behavior, and if we extend logging, one
knows exactly which realms were or were not executed before a logged
failure.
This fixes the nondeterministic test failures we've been seeing in CI:
if you use `-id` in that order_by, it happens consistently.
We've been seeing nondeterministic failures in this test suite in CI
that we can't reproduce locally; these print statements should help
track them down.
This is a very old commit for #106, which has been on hiatus for a few
years. It was significantly modified by tabbott to:
* Improve coding style and variable names
* Update mypy annotations style
* Clean up the testing logic
* Update for API changes elsewhere in our system
But the actual runtime code is essentially unmodified from the
original work by Kirill.
It contains basic support for archiving Messages, UserMessages, and
Attachments with a nice test suite. It's still not usable in
production (e.g. it will probably break Reactions, SubMessages, etc.),
but upcoming commits will address that.
The test named `test_archiving_messages_with_attachment`
started flaking recently. We use sets for comparison
instead of lists to avoid arbitrary sorting differences.
We now update all test messages to have a pub_date
of "now" in the setUp() function in TestRetentionLib.
We've seen tests flake on query counts before this
patch. It's not certain that the test flaked due
to time-related glitches, but it seems the most
plausible explanation.
This makes it possible for Zulip administrators to delete messages.
This is primarily intended for use in deleting early test messages,
but it can solve other problems as well.
Later we'll want to play with the permissions model for this, but for
now, the goal is just to integrate the feature.
Note that it saves the deleted messages for some time using the same
approach as Zulip's message retention policy feature.
Fixes#135.
This is a first step towards implementing a message retention policy
feature.
- Add Realm model message_retention_days field to setup
messages expired period for realm.
- Add migration.
- Add tool to get expired messages for each Realm.
- Add tests to cover tool for getting expired messages.