Commit Graph

104 Commits

Author SHA1 Message Date
Mateusz Mandera 2d544250b7 events: Add block for compatibility with old delete_message events. 2020-03-03 15:52:42 -08:00
Mateusz Mandera 7db3d4560f do_delete_messages: Archive the messages in bulk.
The test added in this commit shows 37 queries - compared to 181 without
the change to the function. That seems very much worth it.
2020-02-27 23:12:32 -08:00
Tim Abbott 7ccc8373e2 bugdown: Fix logic for extracting attachment path_id.
In 3892a8afd8, we restructured the
system for managing uploaded files to a much cleaner model where we
just do parsing inside bugdown.

That new model had potentially buggy handling of cases around both
relative URLs and URLS starting with `realm.host`.

We address this by further rewriting the handling of attachments to
avoid regular expressions entirely, instead relying on urllib for
parsing, and having bugdown output `path_id` values, so that there's
no need for any conversions between formats outside bugdowm.

The check_attachment_reference_change function for processing message
updates is significantly simplified in the process.

The new check on the hostname has the side effect of requiring us to
fix some previously weird/buggy test data.

Co-Author-By: Anders Kaseorg <anders@zulipchat.com>
Co-Author-By: Rohitt Vashishtha <aero31aero@gmail.com>
2019-12-12 20:30:26 -08:00
Rohitt Vashishtha 3892a8afd8 messages: Set has_attachment correctly using Bugdown.
Previously, we would naively set has_attachment just by searching
the whole messages for strings like `/user_uploads/...`. We now
prevent running do_claim_attachments for messages that obviously
do not have an attachment in them that we previously ran.

For example: attachments in codeblocks or
             attachments that otherwise do not match our link syntax.

The new implementation runs that check on only the urls that
bugdown determines should be rendered. We also refactor some
Attachment tests in test_messages to test this change.

The new method is:

1. Create a list of potential_attachment_urls in Bugdown while rendering.
2. Loop over this list in do_claim_attachments for the actual claiming.
   For saving:
3. If we claimed an attachment, set message.has_attachment to True.
   For updating:
3. If claimed_attachment != message.has_attachment: update has_attachment.

We do not modify the logic for 'unclaiming' attachments when editing.
2019-12-11 11:03:44 -08:00
Mateusz Mandera bbf2474bd0 tests: setUp overrides should call super().setUp().
MigrationsTestCase is intentionally omitted from this, since migrations
tests are different in their nature and so whatever setUp()
ZulipTestCase may do in the future, MigrationsTestCase may not
necessarily want to replicate.
2019-10-19 17:27:01 -07:00
Mateusz Mandera dbe508bb91 models: Migration of Message.pub_date to date_sent, part 2.
Fixes #1727.

With the server down, apply migrations 0245 and 0246. 0246 will remove
the pub_date column, so it's essential that the previous migrations
ran correctly to copy data before running this.
2019-10-05 19:01:34 -07:00
Mateusz Mandera 4646c7550c test_retention: Prepare for moving system bots to zulipinternal. 2019-07-20 15:08:08 -07:00
Mateusz Mandera d1c2185c81 retention: Archive cross-realm personal messages.
We can simply archive cross-realm personal messages according to the
retention policy of the recipient's realm. It requires adding another
message-archiving query for this case however.

What remains is to figure out how to treat cross-realm huddle messages.
2019-07-08 20:03:20 -07:00
Mateusz Mandera 7950aaea1e retention: Add code for deleting old archive data. 2019-06-26 12:24:47 -07:00
Mateusz Mandera 3ac11a3fc5 retention: Use ON CONFLICT DO UPDATE to handle re-archiving properly.
When archiving Messages, we stop relying on LEFT JOIN ... IS NULL to
avoid duplicates when INSERTing. Instead we use ON CONFLICT DO UPDATE
(added in postgresql 9.5) to, in case of archiving a Message that
already has a corresponding archived objects (this happens if a Message
gets archived, restored and then archived again), re-assign the existing
ArchivedMessage to the new transaction.

This also allows us to fix test_archiving_messages_second_time, which
was temporarily disable a few commits before.
2019-06-26 12:05:59 -07:00
Mateusz Mandera 6e46c6d752 retention: Add functions for restoring archived data.
Functions for restoring archived data are added and existing tests are
expanded to restore data they archived and check correctness.
2019-06-26 12:05:59 -07:00
Mateusz Mandera 9acd3b0f46 retention: Rewrite move_messages_to_archive to use existing functions.
Instead of having a bunch of custom code in the function, we make it use
run_message_batch_query and run_archiving_in_chunks to do the necessary
operations in a consistent way, using the same codepaths as the rest of
the archiving system.
This breaks test_archiving_messages_second_time temporarily, but we will
fix it and re-enable the test in the next commits, where we'll address
various other issues with re-archiving of messages.

We also remove the @transaction.atomic wrapper, because atomicity is
handled by the logic inside run_archiving_in_chunks.
2019-06-26 12:05:59 -07:00
Mateusz Mandera c869ea8e1e test_retention: Factor out a class with shared helper functions. 2019-06-26 12:05:59 -07:00
Mateusz Mandera 7fc48f8b93 test_retention: Check if messages get deleted when archiving.
We add additional checks in _verify_archive_data to make sure the
archived Messages and UserMessages are deleted from their normal tables.
2019-06-26 12:05:59 -07:00
Mateusz Mandera 25810752fe retention: Fully process each Message chunk in a transaction.
To ensure the database retains a consistent state if archiving gets
interrupted, we process each Messages chunk together with related
objects in a single atomic transaction.
2019-06-13 11:17:54 -07:00
Mateusz Mandera f06a4b4eab retention: Batch Message archiving queries.
We batch queries that archive Messages, to limit the maximum amount of
Message objects archived in a single query. This leads to the archiving
of other related objects being batched as well, because we loop over
chunks of archived messages and archive their related objects per-chunk.
2019-06-11 09:25:25 -07:00
Mateusz Mandera 323be57151 retention: If stream has no retention policy set, use realm policy.
We add the following behavior:
If stream has message_retention_days set to -1, archiving for it is
disabled.
If stream has message_retention_days set to null, use the realm's
policy. If the realm has no policy, we don't archive for this stream.
2019-06-06 11:17:42 -07:00
Mateusz Mandera 0e9fa4f028 retention: Support stream-based retention policies.
We change the archiving scheme to allow having stream based retention
policies. In the first step of the archiving process, we loop over
streams and archive their expired messages and related objects.
Then we separately archive all expired personal and huddle messages and
related objects. As the last step, we scan for redundant attachments
which can now be deleted.
To achieve this, we have to rewrite a significant portion of the
retention code and rework some of the database queries.
For the sake of simplicity, we neither archive nor delete cross-realm
messages, except cross-realm stream messages – in their case they can
be processed in the same manner as ordinary stream messages.
In the query for archiving personal and huddle messages we simply
exclude those sent by cross-realm bots.
We change the tests to adapt to these modifications.
2019-06-06 11:17:42 -07:00
Mateusz Mandera 6c3ba25474 retention: Use RETURNING to speed up database queries.
We add RETURNING to fetch relevant message and usermessage ids in
archiving queries and use them to make other queries faster and slower.
A side-effect of this implementation is that with cross-realm messages,
the UserMessage of the recipient and the Message will not be deleted -
but cross-realm messages are rare, will still get correctly put in the
archive tables and so failing to delete should not be a problem for now.
They will be fully handled later.
2019-06-02 14:55:14 -07:00
Mateusz Mandera 4facc93670 retention: Add archiving of SubMessages. 2019-05-30 11:40:20 -07:00
Mateusz Mandera 37c42a09e5 retention: Archiving of models tied to a Message, applied to Reactions.
We add general code that will archive models that are tied to a specific
Message (such as Reactions and SubMessages). Certain details of the
model are grabbed from a list models_with_message_key, and then used to
create queries that will archive these database tables.
We put Reaction in that list in this commit, and add appropriate tests.
To have archiving of other analogical models (for example SubMessage),
one only needs to make an appropriate entry in the
models_with_message_key list.
2019-05-30 11:40:20 -07:00
Mateusz Mandera dfee559333 test_retention: Check that Reactions get correctly deleted. 2019-05-30 11:33:41 -07:00
Mateusz Mandera 29729b7748 test_retention: Check that SubMessages get correctly deleted. 2019-05-30 11:27:38 -07:00
Mateusz Mandera 6d69405f54 test_retention: Keep helper functions in a base class. 2019-05-30 11:27:38 -07:00
Mateusz Mandera 2370e6717c test_retention: Factor out _make_expired_zulip_messages helper function. 2019-05-30 11:27:38 -07:00
Mateusz Mandera 0bf90be886 retention: Clean up and rewrite test_retention.py.
test_retention.py had various issues - we opt for keeping its essence
(what should the tests do and verify), but rewriting a lot of it in
order to have more clarity in what's happening there.
2019-05-27 12:53:32 -07:00
Mateusz Mandera c5ac66b9c8 retention: Split archive_messages code into two functions.
We split archive_messages code into two functions: moving to archive and
cleanup. This allows cleaning up the tests - they can call
these functions directly instead of copying several lines of
archive_messages here and there in multiple tests.
2019-05-27 12:53:32 -07:00
Mateusz Mandera db86043195 test_retention: Quick fix for the remaining test failure.
test_cross_realm_messages_archiving_two_realm_expired doesn't run the
code path patched in commit 3d1aa98b2ea344fba7fbb2373a37d4cf30f53e08i,
so it can still fail. We apply the analogical change in the test as
in the cited commit.
2019-05-22 14:15:18 -07:00
Tim Abbott 3d1aa98b2e retention: Use a consistent ordering for processing realms.
This is probably a good idea for the production use case, since then
there's some consistency of behavior, and if we extend logging, one
knows exactly which realms were or were not executed before a logged
failure.

This fixes the nondeterministic test failures we've been seeing in CI:
if you use `-id` in that order_by, it happens consistently.
2019-05-22 10:48:53 -07:00
Tim Abbott bde9b28589 test_retention: Update debugging code for CI failures.
This should provide more helpful output for the next stage of
debugging.
2019-05-21 14:10:15 -07:00
Tim Abbott 55b15ba117 test_retention: Improve and extent print-debugging.
We needed flush=True to have output not be lost.

Also print the original messages, so we can compare what's missing.
2019-05-21 09:28:03 -07:00
Tim Abbott 1353e94b29 test_retention: Add print-debugging.
We've been seeing nondeterministic failures in this test suite in CI
that we can't reproduce locally; these print statements should help
track them down.
2019-05-20 19:43:28 -07:00
K.Kanakhin e930851d16 retention-period: Add more core code for retention policy.
This is a very old commit for #106, which has been on hiatus for a few
years.  It was significantly modified by tabbott to:
* Improve coding style and variable names
* Update mypy annotations style
* Clean up the testing logic
* Update for API changes elsewhere in our system

But the actual runtime code is essentially unmodified from the
original work by Kirill.

It contains basic support for archiving Messages, UserMessages, and
Attachments with a nice test suite.  It's still not usable in
production (e.g. it will probably break Reactions, SubMessages, etc.),
but upcoming commits will address that.
2019-05-19 20:22:47 -07:00
Anders Kaseorg 3127fb4dbd zerver/tests: Remove unused imports.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
2019-02-02 17:43:03 -08:00
Steve Howell 4b4f27fffb tests: Fix flaky test by using sets, not lists.
The test named `test_archiving_messages_with_attachment`
started flaking recently.  We use sets for comparison
instead of lists to avoid arbitrary sorting differences.
2018-10-25 13:47:37 -05:00
Vishnu Ks 962d72b58b retention: move_messages_to_archive should accept multiple message ids.
This will speed up the scrub realm management command. Calling the
function with a single message_id in a loop was extremely inefficient.
2018-10-11 15:31:12 -07:00
Steve Howell dc70bc1d3f tests: Make retention tests less time-sensitive.
We now update all test messages to have a pub_date
of "now" in the setUp() function in TestRetentionLib.

We've seen tests flake on query counts before this
patch.  It's not certain that the test flaked due
to time-related glitches, but it seems the most
plausible explanation.
2018-08-09 06:13:40 -04:00
Vishnu Ks b9bc1c2b33 Eliminate get_user_profile_by_email from test_classes. 2017-11-26 15:47:56 -08:00
rht 3bf9cd0656 zerver/tests: Use python 3 syntax for typing (part 3). 2017-11-21 22:01:19 -08:00
neiljp (Neil Pilgrim) 6a1786dc1b mypy: Clarify return type of _check_messages_before_archiving. 2017-11-07 11:26:46 -08:00
rht c4fcff7178 refactor: Replace super(.*self) with Python 3-specific super().
We change all the instances except for the `test_helpers.py`
TimeTrackingCursor monkey-patching, which actually needs to specify
the base class.
2017-10-30 14:30:25 -07:00
Steve Howell 8276442ee6 tests: Fix send_message calls in test_retention.py. 2017-10-28 10:20:59 -07:00
rht 691598a88b py3: Remove "from six.moves import range".
This is no longer required, since in Python 3, this is what the range
built-in does.
2017-10-17 23:28:14 -07:00
rht 1e87a4b68c zerver/tests: Remove absolute_import. 2017-09-27 10:00:39 -07:00
K.Kanakhin 2434f2d96c messages: Add support for admins deleting messages.
This makes it possible for Zulip administrators to delete messages.
This is primarily intended for use in deleting early test messages,
but it can solve other problems as well.

Later we'll want to play with the permissions model for this, but for
now, the goal is just to integrate the feature.

Note that it saves the deleted messages for some time using the same
approach as Zulip's message retention policy feature.

Fixes #135.
2017-05-29 21:59:38 -07:00
Vishnu Ks 820dc9dd9a Replace espuser@mit.edu with mit_user('espuser'). 2017-05-22 19:02:42 -07:00
Vishnu Ks 99fc0e9e62 Replace starnine@mit.edu with mit_user('starnine'). 2017-05-22 19:02:42 -07:00
hackerkid b2504084ab Replace timezone.now with timezone_now. 2017-04-16 12:28:56 -07:00
Raghav Jajodia a3a03bd6a5 mypy: Added Dict, List and Set imports.
Fixed mypy errors associated with the upgrade.
2017-03-04 14:33:44 -08:00
Rishi Gupta 1453a5bfda Change string_id of test zephyr realm from mit to zephyr.
Also changes Realm.is_zephyr_mirror_realm to use string_id=zephyr instead of
domain=mit.edu.
Part of a larger migration away from Realm.domain.
2017-03-04 12:18:01 -08:00
Rishi Gupta 494c1a2b55 Remove unnecessary uses of Realm.domain in zerver/tests. 2017-01-09 11:26:08 -08:00
sinwar 4582a98c09 tests: Split out ZulipTestCase and WebhookTestCase to a separate file.
Fixes #1671.
2016-11-10 19:29:43 -08:00
Rishi Gupta 6e9e2cc05f Change datetime.now() to timezone.now() in various tests.
Good practice to use timezone-aware datetime objects unless there is a
reason to do otherwise.
2016-11-07 20:13:53 -08:00
K.Kanakhin 39e0886361 retention-policy: Add tool to determine expired messages.
This is a first step towards implementing a message retention policy
feature.

- Add Realm model message_retention_days field to setup
  messages expired period for realm.
- Add migration.
- Add tool to get expired messages for each Realm.
- Add tests to cover tool for getting expired messages.
2016-10-25 15:38:08 -07:00