Commit Graph

68 Commits

Author SHA1 Message Date
Steve Howell 3b2c881ce6 tests: Decouple test_retention and test_reactions.
We generally want to avoid having two sibling test
suites depend on each other, unless there's a real
compelling reason to share code.  (And if there is
code to share, we can usually promote it to either
test_helpers or ZulipTestCase, as I did here.)

This commit is also prep for the next commit, where
I try to simplify all of the helpers in EmojiReactionBase.

Especially now that we have f-strings, it is usually
better to just call api_post explicitly than to
obscure the mechanism with thin wrappers around
api_post.  Our url schemes are pretty stable, so it's
unlikely that the helpers are actually gonna prevent
future busywork.
2020-07-17 11:04:54 -07:00
Mateusz Mandera 0c6497d43a retention: Add restore_retention_policy_deletions_for_stream function. 2020-06-24 10:40:38 -07:00
Mateusz Mandera 7a03e2a7fe retention: Replace Realm.message_retention_days None value with -1.
To be more consistent with the meaning in the Stream model, and to make
it easier to have a reasonable settings API, we get rid of the None
value for Realm.message_retention_days in favor of the value -1 to
represent the "don't delete messages" default policy.
2020-06-24 10:33:21 -07:00
Aman Agrawal cda7b2f539 deletion: Add support for bulk message deletion events.
This is designed to have no user-facing change unless the client
declares bulk_message_deletion in its client_capabilities.

Clients that do so will receive a single bulk event for bulk deletions
of messages within a single conversation (topic or PM thread).

Backend implementation of #15285.
2020-06-14 22:34:00 -07:00
Anders Kaseorg 365fe0b3d5 python: Sort imports with isort.
Fixes #2665.

Regenerated by tabbott with `lint --fix` after a rebase and change in
parameters.

Note from tabbott: In a few cases, this converts technical debt in the
form of unsorted imports into different technical debt in the form of
our largest files having very long, ugly import sequences at the
start.  I expect this change will increase pressure for us to split
those files, which isn't a bad thing.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-11 16:45:32 -07:00
Anders Kaseorg 69730a78cc python: Use trailing commas consistently.
Automatically generated by the following script, based on the output
of lint with flake8-comma:

import re
import sys

last_filename = None
last_row = None
lines = []

for msg in sys.stdin:
    m = re.match(
        r"\x1b\[35mflake8    \|\x1b\[0m \x1b\[1;31m(.+):(\d+):(\d+): (\w+)", msg
    )
    if m:
        filename, row_str, col_str, err = m.groups()
        row, col = int(row_str), int(col_str)

        if filename == last_filename:
            assert last_row != row
        else:
            if last_filename is not None:
                with open(last_filename, "w") as f:
                    f.writelines(lines)

            with open(filename) as f:
                lines = f.readlines()
            last_filename = filename
        last_row = row

        line = lines[row - 1]
        if err in ["C812", "C815"]:
            lines[row - 1] = line[: col - 1] + "," + line[col - 1 :]
        elif err in ["C819"]:
            assert line[col - 2] == ","
            lines[row - 1] = line[: col - 2] + line[col - 1 :].lstrip(" ")

if last_filename is not None:
    with open(last_filename, "w") as f:
        f.writelines(lines)

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-06-11 16:04:12 -07:00
Anders Kaseorg 67e7a3631d python: Convert percent formatting to Python 3.6 f-strings.
Generated by pyupgrade --py36-plus.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-10 15:02:09 -07:00
Mateusz Mandera b234fe8ccb retention: Pass optional realm argument to move_messages_to_archive.
This allows having the realm field of ArchiveTransaction set instead of
NULL when using move_messages_to_archive.
2020-05-16 14:46:56 -07:00
Mateusz Mandera 812ac4714f retention: Optimize fetching of realms and streams with retention policy. 2020-05-07 16:28:05 -07:00
Anders Kaseorg fead14951c python: Convert assignment type annotations to Python 3.6 style.
This commit was split by tabbott; this piece covers the vast majority
of files in Zulip, but excludes scripts/, tools/, and puppet/ to help
ensure we at least show the right error messages for Xenial systems.

We can likely further refine the remaining pieces with some testing.

Generated by com2ann, with whitespace fixes and various manual fixes
for runtime issues:

-    invoiced_through: Optional[LicenseLedger] = models.ForeignKey(
+    invoiced_through: Optional["LicenseLedger"] = models.ForeignKey(

-_apns_client: Optional[APNsClient] = None
+_apns_client: Optional["APNsClient"] = None

-    notifications_stream: Optional[Stream] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE)
-    signup_notifications_stream: Optional[Stream] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE)
+    notifications_stream: Optional["Stream"] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE)
+    signup_notifications_stream: Optional["Stream"] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE)

-    author: Optional[UserProfile] = models.ForeignKey('UserProfile', blank=True, null=True, on_delete=CASCADE)
+    author: Optional["UserProfile"] = models.ForeignKey('UserProfile', blank=True, null=True, on_delete=CASCADE)

-    bot_owner: Optional[UserProfile] = models.ForeignKey('self', null=True, on_delete=models.SET_NULL)
+    bot_owner: Optional["UserProfile"] = models.ForeignKey('self', null=True, on_delete=models.SET_NULL)

-    default_sending_stream: Optional[Stream] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE)
-    default_events_register_stream: Optional[Stream] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE)
+    default_sending_stream: Optional["Stream"] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE)
+    default_events_register_stream: Optional["Stream"] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE)

-descriptors_by_handler_id: Dict[int, ClientDescriptor] = {}
+descriptors_by_handler_id: Dict[int, "ClientDescriptor"] = {}

-worker_classes: Dict[str, Type[QueueProcessingWorker]] = {}
-queues: Dict[str, Dict[str, Type[QueueProcessingWorker]]] = {}
+worker_classes: Dict[str, Type["QueueProcessingWorker"]] = {}
+queues: Dict[str, Dict[str, Type["QueueProcessingWorker"]]] = {}

-AUTH_LDAP_REVERSE_EMAIL_SEARCH: Optional[LDAPSearch] = None
+AUTH_LDAP_REVERSE_EMAIL_SEARCH: Optional["LDAPSearch"] = None

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-04-22 11:02:32 -07:00
Anders Kaseorg 1cf63eb5bf python: Whitespace fixes from autopep8.
Generated by autopep8, with the setup.cfg configuration from #14532.
I’m not sure why pycodestyle didn’t already flag these.

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-04-21 17:58:09 -07:00
Anders Kaseorg c734bbd95d python: Modernize legacy Python 2 syntax with pyupgrade.
Generated by `pyupgrade --py3-plus --keep-percent-format` on all our
Python code except `zthumbor` and `zulip-ec2-configure-interfaces`,
followed by manual indentation fixes.

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-04-09 16:43:22 -07:00
Stefan Weil d2fa058cc1
text: Fix some typos (most of them found and fixed by codespell).
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-03-27 17:25:56 -07:00
Steve Howell 5e2a32c936 tests: Use users in send_*_message.
This commit mostly makes our tests less
noisy, since emails are no longer an important
detail of sending messages (they're not even
really used in the API).

It also sets us up to have more scrutiny
on delivery_email/email in the future
for things that actually matter.  (This is
a prep commit for something along those
lines, kind of hard to explain the full
plan.)
2020-03-07 18:30:13 -08:00
Mateusz Mandera 2d544250b7 events: Add block for compatibility with old delete_message events. 2020-03-03 15:52:42 -08:00
Mateusz Mandera 7db3d4560f do_delete_messages: Archive the messages in bulk.
The test added in this commit shows 37 queries - compared to 181 without
the change to the function. That seems very much worth it.
2020-02-27 23:12:32 -08:00
Tim Abbott 7ccc8373e2 bugdown: Fix logic for extracting attachment path_id.
In 3892a8afd8, we restructured the
system for managing uploaded files to a much cleaner model where we
just do parsing inside bugdown.

That new model had potentially buggy handling of cases around both
relative URLs and URLS starting with `realm.host`.

We address this by further rewriting the handling of attachments to
avoid regular expressions entirely, instead relying on urllib for
parsing, and having bugdown output `path_id` values, so that there's
no need for any conversions between formats outside bugdowm.

The check_attachment_reference_change function for processing message
updates is significantly simplified in the process.

The new check on the hostname has the side effect of requiring us to
fix some previously weird/buggy test data.

Co-Author-By: Anders Kaseorg <anders@zulipchat.com>
Co-Author-By: Rohitt Vashishtha <aero31aero@gmail.com>
2019-12-12 20:30:26 -08:00
Rohitt Vashishtha 3892a8afd8 messages: Set has_attachment correctly using Bugdown.
Previously, we would naively set has_attachment just by searching
the whole messages for strings like `/user_uploads/...`. We now
prevent running do_claim_attachments for messages that obviously
do not have an attachment in them that we previously ran.

For example: attachments in codeblocks or
             attachments that otherwise do not match our link syntax.

The new implementation runs that check on only the urls that
bugdown determines should be rendered. We also refactor some
Attachment tests in test_messages to test this change.

The new method is:

1. Create a list of potential_attachment_urls in Bugdown while rendering.
2. Loop over this list in do_claim_attachments for the actual claiming.
   For saving:
3. If we claimed an attachment, set message.has_attachment to True.
   For updating:
3. If claimed_attachment != message.has_attachment: update has_attachment.

We do not modify the logic for 'unclaiming' attachments when editing.
2019-12-11 11:03:44 -08:00
Mateusz Mandera bbf2474bd0 tests: setUp overrides should call super().setUp().
MigrationsTestCase is intentionally omitted from this, since migrations
tests are different in their nature and so whatever setUp()
ZulipTestCase may do in the future, MigrationsTestCase may not
necessarily want to replicate.
2019-10-19 17:27:01 -07:00
Mateusz Mandera dbe508bb91 models: Migration of Message.pub_date to date_sent, part 2.
Fixes #1727.

With the server down, apply migrations 0245 and 0246. 0246 will remove
the pub_date column, so it's essential that the previous migrations
ran correctly to copy data before running this.
2019-10-05 19:01:34 -07:00
Mateusz Mandera 4646c7550c test_retention: Prepare for moving system bots to zulipinternal. 2019-07-20 15:08:08 -07:00
Mateusz Mandera d1c2185c81 retention: Archive cross-realm personal messages.
We can simply archive cross-realm personal messages according to the
retention policy of the recipient's realm. It requires adding another
message-archiving query for this case however.

What remains is to figure out how to treat cross-realm huddle messages.
2019-07-08 20:03:20 -07:00
Mateusz Mandera 7950aaea1e retention: Add code for deleting old archive data. 2019-06-26 12:24:47 -07:00
Mateusz Mandera 3ac11a3fc5 retention: Use ON CONFLICT DO UPDATE to handle re-archiving properly.
When archiving Messages, we stop relying on LEFT JOIN ... IS NULL to
avoid duplicates when INSERTing. Instead we use ON CONFLICT DO UPDATE
(added in postgresql 9.5) to, in case of archiving a Message that
already has a corresponding archived objects (this happens if a Message
gets archived, restored and then archived again), re-assign the existing
ArchivedMessage to the new transaction.

This also allows us to fix test_archiving_messages_second_time, which
was temporarily disable a few commits before.
2019-06-26 12:05:59 -07:00
Mateusz Mandera 6e46c6d752 retention: Add functions for restoring archived data.
Functions for restoring archived data are added and existing tests are
expanded to restore data they archived and check correctness.
2019-06-26 12:05:59 -07:00
Mateusz Mandera 9acd3b0f46 retention: Rewrite move_messages_to_archive to use existing functions.
Instead of having a bunch of custom code in the function, we make it use
run_message_batch_query and run_archiving_in_chunks to do the necessary
operations in a consistent way, using the same codepaths as the rest of
the archiving system.
This breaks test_archiving_messages_second_time temporarily, but we will
fix it and re-enable the test in the next commits, where we'll address
various other issues with re-archiving of messages.

We also remove the @transaction.atomic wrapper, because atomicity is
handled by the logic inside run_archiving_in_chunks.
2019-06-26 12:05:59 -07:00
Mateusz Mandera c869ea8e1e test_retention: Factor out a class with shared helper functions. 2019-06-26 12:05:59 -07:00
Mateusz Mandera 7fc48f8b93 test_retention: Check if messages get deleted when archiving.
We add additional checks in _verify_archive_data to make sure the
archived Messages and UserMessages are deleted from their normal tables.
2019-06-26 12:05:59 -07:00
Mateusz Mandera 25810752fe retention: Fully process each Message chunk in a transaction.
To ensure the database retains a consistent state if archiving gets
interrupted, we process each Messages chunk together with related
objects in a single atomic transaction.
2019-06-13 11:17:54 -07:00
Mateusz Mandera f06a4b4eab retention: Batch Message archiving queries.
We batch queries that archive Messages, to limit the maximum amount of
Message objects archived in a single query. This leads to the archiving
of other related objects being batched as well, because we loop over
chunks of archived messages and archive their related objects per-chunk.
2019-06-11 09:25:25 -07:00
Mateusz Mandera 323be57151 retention: If stream has no retention policy set, use realm policy.
We add the following behavior:
If stream has message_retention_days set to -1, archiving for it is
disabled.
If stream has message_retention_days set to null, use the realm's
policy. If the realm has no policy, we don't archive for this stream.
2019-06-06 11:17:42 -07:00
Mateusz Mandera 0e9fa4f028 retention: Support stream-based retention policies.
We change the archiving scheme to allow having stream based retention
policies. In the first step of the archiving process, we loop over
streams and archive their expired messages and related objects.
Then we separately archive all expired personal and huddle messages and
related objects. As the last step, we scan for redundant attachments
which can now be deleted.
To achieve this, we have to rewrite a significant portion of the
retention code and rework some of the database queries.
For the sake of simplicity, we neither archive nor delete cross-realm
messages, except cross-realm stream messages – in their case they can
be processed in the same manner as ordinary stream messages.
In the query for archiving personal and huddle messages we simply
exclude those sent by cross-realm bots.
We change the tests to adapt to these modifications.
2019-06-06 11:17:42 -07:00
Mateusz Mandera 6c3ba25474 retention: Use RETURNING to speed up database queries.
We add RETURNING to fetch relevant message and usermessage ids in
archiving queries and use them to make other queries faster and slower.
A side-effect of this implementation is that with cross-realm messages,
the UserMessage of the recipient and the Message will not be deleted -
but cross-realm messages are rare, will still get correctly put in the
archive tables and so failing to delete should not be a problem for now.
They will be fully handled later.
2019-06-02 14:55:14 -07:00
Mateusz Mandera 4facc93670 retention: Add archiving of SubMessages. 2019-05-30 11:40:20 -07:00
Mateusz Mandera 37c42a09e5 retention: Archiving of models tied to a Message, applied to Reactions.
We add general code that will archive models that are tied to a specific
Message (such as Reactions and SubMessages). Certain details of the
model are grabbed from a list models_with_message_key, and then used to
create queries that will archive these database tables.
We put Reaction in that list in this commit, and add appropriate tests.
To have archiving of other analogical models (for example SubMessage),
one only needs to make an appropriate entry in the
models_with_message_key list.
2019-05-30 11:40:20 -07:00
Mateusz Mandera dfee559333 test_retention: Check that Reactions get correctly deleted. 2019-05-30 11:33:41 -07:00
Mateusz Mandera 29729b7748 test_retention: Check that SubMessages get correctly deleted. 2019-05-30 11:27:38 -07:00
Mateusz Mandera 6d69405f54 test_retention: Keep helper functions in a base class. 2019-05-30 11:27:38 -07:00
Mateusz Mandera 2370e6717c test_retention: Factor out _make_expired_zulip_messages helper function. 2019-05-30 11:27:38 -07:00
Mateusz Mandera 0bf90be886 retention: Clean up and rewrite test_retention.py.
test_retention.py had various issues - we opt for keeping its essence
(what should the tests do and verify), but rewriting a lot of it in
order to have more clarity in what's happening there.
2019-05-27 12:53:32 -07:00
Mateusz Mandera c5ac66b9c8 retention: Split archive_messages code into two functions.
We split archive_messages code into two functions: moving to archive and
cleanup. This allows cleaning up the tests - they can call
these functions directly instead of copying several lines of
archive_messages here and there in multiple tests.
2019-05-27 12:53:32 -07:00
Mateusz Mandera db86043195 test_retention: Quick fix for the remaining test failure.
test_cross_realm_messages_archiving_two_realm_expired doesn't run the
code path patched in commit 3d1aa98b2ea344fba7fbb2373a37d4cf30f53e08i,
so it can still fail. We apply the analogical change in the test as
in the cited commit.
2019-05-22 14:15:18 -07:00
Tim Abbott 3d1aa98b2e retention: Use a consistent ordering for processing realms.
This is probably a good idea for the production use case, since then
there's some consistency of behavior, and if we extend logging, one
knows exactly which realms were or were not executed before a logged
failure.

This fixes the nondeterministic test failures we've been seeing in CI:
if you use `-id` in that order_by, it happens consistently.
2019-05-22 10:48:53 -07:00
Tim Abbott bde9b28589 test_retention: Update debugging code for CI failures.
This should provide more helpful output for the next stage of
debugging.
2019-05-21 14:10:15 -07:00
Tim Abbott 55b15ba117 test_retention: Improve and extent print-debugging.
We needed flush=True to have output not be lost.

Also print the original messages, so we can compare what's missing.
2019-05-21 09:28:03 -07:00
Tim Abbott 1353e94b29 test_retention: Add print-debugging.
We've been seeing nondeterministic failures in this test suite in CI
that we can't reproduce locally; these print statements should help
track them down.
2019-05-20 19:43:28 -07:00
K.Kanakhin e930851d16 retention-period: Add more core code for retention policy.
This is a very old commit for #106, which has been on hiatus for a few
years.  It was significantly modified by tabbott to:
* Improve coding style and variable names
* Update mypy annotations style
* Clean up the testing logic
* Update for API changes elsewhere in our system

But the actual runtime code is essentially unmodified from the
original work by Kirill.

It contains basic support for archiving Messages, UserMessages, and
Attachments with a nice test suite.  It's still not usable in
production (e.g. it will probably break Reactions, SubMessages, etc.),
but upcoming commits will address that.
2019-05-19 20:22:47 -07:00
Anders Kaseorg 3127fb4dbd zerver/tests: Remove unused imports.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
2019-02-02 17:43:03 -08:00
Steve Howell 4b4f27fffb tests: Fix flaky test by using sets, not lists.
The test named `test_archiving_messages_with_attachment`
started flaking recently.  We use sets for comparison
instead of lists to avoid arbitrary sorting differences.
2018-10-25 13:47:37 -05:00
Vishnu Ks 962d72b58b retention: move_messages_to_archive should accept multiple message ids.
This will speed up the scrub realm management command. Calling the
function with a single message_id in a loop was extremely inefficient.
2018-10-11 15:31:12 -07:00