zulip/analytics/migrations/0015_clear_duplicate_counts.py

from django.db import migrations
from django.db.backends.base.schema import BaseDatabaseSchemaEditor
from django.db.migrations.state import StateApps
from django.db.models import Count, Sum


def clear_duplicate_counts(apps: StateApps, schema_editor: BaseDatabaseSchemaEditor) -> None:
    """This is a preparatory migration for our Analytics tables.

    The backstory is that the unique indexes Django creates for unique_together
    do not enforce uniqueness for rows with subgroup=None, because SQL unique
    indexes treat NULL values as distinct from one another. As a result, under
    race conditions, rather than updating the existing row for a given
    property/(realm, stream, user)/time with subgroup=None, Django would
    create a duplicate row.

    The next migration adds a proper constraint to fix this bug, but any
    existing problematic rows must be fixed before that constraint can be added.

    The fix depends on the type of CountStat object: in most cases we just
    delete the extra rows, but for LoggingCountStat objects we must also
    combine the duplicates' values into the row we keep.
    """
    count_tables = dict(
        realm=apps.get_model("analytics", "RealmCount"),
        user=apps.get_model("analytics", "UserCount"),
        stream=apps.get_model("analytics", "StreamCount"),
        installation=apps.get_model("analytics", "InstallationCount"),
    )

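    for name, count_table in count_tables.items():
        # Fields that identify one logical row; InstallationCount has no
        # per-object foreign key, so it is keyed on property and end_time only.
        group_fields = [name, "property", "end_time"]
        if name == "installation":
            group_fields = ["property", "end_time"]
        # Find each key that has more than one subgroup=None row, along with
        # the summed value of those rows.
        counts = (
            count_table.objects.filter(subgroup=None)
            .values(*group_fields)
            .annotate(Count("id"), Sum("value"))
            .filter(id__count__gt=1)
        )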

        for count in counts:
            count.pop("id__count")
            total_value = count.pop("value__sum")
            # Restrict to subgroup=None so rows with a real subgroup are never
            # touched, and order by id so the surviving row is deterministic.
            duplicate_counts = list(
                count_table.objects.filter(subgroup=None, **count).order_by("id")
            )
            first_count = duplicate_counts[0]
            if count["property"] in ["invites_sent::day", "active_users_log:is_bot:day"]:
                # For LoggingCountStat objects, the right fix is to combine the
                # totals into the row we keep.
                first_count.value = total_value
                first_count.save()
            # For other CountStat objects, we expect the duplicates to have the
            # same value, so all we need to do is delete the extras.
            to_cleanup = duplicate_counts[1:]
            for duplicate_count in to_cleanup:
                duplicate_count.delete()


class Migration(migrations.Migration):
    dependencies = [
        ("analytics", "0014_remove_fillstate_last_modified"),
    ]

    operations = [
        migrations.RunPython(clear_duplicate_counts, reverse_code=migrations.RunPython.noop),
    ]
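
The cleanup rule in the migration above can be sketched without a database. This is only an illustration under stated assumptions: `Row` and `dedupe` are hypothetical names that do not exist in Zulip, rows are plain dataclasses rather than model instances, and "some_other_stat::day" is a made-up property standing in for any non-logging stat. The two logging property names are taken from the migration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Properties whose duplicate rows are summed (taken from the migration above).
LOGGING_PROPERTIES = {"invites_sent::day", "active_users_log:is_bot:day"}


@dataclass
class Row:
    """Hypothetical stand-in for one row of a *Count table."""

    id: int
    realm: int
    property: str
    end_time: str
    value: int
    subgroup: Optional[str] = None


def dedupe(rows: List[Row]) -> List[Row]:
    """Return the rows that survive cleanup, merging values where required."""
    groups: Dict[Tuple[int, str, str], List[Row]] = defaultdict(list)
    # Only subgroup=None rows can be duplicated by the bug; group them by key.
    for row in sorted(rows, key=lambda r: r.id):
        if row.subgroup is None:
            groups[(row.realm, row.property, row.end_time)].append(row)

    deleted_ids = set()
    for (_realm, prop, _end_time), dupes in groups.items():
        if len(dupes) < 2:
            continue
        first = dupes[0]
        if prop in LOGGING_PROPERTIES:
            # Logging stats: each duplicate holds part of the true total.
            first.value = sum(r.value for r in dupes)
        # Other stats: duplicates are expected to be identical, so just drop them.
        deleted_ids.update(r.id for r in dupes[1:])
    return [r for r in rows if r.id not in deleted_ids]
```

For example, two "invites_sent::day" rows with values 2 and 3 collapse into one row with value 5, while two identical rows of a non-logging stat collapse into one row with its value unchanged.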

Commit history (from the blame view, oldest first):

2020-02-29 22:48:15 +01:00  analytics: Fix missing unique constraint when subgroup is null.
    Replaced unique_together with UniqueConstraint in models that covered nullable fields as in unique_together database indexes don't work where subgroup=None. So added conditional unique index handling invalid duplicate Count data. Added 0015_clear_duplicate_counts migration to handle existing data that violates the constraints. Also corrected a test case in test_counts.py which didn't clear its state properly and thus was accidentally taking advantage of this database schema bug.

2020-03-06 21:08:14 +01:00  migrations: Fix unused import error.

2020-06-11 00:54:34 +02:00  python: Sort imports with isort.
    Fixes #2665. Regenerated by tabbott with `lint --fix` after a rebase and change in parameters. Note from tabbott: In a few cases, this converts technical debt in the form of unsorted imports into different technical debt in the form of our largest files having very long, ugly import sequences at the start. I expect this change will increase pressure for us to split those files, which isn't a bad thing. Signed-off-by: Anders Kaseorg <anders@zulip.com>

2020-07-30 22:55:02 +02:00  migrations: Upgrade migrations to remove duplicates in all Count tables.
    This commit upgrades 0015_clear_duplicate_counts migration to remove duplicate count in StreamCount, UserCount, InstallationCount as well. Fixes https://github.com/zulip/docker-zulip/issues/266

2021-02-12 08:19:30 +01:00  python: Reformat with Black, except quotes.
    Signed-off-by: Anders Kaseorg <anders@zulip.com>

2021-02-12 08:20:45 +01:00  python: Normalize quotes with Black.
    Signed-off-by: Anders Kaseorg <anders@zulip.com>

2022-05-27 23:33:51 +02:00  typing: Use BaseDatabaseSchemaEditor in place of DatabaseSchemaEditor.
    This is a part of #18777. Signed-off-by: Zixuan James Li <359101898@qq.com>
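
The underlying database behavior can be reproduced in a few lines. The sketch below is an illustration only: it uses SQLite from the Python standard library instead of PostgreSQL, and the `realm_count` table, the index names, and `demo_null_uniqueness` are all hypothetical simplifications. It shows that a unique index spanning a nullable column accepts duplicate rows when that column is NULL, that a conditional unique index cannot be created while duplicates exist (which is why a cleanup step has to run first), and that once the duplicates are gone the conditional index rejects new ones.

```python
import sqlite3


def demo_null_uniqueness():
    """Return (duplicate_rows, index_failed_before_cleanup, duplicate_rejected)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical, simplified stand-in for a count table.
    conn.execute(
        "CREATE TABLE realm_count ("
        "id INTEGER PRIMARY KEY, realm_id INTEGER, property TEXT, "
        "subgroup TEXT, end_time TEXT, value INTEGER)"
    )
    # A single unique index over all four columns, including nullable subgroup.
    conn.execute(
        "CREATE UNIQUE INDEX uniq_all "
        "ON realm_count (realm_id, property, subgroup, end_time)"
    )
    insert = (
        "INSERT INTO realm_count (realm_id, property, subgroup, end_time, value) "
        "VALUES (?, ?, ?, ?, ?)"
    )
    row = (1, "invites_sent::day", None, "2020-02-29", 3)
    conn.execute(insert, row)
    conn.execute(insert, row)  # accepted: NULL subgroups compare as distinct
    duplicate_rows = conn.execute("SELECT COUNT(*) FROM realm_count").fetchone()[0]

    # Conditional (partial) unique index covering only subgroup IS NULL rows.
    partial_index = (
        "CREATE UNIQUE INDEX uniq_null_subgroup "
        "ON realm_count (realm_id, property, end_time) WHERE subgroup IS NULL"
    )
    try:
        conn.execute(partial_index)
        index_failed_before_cleanup = False
    except sqlite3.Error:
        # Existing duplicates block the new index until they are cleaned up.
        index_failed_before_cleanup = True

    # Keep the lowest id per key and delete the rest, then retry the index.
    conn.execute(
        "DELETE FROM realm_count WHERE id NOT IN "
        "(SELECT MIN(id) FROM realm_count GROUP BY realm_id, property, end_time)"
    )
    conn.execute(partial_index)
    try:
        conn.execute(insert, row)
        duplicate_rejected = False
    except sqlite3.IntegrityError:
        duplicate_rejected = True
    conn.close()
    return duplicate_rows, index_failed_before_cleanup, duplicate_rejected
```

The cleanup here simply keeps one row per key; it does not sum values the way the migration does for LoggingCountStat properties.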