migrations: Upgrade migrations to remove duplicates in all Count tables.

This commit upgrades the 0015_clear_duplicate_counts migration to also
remove duplicate counts in StreamCount, UserCount, and InstallationCount.

Fixes https://github.com/zulip/docker-zulip/issues/266
arpit551 2020-07-31 02:25:02 +05:30 committed by Tim Abbott
parent 9c317b0495
commit a68d38cc52
1 changed file with 24 additions and 18 deletions


@@ -10,7 +10,7 @@ def clear_duplicate_counts(apps: StateApps, schema_editor: DatabaseSchemaEditor)
     The backstory is that Django's unique_together indexes do not properly
     handle the subgroup=None corner case (allowing duplicate rows that have a
     subgroup of None), which meant that in race conditions, rather than updating
-    an existing row for the property/realm/time with subgroup=None, Django would
+    an existing row for the property/(realm, stream, user)/time with subgroup=None, Django would
     create a duplicate row.
 
     In the next migration, we'll add a proper constraint to fix this bug, but
@@ -20,18 +20,24 @@ def clear_duplicate_counts(apps: StateApps, schema_editor: DatabaseSchemaEditor)
     this means deleting the extra rows, but for LoggingCountStat objects, we need to
     additionally combine the sums.
     """
-    RealmCount = apps.get_model('analytics', 'RealmCount')
+    count_tables = dict(realm=apps.get_model('analytics', 'RealmCount'),
+                        user=apps.get_model('analytics', 'UserCount'),
+                        stream=apps.get_model('analytics', 'StreamCount'),
+                        installation=apps.get_model('analytics', 'InstallationCount'))
 
-    realm_counts = RealmCount.objects.filter(subgroup=None).values(
-        'realm_id', 'property', 'end_time').annotate(
-            Count('id'), Sum('value')).filter(id__count__gt=1)
+    for name, count_table in count_tables.items():
+        value = [name, 'property', 'end_time']
+        if name == 'installation':
+            value = ['property', 'end_time']
+        counts = count_table.objects.filter(subgroup=None).values(*value).annotate(
+            Count('id'), Sum('value')).filter(id__count__gt=1)
 
-    for realm_count in realm_counts:
-        realm_count.pop('id__count')
-        total_value = realm_count.pop('value__sum')
-        duplicate_counts = list(RealmCount.objects.filter(**realm_count))
-        first_count = duplicate_counts[0]
-        if realm_count['property'] in ["invites_sent::day", "active_users_log:is_bot:day"]:
-            # For LoggingCountStat objects, the right fix is to combine the totals;
-            # for other CountStat objects, we expect the duplicates to have the same value.
-            # And so all we need to do is delete them.
+        for count in counts:
+            count.pop('id__count')
+            total_value = count.pop('value__sum')
+            duplicate_counts = list(count_table.objects.filter(**count))
+            first_count = duplicate_counts[0]
+            if count['property'] in ["invites_sent::day", "active_users_log:is_bot:day"]:
+                # For LoggingCountStat objects, the right fix is to combine the totals;
+                # for other CountStat objects, we expect the duplicates to have the same value.
+                # And so all we need to do is delete them.