# zulip/zerver/management/commands/create_large_indexes.py

from typing import Any

from django.db import connection

from zerver.lib.management import ZulipBaseCommand

def create_index_if_not_exist(index_name: str, table_name: str,
                              column_string: str, where_clause: str) -> None:
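    #
    # This function is similar to
    # zerver.lib.migrate.create_index_if_not_exist.  That function is used
    # by Django migrations; this one issues SQL that Django migrations do
    # not support.
    #
    # CREATE INDEX CONCURRENTLY builds an index without blocking writes to
    # the table, so it can run against huge tables such as
    # zerver_usermessage while the server is still up during an upgrade.
    # The tradeoff is that a concurrent build takes considerably longer
    # than a regular one.
    #
    # Postgres refuses to run the statement inside a transaction block or
    # as part of a larger query string, so seemingly reasonable code fails
    # with this error:
    #
    #   CREATE INDEX CONCURRENTLY cannot be executed from a function or multi-command string
    #
    # For this reason the CREATE statement below is sent on its own, in a
    # separate cursor.execute() call.  For a lot more detail on this
    # process, refer to the commit message that added this file to the repo.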

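    with connection.cursor() as cursor:
        # Any relation with this name counts as "already exists".  If an
        # index by this name was created incorrectly (or was left INVALID by
        # a failed concurrent build), drop it by hand, e.g. in dbshell, and
        # re-run this command.
        sql = '''
            SELECT 1
            FROM pg_class
            WHERE relname = %s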
            '''
        cursor.execute(sql, [index_name])
        rows = cursor.fetchall()
        if len(rows) > 0:
            print('Index %s already exists.' % (index_name,))
            return

        print("Creating index %s." % (index_name,))
        sql = '''
            CREATE INDEX CONCURRENTLY
            %s
            ON %s (%s)
            %s;
            ''' % (index_name, table_name, column_string, where_clause)
        cursor.execute(sql)
        print('Finished creating %s.' % (index_name,))

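# Sketch of a safer way to build the CREATE statement above (hypothetical
# helper, not used by this command).  Postgres does not accept identifiers
# as bound query parameters, so create_index_if_not_exist() fills in the
# index, table, and column names with % formatting; that is only safe
# because every caller in this file passes hard-coded values.  If the names
# ever came from elsewhere, they could be validated first.  The identifier
# rule below (lowercase letters, digits, underscores, not starting with a
# digit) is an assumption chosen to fit the names in this file and is
# stricter than what Postgres allows.  The WHERE clause is still trusted.
import re
from typing import Sequence

# Assumed identifier rule; see the note above.
_SIMPLE_IDENTIFIER = re.compile(r'[a-z_][a-z0-9_]*\Z')


def build_concurrent_index_sql(index_name: str, table_name: str,
                               columns: Sequence[str],
                               where_clause: str) -> str:
    """Return a single CREATE INDEX CONCURRENTLY statement."""
    for name in [index_name, table_name, *columns]:
        if not _SIMPLE_IDENTIFIER.match(name):
            raise ValueError('unsafe identifier: %r' % (name,))
    # Exactly one statement, since CONCURRENTLY cannot be part of a
    # multi-command string.
    return 'CREATE INDEX CONCURRENTLY %s ON %s (%s) %s;' % (
        index_name, table_name, ', '.join(columns), where_clause)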

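def create_indexes() -> None:
    """Create the large partial indexes on zerver_usermessage, if missing.

    Each call below is copied from the migration named in its comment
    (0082, 0083, 0095, 0098, 0099, 0177, 0180), so it uses the same index
    name, columns, and WHERE clause as that migration.  Indexes that already
    exist are skipped, so running this command again is harmless.  This
    assumes indexes are only ever added; if a later migration drops one of
    them, this command will need to become migrations-aware.
    """
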
    # copied from 0082
    create_index_if_not_exist(
        index_name='zerver_usermessage_starred_message_id',
        table_name='zerver_usermessage',
        column_string='user_profile_id, message_id',
        where_clause='WHERE (flags & 2) != 0',
    )

    # copied from 0083
    create_index_if_not_exist(
        index_name='zerver_usermessage_mentioned_message_id',
        table_name='zerver_usermessage',
        column_string='user_profile_id, message_id',
        where_clause='WHERE (flags & 8) != 0',
    )

    # copied from 0095
    create_index_if_not_exist(
        index_name='zerver_usermessage_unread_message_id',
        table_name='zerver_usermessage',
        column_string='user_profile_id, message_id',
        where_clause='WHERE (flags & 1) = 0',
    )

    # copied from 0098
    create_index_if_not_exist(
        index_name='zerver_usermessage_has_alert_word_message_id',
        table_name='zerver_usermessage',
        column_string='user_profile_id, message_id',
        where_clause='WHERE (flags & 512) != 0',
    )

    # copied from 0099
    create_index_if_not_exist(
        index_name='zerver_usermessage_wildcard_mentioned_message_id',
        table_name='zerver_usermessage',
        column_string='user_profile_id, message_id',
        where_clause='WHERE (flags & 8) != 0 OR (flags & 16) != 0',
    )

    # copied from 0177
    create_index_if_not_exist(
        index_name='zerver_usermessage_is_private_message_id',
        table_name='zerver_usermessage',
        column_string='user_profile_id, message_id',
        where_clause='WHERE (flags & 2048) != 0',
    )

    # copied from 0180
    create_index_if_not_exist(
        index_name='zerver_usermessage_active_mobile_push_notification_id',
        table_name='zerver_usermessage',
        column_string='user_profile_id, message_id',
        where_clause='WHERE (flags & 4096) != 0',
    )

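# Sketch of what the WHERE clauses in create_indexes() select (hypothetical
# helpers, not used by this command).  Each index is a partial index over
# bits of UserMessage.flags, so Postgres only stores rows whose flags match
# the clause, which can make these indexes much smaller than a full index
# on the table.  The bit values come from the WHERE clauses above; the flag
# names are inferred from the index names ('read' from the unread index,
# whose clause (flags & 1) = 0 selects rows with that bit unset).
from typing import Iterable

FLAG_BITS = {
    'read': 1,
    'starred': 2,
    'mentioned': 8,
    'wildcard_mentioned': 16,
    'has_alert_word': 512,
    'is_private': 2048,
    'active_mobile_push_notification': 4096,
}


def any_flag_set_clause(bits: Iterable[int]) -> str:
    """Build a WHERE clause matching rows with any of `bits` set."""
    return 'WHERE ' + ' OR '.join('(flags & %d) != 0' % (bit,)
                                  for bit in bits)


def in_partial_index(flags: int, bits: Iterable[int]) -> bool:
    """Python equivalent of any_flag_set_clause(bits) for one row's flags."""
    return any((flags & bit) != 0 for bit in bits)


def is_unread(flags: int) -> bool:
    """Python equivalent of the unread index's (flags & 1) = 0 clause."""
    return (flags & FLAG_BITS['read']) == 0
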
class Command(ZulipBaseCommand):
    help = """Create concurrent indexes for large tables."""

    def handle(self, *args: Any, **options: str) -> None:
        create_indexes()
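# Sketch of a stricter existence check (hypothetical, not used by this
# command).  create_index_if_not_exist() treats any relation named
# index_name as done, but a CREATE INDEX CONCURRENTLY that fails partway
# through leaves an INVALID index behind under that name.  Postgres records
# this in pg_index.indisvalid, so a check could run INDEX_STATE_SQL with
# [index_name] as the parameter and pass cursor.fetchall() to
# classify_index().  An 'invalid' result means the index should be dropped
# before it is created again.
from typing import Sequence, Tuple

INDEX_STATE_SQL = '''
    SELECT i.indisvalid
    FROM pg_class c
    JOIN pg_index i ON i.indexrelid = c.oid
    WHERE c.relname = %s
    '''


def classify_index(rows: Sequence[Tuple[bool]]) -> str:
    """Map INDEX_STATE_SQL rows to 'missing', 'valid', or 'invalid'."""
    if not rows:
        return 'missing'
    # relname is unique only within a schema; this sketch assumes at most
    # one match, as in a single-schema setup.
    return 'valid' if rows[0][0] else 'invalid'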