zulip/zerver
Florian Pritz f37ac80384 import_realm: Speed up first_message_id calculation.
On my data (about 10 million messages in 1600 streams) this used to take
about 40 hours, while the improved statement completes in roughly 30
seconds.

The old solution had postgres go through the entire table until the
first match for each stream. Thus, the time spent scanning the table
got longer and longer for each stream because postgres always started at
the beginning (and somehow it did not use any indices) and had to skip
over all rows until it found the first message from the stream that it
was looking for each time.

This new statement just performs a bulk operation, scanning the table
only once and then inserting the results directly into the destination
table.
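
The difference between the two approaches can be sketched as follows.
This is a minimal illustration only, not the statement from this commit:
it uses Python's sqlite3 module instead of postgres, and the table names
(message, stream, stream_first_message) and function names are
hypothetical, chosen just for the example.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE stream (id INTEGER PRIMARY KEY);
        CREATE TABLE message (id INTEGER PRIMARY KEY, stream_id INTEGER);
        CREATE TABLE stream_first_message (
            stream_id INTEGER PRIMARY KEY,
            first_message_id INTEGER
        );
        """
    )
    conn.executemany("INSERT INTO stream (id) VALUES (?)", [(1,), (2,), (3,)])
    conn.executemany(
        "INSERT INTO message (id, stream_id) VALUES (?, ?)",
        [(1, 1), (2, 2), (3, 1), (4, 3), (5, 2)],
    )
    return conn


def first_message_ids_per_stream(conn: sqlite3.Connection) -> dict:
    """Old pattern: one lookup per stream, so N streams mean N queries."""
    result = {}
    for (stream_id,) in conn.execute("SELECT id FROM stream").fetchall():
        row = conn.execute(
            "SELECT id FROM message WHERE stream_id = ? ORDER BY id LIMIT 1",
            (stream_id,),
        ).fetchone()
        result[stream_id] = row[0] if row else None
    return result


def first_message_ids_bulk(conn: sqlite3.Connection) -> dict:
    """New pattern: one grouped pass, written straight to the destination."""
    conn.execute("DELETE FROM stream_first_message")
    conn.execute(
        "INSERT INTO stream_first_message (stream_id, first_message_id) "
        "SELECT stream_id, MIN(id) FROM message GROUP BY stream_id"
    )
    return dict(
        conn.execute(
            "SELECT stream_id, first_message_id FROM stream_first_message"
        )
    )


if __name__ == "__main__":
    demo = build_demo_db()
    # Both approaches agree on the result; only the amount of work differs.
    print(first_message_ids_per_stream(demo))
    print(first_message_ids_bulk(demo))
```

With the per-stream pattern, the cost grows with the number of streams
multiplied by the rows scanned for each one when no index is used. The
grouped statement reads the message table once regardless of how many
streams exist.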

Slightly more verbose information about this change can be found in:
https://chat.zulip.org/#narrow/stream/31-production-help/topic/Import.20Rocketchat.20data/near/1408867

Signed-off-by: Florian Pritz <bluewind@xinu.at>
2022-10-17 11:43:21 -07:00
..
actions message_edit: Support sending notifications with topic changes. 2022-10-11 11:35:41 -07:00
data_import slack: Skip files where file_access: access_denied. 2022-10-11 10:53:16 -07:00
integration_fixtures/nagios
lib import_realm: Speed up first_message_id calculation. 2022-10-17 11:43:21 -07:00
management python: Clean up getattr, setattr, delattr calls with literal names. 2022-10-10 08:40:28 -07:00
migrations migrations: Remove noop realm filters operations. 2022-10-14 17:52:28 -07:00
openapi api-docs: Update examples of queue_id for uuid format. 2022-10-13 10:08:42 -07:00
tests test_classes: Create a dedicate helper for query count check. 2022-10-17 11:32:52 -07:00
tornado python: Replace avoidable uses of __special__ attributes. 2022-10-10 08:32:29 -07:00
views message_edit: Support sending notifications with topic changes. 2022-10-11 11:35:41 -07:00
webhooks python: Mark dict parameters with defaults as read-only. 2022-10-06 13:48:28 -07:00
worker python: Use format string for logging str(obj). 2022-10-10 08:32:29 -07:00
__init__.py
apps.py sentry: Initialize sentry in AppConfig ready hook. 2022-09-26 12:42:36 -07:00
context_processors.py templates: Rename `OPEN_GRAPH` variables to `PAGE` or `PAGE_METADATA`. 2022-09-06 14:57:06 -07:00
decorator.py python: Clean up getattr, setattr, delattr calls with literal names. 2022-10-10 08:40:28 -07:00
filters.py typing: Fix function signatures. 2021-08-20 05:54:19 -07:00
forms.py forms: Fix another 500 error on realm creation with invalid email. 2022-09-19 14:12:32 -07:00
logging_handlers.py python: Use Python 3.8 typing.{Protocol,TypedDict}. 2022-04-27 12:57:49 -07:00
middleware.py python: Clean up getattr, setattr, delattr calls with literal names. 2022-10-10 08:40:28 -07:00
models.py cache: Only cache list results of QuerySets, not the QuerySet itself. 2022-10-12 22:25:48 -07:00
signals.py requirements: Upgrade to Django 4.0. 2022-07-13 16:07:17 -07:00