zulip/zerver/lib
Florian Pritz f37ac80384 import_realm: Speed up first_message_id calculation.
On my data (about 10 million messages in 1600 streams) this used to take
about 40 hours, while the improved statement completes in roughly 30
seconds.

The old solution had postgres go through the entire table until the
first match for each stream. Thus, the time spent scanning the table
got longer and longer for each stream because postgres always started at
the beginning (and somehow it did not use any indices) and had to skip
over all rows until it found the first message from the stream that it
was looking for each time.

This new statement just performs a single bulk operation: it scans the
table only once and then inserts the results directly into the
destination table.
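The difference between the two approaches can be sketched on a toy SQLite database. This is an illustration under stated assumptions. The `stream` and `message` tables, their columns, and the helper functions are hypothetical simplifications. They are not Zulip's real schema or the SQL in `import_realm.py`, and the real change targets PostgreSQL. The slow pattern runs one "first message" lookup per stream. The bulk pattern aggregates `message` once with `GROUP BY` and writes every result in one pass.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory database (hypothetical, simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE stream (
            id INTEGER PRIMARY KEY,
            first_message_id INTEGER
        );
        CREATE TABLE message (
            id INTEGER PRIMARY KEY,
            stream_id INTEGER NOT NULL
        );
        """
    )
    conn.executemany("INSERT INTO stream (id) VALUES (?)", [(1,), (2,), (3,)])
    # Stream 3 has no messages, so its first_message_id should end up NULL.
    conn.executemany(
        "INSERT INTO message (id, stream_id) VALUES (?, ?)",
        [(10, 2), (11, 1), (12, 2), (13, 1), (14, 2)],
    )
    return conn


def per_stream_update(conn: sqlite3.Connection) -> None:
    """Slow pattern: one query per stream, each walking message from the start."""
    stream_ids = [row[0] for row in conn.execute("SELECT id FROM stream")]
    for stream_id in stream_ids:
        row = conn.execute(
            "SELECT id FROM message WHERE stream_id = ? ORDER BY id LIMIT 1",
            (stream_id,),
        ).fetchone()
        conn.execute(
            "UPDATE stream SET first_message_id = ? WHERE id = ?",
            (row[0] if row else None, stream_id),
        )


def bulk_update(conn: sqlite3.Connection) -> None:
    """Bulk pattern: aggregate message once, then write all results."""
    conn.executescript(
        """
        -- One pass over message: the lowest message id per stream.
        CREATE TEMP TABLE first_ids (
            stream_id INTEGER PRIMARY KEY,
            first_id INTEGER NOT NULL
        );
        INSERT INTO first_ids (stream_id, first_id)
            SELECT stream_id, MIN(id) FROM message GROUP BY stream_id;
        -- Each stream looks up its row in the small keyed table;
        -- streams without messages get NULL.
        UPDATE stream SET first_message_id = (
            SELECT first_id FROM first_ids WHERE first_ids.stream_id = stream.id
        );
        DROP TABLE first_ids;
        """
    )


def first_ids(conn: sqlite3.Connection) -> dict:
    """Return {stream_id: first_message_id} for inspection."""
    return dict(conn.execute("SELECT id, first_message_id FROM stream ORDER BY id"))


if __name__ == "__main__":
    fast = make_demo_db()
    bulk_update(fast)
    print(first_ids(fast))  # {1: 11, 2: 10, 3: None}
```

Both functions produce the same result. The per-stream version issues one query for each stream, which is the pattern that degraded as the number of streams grew. The sketch uses a temporary keyed table so that it also runs on older SQLite versions. PostgreSQL can express the same idea as a single `UPDATE ... FROM (subquery)` statement.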

Slightly more verbose information about this change can be found in:
https://chat.zulip.org/#narrow/stream/31-production-help/topic/Import.20Rocketchat.20data/near/1408867

Signed-off-by: Florian Pritz <bluewind@xinu.at>
2022-10-17 11:43:21 -07:00
markdown linkifiers: Support %20 in URLs for topic links. 2022-10-11 14:31:13 -07:00
url_preview python: Use Python 3.8 typing.{Protocol,TypedDict}. 2022-04-27 12:57:49 -07:00
webhooks python: Clean up getattr, setattr, delattr calls with literal names. 2022-10-10 08:40:28 -07:00
__init__.py
addressee.py docs: Fix many spelling mistakes. 2022-02-07 18:51:06 -08:00
alert_words.py docs: Remove highlight parameters from links. 2022-02-16 13:15:39 -08:00
async_utils.py requirements: Upgrade Python requirements. 2022-05-03 10:10:06 -07:00
attachments.py
avatar.py avatar: Remove ?x=x kludge. 2021-10-14 12:47:43 -07:00
avatar_hash.py settings: Make AVATAR_SALT mandatory. 2022-08-25 12:13:03 -07:00
bot_config.py bot_config: Placate mypy 0.930. 2021-12-28 09:31:55 -08:00
bot_lib.py actions: Split out zerver.actions.message_send. 2022-04-14 17:14:34 -07:00
bot_storage.py
bulk_create.py streams: Set can_remove_subscribers_group while creating streams. 2022-09-14 16:03:11 -07:00
cache.py cache: Log a warning when attempting to store a whole QuerySet. 2022-10-12 22:25:48 -07:00
cache_helpers.py typing: Import ValuesQuerySet alias from django_stubs_ext. 2022-10-05 16:15:56 -07:00
camo.py typing: Apply trivial none-checks with assertions as necessary. 2022-06-23 19:25:48 -07:00
ccache.py
compatibility.py requirements: Upgrade to Django 4.0. 2022-07-13 16:07:17 -07:00
context_managers.py
create_user.py python: Clean up getattr, setattr, delattr calls with literal names. 2022-10-10 08:40:28 -07:00
data_types.py
db.py db: Use cursor_factory psycopg2 option. 2022-07-05 17:54:17 -07:00
debug.py python: Accept Optional[FrameType] in signal handlers. 2021-12-28 09:31:55 -08:00
dev_ldap_directory.py python: Use a real parser for email addresses. 2022-07-29 15:47:33 -07:00
digest.py typing: Broaden type annotations for QuerySet compatibility. 2022-07-07 11:27:42 -07:00
display_recipient.py typing: Import ValuesQuerySet alias from django_stubs_ext. 2022-10-05 16:15:56 -07:00
domains.py
drafts.py typing: Remove ViewFuncT. 2022-08-22 15:46:16 -07:00
email_mirror.py email_mirror: Remove limits (expiry, max uses) to improve usability. 2022-09-16 18:07:28 -07:00
email_mirror_helpers.py email_mirror: Move ZulipEmailForwardUserError into email_mirror_helpers. 2021-08-31 16:37:54 -07:00
email_notifications.py email_notifications: Highlight personal mentions in explanations. 2022-09-29 15:54:21 -07:00
email_validation.py python: Use a real parser for email addresses. 2022-07-29 15:47:33 -07:00
emoji.py emoji: Add which emoji are supported to the /register call. 2022-08-26 17:58:31 -07:00
error_notify.py error_notify: Fix type narrowing of settings.ERROR_BOT. 2022-07-15 14:00:56 -07:00
event_schema.py message_edit: Send only changed settings in event data and api response. 2022-09-28 11:47:40 -07:00
events.py events: Send empty list for custom_profile_fields in spectator view. 2022-10-14 13:05:35 -07:00
exceptions.py exceptions: Guard validation error conversion with message_dict. 2022-07-26 14:17:46 -07:00
export.py export: Remove unnecessary if in export with consent code. 2022-09-27 11:56:27 -07:00
external_accounts.py typing: Import StrPromise alias from django_stubs_ext. 2022-10-05 16:15:56 -07:00
fix_unreads.py python: Use format string for logging str(obj). 2022-10-10 08:32:29 -07:00
generate_test_data.py
github.py fetch-contributor-data: Use builtin backoff. 2021-09-01 05:34:13 -07:00
home.py home: Prevent mypy from inferring the type of page_params. 2022-06-23 22:09:05 -07:00
hotspots.py typing: Import StrPromise alias from django_stubs_ext. 2022-10-05 16:15:56 -07:00
html_diff.py html_diff: Handle empty differences between empty strings. 2021-10-18 18:27:40 -07:00
html_to_text.py
i18n.py django: Use HttpRequest.headers. 2022-05-13 20:42:20 -07:00
import_realm.py import_realm: Speed up first_message_id calculation. 2022-10-17 11:43:21 -07:00
initial_password.py initial_password: Add explicit development environment assertion. 2022-03-21 12:05:59 -07:00
integrations.py typing: Import StrPromise alias from django_stubs_ext. 2022-10-05 16:15:56 -07:00
logging_util.py python: Clean up getattr, setattr, delattr calls with literal names. 2022-10-10 08:40:28 -07:00
management.py python: Replace avoidable uses of __special__ attributes. 2022-10-10 08:32:29 -07:00
mdiff.py python: Replace universal_newlines with text. 2022-01-23 22:16:01 -08:00
mention.py markdown: Update characters allowed before @ and stream mentions. 2022-08-06 19:29:39 -07:00
message.py typing: Import ValuesQuerySet alias from django_stubs_ext. 2022-10-05 16:15:56 -07:00
migrate.py
mobile_auth_otp.py
name_restrictions.py name_restrictions: Add your-org.zulipchat.com as a reserved name. 2022-05-17 14:58:31 -07:00
narrow.py message_fetch: Move narrowing query builder to zerver.lib.narrow. 2022-09-27 17:02:10 -07:00
notes.py notes: Separate __notes_map per-subclass. 2022-10-10 08:42:13 -07:00
notification_data.py notifications: Move user group mentions helpers together. 2022-04-27 16:43:54 -07:00
onboarding.py onboarding: Use dictionary comprehension for dict initialization. 2022-08-06 16:21:12 -07:00
outgoing_http.py python: Replace requests.packages.urllib3 alias with urllib3. 2022-01-23 22:14:17 -08:00
outgoing_webhook.py actions: Split out zerver.actions.message_send. 2022-04-14 17:14:34 -07:00
presence.py user-presence: Refactor function names with "status" for clarity. 2022-09-23 12:27:54 -07:00
profile.py profile: Strengthen decorator types using ParamSpec. 2022-04-14 12:44:35 -07:00
push_notifications.py python: Mark dict parameters with defaults as read-only. 2022-10-06 13:48:28 -07:00
pysa.py
queue.py requirements: Upgrade to Tornado 6. 2022-05-02 17:41:49 -07:00
rate_limiter.py rate_limit: Remove rate_limit_ip. 2022-08-17 12:05:38 -07:00
realm_description.py
realm_icon.py
realm_logo.py realm: Rename plan type constants to be more descriptive. 2021-10-19 12:20:39 -07:00
recipient_users.py actions: Split out zerver.lib.recipient_users. 2022-04-14 17:14:30 -07:00
redis_utils.py
remote_server.py python: Replace avoidable uses of __special__ attributes. 2022-10-10 08:32:29 -07:00
request.py python: Clean up getattr, setattr, delattr calls with literal names. 2022-10-10 08:40:28 -07:00
response.py response: Replace json_unauthorized with UnauthorizedError. 2022-07-18 18:01:42 -07:00
rest.py typing: Remove ViewFuncT. 2022-08-22 15:46:16 -07:00
retention.py retention: Inline move_rows query arguments. 2022-07-30 06:46:34 -07:00
safe_session_cached_db.py session: Enforce that changes cannot happen in a transaction. 2022-03-15 13:52:15 -07:00
scim.py mypy: Enable redundant-expr errors. 2022-06-23 19:22:12 -07:00
scim_filter.py scim: Order Users by id when queried using filter syntax. 2021-11-26 16:06:16 -08:00
send_email.py python: Use format string for logging str(obj). 2022-10-10 08:32:29 -07:00
server_initialization.py realms: Create default system user groups for internal realm. 2022-08-11 04:38:36 -07:00
sessions.py typing: Add none-checks for miscellaneous cases. 2022-05-31 09:43:55 -07:00
singleton_bmemcached.py cache: Instantiate only one BMemcached cache backend. 2022-05-02 17:41:49 -07:00
soft_deactivation.py soft_deactivation: Tighten function signatures with generic QuerySet. 2022-07-07 11:28:13 -07:00
sounds.py actions: Split out zerver.lib.sounds. 2022-04-14 14:26:40 -07:00
sqlalchemy_utils.py sqlalchemy_utils: Remove NonClosingPool.recreate override. 2022-02-10 11:59:41 -08:00
storage.py storage: Fix type annotation of content. 2022-07-27 13:46:13 -07:00
stream_color.py streams: Extract stream_color library. 2022-03-14 18:01:36 -07:00
stream_subscription.py typing: Import ValuesQuerySet alias from django_stubs_ext. 2022-10-05 16:15:56 -07:00
stream_topic.py stream_topic: Refactor user_ids_muting_topic. 2022-09-27 17:18:48 -07:00
stream_traffic.py streams: Extract stream_traffic library. 2022-03-14 18:01:36 -07:00
streams.py streams: Fix can_remove_subscribers_from_stream type. 2022-09-19 13:53:44 -07:00
string_validation.py email_mirror: Replace disallowed characters in incoming email subject. 2022-08-22 17:16:20 -07:00
subdomains.py subdomains: Fix realm=None case for is_static_or_current_realm_url. 2022-10-06 15:15:10 -07:00
subscription_info.py typing: Use django-stubs' type annotations for QuerySet. 2022-10-05 16:15:56 -07:00
templates.py templates: Provide proper error message if entrypoint is not defined. 2022-08-30 16:02:06 -07:00
test_classes.py test_classes: Create a dedicate helper for query count check. 2022-10-17 11:32:52 -07:00
test_console_output.py requirements: Upgrade Python requirements. 2022-05-03 10:10:06 -07:00
test_data.source.txt Rename default branch to ‘main’. 2021-09-06 12:56:35 -07:00
test_fixtures.py test_fixtures: Rebuild database when create_realm.py changes. 2022-08-12 13:16:35 -07:00
test_helpers.py test_helpers: Tighten type annotation for queries_captured. 2022-10-17 11:32:52 -07:00
test_runner.py requirements: Upgrade Django to 4.1. 2022-10-06 15:59:07 -07:00
tex.py python: Replace universal_newlines with text. 2022-01-23 22:16:01 -08:00
thumbnail.py docs: Remove some outdated references to thumbnailing.md doc. 2022-07-12 17:44:24 -07:00
timeout.py timeout: Minor comment cleanups. 2022-04-07 17:26:01 -07:00
timestamp.py docs: Add missing space in “time zone”. 2022-02-24 14:05:12 -08:00
timezone.py timezone: Improve tzdata parser’s compatibility with zic(8). 2022-09-20 16:58:31 -07:00
topic.py topic: Add a None check with an assertion. 2022-08-12 17:08:04 -07:00
transfer.py python: Clean up getattr, setattr, delattr calls with literal names. 2022-10-10 08:40:28 -07:00
types.py typing: Import StrPromise alias from django_stubs_ext. 2022-10-05 16:15:56 -07:00
unminify.py
upload.py upload: Remove `mimetype` url parameter in `get_file_info`. 2022-08-08 16:06:09 -07:00
url_encoding.py python: Use a real parser for email addresses. 2022-07-29 15:47:33 -07:00
url_redirects.py help: Restructure "Mastering the compose box" article. 2022-09-22 15:20:37 -07:00
user_agent.py
user_counts.py actions: Split out zerver.lib.user_counts. 2022-04-14 17:14:30 -07:00
user_groups.py typing: Import ValuesQuerySet alias from django_stubs_ext. 2022-10-05 16:15:56 -07:00
user_message.py actions: Split out zerver.lib.user_message. 2022-04-14 17:14:30 -07:00
user_mutes.py
user_status.py user-status: Stop updating the UserStatus model for `away` updates. 2022-09-23 12:27:54 -07:00
user_topics.py user_topics: Refactor the construction loop for UserTopicDict. 2022-08-11 13:45:54 -07:00
users.py python: Mark dict parameters with defaults as read-only. 2022-10-06 13:48:28 -07:00
utils.py
validator.py requirements: Upgrade to Django 4.0. 2022-07-13 16:07:17 -07:00
widget.py
zcommand.py actions: Split out zerver.actions.user_settings. 2022-04-14 17:14:34 -07:00
zephyr.py