Commit Graph

206 Commits

Author SHA1 Message Date
Tim Abbott 412dc8dcda import: Set last_modified in import_uploads_local.
This has no effect other than to make the S3 and local code paths more
nearly identical.
2018-12-05 16:15:01 -08:00
Tim Abbott d8d0492d64 import: Restructure uploads path logic to be more similar.
This is preparation for future deduplication of the two redundant
uploads backends.
2018-12-05 16:15:01 -08:00
Tim Abbott 671ceccd78 import: Deduplicate medium avatars special logic.
This requires a bit of care with upload_backend to avoid breaking how
we mock that class in our tests.
2018-12-05 16:15:01 -08:00
Tim Abbott 36b43a6d7a import: Deduplicate first block of import_uploads logic. 2018-12-05 16:15:01 -08:00
Tim Abbott f80bab58c0 import_realm: Add progress indicator for importing uploads.
This makes it easier to see how we're doing when uploading a very
large number of files.
2018-12-05 16:15:01 -08:00
Steve Howell 88f50b97fd import: Render content before inserting messages.
By rendering content before bulk importing messages,
we avoid O(N) database hops.
2018-11-07 10:33:11 -08:00
Steve Howell bf3f7d93d0 Simplify params for fix_message_rendered_content. 2018-11-07 10:33:11 -08:00
Steve Howell 0878d86706 import: Avoid unnecessary Message lookups.
We now no longer go the DB to get a Message object
during render.
2018-11-07 10:33:11 -08:00
Steve Howell 1e12b13a56 import: Avoid unnecessary sender lookups.
This commit speeds up the import by avoiding
sender lookups and instead using the data
for users that we already have in memory.

This avoids a few DB hops, many hops to memcached,
plus some object construction.

We now call do_render_markdown() directly.  This
also makes it more explicit that the import has
never rendered alert words.
2018-11-07 10:33:10 -08:00
Steve Howell f9a7451167 import: Pass in realm to render codepath.
We avoid querying the same realm multiple times.
2018-11-07 10:08:46 -08:00
Steve Howell 92a7f04149 import: Inline save_message_rendered_content().
This function requires a message object, whereas
we want to work with JSON data to avoid necessary
queries when we import data.  Inlining the function
sets us up for a subsequent refactoring.

We change the way we deal with theoretical return
values of `None` to use an assertion; otherwise,
we would have to loosen up a bunch of mypy types
from `str` to `Optional[str]`.  It's not clear `None`
is even possible--we've moved toward throwing exceptions
there instead of silently failing.
2018-11-07 10:08:45 -08:00
Tim Abbott e14a35b490 import: Don't assume a last_modified key is present.
This fixes an exception when importing uploaded file data from
Slack/HipChat.
2018-11-07 09:52:35 -08:00
Tim Abbott 1bf385e35f import: Avoid sending a content-type of None to S3.
The previous logic was incorrect, in that if `content_type` was set to
None (which happens with Slack/HipChat export, among other things),
then we wouldn't run the `guess_type` logic to auto-detect the
Content-Type to send to S3.
2018-11-06 13:03:14 -08:00
Steve Howell a092bee6b3 import: Reduce memory usage for UserMessage ids.
The UserMessage table can be huge, so creating a
bunch of entries in `ID_MAP` can overflow memory.

We don't have any tables that depend on `UserMessage`,
and we don't send the 'id' fields from `zerver_usermessage`
to the database, so re-mapping them was just busy-work.
2018-11-05 10:18:01 -08:00
Steve Howell 53436b4b41 import: Rename id_maps -> ID_MAP. 2018-10-23 17:27:37 -05:00
Steve Howell bd9e4ef0c8 import: Use pub_date to sort message ids.
When we create new ids for message rows, we
now sort the new ids by their corresponding
pub_date values in the rows.

This takes a sizable chunk of memory.

This feature only gets turned on if you
set sort_by_date to True in realm.json.
2018-10-23 17:27:37 -05:00
Steve Howell 2d4b09f59d utils: Add process_list_in_batches(). 2018-10-15 10:54:23 -07:00
Steve Howell 493aae2958 imports: Make loading UserMessage faster and more robust.
We use UserMessageLite to avoid Django overhead, and we
do updates in chunks of 10000.  (The export may be broken
into several files already, but a reasonable chunking at
import time is good defense against running out of memory.)
2018-10-13 16:43:28 -07:00
Steve Howell 329154da32 import: Speed up create_subscription_events().
The code was needlessly querying the DB to get full
objects for entities where we only needed user_id,
realm_id, and stream_id.

With my test data of ~1000 records this sped up the
function from ~8s to ~0.5s.  The speedup would probably
be even more for larger data sets.
2018-10-02 16:55:16 -07:00
Tim Abbott a0451b692f import: Move zerver_client import before realm import.
This table is independent of the realm/stream table dance, and moving
it here helps makes the flow read more clearly.
2018-09-21 10:58:24 -07:00
Rishi Gupta b470cef864 import: Set Realm.plan_type to SELF_HOSTED on import.
Tweaked by tabbott to avoid an unnecessary .save().
2018-09-21 10:57:22 -07:00
Tim Abbott e2bd03365e import: Fix handling of recipient IDs for welcome bot.
If any user had sent the reply to the welcome bot recommended by our
tutorial, then the Zulip export/import process didn't work properly,
because we weren't including (and then remapping) the recipient ID for
sending PMs to the cross-realm bots.  This commit fixes that gap, by
recording the necessary data on the export side, and doing the
appropriate remapping on the import side.
2018-09-20 17:55:17 -07:00
Tim Abbott c9189439de import: Handle signup_notifications_stream_id.
Previously, our realm import logic only did the special remapping
logic for the original notifications_stream_id; when we added the new
signup_notifications_stream_id field, we neglected to handle it in the
same way.
2018-09-20 17:41:55 -07:00
Rhea Parekh 20bca1409f import: Set emoji records 'last_modified' value in 'import_uploads_s3'.
The 'last_modified' value in emoji records is
needed for uploading the file to the S3 backend.
We set the same in the function 'import_uploads_s3'.

We also have to remove the keyword 'last_modified'
while building the RealmEmoji dict, as it is not
a field which exists in RealmEmoji objects.
2018-08-10 16:20:36 -07:00
Tim Abbott 2f6f38fa7f import: Guess upload content-types when unavailable from export.
This is mostly for exports from other software like Slack, that might
not provide a content-type.
2018-08-10 09:32:28 -07:00
Tim Abbott 1ecbf49c93 import: Don't assume user_profile_id attribute is set on emojis.
The s3 import code path made a hard assumption about `user_profile_id`
being set (we'd already fixed this in the local uploads code path).

Ideally, it should be, and I've opened #10268 for fixing that, but for
now this is how it needs to work.
2018-08-10 09:32:18 -07:00
Tim Abbott 4c4b6d105e import: Fix re-rendering of markdown for Zulip->Zulip imports.
The code added in 26300110ca was only
needed for importing data from Slack, Gitter, or another tool which
doesn't use Zulip's markdown format.
2018-08-09 15:15:50 -07:00
Rhea Parekh 26300110ca import: Fix rendered_content in imported messages.
After the messages have been imported, set the rendered_content of the
messages instead of leaving its value to be 'None'.

This is important to ensure that:
(1) Performance for users is good after completing the import.
(2) The database's full-text indexes have all of the imported messages
(which only happens properly when Message rows have their
rendered_content field edited).

Fixes #9168.
2018-08-09 15:12:53 -07:00
Yago González 6a192ac84c utils: Move random API key generator as generate_api_key.
random_api_key, the function we use to generate random tokens for API
keys, has been moved to zerver/lib/utils.py because it's used in more
parts of the codebase (apart from user creation), and having it in
zerver/lib/create_user.py was prone to cyclic dependencies.

The function has also been renamed to generate_api_key to have an
imperative name, that makes clearer what it does.
2018-08-08 16:45:25 -07:00
Tim Abbott 38dd9e49de import_realm: Add comments for update_model_ids.
This basically explains why in these cases we delay doing
bulk_import_model.
2018-07-26 16:13:14 -07:00
Rhea Parekh fca6bc91aa import: Add 'get_db_table' function.
Implement this function in 'bulk_import_model'
and 'update_model_ids'.

This lets us save on redundant-feeling arguments in these
frequently-called helper functions.
2018-07-26 16:07:31 -07:00
Rhea Parekh 8803ac4af8 import: Fix BotStorageData and BotConfigData import.
The function 'update_model_ids' should be used on
the models BotStorageData and BotConfigData.
It is wrongly added here for UserGroup model.

Also the sequence name for BotStorageData and
BotConfigData is 'zerver_botuserstatedata_id_seq' and
'zerver_botuserconfigdata_id_seq' respectively, which
should be specifically mentioned in the function
'allocate_ids'.

This fixes some nondeterministic test failures.
2018-07-23 11:24:17 -07:00
Tim Abbott 839300d781 import: Fix typo in re_map_foreign_keys_many_to_many.
This was introduced when I rebased together the two implementations of
this function.
2018-07-23 10:19:04 -07:00
Rhea Parekh fe299277f3 import: Import BotStorageData and BotConfigData. 2018-07-23 08:21:00 -07:00
Rhea Parekh 2978e025df import: Import UserGroup. 2018-07-23 08:21:00 -07:00
Rhea Parekh bc2307108d import: Import Service. 2018-07-23 08:20:58 -07:00
Rhea Parekh f444e5b628 import: Import MutedTopic. 2018-07-23 08:20:58 -07:00
Rhea Parekh 091d101e7d import: Import UserHotspot. 2018-07-23 08:20:58 -07:00
Rhea Parekh 6f7b7e143f import: Map user IDs for custom profile field of type 'USER'.
The CustomProfileField object which has the `field_type`
`USER` needs to be updated with the new user IDs.
2018-07-23 08:19:04 -07:00
Rhea Parekh 3879e345d9 import: Add function to re-map foreign keys for ManyToMany fields.
This will be used while for any ManyToMany field which
is being imported.
We add an internal function which takes in the old ID list
of the ManyToMany field and return the new updated ID list.
2018-07-23 08:19:04 -07:00
Anders Kaseorg a0293e8a86 zerver/lib/import_realm.py: Avoid shelling out for mkdir.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
2018-07-19 10:43:37 -07:00
Rhea Parekh e9884916c9 import: Support import of huddles.
For importing huddles we have to have unique huddle hashes.
Huddle hashes are extracted from the list of users participating
in a huddle. So to extract these user ids, we first use huddle
id to getting the matching recipient, and then we use subscription
to get the user ids from the recipient id.

Added tests for the same (tests slightly tweaked by tabbott).
2018-07-12 19:06:52 +05:30
Rhea Parekh 4bbccd8287 import: import RealmAuditLog when 'zerver_realmauditlog` is missing.
* If `zerver_realmauditlog` is present in the exported data,
  `RealmAuditLog` would be imported normally.

* If it is not present, `create_subscription_events`
  function in would create the `subscription_created`
  events for RealmAuditLog. The reason this function
  is in `import_realm` module and not in the individual
  export tool scripts (like Slack) is because this
  function would be common for all export tools.

This fixes #9846 for users who have not already done an import of
their organization from Slack.

Fixes #9846.
2018-07-10 16:00:19 +05:30
Rhea Parekh 70b4794816 import: import RealmAuditLog. 2018-07-10 15:53:15 +05:30
Rhea Parekh d1ba6bae03 import: 'processing_emojis' and 'processing_avatars' should now be True together.
Raise an exception when the fields
'processing_emojis' and 'processing_avatars' are
True at the same time. Also add test for the same.
2018-06-18 23:06:09 +05:30
Rhea Parekh 4d21f7f747 import: 'attachment_path' should be saved with the 's3_path' of the record.
For the S3 backend uploads, 'attachment_path' should be
saved with the 's3_path' of the record, as the original
'path' is changed while exporting files from s3. (See
function 'export_files_from_s3' in export.py for reference.)
2018-06-18 23:06:01 +05:30
Rhea Parekh 0730087111 import: Add elif condition for 'processing_emojis' in 'import_uploads_s3'.
'processing_emojis' should have an 'elif' condition here as
we want the function to work for avatars, emojis or uploads
one at a time.
2018-06-18 23:04:18 +05:30
Rhea Parekh f2b5f5a8f9 import: Fix processing_avatars bug in 'import_uploads_s3'.
All the avatars should be processed later on to run the
'ensure_medium_avatar_image' function. This is similar to
'import_uploads_local'.
2018-06-18 22:37:34 +05:30
Rhea Parekh f66ca9a5c3 import: Pass 'processing_emojis' in 'import_uploads_s3'.
'import_uploads_s3' should be passed with the parameter
'processing_emojis' from 'import_uploads'.
2018-06-18 22:35:36 +05:30
Tim Abbott 328136344a import: Fix typo in zerver_customprofilefieldvalue table name.
Apparently, we were doing this slightly wrong.
2018-05-31 10:47:27 -07:00
Rhea Parekh 7198cc3899 import: Fix RealmEmoji import bug.
RealmEmoji should be imported after UserProfile,
as the new user_profile ids are not allocated
if we import it before.
2018-05-27 21:54:20 -07:00
Rhea Parekh 1b7b9a7164 import: Fix reaction import bug.
In 'zerver_reaction', the emoji_code should be updated
with the RealmEmoji allocated id when the 'reaction_type'
is 'realm_emoji'. Hence we add an extra field 'reaction_field'
in 're_map_foreign_keys', to process the above mentioned
condition.
2018-05-27 21:54:20 -07:00
Rhea Parekh c79d7f1070 Import: Move zerver_reaction from 'messages-000001.json' to 'realm.json'.
Also change the existing slack conversion script structure, to
include 'zerver_realm' in 'realm.json'.
2018-05-27 21:54:20 -07:00
Rhea Parekh c24c249b8c export: Support export of Custom Profile Field. 2018-05-23 09:07:26 -07:00
Aditya Bansal a68376e2ba zerver/lib: Change use of typing.Text to str. 2018-05-12 15:22:39 -07:00
Tim Abbott c4b886d8ae import: Split out import.py into its own module.
This should make it a bit easier to find the code.
2018-04-23 15:21:12 -07:00