Commit Graph

47 Commits

Author SHA1 Message Date
Alex Vandiver fe654b76b7 data_import: Stop tar'ing up converted data.
`./manage.py import` does not take a tarball; it takes a directory.
Making a separate tarball is a waste of CPU time and disk, as it is
never used.

This was included in the commit of the initial Slack conversion code
in 5b37c5562b and propagated from there into every conversion tool.

Remove the unnecessary tarball creation.
2023-02-26 17:42:01 -08:00
Mateusz Mandera 00b3546c9f models: Add denormalized .realm column to Message.
This commit adds the OPTIONAL .realm attribute to Message
(and ArchivedMessage), with the server changes for making new Messages
have this set. Old Messages still have to be migrated to backfill this,
before it can be non-nullable.

Appropriate test changes to correctly set .realm for Messages the tests
manually create are included here as well.
2022-10-07 10:09:38 -07:00
Mateusz Mandera 2811a1228f import_util: Make build_message only take kwargs.
build_message has a lot of arguments, so it's hard to verify correctness
of callers that just try to get the order right. It's much clearer to be
explicit via kwargs. mattermost.py and rocketchat.py already do this, so
let's bring slack.py and gitter.py up to par.
2022-09-27 15:04:48 -07:00
Mateusz Mandera d350406991 gitter: Make imported Realm start with only GitHub auth enabled.
Users will only be able to login via GitHub, because imported users
get GitHub's generated noreply email addresses - so this should be the
only auth method enabled at first, to avoid confusion.
2022-08-29 11:10:18 -07:00
Mateusz Mandera eed8800573 long_term_idle_helper: Change all_user_ids arg to an Iterator. 2022-08-29 11:03:27 -07:00
Mateusz Mandera 4c7a9816ff gitter: Soft deactivate appropriate imported users.
We want to use the long_term_idle_helper logic for gitter imports just
like we do for slack.
2022-08-29 11:03:27 -07:00
Mateusz Mandera 7ac31223e8 gitter: Extract get_user_from_message helper. 2022-08-29 11:03:27 -07:00
Mateusz Mandera a86aa13e57 gitter: Extract get_timestamp_from_message function. 2022-08-29 11:03:27 -07:00
Alex Vandiver 842cff5975 gitter: Some users (e.g. from matrix.org) may not have avatar URLs. 2022-08-29 11:03:27 -07:00
Anders Kaseorg 544bbd5398 docs: Fix capitalization mistakes.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-05-10 09:57:26 -07:00
Anders Kaseorg 6e4c3e41dc python: Normalize quotes with Black.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-02-12 13:11:19 -08:00
Anders Kaseorg 11741543da python: Reformat with Black, except quotes.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-02-12 13:11:19 -08:00
Anders Kaseorg 72d6ff3c3b docs: Fix more capitalization issues.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-23 11:46:55 -07:00
Anders Kaseorg 61d0417e75 python: Replace ujson with orjson.
Fixes #6507.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-08-11 10:55:12 -07:00
Anders Kaseorg 60a25b2721 docs: Fix spelling errors caught by codespell.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-08-11 10:23:06 -07:00
Steve Howell c44500175d database: Remove short_name from UserProfile.
A few major themes here:

    - We remove short_name from UserProfile
      and add the appropriate migration.

    - We remove short_name from various
      cache-related lists of fields.

    - We allow import tools to continue to
      write short_name to their export files,
      and then we simply ignore the field
      at import time.

    - We change functions like do_create_user,
      create_user_profile, etc.

    - We keep short_name in the /json/bots
      API.  (It actually gets turned into
      an email.)

    - We don't modify our LDAP code much
      here.
2020-07-17 11:15:15 -07:00
Steve Howell 0b65abcdf5 pointer: Remove pointer from UserProfile.
Most of the changes here are just that we no
longer need to provide a value for pointer
when we create UserProfile objects.
2020-07-03 13:08:40 +00:00
Anders Kaseorg 74c17bf94a python: Convert more percent formatting to Python 3.6 f-strings.
Generated by pyupgrade --py36-plus.

Now including %d, %i, %u, and multi-line strings.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-14 23:27:22 -07:00
Anders Kaseorg 365fe0b3d5 python: Sort imports with isort.
Fixes #2665.

Regenerated by tabbott with `lint --fix` after a rebase and change in
parameters.

Note from tabbott: In a few cases, this converts technical debt in the
form of unsorted imports into different technical debt in the form of
our largest files having very long, ugly import sequences at the
start.  I expect this change will increase pressure for us to split
those files, which isn't a bad thing.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-11 16:45:32 -07:00
Anders Kaseorg 67e7a3631d python: Convert percent formatting to Python 3.6 f-strings.
Generated by pyupgrade --py36-plus.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-10 15:02:09 -07:00
Cyril Cohen 0d6f80059b
gitter import: Subscribe every user to every stream. 2020-05-05 21:31:35 -07:00
Cyril Cohen 5598f8f6b0 gitter: Support importing data from multiple Gitter rooms.
**Features:**
Improving `./manage.py convert_gitter_data`
- If messages have been post-processed to add a 'room' field, we
  create as many streams as existing rooms.
- Messages with a 'room' field go to the corresponding stream.
- This modification is backward compatible. I.e.
  + messages that have no 'room' field go to the default stream/topic
  + messages that do, go to a specific stream

**Implementation:**
- adding a map `stream_map` to map room names to stream ids
- create as many streams as room field messages + 1 default streamFeatures:
- If messages have been post-processed to add a 'room' field to messages,
  we create as many streams as existing rooms.
- Up to renaming of the default stream/topic, this modification is
  backwards compatible.
  I.e. messages that have no 'room' field go to the default stream/topic
       messages that do, go to a specific stream

Implementation:
- adding a map stream_map to map room names to stream ids
- create as many streams as room field messages + 1 default stream

Takes advantage of https://github.com/minrk/archive-gitter/pull/5.
2020-05-02 10:30:18 -07:00
Anders Kaseorg bdc365d0fe logging: Pass format arguments to logging.
https://docs.python.org/3/howto/logging.html#optimization

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-05-02 10:18:02 -07:00
Anders Kaseorg fead14951c python: Convert assignment type annotations to Python 3.6 style.
This commit was split by tabbott; this piece covers the vast majority
of files in Zulip, but excludes scripts/, tools/, and puppet/ to help
ensure we at least show the right error messages for Xenial systems.

We can likely further refine the remaining pieces with some testing.

Generated by com2ann, with whitespace fixes and various manual fixes
for runtime issues:

-    invoiced_through: Optional[LicenseLedger] = models.ForeignKey(
+    invoiced_through: Optional["LicenseLedger"] = models.ForeignKey(

-_apns_client: Optional[APNsClient] = None
+_apns_client: Optional["APNsClient"] = None

-    notifications_stream: Optional[Stream] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE)
-    signup_notifications_stream: Optional[Stream] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE)
+    notifications_stream: Optional["Stream"] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE)
+    signup_notifications_stream: Optional["Stream"] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE)

-    author: Optional[UserProfile] = models.ForeignKey('UserProfile', blank=True, null=True, on_delete=CASCADE)
+    author: Optional["UserProfile"] = models.ForeignKey('UserProfile', blank=True, null=True, on_delete=CASCADE)

-    bot_owner: Optional[UserProfile] = models.ForeignKey('self', null=True, on_delete=models.SET_NULL)
+    bot_owner: Optional["UserProfile"] = models.ForeignKey('self', null=True, on_delete=models.SET_NULL)

-    default_sending_stream: Optional[Stream] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE)
-    default_events_register_stream: Optional[Stream] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE)
+    default_sending_stream: Optional["Stream"] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE)
+    default_events_register_stream: Optional["Stream"] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE)

-descriptors_by_handler_id: Dict[int, ClientDescriptor] = {}
+descriptors_by_handler_id: Dict[int, "ClientDescriptor"] = {}

-worker_classes: Dict[str, Type[QueueProcessingWorker]] = {}
-queues: Dict[str, Dict[str, Type[QueueProcessingWorker]]] = {}
+worker_classes: Dict[str, Type["QueueProcessingWorker"]] = {}
+queues: Dict[str, Dict[str, Type["QueueProcessingWorker"]]] = {}

-AUTH_LDAP_REVERSE_EMAIL_SEARCH: Optional[LDAPSearch] = None
+AUTH_LDAP_REVERSE_EMAIL_SEARCH: Optional["LDAPSearch"] = None

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-04-22 11:02:32 -07:00
Anders Kaseorg c734bbd95d python: Modernize legacy Python 2 syntax with pyupgrade.
Generated by `pyupgrade --py3-plus --keep-percent-format` on all our
Python code except `zthumbor` and `zulip-ec2-configure-interfaces`,
followed by manual indentation fixes.

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-04-09 16:43:22 -07:00
Vishnu Ks 5e6d86c8c4 slack_import: Support importing multiparty IMs. 2019-07-09 15:03:28 -07:00
Anders Kaseorg 643bd18b9f lint: Fix code that evaded our lint checks for string % non-tuple.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2019-04-23 15:21:37 -07:00
Vishnu Ks 6e3720e0b7 gitter: Fix minor comment typo in build_userprofile. 2019-03-20 10:12:18 -07:00
Anders Kaseorg 56a675d5ec export: Remove unused imports.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
2019-02-02 17:25:27 -08:00
Tim Abbott e9900b2bdf gitter: Do something reasonable with invalid fullnames. 2018-12-12 10:07:52 -08:00
Steve Howell d86dd165da gitter/slack/hipchat: Remove "subject" from conversions.
We (lexically) remove "subject" from the conversion code.  The
`build_message` helper calls `set_topic_name` under the hood,
so things still have "subject" in the JSON.

There was good code coverage on `build_message`.
2018-11-12 15:47:11 -08:00
Steve Howell 5cb60f7bea conversions: Use subscriber_map for Slack/Gitter.
We now use subscriber_map for building UserMessage
rows in Slack/Gitter conversions.

This is mostly designed to simplify the code, rather
than having to scan the entire subscribers for each
message.

I am guessing this will improve performance for most
conversions.  We sort small lists on every message,
in order to be deterministic, but the sorting cost
is probably more than offset by avoiding the O(N)
scans across all subscriptions.  Also, it's probably
negligible in the grand scheme of things, compared
to JSON parsing, file I/O, etc.

This commits also fixes some typos with mentioned_users_id ->
mentioned_user_ids and cleans up a test a bit as well.
2018-10-29 13:24:50 -07:00
Steve Howell 5194701787 conversions: Use NEXT_ID for usermessage_id.
This is mostly complicated due to the way that the
Slack import passes around tuples of ids to maintain
four different parallel sequences.
2018-10-29 13:24:50 -07:00
Tim Abbott f9b6eeb488 import: Migrate from json to ujson for better perf.
We expect to get better memory performace from
ujson than json.

We also do a better job of closing file handles.

This likely fixes #10377.
2018-10-17 12:11:08 -07:00
Steve Howell 23d7b3d2cc import: De-dup create_converted_data_files helper. 2018-10-13 16:47:41 -07:00
Rhea Parekh f70b9a3eba import: Move 'build_message' to import_util. 2018-08-19 22:27:13 -07:00
Rhea Parekh a5bc701181 import: Move 'build_stream' to import_util. 2018-08-19 22:27:13 -07:00
Rhea Parekh c4f8abbd30 import: Build Message with the model class. 2018-08-19 22:27:13 -07:00
Rhea Parekh 4ea7302e14 import: Add missing fields in UserProfile object.
The missing fields are checked by `full_clean()` method.
The datetime field errors are ignored as they are fixed
in the `import_realm` script. The field that are
allowed to be null are not included while building
this object.
2018-08-19 22:27:13 -07:00
Rhea Parekh c77763bd8e import: Move 'build_realm' to import_util. 2018-08-19 22:27:13 -07:00
Rhea Parekh b6ccc0bc52 import: Move 'build_defaultstream' to import_util. 2018-08-07 16:45:42 -07:00
Rhea Parekh bee3964f14 import: Move 'build_usermessages' to import_util. 2018-08-07 16:45:42 -07:00
Rhea Parekh 30cc7354eb import: Move 'process_avatars' to import_util. 2018-08-07 16:45:40 -07:00
Rhea Parekh 87cc1a6280 import: Move 'build_subscription' and 'build_recipient' to import_util. 2018-08-07 16:35:56 -07:00
Rhea Parekh a516f80646 import: Move 'build_avatar' to import_util. 2018-08-07 16:35:56 -07:00
Rhea Parekh 1117455a90 import: Move 'ZerverFieldsT' and 'build_zerver_realm' to import_util. 2018-08-07 16:35:56 -07:00
Rhea Parekh ee37866687 import: Add gitter import file in zerver/data_import directory. 2018-08-01 11:52:14 -07:00