Commit Graph

181 Commits

Author SHA1 Message Date
Anders Kaseorg 46babbe9e1 import_realm: Close the memcached connection before forking.
This prevents the memcached connection from being shared across
multiple processes, and hopefully addresses unexpected behavior from
cached functions like get_user_profile_by_id invoked inside the worker
processes.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-01 11:20:39 -07:00
Anders Kaseorg 7f410ff0de import_realm: Migrate from run_parallel to multiprocessing.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-09-14 16:22:23 -07:00
Anders Kaseorg b7b7475672 python: Use standard secrets module to generate random tokens.
There are three functional side effects:

• Correct an insignificant but mathematically offensive bias toward
repeated characters in generate_api_key introduced in commit
47b4283c4b4c70ecde4d3c8de871c90ee2506d87; its entropy is increased
from 190.52864 bits to 190.53428 bits.

• Use the base32 alphabet in confirmation.models.generate_key; its
entropy is reduced from 124.07820 bits to the documented 120 bits, but
now it uses 1 syscall instead of 24.

• Use the base32 alphabet in get_bigbluebutton_url; its entropy is
reduced from 51.69925 bits to 50 bits, but now it uses 1 syscall
instead of 10.

(The base32 alphabet is A-Z 2-7.  We could probably replace all of
these with plain secrets.token_urlsafe, since I expect most callers
can handle the full urlsafe_b64 alphabet A-Z a-z 0-9 - _ without
problems.)

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-09-09 15:52:57 -07:00
Anders Kaseorg 02725d32dd python: Rewrite list() as [].
Suggested by the flake8-comprehensions plugin.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-09-02 11:15:41 -07:00
Anders Kaseorg ab120a03bc python: Replace unnecessary intermediate lists with generators.
Mostly suggested by the flake8-comprehension plugin.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-09-02 11:15:41 -07:00
Anders Kaseorg 61d0417e75 python: Replace ujson with orjson.
Fixes #6507.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-08-11 10:55:12 -07:00
Anders Kaseorg 768f9f93cd docs: Capitalize Markdown consistently.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-08-11 10:23:06 -07:00
Steve Howell c44500175d database: Remove short_name from UserProfile.
A few major themes here:

    - We remove short_name from UserProfile
      and add the appropriate migration.

    - We remove short_name from various
      cache-related lists of fields.

    - We allow import tools to continue to
      write short_name to their export files,
      and then we simply ignore the field
      at import time.

    - We change functions like do_create_user,
      create_user_profile, etc.

    - We keep short_name in the /json/bots
      API.  (It actually gets turned into
      an email.)

    - We don't modify our LDAP code much
      here.
2020-07-17 11:15:15 -07:00
Steve Howell 2374e25b94 import: Import AlertWord table. 2020-07-16 08:50:31 -07:00
arpit551 653928bdfe audit_log: Log acting_user in do_change_avatar_fields. 2020-07-06 17:24:18 -07:00
Mohit Gupta f8d1e0f86a refactor: Rename convert to markdown_convert.
Prior to this commit whenever convert was imported from zerver.lib.markdown
it was aliased as markdown_convert for readability.
This commit rename convert function to markdown_convert so that it can be
directly import it without aliasing and without compromising readability.
2020-07-06 12:39:59 -07:00
Steve Howell 0b65abcdf5 pointer: Remove pointer from UserProfile.
Most of the changes here are just that we no
longer need to provide a value for pointer
when we create UserProfile objects.
2020-07-03 13:08:40 +00:00
Mohit Gupta c1b6fbbc7d refactor: Rename bugdown to markdown in import_realm.py.
This commit is part of series of commits aimed at renaming bugdown to
markdown.
2020-06-29 14:58:30 -07:00
Mohit Gupta 3f5fc13491 refactor: Rename zerver.lib.bugdown to zerver.lib.markdown .
This commit is first of few commita which aim to change all the
bugdown references to markdown. This commits rename the files,
file path mentions and change the imports.
Variables and other references to bugdown will be renamed in susequent
commits.
2020-06-26 17:08:37 -07:00
Anders Kaseorg f33bfaf545 import_realm: Avoid unchecked cast.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-22 17:13:48 -07:00
Anders Kaseorg 74c17bf94a python: Convert more percent formatting to Python 3.6 f-strings.
Generated by pyupgrade --py36-plus.

Now including %d, %i, %u, and multi-line strings.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-14 23:27:22 -07:00
Anders Kaseorg 57a80856a5 python: Convert more "".format to Python 3.6 f-strings.
Generated by pyupgrade --py36-plus --keep-percent-format.

Now including %d, %i, %u, and multi-line strings.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-13 15:39:00 -07:00
Anders Kaseorg cfcbf58cd1 do_render_markdown: Remove unused message_user_ids parameter.
It’s unused since commit 7c5f316cb8
(#11586).

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-13 15:31:27 -07:00
Anders Kaseorg 365fe0b3d5 python: Sort imports with isort.
Fixes #2665.

Regenerated by tabbott with `lint --fix` after a rebase and change in
parameters.

Note from tabbott: In a few cases, this converts technical debt in the
form of unsorted imports into different technical debt in the form of
our largest files having very long, ugly import sequences at the
start.  I expect this change will increase pressure for us to split
those files, which isn't a bad thing.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-11 16:45:32 -07:00
Anders Kaseorg 69730a78cc python: Use trailing commas consistently.
Automatically generated by the following script, based on the output
of lint with flake8-comma:

import re
import sys

last_filename = None
last_row = None
lines = []

for msg in sys.stdin:
    m = re.match(
        r"\x1b\[35mflake8    \|\x1b\[0m \x1b\[1;31m(.+):(\d+):(\d+): (\w+)", msg
    )
    if m:
        filename, row_str, col_str, err = m.groups()
        row, col = int(row_str), int(col_str)

        if filename == last_filename:
            assert last_row != row
        else:
            if last_filename is not None:
                with open(last_filename, "w") as f:
                    f.writelines(lines)

            with open(filename) as f:
                lines = f.readlines()
            last_filename = filename
        last_row = row

        line = lines[row - 1]
        if err in ["C812", "C815"]:
            lines[row - 1] = line[: col - 1] + "," + line[col - 1 :]
        elif err in ["C819"]:
            assert line[col - 2] == ","
            lines[row - 1] = line[: col - 2] + line[col - 1 :].lstrip(" ")

if last_filename is not None:
    with open(last_filename, "w") as f:
        f.writelines(lines)

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-06-11 16:04:12 -07:00
Anders Kaseorg 67e7a3631d python: Convert percent formatting to Python 3.6 f-strings.
Generated by pyupgrade --py36-plus.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-10 15:02:09 -07:00
Anders Kaseorg 3aab9c03a9 fix_unreads: Use cursor.execute correctly.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-09 21:12:43 -07:00
Anders Kaseorg 2604ebba38 import_message_data: Use psycopg2.extras.execute_values.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-09 21:12:43 -07:00
Anders Kaseorg 8dd83228e7 python: Convert "".format to Python 3.6 f-strings.
Generated by pyupgrade --py36-plus --keep-percent-format, but with the
NamedTuple changes reverted (see commit
ba7906a3c6, #15132).

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-08 15:31:20 -07:00
Anders Kaseorg 1f565a9f41 timezone: Use standard library datetime.timezone.utc consistently.
datetime.timezone is available in Python ≥ 3.2.  This also lets us
remove a pytz dependency from the PostgreSQL scripts.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-05 09:34:17 -07:00
whoodes cea7d713cd requirements: Upgrade boto to boto3.
Fixes: #3490

Contributors include:

Author:    whoodes <hoodesw@hawaii.edu>
Author:    zhoufeng1989 <zhoufengloop@gmail.com>
Author:    rht <rhtbot@protonmail.com>
2020-05-26 23:18:07 -07:00
Anders Kaseorg cf923b49d3 python: Remove extra pass statements with autoflake.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-05-26 11:43:40 -07:00
Anders Kaseorg a9651e3e43 import_realm: Use cursor.execute correctly.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-05-04 09:35:30 -07:00
Anders Kaseorg bdc365d0fe logging: Pass format arguments to logging.
https://docs.python.org/3/howto/logging.html#optimization

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-05-02 10:18:02 -07:00
Anders Kaseorg fead14951c python: Convert assignment type annotations to Python 3.6 style.
This commit was split by tabbott; this piece covers the vast majority
of files in Zulip, but excludes scripts/, tools/, and puppet/ to help
ensure we at least show the right error messages for Xenial systems.

We can likely further refine the remaining pieces with some testing.

Generated by com2ann, with whitespace fixes and various manual fixes
for runtime issues:

-    invoiced_through: Optional[LicenseLedger] = models.ForeignKey(
+    invoiced_through: Optional["LicenseLedger"] = models.ForeignKey(

-_apns_client: Optional[APNsClient] = None
+_apns_client: Optional["APNsClient"] = None

-    notifications_stream: Optional[Stream] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE)
-    signup_notifications_stream: Optional[Stream] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE)
+    notifications_stream: Optional["Stream"] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE)
+    signup_notifications_stream: Optional["Stream"] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE)

-    author: Optional[UserProfile] = models.ForeignKey('UserProfile', blank=True, null=True, on_delete=CASCADE)
+    author: Optional["UserProfile"] = models.ForeignKey('UserProfile', blank=True, null=True, on_delete=CASCADE)

-    bot_owner: Optional[UserProfile] = models.ForeignKey('self', null=True, on_delete=models.SET_NULL)
+    bot_owner: Optional["UserProfile"] = models.ForeignKey('self', null=True, on_delete=models.SET_NULL)

-    default_sending_stream: Optional[Stream] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE)
-    default_events_register_stream: Optional[Stream] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE)
+    default_sending_stream: Optional["Stream"] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE)
+    default_events_register_stream: Optional["Stream"] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE)

-descriptors_by_handler_id: Dict[int, ClientDescriptor] = {}
+descriptors_by_handler_id: Dict[int, "ClientDescriptor"] = {}

-worker_classes: Dict[str, Type[QueueProcessingWorker]] = {}
-queues: Dict[str, Dict[str, Type[QueueProcessingWorker]]] = {}
+worker_classes: Dict[str, Type["QueueProcessingWorker"]] = {}
+queues: Dict[str, Dict[str, Type["QueueProcessingWorker"]]] = {}

-AUTH_LDAP_REVERSE_EMAIL_SEARCH: Optional[LDAPSearch] = None
+AUTH_LDAP_REVERSE_EMAIL_SEARCH: Optional["LDAPSearch"] = None

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-04-22 11:02:32 -07:00
Udit107710 16218d6de3 streams: Remove dependency of streams on actions.
Refactored code in actions.py and streams.py to move stream related
functions into streams.py and remove the dependency on actions.py.

validate_sender_can_write_to_stream function in actions.py was renamed
to access_stream_for_send_message in streams.py.
2020-04-18 16:56:59 -07:00
Tim Abbott 5eb5b6a5ad import: Make sure the internal realm is created before import.
This is critical for importing the very first realm into an empty
server, since in 27b15a9722, we changed
the model to create the internal realm when the first real realm would
be created, but neglected the data import code path.
2020-04-02 14:34:32 -07:00
Stefan Weil d2fa058cc1
text: Fix some typos (most of them found and fixed by codespell).
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-03-27 17:25:56 -07:00
Anders Kaseorg 39f9abeb3f python: Convert json.loads(f.read()) to json.load(f).
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-03-24 10:46:32 -07:00
Mateusz Mandera b4ce167a88 models: Add recipient foreign key to Huddle.
This follows the already tested approach from
8acfa17fe6.
2020-03-17 05:41:11 -07:00
Vishnu KS 51f5701879 export: Canonicalize the email of cross realm bot to default value.
Fixes #13496
2020-02-19 14:44:50 -08:00
Mateusz Mandera 920d22524b import: Use re_map_foreign_keys on the realm column of UserPresence.
We forgot to make this adjustment in the recent denormalization of realm
into UserPresence. It's needed for imports to work correctly.
2020-02-18 10:45:38 -08:00
Vishnu Ks 5a59bf329e import: Skip setting user_profile_id metadata only if unavailable. 2020-02-03 14:09:05 -08:00
Vishnu Ks 2ea53a347a import: Support importing realm icon and logo.
Fixes #11216
2020-02-03 14:09:05 -08:00
Ryan Rehman 3dc7d60ffe muting: Record DateTime when a Topic is muted.
This includes the necessary migration to add
the date_muted field to the MutedTopic class
and populates it with a hard coded value.
2020-02-02 20:49:53 -08:00
Tim Abbott dd969b5339 install: Remove references to "Zulip Voyager".
"Zulip Voyager" was a name invented during the Hack Week to open
source Zulip for what a single-system Zulip server might be called, as
a Star Trek pun on the code it was based on, "Zulip Enterprise".

At the time, we just needed a name quickly, but it was never a good
name, just a placeholder.  This removes that placeholder name from
much of the codebase.  A bit more work will be required to transition
the `zulip::voyager` Puppet class, as that has some migration work
involved.
2020-01-30 12:40:41 -08:00
Tim Abbott d70e799466 bots: Remove FEEDBACK_BOT implementation.
This legacy cross-realm bot hasn't been used in several years, as far
as I know.  If we wanted to re-introduce it, I'd want to implement it
as an embedded bot using those common APIs, rather than the totally
custom hacky code used for it that involves unnecessary queue workers
and similar details.

Fixes #13533.
2020-01-25 22:41:39 -08:00
Mateusz Mandera 8acfa17fe6 models: Add recipient foreign key in UserProfile and Stream.
This is adds foreign keys to the corresponding Recipient object in the
UserProfile on Stream tables, a denormalization intended to improve
performance as this is a common query.

In the migration for setting the field correctly for existing users,
we do a direct SQL query (because Django 1.11 doesn't provide any good
method for doing it properly in bulk using the ORM.).

A consequence of this change to the model is that a bit of code needs
to be added to the functions responsible for creating new users (to
set the field after the Recipient object gets created).  Fortunately,
there's only a few code paths for doing that.

Also an adjustment is needed in the import system - this introduces a
circular relation between Recipient and UserProfile. The field cannot be
set until the Recipient objects have been created, but UserProfiles need
to be created before their corresponding Recipients. We deal with this
by first importing UserProfiles same way as before, but we leave the
personal_recipient field uninitialized. After creating the Recipient
objects, we call a function to set the field for all the imported users
in bulk.

A similar change is made for managing Stream objects.
2019-12-09 15:14:41 -08:00
Mateusz Mandera dbe508bb91 models: Migration of Message.pub_date to date_sent, part 2.
Fixes #1727.

With the server down, apply migrations 0245 and 0246. 0246 will remove
the pub_date column, so it's essential that the previous migrations
ran correctly to copy data before running this.
2019-10-05 19:01:34 -07:00
Tim Abbott 02d55928ea import: Fix importing slack avatars into S3_UPLOAD_BACKEND.
Apparently, a subtle mismatch between the filename/URL formats for our
upload codebases meant that importing Slack avatars into systems using
S3_UPLOAD_BACKEND would end up with the avatars having the wrong URLs.
2019-07-21 21:25:31 -07:00
Tim Abbott fd25ced43c import: Fix check for whether data-user-id is present.
Apparently, the `in` keyword in Beautiful Soup does something different.
2019-06-18 11:13:32 -07:00
Tim Abbott 649e363ee3 import: Fix handling of legacy and wildcard mentions.
Our recently-added code for rewriting user IDs on data import didn't
correctly handle wildcard mentions and mentions generated by very old
versions of Zulip (pre data-user-id).
2019-06-18 10:35:01 -07:00
Tim Abbott 2538f84447 import: Fix bad database query for first_message_id.
The previous query ended up doing an awkward join that did not
guarantee use of the Recipient index on zerver_message, turning a very
fast query into something that could take much longer for a single
stream than the rest of the import combined.
2019-06-18 10:25:00 -07:00
Vishnu Ks 8718846c2a import: Use html.parser instead of lxml in bs4.
lxml parser appends html and body tags to the soup object which
are not reqired. There are no other major parsing diffrences between
the two parsers as long the HTML input is perfectly formated.
lxml parser is much faster than html.parser but it hardly matters
in our case.
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#differences-
between-parsers
2019-06-02 14:53:13 -07:00
Vishnu Ks 31151dadbf import: Replace data-user-group-id in rendered_content.
See the data-user-id commit for details.
2019-05-28 12:53:20 -07:00
Vishnu Ks ce1d6044db import: Replace data-stream-id in rendered_content.
See the data-user-id commit for details.
2019-05-28 12:53:20 -07:00
Vishnu Ks cb5b3f347b import: Replace data-user-id in rendered_content with new user id.
Previously, if you exported a Zulip organization and then re-imported
it, we'd end up renumbering the user IDs and all direct foreign key
references to them in the database, but not the data-user-id
references in mentions.  Fix this by parsing the message content and
doing that renumbering.

(Because we import raw markdown, not HTML, from third-party tools,
these changes won't affect data import from slack etc.)

Fixes the high-priority part of #11293.
2019-05-28 12:53:19 -07:00
Anders Kaseorg 643bd18b9f lint: Fix code that evaded our lint checks for string % non-tuple.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2019-04-23 15:21:37 -07:00
Challa Venkata Raghava Reddy b69aec2dbc streams: Add first_message_id tracking first message in stream.
This field is primarily intended to support avoiding displaying the
"more topics" feature in new organizations and streams, where we might
know that all messages in the stream are already available in the
browser.

Based on original work by Roman Godov, and significantly modified by
tabbott.

The second migration involved here could be expensive on Zulip Cloud,
but is unlikely to be an issue on other servers.
2019-03-11 13:30:49 -07:00
Hemanth V. Alluri ae126c452b stream-descriptions: Create wrapper for rendering stream descriptions.
In commit de65a04 we can see that if the need ever arises to modify
how stream descriptions are rendered, we would need to make changes
at 5 different call points which can be quite cumbersome. So this
functionality has been extracted to a new method called
'render_stream_descriptions'.
2019-03-06 17:16:14 -08:00
Bennet Sunder 7c5f316cb8 alert_words: Performance improvements in looking for alert_words.
This commit leverages the ahocorasick algorithm to build a set of user_ids
that have their alert_words present in the message. It runs in linear time
of the order of length of the input message as opposed to number of
alert_words. This is after building a ahocorasick Automaton which runs
in O(number of alert_words in entire realm) which is usually cached.
2019-03-01 15:36:39 -08:00
Tim Abbott de65a04ae0 streams: Disable inline URL preview when rendering stream descriptions.
We want to use the baseline features of bugdown, but not fancy things
like inline URL previews, since the whole structure of stream
descriptions is to have a single-line thing supporting some
formatting.

The migration part of this change fixes a bug encountered by some
organizations upgrading from older versions of Zulip.
2019-02-28 17:00:40 -08:00
Tim Abbott 4d08461ab1 import: Set plan_type to SELF_HOSTED on import.
We've for a while had logic to set plan_type to LIMITED when importing
into Zulip Cloud; we need corresponding logic to set it to SELF_HOSTED
when importing into a self-hosted server.

Fixes #11541.
2019-02-12 16:01:02 -08:00
Wyatt Hoodes 9c68a97472 import/export: Use separate analytics.json for analytics data.
This helps keep the realm.json small and easy to process; previously,
almost the entire size of that file was the analytics data.

We implement this by refactoring the analytics Config objects into a
separate subroutine that writes to a separate file, plus the
corresponding import code.

Manual testing was performed by exporting the 'analytics' realm, and
importing back to a newly created 'test' realm.  The 'test' realm was
then exported and the json files were inspected.  The data appeared
consistent with no abnormalities.

Fixes: #11220.
2019-02-04 10:59:24 -08:00
Hemanth V. Alluri 73d26c8b28 streams: Render and store the stream description from the backend.
This commit does the following three things:
    1. Update stream model to accomodate rendered description.
    2. Render and save the stream rendered description on update.
    3. Render and save stream descriptions on creation.

Further, the stream's rendered description is also sent whenever the
stream's description is being sent.

This is preparatory work for eliminating the use of the
non-authoritative marked.js markdown parser for stream descriptions.
2019-02-01 22:24:18 -08:00
Vishnu Ks bec875a9af import realm: Use processes for resizing avatar images.
This should significantly improve the data import performance when
importing large open source realms from Slack.

Fixes #11009.
2019-01-25 12:37:12 -08:00
Tim Abbott dfaa2e481d import: Log a warning when avatars can't be thumbnailed.
This fixes a potential crash in the import tool if a single user has a
broken avatar image.
2019-01-15 16:48:04 -08:00
Tim Abbott 6eda129741 export: Export and import analytics table data.
This should eliminate the need to do manual analytics work when
importing organizations imported/exported using the zulip -> zulip
import/export tools.
2019-01-04 16:22:18 -08:00
Tim Abbott 48ccb3ad18 import: Move realm_tables to the appropriate file.
These had ended up in the wrong place when we split export from
import.
2019-01-04 16:22:18 -08:00
Tim Abbott b33e0ad539 import: Fix pointer logic to sort by message_id.
Previously, the pointer calculation logic wasn't sorting by message
ID, which caused the database queries to not properly use the indexes
they should.
2019-01-04 16:22:18 -08:00
Tim Abbott a1919971e4 import: Handle invalid data-user-id values for mentions.
This is an issue with zulip -> zulip server data imports.
2019-01-02 15:23:09 -08:00
Tim Abbott b63f8b59b2 import: Handle corner case around EMAIL_GATEWAY_BOT emails. 2019-01-02 15:23:09 -08:00
Tim Abbott 8cfea958de import: Fix pointer logic for zulip->zulip imports.
Previously, the pointer was almost guaranteed to be an invalid random
value, because we renumber message IDs unconditionally now.
2019-01-02 15:23:09 -08:00
Tim Abbott 74ff77d366 import: Always set a valid content-type for S3 backend.
The octet-stream content type is potentially under-specified, but it's
better than potentially submitting None and increases consistency of
this part of the codebase.
2018-12-29 22:13:11 -08:00
Tim Abbott f0c7424957 import: Fix sending floats to boto S3 metadata keys.
The boto library's s3 interface allows setting only string-format
metadata keys.  So we need to cast the last_modified floating-point
timestamp into a string before storing on the S3 object.

This bug mostly broke uploading avatars when using the S3 storage backend.
2018-12-29 22:09:31 -08:00
Tim Abbott c995e8e2ae import: Ensure presence of basic avatar images for HipChat.
Our HipChat conversion tool didn't properly handle basic avatar
images, resulting in only the medium-size avatar images being imported
properly.  This fixes that bug by asking the import tool to do the
thumbnailing for the basic avatar image (from the .original file) as
well as the medium avatar image.
2018-12-27 17:47:09 -08:00
Rishi Gupta 8a95526ced billing: Always transition to Realm.LIMITED via do_change_plan_type.
Fixes a bug in import_realm where secondary attributes like message
visibility weren't being set, and also makes bugs like this less likely in
the future.

Also, putting the plan_type change at the end of import_realm, so that
future restrictions to LIMITED realms don't affect the import process.
2018-12-13 13:26:24 -08:00
Tim Abbott 1adc40f014 import: Deduplicate functions for uploading to S3/files.
We've had a long stream of bugs existed because only one of these two
code paths was tested (usually the local uploads backend).  By
deduplicating these functions, we ensure that this category of bugs no
longer happens.

Following my recent refactor, this is just a straightforward merge,
with code for one or the other backend ending up inside an if
statement.
2018-12-05 16:15:01 -08:00
Tim Abbott c9b801efde import: Use the s3_path attribute for path_maps unconditionally.
While the s3_path is almost always the same as the path, structurally,
`path` is the location in the export object, whereas s3_path is the
URL path.
2018-12-05 16:15:01 -08:00
Tim Abbott f4c5a45f4f import: Fix S3 paths for imported avatar PNG.
Previously, we were incorrectly importing avatar PNGs to a filename
without the .png extension, resulting in them effectively not being
imported.

This was mitigated by the fact that we imported the originals and ran
the appropriate `ensure_` functions, but still a bug.
2018-12-05 16:15:01 -08:00
Tim Abbott 412dc8dcda import: Set last_modified in import_uploads_local.
This has no effect other than to make the S3 and local code paths more
nearly identical.
2018-12-05 16:15:01 -08:00
Tim Abbott d8d0492d64 import: Restructure uploads path logic to be more similar.
This is preparation for future deduplication of the two redundant
uploads backends.
2018-12-05 16:15:01 -08:00
Tim Abbott 671ceccd78 import: Deduplicate medium avatars special logic.
This requires a bit of care with upload_backend to avoid breaking how
we mock that class in our tests.
2018-12-05 16:15:01 -08:00
Tim Abbott 36b43a6d7a import: Deduplicate first block of import_uploads logic. 2018-12-05 16:15:01 -08:00
Tim Abbott f80bab58c0 import_realm: Add progress indicator for importing uploads.
This makes it easier to see how we're doing when uploading a very
large number of files.
2018-12-05 16:15:01 -08:00
Steve Howell 88f50b97fd import: Render content before inserting messages.
By rendering content before bulk importing messages,
we avoid O(N) database hops.
2018-11-07 10:33:11 -08:00
Steve Howell bf3f7d93d0 Simplify params for fix_message_rendered_content. 2018-11-07 10:33:11 -08:00
Steve Howell 0878d86706 import: Avoid unnecessary Message lookups.
We now no longer go the DB to get a Message object
during render.
2018-11-07 10:33:11 -08:00
Steve Howell 1e12b13a56 import: Avoid unnecessary sender lookups.
This commit speeds up the import by avoiding
sender lookups and instead using the data
for users that we already have in memory.

This avoids a few DB hops, many hops to memcached,
plus some object construction.

We now call do_render_markdown() directly.  This
also makes it more explicit that the import has
never rendered alert words.
2018-11-07 10:33:10 -08:00
Steve Howell f9a7451167 import: Pass in realm to render codepath.
We avoid querying the same realm multiple times.
2018-11-07 10:08:46 -08:00
Steve Howell 92a7f04149 import: Inline save_message_rendered_content().
This function requires a message object, whereas
we want to work with JSON data to avoid necessary
queries when we import data.  Inlining the function
sets us up for a subsequent refactoring.

We change the way we deal with theoretical return
values of `None` to use an assertion; otherwise,
we would have to loosen up a bunch of mypy types
from `str` to `Optional[str]`.  It's not clear `None`
is even possible--we've moved toward throwing exceptions
there instead of silently failing.
2018-11-07 10:08:45 -08:00
Tim Abbott e14a35b490 import: Don't assume a last_modified key is present.
This fixes an exception when importing uploaded file data from
Slack/HipChat.
2018-11-07 09:52:35 -08:00
Tim Abbott 1bf385e35f import: Avoid sending a content-type of None to S3.
The previous logic was incorrect, in that if `content_type` was set to
None (which happens with Slack/HipChat export, among other things),
then we wouldn't run the `guess_type` logic to auto-detect the
Content-Type to send to S3.
2018-11-06 13:03:14 -08:00
Steve Howell a092bee6b3 import: Reduce memory usage for UserMessage ids.
The UserMessage table can be huge, so creating a
bunch of entries in `ID_MAP` can overflow memory.

We don't have any tables that depend on `UserMessage`,
and we don't send the 'id' fields from `zerver_usermessage`
to the database, so re-mapping them was just busy-work.
2018-11-05 10:18:01 -08:00
Steve Howell 53436b4b41 import: Rename id_maps -> ID_MAP. 2018-10-23 17:27:37 -05:00
Steve Howell bd9e4ef0c8 import: Use pub_date to sort message ids.
When we create new ids for message rows, we
now sort the new ids by their corresponding
pub_date values in the rows.

This takes a sizable chunk of memory.

This feature only gets turned on if you
set sort_by_date to True in realm.json.
2018-10-23 17:27:37 -05:00
Steve Howell 2d4b09f59d utils: Add process_list_in_batches(). 2018-10-15 10:54:23 -07:00
Steve Howell 493aae2958 imports: Make loading UserMessage faster and more robust.
We use UserMessageLite to avoid Django overhead, and we
do updates in chunks of 10000.  (The export may be broken
into several files already, but a reasonable chunking at
import time is good defense against running out of memory.)
2018-10-13 16:43:28 -07:00
Steve Howell 329154da32 import: Speed up create_subscription_events().
The code was needlessly querying the DB to get full
objects for entities where we only needed user_id,
realm_id, and stream_id.

With my test data of ~1000 records this sped up the
function from ~8s to ~0.5s.  The speedup would probably
be even more for larger data sets.
2018-10-02 16:55:16 -07:00
Tim Abbott a0451b692f import: Move zerver_client import before realm import.
This table is independent of the realm/stream table dance, and moving
it here helps makes the flow read more clearly.
2018-09-21 10:58:24 -07:00
Rishi Gupta b470cef864 import: Set Realm.plan_type to SELF_HOSTED on import.
Tweaked by tabbott to avoid an unnecessary .save().
2018-09-21 10:57:22 -07:00
Tim Abbott e2bd03365e import: Fix handling of recipient IDs for welcome bot.
If any user had sent the reply to the welcome bot recommended by our
tutorial, then the Zulip export/import process didn't work properly,
because we weren't including (and then remapping) the recipient ID for
sending PMs to the cross-realm bots.  This commit fixes that gap, by
recording the necessary data on the export side, and doing the
appropriate remapping on the import side.
2018-09-20 17:55:17 -07:00
Tim Abbott c9189439de import: Handle signup_notifications_stream_id.
Previously, our realm import logic only did the special remapping
logic for the original notifications_stream_id; when we added the new
signup_notifications_stream_id field, we neglected to handle it in the
same way.
2018-09-20 17:41:55 -07:00
Rhea Parekh 20bca1409f import: Set emoji records 'last_modified' value in 'import_uploads_s3'.
The 'last_modified' value in emoji records is
needed for uploading the file to the S3 backend.
We set the same in the function 'import_uploads_s3'.

We also have to remove the keyword 'last_modified'
while building the RealmEmoji dict, as it is not
a field which exists in RealmEmoji objects.
2018-08-10 16:20:36 -07:00
Tim Abbott 2f6f38fa7f import: Guess upload content-types when unavailable from export.
This is mostly for exports from other software like Slack, that might
not provide a content-type.
2018-08-10 09:32:28 -07:00