This updates our error handling of invalid Slack API tokens (and other
networking error handling) to mostly make sense:
* A token that doesn't start with `xoxp-` gives an extended error early.
* An AssertionError for the codebase is correctly declared as such.
* We check for token shape errors before querying the Slack API.
We could still do useful work to raise custom exception classes here.
Thanks to @stavrospat for raising this issue.
Fixes#1727.
With the server down, apply migrations 0245 and 0246. 0246 will remove
the pub_date column, so it's essential that the previous migrations
ran correctly to copy data before running this.
This commits reduces the number of values returned by
channel_to_zerver_stream function by setting the values
directly in realm dict and returning it instead.
Fixes a bug in import_realm where secondary attributes like message
visibility weren't being set, and also makes bugs like this less likely in
the future.
Also, putting the plan_type change at the end of import_realm, so that
future restrictions to LIMITED realms don't affect the import process.
This works by yielding messages sorted based on timestamp. Because
the Slack exports are broken into files by date, it's convenient to do
a 2-layer sorting process, where we open all the files for a given
day, and then sort their messages by timestamp before yielding them.
Fixes#10930.
We (lexically) remove "subject" from the conversion code. The
`build_message` helper calls `set_topic_name` under the hood,
so things still have "subject" in the JSON.
There was good code coverage on `build_message`.
This is mostly an extraction, but it does change the
way we calculate `content`. We append the markdown
links from ALL files to any content that came in the
message itself.
Separating this out also allows us to add more
test coverage for the extracted code.
We now use subscriber_map for building UserMessage
rows in Slack/Gitter conversions.
This is mostly designed to simplify the code, rather
than having to scan the entire subscribers for each
message.
I am guessing this will improve performance for most
conversions. We sort small lists on every message,
in order to be deterministic, but the sorting cost
is probably more than offset by avoiding the O(N)
scans across all subscriptions. Also, it's probably
negligible in the grand scheme of things, compared
to JSON parsing, file I/O, etc.
This commits also fixes some typos with mentioned_users_id ->
mentioned_user_ids and cleans up a test a bit as well.
* If `zerver_realmauditlog` is present in the exported data,
`RealmAuditLog` would be imported normally.
* If it is not present, `create_subscription_events`
function in would create the `subscription_created`
events for RealmAuditLog. The reason this function
is in `import_realm` module and not in the individual
export tool scripts (like Slack) is because this
function would be common for all export tools.
This fixes#9846 for users who have not already done an import of
their organization from Slack.
Fixes#9846.
https://github.com/houstondatavis/slack-export/blob/master/users.json
JSON or JavaScript decodes "\/" to / (and some encoders always write
"\/" to avoid accidentally creating a </script> tag), while Python
assumes "\/" is a typo for "\\/" and decodes it to \/.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
The only changes visible at the AST level, checked using
https://github.com/asottile/astpretty, are
zerver/lib/test_fixtures.py:
'\x1b\\[(1|0)m' ↦ '\\x1b\\[(1|0)m'
'\\[[X| ]\\] (\\d+_.+)\n' ↦ '\\[[X| ]\\] (\\d+_.+)\\n'
which is fine because re treats '\\x1b' and '\\n' the same way as
'\x1b' and '\n'.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
Messages can be bulky, and storing them in a single
data structure can cause a memory error.
In this commit, the messages are written to a file
batch-wise, thus avoiding the memory error.
Previously, the messages where being stored in a output file from
outside the function 'convert_slack_workspace_messages', but
now we store it from the inside the mentioned function.
This will help in processing and saving the messages batch-wise
so as to avoid a memory error.
Reactions are returned separately from 'convert_slack_workspace_messages'
rather than 'message_json'.
Also updated test for 'convert_slack_workspace_messages' and an additional
test for reactions is added.
This was stored as a fixture file under zerver/fixtures, which caused
problems, since we don't show that directory under production (as its
part of the test system).
The simplest emergency fix here would be to just move the file, but
when looking at it, it's clear that we don't need or want a fixture
file here; we want a Python object, so we just do that.
A valuable follow-up improvement to this block would be to create an
actual new Realm object (not saved to the database), and dump it the
same code we use in the export tool; that should handle the vast
majority of these correctly.
Fixes#9123.
Change 'get_user_data' function to a more general function
to get data from the slack api using legacy tokens.
Also, change the error handling as upon invalid token,
the response is 200, but the response has an error
field in it.
For eg. Go to the following link with invalid token:
https://slack.com/api/emoji.list?token=xoxp-249056023425
Remove allocation ID function from slack import script. All the IDs
count will start from 0. Hence the ID List returned
by the allocation function is of no use, and we remove its implementation.
(example: get_total_messages_and_attachments function is of no use anymore,
hence we remove it)
In importing avatars, we use the implementation where the 'avatar_path'
is seperately calculated using realm and user ID and then the content
of the path provided in the avatar's 'records.json' are copied to this
'avatar_path'.
Similary, here for the uploads, 's3_file_name' is seperately calculated
using the realm ID and uploaded file name and then the content of the
path provided in upload's 'records.json' are copied to this 's3_file_name'.
The domain name is being set in the helper function
'slack_workspace_to_realm', but it should be set in the main function
'do_convert_data', as we need it in other child functions of
'do_convert_data'.
if the test fails, the 'output_dir' would not be deleted and
hence it would give an error when we run the tests next time,
as 'do_convert_data' expects an empty 'output_dir'.
Also the unzipped data file should be removed if the test fails
at 'do_convert_data'.
The messages were first being read and passed to the helper
functions channel wise.
This function makes a list of all the messages in the all the channels
beforehand which would be used to pass in the helper functions.
slack avatar urls have the format:
'https://ca.slack-edge.com/<team_id>-<user_id>-<avatar_hash>-<size>'
For any url of this form, if the user hasn't uploaded an image,
Slack uses default gravatar, but we don't have a way of knowing if Slack
has used the uploaded image or the custom gravatar
eg: https://ca.slack-edge.com/T5YFFM2QY-U6006P1CN-gd41c3c33cbe-512.
Hence, avatar_source should be mapped to 'U'.
The check for the channel ('general' and 'random') must be added before
'build_defaultstream' function is called and then the id is incremented.
Otherwise, the id appended at the end of second defaultstream object, which would be
greater than the total number of defaultstream objects would crash at
'defaultstream_id_list[defaultstream_id]' which is a paramater of 'build_defaultstream'.
Added tests to prevent the same.
We use the command
'select nextval('sequence') from generate_series(1, increment_number)'
which returns a list of allocated values for the ids.
This list is used to assign ids to the to be converted objects.
The fresh imported data shows that the users emails are not included
in the data. However, the data received from the older method of slack
(which is using legacy tokens) contains the email data of the users.