Normal HipChat exports use integer ids for their
users and "rooms," which we just borrowed during
conversion.
Atlassian Stride uses UUIDs for these instead, but otherwise
has the same export format.
We now introduce IdMapper to handle external ids
that aren't integers. The IdMapper maps UUID ids
to ints and remembers the mapping. Ids that are
already ints are left alone.
Fixes #10805.
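
As a rough illustration of the mapping described above, here is a
minimal sketch. The method name `convert` and the counter-based
allocation are assumptions made for this example, not necessarily
what the actual IdMapper does.

```python
from typing import Dict, Union


class IdMapper:
    """Hypothetical sketch: map non-integer external ids to ints."""

    def __init__(self) -> None:
        # Remembers which int was handed out for each external id.
        self.map: Dict[str, int] = {}
        self.cnt = 0

    def convert(self, external_id: Union[int, str]) -> int:
        # Integer ids (normal HipChat exports) pass through untouched.
        if isinstance(external_id, int):
            return external_id
        # Non-integer ids (e.g. Stride UUIDs) get the next free int,
        # and the same id always maps to the same int afterwards.
        # Assumption: a single export never mixes both id styles, so
        # allocated ints cannot collide with real integer ids.
        if external_id not in self.map:
            self.cnt += 1
            self.map[external_id] = self.cnt
        return self.map[external_id]
```

Calling `convert` twice with the same UUID returns the same int, so
references between users, rooms, and messages stay consistent.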
Our webhook-errors.log file is riddled with exceptions that are
logged when a webhook is incorrectly configured to send data in
a non-JSON format. To avoid this, api_key_only_webhook_view
now supports an additional argument, notify_bot_owner_on_invalid_json.
When this argument is True, we send a PM to the bot's owner
notifying them of the configuration issue.
Until we resolve https://github.com/zulip/zulip/issues/10832, we will
need to maintain our own forked copy of Django's SessionMiddleware.
We apparently let this get out of date.
This fixes a few subtle bugs involving the user logout experience that
were throwing occasional exceptions (e.g. the UpdateError fix visible
in this diff).
We (lexically) remove "subject" from the conversion code. The
`build_message` helper calls `set_topic_name` under the hood,
so things still have "subject" in the JSON.
There was good code coverage on `build_message`.
We're trying to sweep "subject" out of the codebase,
even when it has nothing to do with our legacy "subject"
field. The rewording here will prevent some linter
noise.
This was a pretty nasty error, where we were accidentally accessing
the parent list in this inner loop function.
This appears to have been introduced as a refactoring bug in
7822ef38c2.
The various vars here that had recipient_subject
in the name now have either bucket or bucket_tup
there.
The shorter names are a bit easier to read, and the
original names were misleading for the PM case.
This was basically two search/replaces, and we have
good test coverage here, so it's pretty low risk
despite the messy diff.
We now attach zulip_db_data to the markdown engines
for classes that need it. This was the last remaining
global we had, so we remove `arguments.py` here.
The Markdown processor makes it fairly simple for
the helper classes to access the `md` engine. We
now write `_md_engine.zulip_message` to avoid having
the current message in the global namespace.
Note that we do reuse engines for multiple messages,
but each engine is specific to a realm. And we therefore
avoid even the theoretical possibility of leaking message
data between realms.
This makes us consistent with how we import codehilite.
Using Python's normal import mechanism avoids some overhead
with Markdown having to parse dotted notation.
These modules are tiny, so they shouldn't impact startup
too much. Also, by explicitly importing them, we avoid
the pitfall of having a successful startup and a broken
renderer.
We were building the same link regex every time
we build a Markdown engine, which happens twice
per realm. It's an expensive operation due to
the complexity of the regex and us reading a file.
Nested classes are kind of expensive in Python,
particularly when you throw in mypy annotations.
Also, flatter is arguably better, although it is
kind of a pain here not to have closures.
This change avoids hitting the Django ORM when
we don't find any possible group mentions in
the message content.
Django doesn't necessarily hit the database in this case,
but the ORM call is still slow and shows up in profiles.
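
The early return can be sketched roughly as below. The regex is a
simplified stand-in for the real mention syntax, and
`load_mentioned_groups` and its `fetch_groups` callback are
hypothetical names standing in for the ORM query.

```python
import re
from typing import Callable, Dict, Set

# Simplified assumption: group mentions look like @*name*.
GROUP_MENTION_RE = re.compile(r"@\*([^*]+)\*")


def possible_group_mentions(content: str) -> Set[str]:
    # Cheap, in-memory scan of the message content.
    return set(GROUP_MENTION_RE.findall(content))


def load_mentioned_groups(
    content: str,
    fetch_groups: Callable[[Set[str]], Dict[str, int]],
) -> Dict[str, int]:
    names = possible_group_mentions(content)
    if not names:
        # Nothing that could be a group mention: skip the query layer.
        return {}
    # fetch_groups stands in for the (comparatively slow) ORM lookup.
    return fetch_groups(names)
```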
This commit speeds up the import by avoiding
sender lookups and instead using the data
for users that we already have in memory.
This avoids a few DB hops, many hops to memcached,
plus some object construction.
We now call do_render_markdown() directly. This
also makes it more explicit that the import has
never rendered alert words.
For the import-data codepath, we will call
the extracted function directly in a
subsequent commit.
The do_render_markdown() function has more
required parameters, which allows for more
explicit code and also allows us to flatten
out some logic related to alert words. (We
just pass in empty sets/dicts as needed).
We can rely on `message_realm` being the same
as `message.sender.realm`, which allows us to
skip two queries to the database for the rare
Zephyr mirroring case.
This function requires a message object, whereas
we want to work with JSON data to avoid unnecessary
queries when we import data. Inlining the function
sets us up for a subsequent refactoring.
We change the way we deal with theoretical return
values of `None` to use an assertion; otherwise,
we would have to loosen up a bunch of mypy types
from `str` to `Optional[str]`. It's not clear `None`
is even possible--we've moved toward throwing exceptions
there instead of silently failing.
This is somewhat hairy logic, so it's nice
to extract it and not worry about variable leaks.
Also, this moves some legacy "subject" references out
of actions.py.
We start by including functions that do custom
queries for topic history.
The goal of this library is partly to quarantine
the legacy "subject" column on Message.
A recent change to check_send_webhook_message allows webhooks to
unescape stream names before sending a message. This commit adds
a test for the edge case where the webhook URL is escaped twice by
a third party.
Recently, one of our users reported that a JIRA webhook was not
able to send messages to a stream with a space character in its
name. It turns out that JIRA escapes webhook URLs a second time, so
an already-escaped space (%20) reaches Zulip double-escaped and is
evaluated as the literal characters `%20` rather than as a space.
We fix this by unescaping the stream name on our end before
sending the message forward.
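
To illustrate the double-escaping problem, here is a small sketch
using only the standard library. `unescape_stream_name` is a
hypothetical helper, and `parse_qs` stands in for the single round
of decoding that the web framework already performs.

```python
from urllib.parse import parse_qs, unquote


def unescape_stream_name(stream: str) -> str:
    # One extra round of unescaping on our end; a name without
    # percent-escapes passes through unchanged.
    return unquote(stream)


# A third party escaped "my%20stream" again, producing "my%2520stream".
query = "stream=my%2520stream"

# Normal query-string parsing undoes only one level of escaping.
once_decoded = parse_qs(query)["stream"][0]

# The extra unescape recovers the intended stream name.
stream_name = unescape_stream_name(once_decoded)
```

One caveat of this approach: a stream whose real name contains a
literal percent-escape sequence would also be unescaped.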
The previous logic was incorrect, in that if `content_type` was set to
None (which happens with Slack/HipChat export, among other things),
then we wouldn't run the `guess_type` logic to auto-detect the
Content-Type to send to S3.
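
The intended behavior can be sketched as follows, assuming a
hypothetical helper named `resolve_content_type`; the actual upload
code is structured differently.

```python
import mimetypes
from typing import Optional


def resolve_content_type(
    file_name: str, content_type: Optional[str]
) -> Optional[str]:
    # An explicit None (as seen with Slack/HipChat exports) must
    # still trigger auto-detection from the file name.
    if content_type is None:
        content_type = mimetypes.guess_type(file_name)[0]
    return content_type
```

Note that `guess_type` can itself return None for an unknown
extension, so callers still need to tolerate a missing value.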
The last_modified field is intended to support setting the
orig-last-modified field in the S3 backend when importing, basically
to keep track of this bit of pre-export data for debugging. In the
event that it isn't available, the correct thing to do is not write
out an invalid `last_modified` field; we should just not write it out
at all.
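
A minimal sketch of that rule, with a hypothetical
`build_upload_record` helper and record layout invented for
illustration:

```python
from typing import Any, Dict, Optional


def build_upload_record(
    path: str, last_modified: Optional[float]
) -> Dict[str, Any]:
    record: Dict[str, Any] = {"path": path}
    # Only write the field when we actually have a value; an absent
    # key is better than an invalid `last_modified`.
    if last_modified is not None:
        record["last_modified"] = last_modified
    return record
```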