We now attach zulip_db_data to the markdown engines
for classes that need it. This was the last remaining
global we had, so we remove `arguments.py` here.
The Markdown processor makes it fairly simple for
the helper classes to access the `md` engine. We
now write `_md_engine.zulip_message` to avoid having
the current message in the global namespace.
Note that we do reuse engines for multiple messages,
but each engine is specific to a realm. And we therefore
avoid even the theoretical possibility of leaking message
data between realms.
This makes us consistent with how we import codehilite.
Using Python's normal import mechanism avoids some overhead
with Markdown having to parse dotted notation.
These modules are tiny, so they shouldn't impact startup
too much. Also, by explicitly importing them, we avoid
the pitfall of having a sucessful startup and a broken
renderer.
We were building the same link regex every time
we build a Markdown engine, which happens twice
per realm. It's an expensive operation due to
the complexity of the regex and us reading a file.
This change avoids hitting the Django ORM when
we don't find any possible group mentions in
the message content.
Django doesn't necessarily actually hit the database,
but it's still slow and shows up in profiles.
We can rely on `message_realm` being the same
as `message.sender.realm`, which allows us to
skip two queries to the database for the rare
Zephyr mirroring case.
This is a prepartory commit for the upcoming changes. It was meaningful
to extract this one out because this function is essentially a condition
check on whether a given url is one of the user_uploads or an external
one. Based on its value we decide whether a url must be thumbnailed or
not and thus this function will also be used in an upcoming commit
patching lib/thumbnail.py to do the same check before thumbnail url
generation.
We are basically adding a check for url's to be external (belonging
to some 3rd party web site hosting the image) or be one of the
user uploaded files. User uploaded files are served by a separate
endpoint which is /user_uploads/. Any other local url such as
/user_avatars/ or /static/ should never be sent to thumbor for
thumbnailing.
Not sending /user_avatars/ to thumbor for thumbnailing makes sense
because they are already properly thumbnailed and stored properly.
/static/ urls host very few images we use for demo and can be safely
be excluded from thumbnailing.
We start by stripping the ids in front of the name before the database
lookup. This has the advantage of not mentioning anyone if an incorrect
user id and full name combination is specified, as well as not having
the query the database twice, once by fullname and next by id.
Previously, we were storing only the most recent person with the same
full name as others; this commit adds new keys to the dict such that
simply looking by name would get you the newest user with this name,
and the get_user_by_id function can index the remaining users.
python-twitter was consuming a significant amount of import time.
However, this commit seems to not save any time at all, probably
because its recursive dependencies are imported elsewhere in Zulip.
Extracting this helper library will help us avoid an import loop
between notifications.py and message.py (with bugdown in between).
But in addition to that, it's a more natural model, since some of the
uses for these functions weren't part of the notifications code
anyway.
This has the benefit that we now get the usual data about the
user/request/etc. in error emails related to bugdown exceptions;
previously we were just getting the traceback in the emails (since our
`mail_admins` template was very simple) and no other debugging
details.
Comments tweaked by tabbott to help make clear exactly what's going on
here, since it's a little subtle and a little hacky.
Fixes#8843.
Various pieces of our thumbor-based thumbnailing system were already
merged; this adds the remaining pieces required for it to work:
* a THUMBOR_URL Django setting that controls whether thumbor is
enabled on the Zulip server (and if so, where thumbor is hosted).
* Replaces the overly complicated prototype cryptography logic
* Adds a /thumbnail endpoint (supported both on web and mobile) for
accessing thumbnails in messages, designed to support hosting both
external URLs as well as uploaded files (and applying Zulip's
security model for access to thumbnails of uploaded files).
* Modifies bugdown to, when THUMBOR_URL is set, render images with the
`src` attribute pointing /thumbnail (to provide a small thumbnail
for the image), along with adding a "data-original" attribute that
can be used to access the "original/full" size version of the image.
There are a few things that don't work quite yet:
* The S3 backend support is incomplete and doesn't work yet.
* The error pages for unauthorized access are ugly.
* We might want to rename data-original and /thumbnail?size=original
to use some other name, like "full", that better reflects the fact
that we're potentially not serving the original image URL.
This has two advantages;
* We can split bugdown/__init__.py into several modules, and each
module can access these arguments by importing these
* We get rid of the super-ugly `global db_data` construct, replacing
it with a only slightly ugly monkey-ish patching of the
`zerver.lib.bugdown.arguments` module, which is at least
considerably more clear on reading as to what it's purpose is.
The only changes visible at the AST level, checked using
https://github.com/asottile/astpretty, are
zerver/lib/test_fixtures.py:
'\x1b\\[(1|0)m' ↦ '\\x1b\\[(1|0)m'
'\\[[X| ]\\] (\\d+_.+)\n' ↦ '\\[[X| ]\\] (\\d+_.+)\\n'
which is fine because re treats '\\x1b' and '\\n' the same way as
'\x1b' and '\n'.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
This bug is caused by the conversion of newlines to `<br>` statements,
since `>` is not allowed as a character around an emoticon during
translation.
Also, add a new test case for preventing this bug from occurring in the
future.
Fix#9763.
This commit increases the rendered_content limit from 2x to 10x of the
original message length.
Earlier, we had placed a limit of MAX_MESSAGE_LENGTH * 2 for the
rendered content (explained in commit
77addc5456). That limit was based on
the assumption that in most cases, the rendered content wouldn't cause
a large increase in message length. However, quite prominently in
syntax highlighted codeblocks, that wasn't true and this caused the
limit condition to be hit for long messages composed primarily of code
blocks.
Example: The following message would render close to 10x it's original size.
```py
if:
def:
print("x", var)
x = y
```
Because the syntax highlighted logic is extremely compressible, having
rendered_content reach up to 100KB doesn't create a network
performance problem.
Also switches the default behaviour of the code to not translate the
emoticons. Earlier, the code was testing-aware, and used to translate
when there was no user profile data available(assuming that as a testing
environment).
In 18e43895ff we replaced
stream subscribe buttons with stream links. The new feature
has been well tested and well received for over a year now,
so it's safe to remove the older feature at this point.
Older sites will have super old messages that still have the
rendered markup; this commit does not attempt to address those
situations. Most likely, clicking on an old button in the old
message will either do nothing or look like a message reply.
This commit refactors the bugdown to perform a lookup only on active
realm emojis. This was needed because once we migrate realm emojis
to be addressed by `id` rather than name, it will be costly to
perform a lookup on all the realm emojis.