`youtube.com/playlist?list=<list-id>` incorrectly matches the regex since the
change in 8afda1c1bb. The regex was modified to
match URLs of the form `youtu.be/<id>` and this playlist URL incorrectly matches
with the `<id>` set to `playlist`.
This commit avoids this match by verifying that the ID is not playlist.
When an emoji is nested inside another inline tag - like em or strong -
it was getting double processed because of the way the inlinePattern
TreeProcessor runs (it runs recursively). With this fix, we set the
inner text of the emoji span as an AtomicString, preventing us from
double processing the emoji's text.
Fixes#11621
Test Plan:
* Add test case for **😄**, verify it passes.
* Go into local dev server and send "**😄**" to self and verify the DOM
does not have double <span> tags for the emoji.
* Run zerver.tests.test_push_notifications and verify the markdown test case matches
the text_content field properly
This fixes an issue where the hanging unordered list was not
rendering in blockquote; the problem was that we were not
adding an empty line(to satisfy the markdown) for hanging
unordered list if it is in blockquote. Both blockquote
and code block is fenced but we want to avoid rendering
the list if it's in the code block but not in blockquote.
Fixes: #11916.
Some urls which end with image file extensions (eg .jpg) may link to
html pages. This adds handling for linx.li, wikipedia.org and
pasteboard.co. If it is possible, we redirect to the actual image url
otherwise we do not attempt to render it as an image.
Fixes#10438.
This reverts commit ff90c0101c but keeps
the test cases added for reference.
This was reverted because it was both not a clean solution and created
other realm filters bugs involving dashes (etc.).
This fixes an issue where invalid emoji name prevents following
emojis from rendering.
This reverts the code change in
8842349629, while still passing the
tests added in that commit (it seems the original commit had
misdiagnosed an ordering bug and thus introduced this issue).
Fixes: #11770.
This commit leverages the ahocorasick algorithm to build a set of user_ids
that have their alert_words present in the message. It runs in linear time
of the order of length of the input message as opposed to number of
alert_words. This is after building a ahocorasick Automaton which runs
in O(number of alert_words in entire realm) which is usually cached.
This fixes an issue where blank lines between blocks were causing
auto-numbering of list to stop before the blank line resulting
in two separate numbered list instead of one.
Edited significantly by tabbott to explain the tricky details in the
comments.
Fixes: #11651.
This allows us to have some features using bugdown rendering where
inline image previews will not be rendered (which would be problematic
for e.g. stream descriptions).
This change should help people discover to distinguish
silent mentions in text as a part of Zulip syntax while
differentiating them from regular mentions.
Earlier, our realm filters didn't render for languages that do not
use spaces (eg: Japanese) since we used to check for the presence
of an actual space character. This commit replaces that logic with
a complex scheme to detect word boundaries.
Also, we convert the RealmFilterPattern to subclass InlineProcessor
and make use of the new no-op feature in py-markdown 3.0.1 where we
can tell py-markdown that our pattern didn't find a match despite
the initial regex getting matched.
Fixes#9883.
Since we are building our parser from scratch now:
1. We have control over which proccessor goes at what priority number.
Thus, we have also shifted the deprecated `.add()` calls to use the
new `.register()` calls with explicit priorities, but maintaining
the original order that the old method generated.
2. We do not have to remove the processors added by py-markdown that
we do not use in Zulip; we explicitly add only the processors we
do require.
3. We can cluster the building of each type of parser in one place,
and in the order they need to be so that when we register them,
there is no need to sort the list. This also makes for a huge
improvement in the readability of the code, as all the components
of each type are registered in the same function.
These are significant performance improvements, because we save on
calls to `str.startswith` in `.add()`, all the resources taken to
generate the default to-be-removed processors and the time taken to
sort the list of processors.
Following are the profiling results for the changes made. Here, we
build 10 engines one after the other and note the time taken to build
each of them. 1st pass represents the state after this commit and 2nd
pass represent the state after some regex modifications in the commits
that follow by Steve Howell. All times are in microseconds.
| nth Engine | Old Time | 1st Pass | 2nd Pass |
| ---------- | -------- | -------- | -------- |
| 1 | 92117.0 | 81775.0 | 76710.0 |
| 2 | 1254.0 | 558.0 | 341.0 |
| 3 | 1170.0 | 472.0 | 305.0 |
| 4 | 1155.0 | 519.0 | 301.0 |
| 5 | 1170.0 | 546.0 | 326.0 |
| 6 | 1271.0 | 609.0 | 416.0 |
| 7 | 1125.0 | 459.0 | 299.0 |
| 8 | 1146.0 | 476.0 | 390.0 |
| 9 | 1274.0 | 446.0 | 301.0 |
| 10 | 1135.0 | 451.0 | 297.0 |
We avoid re-computing the regex string here, and we
also avoid re-compiling the regex itself.
I decided to put the "one_time" decorator in the
bugdown file itself, just to reduce friction in
folks reading the "buyer beware" comments.
Unfortunately, we can't use this for the
get_web_link_regex() function due to testing concerns,
so that continues to do an inelegant cache-with-global-var
scheme.
We use early-exit to flatten the code.
I also tweaked the comments a bit based on some recent
profile findings. (e.g. reading the file isn't actually
a big bottleneck, it's more the regex itself)
On the backend, we extend the BlockQuoteProcessor's clean function that
just removes '>' from the start of each line to convert each mention to
have the silent mention syntax, before UserMentionPattern is invoked.
The frontend, however, has an edge case where if you are mentioned in
some message and you quote it while having mentioned yourself above
the quoted message, you wouldn't see the red highlight till we get the
final rendered message from the backend.
This is such a subtle glitch that it's likely not worth worrying about.
Fixes#8025.
These mentions look like regular mentions except they do not
trigger any notification for the person mentioned. These are
primarily to be used when you make a bot take an action and
the bot mentions you, or when you quote a message that mentions
you.
Fixes#11221.
This is a major upgrade, and requires some significant compatibility
work:
* Migrating the pattern-removal logic to use the Registry feature.
* Handling the removal of positional arguments in markdown extensions.
* Handling the removal of safe mode.
This setting splits away part of responsibility from THUMBOR_URL.
Now on, this setting will be responsible for controlling whether
we thumbnail images or not by asking bugdown to render image links
to hit our /thumbnail endpoint. This is irrespective of what
THUMBOR_URL is set to though ideally THUMBOR_URL should be set
to point to a running thumbor instance.
This commit changes the return type of get_possible_mentions_info to a
list instead of a dict, thus disposing off the hacky logic of storing
users with duplicate full names with name|id keys that made the code
obfuscated.
The other functions continue to use the dicts as before, however, there
are minor variable changes where needed in accordance with the updated
definition of get_possible_mentions_info.
We now attach zulip_db_data to the markdown engines
for classes that need it. This was the last remaining
global we had, so we remove `arguments.py` here.
The Markdown processor makes it fairly simple for
the helper classes to access the `md` engine. We
now write `_md_engine.zulip_message` to avoid having
the current message in the global namespace.
Note that we do reuse engines for multiple messages,
but each engine is specific to a realm. And we therefore
avoid even the theoretical possibility of leaking message
data between realms.
This makes us consistent with how we import codehilite.
Using Python's normal import mechanism avoids some overhead
with Markdown having to parse dotted notation.
These modules are tiny, so they shouldn't impact startup
too much. Also, by explicitly importing them, we avoid
the pitfall of having a sucessful startup and a broken
renderer.
We were building the same link regex every time
we build a Markdown engine, which happens twice
per realm. It's an expensive operation due to
the complexity of the regex and us reading a file.
This change avoids hitting the Django ORM when
we don't find any possible group mentions in
the message content.
Django doesn't necessarily actually hit the database,
but it's still slow and shows up in profiles.
We can rely on `message_realm` being the same
as `message.sender.realm`, which allows us to
skip two queries to the database for the rare
Zephyr mirroring case.
This is a prepartory commit for the upcoming changes. It was meaningful
to extract this one out because this function is essentially a condition
check on whether a given url is one of the user_uploads or an external
one. Based on its value we decide whether a url must be thumbnailed or
not and thus this function will also be used in an upcoming commit
patching lib/thumbnail.py to do the same check before thumbnail url
generation.
We are basically adding a check for url's to be external (belonging
to some 3rd party web site hosting the image) or be one of the
user uploaded files. User uploaded files are served by a separate
endpoint which is /user_uploads/. Any other local url such as
/user_avatars/ or /static/ should never be sent to thumbor for
thumbnailing.
Not sending /user_avatars/ to thumbor for thumbnailing makes sense
because they are already properly thumbnailed and stored properly.
/static/ urls host very few images we use for demo and can be safely
be excluded from thumbnailing.
We start by stripping the ids in front of the name before the database
lookup. This has the advantage of not mentioning anyone if an incorrect
user id and full name combination is specified, as well as not having
the query the database twice, once by fullname and next by id.
Previously, we were storing only the most recent person with the same
full name as others; this commit adds new keys to the dict such that
simply looking by name would get you the newest user with this name,
and the get_user_by_id function can index the remaining users.
python-twitter was consuming a significant amount of import time.
However, this commit seems to not save any time at all, probably
because its recursive dependencies are imported elsewhere in Zulip.
Extracting this helper library will help us avoid an import loop
between notifications.py and message.py (with bugdown in between).
But in addition to that, it's a more natural model, since some of the
uses for these functions weren't part of the notifications code
anyway.
This has the benefit that we now get the usual data about the
user/request/etc. in error emails related to bugdown exceptions;
previously we were just getting the traceback in the emails (since our
`mail_admins` template was very simple) and no other debugging
details.
Comments tweaked by tabbott to help make clear exactly what's going on
here, since it's a little subtle and a little hacky.
Fixes#8843.
Various pieces of our thumbor-based thumbnailing system were already
merged; this adds the remaining pieces required for it to work:
* a THUMBOR_URL Django setting that controls whether thumbor is
enabled on the Zulip server (and if so, where thumbor is hosted).
* Replaces the overly complicated prototype cryptography logic
* Adds a /thumbnail endpoint (supported both on web and mobile) for
accessing thumbnails in messages, designed to support hosting both
external URLs as well as uploaded files (and applying Zulip's
security model for access to thumbnails of uploaded files).
* Modifies bugdown to, when THUMBOR_URL is set, render images with the
`src` attribute pointing /thumbnail (to provide a small thumbnail
for the image), along with adding a "data-original" attribute that
can be used to access the "original/full" size version of the image.
There are a few things that don't work quite yet:
* The S3 backend support is incomplete and doesn't work yet.
* The error pages for unauthorized access are ugly.
* We might want to rename data-original and /thumbnail?size=original
to use some other name, like "full", that better reflects the fact
that we're potentially not serving the original image URL.
This has two advantages;
* We can split bugdown/__init__.py into several modules, and each
module can access these arguments by importing these
* We get rid of the super-ugly `global db_data` construct, replacing
it with a only slightly ugly monkey-ish patching of the
`zerver.lib.bugdown.arguments` module, which is at least
considerably more clear on reading as to what it's purpose is.
The only changes visible at the AST level, checked using
https://github.com/asottile/astpretty, are
zerver/lib/test_fixtures.py:
'\x1b\\[(1|0)m' ↦ '\\x1b\\[(1|0)m'
'\\[[X| ]\\] (\\d+_.+)\n' ↦ '\\[[X| ]\\] (\\d+_.+)\\n'
which is fine because re treats '\\x1b' and '\\n' the same way as
'\x1b' and '\n'.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
This bug is caused by the conversion of newlines to `<br>` statements,
since `>` is not allowed as a character around an emoticon during
translation.
Also, add a new test case for preventing this bug from occurring in the
future.
Fix#9763.
This commit increases the rendered_content limit from 2x to 10x of the
original message length.
Earlier, we had placed a limit of MAX_MESSAGE_LENGTH * 2 for the
rendered content (explained in commit
77addc5456). That limit was based on
the assumption that in most cases, the rendered content wouldn't cause
a large increase in message length. However, quite prominently in
syntax highlighted codeblocks, that wasn't true and this caused the
limit condition to be hit for long messages composed primarily of code
blocks.
Example: The following message would render close to 10x it's original size.
```py
if:
def:
print("x", var)
x = y
```
Because the syntax highlighted logic is extremely compressible, having
rendered_content reach up to 100KB doesn't create a network
performance problem.
Also switches the default behaviour of the code to not translate the
emoticons. Earlier, the code was testing-aware, and used to translate
when there was no user profile data available(assuming that as a testing
environment).
In 18e43895ff we replaced
stream subscribe buttons with stream links. The new feature
has been well tested and well received for over a year now,
so it's safe to remove the older feature at this point.
Older sites will have super old messages that still have the
rendered markup; this commit does not attempt to address those
situations. Most likely, clicking on an old button in the old
message will either do nothing or look like a message reply.
This commit refactors the bugdown to perform a lookup only on active
realm emojis. This was needed because once we migrate realm emojis
to be addressed by `id` rather than name, it will be costly to
perform a lookup on all the realm emojis.
This field has been unused by clients for some time, and isn't great
for our public archive feature plans (where we'll not want to be
including email addresses in messages).
Add `translate_emoticons` to `prop_types` and `expected_keys`.
Furthermore, create a emoji-translating Markdown inline pattern.
Also use a JavaScript version of `translate_emoticons` and then use
this function during Markdown previews and as a preprocessor. This
is only needed for previews, because usually emoticon translation
happens on the backend after sending.
Add tests for emoticon translation, a settings UI, and a /help/ page
as well.
Tweaked by tabbott to fix various test failurse as well as how this
handles whitespace, requiring emoticons to not have adjacent
characters.
Fixes#1768.
This commit prefixes stream names in urls with stream ids,
so that the urls don't break when we rename streams.
strean name: foo bar.com%
before: #narrow/stream/foo.20bar.2Ecom.25
after: #narrow/stream/20-foo-bar.2Ecom.25
For new realms, everything is simple under the new scheme, since
we just parse out the stream id every time to figure out where
to narrow.
For old realms, any old URLs will still work under the new scheme,
assuming the stream hasn't been renamed (and of course old urls
wouldn't have survived stream renaming in the first place). The one
exception is the hopefully rare case of a stream name starting with
something like "99-" and colliding with another stream whose id is 99.
The way that we enocde the stream name portion of the URL is kind
of unimportant now, since we really only look at the stream id, but
we still want a safe encoding of the name that is mostly human
readable, so we now convert spaces to dashes in the stream name. Also,
we try to ensure more code on both sides (frontend and backend) calls
common functions to do the encoding.
Fixes#4713
This enforces `**` around all the mentions including "at-all" and
"at-everyone" mentions. Hence this makes `@all` and `@everyone`
invalid mentions, resulting into proper syntax for these mentions as
`@**all**` and `@**everyone**` respectively.
Note from tabbott: This removes an old feature/syntax, which made
sense back when @Tim was also a way to mention a user with Tim as
their first name. Given how nice typeahead is now, the user part of
the feature was removed a while ago; this should have gone at the same
time.
Fixes: #8143.
If some bug in Bugdown results in a rendered message content that is
bigger than twice the message size, we now just throw an exception
from Bugdown. This is considerably better than the old behavior,
which might result in an enormous message being placed in the database
(potentially, bigger than the 1MB limit to store in memcached), which
would in turn result in tragic consequences.
This fixes#8322, in that it prevents the super bad outcome seen there
(where basically Zulip became unusable for everyone on the stream
where the message is posted). Now, the failure mode is just the
message failing to send. Still not ideal (and requires further work
on the URL embed feature), but not a minor problem, not a major one.
This also amends a commit from Brock Whittaker <brock@zulipchat.com>
that merges two separate functions for YouTube videos and Vimeo videos
into a generic video recall function.
Fixes#7550.
This uses the correct regex for strikethrough. Also, added
a test to make sure that strikethrough works when it contains
link with whitespace.
Fixes#7596.
This name hasn't been right since f7f2ec0ac back in 2013; this handler
sends the log record to a queue, whose consumer will not only maybe
send a Zulip message but definitely send an email. I found this
pretty confusing when I first worked on this logging code and was
looking for how exception emails got sent; so now that I see exactly
what's actually happening here, fix it.