The max inline preview limit was previously increased to 10 by #20789.
However, as issue #23624 shows, it's still causing confusion for users
when they include more than 10 links.
Bump this limit up to 24, which is a multiple of the 4 image preview
per line logic.
This uses the linkifier index among the list of linkifiers in the
replacement as the priority to order the replacement order for
patterns in the topic. This avoids having multiple overlapping matches
that each produce a link.
The linkifier with the lowest id will be prioritized when its pattern
overlaps with another. Linkifiers are prioritized over raw URLs.
Note that the same algorithm is used for local echoing and the
backend markdown processor.
Fixes#23715.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
The same pattern being matched multiple times in a topic cannot be
properly ordered using topic_name.find(match_text) and etc. when there
are multiple matches of the same pattern in the topic.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
9381a3bd45 added support for linkifier pattern URLs containing
`%20`-style escapes, but only did so for the codepath which is used in
the message body -- topic links did not understand them.
Expand the support to include when they are substituted into topics.
Due to mismatches between the URL parsers in Python and browsers, it
was possible to hoodwink rewrite_local_links_to_relative into
generating links that browsers would interpret as absolute.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Now the following characters are allowed before @-mentions and stream
references (starting with #) for proper rendering - {, [, /.
This commit makes the markdown rendering consistent with autocomplete
(anything that is autocompleted is also rendered properly).
markdown-include is GPL licensed.
Also, rewrite it as a block processor, so that it works correctly
inside indented blocks.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
The `get_link_embed_data` / `link_embed_data_from_cache` pair as
introduced in c93f1d4eda uses the cache
as a temporary store inside of the `embed_links` worker; this means
that it must be durable storage, or the worker will stall and re-fetch
the same links to preview them.
Switch to plumbing through the fetched URL embed data as an parameter
to the Markdown evaluation which uses them, rather than using the
cache as an intermediary. This frees up the cache to be merely a
non-durable cache.
As a side-effect, this removes get_cache_with_key, and
link_embed_data_from_cache which was its only callsite.
`prepare_linkifier_pattern`, as of db934be064, adds a match to the
end of the regex, of either the end of string, or a non-word character
-- this is in place of a negative look-ahead, which is no longer
possible in re2. This causes the regex to consume trailing
whitespace, and thus not be able to match twice in succession with
`pattern.finditer` -- "#1234#5678" fails to match because the space
is consumed by the first match of the regex.
Rather than use `pattern.finditer`, write own own version, which
rewinds over the non-word character consumed after the match, if any.
This allows the same "after" non-word character to also satisfy the
"before" of the next match.
Fixes#21502.
Wordle has recently become a thing and it uses green, yellow and white (or
black in dark mode) large square unicode characters to let people share their
gameplay. Zulip converts the white and black large square unicode characters to
emojis, but not the green and yellow ones. This causes the Wordle grid to be
misaligned when shared on Zulip.
This commit adds green and yellow large square emojis to our emoji list to fix
the problem.
The limit here is purely to prevent breakage in case of a pathological
number of images in a single message; 5 images is entirely possible in
a reasonable message, and causes user confusion when they are not
expended.
Increase the limit to 10 per message.
This diff looks slightly noisy, but the main chunk of
code that we moved here has the same logic as before,
and it just gets realm_id from MentionBackend now, instead
of having our markdown processor have to supply it.
We basically want MentionData to be the gatekeeper of
mention data, and then we delegate backend tasks to
MentionBackend.
Soon we will add a cache to MentionBacked, which will
justify this change a bit more.
When our handlers specifically reference self.md.zulip_db_data,
we now use an explicit type.
We probably want a more robust solution here, such as a semgrep
rule.
We now serialize still_url as None for non-animated emojis,
instead of omitting the field. The webapp does proper checks
for falsiness here. The mobile app does not yet use the field
(to my knowledge).
We bump the API version here. More discussion here:
https://chat.zulip.org/#narrow/stream/378-api-design/topic/still_url/near/1302573
Not proxying these requests through camo is a security concern.
Furthermore, on the desktop client, any embed image which is hosted on
a server with an expired or otherwise invalid certificate will trigger
a blocking modal window with no clear source and a confusing error
message; see zulip/zulip-desktop#1119.
Rewrite all `message_embed_image` URLs through camo, if it is enabled.
Supporting URL percent-encoded bytes is possible using `%%20`, but this
is not necessarily very understandable to end-users, even those that
understand percent encoding.
Allow `%20` in linkifier URL format strings, and transform them into
`%%20` in the pattern just before they are applied in markdown
translation. Care must be taken here, such that already-escaped `%`s
are not escaped an extra time.
We do this before rendering, and not before storage, as
a simplification; the JS-side linkifier at present only understands
`%(foo)s` and thus needs no changes, and to avoid an un-escaping pass
before showing in the admin UI.
og:image is supposed to be an absolute URL, but some sites incorrectly
provide a relative URL. In this case, it makes more sense to
interpret it relative to the full page URL after redirects, rather
than relative to just the domain part of the page URL before
redirects.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Zulip attempts to validate that the regular expressions that admins
enter for linkifiers are well-formatted, and only contain a specific
subset of regex grammar. The process of checking these
properties (via a regex!) can cause denial-of-service via
backtracking.
Furthermore, this validation itself does not prevent the creation of
linkifiers which themselves cause denial-of-service when they are
executed. As the validator accepts literally anything inside of a
`(?P<word>...)` block, any quadratic backtracking expression can be
hidden therein.
Switch user-provided linkifier patterns to be matched in the Markdown
processor by the `re2` library, which is guaranteed constant-time.
This somewhat limits the possible features of the regular
expression (notably, look-head and -behind, and back-references);
however, these features had never been advertised as working in the
context of linkifiers.
A migration removes any existing linkifiers which would not function
under re2, after printing them for posterity during the upgrade; they
are unlikely to be common, and are impossible to fix automatically.
The denial-of-service in the linkifier validator was discovered by
@erik-krogh and @yoff, as GHSL-2021-118.
This check was copied from upstream python-markdown's "safe mode"
before they removed that feature. The upstream history is that they
introduced this check in
2db5d1c8e4,
which was not a complete security check, and then added the
immediately following check (with an allowlist of schemes) in
0b4ffbb60e.
Their first, incomplete check provides no security benefit and makes
the code hard to reason about, so we remove it.