Commit Graph

393 Commits

Author SHA1 Message Date
Rohitt Vashishtha 630c564fc7 bugdown: Rewrite List Preprocessor logic to properly parse fences.
Previously, we didn't track opening and closing fences separately,
which led to bugs like not parsing a list that was immediately after
a quoted fence; we treated each ``` as a new fence.

This commit rewrites the function to maintain a stack of currently
open fences. If any of the parent fences is a code fence, we do not
insert a new line before a list.

We also add some test cases specifically to test this behavior with
complexly nested lists.

Fixes #13745.
2020-01-27 17:14:27 -08:00
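
The fence-stack idea from 630c564fc7 could be sketched roughly as follows. This is a simplified illustration, not the actual bugdown preprocessor: the function name, the "quote"/"code" labels, and the rule that a bare ``` closes the innermost open fence are all assumptions made for the example.

```python
from typing import List

FENCE = "```"
LIST_MARKERS = ("- ", "* ", "+ ")


def insert_blank_lines_before_lists(lines: List[str]) -> List[str]:
    """Hypothetical sketch: add a blank line before a list unless a code fence is open."""
    open_fences: List[str] = []  # stack of currently open fences: "code" or "quote"
    output: List[str] = []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith(FENCE):
            label = stripped[len(FENCE):].strip()
            if label:
                # A labeled fence opens a new block; only ```quote is a quote.
                open_fences.append("quote" if label == "quote" else "code")
            elif open_fences:
                # A bare ``` closes the innermost open fence.
                open_fences.pop()
            else:
                # A bare ``` with nothing open starts an unlabeled code block.
                open_fences.append("code")
            output.append(line)
            continue
        is_list_item = stripped[:2] in LIST_MARKERS
        inside_code = "code" in open_fences  # any parent code fence disables the fix-up
        previous = output[-1].strip() if output else ""
        previous_is_list = previous[:2] in LIST_MARKERS
        if is_list_item and not inside_code and previous and not previous_is_list:
            output.append("")
        output.append(line)
    return output
```

With this sketch, a list that follows a closed quote fence gets its blank line, while list-like lines inside a code fence are left untouched.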
Tim Abbott 7ccc8373e2 bugdown: Fix logic for extracting attachment path_id.
In 3892a8afd8, we restructured the
system for managing uploaded files to a much cleaner model where we
just do parsing inside bugdown.

That new model had potentially buggy handling of cases around both
relative URLs and URLs starting with `realm.host`.

We address this by further rewriting the handling of attachments to
avoid regular expressions entirely, instead relying on urllib for
parsing, and having bugdown output `path_id` values, so that there's
no need for any conversions between formats outside bugdown.

The check_attachment_reference_change function for processing message
updates is significantly simplified in the process.

The new check on the hostname has the side effect of requiring us to
fix some previously weird/buggy test data.

Co-Author-By: Anders Kaseorg <anders@zulipchat.com>
Co-Author-By: Rohitt Vashishtha <aero31aero@gmail.com>
2019-12-12 20:30:26 -08:00
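
A minimal sketch of the urllib-based approach described in 7ccc8373e2, assuming uploads live under a `/user_uploads/` path prefix (the prefix appears in 3892a8afd8; the function name and the `realm_host` parameter are hypothetical, and the real code may treat more cases):

```python
from typing import Optional
from urllib.parse import urlsplit

UPLOAD_PREFIX = "/user_uploads/"


def attachment_path_id(url: str, realm_host: str) -> Optional[str]:
    """Hypothetical helper: return the path_id of an upload URL, or None."""
    parsed = urlsplit(url)
    # Relative URLs have no hostname; absolute URLs must point at this realm.
    if parsed.hostname is not None and parsed.hostname != realm_host:
        return None
    if not parsed.path.startswith(UPLOAD_PREFIX):
        return None
    return parsed.path[len(UPLOAD_PREFIX):]
```

Parsing the URL instead of matching it with a regular expression makes the hostname check explicit, which is the property the commit message calls out.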
Anders Kaseorg 8e37862b69 CVE-2019-19775: Close open redirect in thumbnail view.
This closes an open redirect vulnerability, one case of which was
found by Graham Bleaney and Ibrahim Mohamed using Pysa.

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2019-12-12 17:29:20 -08:00
Rohitt Vashishtha 3892a8afd8 messages: Set has_attachment correctly using Bugdown.
Previously, we would naively set has_attachment just by searching
the whole message for strings like `/user_uploads/...`. As a result,
we ran do_claim_attachments for messages that obviously do not
contain an attachment; we now avoid doing so.

For example: attachments in codeblocks or
             attachments that otherwise do not match our link syntax.

The new implementation runs that check on only the urls that
bugdown determines should be rendered. We also refactor some
Attachment tests in test_messages to test this change.

The new method is:

1. Create a list of potential_attachment_urls in Bugdown while rendering.
2. Loop over this list in do_claim_attachments for the actual claiming.
   For saving:
3. If we claimed an attachment, set message.has_attachment to True.
   For updating:
3. If claimed_attachment != message.has_attachment: update has_attachment.

We do not modify the logic for 'unclaiming' attachments when editing.
2019-12-11 11:03:44 -08:00
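
The save/update steps listed in 3892a8afd8 could be sketched as below. The `Message` dataclass and both function names are stand-ins invented for this illustration; the real do_claim_attachments works against the database.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass
class Message:
    """Hypothetical stand-in for the real message model."""
    has_attachment: bool = False


def claim_attachments(potential_path_ids: Iterable[str], known_path_ids: Set[str]) -> bool:
    """Step 2: loop over the URLs bugdown collected; report whether any was claimed."""
    claimed = False
    for path_id in potential_path_ids:
        if path_id in known_path_ids:
            claimed = True
    return claimed


def update_has_attachment(message: Message, claimed: bool) -> bool:
    """Step 3 (updating): write the flag only when it actually changed."""
    if claimed != message.has_attachment:
        message.has_attachment = claimed
        return True
    return False
```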
Rohitt Vashishtha 4674cc5098 bugdown: Set message.has_image while rendering message. 2019-12-11 17:01:41 +05:30
dustinheestand 157c98de99 bugdown: Correctly set has_link attribute on messages.
Now autolinks and message edits affect the has_link attribute on messages.
2019-12-11 17:01:41 +05:30
Rohitt Vashishtha 182503e5c0 bugdown: Move helper methods to InlineInterestingLinksProcessor.
add_a, add_oembed_data and add_embed are only called by
InlineInterestingLinksProcessor and this commit allows
these methods to access self.markdown object.
2019-12-10 15:35:00 -08:00
Rohitt Vashishtha 1229e69e9b bugdown: Reenable -,+ to begin a markdown list.
This commit has a side-effect that we also now allow mixed lists,
but they have different syntax from the commonmark implementation
and our marked output. For example, without the closing li tags:

Input      Bugdown     Marked
-------------------------------------------
           <ul>
- Hello    <li>Hello   <ul><li>Hello</ul>
+ World    <li>World   <ul><li>World
+ Again    <li>Again       <li>Again</ul>
* And      <li>And     <ul><li>And
* Again    <li>Again       <li>Again</ul>
           </ul>

The bugdown render is in line with what a user in #13447 requests.

Fixes #13477.
2019-12-09 16:13:02 -08:00
Rohitt Vashishtha 9174c636ce bugdown: Store if message has wildcards in MentionData.
We also switch the underlying extract_mention_text method to use
a regular for loop, as well as make the related methods return
tuples of (names, is_wildcard). This abstraction is hidden from the
MentionData callers behind mention_data.message_has_wildcards().

Concerns #13430.
2019-12-02 12:12:35 -08:00
Anders Kaseorg cce85f6ec7 dependencies: Upgrade katex from 0.10.2 to 0.11.1.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2019-11-11 16:26:31 -08:00
Thomas Ip 574c35c0b8 markdown: Render ordered lists using <ol> markup.
This brings us in line with CommonMark, and also allows us to style these more like
unordered lists, which is visually more appealing.

On the backend, we now use the default list blockprocessor + sane list
extension of python-markdown to get proper list markup; on the
frontend, we mostly return to upstream's code as they have followed
CommonMark on this issue.

Using <ol> here necessarily removes the behaviour of not renumbering
on lists written like 3, 4, 7; hopefully users will be OK with the
change.

Fixes #12822.
2019-09-08 16:42:20 -07:00
Tim Abbott 62c9ea7cf9 linkifiers: Fix problems with capture groups called "name".
Apparently, due to poor naming of the outer capture group we use to
separate the actual match from the surrounding whitespace (etc.) we
use to determine if the syntax is a possible linkifier start/end, if
you created a linkifier using "name" as the capture group, we'd try to
compile a pattern with two capture groups called "name", which would
500, preventing anyone from accessing the organization.
2019-08-30 09:36:14 -07:00
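
The failure mode described in 62c9ea7cf9 is easy to reproduce with Python's `re` module, which rejects a pattern that defines the same named group twice. In the sketch below, `wrap_linkifier` and the replacement group name are hypothetical; only the duplicate-name behavior of `re.compile` is the point.

```python
import re


def wrap_linkifier(user_pattern: str, outer_group: str) -> str:
    """Hypothetical wrapper: put the user's pattern inside an outer named group."""
    return "(?P<" + outer_group + ">" + user_pattern + ")"


def compiles(pattern: str) -> bool:
    """Return True if the pattern compiles, False if re rejects it."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# A linkifier whose own capture group happens to be called "name".
user_pattern = r"#(?P<name>[0-9]+)"
```

Here `compiles(wrap_linkifier(user_pattern, "name"))` is false because the group name is redefined, while an outer name that users are unlikely to pick compiles fine.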
Rohitt Vashishtha 8b443a25b8 markdown: Show link href if title is empty.
Fixes #6221.
2019-08-25 21:36:42 -07:00
Rohitt Vashishtha abe2dab88c markdown: Upgrade to use InlineProcessor for links.
This commit wraps up the major work that we held back when upgrading
py-markdown 2.6.11 to 3.0.1. Since we were making our custom changes
to the link syntax, at the time we stuck to using the old method of
parsing links. This lays the groundwork for further changes to our
link and image link handling, and brings us on par with upstream.

Also, we now better document the ways in which our link handling is
different from upstream.
2019-08-25 21:36:42 -07:00
Anders Kaseorg 68dd8e4ec8 mypy: Migrate from mypy_extensions to typing_extensions.
This gives us access to typing_extensions.Deque, which was not added
to typing until 3.5.4.

(PROVISION_VERSION is not bumped because the transitive dependency set
in dev.txt hasn’t changed.)

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2019-08-05 17:24:09 -07:00
Rohitt Vashishtha a7f2bedb15 markdown: Enable hashheadings syntax.
Our implementation requires at least 1 space after the
'#' to not break existing linkifiers like '#123', etc.
that generally follow the convention we show in linkifier
examples.

- [valid]  : # Hello
- [valid]  : #  Hello
- [invalid]: #Hello

For the frontend, we have taken the code from v0.7.0 of
upstream marked and made minor changes to avoid having
to refactor a significant part of our marked code.

For the backend, we merely have to change the regex to
force require spaces after #, and add hashheader to our
list of blockparsers.

Fixes #11418.
2019-08-02 15:15:34 -07:00
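
The space requirement from a7f2bedb15 can be illustrated with a small regex. This is an assumed pattern written for the example, not the one used in bugdown or marked; `heading_level` is a hypothetical helper.

```python
import re
from typing import Optional

# One to six '#' characters, then at least one space, then the heading text.
HASH_HEADING = re.compile(r"^(?P<level>#{1,6}) +(?P<header>.+?) *$")


def heading_level(line: str) -> Optional[int]:
    """Return the heading level for a hash heading, or None if the line is not one."""
    match = HASH_HEADING.match(line)
    if match is None:
        return None
    return len(match.group("level"))
```

Because the space is mandatory, `#123` and `#Hello` are not treated as headings and remain available to linkifiers.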
Wyatt Hoodes 1706e06884 bugdown/init: Fix typing for fence variable. 2019-07-29 15:23:10 -07:00
Anders Kaseorg fd7803e7f4 settings: Unset STATIC_ROOT in development.
Django’s default FileSystemFinder disallows STATICFILES_DIRS from
containing STATIC_ROOT (by raising an ImproperlyConfigured exception),
because STATIC_ROOT is supposed to be the result of collecting all the
static files in the project, not one of the potentially many sources
of static files.

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2019-07-24 17:40:31 -07:00
Rohitt Vashishtha 726d5003e1 bugdown: Force absolute urls in topic links.
If a url doesn't have a scheme, browsers would treat it as a relative
url and open something like: https://chat.zulip.org/google.com instead.

This PR fixes the issue on the backend; the frontend implementation
remains out of sync and the user sending the message wouldn't see
any linkification for urls without a scheme.

Fixes #12791.
2019-07-19 12:02:52 -07:00
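
A minimal sketch of the fix in 726d5003e1: if a URL has no scheme, prepend one so the browser does not resolve it relative to the current page. The function name and the choice of `https://` as the default scheme are assumptions; the commit message does not say which scheme is used.

```python
import re

# Matches a leading URL scheme such as "https://" or "ftp://".
HAS_SCHEME = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*://")


def absolute_topic_link(url: str, default_scheme: str = "https") -> str:
    """Hypothetical helper: make sure a topic link carries an explicit scheme."""
    if HAS_SCHEME.match(url):
        return url
    return default_scheme + "://" + url
```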
Hariom Verma 107da5402c url preview: Replace YouTube URLs with their titles.
Modified by punchagan to:
* Replace URLs with titles only if the inline url embed previews are turned on
* Add a test for youtube titles replacing URLs

The titles for the videos are fetched asynchronously after the message has been
sent via the code that fetches metadata for open graph previews. So, the URLs
are replaced with titles only if the inline embed url previews feature is
enabled.

Ideally, YouTube previews should be shown only if inline url previews are
enabled, but this feature is in beta, while YouTube previews are pretty stable.
Once this feature is out of beta, YouTube previews should be shown only if the
url previews feature is turned on.

YouTube preview image is calculated as soon as the message is sent, while the
title needs to be fetched using a network request. This means that the URL is
replaced only after the data has been fetched from the request, and happens a
couple of seconds after the message has been rendered.

Closes #7549
2019-07-12 19:14:19 -07:00
Puneeth Chaganti 865bc24f67 url preview: Avoid showing previews for URLs in blockquotes.
Messages with links embedded in blockquotes turn out to be replies to
messages with links, more often than not. Showing previews for links in
replies seems like clutter, and it seems reasonable to turn off previews for
such links.
2019-07-12 19:14:00 -07:00
Puneeth Chaganti ad98f536bf bugdown: Fix incorrect type annotation for ElementPair. 2019-07-12 19:14:00 -07:00
Rohitt Vashishtha 0ba332bcc0 topic-mention: Add Bugdown implementation as StreamTopicPattern. 2019-07-11 14:53:10 -07:00
Aayush Agrawal 54584f6c16 url preview: Create a single preview for each URL in a message.
Modified by punchagan to:
* Add a separate markdown test for de-duplicating inline previews
* Check the number of unique URLs to see if the per-message limit is crossed
* Use a set for processed URLs instead of a list

Fixes #8379.
2019-07-11 13:37:15 -07:00
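
The de-duplication in 54584f6c16 might look roughly like this. The function name, the limit value, and the choice to return no previews once the limit is exceeded are assumptions for the sake of the example.

```python
from typing import List, Set


def urls_to_preview(urls: List[str], max_previews: int = 5) -> List[str]:
    """Hypothetical sketch: one preview per distinct URL, in first-seen order."""
    # The limit is checked against unique URLs, so repeats do not count twice.
    if len(set(urls)) > max_previews:
        return []
    processed: Set[str] = set()  # a set gives O(1) membership checks
    result: List[str] = []
    for url in urls:
        if url in processed:
            continue
        processed.add(url)
        result.append(url)
    return result
```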
Puneeth Chaganti b10fc1d896 url preview: Don't show a message embed if there's no image. 2019-07-03 14:38:19 -07:00
Rohitt Vashishtha 047086b81c markdown: Make raw urls in topic names navigable.
We reuse the link regexes we use elsewhere in markdown
for parsing links in topic names and add a button to open
them in new tabs similar to our behavior with linkifiers
in topic names.

Fixes #12391.
2019-06-27 15:18:42 -07:00
Rohitt Vashishtha 96d7c1f3b0 markdown: Test escaping of topic_links and document. 2019-06-27 15:18:30 -07:00
Puneeth Chaganti 64c40287f1 url preview: Rename type_ variable to oembed_resource_type. 2019-06-02 14:31:39 -07:00
Puneeth Chaganti 30dcf805ea url preview: Use oEmbed preview for Vimeo, instead of custom code. 2019-06-02 14:31:39 -07:00
Puneeth Chaganti 9aa5a2b369 url preview: Use oEmbed html for videos.
Ensure that the html is safe before using it. The html is considered safe if it is
in an iframe with an http/https src, based on the recommendations here:
https://oembed.com/#section3

We directly embed the `iframe` html into the lightbox overlay.
2019-05-31 15:59:03 -07:00
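
The check described in 9aa5a2b369 could be approximated with the standard library as below. This is only a sketch of the rule "a single iframe with an http/https src"; it is not a sanitizer, it ignores text outside the tag, and the helper names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlsplit


class _StartTagCollector(HTMLParser):
    """Record every start tag and its attributes."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: List[Tuple[str, Dict[str, Optional[str]]]] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))


def looks_like_safe_oembed_html(html: str) -> bool:
    """Accept only markup consisting of one iframe whose src is http or https."""
    collector = _StartTagCollector()
    collector.feed(html)
    collector.close()
    if len(collector.tags) != 1:
        return False
    tag, attrs = collector.tags[0]
    if tag != "iframe":
        return False
    src = attrs.get("src") or ""
    return urlsplit(src).scheme in ("http", "https")
```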
Puneeth Chaganti c8cb785950 url preview: Show inline images as previews for oEmbed photo pages. 2019-05-31 15:59:03 -07:00
Puneeth Chaganti 5dee17dca0 bugdown: Show previews for manually created youtube playlists.
Youtube playlists can be created by manually listing video_ids, as follows:
https://youtube.com/watch_videos?video_ids=vid1,vid2,vid3. This commit adds
previews for URLs of this type, using the first video ID.
2019-05-12 22:24:42 -07:00
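
Picking the first video ID out of a `watch_videos` URL, as 5dee17dca0 describes, can be sketched with `urllib.parse`. The function name is hypothetical, and the real code may use a regex instead.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def first_playlist_video_id(url: str) -> Optional[str]:
    """Return the first ID in ?video_ids=a,b,c for a watch_videos URL, else None."""
    parsed = urlsplit(url)
    if parsed.path != "/watch_videos":
        return None
    values = parse_qs(parsed.query).get("video_ids", [])
    if not values:
        return None
    # The IDs are comma-separated inside a single query parameter.
    first = values[0].split(",")[0]
    return first or None
```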
Puneeth Chaganti a1f0713b2c bugdown: Show previews for youtube playlist URLs, if possible.
If a youtube playlist URL has a video-id, we show a preview for the URL.

Closes #8562
2019-05-12 22:24:42 -07:00
Puneeth Chaganti 4de261c2de bugdown: Don't show previews for youtube URLs without video ids.
`youtube.com/playlist?list=<list-id>` incorrectly matches the regex since the
change in 8afda1c1bb. The regex was modified to
match URLs of the form `youtu.be/<id>` and this playlist URL incorrectly matches
with the `<id>` set to `playlist`.

This commit avoids this match by verifying that the ID is not playlist.
2019-05-12 22:24:42 -07:00
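
The false match fixed in 4de261c2de can be reproduced with a deliberately simplified pattern. The regex below is an assumption written for illustration (the real one in bugdown is more involved); it shows why `playlist` ends up captured as the ID and how the extra check rejects it.

```python
import re
from typing import Optional

# Simplified: accepts youtu.be/<id> and youtube.com/watch?v=<id>, but because the
# "watch?v=" part is optional it also captures "playlist" from /playlist?list=...
YOUTUBE_ID = re.compile(r"(?:youtu\.be/|youtube\.com/(?:watch\?v=)?)(?P<id>[\w-]+)")


def youtube_id(url: str) -> Optional[str]:
    """Return the video ID, or None when there is none (including playlist URLs)."""
    match = YOUTUBE_ID.search(url)
    if match is None:
        return None
    video_id = match.group("id")
    if video_id == "playlist":
        # The regex matched the path segment, not a real video ID.
        return None
    return video_id
```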
okay 1694831029 bugdown: Fix double processed emoji tags inside inline tags.
When an emoji is nested inside another inline tag - like em or strong -
it was getting double processed because of the way the inlinePattern
TreeProcessor runs (it runs recursively). With this fix, we set the
inner text of the emoji span as an AtomicString, preventing us from
double processing the emoji's text.

Fixes #11621

Test Plan:

* Add test case for **😄**, verify it passes.
* Go into local dev server and send "**😄**" to self and verify the DOM
does not have double <span> tags for the emoji.
* Run zerver.tests.test_push_notifications and verify the markdown test case matches
the text_content field properly
2019-05-01 17:03:15 -07:00
Anders Kaseorg 643bd18b9f lint: Fix code that evaded our lint checks for string % non-tuple.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2019-04-23 15:21:37 -07:00
Puneeth Chaganti ca8e9fb800 bugdown: Make the youtube URL regex slightly easier to read. 2019-04-13 20:25:37 -07:00
Puneeth Chaganti 8afda1c1bb bugdown: Show preview for urls copied from the Youtube share widget. 2019-04-13 20:25:37 -07:00
overide b263671c9e markdown: Fix unordered list not rendering in blockquote.
This fixes an issue where the hanging unordered list was not
rendering in blockquote; the problem was that we were not
adding an empty line (to satisfy the markdown) for hanging
unordered list if it is in blockquote. Both blockquote
and code block are fenced but we want to avoid rendering
the list if it's in the code block but not in blockquote.

Fixes: #11916.
2019-04-13 19:23:59 -07:00
YashRE42 a724a38c03 markdown: Improve handling of broken img urls.
Some urls which end with image file extensions (eg .jpg) may link to
html pages. This adds handling for linx.li, wikipedia.org and
pasteboard.co. If it is possible, we redirect to the actual image url
otherwise we do not attempt to render it as an image.

Fixes #10438.
2019-03-08 13:39:34 -08:00
Rohitt Vashishtha 3ed85f4cd7 Revert "bugdown: Process word boundaries properly in realm_filters."
This reverts commit ff90c0101c but keeps
the test cases added for reference.

This was reverted because it was both not a clean solution and created
other realm filters bugs involving dashes (etc.).
2019-03-07 11:03:35 -08:00
overide 58d28eed5d markdown: Fix emojis not rendering with :bogus: in the line.
This fixes an issue where an invalid emoji name prevented the
emojis following it from rendering.

This reverts the code change in
8842349629, while still passing the
tests added in that commit (it seems the original commit had
misdiagnosed an ordering bug and thus introduced this issue).

Fixes: #11770.
2019-03-05 16:05:25 -08:00
Bennet Sunder 7c5f316cb8 alert_words: Performance improvements in looking for alert_words.
This commit leverages the ahocorasick algorithm to build a set of user_ids
that have their alert_words present in the message. It runs in linear time
of the order of length of the input message as opposed to number of
alert_words. This is after building an ahocorasick Automaton which runs
in O(number of alert_words in entire realm) which is usually cached.
2019-03-01 15:36:39 -08:00
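
To show why the scan in 7c5f316cb8 is linear in the message length, here is a small pure-Python Aho-Corasick automaton. It is a teaching sketch, not the library the commit uses: the class and method names are hypothetical, it does plain case-insensitive substring matching with no word-boundary checks, and the per-character set union means the cost also depends on how many users match.

```python
from collections import deque
from typing import Dict, List, Set


class AlertWordAutomaton:
    """Hypothetical sketch: map alert words to user ids via Aho-Corasick."""

    def __init__(self, words_by_user: Dict[int, List[str]]) -> None:
        self.goto: List[Dict[str, int]] = [{}]  # trie edges per state
        self.fail: List[int] = [0]              # failure link per state
        self.users: List[Set[int]] = [set()]    # users whose word ends at this state
        for user_id, words in words_by_user.items():
            for word in words:
                if word:
                    self._add(word.lower(), user_id)
        self._build_failure_links()

    def _add(self, word: str, user_id: int) -> None:
        state = 0
        for ch in word:
            nxt = self.goto[state].get(ch)
            if nxt is None:
                nxt = len(self.goto)
                self.goto.append({})
                self.fail.append(0)
                self.users.append(set())
                self.goto[state][ch] = nxt
            state = nxt
        self.users[state].add(user_id)

    def _build_failure_links(self) -> None:
        # Breadth-first, so shallower states are finished before deeper ones.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                fallback = self.fail[state]
                while fallback and ch not in self.goto[fallback]:
                    fallback = self.fail[fallback]
                self.fail[nxt] = self.goto[fallback].get(ch, 0)
                # Inherit matches that end at the longest proper suffix.
                self.users[nxt] |= self.users[self.fail[nxt]]

    def users_alerted_by(self, content: str) -> Set[int]:
        """Scan the message once and collect every user with a matching word."""
        state = 0
        found: Set[int] = set()
        for ch in content.lower():
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            found |= self.users[state]
        return found
```

Building the automaton costs time proportional to the total length of all alert words, which is why caching it per realm, as the commit message notes, matters.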
overide 0dcfc22406 markdown: Fix numbered list handling of blank lines between blocks.
This fixes an issue where blank lines between blocks were causing
auto-numbering of list to stop before the blank line resulting
in two separate numbered lists instead of one.

Edited significantly by tabbott to explain the tricky details in the
comments.

Fixes: #11651.
2019-03-01 15:29:07 -08:00
Tim Abbott d6c09eac51 bugdown: Add support for no_previews argument.
This allows us to have some features using bugdown rendering where
inline image previews will not be rendered (which would be problematic
for e.g. stream descriptions).
2019-02-28 16:54:04 -08:00
Rohitt Vashishtha 44ec83ef28 markdown: Render silent mentions as **name**.
This change should help people recognize silent mentions
in text as a part of Zulip syntax while still
distinguishing them from regular mentions.
2019-02-20 10:41:42 -08:00
Rohitt Vashishtha 57b9991396 markdown: Change syntax of silent mentions ( _@person -> @_person). 2019-02-20 10:41:42 -08:00
Anders Kaseorg e12c433745 bugdown: Remove unused imports.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
2019-02-02 17:25:22 -08:00
Steve Howell c2fcfc087a bugdown: Include message id in exceptions. 2019-01-29 12:49:56 -08:00
Rohitt Vashishtha ff90c0101c bugdown: Process word boundaries properly in realm_filters.
Earlier, our realm filters didn't render for languages that do not
use spaces (eg: Japanese) since we used to check for the presence
of an actual space character. This commit replaces that logic with
a complex scheme to detect word boundaries.

Also, we convert the RealmFilterPattern to subclass InlineProcessor
and make use of the new no-op feature in py-markdown 3.0.1 where we
can tell py-markdown that our pattern didn't find a match despite
the initial regex getting matched.

Fixes #9883.
2019-01-28 14:48:15 -08:00