Commit Graph

55 Commits

Author SHA1 Message Date
Tim Abbott ae58ed5a74 markdown: Tweak data-code-language testing and comments.
This should make it clearer the precise decisions we've made about the
intended semantics of this feature.
2020-09-15 12:30:57 -07:00
Sumanth V Rao b0c9e0a295 markdown: Rename fenced code data-attribute to data-code-language. 2020-09-15 20:09:58 +05:30
Sumanth V Rao 033351609d markdown: Add data-codehilite-language attr for fenced code.
When converting fenced code markdown, we add the language (if specified)
in a data-attribute by tweaking the HTML generated. Doing so, allows the
frontend to make use of this attr to display view-in-playground option
for codeblocks.

We use pygments to get the lexer subclass name and use that instead of
directly using the language in the data-attribute. Doing so, helps us
map different language aliases (like `js` and `javascript`) into a common
variable (like `JavaScript`) - and avoids the client from dealing with
multiple tags corresponding to the same language.

The html structure for a message like this:

``` js
..content..
```

would now be:

<div class="codehilite" data-codehilite-language="JavaScript">
    <pre>..content..</pre>
</div>

Tests and fixtures amended.
2020-09-14 21:25:19 -07:00
Gittenburg 45e19dd6b9 emoji: Rename :slight_smile: to 😄.
Zulip converts :) to the 1F642 Unicode emoji and promotes the same emoji
in the popular section of the emoji picker.

Previously Zulip has labeled 1F642 as "slight smile". While that name
conforms to the Unicode standard (which describes the code point as
SLIGHTLY SMILING FACE), it didn't match our use case of the emoji.

If a user types :) or selects the first smile in the emoji picker they
probably mean to express a regular "smile" and not a "slight smile",
which raises the question why they are only smiling slightly.

This commit relabels 1F642 as 😄 and our previous 😄 263A as
:smiling_face:. Note that 263A looks different in our three supported
emoji sets, so it is not suited to be our "default smile".

This change does not require a migration since our emoji system stores
both unicode points and names and handles name changes transparently.
2020-07-21 16:49:54 -07:00
Anders Kaseorg aa16208fd8 dependencies: Upgrade JavaScript dependencies.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-07-20 10:56:31 -07:00
Rohitt Vashishtha b64ba98e90 markdown: Use unicode ellipses for collapsing spoilers.
We had initially implemented this feature using `(...)` but `(…)` is the
better variation.
2020-07-15 23:30:28 -07:00
Rohitt Vashishtha 1a9a478e5d markdown: Assert we handle timestamps sensibly in push notifications.
We could certainly do better with the handling here, but using the raw
string that the user gave us is okayish for now.

Proper formatting of timestamps requires handling locales and timezones
of the receiver as well which is a larger project.
2020-07-15 11:18:32 -07:00
Rohitt Vashishtha 78c48935ca markdown: Format spoilers for push notifications.
We now do something sensible for spoilers in notifications. A message
like:

    ```spoiler Luke's father is
    Vader. Don't tell anyone else.
    ```

would be rendered as:

    Luke's father is (...)
2020-07-15 11:17:38 -07:00
Rohitt Vashishtha 912e372c4e markdown: Remove !avatar() and !gravatar() syntax.
This particular commit has been a long time coming. For reference,
!avatar(email) was an undocumented syntax that simply rendered an
inline 50px avatar for a user in a message, essentially allowing
you to create a user pill like:

`!avatar(alice@example.com) Alice: hey!`

---

Reimplementation

If we decide to reimplement this or a similar feature in the future,
we could use something like `<avatar:userid>` syntax which is more
in line with creating links in markdown. Even then, it would not be
a good idea to add this instead of supporting inline images directly.

Since any usecases of such a syntax are in automation, we do not need
to make it userfriendly and something like the following is a better
implementation that doesn't need a custom syntax:

`![avatar for Alice](/avatar/1234?s=50) Alice: hey!`

---

History

We initially added this syntax back in 2012 and it was 'deprecated'
from the get go. Here's what the original commit had to say about
the new syntax:

> We'll use this internally for the commit bot.  We might eventually
> disable it for external users.

We eventually did start using this for our github integrations in 2013
but since then, those integrations have been neglected in favor of
our GitHub webhooks which do not use this syntax.

When we copied `!gravatar` to add the `!avatar` syntax, we also noted
that we want to deprecate the `!gravatar` syntax entirely - in 2013!

Since then, we haven't advertised either of these syntaxes anywhere
in our docs, and the only two places where this syntax remains is
our game bots that could easily do without these, and the git commit
integration that we have deprecated anyway.

We do not have any evidence of someone asking about this syntax on
chat.zulip.org when developing an integration and rightfully so- only
the people who work on Zulip (and specifically, markdown) are likely
to stumble upon it and try it out.

This is also the only peice of code due to which we had to look up
emails -> userid mapping in our backend markdown. By removing this,
we entirely remove the backend markdown's dependency on user emails
to render messages.

---

Relevant commits:

- Oct 2012, Initial commit        c31462c278
- Nov 2013, Update commit bot     968c393826
- Nov 2013, Add avatar syntax     761c0a0266
- Sep 2017, Avoid email use       c3032a7fe8
- Apr 2019, Remove from webhook   674fcfcce1
2020-07-07 10:39:44 -07:00
Rohitt Vashishtha 0b510cd66d timestamp: Hide timestamp forrmat errors in local echo. 2020-07-06 15:53:56 -07:00
Rohitt Vashishtha 732ec3c0e6 timestamp: Change syntax to `<time:timestammp>`.
We had been using !time() syntax for timestamps so far. Since its
an unreleased feature, we can make changes without affecting many
people.

Fixes #15442.
2020-07-06 15:53:56 -07:00
Mohit Gupta 8356c6c568 refactor: Rename bugdown to backend_markdown.
This commit changes the name of fixture that uses reference to bugdown.
Word backend in backend_markdown is important so to make it clear that
it is backend markdown. These test fixtures are also used in frontend,
so highlighting this is useful.

This commit is part of series of commits aimed at renaming bugdown to
markdown.
2020-06-29 15:03:20 -07:00
Chris Heald 42f2399155 markdown: Escape HTML entities in inline code blocks.
This fixes an issues that causes HTML entities inside of inline code
blocks to be converted rather than being displayed literally.

The upstream python-markdown now handles this correctly, so we just use
their implementation with our changes for removing .strip(). As a result
of this migration, we switch backtick pattern to an inline processor
too.

Fixes #12056.

For the codeblock counterpart of this issue, we should follow the
upstream PR https://github.com/Python-Markdown/markdown/pull/990.

Co-authored-by: Rohitt Vashishtha <aero31aero@gmail.com>
2020-06-25 14:46:33 -07:00
Rohitt Vashishtha 6ea3816fa6 markdown: Use html5 <time> tag for timestamps.
Previously, we had implemented:
    <span class="timestamp" data-timestamp="unix time">Original text</span>
The new syntax is:
    <time timestamp="ISO 8601 string">Original text</time>
    <span class="timestamp-error">Invalid time format: Original text</span>

Since python and JS interpretations of the ISO format are very
slightly different, we force both of them to drop milliseconds
and use 'Z' instead of '+00:00' to represent that the string is
in UTC. The resultant strings look like: 2011-04-11T10:20:30Z.

Fixes #15431.
2020-06-18 14:11:33 -07:00
Sara Gulotta 1cb040647b markdown: Add support for spoilers.
This adds support for a "spoiler" syntax in Zulip's markdown, which
can be used to hide content that one doesn't want to be immediately
visible without a click.

We use our own spoiler block syntax inspired by Zulip's existing quote
and math block markdown extensions, rather than requiring a token on
every line, as is present in some other markdown spoiler
implementations.

Fixes #5802.

Co-authored-by: Dylan Nugent <dylnuge@gmail.com>
2020-06-16 16:14:10 -07:00
Anders Kaseorg 47b4e45931 markdown_test_cases: Update encoded zulipchat.com links too.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-08 19:47:07 -07:00
Tim Abbott 71078adc50 docs: Update URLs to use https://zulip.com.
We're migrating to using the cleaner zulip.com domain, which involves
changing all of our links from ReadTheDocs and other places to point
to the cleaner URL.
2020-06-08 18:10:45 -07:00
Tim Abbott 463f1503fc Revert "markdown: Process fenced code blocks in blockquotes."
This reverts commit 7002f98ea1.

This failed tests due to some sort of conflict with a recent
python-markdown upgrade.
2020-05-25 18:13:03 -07:00
Rohitt Vashishtha 7002f98ea1 markdown: Process fenced code blocks in blockquotes.
We handle fenced code blocks in a preprocessor, and > style blockquotes
are parsed in a blockprocessor. Pymarkdown doesn't run the preprocessors
again on any blocks that it is parsing, and is unlikely to accept our
solution upstream; they intend to convert fenced_code to a block parser.

We simply run all the preprocessors on the text again, with the exception
of NormalizeWhitespace which removed delimiters used by HtmlStash to mark
preprocessed html code. To counter this, we subclass NormalizeWhitespace
and use our customized version for when it is called from a blockparser.

Upstream issue: https://github.com/Python-Markdown/markdown/issues/53

Fixes #12800.
2020-05-25 17:35:10 -07:00
Rohitt Vashishtha 88367a129c markdown: Disable tex and latex for math rendering.
We now parse tex and latex as regular languages, highlighting them
with pygments. We only allow 'math' to trigger latex rendering,
which is in line with the documentation.
2020-05-21 12:30:27 -07:00
Rohitt Vashishtha 52c25a9301 markdown-timestamp: Use data-timestamp attribute.
This commit shifts our timestamp syntax to be of the form:

    <span class="timestamp data-timestamp="123456"></span>

since value is not a valid attribute of span elements.
2020-05-20 14:28:08 -07:00
Rohitt Vashishtha b062e8332f markdown: Add timestamp syntax to markdown processors.
This adds support for syntax like: !time(Jun 7 2017, 6:30 PM) so that
everyone sees the time in their own local timezone. This can be used
when scheduling online meetings, etc.

This adds some hardcoded values for timezones, because of there
being no sureshot way of determining the timezone easily. However,
since the main way of using the feature should be a typeahead for
entering the time, this shouldn't be cause of much concern.

Fixes #5176.
2020-05-20 14:23:55 -07:00
Anders Kaseorg 78c70b1424 bugdown: Leave link titles alone until clean_user_content_links.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-05-09 16:32:40 -07:00
Rohitt Vashishtha 7d3a31cd8b bugdown: Support hanging_lists preprocessor for indented lists.
Previously, hanging_lists preprocessor didn't consider anything
indented at 4 or above spaces to be a list. This meant that when
we had a list like:

1. 1
  2. 2
    3. 3
  2. 2a
1. 1a

We would insert a newline between 3. 3 and 2. 2a. This resulted
in the block processor breaeking down 1 list into 2 blocks, which
messed up the nesting and indentation for the second block.
2020-04-30 17:54:40 -07:00
Anders Kaseorg 8e93175822 requirements: Upgrade Python-Markdown from 3.1.1 to 3.2.1.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-04-18 13:09:51 -07:00
Anders Kaseorg ddcb828349 markdown: Match Python-Markdown code whitespace more closely in JS.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-04-18 13:09:51 -07:00
Anders Kaseorg 2d45308546 CVE-2020-10935: Fix XSS vulnerability in local link rewriting.
Make sure rewrite_local_links_to_relative does not accidentally change
the meaning of links.

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-04-01 14:01:45 -07:00
Anders Kaseorg 4f748fb627 markdown: Stop setting target="_blank".
This setting is being overridden by the frontend since the last
commit, and the security model is clearer and more robust if we don't
make it appear as though the markdown processor is handling this
issue.

Co-authored-by: Tim Abbott <tabbott@zulipchat.com>
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-04-01 14:01:45 -07:00
Tim Abbott e3a4aeeffa CVE-2020-9445: Remove unused and insecure modal_link feature.
Zulip's modal_link markdown feature has not been used since 2017; it
was a hack used for a 2013-era tutorial feature and was never used
outside that use case.

Unfortunately, it's sloppy implementation was exposed in the markdown
processor for all users, not just the tutorial use case.

More importantly, it was buggy, in that it did not validate the link
using the standard validation approach used by our other code
interacting with links.

The right solution is simply to remove it.
2020-04-01 14:01:45 -07:00
Rohitt Vashishtha ff5e2b6eb7 bugdown: Avoid hanging list paragraphs being processed as codeblocks.
Previously, the input:

====================
- One
  - Two

    Two continued
====================

Would produce the same output as:

====================
- One
  - Two

```
Two continued
```
====================

This was because our CodeBlockProcessor had a higher priority than
the ListIndentProcessor. This issue was discussed here:
https://chat.zulip.org/#narrow/stream/9-issues/topic/continuation.20paragraphs.20in.20list.20items.
2020-03-03 12:08:19 -08:00
Anders Kaseorg 8e356368f7 markdown: Fix HTML escaping of &.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-02-13 17:50:59 -08:00
Rohitt Vashishtha 630c564fc7 bugdown: Rewrite List Preprocessor logic to properly parse fences.
Previously, we didn't track opening and closing fences separately,
with led to bugs like not parsing a list that was immediately after
a quoted fence; we treated each ``` as a new fence.

This commit rewrites the function to maintain a stack of currently
open fences. If any of the parent fences is a code fence, we do not
insert a new line before a list.

We also add some test cases specifically to test this behavior with
complexly nested lists.

Fixes #13745.
2020-01-27 17:14:27 -08:00
Rohitt Vashishtha 1229e69e9b bugdown: Reenable -,+ to begin a markdown list.
This commit has a side-effect that we also now allow mixed lists,
but they have different syntax from the commonmark implementation
and our marked output. For example, without the closing li tags:

  Input    Bugdown     Marked
-------------------------------------
         <ul>
- Hello    <li>Hello  <ul><li>Hello</ul>
+ World    <li>World  <ul><li>World
+ Again    <li>Again      <li>Again</ul>
* And      <li>And    <ul><li>And
* Again    <li>Again      <li>Again</ul>
         </ul>

The bugdown render is in line with what a user in #13447 requests.

Fixes #13477.
2019-12-09 16:13:02 -08:00
Anders Kaseorg cce85f6ec7 dependencies: Upgrade katex from 0.10.2 to 0.11.1.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2019-11-11 16:26:31 -08:00
Thomas Ip 574c35c0b8 markdown: Render ordered lists using <ol> markup.
This brings us in line, and also allows us to style these more like
unordered lists, which is visually more appealing.

On the backend, we now use the default list blockprocessor + sane list
extension of python-markdown to get proper list markup; on the
frontend, we mostly return to upstream's code as they have followed
CommonMark on this issue.

Using <ol> here necessarily removes the behaviour of not renumbering
on lists written like 3, 4, 7; hopefully users will be OK with the
change.

Fixes #12822.
2019-09-08 16:42:20 -07:00
Rohitt Vashishtha 8b443a25b8 markdown: Show link href if title is empty.
Fixes #6221.
2019-08-25 21:36:42 -07:00
Rohitt Vashishtha a7f2bedb15 markdown: Enable hashheadings syntax.
Our implementation requires at least 1 space after the
'#' not not break existing linkifiers like '#123', etc.
that generally follow the convention we show in linkifier
examples.

- [valid]  : # Hello
- [valid]  : #  Hello
- [invalid]: #Hello

For the frontend, we have taken the code from v0.7.0 of
upstream marked and made minor changes to avoid having
to refactor a significant part of our marked code.

For the backend, we merely have to change the regex to
force require spaces after #, and add hashheader to our
list of blockparsers.

Fixes #11418.
2019-08-02 15:15:34 -07:00
Puneeth Chaganti 865bc24f67 url preview: Avoid showing previews for URLs in blockquotes.
Messages with links embedded in blockquotes turn out to be replies to
messages with links, more often than not. Showing previews for links in
replies seems like clutter, and it seems reasonable to turn off previews for
such links.
2019-07-12 19:14:00 -07:00
Aayush Agrawal 54584f6c16 url preview: Create a single preview for each URL in a message.
Modified by punchagan to:
* Add a separate markdown test for de-duplicating inline previews
* Check for number of unique URLs to see if per limit message is crossed
* Use a set for processed URLs instead of a list

Fixes #8379.
2019-07-11 13:37:15 -07:00
Thomas Ip e17fb33b47 dependencies: Upgrade katex to 0.10.2.
The markup output changed but the rendering is the same, so modified
expected output in tests.

There is a regression introduced in one of the new versions of KaTeX,
which produces a warning in our node tests:
```
No character metrics for ' ' in style 'Main-Bold'
```
but the rendering is correct so we can ignore it.
Tracking issue: KaTeX/KaTeX#1994

Fixes #12472.
2019-06-24 17:58:26 -07:00
okay 1694831029 bugdown: Fix double processed emoji tags inside inline tags.
When an emoji is nested inside another inline tag - like em or strong -
it was getting double processed because of the way the inlinePattern
TreeProcessor runs (it runs recursively). With this fix, we set the
inner text of the emoji span as an AtomicString, preventing us from
double processing the emoji's text.

Fixes #11621

Test Plan:

* Add test case for **😄**, verify it passes.
* Go into local dev server and send "**😄**" to self and verify the DOM
does not have double <span> tags for the emoji.
* Run zerver.tests.test_push_notifications and verify the markdown test case matches
the text_content field properly
2019-05-01 17:03:15 -07:00
overide b263671c9e markdown: Fix unordered list not rendering in blockquote.
This fixes an issue where the hanging unordered list was not
rendering in blockquote; the problem was that we were not
adding an empty line(to satisfy the markdown) for hanging
unordered list if it is in blockquote. Both blockquote
and code block is fenced but we want to avoid rendering
the list if it's in the code block but not in blockquote.

Fixes: #11916.
2019-04-13 19:23:59 -07:00
overide 58d28eed5d markdown: Fix emojis not rendering with :bogus: in the line.
This fixes an issue where invalid emoji name prevents following
emojis from rendering.

This reverts the code change in
8842349629, while still passing the
tests added in that commit (it seems the original commit had
misdiagnosed an ordering bug and thus introduced this issue).

Fixes: #11770.
2019-03-05 16:05:25 -08:00
overide 0dcfc22406 markdown: Fix numbered list handling of blank lines between blocks.
This fixes an issue where blank lines between blocks were causing
auto-numbering of list to stop before the blank line resulting
in two separate numbered list instead of one.

Edited significantly by tabbott to explain the tricky details in the
comments.

Fixes: #11651.
2019-03-01 15:29:07 -08:00
Rohitt Vashishtha 19672241e6 markdown: Disable definition/reference links in marked.
We had disabled reference style links in bugdown, however,
we hadn't disabled them in marked. This commit rectifies
that and adds test cases for the same.

Fixes #11350.
2019-02-04 11:16:37 -08:00
Harshit Bansal 5f76a65b1d emoji: Make unicode/span emojis more accessible.
This commit adds `aria-label="<title_text>"` and `role="img"` to
the generated HTML.

Fixes: #5975.
2019-01-16 09:07:19 -08:00
Aditya Bansal 77651ece39 thumbnails: Rename size value 'original' to 'full'. 2018-07-30 13:00:23 -07:00
Aditya Bansal 5b5d8bb310 thumbnails: Rename data-original to data-src-fullsize. 2018-07-30 13:00:23 -07:00
Harshit Bansal cf5b2b4815 emoji: Change emoticon mapping for `:)`, `(:` and `:(`.
See discussion on CZO:
https://chat.zulip.org/#narrow/stream/101-design/subject/emoji.20picker/near/617811
2018-07-26 11:17:03 -07:00
Aditya Bansal 98a4e87e1d thumbor: Complete implementation of thumbnailing.
Various pieces of our thumbor-based thumbnailing system were already
merged; this adds the remaining pieces required for it to work:

* a THUMBOR_URL Django setting that controls whether thumbor is
  enabled on the Zulip server (and if so, where thumbor is hosted).

* Replaces the overly complicated prototype cryptography logic

* Adds a /thumbnail endpoint (supported both on web and mobile) for
  accessing thumbnails in messages, designed to support hosting both
  external URLs as well as uploaded files (and applying Zulip's
  security model for access to thumbnails of uploaded files).

* Modifies bugdown to, when THUMBOR_URL is set, render images with the
  `src` attribute pointing /thumbnail (to provide a small thumbnail
  for the image), along with adding a "data-original" attribute that
  can be used to access the "original/full" size version of the image.

There are a few things that don't work quite yet:
* The S3 backend support is incomplete and doesn't work yet.
* The error pages for unauthorized access are ugly.
* We might want to rename data-original and /thumbnail?size=original
  to use some other name, like "full", that better reflects the fact
  that we're potentially not serving the original image URL.
2018-07-15 00:39:41 +05:30