This reverts commit 9c6d8d9d81 (#16916).
This feature has known bugs, and also wants some design changes to
make it customizable like linkifiers, so we’re retargeting this to
post-4.x.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
We add support to shorten links and test their shortening in a
well-organized, clean manner that makes it trivial to extend the
GitHub approach for GitLab and perhaps other services.
We only shorten basic types of GitHub links (issue, PR, commit) that
fit a set of simple common patterns; the default behaviour of Autolink
is kept for everything else.
The logic added to the frontend and backend Markdown processors is
identical, which makes it easy to extend to other services like GitLab.
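As a rough illustration of the pattern-based approach, here is a
minimal Python sketch. The regex, the function name and the shortened
text format (org/repo#123, org/repo@sha) are assumptions made for this
example, not the code or format from this commit:

```python
import re
from typing import Optional

# Hypothetical pattern covering only the three basic link types
# (issue, PR, commit) on a plain github.com URL.
GITHUB_RE = re.compile(
    r"^https://github\.com/(?P<org>[\w.-]+)/(?P<repo>[\w.-]+)/"
    r"(?P<kind>issues|pull|commit)/(?P<ref>[0-9a-f]+)/?$"
)


def shorten_github_link(url: str) -> Optional[str]:
    """Return short link text, or None to keep the default autolink text."""
    match = GITHUB_RE.match(url)
    if match is None:
        return None
    org, repo, kind, ref = match.group("org", "repo", "kind", "ref")
    if kind == "commit":
        # Assumed display format: abbreviated 7-character hash.
        return f"{org}/{repo}@{ref[:7]}"
    if not ref.isdigit():
        # Issues and PRs must have a purely numeric reference.
        return None
    return f"{org}/{repo}#{ref}"
```

Anything that does not match the pattern returns None, so the caller
falls back to the default Autolink behaviour.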
Fixes #11895.
Adjustments made due to changes in Django 3.0:
(https://docs.djangoproject.com/en/3.0/releases/3.0/)
- test_signup: INTERNAL_RESET_URL_TOKEN was moved to
PasswordResetConfirmView.reset_url_token
- test_message_fetch:
"add_never_cache_headers() and never_cache() now add the private
directive to Cache-Control headers."
- "django.utils.html.escape() now uses html.escape() to escape HTML.
This converts ' to &#x27; instead of the previous equivalent decimal
code &#39;." - this requires adjusting the expected decimal code
in some of the string fixtures in tests.
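The escaping change in the last item can be checked with the standard
library function that Django 3.0 delegates to. A minimal sketch (the
sample string is made up for illustration):

```python
import html

# html.escape() emits the hexadecimal entity for a single quote,
# where older Django versions emitted the decimal entity.
sample = "Luke's"
escaped = html.escape(sample)
print(escaped)  # Luke&#x27;s

# Both entities decode to the same character, so only the expected
# strings in the fixtures change, not what the browser displays.
assert html.unescape("&#x27;") == html.unescape("&#39;") == "'"
```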
This commit fixes a bug in marked.js which caused it to double-escape
HTML when rendering messages of the form: *[text](url)*.
This fixes a bug introduced in
3bdc8bbaa5, where an unnecessary
escape() call was added for the <em> code path, likely just because it
was adjacent to the others that needed it in the file.
Fix this, and add tests to verify that things are still being escaped
once after removing this extra escape.
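For readers unfamiliar with the failure mode, here is a small Python
illustration of what escaping already-escaped text does. This is not
the marked.js code, just the general effect with a made-up string:

```python
import html

text = "a & b"
once = html.escape(text)   # what should reach the browser
twice = html.escape(once)  # what the extra escape() call produced

print(once)   # a &amp; b
print(twice)  # a &amp;amp; b

# Escaped once, the browser shows the original text again.
# Escaped twice, it shows the entity source instead.
assert html.unescape(once) == text
assert html.unescape(twice) == "a &amp; b"
```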
Fixes #14845.
Initially, two or more quotes separated by a blank line were merged
into a single quote. This caused confusion, especially in "quote and
reply".
This commit fixes that: two or more quotes separated by a blank line
are no longer merged.
This change is correct both for usability and for improving our
compatibility with CommonMark.
Fixes #14379.
Upstream has slightly changed the whitespace around stashes. Take
this opportunity to clean up the extra blank lines we were outputting.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
This mimics the backend logic for adding the data-attribute -
to know what Pygments language was used to highlight the code
block - in locally echoed messages.
The new test checks our logic for canonicalizing Pygments aliases
(for both frontend and backend).
Other fixtures and tests amended.
When converting fenced code markdown, we add the language (if specified)
in a data-attribute by tweaking the generated HTML. Doing so allows the
frontend to use this attribute to display a view-in-playground option
for code blocks.
We use Pygments to get the lexer subclass name and use that instead of
directly using the language in the data-attribute. Doing so helps us
map different language aliases (like `js` and `javascript`) to a common
name (like `JavaScript`), and saves the client from dealing with
multiple tags corresponding to the same language.
The HTML structure for a message like this:
``` js
..content..
```
would now be:
<div class="codehilite" data-codehilite-language="JavaScript">
<pre>..content..</pre>
</div>
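The alias canonicalization can be sketched as follows. To keep the
example self-contained, the small alias table below is a hypothetical
stand-in for the Pygments lexer lookup, and the function names and the
fallback for unknown languages are assumptions for illustration:

```python
import html

# Hypothetical stand-in for the Pygments lookup: alias -> lexer name.
LEXER_NAMES = {
    "js": "JavaScript",
    "javascript": "JavaScript",
    "py": "Python",
    "python": "Python",
}


def canonical_language(lang: str) -> str:
    """Map an alias to its canonical name; unknown aliases pass through."""
    cleaned = lang.strip()
    return LEXER_NAMES.get(cleaned.lower(), cleaned)


def wrap_code_block(lang: str, highlighted_html: str) -> str:
    """Wrap already-highlighted HTML in a div carrying the data-attribute."""
    if not lang.strip():
        # No language given: emit the wrapper without the attribute.
        return f'<div class="codehilite">{highlighted_html}</div>'
    name = html.escape(canonical_language(lang), quote=True)
    return (
        f'<div class="codehilite" data-codehilite-language="{name}">'
        f"{highlighted_html}</div>"
    )
```

With this table, both `js` and `javascript` produce the same
data-codehilite-language="JavaScript" attribute.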
Tests and fixtures amended.
Zulip converts :) to the 1F642 Unicode emoji and promotes the same emoji
in the popular section of the emoji picker.
Previously, Zulip labeled 1F642 as "slight smile". While that name
conforms to the Unicode standard (which describes the code point as
SLIGHTLY SMILING FACE), it didn't match our use case of the emoji.
If a user types :) or selects the first smile in the emoji picker they
probably mean to express a regular "smile" and not a "slight smile",
which raises the question of why they are only smiling slightly.
This commit relabels 1F642 as :smile: and our previous :smile: 263A as
:smiling_face:. Note that 263A looks different in our three supported
emoji sets, so it is not suited to be our "default smile".
This change does not require a migration since our emoji system stores
both Unicode code points and names and handles name changes transparently.
We could certainly do better with the handling here, but using the raw
string that the user gave us is okayish for now.
Proper formatting of timestamps requires handling the locale and timezone
of the receiver as well, which is a larger project.
We now do something sensible for spoilers in notifications. A message
like:
```spoiler Luke's father is
Vader. Don't tell anyone else.
```
would be rendered as:
Luke's father is (...)
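A minimal sketch of that transformation, working on the raw message
text. The regex and function name are assumptions for illustration,
and the actual notification code may operate on rendered HTML instead:

```python
import re

FENCE = "`" * 3  # built this way to avoid a literal fence in the example

# A spoiler block: opening fence with an optional header, any content,
# then a closing fence on its own line.
SPOILER_RE = re.compile(
    rf"^{FENCE}spoiler[ ]*(?P<header>[^\n]*)\n.*?^{FENCE}[ ]*$",
    re.DOTALL | re.MULTILINE,
)


def summarize_spoilers(content: str) -> str:
    """Replace each spoiler block with its header followed by (...)."""

    def replace(match: "re.Match[str]") -> str:
        header = match.group("header").strip()
        # Assumed behaviour for a spoiler without a header.
        return f"{header} (...)" if header else "(...)"

    return SPOILER_RE.sub(replace, content)
```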
This particular commit has been a long time coming. For reference,
!avatar(email) was an undocumented syntax that simply rendered an
inline 50px avatar for a user in a message, essentially allowing
you to create a user pill like:
`!avatar(alice@example.com) Alice: hey!`
---
Reimplementation
If we decide to reimplement this or a similar feature in the future,
we could use something like `<avatar:userid>` syntax which is more
in line with creating links in markdown. Even then, it would not be
a good idea to add this instead of supporting inline images directly.
Since any use cases for such a syntax are in automation, we do not need
to make it user-friendly, and something like the following is a better
implementation that doesn't need a custom syntax:
`![avatar for Alice](/avatar/1234?s=50) Alice: hey!`
---
History
We initially added this syntax back in 2012 and it was 'deprecated'
from the get go. Here's what the original commit had to say about
the new syntax:
> We'll use this internally for the commit bot. We might eventually
> disable it for external users.
We eventually did start using this for our GitHub integrations in 2013
but since then, those integrations have been neglected in favor of
our GitHub webhooks which do not use this syntax.
When we copied `!gravatar` to add the `!avatar` syntax, we also noted
that we want to deprecate the `!gravatar` syntax entirely - in 2013!
Since then, we haven't advertised either of these syntaxes anywhere
in our docs, and the only two places where this syntax remains are
our game bots, which could easily do without it, and the git commit
integration that we have deprecated anyway.
We do not have any evidence of someone asking about this syntax on
chat.zulip.org when developing an integration, and rightfully so: only
the people who work on Zulip (and specifically, markdown) are likely
to stumble upon it and try it out.
This is also the only piece of code due to which we had to look up
the email -> user ID mapping in our backend markdown. By removing this,
we entirely remove the backend markdown's dependency on user emails
to render messages.
---
Relevant commits:
- Oct 2012, Initial commit c31462c278
- Nov 2013, Update commit bot 968c393826
- Nov 2013, Add avatar syntax 761c0a0266
- Sep 2017, Avoid email use c3032a7fe8
- Apr 2019, Remove from webhook 674fcfcce1
We had been using !time() syntax for timestamps so far. Since it's
an unreleased feature, we can make changes without affecting many
people.
Fixes #15442.
This commit renames the fixture whose name refers to bugdown.
The word "backend" in backend_markdown is important, since it makes
clear that these are backend markdown fixtures. The fixtures are also
used in the frontend, so highlighting this is useful.
This commit is part of a series of commits aimed at renaming bugdown to
markdown.
This fixes an issue that caused HTML entities inside inline code
blocks to be converted rather than being displayed literally.
The upstream python-markdown now handles this correctly, so we just use
their implementation with our changes for removing .strip(). As a result
of this migration, we switch backtick pattern to an inline processor
too.
Fixes #12056.
For the codeblock counterpart of this issue, we should follow the
upstream PR https://github.com/Python-Markdown/markdown/pull/990.
Co-authored-by: Rohitt Vashishtha <aero31aero@gmail.com>
Previously, we had implemented:
<span class="timestamp" data-timestamp="unix time">Original text</span>
The new syntax is:
<time timestamp="ISO 8601 string">Original text</time>
<span class="timestamp-error">Invalid time format: Original text</span>
Since the Python and JS interpretations of the ISO format are very
slightly different, we force both of them to drop milliseconds
and use 'Z' instead of '+00:00' to indicate that the string is
in UTC. The resultant strings look like: 2011-04-11T10:20:30Z.
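On the Python side, the normalization described above can be sketched
like this. The function name is hypothetical and the input is assumed
to be a timezone-aware datetime:

```python
from datetime import datetime, timezone


def to_canonical_iso(dt: datetime) -> str:
    """Format an aware datetime as UTC ISO 8601, whole seconds, 'Z' suffix."""
    utc = dt.astimezone(timezone.utc).replace(microsecond=0)
    return utc.strftime("%Y-%m-%dT%H:%M:%SZ")


moment = datetime(2011, 4, 11, 10, 20, 30, 123456, tzinfo=timezone.utc)

# The default isoformat() keeps the fractional part and uses '+00:00'.
print(moment.isoformat())        # 2011-04-11T10:20:30.123456+00:00
print(to_canonical_iso(moment))  # 2011-04-11T10:20:30Z
```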
Fixes #15431.
This adds support for a "spoiler" syntax in Zulip's markdown, which
can be used to hide content that one doesn't want to be immediately
visible without a click.
We use our own spoiler block syntax inspired by Zulip's existing quote
and math block markdown extensions, rather than requiring a token on
every line, as is present in some other markdown spoiler
implementations.
Fixes #5802.
Co-authored-by: Dylan Nugent <dylnuge@gmail.com>
We're migrating to using the cleaner zulip.com domain, which involves
changing all of our links from ReadTheDocs and other places to point
to the cleaner URL.
We handle fenced code blocks in a preprocessor, and > style blockquotes
are parsed in a blockprocessor. Pymarkdown doesn't run the preprocessors
again on any blocks that it is parsing, and is unlikely to accept our
solution upstream; they intend to convert fenced_code to a block parser.
We simply run all the preprocessors on the text again, with the exception
of NormalizeWhitespace, which removes the delimiters used by HtmlStash to
mark preprocessed HTML code. To counter this, we subclass
NormalizeWhitespace and use our customized version when it is called from
a blockparser.
Upstream issue: https://github.com/Python-Markdown/markdown/issues/53
Fixes #12800.
We now parse tex and latex as regular languages, highlighting them
with pygments. We only allow 'math' to trigger latex rendering,
which is in line with the documentation.
This commit shifts our timestamp syntax to be of the form:
<span class="timestamp" data-timestamp="123456"></span>
since value is not a valid attribute of span elements.
This adds support for syntax like: !time(Jun 7 2017, 6:30 PM) so that
everyone sees the time in their own local timezone. This can be used
when scheduling online meetings, etc.
This adds some hardcoded values for timezones, because there is no
reliable way of determining the timezone easily. However, since the
main way of using the feature should be a typeahead for entering the
time, this shouldn't be a cause of much concern.
Fixes #5176.
Previously, hanging_lists preprocessor didn't consider anything
indented at 4 or above spaces to be a list. This meant that when
we had a list like:
1. 1
2. 2
3. 3
2. 2a
1. 1a
We would insert a newline between 3. 3 and 2. 2a. This resulted
in the block processor breaking one list into two blocks, which
messed up the nesting and indentation for the second block.
This setting is being overridden by the frontend since the last
commit, and the security model is clearer and more robust if we don't
make it appear as though the markdown processor is handling this
issue.
Co-authored-by: Tim Abbott <tabbott@zulipchat.com>
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
Zulip's modal_link markdown feature has not been used since 2017; it
was a hack used for a 2013-era tutorial feature and was never used
outside that use case.
Unfortunately, its sloppy implementation was exposed in the markdown
processor for all users, not just the tutorial use case.
More importantly, it was buggy, in that it did not validate the link
using the standard validation approach used by our other code
interacting with links.
The right solution is simply to remove it.
Previously, the input:
====================
- One
- Two
Two continued
====================
Would produce the same output as:
====================
- One
- Two
```
Two continued
```
====================
This was because our CodeBlockProcessor had a higher priority than
the ListIndentProcessor. This issue was discussed here:
https://chat.zulip.org/#narrow/stream/9-issues/topic/continuation.20paragraphs.20in.20list.20items.
Previously, we didn't track opening and closing fences separately,
which led to bugs like not parsing a list that was immediately after
a quoted fence; we treated each ``` as a new fence.
This commit rewrites the function to maintain a stack of currently
open fences. If any of the parent fences is a code fence, we do not
insert a new line before a list.
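The stack-based approach can be sketched as below. This is a
simplified illustration, not the function from this commit: the
regexes, the names, and the rule that a bare fence inside a quote
closes the quote are all assumptions.

```python
import re
from typing import List

FENCE_RE = re.compile(r"^\s*(`{3,}|~{3,})\s*(\w*)\s*$")
LIST_RE = re.compile(r"^\s*([-*+]|\d+\.)\s+")


def insert_list_newlines(lines: List[str]) -> List[str]:
    """Insert a blank line before a list, except inside code fences."""
    output: List[str] = []
    open_fences: List[str] = []  # stack of "quote" / "code" entries
    for line in lines:
        in_code = "code" in open_fences
        fence = FENCE_RE.match(line)
        if fence:
            lang = fence.group(2)
            if in_code:
                # Inside code, only a bare fence closes the block.
                if not lang:
                    open_fences.pop()
            elif lang:
                open_fences.append("quote" if lang == "quote" else "code")
            elif open_fences:
                open_fences.pop()  # bare fence closes the open quote
            else:
                open_fences.append("code")  # bare fence opens a code block
            output.append(line)
            continue
        starts_list = (
            LIST_RE.match(line)
            and output
            and output[-1].strip()
            and not LIST_RE.match(output[-1])
        )
        if starts_list and not in_code:
            output.append("")
        output.append(line)
    return output
```

Because a fence cannot open inside a code fence, a code entry is
always on top of the stack when present, so popping it is safe.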
We also add some test cases specifically to test this behavior with
complexly nested lists.
Fixes #13745.
This commit has a side-effect that we also now allow mixed lists,
but they have different syntax from the CommonMark implementation
and our marked output. For example, without the closing li tags:
Input     Bugdown      Marked
-------------------------------------
          <ul>
- Hello   <li>Hello    <ul><li>Hello</ul>
+ World   <li>World    <ul><li>World
+ Again   <li>Again    <li>Again</ul>
* And     <li>And      <ul><li>And
* Again   <li>Again    <li>Again</ul>
          </ul>
The bugdown render is in line with what a user in #13447 requests.
Fixes #13477.
This brings us in line, and also allows us to style these more like
unordered lists, which is visually more appealing.
On the backend, we now use the default list blockprocessor + sane list
extension of python-markdown to get proper list markup; on the
frontend, we mostly return to upstream's code as they have followed
CommonMark on this issue.
Using <ol> here necessarily removes the behaviour of not renumbering
on lists written like 3, 4, 7; hopefully users will be OK with the
change.
Fixes #12822.
Our implementation requires at least 1 space after the
'#' to avoid breaking existing linkifiers like '#123', etc.,
which generally follow the convention we show in linkifier
examples.
- [valid] : # Hello
- [valid] : # Hello
- [invalid]: #Hello
For the frontend, we have taken the code from v0.7.0 of
upstream marked and made minor changes to avoid having
to refactor a significant part of our marked code.
For the backend, we merely have to change the regex to
require at least one space after #, and add hashheader to our
list of blockparsers.
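The effect of requiring a space can be sketched with a small regex.
The pattern and function name below are assumptions for illustration,
not the regex used in this commit:

```python
import re
from typing import Optional, Tuple

# Assumed pattern: 1 to 6 '#' characters, at least one space, then text.
HEADING_RE = re.compile(r"^(#{1,6}) +(\S.*)$")


def parse_heading(line: str) -> Optional[Tuple[int, str]]:
    """Return (level, text) for a heading line, or None otherwise."""
    match = HEADING_RE.match(line)
    if match is None:
        return None
    return len(match.group(1)), match.group(2).rstrip()


print(parse_heading("# Hello"))  # (1, 'Hello')
print(parse_heading("#Hello"))   # None
print(parse_heading("#123"))     # None, so linkifiers keep working
```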
Fixes #11418.