# Markdown implementation

Zulip uses a special flavor of Markdown/CommonMark for its message
formatting. Our Markdown flavor differs from the standard primarily in
order to add important extensions, such as quote blocks and math
blocks, and also to do previews and correct issues specific to the
chat context. Beyond that, it has a number of minor historical
variations, which result from Zulip predating CommonMark (and thus
choosing different solutions to some problems) and from being based in
part on Python-Markdown, which is proudly a classic Markdown
implementation. We reduce these variations with every major Zulip
release.

Zulip has two implementations of Markdown. The backend implementation
at `zerver/lib/markdown/` is based on
[Python-Markdown](https://pypi.python.org/pypi/Markdown) and is used to
authoritatively render messages to HTML (and implements
slow/expensive/complex features like querying the Twitter API to
render tweets nicely). The frontend implementation is in JavaScript,
based on [marked.js](https://github.com/chjj/marked)
(`web/src/echo.js`), and is used to preview and locally echo
messages the moment the sender hits Enter, without waiting for a round
trip to the server. Those frontend renderings are only shown to the
sender of a message, and they are (ideally) identical to the backend
rendering.

The JavaScript Markdown implementation has a function,
`markdown.contains_backend_only_syntax`, that is used to check whether a message
contains any syntax that needs to be rendered to HTML on the backend.
If `markdown.contains_backend_only_syntax` returns true, the frontend simply won't
echo the message for the sender until it receives the rendered HTML
from the backend. If there is a bug where `markdown.contains_backend_only_syntax`
returns false incorrectly, the frontend will discover this when the
backend returns the newly sent message, and will update the HTML based
on the authoritative backend rendering (which would cause a change in
the rendering that is visible only to the sender shortly after a
message is sent). As a result, we try to make sure that
`markdown.contains_backend_only_syntax` is always correct.

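The local echo decision described above can be modeled roughly as follows. This is a minimal Python sketch for illustration only: the real logic lives in the JavaScript frontend, and `PendingMessage`, `needs_backend`, `frontend_render`, `send`, and `on_backend_html` are hypothetical names, not Zulip functions. The `"$$"` check is an arbitrary placeholder for whatever syntax the real predicate rejects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PendingMessage:
    raw: str
    html: Optional[str] = None  # None until some renderer has produced HTML
    locally_echoed: bool = False


def needs_backend(raw: str) -> bool:
    # Hypothetical stand-in for markdown.contains_backend_only_syntax.
    return "$$" in raw


def frontend_render(raw: str) -> str:
    # Hypothetical stand-in for the marked.js-based frontend renderer.
    return f"<p>{raw}</p>"


def send(raw: str) -> PendingMessage:
    """Locally echo the message only if the frontend can render it."""
    msg = PendingMessage(raw)
    if not needs_backend(raw):
        msg.html = frontend_render(raw)
        msg.locally_echoed = True
    return msg


def on_backend_html(msg: PendingMessage, backend_html: str) -> bool:
    """Adopt the authoritative rendering; report whether the sender saw a change."""
    visibly_changed = msg.locally_echoed and msg.html != backend_html
    msg.html = backend_html
    return visibly_changed
```

In this model, `on_backend_html` returning `True` corresponds to the case the paragraph warns about: the predicate wrongly allowed a local echo, and the sender briefly saw a rendering that the backend then replaced.
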
## Testing

The Python-Markdown implementation is tested by
`zerver/tests/test_markdown.py`, and the marked.js implementation and
`markdown.contains_backend_only_syntax` are tested by
`web/tests/markdown.test.js`.

A shared set of fixed test data ("test fixtures") is present in
`zerver/tests/fixtures/markdown_test_cases.json`, and is automatically used
by both test suites; as a result, it is the preferred place to add new
tests for Zulip's Markdown system. Some important notes on reading
this file:

- `expected_output` is the expected output for the backend Markdown
  processor.
- When the frontend processor doesn't support a feature and it should
  just be rendered on the backend, we set `backend_only_rendering` to
  `true` in the fixtures; this will automatically verify that
  `markdown.contains_backend_only_syntax` rejects the syntax, ensuring
  it will be rendered only by the backend processor.
- When the two processors disagree, we set `marked_expected_output` in
  the fixtures; this will ensure that the syntax stays that way. If
  the differences are important (i.e. not just whitespace), we should
  also open an issue on GitHub to track the problem.
- For mobile push notifications, we need a text version of the
  rendered content, since the APNS and GCM push notification systems
  don't support richer markup. Mostly, this involves stripping HTML,
  but there's some syntax we take special care with. Tests for what
  this plain-text version of content should be are stored in the
  `text_content` field.

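To make the fixture fields above concrete, here is a rough sketch of how a frontend test might decide what to check for one fixture entry. The field names `expected_output`, `backend_only_rendering`, `marked_expected_output`, and `text_content` come from the notes above; the `name` and `input` keys, the sample values, and the helper `frontend_expectation` are assumptions made for illustration and are not copied from the real fixtures file or test suite.

```python
# Illustrative fixture entries; the values are made up for this sketch.
plain_case = {
    "name": "hypothetical_plain_case",
    "input": "hello",
    "expected_output": "<p>hello</p>",
    "text_content": "hello",
}

backend_only_case = {
    "name": "hypothetical_backend_only_case",
    "input": "some backend-only syntax",
    "expected_output": "<p>rendered by the backend</p>",
    "backend_only_rendering": True,
}

disagreeing_case = {
    "name": "hypothetical_disagreeing_case",
    "input": "some input",
    "expected_output": "<p>backend form</p>",
    "marked_expected_output": "<p>frontend form</p>",
}


def frontend_expectation(case: dict) -> tuple:
    """Return (mode, expected_html) for the frontend side of a fixture."""
    if case.get("backend_only_rendering", False):
        # The frontend should refuse to render this input at all.
        return ("backend_only", None)
    # Prefer the frontend-specific expectation when the processors disagree.
    return ("render", case.get("marked_expected_output", case["expected_output"]))
```

The design point is that a single entry drives both test suites: the backend always compares against `expected_output`, while the frontend either checks rejection or compares against the most specific expectation available.
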
If you're going to manually test some changes in the frontend Markdown
implementation, the easiest way to do this is as follows:

1. Log in to your development server.
2. Stop your Zulip server with Ctrl-C, leaving the browser open.
3. Compose and send the messages you'd like to test. They will be
   locally echoed using the frontend rendering.

This procedure prevents any server-side rendering. If you don't do
this, the backend will likely render the Markdown you're testing and swap
it in before you can see the frontend's rendering.

If you are working on a feature that breaks multiple test cases, and want
to debug the test cases one by one, you can add `"ignore": true` to any
test cases in `markdown_test_cases.json` that you want to ignore. This
is a workaround for the lack of comment support in JSON. Revert your
"ignore" changes before committing. After this, you can run the frontend
tests with `tools/test-js-with-node markdown` and backend tests with
`tools/test-backend zerver.tests.test_markdown.MarkdownTest.test_markdown_fixtures`.

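As a rough illustration of what the `"ignore": true` flag does, a test loader could filter entries like this. This is a sketch under assumptions: the top-level `regular_tests` key and the helper name `active_cases` are hypothetical here, and the real test suites may load and filter the fixtures differently.

```python
import json


def active_cases(fixture_text: str) -> list:
    """Return the fixture entries that are not marked with "ignore": true."""
    data = json.loads(fixture_text)
    # "regular_tests" is an assumed key name for the list of entries.
    return [case for case in data["regular_tests"] if not case.get("ignore", False)]


sample = """
{
  "regular_tests": [
    {"name": "first", "input": "a", "expected_output": "<p>a</p>"},
    {"name": "second", "input": "b", "expected_output": "<p>b</p>", "ignore": true}
  ]
}
"""

remaining = [case["name"] for case in active_cases(sample)]
```

Because the flag is ordinary JSON data rather than a comment, it is easy to leave behind by accident, which is why reverting it before committing matters.
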
## Changing Zulip's Markdown processor

First, you will likely find these third-party resources helpful:

- **[Python-Markdown](https://pypi.python.org/pypi/Markdown)** is the Markdown
  library used by Zulip as a base to build our custom Markdown syntax upon.
- **[Python's XML ElementTree](https://docs.python.org/3/library/xml.etree.elementtree.html)**
  is the part of the Python standard library used by Python-Markdown
  and any custom extensions to generate and modify the output HTML.

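For readers new to ElementTree, the following minimal example shows the kind of tree construction and serialization it provides. It uses only the standard library; `build_link` is a hypothetical helper written for this illustration and is not part of Zulip or Python-Markdown.

```python
import xml.etree.ElementTree as ET


def build_link(url: str, text: str) -> ET.Element:
    """Build an <a> element; ElementTree escapes text and attributes on output."""
    link = ET.Element("a")
    link.set("href", url)
    link.text = text
    return link


simple = ET.tostring(build_link("https://example.com", "example"), encoding="unicode")
escaped = ET.tostring(build_link("https://example.com/?a=1&b=2", "x & y"), encoding="unicode")
```

Building output as elements rather than by concatenating strings means special characters in text and attribute values are escaped at serialization time.
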
When changing Zulip's Markdown syntax, you need to update several
places:

- The backend Markdown processor (`zerver/lib/markdown/__init__.py`).
- The frontend Markdown processor (`web/src/markdown.js` and sometimes
  `web/third/marked/lib/marked.js`), or `markdown.contains_backend_only_syntax` if
  your changes won't be supported in the frontend processor.
- If desired, the typeahead logic in `web/src/composebox_typeahead.js`.
- The test suite, probably via adding entries to `zerver/tests/fixtures/markdown_test_cases.json`.
- The in-app Markdown documentation (`markdown_help_rows` in `web/src/info_overlay.js`).
- The list of changes to Markdown at the end of this document.

Important considerations for any changes are:

- Security: A bug in the Markdown processor can lead to XSS issues.
  For example, we should not insert unsanitized HTML from a
  third-party web application into a Zulip message.
- Uniqueness: We want to avoid users having a bad experience due to
  accidentally triggering Markdown syntax or typeahead that isn't
  related to what they are trying to express.
- Performance: Zulip can render a lot of messages very quickly, and
  we'd like to keep it that way. New regular expressions similar to
  the ones already present are unlikely to be a problem, but we need
  to be thoughtful about expensive computations or third-party API
  requests.
- Database: The backend Markdown processor runs inside a Python thread
  (as part of how we implement timeouts for third-party API queries),
  and for that reason we currently should avoid making database
  queries inside the Markdown processor. This is a technical
  implementation detail that could be changed with a few days of work,
  but is an important detail to know about until we do that work.
- Testing: Every new feature should have both positive and negative
  tests; they're easy to write and give us the flexibility to refactor
  frequently.

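As one illustration of the security consideration above, text obtained from a third party should be escaped before it is placed into HTML. The sketch below uses the standard library's `html.escape`; the function name `render_preview_title` and the `preview-title` class are hypothetical and do not describe how Zulip's own preview code is written.

```python
import html


def render_preview_title(untrusted_title: str) -> str:
    """Wrap third-party text in markup, escaping it so it cannot inject HTML."""
    return '<div class="preview-title">' + html.escape(untrusted_title) + "</div>"


safe = render_preview_title("<script>alert(1)</script>")
```

A matching negative test (input that must not become live markup) is a cheap way to satisfy the testing consideration for this kind of change.
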
## Per-realm features

Zulip's Markdown processor supports a number of features
that depend on realm-specific or user-specific data. For example, the
realm could have
[linkifiers](https://zulip.com/help/add-a-custom-linkifier)
or [custom emoji](https://zulip.com/help/add-custom-emoji)
configured, and Zulip supports mentions for streams, users, and user
groups (which depend on data like users' names, IDs, etc.).

At a backend code level, these are controlled by the `message_realm`
object and other arguments passed into `do_convert` (`sent_by_bot`,
`translate_emoticons`, `mention_data`, etc.). Because
Python-Markdown doesn't support directly passing arguments into the
Markdown processor, our logic attaches these data to the Markdown
processor object via e.g. `_md_engine.zulip_db_data`, and then
individual Markdown rules can access the data from there.

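The pattern of attaching per-call data to the processor object can be sketched as follows. This is a simplified model, not Zulip's code: `StandInEngine`, `emoticon_rule`, and `do_convert_sketch` are hypothetical, the engine class merely stands in for a Python-Markdown processor, and the dictionary layout of `zulip_db_data` is an assumption made for the example.

```python
from typing import Optional


def emoticon_rule(engine: "StandInEngine", content: str) -> str:
    # A rule cannot receive extra arguments, so it reads them off the engine.
    data = engine.zulip_db_data or {}
    if data.get("translate_emoticons"):
        return content.replace(":)", ":smile:")
    return content


class StandInEngine:
    """Stand-in for a Markdown processor whose convert() takes only the text."""

    def __init__(self) -> None:
        self.zulip_db_data: Optional[dict] = None
        self.rules = [emoticon_rule]

    def convert(self, content: str) -> str:
        for rule in self.rules:
            content = rule(self, content)
        return content


def do_convert_sketch(content: str, *, translate_emoticons: bool = False,
                      sent_by_bot: bool = False) -> str:
    engine = StandInEngine()
    # Attach the per-message settings where individual rules can find them.
    engine.zulip_db_data = {
        "translate_emoticons": translate_emoticons,
        "sent_by_bot": sent_by_bot,
    }
    try:
        return engine.convert(content)
    finally:
        # Clear the data so it cannot leak into a later, unrelated render.
        engine.zulip_db_data = None
```

For example, `do_convert_sketch("hi :)", translate_emoticons=True)` rewrites the emoticon, while the default call leaves the text alone.
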
For non-message contexts (e.g. an organization's profile, which is
shown on the right-hand side of the login page; stream descriptions;
or rendering custom profile fields), one needs to just pass in a
`message_realm` (see, for example, `zulip_default_context` for the
organization profile code for this). But for messages, we need to
pass in attributes like `sent_by_bot` and `translate_emoticons` that
indicate details about how the user sending the message is configured.

## Zulip's Markdown philosophy

Note that this discussion is based on a comparison with the original
Markdown, not newer Markdown variants like CommonMark.

Markdown is great for group chat for the same reason it's been
successful in products ranging from blogs to wikis to bug trackers:
it's close enough to how people try to express themselves when writing
plain text (e.g. emails) that it helps more than it gets in the way.

The main issue with using Markdown in instant messaging is that the
standard Markdown syntax used in a lot of wikis/blogs has nontrivial
error rates, where the author needs to go back and edit the post to
fix the formatting after typing it the first time. While that's
basically fine when writing a blog, it gets annoying very fast in a
chat product; even though you can edit messages to fix formatting
mistakes, you don't want to be doing that often. There are basically
two types of error rates that are important for a product like Zulip:

- What fraction of the time, if you pasted a short technical email
  that you wrote to your team and passed it through your Markdown
  implementation, would you need to change the text of your email for it
  to render in a reasonable way? This is the "accidental Markdown
  syntax" problem; a common example is the italics syntax
  interfering with discussion of `char *`s.

- What fraction of the time do users attempting to use a particular
  Markdown syntax actually succeed at doing so correctly? Syntax rules like
  requiring a blank line between text and the start of a bulleted list
  raise this error rate substantially.

Both of these are minor issues for most products using Markdown, but
they are major problems in the instant messaging context, because one
can't edit a message that has already been sent before others read it
and users are generally writing quickly. Zulip's Markdown strategy is
based on the principles of giving users the power they need to express
complicated ideas in a chat context while minimizing those two error rates.

## Zulip's changes to Markdown

Below, we document the changes that Zulip has against stock
Python-Markdown; some of the features we modify / disable may already
be non-standard.

**Note:** This section has not been updated in a few years and is not
accurate.

### Basic syntax

- Enable `nl2br` extension: this means one newline creates a line
  break (not paragraph break).

- Allow only `*` syntax for italics, not `_`. This resolves an issue where
  people were using `_` and hitting it by mistake too often. Asterisks
  surrounded by spaces won't trigger italics, either (e.g. with stock Markdown
  `You should use char * instead of void * there` would produce undesired
  results).

- Allow only `**` syntax for bold, not `__` (easy to hit by mistake if
  discussing Python `__init__` or something).

- Add `~~` syntax for strikethrough.

- Disable special use of `\` to escape other syntax. Rendering `\\` as
  `\` was hugely controversial, but having no escape syntax is also
  controversial. We may revisit this. For now you can always put
  things in code blocks.

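The italics rule above can be approximated with a single regular expression. The pattern below was written for this illustration and is not the one Zulip uses: it accepts only `*`, requires non-space characters just inside both asterisks, and refuses asterisks that touch another `*` or follow a word character. `ITALIC` and `render_italics` are hypothetical names.

```python
import re

# Opening "*" must not follow "*" or a word character and must be followed by
# a non-space; the closing "*" must follow a non-space and not precede "*".
ITALIC = re.compile(r"(?<![*\w])\*(?=\S)([^*\n]+?)(?<=\S)\*(?!\*)")


def render_italics(text: str) -> str:
    """Wrap *emphasized* spans in <em>, leaving stray asterisks untouched."""
    return ITALIC.sub(r"<em>\1</em>", text)


emphasized = render_italics("this is *important* text")
pointer_talk = render_italics("You should use char * instead of void * there")
dunder = render_italics("the __init__ method")
```

Here `pointer_talk` and `dunder` come back unchanged, which is the behavior the "accidental Markdown syntax" discussion asks for.
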
### Lists

- Allow tacking a bulleted list or block quote onto the end of a
  paragraph, i.e. without a blank line before it.

- Allow only `*` for bulleted lists, not `+` or `-` (previously
  created confusion with diff-style text sloppily not included in a
  code block).

- Disable ordered list syntax: stock Markdown automatically renumbers, which
  can be really confusing when sending a numbered list across multiple
  messages.

### Links

- Enable auto-linkification, both for `http://...` and guessing at
  things like `t.co/foo`.

- Force links to be absolute. `[foo](google.com)` will go to
  `http://google.com`, and not `https://zulip.com/google.com` which
  is the default behavior.

- Set `title=`(the URL) on every link tag.

- Disable link-by-reference syntax,
  `[foo][bar]` ... `[bar]: https://google.com`.

- Enable linking to other streams using `#**streamName**`.

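The "force links to be absolute" rule can be sketched as a small URL normalization step. This is a simplified illustration and not Zulip's implementation: `make_absolute` is a hypothetical helper, and it treats only `scheme://` URLs and `mailto:` links as already absolute.

```python
import re

# Matches a leading URL scheme such as "http://" or "https://".
HAS_SCHEME = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*://")


def make_absolute(url: str) -> str:
    """Prefix scheme-less link targets with http:// so they are never relative."""
    if HAS_SCHEME.match(url) or url.startswith("mailto:"):
        return url
    return "http://" + url


bare = make_absolute("google.com")
already_absolute = make_absolute("https://zulip.com/help")
```

Without this step, a browser would resolve `google.com` relative to the page the message is displayed on.
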
### Code

- Enable fenced code block extension, with syntax highlighting.

- Disable line-numbering within fenced code blocks -- the `<table>`
  output confused our web client code.

### Other

- Disable headings, both `# foo` and `== foo ==` syntax: they don't
  make much sense for chat messages.

- Disable images with `![]()` (images from links are shown as an inline
  preview).

- Allow embedding any avatar as a tiny (list bullet size) image. This
  is used primarily by version control integrations.

- We added the `~~~ quote` block quote syntax.