This is a major upgrade, and requires some significant compatibility
work:
* Migrating the pattern-removal logic to use the Registry feature.
* Handling the removal of positional arguments in markdown extensions.
* Handling the removal of safe mode.
This setting splits away part of responsibility from THUMBOR_URL.
Now on, this setting will be responsible for controlling whether
we thumbnail images or not by asking bugdown to render image links
to hit our /thumbnail endpoint. This is irrespective of what
THUMBOR_URL is set to though ideally THUMBOR_URL should be set
to point to a running thumbor instance.
This commit adds a custom Markdown include extension which is
identical to the original except when a macro file can't
be found, it raises a custom JsonableError exception, which
we can catch and then trigger an appropriate test failure.
Fixes: #10947
This commit changes the return type of get_possible_mentions_info to a
list instead of a dict, thus disposing off the hacky logic of storing
users with duplicate full names with name|id keys that made the code
obfuscated.
The other functions continue to use the dicts as before, however, there
are minor variable changes where needed in accordance with the updated
definition of get_possible_mentions_info.
We now attach zulip_db_data to the markdown engines
for classes that need it. This was the last remaining
global we had, so we remove `arguments.py` here.
The Markdown processor makes it fairly simple for
the helper classes to access the `md` engine. We
now write `_md_engine.zulip_message` to avoid having
the current message in the global namespace.
Note that we do reuse engines for multiple messages,
but each engine is specific to a realm. And we therefore
avoid even the theoretical possibility of leaking message
data between realms.
This makes us consistent with how we import codehilite.
Using Python's normal import mechanism avoids some overhead
with Markdown having to parse dotted notation.
These modules are tiny, so they shouldn't impact startup
too much. Also, by explicitly importing them, we avoid
the pitfall of having a sucessful startup and a broken
renderer.
We were building the same link regex every time
we build a Markdown engine, which happens twice
per realm. It's an expensive operation due to
the complexity of the regex and us reading a file.
Nested classes are kind of expensive in Python,
particularly when you throw in mypy annotations.
Also, flatter is arguably better, although it is
kind of a pain here not to have closures.
This change avoids hitting the Django ORM when
we don't find any possible group mentions in
the message content.
Django doesn't necessarily actually hit the database,
but it's still slow and shows up in profiles.
We can rely on `message_realm` being the same
as `message.sender.realm`, which allows us to
skip two queries to the database for the rare
Zephyr mirroring case.
This is a prepartory commit for the upcoming changes. It was meaningful
to extract this one out because this function is essentially a condition
check on whether a given url is one of the user_uploads or an external
one. Based on its value we decide whether a url must be thumbnailed or
not and thus this function will also be used in an upcoming commit
patching lib/thumbnail.py to do the same check before thumbnail url
generation.
We are basically adding a check for url's to be external (belonging
to some 3rd party web site hosting the image) or be one of the
user uploaded files. User uploaded files are served by a separate
endpoint which is /user_uploads/. Any other local url such as
/user_avatars/ or /static/ should never be sent to thumbor for
thumbnailing.
Not sending /user_avatars/ to thumbor for thumbnailing makes sense
because they are already properly thumbnailed and stored properly.
/static/ urls host very few images we use for demo and can be safely
be excluded from thumbnailing.
The Zulip API is to be used on both development and production
servers, and really we just need to talk about zuliprc files.
There's a similar issue for the JS docs, but we need to fix the
copy/paste issues with those as well.
We start by stripping the ids in front of the name before the database
lookup. This has the advantage of not mentioning anyone if an incorrect
user id and full name combination is specified, as well as not having
the query the database twice, once by fullname and next by id.
Previously, we were storing only the most recent person with the same
full name as others; this commit adds new keys to the dict such that
simply looking by name would get you the newest user with this name,
and the get_user_by_id function can index the remaining users.
Having HTML (or HTML-like) content in the examples was making parts of
the content invisible, since the browser identified them as HTML tags
rather than verbose text.
python-twitter was consuming a significant amount of import time.
However, this commit seems to not save any time at all, probably
because its recursive dependencies are imported elsewhere in Zulip.
Extracting this helper library will help us avoid an import loop
between notifications.py and message.py (with bugdown in between).
But in addition to that, it's a more natural model, since some of the
uses for these functions weren't part of the notifications code
anyway.
Whenever a parameter for an endpoint in our REST API has a default
value, it is displayed under the "Description" section of the
arguments table in the docs.
This way, we don't need to explicitly indicate the default values in the
description, thus avoiding duplicate information in the OpenAPI source.
This has the benefit that we now get the usual data about the
user/request/etc. in error emails related to bugdown exceptions;
previously we were just getting the traceback in the emails (since our
`mail_admins` template was very simple) and no other debugging
details.
Comments tweaked by tabbott to help make clear exactly what's going on
here, since it's a little subtle and a little hacky.
Fixes#8843.
Our nested code block processor wasn't using the correct test for
whether a paragraph was empty of other content; first, we need to
confirm no children, and second, we need to confirm there is no text
before/after the code block element inside the p tag.
Various pieces of our thumbor-based thumbnailing system were already
merged; this adds the remaining pieces required for it to work:
* a THUMBOR_URL Django setting that controls whether thumbor is
enabled on the Zulip server (and if so, where thumbor is hosted).
* Replaces the overly complicated prototype cryptography logic
* Adds a /thumbnail endpoint (supported both on web and mobile) for
accessing thumbnails in messages, designed to support hosting both
external URLs as well as uploaded files (and applying Zulip's
security model for access to thumbnails of uploaded files).
* Modifies bugdown to, when THUMBOR_URL is set, render images with the
`src` attribute pointing /thumbnail (to provide a small thumbnail
for the image), along with adding a "data-original" attribute that
can be used to access the "original/full" size version of the image.
There are a few things that don't work quite yet:
* The S3 backend support is incomplete and doesn't work yet.
* The error pages for unauthorized access are ugly.
* We might want to rename data-original and /thumbnail?size=original
to use some other name, like "full", that better reflects the fact
that we're potentially not serving the original image URL.
Some of the arguments in our REST API have to be sent as JSON objects,
which only accept double quotes for strings.
If we display the examples as normal Python objects, the syntax would be
quite similar but it would use simple quotes, which is invalid JSON (and
isn't accepted by the server).
That's why all the examples should be JSON-serialized in order to comply
with the API's requirements.
Until now, we were displaying an empty "Arguments" section in the REST
API docs whenever an endpoint didn't use input arguments.
In the case of OpenAPI-based docs, that was also annoying because it
required removing the {generate_api_arguments_table|...} template tag or
leaving an empty "parameters" field in zulip.yaml.
After this, we show a paragraph indicating that the endpoint doesn't
need arguments under the "Arguments" section.
If the argument table generator isn't able to reach a file that is
supposed to read, the two most likely causes are:
- The source .md documentation file that is requesting the table has a
typo in the path.
- The file with the arguments isn't there, for some reason.
In either case, we don't want the server to fail silently-ish and
display the docs as if there was no arguments for that endpoint. That's
why the most logic thing to do is to raise an exception and let the
admins know that there's something wrong.
This commit adds a Markdown tree-processor extension that renders
multi-line code blocks that are nested inside lists with the
formatting. Note that the code block could be nested inside multiple
list levels and would still get rendered correctly.
Tim: This fixes the need for unpleasant workarounds like
f5bfa4e793 and makes nested code blocks
in our documentation look exactly how users would expect them to.
This has two advantages;
* We can split bugdown/__init__.py into several modules, and each
module can access these arguments by importing these
* We get rid of the super-ugly `global db_data` construct, replacing
it with a only slightly ugly monkey-ish patching of the
`zerver.lib.bugdown.arguments` module, which is at least
considerably more clear on reading as to what it's purpose is.
The only changes visible at the AST level, checked using
https://github.com/asottile/astpretty, are
zerver/lib/test_fixtures.py:
'\x1b\\[(1|0)m' ↦ '\\x1b\\[(1|0)m'
'\\[[X| ]\\] (\\d+_.+)\n' ↦ '\\[[X| ]\\] (\\d+_.+)\\n'
which is fine because re treats '\\x1b' and '\\n' the same way as
'\x1b' and '\n'.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
This bug is caused by the conversion of newlines to `<br>` statements,
since `>` is not allowed as a character around an emoticon during
translation.
Also, add a new test case for preventing this bug from occurring in the
future.
Fix#9763.
Tweaked by tabbott to add a test and fix a super subtle issue with the
relative_settings_link variable having been set once the first time a
/help article was rendered.
This commit increases the rendered_content limit from 2x to 10x of the
original message length.
Earlier, we had placed a limit of MAX_MESSAGE_LENGTH * 2 for the
rendered content (explained in commit
77addc5456). That limit was based on
the assumption that in most cases, the rendered content wouldn't cause
a large increase in message length. However, quite prominently in
syntax highlighted codeblocks, that wasn't true and this caused the
limit condition to be hit for long messages composed primarily of code
blocks.
Example: The following message would render close to 10x it's original size.
```py
if:
def:
print("x", var)
x = y
```
Because the syntax highlighted logic is extremely compressible, having
rendered_content reach up to 100KB doesn't create a network
performance problem.
Also switches the default behaviour of the code to not translate the
emoticons. Earlier, the code was testing-aware, and used to translate
when there was no user profile data available(assuming that as a testing
environment).
In 18e43895ff we replaced
stream subscribe buttons with stream links. The new feature
has been well tested and well received for over a year now,
so it's safe to remove the older feature at this point.
Older sites will have super old messages that still have the
rendered markup; this commit does not attempt to address those
situations. Most likely, clicking on an old button in the old
message will either do nothing or look like a message reply.
This commit refactors the bugdown to perform a lookup only on active
realm emojis. This was needed because once we migrate realm emojis
to be addressed by `id` rather than name, it will be costly to
perform a lookup on all the realm emojis.
This field has been unused by clients for some time, and isn't great
for our public archive feature plans (where we'll not want to be
including email addresses in messages).
Add `translate_emoticons` to `prop_types` and `expected_keys`.
Furthermore, create a emoji-translating Markdown inline pattern.
Also use a JavaScript version of `translate_emoticons` and then use
this function during Markdown previews and as a preprocessor. This
is only needed for previews, because usually emoticon translation
happens on the backend after sending.
Add tests for emoticon translation, a settings UI, and a /help/ page
as well.
Tweaked by tabbott to fix various test failurse as well as how this
handles whitespace, requiring emoticons to not have adjacent
characters.
Fixes#1768.
The Markdown extension that lives inside
zerver/lib/bugdown/api_code_example.py previously used ujson.
ujson's `dumps` function doesn't accept a `separators` argument,
which means we have no control over how the JSON is pretty-printed.
This resulted in JSON fixtures with no spaces after the colon, which
looks unnecessarily convoluted.
So now, we use the built-in `json` module to get around this.
For further reading, this issue
<https://github.com/esnme/ultrajson/issues/82> opened on ujson's
repo explains why they are reluctant to support such formatting
due to performance considerations.
This commit prefixes stream names in urls with stream ids,
so that the urls don't break when we rename streams.
strean name: foo bar.com%
before: #narrow/stream/foo.20bar.2Ecom.25
after: #narrow/stream/20-foo-bar.2Ecom.25
For new realms, everything is simple under the new scheme, since
we just parse out the stream id every time to figure out where
to narrow.
For old realms, any old URLs will still work under the new scheme,
assuming the stream hasn't been renamed (and of course old urls
wouldn't have survived stream renaming in the first place). The one
exception is the hopefully rare case of a stream name starting with
something like "99-" and colliding with another stream whose id is 99.
The way that we enocde the stream name portion of the URL is kind
of unimportant now, since we really only look at the stream id, but
we still want a safe encoding of the name that is mostly human
readable, so we now convert spaces to dashes in the stream name. Also,
we try to ensure more code on both sides (frontend and backend) calls
common functions to do the encoding.
Fixes#4713
This enforces `**` around all the mentions including "at-all" and
"at-everyone" mentions. Hence this makes `@all` and `@everyone`
invalid mentions, resulting into proper syntax for these mentions as
`@**all**` and `@**everyone**` respectively.
Note from tabbott: This removes an old feature/syntax, which made
sense back when @Tim was also a way to mention a user with Tim as
their first name. Given how nice typeahead is now, the user part of
the feature was removed a while ago; this should have gone at the same
time.
Fixes: #8143.
This commit modifies the Markdown extension in bugdown/api_code_examples.py
to support rendering code examples in multiple languages by specifying
the language like so:
{generate_code_example(python)|doc.md|example}
This makes us one step closer towards adding support for testable
JavaScript code examples.
If some bug in Bugdown results in a rendered message content that is
bigger than twice the message size, we now just throw an exception
from Bugdown. This is considerably better than the old behavior,
which might result in an enormous message being placed in the database
(potentially, bigger than the 1MB limit to store in memcached), which
would in turn result in tragic consequences.
This fixes#8322, in that it prevents the super bad outcome seen there
(where basically Zulip became unusable for everyone on the stream
where the message is posted). Now, the failure mode is just the
message failing to send. Still not ideal (and requires further work
on the URL embed feature), but not a minor problem, not a major one.
This commit adds support for passing in an argument to the macro
"call" to explicitly specify a fixture to render, like so:
{generate_code_example|doc_name|fixture(stream_message_with_args)}
To generate a code exammple,
{generate_code_example|<api_doc_md>|example} sounds better and
more intuitive than,
{generate_code_example|<api_doc_md>|method}
Some of our code examples can only be run with administrator
credentials (such as create-user). Thus, the Markdown extension
for generating code examples should have an option to include
the lines that recommend using an admin zuliprc instead of a
non-admin one.
Now that the Markdown extension defined in
zerver/lib/bugdown/api_generate_examples depended on code in the
tools/lib/* directory, it caused the production tests to fail since
the tools/ directory wouldn't exist in a production environment.
This commit adds a Markdown extension that allows the following
syntax,
{generate_code_example|<md_file_name>|<fixture or method>}
to generate code examples and fixtures found in tools/lib/api_tests.py
and templates/zerver/api/fixtures.json, respectively.
This also amends a commit from Brock Whittaker <brock@zulipchat.com>
that merges two separate functions for YouTube videos and Vimeo videos
into a generic video recall function.
Fixes#7550.
This commit does the following:
* Move the Arguments table data from stream-message.md and
private-message.md to a JSON file.
* Add a Markdown extension that allows one to include and render
a table from a JSON file like so:
{generate_arguments_table|arguments.json|private-stream.md}
* Use Bootstrap's .table class to format the table instead of
relying on custom CSS.
This uses the correct regex for strikethrough. Also, added
a test to make sure that strikethrough works when it contains
link with whitespace.
Fixes#7596.
This name hasn't been right since f7f2ec0ac back in 2013; this handler
sends the log record to a queue, whose consumer will not only maybe
send a Zulip message but definitely send an email. I found this
pretty confusing when I first worked on this logging code and was
looking for how exception emails got sent; so now that I see exactly
what's actually happening here, fix it.
Adds a markdown preprocessor that finds ordered lists where all items
use the same number and change them to be in normal increasing order,
starting with that number.
Fixes#5159.
Hides URL if the message content == image url so that sending gifs or
images feels less cluttered. Uses the url_to_a() function to generate
the expected url string for matching.
Fixes#7324.
The character ">" now only starts a blockquote if the resulting
blockquote would be non-empty. Thus, by itself, ">" is now
interpreted literally by bugdown, fixing #687. The message
with contents consisting of ">>>" is now parsed as a doubly
(not triply) nested blockquote with contents ">". Properly
formed blockquotes have identical behavior as before, but now
bugdown can no longer produce empty blockquotes as output.
Fixes#2886, #687.
This commit helps reduce clutter on the navigation sidebar.
Creates new directories and moves relevant files into them.
Modifies index.rst, symlinks, and image paths accordingly.
This commit also enables expandable/collapsible navigation items,
renames files in docs/development and docs/production,
modifies /tools/test-documentation so that it overrides a theme setting,
Also updates links to other docs, file paths in the codebase that point
to developer documents, and files that should be excluded from lint tests.
Note that this commit does not update direct links to
zulip.readthedocs.io in the codebase; those will be resolved in an
upcoming follow-up commit (it'll be easier to verify all the links
once this is merged and ReadTheDocs is updated).
Fixes#5265.
While fixing an issue related to email gateway messages not getting
rendered properly, I unknowingly introduced a bug in the markdown
engine updation code. This commit fixes it. The issue was that for
a realm having email gateway setup, updation of realm filters would
lead to the updation of only one of the markdown engines not both.
The intended use of $$ is for inline expressions, not for multiline
ones; ```math is an acceptable alternative for the latter. Hence,
the $$-syntax for inline TeX no longer permits newlines within it.
This was also necessary for the next change to be sensible; namely
allowing for spaces around both $$ when crafting inline TeX instead of
forcing everything to be crammed together, e.g. $$x=7$$. In order to
avoid uninentionally creating inline expressions, the opening and
closing $$'s of an inline expression must now both exactly consist of
two dollar signs, no more and no less.
Fixes: #6488.
Generally emails are not written with markdown in mind and hence
sometimes render in strange ways. This commit fixes a particular
issue that was causing whitespace before paragraphs to be treated
as code block due to which email content was being rendered in a
box that scrolls in right direction a lot.
Fixes: #7045.
This adds the data model and bugdown support for the new UserGroup
mention feature.
Before it'll be fully operational, we'll still need:
* A backend API for making these.
* A UI for interacting with that API.
* Typeahead on the frontend.
* CSS to make them look pretty and see who's in them.
We now have a MentionData class that encapsulates
the users who are possibly mentioned in a message.
Not that the rendering code may not keep all the mentions,
since things like backticks will suppress the mention.
We populate this now in do_send_messages, so that we can use
the info earlier in the message-sending process. This info
now gets passed down the call stack as an optional parameter.
Note that bugdown.convert() still populates the data when its
callers decline to pass in a MentionData object.
This is mostly a preparatory commit, as we don't take advantage
of the data yet in do_send_messages.
Nobody has used this feature in years, and it causes certain types of
markdown issues in development to completely DoS the development
environment by making it possible for the "Bugdown timeout" exception
handler to timeout in bugdown.
Since we already send an email to the server administrators, there's
no need to replace this feature with anything.
This commit switches to use sprite sheets for rendering emojis
in all the remaining places, i.e., message bodies and composebox
typeahead. This commit also includes some changes to notifications.py
file so that the spans used for rendering emojis can be converted
to corresponding image tags so that we don't break the emoji rendering
in missed message emails since we can't use sprite sheets there.
As part of switching the bugdown system to use sprite sheets, we need
to switch the name_to_codepoint mappings to match the new sprite
sheets. This has the side effect of fixing a bunch of emoji like
numbers and flag emoji in the emoji pickers.
Fixes: #3895.
Fixes: #3972.
We now triage message content for possible mentions before
going to the cache/DB to get name info. This will create an
extra data hop for messages with mentions, but it will save
a fairly expensive cache lookup for most messages. (This will
be especially helpful for large realms.)
[Note that we need a subsequent commit to actually make the speedup
happen here, since avatars also cause us to look up all users in
the realm.]
Given typeahed and the fact that this only worked if the person had a
full name that didn't contain whitespace, this side effect of the
original @shortname mentionfeature that we removed was experienced by
users as a bug.
Fixes#6142.
Process the unicode emojis in twitter link previews and render them
properly. Before this we were not processing the unicode emojis in
twitter link previews and hence on the systems which don't have
fonts for displaying them they were rendered as blank boxes.
Fixes: #5427.
This commit renames list named `to_linkify` in twitter link processor
to `to_process` and adds a `type` field to each entry in it to
indicate the type of data represented by that particular entry.
Unicode emojis when rendered should display canonical short name.
Similarly, the alt text should be of the format `:<short_name>:`.
For both of these we currently display the actual unicode symbol.
As some systems don't have the fonts necessary for displaying them
properly, they are rendered as empty square blocks. This commit also
ensures that the markup generated for emoji generated by canonical
name and by an unicode emoji is same.
Fixes: #5555.
We used shortnames for mentioning users before we had autocomplete
feature. Since we now have autocomplete typeahead, this syntax is
no more useful and just causes problems. This commit removes the
shortname mention syntax.
Fixes: #4189.
A deactivated realm emoji should neither be accepted further as a
reaction nor its further occurences in a message be rendered as an
emoji. However, all the old occurences should continue to render
normally.
The `data-toggle` property prevented the new style of overlay modals
from launching, and regardless, isn't a future-proof options for how
this should work.
The regex we were using didn't cover all the unicode blocks
to which our emojis belong. This commit fixes the regex to
include all the unicode blocks and also updates the
corresponding JS regex in marked.js.
Fixes: #3460.
Unicode codepoints are of minimum length 4, padded with extra
zeroes if the length is less than 4. This commit fixes the
`unicode_emoji_to_codepoint()` to ensure that the codepoint
it generates are of correct length.