Commit Graph

358 Commits

Author SHA1 Message Date
Anders Kaseorg 643bd18b9f lint: Fix code that evaded our lint checks for string % non-tuple.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2019-04-23 15:21:37 -07:00
Puneeth Chaganti ca8e9fb800 bugdown: Make the youtube URL regex slightly easier to read. 2019-04-13 20:25:37 -07:00
Puneeth Chaganti 8afda1c1bb bugdown: Show preview for urls copied from the Youtube share widget. 2019-04-13 20:25:37 -07:00
overide b263671c9e markdown: Fix unordered list not rendering in blockquote.
This fixes an issue where the hanging unordered list was not
rendering in blockquote; the problem was that we were not
adding an empty line(to satisfy the markdown) for hanging
unordered list if it is in blockquote. Both blockquote
and code block is fenced but we want to avoid rendering
the list if it's in the code block but not in blockquote.

Fixes: #11916.
2019-04-13 19:23:59 -07:00
YashRE42 a724a38c03 markdown: Improve handling of broken img urls.
Some urls which end with image file extensions (eg .jpg) may link to
html pages. This adds handling for linx.li, wikipedia.org and
pasteboard.co. If it is possible, we redirect to the actual image url
otherwise we do not attempt to render it as an image.

Fixes #10438.
2019-03-08 13:39:34 -08:00
Rohitt Vashishtha 3ed85f4cd7 Revert "bugdown: Process word boundaries properly in realm_filters."
This reverts commit ff90c0101c but keeps
the test cases added for reference.

This was reverted because it was both not a clean solution and created
other realm filters bugs involving dashes (etc.).
2019-03-07 11:03:35 -08:00
overide 58d28eed5d markdown: Fix emojis not rendering with :bogus: in the line.
This fixes an issue where invalid emoji name prevents following
emojis from rendering.

This reverts the code change in
8842349629, while still passing the
tests added in that commit (it seems the original commit had
misdiagnosed an ordering bug and thus introduced this issue).

Fixes: #11770.
2019-03-05 16:05:25 -08:00
Bennet Sunder 7c5f316cb8 alert_words: Performance improvements in looking for alert_words.
This commit leverages the ahocorasick algorithm to build a set of user_ids
that have their alert_words present in the message. It runs in linear time
of the order of length of the input message as opposed to number of
alert_words. This is after building a ahocorasick Automaton which runs
in O(number of alert_words in entire realm) which is usually cached.
2019-03-01 15:36:39 -08:00
overide 0dcfc22406 markdown: Fix numbered list handling of blank lines between blocks.
This fixes an issue where blank lines between blocks were causing
auto-numbering of list to stop before the blank line resulting
in two separate numbered list instead of one.

Edited significantly by tabbott to explain the tricky details in the
comments.

Fixes: #11651.
2019-03-01 15:29:07 -08:00
Tim Abbott d6c09eac51 bugdown: Add support for no_previews argument.
This allows us to have some features using bugdown rendering where
inline image previews will not be rendered (which would be problematic
for e.g. stream descriptions).
2019-02-28 16:54:04 -08:00
Rohitt Vashishtha 44ec83ef28 markdown: Render silent mentions as **name**.
This change should help people discover to distinguish
silent mentions in text as a part of Zulip syntax while
differentiating them from regular mentions.
2019-02-20 10:41:42 -08:00
Rohitt Vashishtha 57b9991396 markdown: Change syntax of silent mentions ( _@person -> @_person). 2019-02-20 10:41:42 -08:00
Anders Kaseorg e12c433745 bugdown: Remove unused imports.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
2019-02-02 17:25:22 -08:00
Steve Howell c2fcfc087a bugdown: Include message id in exceptions. 2019-01-29 12:49:56 -08:00
Rohitt Vashishtha ff90c0101c bugdown: Process word boundaries properly in realm_filters.
Earlier, our realm filters didn't render for languages that do not
use spaces (eg: Japanese) since we used to check for the presence
of an actual space character. This commit replaces that logic with
a complex scheme to detect word boundaries.

Also, we convert the RealmFilterPattern to subclass InlineProcessor
and make use of the new no-op feature in py-markdown 3.0.1 where we
can tell py-markdown that our pattern didn't find a match despite
the initial regex getting matched.

Fixes #9883.
2019-01-28 14:48:15 -08:00
Steve Howell ad071ced47 bugdown: Avoid recomputing the stream-link regex. 2019-01-28 13:12:37 -08:00
Rohitt Vashishtha 2dc447d707 bugdown: List py-markdown 3.0.1 features that we do not use.
Tweaked by tabbott to extend the documentation.
2019-01-28 13:12:37 -08:00
Rohitt Vashishtha 434094e599 bugdown: Restructure Bugdown to extend Markdown from being an extension.
Since we are building our parser from scratch now:

1. We have control over which proccessor goes at what priority number.
   Thus, we have also shifted the deprecated `.add()` calls to use the
   new `.register()` calls with explicit priorities, but maintaining
   the original order that the old method generated.

2. We do not have to remove the processors added by py-markdown that
   we do not use in Zulip; we explicitly add only the processors we
   do require.

3. We can cluster the building of each type of parser in one place,
   and in the order they need to be so that when we register them,
   there is no need to sort the list. This also makes for a huge
   improvement in the readability of the code, as all the components
   of each type are registered in the same function.

These are significant performance improvements, because we save on
calls to `str.startswith` in `.add()`, all the resources taken to
generate the default to-be-removed processors and the time taken to
sort the list of processors.

Following are the profiling results for the changes made. Here, we
build 10 engines one after the other and note the time taken to build
each of them. 1st pass represents the state after this commit and 2nd
pass represent the state after some regex modifications in the commits
that follow by Steve Howell. All times are in microseconds.

| nth Engine | Old Time | 1st Pass | 2nd Pass |
| ---------- | -------- | -------- | -------- |
|          1 |  92117.0 |  81775.0 |  76710.0 |
|          2 |   1254.0 |    558.0 |    341.0 |
|          3 |   1170.0 |    472.0 |    305.0 |
|          4 |   1155.0 |    519.0 |    301.0 |
|          5 |   1170.0 |    546.0 |    326.0 |
|          6 |   1271.0 |    609.0 |    416.0 |
|          7 |   1125.0 |    459.0 |    299.0 |
|          8 |   1146.0 |    476.0 |    390.0 |
|          9 |   1274.0 |    446.0 |    301.0 |
|         10 |   1135.0 |    451.0 |    297.0 |
2019-01-28 13:12:37 -08:00
Rohitt Vashishtha 9f2c52c86e bugdown: Rename variables regex to REGEX for importing regex module. 2019-01-28 12:00:58 -08:00
Steve Howell 3b7d899532 bugdown: Use CompiledPattern in AtomicLinkPattern.
We avoid re-computing the regex string here, and we
also avoid re-compiling the regex itself.

I decided to put the "one_time" decorator in the
bugdown file itself, just to reduce friction in
folks reading the "buyer beware" comments.

Unfortunately, we can't use this for the
get_web_link_regex() function due to testing concerns,
so that continues to do an inelegant cache-with-global-var
scheme.
2019-01-28 11:58:47 -08:00
Steve Howell eea711a805 bugdown: Flatten get_web_link_regex().
We use early-exit to flatten the code.

I also tweaked the comments a bit based on some recent
profile findings.  (e.g. reading the file isn't actually
a big bottleneck, it's more the regex itself)
2019-01-28 11:58:46 -08:00
Steve Howell 852756aeb3 bugdown: Eliminate LinkPattern class.
The only code that used LinkPattern was AtomicLinkPattern.

We just move the helper method into AtomicLinkPattern.
2019-01-28 11:58:05 -08:00
Steve Howell 77446a710c bugdown: Optimize CompiledPattern.
We don't need the hacky step of passing in a blank
regex to the superclass's __init__ function.

All we need to do is assign `md` to `self.md`.
2019-01-28 11:57:28 -08:00
Steve Howell 025df33d7a bugdown: Rename VerbosePattern to CompiledPattern.
This class only requires a compiled regex--it's up
to the callers how they compile it (verbose or
otherwise).
2019-01-28 11:56:01 -08:00
Rohitt Vashishtha 96aa1d4b37 markdown: Reduce mentions inside blockquotes to silent-mentions.
On the backend, we extend the BlockQuoteProcessor's clean function that
just removes '>' from the start of each line to convert each mention to
have the silent mention syntax, before UserMentionPattern is invoked.

The frontend, however, has an edge case where if you are mentioned in
some message and you quote it while having mentioned yourself above
the quoted message, you wouldn't see the red highlight till we get the
final rendered message from the backend.

This is such a subtle glitch that it's likely not worth worrying about.

Fixes #8025.
2019-01-16 16:08:37 -08:00
Rohitt Vashishtha f993fdd480 markdown: Add _@**Name** syntax for silent mentions.
These mentions look like regular mentions except they do not
trigger any notification for the person mentioned. These are
primarily to be used when you make a bot take an action and
the bot mentions you, or when you quote a message that mentions
you.

Fixes #11221.
2019-01-16 16:01:06 -08:00
Harshit Bansal 5f76a65b1d emoji: Make unicode/span emojis more accessible.
This commit adds `aria-label="<title_text>"` and `role="img"` to
the generated HTML.

Fixes: #5975.
2019-01-16 09:07:19 -08:00
Rohitt Vashishtha b7c5ae7bca dependencies: Upgrade markdown from 2.6.11 -> 3.0.1.
This is a major upgrade, and requires some significant compatibility
work:
* Migrating the pattern-removal logic to use the Registry feature.
* Handling the removal of positional arguments in markdown extensions.
* Handling the removal of safe mode.
2019-01-11 11:40:18 -08:00
Aditya Bansal 3ee69f3da9 thumbnails: Add setting THUMBNAIL_IMAGES.
This setting splits away part of responsibility from THUMBOR_URL.
Now on, this setting will be responsible for controlling whether
we thumbnail images or not by asking bugdown to render image links
to hit our /thumbnail endpoint. This is irrespective of what
THUMBOR_URL is set to though ideally THUMBOR_URL should be set
to point to a running thumbor instance.
2019-01-04 10:27:04 -08:00
Rohitt Vashishtha c4e50a34d3 bugdown: Refactor get_user to get_user_by_name.
Also adds a warning against the use of this function.
2018-11-29 10:19:08 -08:00
Rohitt Vashishtha 681368b937 bugdown: Refactor get_possible_mentions_info and related functions.
This commit changes the return type of get_possible_mentions_info  to a
list instead of a dict, thus disposing off the hacky logic of storing
users with duplicate full names with name|id keys that made the code
obfuscated.

The other functions continue to use the dicts as before, however, there
are minor variable changes where needed in accordance with the updated
definition of get_possible_mentions_info.
2018-11-28 14:07:52 -08:00
Tim Abbott e4946dd182 bugdown: Rename full_names to mention_texts.
This is another straight variable rename, which will help clarify the
upcoming commits.
2018-11-28 14:07:23 -08:00
Rohitt Vashishtha ccdf893af7 bugdown: Rename get_full_name_info to get_possible_mentions_info. 2018-11-28 14:04:50 -08:00
Steve Howell ff9a6c5ced minor: Rename subject -> topic_name in bugdown. 2018-11-08 16:21:14 +00:00
Steve Howell d05f731c1c Eliminate the use of arguments.db_data.
We now attach zulip_db_data to the markdown engines
for classes that need it.  This was the last remaining
global we had, so we remove `arguments.py` here.
2018-11-07 10:44:49 -08:00
Steve Howell b66304e167 refactor: Pass db_data down to helpers.
This mostly preps for the next commit.
2018-11-07 10:44:49 -08:00
Steve Howell fa6f642c9c refactor: Remove global argument.current_realm. 2018-11-07 10:44:48 -08:00
Steve Howell e1113c7011 refactor: Remove the global arguments.current_message.
The Markdown processor makes it fairly simple for
the helper classes to access the `md` engine.  We
now write `_md_engine.zulip_message` to avoid having
the current message in the global namespace.

Note that we do reuse engines for multiple messages,
but each engine is specific to a realm.  And we therefore
avoid even the theoretical possibility of leaking message
data between realms.
2018-11-07 10:44:48 -08:00
Steve Howell ab24cc2535 minor: Pass in arguments.current_message to helpers. 2018-11-07 10:44:48 -08:00
Steve Howell c26768ea63 bugdown: Import nl2br and tables extensions "normally".
This makes us consistent with how we import codehilite.

Using Python's normal import mechanism avoids some overhead
with Markdown having to parse dotted notation.

These modules are tiny, so they shouldn't impact startup
too much.  Also, by explicitly importing them, we avoid
the pitfall of having a sucessful startup and a broken
renderer.
2018-11-07 10:44:48 -08:00
Steve Howell c8a2081526 bugdown: Break out helper functions for extending bugdown.
These will make profiling a lot easier, and you
can also quickly disable features.  The overhead
of these function calls is dwarfed by other concerns.
2018-11-07 10:44:47 -08:00
Steve Howell ffa4daf936 bugdown: Reduce overhead of building link regexes.
We were building the same link regex every time
we build a Markdown engine, which happens twice
per realm.  It's an expensive operation due to
the complexity of the regex and us reading a file.
2018-11-07 10:33:11 -08:00
Steve Howell 18a76c54de bugdown: Extract build_engine.
This separates out the main job of building
an instance of Markdown from the fairly orthogonal
task of maintaining a list of engines.
2018-11-07 10:33:11 -08:00
Steve Howell dfadbcd3bc bugdown: Avoid ORM when there are no group names.
This change avoids hitting the Django ORM when
we don't find any possible group mentions in
the message content.

Django doesn't necessarily actually hit the database,
but it's still slow and shows up in profiles.
2018-11-07 10:33:11 -08:00
Steve Howell 35e9e5928f render: Upstream calculation of translate_emoticons. 2018-11-07 10:20:14 -08:00
Steve Howell 82b808e620 bugdown: Avoid zephyr-related queries in rendering.
We can rely on `message_realm` being the same
as `message.sender.realm`, which allows us to
skip two queries to the database for the rare
Zephyr mirroring case.
2018-11-07 10:11:06 -08:00
Steve Howell 659c9dde00 bugdown: Avoid unnecessary realm queries.
We now keep realm in the arguments variable,
which avoids some lookups.

We also test settings before even trying to
get realms.
2018-11-07 10:08:46 -08:00
Steve Howell 32232377f7 Rename bugdown.subject_links -> topic_links. 2018-11-07 10:03:53 -08:00
Aditya Bansal 86eeae6faa bugdown: Rename use_thumbnails to already_thumbnailed for clarity. 2018-10-26 16:51:54 -07:00
Aditya Bansal 5d68bd92ad thumbnails: Extract user_uploads_or_external as a function.
This is a prepartory commit for the upcoming changes. It was meaningful
to extract this one out because this function is essentially a condition
check on whether a given url is one of the user_uploads or an external
one. Based on its value we decide whether a url must be thumbnailed or
not and thus this function will also be used in an upcoming commit
patching lib/thumbnail.py to do the same check before thumbnail url
generation.
2018-10-16 16:00:47 -07:00