Since we are building our parser from scratch now:
1. We have control over which proccessor goes at what priority number.
Thus, we have also shifted the deprecated `.add()` calls to use the
new `.register()` calls with explicit priorities, but maintaining
the original order that the old method generated.
2. We do not have to remove the processors added by py-markdown that
we do not use in Zulip; we explicitly add only the processors we
do require.
3. We can cluster the building of each type of parser in one place,
and in the order they need to be so that when we register them,
there is no need to sort the list. This also makes for a huge
improvement in the readability of the code, as all the components
of each type are registered in the same function.
These are significant performance improvements, because we save on
calls to `str.startswith` in `.add()`, all the resources taken to
generate the default to-be-removed processors and the time taken to
sort the list of processors.
Following are the profiling results for the changes made. Here, we
build 10 engines one after the other and note the time taken to build
each of them. 1st pass represents the state after this commit and 2nd
pass represent the state after some regex modifications in the commits
that follow by Steve Howell. All times are in microseconds.
| nth Engine | Old Time | 1st Pass | 2nd Pass |
| ---------- | -------- | -------- | -------- |
| 1 | 92117.0 | 81775.0 | 76710.0 |
| 2 | 1254.0 | 558.0 | 341.0 |
| 3 | 1170.0 | 472.0 | 305.0 |
| 4 | 1155.0 | 519.0 | 301.0 |
| 5 | 1170.0 | 546.0 | 326.0 |
| 6 | 1271.0 | 609.0 | 416.0 |
| 7 | 1125.0 | 459.0 | 299.0 |
| 8 | 1146.0 | 476.0 | 390.0 |
| 9 | 1274.0 | 446.0 | 301.0 |
| 10 | 1135.0 | 451.0 | 297.0 |
This is a major upgrade, and requires some significant compatibility
work:
* Migrating the pattern-removal logic to use the Registry feature.
* Handling the removal of positional arguments in markdown extensions.
* Handling the removal of safe mode.
Nested classes are kind of expensive in Python,
particularly when you throw in mypy annotations.
Also, flatter is arguably better, although it is
kind of a pain here not to have closures.
The changes that required us to fork this extension had been merged
into upstream CodeHilite, so we can remove it and switch to using the
version that comes with python-markdown.
This updates Bugdown to reflect the changes in the updated
markdown. In particular, we now pass a default config object in the
__init__ for the Bugdown extension, update the make_md_engine function
to take kwargs as opposed to a config list, and have UListProcessor
inherit from ulist as opposed to olist (which no longer works).
We update the (forked from upstream) fenced_code extension's
makeExtension to take args and kwargs, and update
FencedBlockPreprocessor __init__ method with updated Codehilite
arguments.
We update the (forked from upstream) Codehilite extension to
mirror the logic with the latest upstream Codehilite:
Add parse_hl_lines function
update makeExtension to take args and kwarfs instead of config
list
Add regex for highlight lines
use linenums instead of linenos
use get_formatter_by_name instead of HtmlFormatter
user get_lexer_by_name instead of TextLexer
add hl_lines and use_pygments arguments to the codehlite
constructor
Add a class 'BaseHandler' and make it a base class of OuterHandler,
QuoteHandler and CodeHandler. This will help annotate some functions
and improve type checking.
See #2357. We now support `~~~ .py ` with that trailing space.
Note that the test coverage is Python-side only due to
bugdown_matches_marked being set to false, since we don't yet
support language syntax on the client side.
(imported from commit ccd5fcb0eee01478d349161400103480678d7486)
Now we can nest fenced code/quote blocks inside of quote
blocks down to arbitrary depths. Code blocks are always leafs.
Fenced blocks start with at least three tildes or backticks,
and the clump of punctuation then becomes the terminator for
the block. If the user ends their message without terminators,
all blocks are automatically closed.
When inside a quote block, you can start another fenced block
with any header that doesn't match the end-string of the outer
block. (If you don't want to specify a language, then you
can change the number of backticks/tildes to avoid amiguity.)
Most of the heavy lifting happens in FencedBlockPreprocessor.run().
The parser works by pushing handlers on to a stack and popping
them off when the ends of blocks are encountered. Parents communicate
with their children by passing in a simple Python list of strings
for the child to append to. Handlers also maintain their own
lists for their own content, and when their done() method is called,
they render their data as needed.
The handlers are objects returned by functions, and the handler
functions close on variables push, pop, and processor. The closure
style here makes the handlers pretty tightly coupled to the outer
run() method. If we wanted to move to a class-based style, the
tradeoff would be that the class instances would have to marshall
push/pop/processor etc., but we could test the components more
easily in isolation.
Dealing with blank lines is very fiddly inside of bugdown.
The new functionality here is captured in the test
BugdownTest.test_complexly_nested_quote().
(imported from commit 53886c8de74bdf2bbd3cef8be9de25f05bddb93c)
Trac #1162
The process_fence method replaces code blocks with placeholders, so
indexes stored before the replacement are incorrect. However, because
the closed code blocks have been replaced, we can simply search the
whole string for any remaining opening code block markers.
(imported from commit 6a9e6924840f8f3ca5175da7c52a905e27c1fabd)
It is triggered by specifying the "language" of a code block to
"quote" or "quoted":
Hamlet said:
~~~ quote
To be or **not** to be.
That is the question
~~~
(imported from commit 847a0602e335e9f2955e32d9955adf8ac8de068c)
This needs to be deployed to both staging and prod at the same
off-peak time (and the schema migration run).
At the time it is deployed, we need to make a few changes directly in
the database:
(1) UPDATE django_content_type set app_label='zerver' where app_label='zephyr';
(2) UPDATE south_migrationhistory set app_name='zerver' where app_name='zephyr';
(imported from commit eb3fd719571740189514ef0b884738cb30df1320)