This commit adds a boolean field `mentions_topic_wildcard`
to the `MessageRenderingResult` dataclass.
The field is set to true only if message rendering determines
the message has an actual topic wildcard mention in it (and not,
e.g., topic wildcard mention syntax inside a code block).
The rendered content for topic wildcard mention is
'<span class="topic-mention">{wildcard}</span>'.
The 'topic-mention' class is the identifier for the wildcard
mention being a topic wildcard mention.
We don't use 'data-user-id="*"' and "user-mention" class for
topic wildcard mentions and eventually plan to remove them for
stream wildcard mentions too in a separate mini-project.
This prep commit merges separate tests for '**@all**',
'**@stream**' and '**@everyone**' stream wildcard mentions
into a single test named 'test_mention_stream_wildcard'.
Similarly, it merges separate tests for '@all', '@stream',
and '@everyone' stream wildcard mentions into a single test
named 'test_mention_at_stream_wildcard'.
The aim is to finally have two separate tests for stream and
topic wildcard mentions (when we introduce topic wildcards)
instead of having separate tests for each mention text
(i.e. all, everyone, stream, topic).
This commit adds a 'has_topic_wildcards' instance variable
to the 'MentionData' class for the detection of
- possible topic wildcards mentions.
Fixes part of #22829.
Co-authored-by: Prakhar Pratyush <prakhar841301@gmail.com>
Co-authored-by: orientor <aditya.verma@students.iiit.ac.in>
This is a prep commit to replace 'wildcard' with 'stream_wildcard'.
This wasn't included in 179d5cb because we didn't decide to
use a different rendered_content for topic wildcard mention,
i.e., ''<span class="user-mention topic-mention">{wildcard}</span>'.
Our intention was not to create separate tests for both stream
and topic wildcard mentions, as they were expected to have the
same rendered content format.
Updates find_proper_insertion_index to check for the inline image
classes as matching at least one of the classes in the element's
attrib["class"] so that cases where an inline preview image has
multiple classes, like YouTube video previews, will have the
correct insertion index.
Fixes#26186.
This prep commit replaces the 'wildcard' keyword in the codebase
with 'stream_wildcard' at some places for better readability, as
we plan to introduce 'topic_wildcards' as a part of the
'@topic mention' project.
Currently, 'wildcards = ["all", "everyone", "stream"]' which is an
alias to mention everyone in the stream, hence better renamed as
'stream_wildcards'.
Eventually, we will have:
'stream_wildcard' as an alias to mention everyone in the stream.
'topic_wildcard' as an alias to mention everyone in the topic.
'wildcard' refers to 'stream_wildcard' and 'topic_wildcard' as a whole.
THUMBNAIL_IMAGES was previously set to true as there were tests on a new
thumbnail functionality. The feature was never stable enough to remain in
the codebase and the setting was left enabled. This setting also doesn't
reflect how the production deployments are and it has been decided that we
should drop setting from test_extra_settings altogether.
Co-authored-by: Joseph Ho <josephho678@gmail.com>
Twitter removed their v1 API. We take care to keep the existing cached
results around for now, and to not poison that cache, since we might
be able replace this with something that can still use the existing
cache.
This swaps out url_format_string from all of our APIs and replaces it
with url_template. Note that the documentation changes in the following
commits will be squashed with this commit.
We change the "url_format" key to "url_template" for the
realm_linkifiers events in event_schema, along with updating
LinkifierDict. "url_template" is the name chosen to normalize
mixed usages of "url_format_string" and "url_format" throughout
the backend.
The markdown processor is updated to stop handling the format string
interpolation and delegate the task template expansion to the uri_template
library instead.
This change affects many test cases. We mostly just replace "%(name)s"
with "{name}", "url_format_string" with "url_template" to make sure that
they still pass. There are some test cases dedicated for testing "%"
escaping, which aren't relevant anymore and are subject to removal.
But for now we keep most of them as-is, and make sure that "%" is always
escaped since we do not use it for variable substitution any more.
Since url_format_string is not populated anymore, a migration is created
to remove this field entirely, and make url_template non-nullable since
we will always populate it. Note that it is possible to have
url_template being null after migration 0422 and before 0424, but
in practice, url_template will not be None after backfilling and the
backend now is always setting url_template.
With the removal of url_format_string, RealmFilter model will now be cleaned
with URL template checks, and the old checks for escapes are removed.
We also modified RealmFilter.clean to skip the validation when the
url_template is invalid. This avoids raising mulitple ValidationError's
when calling full_clean on a linkifier. But we might eventually want to
have a more centric approach to data validation instead of having
the same validation in both the clean method and the validator.
Fixes#23124.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
"check_add_user_group" is a safer helper function than
"create_user_group" to use when creating user_groups. It does
error handling and notify the client with the appropriate event.
Note that the populate_db command still uses "create_user_group"
because we do not need to enqueue events at that point.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
Since this function creates a new user group into the database,
it is more appropriate to have it not as a generic "lib" function
but as an "action".
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
The Django convention is for __repr__ to include the type and __str__
to omit it. In fact its default __repr__ implementation for models
automatically adds a type prefix to __str__, which has resulted in the
type being duplicated:
>>> UserProfile.objects.first()
<UserProfile: <UserProfile: emailgateway@zulip.com <Realm: zulipinternal 1>>>
Signed-off-by: Anders Kaseorg <anders@zulip.com>
This lets us simplify the long-ish ‘../../static/js’ paths, and will
remove the need for the ‘zrequire’ wrapper.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Black 23 enforces some slightly more specific rules about empty line
counts and redundant parenthesis removal, but the result is still
compatible with Black 22.
(This does not actually upgrade our Python environment to Black 23
yet.)
Signed-off-by: Anders Kaseorg <anders@zulip.com>
The max inline preview limit was previously increased to 10 by #20789.
However, as issue #23624 shows, it's still causing confusion for users
when they include more than 10 links.
Bump this limit up to 24, which is a multiple of the 4 image preview
per line logic.
This uses the linkifier index among the list of linkifiers in the
replacement as the priority to order the replacement order for
patterns in the topic. This avoids having multiple overlapping matches
that each produce a link.
The linkifier with the lowest id will be prioritized when its pattern
overlaps with another. Linkifiers are prioritized over raw URLs.
Note that the same algorithm is used for local echoing and the
backend markdown processor.
Fixes#23715.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
This explicitly enforces ordering on the linkifiers. This is useful when
there are overlapping linkifier patterns that matches the same text. In
our current linkifier implementation, this order affects how the
patterns are handled in the markdown processor, with the earlier ones
being prioritized.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
The same pattern being matched multiple times in a topic cannot be
properly ordered using topic_name.find(match_text) and etc. when there
are multiple matches of the same pattern in the topic.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
9381a3bd45 added support for linkifier pattern URLs containing
`%20`-style escapes, but only did so for the codepath which is used in
the message body -- topic links did not understand them.
Expand the support to include when they are substituted into topics.
These were useful as a transitional workaround to ignore type errors
that only show up with django-stubs, while avoiding errors about
unused type: ignore comments without django-stubs. Now that the
django-stubs transition is complete, switch to type: ignore comments
so that mypy will tell us if they become unnecessary. Many already
have.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Due to mismatches between the URL parsers in Python and browsers, it
was possible to hoodwink rewrite_local_links_to_relative into
generating links that browsers would interpret as absolute.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
We now use MODERATORS_GROUP_NAME instead of writing
the actual group name at multiple places, so that we
can have all the group names coded at one place only.
Now the following characters are allowed before @-mentions and stream
references (starting with #) for proper rendering - {, [, /.
This commit makes the markdown rendering consistent with autocomplete
(anything that is autocompleted is also rendered properly).
This commit removes the instances of using "Stream.objects.create"
in tests with make_stream function. This change will help us to
avoid adding code for things to be done after creating streams in
multiple places. We can instead just add it in make_stream function
only.
The `get_link_embed_data` / `link_embed_data_from_cache` pair as
introduced in c93f1d4eda uses the cache
as a temporary store inside of the `embed_links` worker; this means
that it must be durable storage, or the worker will stall and re-fetch
the same links to preview them.
Switch to plumbing through the fetched URL embed data as an parameter
to the Markdown evaluation which uses them, rather than using the
cache as an intermediary. This frees up the cache to be merely a
non-durable cache.
As a side-effect, this removes get_cache_with_key, and
link_embed_data_from_cache which was its only callsite.
`prepare_linkifier_pattern`, as of db934be064, adds a match to the
end of the regex, of either the end of string, or a non-word character
-- this is in place of a negative look-ahead, which is no longer
possible in re2. This causes the regex to consume trailing
whitespace, and thus not be able to match twice in succession with
`pattern.finditer` -- "#1234#5678" fails to match because the space
is consumed by the first match of the regex.
Rather than use `pattern.finditer`, write own own version, which
rewinds over the non-word character consumed after the match, if any.
This allows the same "after" non-word character to also satisfy the
"before" of the next match.
Fixes#21502.
Various backend tests use the `PATCH /messages/{msg_id}` endpoint.
For that endpoint, the message ID is encoded in the URL path and
ignored if provided as a parameter in the the query.
Verified that the tests were providing the same message ID to both
the path and then removed the ignored parameter in the query.
This commit ensures that all markdown fixtures have unique
test names by rewriting the names of some of them and adding
a test in `test_markdown.py`.
Earlier this was over-writing the value for same keys in
`load_markdown_tests` in `test_markdown.py`.
Supporting URL percent-encoded bytes is possible using `%%20`, but this
is not necessarily very understandable to end-users, even those that
understand percent encoding.
Allow `%20` in linkifier URL format strings, and transform them into
`%%20` in the pattern just before they are applied in markdown
translation. Care must be taken here, such that already-escaped `%`s
are not escaped an extra time.
We do this before rendering, and not before storage, as
a simplification; the JS-side linkifier at present only understands
`%(foo)s` and thus needs no changes, and to avoid an un-escaping pass
before showing in the admin UI.
Zulip attempts to validate that the regular expressions that admins
enter for linkifiers are well-formatted, and only contain a specific
subset of regex grammar. The process of checking these
properties (via a regex!) can cause denial-of-service via
backtracking.
Furthermore, this validation itself does not prevent the creation of
linkifiers which themselves cause denial-of-service when they are
executed. As the validator accepts literally anything inside of a
`(?P<word>...)` block, any quadratic backtracking expression can be
hidden therein.
Switch user-provided linkifier patterns to be matched in the Markdown
processor by the `re2` library, which is guaranteed constant-time.
This somewhat limits the possible features of the regular
expression (notably, look-head and -behind, and back-references);
however, these features had never been advertised as working in the
context of linkifiers.
A migration removes any existing linkifiers which would not function
under re2, after printing them for posterity during the upgrade; they
are unlikely to be common, and are impossible to fix automatically.
The denial-of-service in the linkifier validator was discovered by
@erik-krogh and @yoff, as GHSL-2021-118.
We do not allow mentioning system user groups for now
because this can lead to circumventing the wildcard
mention restrictions. It will be enabled once we add
a setting to control that.
This is implemented by just ignoring it as one of the
mentioned user group even if the message content
inlcudes the mention syntax for it and the message
is sent normally.
We still keep the for_mention parameter for accessing
user group while sending email and push notifications
as mentioning system user groups will be allowed in
future.
This commit also removes the test for email notifications
for system user groups as we are not allowing mentioning
them.
This commit is only for backend change as we already
exclude the system groups from mention typeaheads and
other UI.
This adds a new class called MessageRenderingResult to contain the
additional properties we added to the Message object (like alert_words)
as well as the rendered content to ensure typesafe reference. No
behavioral change is made except changes in typing.
This is a preparatory change for adding django-stubs to the backend.
Related: #18777
This should help with #17425, where messages with lots of LaTeX are
lost, due to the large expansion factor.
This isn't a total fix for this - large messages with lots of LaTeX
can still end up larger than 1MB, and rendering could timeout, but
this fix should help significantly.
1MB is still small enough that I don't expect we'll run into any DOS
problems - my testing didn't show any problems rendering messages that
contain ~1MB of LaTeX.
This will offer users who are self-hosting to adjust
this value. Moreover, this will help to reduce the
overall time taken to test `test_markdown.py` (since
this can be now overridden with `override_settings`
Django decorator).
This is done as a prep commit for #18641.
?dl=1 causes Dropbox to send Content-Type: application/binary, which
can’t be interpreted by Camo. Use ?raw=1 instead.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Prior to this, we only supported direct mention to
the user groups. This commit extends that support
to silent mention for the user groups.
A related test case is also added.
Fixes: #11711.
A message containing wildcard mention when quoted (which
is turned into a silent mention) or message with silent
wildcard mention notifies the users by sending desktop,
sound, and missed message email notifications. This
is clearly a bug which is fixed by this commit.
Fixes: #18354.
Requesting external images is a privacy risk, so route all external
images through Camo.
Tweaked by tabbott for better test coverage, more comments, and to fix
bugs.
Changed the name of the test-user cordelia from `Cordelia Lear` to
`Cordelia, Lear's daughter`.
This change will enable us to test users with escape characters in
their names.
I also updated the Node, Puppeteer, Backend tests and Fixtures to
support this change.
This logic likely never ran due to a combination of bugs.
* Running `maybe_update_markdown_engines` unconditionally meant that
`if md_engine_key in md_engines` was likely always true.
* Introduced in 65838bb: DEFAULT_MARKDOWN_KEY could never be in
md_engines, so should we have ever reached that code path, we'd have
tried to rebuild all markdown engines every time.
And it also wasn't clearly helpful -- because we fetch all linkifiers
for a realm on every request anyway, we don't really save database
queries by doing a bulk fetch on startup, and doing so would likely
result in a material regression to Zulip's overall startup time that
we were creating markdown engines for large numbers of realms in bulk
during process startup.
This adds the is_user_active with the appropriate code for setting the
value correctly in the future. In the following commit a migration to
backfill the value for existing Subscriptions will be added.
To ensure correct user_profile.is_active handling also in tests, we
replace all direct .is_active mutation with calls to appropriate
functions.
Backend logic for handling user mention was cluttered
because it was handled at two stages first in
get_possible_mentions_info while fetching mention data
based on the messsage and then later in UserMentionPattern
which handles processing of text for mention.
Ideally UserMentionPattern should depend on
get_possible_mentions_info only for data but there was a
shared logic between these two that made it hard to debug
any possible bugs.
Updates in this commit make both of these functions
coherent in terms of logic and also add appropiate
comments to improve readability of these functions.
There was also a hidden bug that if a user A is
mentioned in with @**name|id** then @**invalid|id**
again mentioned A because of the way we handled mentions
earlier. It is solved as a result of this refactor and
appropiate test has been added for this.
This has been tested manually as well as by adding new
test to address missing case.
Extend our markdown system to support mentioning of users
by id also. Following these changes, it would be possible
to mention users with @**|user_id** and silently mention
using @_**|user_id**.
Main intention for extending the mention syntax is to make
it convenient for bots to mention a users using their ids. It
is to be noted that previous syntax are also supported.
Documentation tweaked by tabbott for better readability.
The changes were tested manually in development server, and also
by adding some new backend and frontend tests.
Fixes: #17487.
Modifies `StreamPattern` and `StreamTopicPattern` to inherit
from InlineProcessor instead of Pattern. This change is done
because Pattern stopped checking for matching patterns as soon
as it found a match which was not a valid stream. Due to this
all the subsequent mention failed, even if they were valid.
This bug was only present in backend renderring due to
markdown.inlinepatterns.Pattern.
Due to above changes verbose_compile is no longer used for
precompiling STREAM_LINK_REGEX, STREAM_TOPIC_LINK_REGEX as
adds ^(.*?) and (.*?)$ which cause extra overhead of matching
pattern which is not required. With new InlineProcessor these
extra patterns at beggining and end are not required.
So, StreamPattern and StreamTopicPattern now define their own
__init__ method for precompiling the regex.
Fixes#17535.
These changes were tested locally in dev server and by adding
some new markdown tests to test these.
Modifies `UserGroupMentionPattern` to inherit from InlineProcessor
instead of Pattern. This change is done because Pattern
stopped checking for matching patterns as soon as it found
a match which was not a valid user group. Due to this all
the subsequent user group mention failed, even if they were
valid. This bug was only present in backend renderring due to
markdown.inlinepatterns.Pattern.
This was reported as issue #17535.
These changes were tested locally in dev server and by adding
some new markdown tests to test these.
Modifies `UserMentionPattern` to inherit from InlineProcessor
instead of Pattern. This change is done because Pattern
stopped checking for matching patterns as soon as it found
a match which was not a valid user. Due to this all the
subsequent user mention failed. This bug was only present in
backend renderring due to markdown.inlinepatterns.Pattern.
This was reported as issue #17535.
These changes were tested locally in dev server and by adding
some new markdown tests to test these.
Initally, when writing two or more quotes, having
a blank line in between them, merges those quotes.
This created confusion especially in "quote and reply".
This commit fixes such issues. Now two or more quotes
having a blank line in between them, will not get merged.
This change is correct both for usability and for improving our
compatibility with CommonMark.
Fixes#14379.
Upstream has slightly changed the whitespace around stashes. Take
this opportunity to clean up the extra blank lines we were outputting.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
If multiple filters match the same string, we run into an infinite
loop of converting string into urls. To fix it, we mark the matched
string as atomic after first conversion.
When converting fenced code markdown, we add the language (if specified)
in a data-attribute by tweaking the HTML generated. Doing so, allows the
frontend to make use of this attr to display view-in-playground option
for codeblocks.
We use pygments to get the lexer subclass name and use that instead of
directly using the language in the data-attribute. Doing so, helps us
map different language aliases (like `js` and `javascript`) into a common
variable (like `JavaScript`) - and avoids the client from dealing with
multiple tags corresponding to the same language.
The html structure for a message like this:
``` js
..content..
```
would now be:
<div class="codehilite" data-codehilite-language="JavaScript">
<pre>..content..</pre>
</div>
Tests and fixtures amended.