Commit Graph

140 Commits

Author SHA1 Message Date
Alex Vandiver 8c8dbb3d66 markdown: Stop attempting to expand/collapse re2 regex.
549dd8a4c4 changed the regex that we build to contain whitespace for
readability, and strip that back out before returning it.
Unfortunately, this also serves to strip out whitespace in the source
linkifier, causing it to not match expected strings.

Revert 549dd8a4c4.

Fixes: #27854.
2023-11-28 15:07:23 -08:00
Alex Vandiver 70b20e9d2b markdown: Use \p{White_Space} equivalent for linkifier boundaries.
We do not use \p{White_Space} itself because re2 does not support it.
2023-11-14 20:43:39 -08:00
roanster007 dc492867af user_mention: Fix mentions of deactivated users.
Previously, when a deactivated user was mentioned, he wasn't
rendered as a Pill. This is because the dataset for validating mentions
only included active users, which is fixed by removing that filter.

To allow only silent mentions of them, an extra is_active property
added to FullNameInfo class, which is populated from the query,
which tells if user is deactivated. This is used to convert any
mentions of them to silent mentions in the backend markdown.

Fixes #26857
2023-11-08 09:48:31 -08:00
Sahil Batra e458b73a01 user_groups: Move constants for system group names to a new class.
This commit moves constants for system group names to a new
"SystemGroups" class so that we can use these group names
in multiple classes in models.py without worrying about the
order of defining them.
2023-11-01 10:42:56 -07:00
Anders Kaseorg a50eb2e809 mypy: Enable new error explicit-override.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-10-12 12:28:41 -07:00
Anders Kaseorg 7b4a74cc4d codespell: Fix typos caught by codespell.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-10-09 11:55:15 -07:00
Adrián Oliva 732ad89f3d markdown: Fix URL link topic skipping query.
When searching for links inside a topic name, the question mark (?)
was used to split the topic. If a URL had a query after the URL
(e.g., "?foo=bar"), then the query was trimmed from the URL.

Removing the question mark from `basic_link_splitter` is sufficient
to fix this issue. The `get_web_link_regex` function then removes
the trailing punctuation if any, including literal question marks.

Fixes #26368.
2023-09-08 16:17:11 -07:00
evykassirer 0289beb784 emoji: Match emoji sequences in markdown.
Fixes #11767.

Previously multi-character emoji sequences weren't matched in the
emoji regex, so we'd convert the characters to separate images,
breaking the intended display.

This change allows us to match the full emoji sequence, and
therefore show the correct image.
2023-08-23 16:18:15 -07:00
Sahil Batra 4f30447b95 test_markdown: Set realm for Message objects.
We do not set realm to Message objects defined for markdown tests
and this works because we currently access realm from sender object.

This commit changes the code to set realm in Message objects as
we would be accessing realm from Message object directly in further
commits.
2023-08-23 11:38:32 -07:00
Zixuan James Li 37660dd0e7 linkifier: Support reordering linkifiers.
This adds API support to reorder linkifiers and makes sure that the
returned lists of linkifiers from `GET /events`, `POST /register`, and
`GET /realm/linkifiers` are always sorted with the order that they
should processed when rendering linkifiers.

We set the new `order` field to the ID with the migration. This
preserves the order of the existing linkifiers.

New linkifiers added will always be ordered the last. When reordering,
the `order` field of all linkifiers in the same realm is updated, in
a manner similar to how we implement ordering for
`custom_profile_fields`.
2023-08-14 15:21:48 -07:00
Zixuan James Li 011b4c1f7a populate_db: Populate linkifiers.
The curl examples of reordering linkifiers require there to be some
linkifiers in the database to be reordered. This adjusts some test cases
so they do not assume that there is no linkifier in the test db.
2023-08-14 15:21:48 -07:00
Steve Howell 51db22c86c per-request caches: Add per_request_cache library.
We have historically cached two types of values
on a per-request basis inside of memory:

    * linkifiers
    * display recipients

Both of these caches were hand-written, and they
both actually cache values that are also in memcached,
so the per-request cache essentially only saves us
from a few memcached hits.

I think the linkifier per-request cache is a necessary
evil. It's an important part of message rendering, and
it's not super easy to structure the code to just get
a single value up front and pass it down the stack.

I'm not so sure we even need the display recipient
per-request cache any more, as we are generally pretty
smart now about hydrating recipient data in terms of
how the code is organized. But I haven't done thorough
research on that hypotheseis.

Fortunately, it's not rocket science to just write
a glorified memoize decorator and tie it into key
places in the code:

    * middleware
    * tests (e.g. asserting db counts)
    * queue processors

That's what I did in this commit.

This commit definitely reduces the amount of code
to maintain. I think it also gets us closer to
possibly phasing out this whole technique, but that
effort is beyond the scope of this PR. We could
add some instrumentation to the decorator to see
how often we get a non-trivial number of saved
round trips to memcached.

Note that when we flush linkifiers, we just use
a big hammer and flush the entire per-request
cache for linkifiers, since there is only ever
one realm in the cache.
2023-08-11 11:09:34 -07:00
Steve Howell 730ae61ce5 tests: Improve linkifiers test.
We test at a higher level now.
2023-08-11 11:09:34 -07:00
Anders Kaseorg 562a79ab76 ruff: Fix PERF401 Use a list comprehension to create a transformed list.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-08-07 17:23:55 -07:00
Anders Kaseorg e932e2ce52 ruff: Fix UP032 Use f-string instead of `format` call.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-08-02 15:58:55 -07:00
Lauryn Menard 1cccdd8103 realm-settings: Make default_code_block_language empty string as default.
Updates the realm field `default_code_block_language` to have a default
value of an empty string instead of None. Also updates the web-app to
check for the empty string and not `null` to indicate no default is set.

This means that both new realms and existing realms that have no default
set will have the same value for this setting: an empty string.

Previously, new realms would have None if no default was set, while realms
that had set and then unset a value for this field would have an empty
string when no default was set.
2023-07-21 18:54:02 +02:00
Prakhar Pratyush 0891f9f65a mention: Determine @topic mention during message rendering.
This commit adds a boolean field `mentions_topic_wildcard`
to the `MessageRenderingResult` dataclass.

The field is set to true only if message rendering determines
the message has an actual topic wildcard mention in it (and not,
e.g., topic wildcard mention syntax inside a code block).

The rendered content for topic wildcard mention is
'<span class="topic-mention">{wildcard}</span>'.

The 'topic-mention' class is the identifier for the wildcard
mention being a topic wildcard mention.

We don't use 'data-user-id="*"' and "user-mention" class for
topic wildcard mentions and eventually plan to remove them for
stream wildcard mentions too in a separate mini-project.
2023-07-13 11:34:48 -07:00
Prakhar Pratyush 806d8f2dc7 test_markdown: Merge similar tests into a single test case.
This prep commit merges separate tests for '**@all**',
'**@stream**' and '**@everyone**' stream wildcard mentions
into a single test named 'test_mention_stream_wildcard'.

Similarly, it merges separate tests for '@all', '@stream',
and '@everyone' stream wildcard mentions into a single test
named 'test_mention_at_stream_wildcard'.

The aim is to finally have two separate tests for stream and
topic wildcard mentions (when we introduce topic wildcards)
instead of having separate tests for each mention text
(i.e. all, everyone, stream, topic).
2023-07-13 11:34:48 -07:00
Prakhar Pratyush 1df63ed448 mention: Add 'has_topic_wildcards' to 'MentionData'.
This commit adds a 'has_topic_wildcards' instance variable
to the 'MentionData' class for the detection of
- possible topic wildcards mentions.

Fixes part of #22829.

Co-authored-by: Prakhar Pratyush <prakhar841301@gmail.com>
Co-authored-by: orientor <aditya.verma@students.iiit.ac.in>
2023-07-13 11:34:48 -07:00
Prakhar Pratyush 2b42df4ef1 mention: Replace 'wildcard' with 'stream_wildcard'.
This is a prep commit to replace 'wildcard' with 'stream_wildcard'.

This wasn't included in 179d5cb because we didn't decide to
use a different rendered_content for topic wildcard mention,
i.e., ''<span class="user-mention topic-mention">{wildcard}</span>'.

Our intention was not to create separate tests for both stream
and topic wildcard mentions, as they were expected to have the
same rendered content format.
2023-07-13 11:34:48 -07:00
Lauryn Menard d84fd73db4 markdown-processor: Update insertion_index check for multiple classes.
Updates find_proper_insertion_index to check for the inline image
classes as matching at least one of the classes in the element's
attrib["class"] so that cases where an inline preview image has
multiple classes, like YouTube video previews, will have the
correct insertion index.

Fixes #26186.
2023-07-07 11:07:45 -04:00
Alex Vandiver ff53ee8e28 markdown: Only attempt to adjust /wiki/File: paths on Wikipedia. 2023-07-06 17:50:25 -07:00
Prakhar Pratyush 179d5cb37d mention: Replace 'wildcards' with 'stream_wildcards'.
This prep commit replaces the 'wildcard' keyword in the codebase
with 'stream_wildcard' at some places for better readability, as
we plan to introduce 'topic_wildcards' as a part of the
'@topic mention' project.

Currently, 'wildcards = ["all", "everyone", "stream"]' which is an
alias to mention everyone in the stream, hence better renamed as
'stream_wildcards'.

Eventually, we will have:
'stream_wildcard' as an alias to mention everyone in the stream.
'topic_wildcard' as an alias to mention everyone in the topic.
'wildcard' refers to 'stream_wildcard' and 'topic_wildcard' as a whole.
2023-07-03 22:03:17 -07:00
Alex Vandiver 76d7a5a53a dev_settings: Remove `THUMBNAIL_IMAGES` from test_extra_settings.
THUMBNAIL_IMAGES was previously set to true as there were tests on a new
thumbnail functionality. The feature was never stable enough to remain in
the codebase and the setting was left enabled. This setting also doesn't
reflect how the production deployments are and it has been decided that we
should drop setting from test_extra_settings altogether.

Co-authored-by: Joseph Ho <josephho678@gmail.com>
2023-06-12 16:26:55 -07:00
Prakhar Pratyush 79e5d32ef6 mention: Refactor 'possible_mentions' to return a dataclass.
This prep commit refactors 'possible_mentions' to
return a dataclass instead of a tuple for better readability.
2023-06-07 16:55:31 -07:00
Tim Abbott dce4a3c98e markdown: Remove most of Twitter integration.
Twitter removed their v1 API. We take care to keep the existing cached
results around for now, and to not poison that cache, since we might
be able replace this with something that can still use the existing
cache.
2023-05-29 10:43:35 -07:00
Zixuan James Li 268f858f39 linkifier: Support URL templates for linkifiers.
This swaps out url_format_string from all of our APIs and replaces it
with url_template. Note that the documentation changes in the following
commits  will be squashed with this commit.

We change the "url_format" key to "url_template" for the
realm_linkifiers events in event_schema, along with updating
LinkifierDict. "url_template" is the name chosen to normalize
mixed usages of "url_format_string" and "url_format" throughout
the backend.

The markdown processor is updated to stop handling the format string
interpolation and delegate the task template expansion to the uri_template
library instead.

This change affects many test cases. We mostly just replace "%(name)s"
with "{name}", "url_format_string" with "url_template" to make sure that
they still pass. There are some test cases dedicated for testing "%"
escaping, which aren't relevant anymore and are subject to removal.
But for now we keep most of them as-is, and make sure that "%" is always
escaped since we do not use it for variable substitution any more.

Since url_format_string is not populated anymore, a migration is created
to remove this field entirely, and make url_template non-nullable since
we will always populate it. Note that it is possible to have
url_template being null after migration 0422 and before 0424, but
in practice, url_template will not be None after backfilling and the
backend now is always setting url_template.

With the removal of url_format_string, RealmFilter model will now be cleaned
with URL template checks, and the old checks for escapes are removed.

We also modified RealmFilter.clean to skip the validation when the
url_template is invalid. This avoids raising mulitple ValidationError's
when calling full_clean on a linkifier. But we might eventually want to
have a more centric approach to data validation instead of having
the same validation in both the clean method and the validator.

Fixes #23124.

Signed-off-by: Zixuan James Li <p359101898@gmail.com>
2023-04-19 12:20:49 -07:00
AcKindle3 b0ef8f0822 test: Replace occurences of `uri` with `url`.
In all the tests files, replaced all occurences of `uri` with `url`
appeared in comments, local variablles, function names and their callers.
2023-04-08 16:27:55 -07:00
Zixuan James Li e331c356e4 user_groups: Use check_add_user_group instead in test cases.
"check_add_user_group" is a safer helper function than
"create_user_group" to use when creating user_groups. It does
error handling and notify the client with the appropriate event.

Note that the populate_db command still uses "create_user_group"
because we do not need to enqueue events at that point.

Signed-off-by: Zixuan James Li <p359101898@gmail.com>
2023-03-27 09:05:00 -07:00
Zixuan James Li 0f5d6432a4 user_groups: Move create_user_group to zerver.actions.user_groups.
Since this function creates a new user group into the database,
it is more appropriate to have it not as a generic "lib" function
but as an "action".

Signed-off-by: Zixuan James Li <p359101898@gmail.com>
2023-03-27 09:05:00 -07:00
Anders Kaseorg 2d9b2a2a05 models: Remove type prefixes from __str__ values.
The Django convention is for __repr__ to include the type and __str__
to omit it.  In fact its default __repr__ implementation for models
automatically adds a type prefix to __str__, which has resulted in the
type being duplicated:

    >>> UserProfile.objects.first()
    <UserProfile: <UserProfile: emailgateway@zulip.com <Realm: zulipinternal 1>>>

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-03-08 22:56:55 -08:00
Anders Kaseorg cea1119423 node_tests: Move to web/tests.
This lets us simplify the long-ish ‘../../static/js’ paths, and will
remove the need for the ‘zrequire’ wrapper.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-02-23 16:04:17 -08:00
Anders Kaseorg 0a1904a6a7 markdown: Rewrite YouTube URL parser without regex spaghetti.
This also adds support for the new YouTube Shorts URLs.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-02-09 22:34:51 -08:00
Anders Kaseorg df001db1a9 black: Reformat with Black 23.
Black 23 enforces some slightly more specific rules about empty line
counts and redundant parenthesis removal, but the result is still
compatible with Black 22.

(This does not actually upgrade our Python environment to Black 23
yet.)

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-02-02 10:40:13 -08:00
Anders Kaseorg cb8c7f2a17 ruff: Fix UP032 Use f-string instead of `format` call.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-01-26 10:16:30 -08:00
Trident Pancake c6ea673cc9 markdown: Update max inline preview from 10 to 24.
The max inline preview limit was previously increased to 10 by #20789.
However, as issue #23624 shows, it's still causing confusion for users
when they include more than 10 links.

Bump this limit up to 24, which is a multiple of the 4 image preview
per line logic.
2023-01-18 14:58:00 -05:00
Anders Kaseorg 17300f196c ruff: Fix ISC003 Explicitly concatenated string.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-01-04 16:25:07 -08:00
Anders Kaseorg 2c5e114f8b ruff: Fix ISC001 Implicitly concatenated string literals on one line.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-01-04 16:25:07 -08:00
Zixuan James Li a3a0103d86 markdown: Calculate linkifier precedence in topics.
This uses the linkifier index among the list of linkifiers in the
replacement as the priority to order the replacement order for
patterns in the topic. This avoids having multiple overlapping matches
that each produce a link.

The linkifier with the lowest id will be prioritized when its pattern
overlaps with another. Linkifiers are prioritized over raw URLs.

Note that the same algorithm is used for local echoing and the
backend markdown processor.

Fixes #23715.

Signed-off-by: Zixuan James Li <p359101898@gmail.com>
2022-12-13 15:16:20 -08:00
Zixuan James Li 5f4d857d3c linkifier: Order linkifiers by id on query.
This explicitly enforces ordering on the linkifiers. This is useful when
there are overlapping linkifier patterns that matches the same text. In
our current linkifier implementation, this order affects how the
patterns are handled in the markdown processor, with the earlier ones
being prioritized.

Signed-off-by: Zixuan James Li <p359101898@gmail.com>
2022-12-13 15:16:20 -08:00
Zixuan James Li 4602c34108 markdown: Correctly retrieve indices for repeated matches.
The same pattern being matched multiple times in a topic cannot be
properly ordered using topic_name.find(match_text) and etc. when there
are multiple matches of the same pattern in the topic.

Signed-off-by: Zixuan James Li <p359101898@gmail.com>
2022-12-13 15:16:20 -08:00
Zixuan James Li b3aba796f1 user_groups: Track acting user for user group creation.
This is a prep-commit for populating RealmAuditLogs for changes made to
UserGroup.

Signed-off-by: Zixuan James Li <p359101898@gmail.com>
2022-12-13 14:58:58 -08:00
Anders Kaseorg 73c4da7974 ruff: Fix N818 exception name should be named with an Error suffix.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-11-17 16:52:00 -08:00
Anders Kaseorg 3d853caf16 ruff: Fix C417 Unnecessary `map` usage.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-11-03 12:10:15 -07:00
Lauryn Menard 98074951ef api-docs: Update examples of queue_id for uuid format. 2022-10-13 10:08:42 -07:00
Alex Vandiver 5d42a0cb00 linkifiers: Support %20 in URLs for topic links.
9381a3bd45 added support for linkifier pattern URLs containing
`%20`-style escapes, but only did so for the codepath which is used in
the message body -- topic links did not understand them.

Expand the support to include when they are substituted into topics.
2022-10-11 14:31:13 -07:00
Anders Kaseorg 1385a827c2 python: Clean up getattr, setattr, delattr calls with literal names.
These were useful as a transitional workaround to ignore type errors
that only show up with django-stubs, while avoiding errors about
unused type: ignore comments without django-stubs.  Now that the
django-stubs transition is complete, switch to type: ignore comments
so that mypy will tell us if they become unnecessary.  Many already
have.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-10-10 08:40:28 -07:00
Anders Kaseorg fcd81a8473 python: Replace avoidable uses of __special__ attributes.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-10-10 08:32:29 -07:00
Anders Kaseorg 4a61e36def CVE-2022-36048: Rewrite only specific local links to relative.
Due to mismatches between the URL parsers in Python and browsers, it
was possible to hoodwink rewrite_local_links_to_relative into
generating links that browsers would interpret as absolute.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-08-24 16:29:09 -07:00
Sahil Batra 9a94d2b762 user_groups: Add MODERATORS_GROUP_NAME constant.
We now use MODERATORS_GROUP_NAME instead of writing
the actual group name at multiple places, so that we
can have all the group names coded at one place only.
2022-08-11 04:38:36 -07:00