zulip

Commit Graph

Author	SHA1	Message	Date
Anders Kaseorg	1629d6bfb3	python: Reformat with Black 22 (stable). Signed-off-by: Anders Kaseorg <anders@zulip.com>	2022-02-18 18:03:13 -08:00
Anders Kaseorg	df304c40da	markdown: Use built-in hex formatting for unicode_emoji_to_codepoint. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2022-02-03 11:00:04 -08:00
Puneeth Chaganti	d55c137277	emoji: Add yellow_large_square and green_large_square emojis. Wordle has recently become a thing and it uses green, yellow and white (or black in dark mode) large square unicode characters to let people share their gameplay. Zulip converts the white and black large square unicode characters to emojis, but not the green and yellow ones. This causes the Wordle grid to be misaligned when shared on Zulip. This commit adds green and yellow large square emojis to our emoji list to fix the problem.	2022-02-02 16:26:31 -08:00
Puneeth Chaganti	6beb84b553	emoji: Use str.rjust to pad codepoint strings instead of a loop.	2022-02-02 16:26:30 -08:00
Puneeth Chaganti	0eeb74b3c2	emoji: Fix minor typo in unicode_emoji_to_codepoint comment.	2022-02-02 16:26:28 -08:00
Alex Vandiver	19f891968d	markdown: Increase the maximum number of image previews per message. The limit here is purely to prevent breakage in case of a pathological number of images in a single message; 5 images is entirely possible in a reasonable message, and causes user confusion when they are not expanded. Increase the limit to 10 per message.	2022-01-14 11:30:07 -08:00
Steve Howell	4adcaf92f7	refactor: Attach get_stream_name_map to MentionData. This diff looks slightly noisy, but the main chunk of code that we moved here has the same logic as before, and it just gets realm_id from MentionBackend now, instead of having our markdown processor have to supply it. We basically want MentionData to be the gatekeeper of mention data, and then we delegate backend tasks to MentionBackend. Soon we will add a cache to MentionBackend, which will justify this change a bit more.	2021-12-30 11:28:15 -08:00
Steve Howell	c6448263c3	refactor: Add MentionBackend. We will eventually use this to avoid redundant queries. The diff is slightly noisy here, but there are no logic changes.	2021-12-30 11:28:15 -08:00
Steve Howell	ea252ab53e	refactor: Convert FullNameInfo to a dataclass. As part of this we no longer query for email, which is a vestige of when we used emails to identify users on the frontend.	2021-12-30 11:28:15 -08:00
Steve Howell	f5fc348786	mypy: Add explicit types for dbdata references. When our handlers specifically reference self.md.zulip_db_data, we now use an explicit type. We probably want a more robust solution here, such as a semgrep rule.	2021-12-30 11:28:15 -08:00
Steve Howell	df84892aad	markdown: Convert DbData to a dataclass.	2021-12-30 11:28:15 -08:00
Steve Howell	4e551f8279	refactor: Introduce get_stream_name_map. We only need a name -> id map, and the FullNameInfo type was a lie.	2021-12-30 11:28:15 -08:00
Steve Howell	c04a8097f3	mypy: Add EmojiInfo type. We now serialize still_url as None for non-animated emojis, instead of omitting the field. The webapp does proper checks for falsiness here. The mobile app does not yet use the field (to my knowledge). We bump the API version here. More discussion here: https://chat.zulip.org/#narrow/stream/378-api-design/topic/still_url/near/1302573	2021-12-30 11:28:14 -08:00
Alex Vandiver	6a40c17ccf	markdown: CSS-escape preview links. This adds `soupsieve` as an explicit dependency, but intentionally does not adjust the provision version, as it was already an indirect dependency.	2021-10-26 18:17:23 -07:00
Alex Vandiver	52f74bbd9b	markdown: Run URL preview links through camo. Not proxying these requests through camo is a security concern. Furthermore, on the desktop client, any embed image which is hosted on a server with an expired or otherwise invalid certificate will trigger a blocking modal window with no clear source and a confusing error message; see zulip/zulip-desktop#1119. Rewrite all `message_embed_image` URLs through camo, if it is enabled.	2021-10-26 18:17:23 -07:00
Anders Kaseorg	58920affd4	python: Remove re.UNICODE flag (redundant in Python 3). https://docs.python.org/3/library/re.html#re.A Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-10-22 13:42:29 -07:00
Alex Vandiver	9381a3bd45	linkifiers: Support URL percent-encoded bytes. Supporting URL percent-encoded bytes is possible using `%%20`, but this is not necessarily very understandable to end-users, even those that understand percent encoding. Allow `%20` in linkifier URL format strings, and transform them into `%%20` in the pattern just before they are applied in markdown translation. Care must be taken here, such that already-escaped `%`s are not escaped an extra time. We do this before rendering, and not before storage, as a simplification; the JS-side linkifier at present only understands `%(foo)s` and thus needs no changes, and to avoid an un-escaping pass before showing in the admin UI.	2021-10-22 13:00:20 -07:00
Anders Kaseorg	4839b7ed27	url_preview: Interpret og:image relative to full page URL. og:image is supposed to be an absolute URL, but some sites incorrectly provide a relative URL. In this case, it makes more sense to interpret it relative to the full page URL after redirects, rather than relative to just the domain part of the page URL before redirects. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-10-21 12:20:37 -07:00
Alex Vandiver	db934be064	CVE-2021-41115: Use re2 for user-supplied linkifier patterns. Zulip attempts to validate that the regular expressions that admins enter for linkifiers are well-formatted, and only contain a specific subset of regex grammar. The process of checking these properties (via a regex!) can cause denial-of-service via backtracking. Furthermore, this validation itself does not prevent the creation of linkifiers which themselves cause denial-of-service when they are executed. As the validator accepts literally anything inside of a `(?P<word>...)` block, any quadratic backtracking expression can be hidden therein. Switch user-provided linkifier patterns to be matched in the Markdown processor by the `re2` library, which is guaranteed linear-time. This somewhat limits the possible features of the regular expression (notably, look-ahead and -behind, and back-references); however, these features had never been advertised as working in the context of linkifiers. A migration removes any existing linkifiers which would not function under re2, after printing them for posterity during the upgrade; they are unlikely to be common, and are impossible to fix automatically. The denial-of-service in the linkifier validator was discovered by @erik-krogh and @yoff, as GHSL-2021-118.	2021-10-04 21:26:24 +00:00
Tim Abbott	545911b051	markdown: Remove useless locless_schemes check. This check was copied from upstream python-markdown's "safe mode" before they removed that feature. The upstream history is that they introduced this check in `2db5d1c8e4`, which was not a complete security check, and then added the immediately following check (with an allowlist of schemes) in `0b4ffbb60e`. Their first, incomplete check provides no security benefit and makes the code hard to reason about, so we remove it.	2021-09-09 09:03:40 -07:00
rht	c24ab8c4d3	markdown: Expand list of safelisted URL schemes to match HTML spec.	2021-09-09 09:03:40 -07:00
Anders Kaseorg	66ad6a4583	docs: Inline code spans are not blocks. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-09-07 16:12:39 -07:00
Anders Kaseorg	646c04eff2	Rename default branch to ‘main’. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-09-06 12:56:35 -07:00
Alex Vandiver	4d428490fd	outgoing_http: Use OutgoingSession subclasses in more places. This adds the X-Smokescreen-Role header to proxy connections, to track usage from various codepaths, and enforces a timeout. Timeouts were kept consistent with their previous values, or set to 5s if they had none previously.	2021-09-01 05:34:13 -07:00
Priyansh Garg	1e51c23494	markdown: Remove unnecessary checks for zulip_message. This commit removes some unnecessary checks for `self.md.zulip_message`, which were put there historically, as earlier we used to add the additional properties like mentions_user_ids, alert_words, etc. to Message dict only. These were later moved to MessageRenderingResult class in commit `75cea329b` but the checks weren't removed. This is important because while rendering the messages imported from other chat tools (like Rocket.Chat), the Message dict is not passed to the markdown, due to which the checks for `self.md.zulip_message` fail and hence, things like user mentions, stream/topic mentions are not rendered in the imported messages properly.	2021-08-31 16:53:42 -07:00
Anders Kaseorg	4206e5f00b	python: Remove locally dead code. These changes are all independent of each other; I just didn’t feel like making dozens of commits for them. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-08-19 01:51:37 -07:00
Anders Kaseorg	806494da06	markdown: Stream and parse incrementally in fetch_open_graph_image. This way we can stop reading as soon as we get to the body. Also, send an Accept header, check that the request was actually successful, use lxml.etree.iterparse instead of a broken hand-rolled state machine, and support XHTML, all for negative 28 lines of code. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-08-05 09:17:32 -07:00
Priyansh Garg	0a875c1c4c	markdown: Fix jpeg extension in `IMAGE_EXTENSIONS`.	2021-08-05 08:54:02 -07:00
Anders Kaseorg	42fa62e563	Revert "time_widget: Make the generated time string more readable." This reverts commit `1965584eec`. This syntax has a bad interaction with table syntax and needs to be rethought. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-08-03 16:45:31 -07:00
Ganesh Pawar	1965584eec	time_widget: Make the generated time string more readable. Before: <time:2021-07-14T00:14:00-07:00> After: <time:2021-07-14|00:14:00|UTC-07:00> Fixes #19205	2021-08-02 23:17:01 -07:00
Anders Kaseorg	3665deb93a	python: Remove unnecessary intermediate lists. Generated automatically by pyupgrade. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-08-02 15:53:52 -07:00
Anders Kaseorg	162e9d6c0b	fenced_code: Optimize FENCE_RE to fix cubic worst-case complexity. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-07-22 16:40:44 -07:00
Anders Kaseorg	c56440ded0	requirements: Upgrade Python requirements. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-07-05 12:23:06 -07:00
Priyansh Garg	94a2be06f3	markdown: Use a shared variable for IMAGE_EXTENSION.	2021-07-02 11:22:55 -07:00
akshatdalton	44a298b671	minor: Use `OUTER_CAPTURE_GROUP` variable instead of string value.	2021-06-25 17:43:27 -07:00
akshatdalton	490f6b6880	markdown: Extract regex in local variables.	2021-06-25 17:43:01 -07:00
PIG208	75cea329b4	markdown: Refactor out additional properties added to Message. This adds a new class called MessageRenderingResult to contain the additional properties we added to the Message object (like alert_words) as well as the rendered content to ensure typesafe reference. No behavioral change is made except changes in typing. This is a preparatory change for adding django-stubs to the backend. Related: #18777	2021-06-24 18:14:53 -07:00
akshatdalton	c507931ac8	refactor: Export non-markdown logic in mention.py.	2021-06-14 13:26:30 -07:00
Wesley Aptekar-Cassels	d5ba94082a	markdown: Increase max rendered message length to 1MB. This should help with #17425, where messages with lots of LaTeX are lost, due to the large expansion factor. This isn't a total fix for this - large messages with lots of LaTeX can still end up larger than 1MB, and rendering could time out, but this fix should help significantly. 1MB is still small enough that I don't expect we'll run into any DOS problems - my testing didn't show any problems rendering messages that contain ~1MB of LaTeX.	2021-06-03 10:10:35 -07:00
akshatdalton	7df62ebbaf	settings: Make `MAX_MESSAGE_LENGTH` a server-level setting. This will allow users who are self-hosting to adjust this value. Moreover, this will help to reduce the overall time taken to test `test_markdown.py` (since this can be now overridden with `override_settings` Django decorator). This is done as a prep commit for #18641.	2021-06-03 09:26:28 -07:00
akshatdalton	832c763c38	minor: Remove unnecessary `__init__` method in `InlineInterestingLinkProcessor`. Subclass `Treeprocessor` takes care of the `__init__` method.	2021-05-26 17:13:03 -07:00
Anders Kaseorg	bac96cae80	markdown: Fix Dropbox image previews. ?dl=1 causes Dropbox to send Content-Type: application/binary, which can’t be interpreted by Camo. Use ?raw=1 instead. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-05-25 13:42:29 -07:00
akshatdalton	503247ebfa	refactor: Add class `CompiledInlineProcessor` to de-duplicate code.	2021-05-23 14:30:22 -07:00
akshatdalton	78f26b6031	minor: Use `super` to initialize subclass.	2021-05-23 14:30:22 -07:00
akshatdalton	18203d8af3	markdown: Silence user group mention inside blockquotes.	2021-05-18 17:31:25 -07:00
akshatdalton	0245b590e9	markdown: Add support for user group silent mention. Prior to this, we only supported direct mention to the user groups. This commit extends that support to silent mention for the user groups. A related test case is also added. Fixes: #11711.	2021-05-18 17:31:25 -07:00
akshatdalton	f56fca308a	mention: Refactor `USER_GROUP_MENTIONS_RE` and simplify its related code path. Earlier, USER_GROUP_MENTIONS_RE was: r"(?<![^\s\'\"\(,:<])@(\*[^\*]+\*)" For the syntax: @*foo*, this was unnecessarily capturing it as *foo* and the extraction of `foo` was done using another helper function: `extract_user_group`. This is now changed as: r"(?<![^\s\'\"\(,:<])@(\*(?P<match>[^\*]+)\*)" and extraction of `foo` can be done just by using the named capture group `match`. This change also helps to simplify its related code path.	2021-05-18 17:31:25 -07:00
akshatdalton	d5a36ac5e2	mention: Refactor `MENTIONS_RE` and simplify its related code path. Earlier, MENTIONS_RE was: r"(?<![^\s\'\"\(,:<])@(?P<silent>_?)(?P<match>\*\*[^\*]+\*\*)" For the syntax: @**foo**, this was unnecessarily capturing it as **foo** and adding extra operation for the extraction of `foo`. This is now changed as: r"(?<![^\s\'\"\(,:<])@(?P<silent>_?)(\*\*(?P<match>[^\*]+)\*\*)" and extraction of `foo` can be done just by using the named capture group `match`. This change also helps to simplify its related code path.	2021-05-18 17:31:25 -07:00
akshatdalton	a9d89b3c56	minor: Convert `unicode_emoji_regex` to uppercase. Following the convention, we use uppercase for regex. Also, `unicode_emoji_regex` is given a conventional name ending with `*_RE`: `UNICODE_EMOJI_RE`.	2021-05-18 17:31:25 -07:00
akshatdalton	ffc4724287	minor: Convert `emoticon_regex` to uppercase. Following the convention, we use uppercase for regex. Also, `emoticon_regex` is given a conventional name ending with `*_RE`: `EMOTICON_RE`.	2021-05-18 17:31:25 -07:00
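
The `MENTIONS_RE` refactor in d5a36ac5e2 moves the named group `match` inside the asterisks, so the captured name no longer needs trimming. A minimal Python sketch of that idea, assuming the user mention syntax is `@**foo**`; the two patterns are reconstructed here for illustration, and `extract_mention` is a hypothetical helper, not Zulip's code:

```python
import re

# Assumed "before" pattern: the named group includes the surrounding asterisks.
BEFORE = r"(?<![^\s\'\"\(,:<])@(?P<silent>_?)(?P<match>\*\*[^\*]+\*\*)"
# Assumed "after" pattern: the named group covers only the name itself.
AFTER = r"(?<![^\s\'\"\(,:<])@(?P<silent>_?)(\*\*(?P<match>[^\*]+)\*\*)"


def extract_mention(pattern, text):
    """Return (silent_marker, matched_name) for the first mention, or None."""
    m = re.search(pattern, text)
    if m is None:
        return None
    return m.group("silent"), m.group("match")


if __name__ == "__main__":
    print(extract_mention(BEFORE, "hi @**foo**"))   # name still wrapped in asterisks
    print(extract_mention(AFTER, "hi @**foo**"))    # name only
    print(extract_mention(AFTER, "hi @_**foo**"))   # silent mention
```

With the second pattern the caller reads `match` directly; the negative lookbehind still rejects a mention glued to a preceding word character, as in `a@**foo**`.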

124 Commits
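
The percent-escaping step described in 9381a3bd45 (turn `%20` into `%%20` just before the format string is applied, without escaping an already-escaped `%` a second time) can be sketched in Python as below. The helper name `escape_percent_bytes` and the regex are illustrative assumptions, not necessarily what Zulip uses:

```python
import re


def escape_percent_bytes(format_string):
    """Double the '%' of percent-encoded bytes such as %20.

    Runs of '%%' that are already escaped are kept as they are, and
    %(name)s placeholders are untouched because '(' is not a hex digit.
    """
    return re.sub(
        r"(?<!%)(%%)*%([a-fA-F0-9][a-fA-F0-9])",
        r"\1%%\2",
        format_string,
    )


if __name__ == "__main__":
    url_format = "https://example.com/a%20b/%(id)s"
    escaped = escape_percent_bytes(url_format)
    print(escaped)
    # After escaping, ordinary %-formatting yields the literal %20 again.
    print(escaped % {"id": "42"})
```

Escaping at render time rather than at storage time matches the commit's stated simplification: the stored string stays in the form the admin typed.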