Commit Graph

28 Commits

Author SHA1 Message Date
Anders Kaseorg 08db41660a python: Avoid deprecated cgi module, removed in Python 3.13.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-10-22 10:05:01 -07:00
Anders Kaseorg 531b34cb4c ruff: Fix UP007 Use `X | Y` for type annotations.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-07-13 22:28:22 -07:00
Anders Kaseorg 7b1bb984b3 ruff: Fix RUF022 `__all__` is not sorted.
This is a preview rule, not yet enabled by default.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-03-01 09:30:04 -08:00
Anders Kaseorg 223b626256 python: Use urlsplit instead of urlparse.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-12-05 13:03:07 -08:00
Anders Kaseorg a50eb2e809 mypy: Enable new error explicit-override.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-10-12 12:28:41 -07:00
Anders Kaseorg da3cf5ea7a ruff: Fix RSE102 Unnecessary parentheses on raised exception.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-02-04 16:34:55 -08:00
Alex Vandiver 327ff9ea0f preview: Use a dataclass for the embed data.
This is significantly cleaner than passing around `Dict[str, Any]` all
of the time.
2022-04-15 14:48:12 -07:00
Alex Vandiver e53f9fad29 url_preview: Only return image URLs that validate as URLs. 2022-02-18 15:32:27 -08:00
Anders Kaseorg 4922632601 mypy: Add types-beautifulsoup4.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-01-23 23:39:40 -08:00
Anders Kaseorg 6e4c3e41dc python: Normalize quotes with Black.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-02-12 13:11:19 -08:00
Anders Kaseorg 11741543da python: Reformat with Black, except quotes.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-02-12 13:11:19 -08:00
Anders Kaseorg bf45f921a7 url_preview: Allow Beautiful Soup to get the charset from <meta>.
An HTML document sent without a charset in the Content-Type header
needs to be scanned for a charset in <meta> tags.  We need to pass
bytes instead of str to Beautiful Soup to allow it to do this.

Fixes #16843.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-12-15 11:30:57 -08:00
Alex Vandiver ad8943a64a url_preview: Only extract img tags with an `src`.
Some `<img>` tags do not have an SRC, if they are rewritten using JS
to have one later.  Attempting to access `first_image['src']` on these
will raise an exception, as they have no such attribute.

Only look for images which have a defined `src` attribute on them.  We
could instead check if `first_image.has_attr('src')`, but this seems
only likely to produce fewer valid images.
2020-08-18 14:26:21 -04:00
Anders Kaseorg 365fe0b3d5 python: Sort imports with isort.
Fixes #2665.

Regenerated by tabbott with `lint --fix` after a rebase and change in
parameters.

Note from tabbott: In a few cases, this converts technical debt in the
form of unsorted imports into different technical debt in the form of
our largest files having very long, ugly import sequences at the
start.  I expect this change will increase pressure for us to split
those files, which isn't a bad thing.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-11 16:45:32 -07:00
Tim Abbott 4901dc3795 url_preview: Fix parsing of open graph tags.
Our open graph parser logic sloppily mixed data obtained by parsing
open graph properties with trusted data set by our oembed parser.

We fix this by consistenly using our explicit whitelist of generic
properties (image, title, and description) in both places where we
interact with open graph properties.  The fixes are redundant with
each other, but doing both helps in making the intent of the code
clearer.

This issue fixed here was originally reported as an XSS vulnerability
in the upcoming Inline URL Previews feature found by Graham Bleaney
and Ibrahim Mohamed using Pysa.  The recent Oembed changes close that
vulnerability, but this change is still worth doing to make the
implementation do what it looks like it does.
2019-12-12 15:24:38 -08:00
Puneeth Chaganti d56b16b275 url preview: Ignore open graph tags without a content attribute. 2019-05-06 12:37:32 -07:00
Puneeth Chaganti d02eb99831 url preview: Return generic parser <p> text as str (not bs4 string). 2019-05-06 12:37:32 -07:00
Tim Abbott 4d03c15848 url_preview: Don't import beautifulsoup at import time.
This is a small performance optimization to Django startup, in line
with other recent commits.
2018-08-08 14:19:42 -07:00
Tim Abbott 3006b3f52f url_preview: Fix crash when description has no content.
There's several things we'll want to cleanup with this feature, but
for now we're content to just make this not crash.
2018-05-17 12:40:43 -07:00
Aditya Bansal 1f9244e060 zerver/lib: Change use of typing.Text to str. 2018-05-10 14:19:49 -07:00
rht 3f4bf2d22f zerver/lib: Use python 3 syntax for typing.
Extracted from a larger commit by tabbott because these changes will
not create significant merge conflicts.
2017-11-21 20:56:40 -08:00
rht e311842a1b zerver/lib: Remove inheritance from object. 2017-11-06 08:53:48 -08:00
neiljp (Neil Pilgrim) be856bad46 mypy: Reduce use of Any in zerver/lib/url_preview/ return types. 2017-11-04 16:18:27 -07:00
rht f43e54d352 zerver/lib: Remove absolute_import. 2017-09-27 10:00:39 -07:00
Mark Shannon c7c47fe11d Replace buggy NotImplemented with NotImplementedError(). 2017-05-23 20:33:35 -07:00
Robert Hönig 0917493588 mypy: Convert zerver/lib to use typing.Text. 2016-12-25 10:33:45 -08:00
Tim Abbott 6bb959ff4e url_preview: Fix BeautifulSoup DeprecationWarning. 2016-12-15 17:05:10 -08:00
Igor Tokarev c93f1d4eda Add oembed/Open Graph/Meta tags data retrieval from inline links.
This change adds support for displaying inline open graph previews for
links posted into Zulip.

It is designed to interact correctly with message editing.

This adds the new settings.INLINE_URL_EMBED_PREVIEW setting to control
whether this feature is enabled.

By default, this setting is currently disabled, so that we can burn it
in for a bit before it impacts users more broadly.

Eventually, we may want to make this manageable via a (set of?)
per-realm settings.  E.g. I can imagine a realm wanting to be able to
enable/disable it for certain URLs.
2016-12-07 17:40:18 -08:00