Commit Graph

42 Commits

Author SHA1 Message Date
Anders Kaseorg 11741543da python: Reformat with Black, except quotes.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-02-12 13:11:19 -08:00
akshatdalton 5f8a10124e url preview: Update Zulip User-Agent.
This commit updates the Zulip User-Agent to
'Mozilla/5.0 (compatible; ZulipURLPreview/{version}; +{external_host})'
as the older User-Agent was rendering Markdown YouTube titles as
'YouTube - YouTube'.

Fixes #16970.
2021-01-25 14:24:48 -08:00
Anders Kaseorg bf45f921a7 url_preview: Allow Beautiful Soup to get the charset from <meta>.
An HTML document sent without a charset in the Content-Type header
needs to be scanned for a charset in <meta> tags.  We need to pass
bytes instead of str to Beautiful Soup to allow it to do this.

Fixes #16843.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-12-15 11:30:57 -08:00
Alex Vandiver ad8943a64a url_preview: Only extract img tags with an `src`.
Some `<img>` tags do not have an SRC, if they are rewritten using JS
to have one later.  Attempting to access `first_image['src']` on these
will raise an exception, as they have no such attribute.

Only look for images which have a defined `src` attribute on them.  We
could instead check if `first_image.has_attr('src')`, but this seems
only likely to produce fewer valid images.
2020-08-18 14:26:21 -04:00
Anders Kaseorg 69c0959f34 python: Fix misuse of Optional types for optional parameters.
There seems to have been a confusion between two different uses of the
word “optional”:

• An optional parameter may be omitted and replaced with a default
  value.
• An Optional type has None as a possible value.

Sometimes an optional parameter has a default value of None, or None
is otherwise a meaningful value to provide, in which case it makes
sense for the optional parameter to have an Optional type.  But in
other cases, optional parameters should not have Optional type.  Fix
them.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-13 15:31:27 -07:00
Anders Kaseorg 365fe0b3d5 python: Sort imports with isort.
Fixes #2665.

Regenerated by tabbott with `lint --fix` after a rebase and change in
parameters.

Note from tabbott: In a few cases, this converts technical debt in the
form of unsorted imports into different technical debt in the form of
our largest files having very long, ugly import sequences at the
start.  I expect this change will increase pressure for us to split
those files, which isn't a bad thing.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-11 16:45:32 -07:00
Graham Bleaney 461d5b1a3e pysa: Introduce sanitizers, models, and inline marking safe.
This commit adds three `.pysa` model files: `false_positives.pysa`
for ruling out false positive flows with `Sanitize` annotations,
`req_lib.pysa` for educating pysa about Zulip's `REQ()` pattern for
extracting user input, and `redirects.pysa` for capturing the risk
of open redirects within Zulip code. Additionally, this commit
introduces `mark_sanitized`, an identity function which can be used
to selectively clear taint in cases where `Sanitize` models will not
work. This commit also puts `mark_sanitized` to work removing known
false postive flows.
2020-06-11 12:57:49 -07:00
Puneeth Chaganti 2a65be2bf5 url preview: Use Chrome's user agent instead of a Zulip one.
Some sites don't render correctly unless you are one of the latest browsers.
YouTube Music, for instance, changes the page title to "Your browser is
deprecated, please upgrade.", which makes our URL previews look bad.
2020-04-26 10:16:43 -07:00
Mateusz Mandera 770086f983 url_preview: Discard url in oembed if server returns invalid json.
This fixes the scenario where we'd get errors in the
FetchLinksEmbedData queue processor if oembed got invalid json from the
URL.
2020-04-11 11:54:54 -07:00
Tim Abbott 4901dc3795 url_preview: Fix parsing of open graph tags.
Our open graph parser logic sloppily mixed data obtained by parsing
open graph properties with trusted data set by our oembed parser.

We fix this by consistenly using our explicit whitelist of generic
properties (image, title, and description) in both places where we
interact with open graph properties.  The fixes are redundant with
each other, but doing both helps in making the intent of the code
clearer.

This issue fixed here was originally reported as an XSS vulnerability
in the upcoming Inline URL Previews feature found by Graham Bleaney
and Ibrahim Mohamed using Pysa.  The recent Oembed changes close that
vulnerability, but this change is still worth doing to make the
implementation do what it looks like it does.
2019-12-12 15:24:38 -08:00
Anders Kaseorg faa3ea0b8e oembed: Remove unsound HTML filtering.
The frontend now takes care of confining the HTML.

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2019-12-12 15:24:38 -08:00
Tim Abbott 9f223bb7c2 url_preview: Simplify path to oembed code. 2019-12-12 13:34:49 -08:00
Puneeth Chaganti 64c40287f1 url preview: Rename type_ variable to oembed_resource_type. 2019-06-02 14:31:39 -07:00
Puneeth Chaganti 9aa5a2b369 url preview: Use oEmbed html for videos.
Ensure that the html is safe, before using it. The html is considered if it is
in an iframe with a http/https src, based on the recommendations here:
https://oembed.com/#section3

We directly embed the `iframe` html into the lightbox overlay.
2019-05-31 15:59:03 -07:00
Puneeth Chaganti c8cb785950 url preview: Show inline images as previews for oEmbed photo pages. 2019-05-31 15:59:03 -07:00
Puneeth Chaganti 22d0cd9696 url preview: Don't cache embed data when fetch has network errors. 2019-05-30 16:45:22 -07:00
Puneeth Chaganti 4ac9778d69 url preview: Catch network errors during get for page content.
We may be successfully able to get the page once, to get the content type, but
the server or network may go down and cause problems when fetching the page for
parsing its meta tags.
2019-05-13 13:55:00 -07:00
Puneeth Chaganti 9fd1c40bb1 url preview: Timeout requests after 15 seconds. 2019-05-13 13:54:59 -07:00
Puneeth Chaganti 0b76b16101 url preview: Set a custom user agent for requests.
Some sites seem to block the default user agent of the requests
library. Using a custom user agent lets us show previews for some of
these sites.
2019-05-13 13:54:43 -07:00
Puneeth Chaganti 59555ee7e5 url preview: Confirm content-type before trying to show previews.
Currently, we only show previews for URLs which are HTML pages, which could
contain other media. We don't show previews for links to non-HTML pages, like
pdf documents or audio/video files. To verify that the URL posted is an HTML
page, we verify the content-type of the page, either using server headers or by
sniffing the content.

Closes #8358
2019-05-13 13:45:17 -07:00
Puneeth Chaganti da33b72848 url preview: Use in-memory caching in dev environment. 2019-05-06 12:37:32 -07:00
Puneeth Chaganti 1f6306a5a7 url preview: Cleanup import ordering. 2019-05-06 12:37:32 -07:00
Puneeth Chaganti d56b16b275 url preview: Ignore open graph tags without a content attribute. 2019-05-06 12:37:32 -07:00
Puneeth Chaganti d02eb99831 url preview: Return generic parser <p> text as str (not bs4 string). 2019-05-06 12:37:32 -07:00
Anders Kaseorg 649235cfec python: Remove unused imports.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
2019-02-22 16:54:36 -08:00
Tim Abbott a4b294da98 url preview: Remove useless logging.error in open graph code path.
As detailed in the comment, someone pasting a broken URL isn't a
situation that a server administrator needs to be notified about.
2019-02-05 13:25:47 -08:00
Steve Howell 76deb30312 preview: Hash cache keys for preview urls.
We don't want really long urls to lead to truncated
keys, or we could theoretically have two different
urls get mixed up previews.

Also, this suppresses warnings about exceeding the
250 char limit.

Finally, this gives the key a proper prefix.
2018-10-14 09:28:57 -07:00
Tim Abbott 4d03c15848 url_preview: Don't import beautifulsoup at import time.
This is a small performance optimization to Django startup, in line
with other recent commits.
2018-08-08 14:19:42 -07:00
neiljp (Neil Pilgrim) e4821875f7 mypy: Improve typing of oembed data, to Dict[str, Any]. 2018-06-19 10:48:38 -07:00
Tim Abbott 3006b3f52f url_preview: Fix crash when description has no content.
There's several things we'll want to cleanup with this feature, but
for now we're content to just make this not crash.
2018-05-17 12:40:43 -07:00
Aditya Bansal 1f9244e060 zerver/lib: Change use of typing.Text to str. 2018-05-10 14:19:49 -07:00
rht 3f4bf2d22f zerver/lib: Use python 3 syntax for typing.
Extracted from a larger commit by tabbott because these changes will
not create significant merge conflicts.
2017-11-21 20:56:40 -08:00
neiljp (Neil Pilgrim) 1dcc981af8 mypy: Add explicit Any type parameters for embedded data Dicts. 2017-11-07 11:26:46 -08:00
rht e311842a1b zerver/lib: Remove inheritance from object. 2017-11-06 08:53:48 -08:00
neiljp (Neil Pilgrim) be856bad46 mypy: Reduce use of Any in zerver/lib/url_preview/ return types. 2017-11-04 16:18:27 -07:00
rht f43e54d352 zerver/lib: Remove absolute_import. 2017-09-27 10:00:39 -07:00
Aditya Bansal f32c1892ff preview.py: Fix error raised on uploading file with unicode filename. 2017-06-19 14:58:44 -04:00
Mark Shannon c7c47fe11d Replace buggy NotImplemented with NotImplementedError(). 2017-05-23 20:33:35 -07:00
Robert Hönig 0917493588 mypy: Convert zerver/lib to use typing.Text. 2016-12-25 10:33:45 -08:00
Tim Abbott 6bb959ff4e url_preview: Fix BeautifulSoup DeprecationWarning. 2016-12-15 17:05:10 -08:00
Igor Tokarev fae59502ab URL preview: Improve test coverage. 2016-12-13 10:43:02 -08:00
Igor Tokarev c93f1d4eda Add oembed/Open Graph/Meta tags data retrieval from inline links.
This change adds support for displaying inline open graph previews for
links posted into Zulip.

It is designed to interact correctly with message editing.

This adds the new settings.INLINE_URL_EMBED_PREVIEW setting to control
whether this feature is enabled.

By default, this setting is currently disabled, so that we can burn it
in for a bit before it impacts users more broadly.

Eventually, we may want to make this manageable via a (set of?)
per-realm settings.  E.g. I can imagine a realm wanting to be able to
enable/disable it for certain URLs.
2016-12-07 17:40:18 -08:00