zulip/zerver/lib/url_preview/oembed.py

from typing import Optional, Dict, Any
from pyoembed import oEmbed, PyOembedException
import json

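def get_oembed_data(url: str,
                    maxwidth: Optional[int] = 640,
                    maxheight: Optional[int] = 480) -> Optional[Dict[str, Any]]:
    """Fetch oEmbed metadata for `url` and reduce it to a preview dict.

    Returns None if pyoembed raises PyOembedException or the provider's
    response is not valid JSON. Photo results, and video results that carry
    both embeddable HTML and a thumbnail, are returned with oembed=True;
    anything else yields only the generic type, title, and description.
    """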
    try:
        data = oEmbed(url, maxwidth=maxwidth, maxheight=maxheight)
    except (PyOembedException, json.decoder.JSONDecodeError):
        return None

    oembed_resource_type = data.get('type', '')
    image = data.get('url', data.get('image'))
    thumbnail = data.get('thumbnail_url')
    html = data.pop('html', '')
    if oembed_resource_type == 'photo' and image:
        return dict(
            oembed=True,
            image=image,
            type=oembed_resource_type,
            title=data.get('title'),
            description=data.get('description'),
        )

    if oembed_resource_type == 'video' and html and thumbnail:
        return dict(
            oembed=True,
            image=thumbnail,
            type=oembed_resource_type,
            html=strip_cdata(html),
            title=data.get('title'),
            description=data.get('description'),
        )

    # Otherwise, return only the generic metadata, without the oembed flag.
    return dict(
        type=oembed_resource_type,
        title=data.get('title'),
        description=data.get('description'),
    )

def strip_cdata(html: str) -> str:
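    # Work around a bug in SoundCloud's XML generation, which escapes the
    # CDATA markers themselves:
    # <html>&lt;![CDATA[&lt;iframe ...&gt;&lt;/iframe&gt;]]&gt;</html>
    # After parsing, the `html` value still carries a literal CDATA wrapper.
    prefix, suffix = '<![CDATA[', ']]>'
    if html.startswith(prefix) and html.endswith(suffix):
        html = html[len(prefix):-len(suffix)]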
    return html
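`get_oembed_data` calls `pyoembed.oEmbed`, which contacts the provider over the network, so its branching is awkward to try in isolation. The sketch below is a hypothetical helper, `summarize_oembed`, that reproduces the post-fetch logic but takes an already-decoded oEmbed response as a plain dict. The helper name and the sample URLs are illustrative assumptions, not part of Zulip. It exercises the three outcomes: a photo, a video with both `html` and `thumbnail_url`, and everything else.

```python
from typing import Any, Dict


def strip_cdata(html: str) -> str:
    # Same behavior as strip_cdata in oembed.py: drop a literal CDATA wrapper.
    prefix, suffix = '<![CDATA[', ']]>'
    if html.startswith(prefix) and html.endswith(suffix):
        html = html[len(prefix):-len(suffix)]
    return html


def summarize_oembed(data: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper mirroring get_oembed_data after a successful fetch.

    `data` stands in for the decoded oEmbed response; nothing is fetched.
    """
    data = dict(data)  # copy so popping 'html' does not mutate the caller's dict
    resource_type = data.get('type', '')
    image = data.get('url', data.get('image'))
    thumbnail = data.get('thumbnail_url')
    html = data.pop('html', '')
    if resource_type == 'photo' and image:
        return dict(oembed=True, image=image, type=resource_type,
                    title=data.get('title'),
                    description=data.get('description'))
    if resource_type == 'video' and html and thumbnail:
        return dict(oembed=True, image=thumbnail, type=resource_type,
                    html=strip_cdata(html),
                    title=data.get('title'),
                    description=data.get('description'))
    # Anything else keeps only the generic fields and has no 'oembed' key.
    return dict(type=resource_type, title=data.get('title'),
                description=data.get('description'))


photo = summarize_oembed({'type': 'photo',
                          'url': 'https://example.com/cat.png',
                          'title': 'Cat'})
video = summarize_oembed({
    'type': 'video',
    'thumbnail_url': 'https://example.com/t.jpg',
    'html': '<![CDATA[<iframe src="https://example.com/v"></iframe>]]>',
})
# A video without a thumbnail falls through to the generic branch.
bare = summarize_oembed({'type': 'video', 'html': '<iframe></iframe>'})

print(photo['oembed'], photo['image'])  # True https://example.com/cat.png
print(video['html'])                    # <iframe src="https://example.com/v"></iframe>
print('oembed' in bare, bare['type'])   # False video
```

Unlike the module code, the sketch copies its input before popping `html`, so the dict passed in is left unchanged.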
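The SoundCloud workaround in `strip_cdata` is easier to see with a concrete parse. The sketch below assumes the XML response is parsed with the standard library's `xml.etree.ElementTree` (pyoembed's own XML handling may differ), and both response bodies are made-up examples. When the CDATA markers are entity-escaped, parsing leaves a literal `<![CDATA[...]]>` wrapper in the `html` text, which `strip_cdata` removes; a well-formed CDATA section needs no fix.

```python
import xml.etree.ElementTree as ET


def strip_cdata(html: str) -> str:
    # Same behavior as strip_cdata in oembed.py: drop a literal CDATA wrapper.
    prefix, suffix = '<![CDATA[', ']]>'
    if html.startswith(prefix) and html.endswith(suffix):
        html = html[len(prefix):-len(suffix)]
    return html


# Hypothetical provider responses. In `broken`, the CDATA markers are
# themselves escaped, as described in the SoundCloud comment; `proper`
# uses a real CDATA section.
broken = ('<oembed><html>&lt;![CDATA[&lt;iframe src="x"&gt;'
          '&lt;/iframe&gt;]]&gt;</html></oembed>')
proper = '<oembed><html><![CDATA[<iframe src="x"></iframe>]]></html></oembed>'

broken_html = ET.fromstring(broken).findtext('html')
proper_html = ET.fromstring(proper).findtext('html')

print(broken_html)               # <![CDATA[<iframe src="x"></iframe>]]>
print(proper_html)               # <iframe src="x"></iframe>
print(strip_cdata(broken_html))  # <iframe src="x"></iframe>
```

Because `strip_cdata` only acts when both markers are present, it is a no-op on providers that emit correct XML or JSON.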