# zulip/zerver/lib/url_preview/preview.py

import re
from typing import Any, Dict, Match, Optional
import requests
from zerver.lib.cache import cache_with_key, get_cache_with_key, preview_url_cache_key
from zerver.lib.url_preview.oembed import get_oembed_data
from zerver.lib.url_preview.parsers import OpenGraphParser, GenericParser
from django.utils.encoding import smart_text


CACHE_NAME = "database"
# Based on django.core.validators.URLValidator, with ftp support removed.
link_regex = re.compile(
    r'^(?:http)s?://'  # http:// or https://
    r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'  # domain...
    r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'  # ...or ip
    r'(?::\d+)?'  # optional port
    r'(?:/?|[/?]\S+)$', re.IGNORECASE)


def is_link(url: str) -> Optional[Match[str]]:
    # Returns a (truthy) match object for an http(s) URL, or None otherwise.
    return link_regex.match(smart_text(url))


@cache_with_key(preview_url_cache_key, cache_name=CACHE_NAME, with_statsd_key="urlpreview_data")
def get_link_embed_data(url: str,
                        maxwidth: Optional[int]=640,
                        maxheight: Optional[int]=480) -> Optional[Dict[str, Any]]:
    if not is_link(url):
        return None
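    # Fetch information about the URL from three sources, in this order:
    # 1. oEmbed
    # 2. Open Graph
    # 3. Generic meta tags
    # Open Graph values override oEmbed values for the same key; meta tags
    # only fill in a title, description or image that is still missing.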
    try:
        data = get_oembed_data(url, maxwidth=maxwidth, maxheight=maxheight)
    except requests.exceptions.RequestException:
        # This is what happens if the target URL cannot be fetched; in
        # that case, there's nothing to do here, and this URL has no
        # open graph data.
        return None
    data = data or {}
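    try:
        response = requests.get(url)
    except requests.exceptions.RequestException:
        # The page itself could not be fetched (e.g. a connection error);
        # return whatever oEmbed provided, as for a non-OK response.
        return data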
    if response.ok:
        og_data = OpenGraphParser(response.text).extract_data()
        if og_data:
            data.update(og_data)
        generic_data = GenericParser(response.text).extract_data() or {}
        for key in ['title', 'description', 'image']:
            if not data.get(key) and generic_data.get(key):
                data[key] = generic_data[key]
    return data
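

# Illustrative sketch (hypothetical helper, not part of Zulip and not called
# anywhere): the precedence rules of get_link_embed_data, without the network
# calls, so they can be exercised with plain dicts. Open Graph values override
# oEmbed values for the same key; generic meta-tag values only fill in a
# 'title', 'description' or 'image' that is still missing or empty.
from typing import Any, Dict, Optional  # repeated so the sketch stands alone


def _example_merge_preview_sources(oembed_data: Optional[Dict[str, Any]],
                                   og_data: Optional[Dict[str, Any]],
                                   generic_data: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    data = dict(oembed_data or {})  # copy, so the caller's dict is not mutated
    if og_data:
        data.update(og_data)  # Open Graph wins on conflicting keys
    fallback = generic_data or {}
    for key in ['title', 'description', 'image']:
        # Meta tags are only a fallback for missing or empty values.
        if not data.get(key) and fallback.get(key):
            data[key] = fallback[key]
    return data

# For example, merging {'title': 'oEmbed title', 'type': 'video'},
# {'title': 'OG title'} and {'title': 'meta title', 'description': 'meta text'}
# gives {'title': 'OG title', 'type': 'video', 'description': 'meta text'}.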


@get_cache_with_key(preview_url_cache_key, cache_name=CACHE_NAME)
def link_embed_data_from_cache(url: str, maxwidth: Optional[int]=640, maxheight: Optional[int]=480) -> Any:
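    # get_cache_with_key does the work: it returns the value that
    # get_link_embed_data cached under the same key, so this body is not
    # expected to run.
    return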
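

# Illustrative sketch (hypothetical names; not how zerver.lib.cache is
# implemented): an in-memory analogue of the cache_with_key /
# get_cache_with_key pairing above. The writer computes a value and stores it
# under a key derived from the arguments; the reader only looks that key up
# and raises on a miss, so the decorated function's body never runs. Values
# are wrapped in a 1-tuple so that a cached None (e.g. "not a link") can be
# told apart from a miss. Hashing the URL keeps keys well under a 250-character
# key limit however long the URL is, in the spirit of preview_url_cache_key.
import functools
import hashlib
from typing import Any, Callable, Dict, Tuple

_ExampleDecorator = Callable[[Callable[..., Any]], Callable[..., Any]]
_EXAMPLE_CACHE: Dict[str, Tuple[Any]] = {}


class ExampleNotFoundInCache(Exception):
    pass


def _example_url_key(url: str, *args: Any, **kwargs: Any) -> str:
    # Assumption for the sketch: only the URL contributes to the key.
    return 'example_preview_url:' + hashlib.sha256(url.encode('utf-8')).hexdigest()


def _example_cache_with_key(keyfunc: Callable[..., str]) -> _ExampleDecorator:
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            key = keyfunc(*args, **kwargs)
            if key in _EXAMPLE_CACHE:
                return _EXAMPLE_CACHE[key][0]  # hit: skip the expensive call
            value = func(*args, **kwargs)
            _EXAMPLE_CACHE[key] = (value,)
            return value
        return wrapper
    return decorator


def _example_get_cache_with_key(keyfunc: Callable[..., str]) -> _ExampleDecorator:
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            key = keyfunc(*args, **kwargs)
            if key not in _EXAMPLE_CACHE:
                raise ExampleNotFoundInCache(key)
            return _EXAMPLE_CACHE[key][0]  # func itself is never called
        return wrapper
    return decorator

# Usage mirrors the two real functions: decorate the fetching function with
# _example_cache_with_key(_example_url_key) and an empty lookup function with
# _example_get_cache_with_key(_example_url_key); both then share one key.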