zulip/zerver/lib/url_preview/preview.py

import re
import logging
import traceback
from typing import Any, Optional, Dict
from typing.re import Match
import requests
from zerver.lib.cache import cache_with_key, get_cache_with_key, preview_url_cache_key
from zerver.lib.url_preview.oembed import get_oembed_data
from zerver.lib.url_preview.parsers import OpenGraphParser, GenericParser
from django.utils.encoding import smart_text


CACHE_NAME = "database"
# Based on django.core.validators.URLValidator, with ftp support removed.
link_regex = re.compile(
    r'^(?:http)s?://'  # http:// or https://
    r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'  # domain...
    r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'  # ...or ip
    r'(?::\d+)?'  # optional port
    r'(?:/?|[/?]\S+)$', re.IGNORECASE)


def is_link(url: str) -> Match[str]:
    return link_regex.match(smart_text(url))


@cache_with_key(preview_url_cache_key, cache_name=CACHE_NAME, with_statsd_key="urlpreview_data")
def get_link_embed_data(url: str,
                        maxwidth: Optional[int]=640,
                        maxheight: Optional[int]=480) -> Optional[Dict[str, Any]]:
    if not is_link(url):
        return None
    # Fetch information from URL.
    # We are using three sources in next order:
    # 1. OEmbed
    # 2. Open Graph
    # 3. Meta tags
    try:
        data = get_oembed_data(url, maxwidth=maxwidth, maxheight=maxheight)
    except requests.exceptions.RequestException:
        msg = 'Unable to fetch information from url {0}, traceback: {1}'
        logging.error(msg.format(url, traceback.format_exc()))
        return None
    data = data or {}
    response = requests.get(url)
    if response.ok:
        og_data = OpenGraphParser(response.text).extract_data()
        if og_data:
            data.update(og_data)
        generic_data = GenericParser(response.text).extract_data() or {}
        for key in ['title', 'description', 'image']:
            if not data.get(key) and generic_data.get(key):
                data[key] = generic_data[key]
    return data


@get_cache_with_key(preview_url_cache_key, cache_name=CACHE_NAME)
def link_embed_data_from_cache(url: str, maxwidth: Optional[int]=640, maxheight: Optional[int]=480) -> Any:
    return
Add oembed/Open Graph/Meta tags data retrieval from inline links. This change adds support for displaying inline open graph previews for links posted into Zulip. It is designed to interact correctly with message editing. This adds the new settings.INLINE_URL_EMBED_PREVIEW setting to control whether this feature is enabled. By default, this setting is currently disabled, so that we can burn it in for a bit before it impacts users more broadly. Eventually, we may want to make this manageable via a (set of?) per-realm settings. E.g. I can imagine a realm wanting to be able to enable/disable it for certain URLs. 2016-10-27 12:06:44 +02:00			`import re`
			`import logging`
			`import traceback`
zerver/lib: Change use of typing.Text to str. 2018-05-10 19:13:36 +02:00			`from typing import Any, Optional, Dict`
Add oembed/Open Graph/Meta tags data retrieval from inline links. This change adds support for displaying inline open graph previews for links posted into Zulip. It is designed to interact correctly with message editing. This adds the new settings.INLINE_URL_EMBED_PREVIEW setting to control whether this feature is enabled. By default, this setting is currently disabled, so that we can burn it in for a bit before it impacts users more broadly. Eventually, we may want to make this manageable via a (set of?) per-realm settings. E.g. I can imagine a realm wanting to be able to enable/disable it for certain URLs. 2016-10-27 12:06:44 +02:00			`from typing.re import Match`
			`import requests`
preview: Hash cache keys for preview urls. We don't want really long urls to lead to truncated keys, or we could theoretically have two different urls get mixed up previews. Also, this suppresses warnings about exceeding the 250 char limit. Finally, this gives the key a proper prefix. 2018-10-14 14:41:15 +02:00			`from zerver.lib.cache import cache_with_key, get_cache_with_key, preview_url_cache_key`
Add oembed/Open Graph/Meta tags data retrieval from inline links. This change adds support for displaying inline open graph previews for links posted into Zulip. It is designed to interact correctly with message editing. This adds the new settings.INLINE_URL_EMBED_PREVIEW setting to control whether this feature is enabled. By default, this setting is currently disabled, so that we can burn it in for a bit before it impacts users more broadly. Eventually, we may want to make this manageable via a (set of?) per-realm settings. E.g. I can imagine a realm wanting to be able to enable/disable it for certain URLs. 2016-10-27 12:06:44 +02:00			`from zerver.lib.url_preview.oembed import get_oembed_data`
			`from zerver.lib.url_preview.parsers import OpenGraphParser, GenericParser`
preview.py: Fix error raised on uploading file with unicode filename. 2017-06-16 00:23:35 +02:00			`from django.utils.encoding import smart_text`
Add oembed/Open Graph/Meta tags data retrieval from inline links. This change adds support for displaying inline open graph previews for links posted into Zulip. It is designed to interact correctly with message editing. This adds the new settings.INLINE_URL_EMBED_PREVIEW setting to control whether this feature is enabled. By default, this setting is currently disabled, so that we can burn it in for a bit before it impacts users more broadly. Eventually, we may want to make this manageable via a (set of?) per-realm settings. E.g. I can imagine a realm wanting to be able to enable/disable it for certain URLs. 2016-10-27 12:06:44 +02:00

			`CACHE_NAME = "database"`
			`# Based on django.core.validators.URLValidator, with ftp support removed.`
			`link_regex = re.compile(`
			`r'^(?:http)s?://' # http:// or https://`
			`r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?\|[A-Z0-9-]{2,}\.?)\|' # domain...`
			`r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})' # ...or ip`
			`r'(?::\d+)?' # optional port`
			`r'(?:/?\|[/?]\S+)$', re.IGNORECASE)`


zerver/lib: Change use of typing.Text to str. 2018-05-10 19:13:36 +02:00			`def is_link(url: str) -> Match[str]:`
preview.py: Fix error raised on uploading file with unicode filename. 2017-06-16 00:23:35 +02:00			`return link_regex.match(smart_text(url))`
Add oembed/Open Graph/Meta tags data retrieval from inline links. This change adds support for displaying inline open graph previews for links posted into Zulip. It is designed to interact correctly with message editing. This adds the new settings.INLINE_URL_EMBED_PREVIEW setting to control whether this feature is enabled. By default, this setting is currently disabled, so that we can burn it in for a bit before it impacts users more broadly. Eventually, we may want to make this manageable via a (set of?) per-realm settings. E.g. I can imagine a realm wanting to be able to enable/disable it for certain URLs. 2016-10-27 12:06:44 +02:00

preview: Hash cache keys for preview urls. We don't want really long urls to lead to truncated keys, or we could theoretically have two different urls get mixed up previews. Also, this suppresses warnings about exceeding the 250 char limit. Finally, this gives the key a proper prefix. 2018-10-14 14:41:15 +02:00			`@cache_with_key(preview_url_cache_key, cache_name=CACHE_NAME, with_statsd_key="urlpreview_data")`
zerver/lib: Change use of typing.Text to str. 2018-05-10 19:13:36 +02:00			`def get_link_embed_data(url: str,`
zerver/lib: Use python 3 syntax for typing. Extracted from a larger commit by tabbott because these changes will not create significant merge conflicts. 2017-11-05 11:15:10 +01:00			`maxwidth: Optional[int]=640,`
mypy: Improve typing of oembed data, to Dict[str, Any]. 2018-06-16 23:00:17 +02:00			`maxheight: Optional[int]=480) -> Optional[Dict[str, Any]]:`
Add oembed/Open Graph/Meta tags data retrieval from inline links. This change adds support for displaying inline open graph previews for links posted into Zulip. It is designed to interact correctly with message editing. This adds the new settings.INLINE_URL_EMBED_PREVIEW setting to control whether this feature is enabled. By default, this setting is currently disabled, so that we can burn it in for a bit before it impacts users more broadly. Eventually, we may want to make this manageable via a (set of?) per-realm settings. E.g. I can imagine a realm wanting to be able to enable/disable it for certain URLs. 2016-10-27 12:06:44 +02:00			`if not is_link(url):`
			`return None`
			`# Fetch information from URL.`
			`# We are using three sources in next order:`
			`# 1. OEmbed`
			`# 2. Open Graph`
			`# 3. Meta tags`
			`try:`
			`data = get_oembed_data(url, maxwidth=maxwidth, maxheight=maxheight)`
			`except requests.exceptions.RequestException:`
			`msg = 'Unable to fetch information from url {0}, traceback: {1}'`
			`logging.error(msg.format(url, traceback.format_exc()))`
			`return None`
			`data = data or {}`
			`response = requests.get(url)`
			`if response.ok:`
			`og_data = OpenGraphParser(response.text).extract_data()`
			`if og_data:`
			`data.update(og_data)`
			`generic_data = GenericParser(response.text).extract_data() or {}`
			`for key in ['title', 'description', 'image']:`
			`if not data.get(key) and generic_data.get(key):`
			`data[key] = generic_data[key]`
			`return data`


preview: Hash cache keys for preview urls. We don't want really long urls to lead to truncated keys, or we could theoretically have two different urls get mixed up previews. Also, this suppresses warnings about exceeding the 250 char limit. Finally, this gives the key a proper prefix. 2018-10-14 14:41:15 +02:00			`@get_cache_with_key(preview_url_cache_key, cache_name=CACHE_NAME)`
zerver/lib: Change use of typing.Text to str. 2018-05-10 19:13:36 +02:00			`def link_embed_data_from_cache(url: str, maxwidth: Optional[int]=640, maxheight: Optional[int]=480) -> Any:`
Add oembed/Open Graph/Meta tags data retrieval from inline links. This change adds support for displaying inline open graph previews for links posted into Zulip. It is designed to interact correctly with message editing. This adds the new settings.INLINE_URL_EMBED_PREVIEW setting to control whether this feature is enabled. By default, this setting is currently disabled, so that we can burn it in for a bit before it impacts users more broadly. Eventually, we may want to make this manageable via a (set of?) per-realm settings. E.g. I can imagine a realm wanting to be able to enable/disable it for certain URLs. 2016-10-27 12:06:44 +02:00			`return`