zulip/zerver/lib/url_preview/parsers/base.py

import cgi
from typing import Any, Optional


class BaseParser:
    def __init__(self, html_source: bytes, content_type: Optional[str]) -> None:
        # We import BeautifulSoup here, because it's not used by most
        # processes in production, and bs4 is big enough that
        # importing it adds 10s of milliseconds to manage.py startup.
        from bs4 import BeautifulSoup

        charset = None
        if content_type is not None:
            charset = cgi.parse_header(content_type)[1].get("charset")
        self._soup = BeautifulSoup(html_source, "lxml", from_encoding=charset)

    def extract_data(self) -> Any:
        raise NotImplementedError()
url_preview: Allow Beautiful Soup to get the charset from <meta>. An HTML document sent without a charset in the Content-Type header needs to be scanned for a charset in <meta> tags. We need to pass bytes instead of str to Beautiful Soup to allow it to do this. Fixes #16843. Signed-off-by: Anders Kaseorg <anders@zulip.com> 2020-12-08 04:26:30 +01:00			`import cgi`
			`from typing import Any, Optional`
Add oembed/Open Graph/Meta tags data retrieval from inline links. This change adds support for displaying inline open graph previews for links posted into Zulip. It is designed to interact correctly with message editing. This adds the new settings.INLINE_URL_EMBED_PREVIEW setting to control whether this feature is enabled. By default, this setting is currently disabled, so that we can burn it in for a bit before it impacts users more broadly. Eventually, we may want to make this manageable via a (set of?) per-realm settings. E.g. I can imagine a realm wanting to be able to enable/disable it for certain URLs. 2016-10-27 12:06:44 +02:00
python: Sort imports with isort. Fixes #2665. Regenerated by tabbott with `lint --fix` after a rebase and change in parameters. Note from tabbott: In a few cases, this converts technical debt in the form of unsorted imports into different technical debt in the form of our largest files having very long, ugly import sequences at the start. I expect this change will increase pressure for us to split those files, which isn't a bad thing. Signed-off-by: Anders Kaseorg <anders@zulip.com> 2020-06-11 00:54:34 +02:00
zerver/lib: Remove inheritance from object. 2017-11-05 11:37:41 +01:00			`class BaseParser:`
url_preview: Allow Beautiful Soup to get the charset from <meta>. An HTML document sent without a charset in the Content-Type header needs to be scanned for a charset in <meta> tags. We need to pass bytes instead of str to Beautiful Soup to allow it to do this. Fixes #16843. Signed-off-by: Anders Kaseorg <anders@zulip.com> 2020-12-08 04:26:30 +01:00			`def __init__(self, html_source: bytes, content_type: Optional[str]) -> None:`
url_preview: Don't import beautifulsoup at import time. This is a small performance optimization to Django startup, in line with other recent commits. 2018-08-08 22:24:20 +02:00			`# We import BeautifulSoup here, because it's not used by most`
			`# processes in production, and bs4 is big enough that`
			`# importing it adds 10s of milliseconds to manage.py startup.`
			`from bs4 import BeautifulSoup`
python: Reformat with Black, except quotes. Signed-off-by: Anders Kaseorg <anders@zulip.com> 2021-02-12 08:19:30 +01:00
url_preview: Allow Beautiful Soup to get the charset from <meta>. An HTML document sent without a charset in the Content-Type header needs to be scanned for a charset in <meta> tags. We need to pass bytes instead of str to Beautiful Soup to allow it to do this. Fixes #16843. Signed-off-by: Anders Kaseorg <anders@zulip.com> 2020-12-08 04:26:30 +01:00			`charset = None`
			`if content_type is not None:`
			`charset = cgi.parse_header(content_type)[1].get("charset")`
			`self._soup = BeautifulSoup(html_source, "lxml", from_encoding=charset)`
Add oembed/Open Graph/Meta tags data retrieval from inline links. This change adds support for displaying inline open graph previews for links posted into Zulip. It is designed to interact correctly with message editing. This adds the new settings.INLINE_URL_EMBED_PREVIEW setting to control whether this feature is enabled. By default, this setting is currently disabled, so that we can burn it in for a bit before it impacts users more broadly. Eventually, we may want to make this manageable via a (set of?) per-realm settings. E.g. I can imagine a realm wanting to be able to enable/disable it for certain URLs. 2016-10-27 12:06:44 +02:00
zerver/lib: Use python 3 syntax for typing. Extracted from a larger commit by tabbott because these changes will not create significant merge conflicts. 2017-11-05 11:15:10 +01:00			`def extract_data(self) -> Any:`
Replace buggy NotImplemented with NotImplementedError(). 2017-05-24 02:39:38 +02:00			`raise NotImplementedError()`
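
The charset handling in `__init__` can be sketched in isolation. The file above uses `cgi.parse_header`, but the `cgi` module was deprecated by PEP 594 and removed in Python 3.13, so this sketch swaps in the standard library's `email.message.Message` instead. The helper name `charset_from_content_type` is hypothetical and does not appear in the original file. This is a minimal illustration of the idea, not the project's implementation.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(content_type: Optional[str]) -> Optional[str]:
    """Return the charset parameter of a Content-Type header value, if any.

    Hypothetical helper: it mirrors the logic in BaseParser.__init__, but
    parses the header with email.message instead of the removed cgi module.
    """
    if content_type is None:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    value = msg.get_param("charset")
    # get_param returns the failobj (None) when the parameter is absent, and
    # may return a tuple for RFC 2231 encoded values; only accept plain strings.
    return value if isinstance(value, str) else None


# A header with an explicit charset yields that charset.
print(charset_from_content_type("text/html; charset=utf-8"))
# A header without one yields None. Passing from_encoding=None together with
# the raw bytes is what lets Beautiful Soup look for a charset in <meta> tags.
print(charset_from_content_type("text/html"))
```

Returning `None` rather than a default such as `"utf-8"` is the point of the design: per the commit message above, a document sent without a charset in the header needs to be scanned for one in its `<meta>` tags, and that only works when the parser receives bytes and no forced encoding.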