zulip

Commit Graph

Author	SHA1	Message	Date
Anders Kaseorg	b0ce4f1bce	docs: Fix many spelling mistakes. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2022-02-07 18:51:06 -08:00
Anders Kaseorg	4922632601	mypy: Add types-beautifulsoup4. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2022-01-23 23:39:40 -08:00
Anders Kaseorg	4839b7ed27	url_preview: Interpret og:image relative to full page URL. og:image is supposed to be an absolute URL, but some sites incorrectly provide a relative URL. In this case, it makes more sense to interpret it relative to the full page URL after redirects, rather than relative to just the domain part of the page URL before redirects. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-10-21 12:20:37 -07:00
Alex Vandiver	4d428490fd	outgoing_http: Use OutgoingSession subclasses in more places. This adds the X-Smokescreen-Role header to proxy connections, to track usage from various codepaths, and enforces a timeout. Timeouts were kept consistent with their previous values, or set to 5s if they had none previously.	2021-09-01 05:34:13 -07:00
Anders Kaseorg	2939d29b6d	python: Convert deprecated Django smart_text alias to smart_str. django.utils.encoding.smart_text is a deprecated alias of django.utils.encoding.smart_str as of Django 3.0, and will be removed in Django 4.0. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-04-15 18:01:34 -07:00
Anders Kaseorg	9864907985	mypy: Correct typing.re imports to typing. Although typing.re exists in the standard library, mypy has never recognized it. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-03-17 18:41:46 -07:00
Anders Kaseorg	6e4c3e41dc	python: Normalize quotes with Black. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-02-12 13:11:19 -08:00
Anders Kaseorg	11741543da	python: Reformat with Black, except quotes. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-02-12 13:11:19 -08:00
akshatdalton	5f8a10124e	url preview: Update Zulip User-Agent. This commit updates the Zulip User-Agent to 'Mozilla/5.0 (compatible; ZulipURLPreview/{version}; +{external_host})' as the older User-Agent was rendering Markdown YouTube titles as 'YouTube - YouTube'. Fixes #16970.	2021-01-25 14:24:48 -08:00
Anders Kaseorg	bf45f921a7	url_preview: Allow Beautiful Soup to get the charset from <meta>. An HTML document sent without a charset in the Content-Type header needs to be scanned for a charset in <meta> tags. We need to pass bytes instead of str to Beautiful Soup to allow it to do this. Fixes #16843. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-12-15 11:30:57 -08:00
Alex Vandiver	ad8943a64a	url_preview: Only extract img tags with an `src`. Some `<img>` tags do not have a `src` attribute, because JS rewrites them to add one later. Attempting to access `first_image['src']` on these will raise an exception, as they have no such attribute. Only look for images which have a defined `src` attribute on them. We could instead check if `first_image.has_attr('src')`, but this seems only likely to produce fewer valid images.	2020-08-18 14:26:21 -04:00
Anders Kaseorg	69c0959f34	python: Fix misuse of Optional types for optional parameters. There seems to have been a confusion between two different uses of the word “optional”: • An optional parameter may be omitted and replaced with a default value. • An Optional type has None as a possible value. Sometimes an optional parameter has a default value of None, or None is otherwise a meaningful value to provide, in which case it makes sense for the optional parameter to have an Optional type. But in other cases, optional parameters should not have Optional type. Fix them. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-13 15:31:27 -07:00
Anders Kaseorg	365fe0b3d5	python: Sort imports with isort. Fixes #2665. Regenerated by tabbott with `lint --fix` after a rebase and change in parameters. Note from tabbott: In a few cases, this converts technical debt in the form of unsorted imports into different technical debt in the form of our largest files having very long, ugly import sequences at the start. I expect this change will increase pressure for us to split those files, which isn't a bad thing. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-11 16:45:32 -07:00
Graham Bleaney	461d5b1a3e	pysa: Introduce sanitizers, models, and inline marking safe. This commit adds three `.pysa` model files: `false_positives.pysa` for ruling out false positive flows with `Sanitize` annotations, `req_lib.pysa` for educating pysa about Zulip's `REQ()` pattern for extracting user input, and `redirects.pysa` for capturing the risk of open redirects within Zulip code. Additionally, this commit introduces `mark_sanitized`, an identity function which can be used to selectively clear taint in cases where `Sanitize` models will not work. This commit also puts `mark_sanitized` to work removing known false positive flows.	2020-06-11 12:57:49 -07:00
Puneeth Chaganti	2a65be2bf5	url preview: Use Chrome's user agent instead of a Zulip one. Some sites don't render correctly unless you are one of the latest browsers. YouTube Music, for instance, changes the page title to "Your browser is deprecated, please upgrade.", which makes our URL previews look bad.	2020-04-26 10:16:43 -07:00
Mateusz Mandera	770086f983	url_preview: Discard url in oembed if server returns invalid json. This fixes the scenario where we'd get errors in the FetchLinksEmbedData queue processor if oembed got invalid json from the URL.	2020-04-11 11:54:54 -07:00
Tim Abbott	4901dc3795	url_preview: Fix parsing of open graph tags. Our open graph parser logic sloppily mixed data obtained by parsing open graph properties with trusted data set by our oembed parser. We fix this by consistently using our explicit whitelist of generic properties (image, title, and description) in both places where we interact with open graph properties. The fixes are redundant with each other, but doing both helps in making the intent of the code clearer. The issue fixed here was originally reported as an XSS vulnerability in the upcoming Inline URL Previews feature found by Graham Bleaney and Ibrahim Mohamed using Pysa. The recent Oembed changes close that vulnerability, but this change is still worth doing to make the implementation do what it looks like it does.	2019-12-12 15:24:38 -08:00
Anders Kaseorg	faa3ea0b8e	oembed: Remove unsound HTML filtering. The frontend now takes care of confining the HTML. Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2019-12-12 15:24:38 -08:00
Tim Abbott	9f223bb7c2	url_preview: Simplify path to oembed code.	2019-12-12 13:34:49 -08:00
Puneeth Chaganti	64c40287f1	url preview: Rename type_ variable to oembed_resource_type.	2019-06-02 14:31:39 -07:00
Puneeth Chaganti	9aa5a2b369	url preview: Use oEmbed html for videos. Ensure that the html is safe before using it. The html is considered safe if it is in an iframe with an http/https src, based on the recommendations here: https://oembed.com/#section3 We directly embed the `iframe` html into the lightbox overlay.	2019-05-31 15:59:03 -07:00
Puneeth Chaganti	c8cb785950	url preview: Show inline images as previews for oEmbed photo pages.	2019-05-31 15:59:03 -07:00
Puneeth Chaganti	22d0cd9696	url preview: Don't cache embed data when fetch has network errors.	2019-05-30 16:45:22 -07:00
Puneeth Chaganti	4ac9778d69	url preview: Catch network errors during get for page content. We may successfully get the page once, to get the content type, but the server or network may go down and cause problems when fetching the page for parsing its meta tags.	2019-05-13 13:55:00 -07:00
Puneeth Chaganti	9fd1c40bb1	url preview: Timeout requests after 15 seconds.	2019-05-13 13:54:59 -07:00
Puneeth Chaganti	0b76b16101	url preview: Set a custom user agent for requests. Some sites seem to block the default user agent of the requests library. Using a custom user agent lets us show previews for some of these sites.	2019-05-13 13:54:43 -07:00
Puneeth Chaganti	59555ee7e5	url preview: Confirm content-type before trying to show previews. Currently, we only show previews for URLs which are HTML pages, which could contain other media. We don't show previews for links to non-HTML pages, like pdf documents or audio/video files. To verify that the URL posted is an HTML page, we verify the content-type of the page, either using server headers or by sniffing the content. Closes #8358	2019-05-13 13:45:17 -07:00
Puneeth Chaganti	da33b72848	url preview: Use in-memory caching in dev environment.	2019-05-06 12:37:32 -07:00
Puneeth Chaganti	1f6306a5a7	url preview: Clean up import ordering.	2019-05-06 12:37:32 -07:00
Puneeth Chaganti	d56b16b275	url preview: Ignore open graph tags without a content attribute.	2019-05-06 12:37:32 -07:00
Puneeth Chaganti	d02eb99831	url preview: Return generic parser <p> text as str (not bs4 string).	2019-05-06 12:37:32 -07:00
Anders Kaseorg	649235cfec	python: Remove unused imports. Signed-off-by: Anders Kaseorg <andersk@mit.edu>	2019-02-22 16:54:36 -08:00
Tim Abbott	a4b294da98	url preview: Remove useless logging.error in open graph code path. As detailed in the comment, someone pasting a broken URL isn't a situation that a server administrator needs to be notified about.	2019-02-05 13:25:47 -08:00
Steve Howell	76deb30312	preview: Hash cache keys for preview urls. We don't want really long urls to lead to truncated keys, or we could theoretically have two different urls get mixed up previews. Also, this suppresses warnings about exceeding the 250 char limit. Finally, this gives the key a proper prefix.	2018-10-14 09:28:57 -07:00
Tim Abbott	4d03c15848	url_preview: Don't import beautifulsoup at import time. This is a small performance optimization to Django startup, in line with other recent commits.	2018-08-08 14:19:42 -07:00
neiljp (Neil Pilgrim)	e4821875f7	mypy: Improve typing of oembed data, to Dict[str, Any].	2018-06-19 10:48:38 -07:00
Tim Abbott	3006b3f52f	url_preview: Fix crash when description has no content. There are several things we'll want to clean up with this feature, but for now we're content to just make this not crash.	2018-05-17 12:40:43 -07:00
Aditya Bansal	1f9244e060	zerver/lib: Change use of typing.Text to str.	2018-05-10 14:19:49 -07:00
rht	3f4bf2d22f	zerver/lib: Use python 3 syntax for typing. Extracted from a larger commit by tabbott because these changes will not create significant merge conflicts.	2017-11-21 20:56:40 -08:00
neiljp (Neil Pilgrim)	1dcc981af8	mypy: Add explicit Any type parameters for embedded data Dicts.	2017-11-07 11:26:46 -08:00
rht	e311842a1b	zerver/lib: Remove inheritance from object.	2017-11-06 08:53:48 -08:00
neiljp (Neil Pilgrim)	be856bad46	mypy: Reduce use of Any in zerver/lib/url_preview/ return types.	2017-11-04 16:18:27 -07:00
rht	f43e54d352	zerver/lib: Remove absolute_import.	2017-09-27 10:00:39 -07:00
Aditya Bansal	f32c1892ff	preview.py: Fix error raised on uploading file with unicode filename.	2017-06-19 14:58:44 -04:00
Mark Shannon	c7c47fe11d	Replace buggy NotImplemented with NotImplementedError().	2017-05-23 20:33:35 -07:00
Robert Hönig	0917493588	mypy: Convert zerver/lib to use typing.Text.	2016-12-25 10:33:45 -08:00
Tim Abbott	6bb959ff4e	url_preview: Fix BeautifulSoup DeprecationWarning.	2016-12-15 17:05:10 -08:00
Igor Tokarev	fae59502ab	URL preview: Improve test coverage.	2016-12-13 10:43:02 -08:00
Igor Tokarev	c93f1d4eda	Add oembed/Open Graph/Meta tags data retrieval from inline links. This change adds support for displaying inline open graph previews for links posted into Zulip. It is designed to interact correctly with message editing. This adds the new settings.INLINE_URL_EMBED_PREVIEW setting to control whether this feature is enabled. By default, this setting is currently disabled, so that we can burn it in for a bit before it impacts users more broadly. Eventually, we may want to make this manageable via a (set of?) per-realm settings. E.g. I can imagine a realm wanting to be able to enable/disable it for certain URLs.	2016-12-07 17:40:18 -08:00

49 Commits
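Commit 4839b7ed27 resolves a relative og:image against the full page URL after redirects, rather than against the domain alone. The sketch below shows what that resolution does. It assumes Python's standard `urllib.parse.urljoin` and assumes the post-redirect URL is available, for example as `Response.url` in the requests library. The helper name `resolve_og_image` is hypothetical and is not Zulip's actual code.

```python
from urllib.parse import urljoin


def resolve_og_image(final_page_url: str, og_image: str) -> str:
    """Hypothetical helper: resolve an og:image value against the page URL.

    og:image is supposed to be absolute, but some sites send a relative
    URL. Joining against the full page URL (after redirects) keeps
    path-relative references pointing at the right directory; an
    absolute og:image is returned unchanged by urljoin.
    """
    return urljoin(final_page_url, og_image)


# Illustrative URLs only.
print(resolve_og_image("https://example.com/blog/post.html", "img/cover.png"))
# → https://example.com/blog/img/cover.png
print(resolve_og_image("https://example.com/blog/post.html", "https://cdn.example.net/c.png"))
# → https://cdn.example.net/c.png
```

Joining `img/cover.png` against only the domain `https://example.com` would instead yield `https://example.com/img/cover.png`. That is the mismatch the commit describes.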