zulip

Commit Graph

Author	SHA1	Message	Date
Anders Kaseorg	08db41660a	python: Avoid deprecated cgi module, removed in Python 3.13. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2024-10-22 10:05:01 -07:00
Anders Kaseorg	0fa5e7f629	ruff: Fix UP035 Import from `collections.abc`, `typing` instead. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2024-07-13 22:28:22 -07:00
Anders Kaseorg	531b34cb4c	ruff: Fix UP007 Use `X | Y` for type annotations. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2024-07-13 22:28:22 -07:00
Anders Kaseorg	7b1bb984b3	ruff: Fix RUF022 `__all__` is not sorted. This is a preview rule, not yet enabled by default. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2024-03-01 09:30:04 -08:00
Anders Kaseorg	223b626256	python: Use urlsplit instead of urlparse. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2023-12-05 13:03:07 -08:00
Anders Kaseorg	a50eb2e809	mypy: Enable new error explicit-override. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2023-10-12 12:28:41 -07:00
Anders Kaseorg	50e6cba1af	ruff: Fix UP032 Use f-string instead of `format` call. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2023-07-19 16:14:59 -07:00
Anders Kaseorg	9db3451333	Remove statsd support. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2023-04-25 19:58:16 -07:00
Anders Kaseorg	da3cf5ea7a	ruff: Fix RSE102 Unnecessary parentheses on raised exception. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2023-02-04 16:34:55 -08:00
Anders Kaseorg	a2825e5984	python: Use Python 3.8 typing.{Protocol,TypedDict}. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2022-04-27 12:57:49 -07:00
Alex Vandiver	56058f3316	caches: Remove unnecessary "in-memory" cache. This cache was added in `da33b72848` to serve as a replacement for the durable database cache, in development; the previous commit has switched that to be the non-durable memcached backend. The special-case for "in-memory" in development is mostly-unnecessary in contrast to memcached -- `./tools/run-dev.py` flushes memcached on every startup. This differs in behaviour slightly, in that if the codepath is changed and `run-dev` restarts Django, the cache is not cleared. This seems an unlikely occurrence, however, and the code cleanup from its removal is worth it.	2022-04-15 14:48:12 -07:00
Alex Vandiver	04ca2e92f7	caches: Cache link preview data in memcached, not in PostgreSQL. The choice to cache these in the database dates back to `c93f1d4eda`, with the comment added in `da33b72848` while working around the durability of the "database" cache in local development. The values were stored in a durable cache, as they needed to be ensured to persist between when they were inserted in `get_link_embed_data` and when they were used in `render_incoming_message` via `link_embed_data_from_cache`. However, database accesses are not fast compared to memcached, and we wish to avoid the overhead of the database connection from the `embed_links` worker. Specifically, making the connection may not be thread-safe -- and in low-memory (and Docker) configurations, all workers run as separate threads in a single process. This can lead to stalled database connections in `embed_links` workers, and failed previews. Since the previous commit made the durability of the cache no longer necessary, this will have minimal effect; at worst, posting the same URL twice, on either side of an upgrade, will result in two preview fetches of it.	2022-04-15 14:48:12 -07:00
Alex Vandiver	351bdfaf78	preview: Use cache only as a non-durable cache, not an IPC. The `get_link_embed_data` / `link_embed_data_from_cache` pair as introduced in `c93f1d4eda` uses the cache as a temporary store inside of the `embed_links` worker; this means that it must be durable storage, or the worker will stall and re-fetch the same links to preview them. Switch to plumbing through the fetched URL embed data as a parameter to the Markdown evaluation which uses them, rather than using the cache as an intermediary. This frees up the cache to be merely a non-durable cache. As a side-effect, this removes get_cache_with_key, and link_embed_data_from_cache which was its only callsite.	2022-04-15 14:48:12 -07:00
Alex Vandiver	327ff9ea0f	preview: Use a dataclass for the embed data. This is significantly cleaner than passing around `Dict[str, Any]` all of the time.	2022-04-15 14:48:12 -07:00
Alex Vandiver	e53f9fad29	url_preview: Only return image URLs that validate as URLs.	2022-02-18 15:32:27 -08:00
Anders Kaseorg	b0ce4f1bce	docs: Fix many spelling mistakes. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2022-02-07 18:51:06 -08:00
Anders Kaseorg	4922632601	mypy: Add types-beautifulsoup4. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2022-01-23 23:39:40 -08:00
Anders Kaseorg	4839b7ed27	url_preview: Interpret og:image relative to full page URL. og:image is supposed to be an absolute URL, but some sites incorrectly provide a relative URL. In this case, it makes more sense to interpret it relative to the full page URL after redirects, rather than relative to just the domain part of the page URL before redirects. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-10-21 12:20:37 -07:00
Alex Vandiver	4d428490fd	outgoing_http: Use OutgoingSession subclasses in more places. This adds the X-Smokescreen-Role header to proxy connections, to track usage from various codepaths, and enforces a timeout. Timeouts were kept consistent with their previous values, or set to 5s if they had none previously.	2021-09-01 05:34:13 -07:00
Anders Kaseorg	2939d29b6d	python: Convert deprecated Django smart_text alias to smart_str. django.utils.encoding.smart_text is a deprecated alias of django.utils.encoding.smart_str as of Django 3.0, and will be removed in Django 4.0. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-04-15 18:01:34 -07:00
Anders Kaseorg	9864907985	mypy: Correct typing.re imports to typing. Although typing.re exists in the standard library, mypy has never recognized it. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-03-17 18:41:46 -07:00
Anders Kaseorg	6e4c3e41dc	python: Normalize quotes with Black. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-02-12 13:11:19 -08:00
Anders Kaseorg	11741543da	python: Reformat with Black, except quotes. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-02-12 13:11:19 -08:00
akshatdalton	5f8a10124e	url preview: Update Zulip User-Agent. This commit updates the Zulip User-Agent to 'Mozilla/5.0 (compatible; ZulipURLPreview/{version}; +{external_host})' as the older User-Agent was rendering Markdown YouTube titles as 'YouTube - YouTube'. Fixes #16970.	2021-01-25 14:24:48 -08:00
Anders Kaseorg	bf45f921a7	url_preview: Allow Beautiful Soup to get the charset from <meta>. An HTML document sent without a charset in the Content-Type header needs to be scanned for a charset in <meta> tags. We need to pass bytes instead of str to Beautiful Soup to allow it to do this. Fixes #16843. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-12-15 11:30:57 -08:00
Alex Vandiver	ad8943a64a	url_preview: Only extract img tags with an `src`. Some `<img>` tags do not have an SRC, if they are rewritten using JS to have one later. Attempting to access `first_image['src']` on these will raise an exception, as they have no such attribute. Only look for images which have a defined `src` attribute on them. We could instead check if `first_image.has_attr('src')`, but this seems only likely to produce fewer valid images.	2020-08-18 14:26:21 -04:00
Anders Kaseorg	69c0959f34	python: Fix misuse of Optional types for optional parameters. There seems to have been a confusion between two different uses of the word “optional”: • An optional parameter may be omitted and replaced with a default value. • An Optional type has None as a possible value. Sometimes an optional parameter has a default value of None, or None is otherwise a meaningful value to provide, in which case it makes sense for the optional parameter to have an Optional type. But in other cases, optional parameters should not have Optional type. Fix them. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-13 15:31:27 -07:00
Anders Kaseorg	365fe0b3d5	python: Sort imports with isort. Fixes #2665. Regenerated by tabbott with `lint --fix` after a rebase and change in parameters. Note from tabbott: In a few cases, this converts technical debt in the form of unsorted imports into different technical debt in the form of our largest files having very long, ugly import sequences at the start. I expect this change will increase pressure for us to split those files, which isn't a bad thing. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-11 16:45:32 -07:00
Graham Bleaney	461d5b1a3e	pysa: Introduce sanitizers, models, and inline marking safe. This commit adds three `.pysa` model files: `false_positives.pysa` for ruling out false positive flows with `Sanitize` annotations, `req_lib.pysa` for educating pysa about Zulip's `REQ()` pattern for extracting user input, and `redirects.pysa` for capturing the risk of open redirects within Zulip code. Additionally, this commit introduces `mark_sanitized`, an identity function which can be used to selectively clear taint in cases where `Sanitize` models will not work. This commit also puts `mark_sanitized` to work removing known false positive flows.	2020-06-11 12:57:49 -07:00
Puneeth Chaganti	2a65be2bf5	url preview: Use Chrome's user agent instead of a Zulip one. Some sites don't render correctly unless you are one of the latest browsers. YouTube Music, for instance, changes the page title to "Your browser is deprecated, please upgrade.", which makes our URL previews look bad.	2020-04-26 10:16:43 -07:00
Mateusz Mandera	770086f983	url_preview: Discard url in oembed if server returns invalid json. This fixes the scenario where we'd get errors in the FetchLinksEmbedData queue processor if oembed got invalid json from the URL.	2020-04-11 11:54:54 -07:00
Tim Abbott	4901dc3795	url_preview: Fix parsing of open graph tags. Our open graph parser logic sloppily mixed data obtained by parsing open graph properties with trusted data set by our oembed parser. We fix this by consistently using our explicit whitelist of generic properties (image, title, and description) in both places where we interact with open graph properties. The fixes are redundant with each other, but doing both helps in making the intent of the code clearer. The issue fixed here was originally reported as an XSS vulnerability in the upcoming Inline URL Previews feature found by Graham Bleaney and Ibrahim Mohamed using Pysa. The recent Oembed changes close that vulnerability, but this change is still worth doing to make the implementation do what it looks like it does.	2019-12-12 15:24:38 -08:00
Anders Kaseorg	faa3ea0b8e	oembed: Remove unsound HTML filtering. The frontend now takes care of confining the HTML. Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2019-12-12 15:24:38 -08:00
Tim Abbott	9f223bb7c2	url_preview: Simplify path to oembed code.	2019-12-12 13:34:49 -08:00
Puneeth Chaganti	64c40287f1	url preview: Rename type_ variable to oembed_resource_type.	2019-06-02 14:31:39 -07:00
Puneeth Chaganti	9aa5a2b369	url preview: Use oEmbed html for videos. Ensure that the html is safe, before using it. The html is considered if it is in an iframe with a http/https src, based on the recommendations here: https://oembed.com/#section3 We directly embed the `iframe` html into the lightbox overlay.	2019-05-31 15:59:03 -07:00
Puneeth Chaganti	c8cb785950	url preview: Show inline images as previews for oEmbed photo pages.	2019-05-31 15:59:03 -07:00
Puneeth Chaganti	22d0cd9696	url preview: Don't cache embed data when fetch has network errors.	2019-05-30 16:45:22 -07:00
Puneeth Chaganti	4ac9778d69	url preview: Catch network errors during get for page content. We may be successfully able to get the page once, to get the content type, but the server or network may go down and cause problems when fetching the page for parsing its meta tags.	2019-05-13 13:55:00 -07:00
Puneeth Chaganti	9fd1c40bb1	url preview: Timeout requests after 15 seconds.	2019-05-13 13:54:59 -07:00
Puneeth Chaganti	0b76b16101	url preview: Set a custom user agent for requests. Some sites seem to block the default user agent of the requests library. Using a custom user agent lets us show previews for some of these sites.	2019-05-13 13:54:43 -07:00
Puneeth Chaganti	59555ee7e5	url preview: Confirm content-type before trying to show previews. Currently, we only show previews for URLs which are HTML pages, which could contain other media. We don't show previews for links to non-HTML pages, like pdf documents or audio/video files. To verify that the URL posted is an HTML page, we verify the content-type of the page, either using server headers or by sniffing the content. Closes #8358	2019-05-13 13:45:17 -07:00
Puneeth Chaganti	da33b72848	url preview: Use in-memory caching in dev environment.	2019-05-06 12:37:32 -07:00
Puneeth Chaganti	1f6306a5a7	url preview: Cleanup import ordering.	2019-05-06 12:37:32 -07:00
Puneeth Chaganti	d56b16b275	url preview: Ignore open graph tags without a content attribute.	2019-05-06 12:37:32 -07:00
Puneeth Chaganti	d02eb99831	url preview: Return generic parser <p> text as str (not bs4 string).	2019-05-06 12:37:32 -07:00
Anders Kaseorg	649235cfec	python: Remove unused imports. Signed-off-by: Anders Kaseorg <andersk@mit.edu>	2019-02-22 16:54:36 -08:00
Tim Abbott	a4b294da98	url preview: Remove useless logging.error in open graph code path. As detailed in the comment, someone pasting a broken URL isn't a situation that a server administrator needs to be notified about.	2019-02-05 13:25:47 -08:00
Steve Howell	76deb30312	preview: Hash cache keys for preview urls. We don't want really long urls to lead to truncated keys, or we could theoretically have two different urls get mixed up previews. Also, this suppresses warnings about exceeding the 250 char limit. Finally, this gives the key a proper prefix.	2018-10-14 09:28:57 -07:00
Tim Abbott	4d03c15848	url_preview: Don't import beautifulsoup at import time. This is a small performance optimization to Django startup, in line with other recent commits.	2018-08-08 14:19:42 -07:00
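
Note on commit 76deb30312 (hashed cache keys): a minimal sketch of the idea described in that message, not the code from the commit. The helper name `preview_cache_key`, the `url_preview:` prefix, and the choice of SHA-256 are assumptions made for illustration; only the 250-character limit comes from the message itself.

```python
import hashlib

# Hypothetical prefix and helper name; neither is taken from the Zulip source.
KEY_PREFIX = "url_preview:"
# Key length limit mentioned in the commit message.
MAX_KEY_LENGTH = 250


def preview_cache_key(url: str) -> str:
    """Return a fixed-length cache key for a URL of any length."""
    # Hashing the whole URL means two long URLs that share a prefix
    # can no longer be truncated into the same key.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return KEY_PREFIX + digest


# Two URLs that differ only after the first 300 characters.
long_a = "https://example.com/" + "a" * 300 + "?page=1"
long_b = "https://example.com/" + "a" * 300 + "?page=2"
key_a = preview_cache_key(long_a)
key_b = preview_cache_key(long_b)
```

Both keys stay well under the limit regardless of URL length, and the two URLs map to different keys even though they agree on their first 300 characters.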

64 Commits
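
Note on commit 4839b7ed27 (og:image relative to full page URL): the behavior described in that message can be sketched with the standard library's `urljoin`. This is an illustration under assumptions, not the code from the commit; the helper name `resolve_og_image` and the example URLs are hypothetical.

```python
from urllib.parse import urljoin


def resolve_og_image(final_page_url: str, og_image: str) -> str:
    """Resolve an og:image value against the page URL reached after redirects."""
    # urljoin returns absolute URLs unchanged and resolves relative ones
    # against the base URL, including its path component.
    return urljoin(final_page_url, og_image)


final_url = "https://example.com/blog/2021/post.html"
relative = resolve_og_image(final_url, "images/cover.png")
rooted = resolve_og_image(final_url, "/static/cover.png")
absolute = resolve_og_image(final_url, "https://cdn.example.net/cover.png")
```

Here `relative` resolves inside `/blog/2021/`, which is what interpreting the value against the full page URL means, whereas resolving against only the domain would have produced `https://example.com/images/cover.png`.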