mirror of https://github.com/zulip/zulip.git
uploads: Extra-escape internal S3 paths.
In nginx, `location` blocks operate on the _decoded_ URI[^1]: > The matching is performed against a normalized URI, after decoding > the text encoded in the “%XX” form This means that if a user-uploaded file contains characters that are not URI-safe, the browser encodes them in UTF-8 and then URI-encodes them -- and nginx decodes them and reassembles the original character before running the `location ~ ^/...` match. This means that the `$2` _is not URI-encoded_ and _may contain non-ASCII characters. When `proxy_pass` is passed a value containing one or more variables, it does no encoding on that expanded value, assuming that the bytes are exactly as they should be passed to the upstream. This means that directly calling `proxy_pass https://$1/$2` would result in sending high-bit characters to the S3 upstream, which would rightly balk. However, a longstanding bug in nginx's `set` directive[^2] means that the following line: ```nginx set $download_url https://$1/$2; ``` ...results in nginx accidentally URI-encoding $1 and $2 when they are inserted, resulting in a `$download_url` which is suitable to pass to `proxy_pass`. This bug is only present with numeric capture variables, not named captures; this is particularly relevant because numeric captures are easily overridden by additional regexes elsewhere, as subsequent commits will add. Fixing this is complicated; nginx does not supply any way to escape values[^3], besides a third-party module[^4] which is an undue complication to begin using. The only variable which nginx exposes which is _not_ un-escaped already is `$request_uri`, which contains the very original URL sent by the browser -- and thus can't respect any work done in Django to generate the `X-Accel-Redirect` (e.g., for `/user_uploads/temporary/` URLs). We also cannot pass these URLs to nginx via query-parameters, since `$arg_foo` values are not URI-decoded by nginx, there is no function to do so[^3], and the values must be URI-encoded because they themselves are URLs with query parameters. Extra-URI-encode the path that we pass to the `X-Accel-Redirect` location, for S3 redirects. We rely on the `location` block un-escaping that layer, leaving `$s3_hostname` and `$s3_path` as they were intended in Django. This works around the nginx bug, with no behaviour change. [^1]: http://nginx.org/en/docs/http/ngx_http_core_module.html#location [^2]: https://trac.nginx.org/nginx/ticket/348 [^3]: https://trac.nginx.org/nginx/ticket/52 [^4]: https://github.com/openresty/set-misc-nginx-module#set_escape_uri
This commit is contained in:
parent
a955f52904
commit
d41a00b83b
|
@ -1,10 +1,15 @@
|
|||
# Handle redirects to S3
|
||||
location ~ ^/internal/s3/([^/]+)/(.*) {
|
||||
location ~ ^/internal/s3/(?<s3_hostname>[^/]+)/(?<s3_path>.*) {
|
||||
internal;
|
||||
include /etc/nginx/zulip-include/headers;
|
||||
|
||||
set $download_url https://$1/$2;
|
||||
proxy_set_header Host $1;
|
||||
# The components of this path are originally double-URI-escaped
|
||||
# (see zerver/view/upload.py). "location" matches are on
|
||||
# unescaped values, which fills $s3_path with a properly
|
||||
# single-escaped path to pass to the upstream server.
|
||||
# (see associated commit message for more details)
|
||||
set $download_url https://$s3_hostname/$s3_path;
|
||||
proxy_set_header Host $s3_hostname;
|
||||
|
||||
# Ensure that we only get _one_ of these headers: the one that
|
||||
# Django added, not the one from S3.
|
||||
|
|
|
@ -99,7 +99,14 @@ def serve_s3(request: HttpRequest, path_id: str, download: bool = False) -> Http
|
|||
# this is acceptable in development.
|
||||
return redirect(url)
|
||||
|
||||
response = internal_nginx_redirect("/internal/s3/" + url[len("https://") :])
|
||||
# We over-escape the path, to work around it being impossible to
|
||||
# get the _unescaped_ new internal request URI in nginx.
|
||||
parsed_url = urlparse(url)
|
||||
assert parsed_url.hostname is not None
|
||||
assert parsed_url.path is not None
|
||||
assert parsed_url.query is not None
|
||||
escaped_path_parts = parsed_url.hostname + quote(parsed_url.path) + "?" + parsed_url.query
|
||||
response = internal_nginx_redirect("/internal/s3/" + escaped_path_parts)
|
||||
patch_disposition_header(response, path_id, download)
|
||||
patch_cache_control(response, private=True, immutable=True)
|
||||
return response
|
||||
|
|
Loading…
Reference in New Issue