This endpoint verifies that the services that Zulip needs to function
are running, and Django can talk to them. It is designed to be used
as a readiness probe[^1] for Zulip, either by Kubernetes, or some other
reverse-proxy load-balancer in front of Zulip. Because of this, it
limits access to only localhost and the IP addresses of configured
reverse proxies.
Tests are limited because we cannot stop running services (which would
impact other concurrent tests) and there would be extremely limited
utility to mocking the very specific methods we're calling to raising
the exceptions that we're looking for.
[^1]: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
This is common in cases where the reverse proxy itself is making
health-check requests to the Zulip server; these requests have no
X-Forwarded-* headers, so would normally hit the error case of
"request through the proxy, but no X-Forwarded-Proto header".
Add an additional special-case for when the request's originating IP
address is resolved to be the reverse proxy itself; in these cases,
HTTP requests with no X-Forwarded-Proto are acceptable.
Previously (with ERROR_REPORTING = True), we’d stuff the entire
traceback of the initial exception into the subject line of an error
email, and then also send a separate email for the JSON 500 response.
Instead, log one error with the standard Django format.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
In servers with `application_server.http_only = true` and
`loadbalancer.ips` set, the DetectProxyMisconfiguration middleware
prevents access over HTTP from IP addresses other than the
loadbalancer.
However, this misses the case of access from localhost over HTTP,
which is safe and expected -- for instance, the `email-mirror-postfix`
script used in the email gateway[^1] will post to `http://localhost/`
by default in such configurations. With the
DetectProxyMisconfiguration installed, this will result in a 403
response.
Make an exception for requests from `127.0.0.1` and `::1` from
proxy-misconfiguration rejections.
[^1]: https://zulip.readthedocs.io/en/latest/production/email-gateway.html
We have historically cached two types of values
on a per-request basis inside of memory:
* linkifiers
* display recipients
Both of these caches were hand-written, and they
both actually cache values that are also in memcached,
so the per-request cache essentially only saves us
from a few memcached hits.
I think the linkifier per-request cache is a necessary
evil. It's an important part of message rendering, and
it's not super easy to structure the code to just get
a single value up front and pass it down the stack.
I'm not so sure we even need the display recipient
per-request cache any more, as we are generally pretty
smart now about hydrating recipient data in terms of
how the code is organized. But I haven't done thorough
research on that hypotheseis.
Fortunately, it's not rocket science to just write
a glorified memoize decorator and tie it into key
places in the code:
* middleware
* tests (e.g. asserting db counts)
* queue processors
That's what I did in this commit.
This commit definitely reduces the amount of code
to maintain. I think it also gets us closer to
possibly phasing out this whole technique, but that
effort is beyond the scope of this PR. We could
add some instrumentation to the decorator to see
how often we get a non-trivial number of saved
round trips to memcached.
Note that when we flush linkifiers, we just use
a big hammer and flush the entire per-request
cache for linkifiers, since there is only ever
one realm in the cache.
Pass the HttpRequest explicitly through the two webhooks that log to
the webhook loggers.
get_current_request is now unused, so remove it (in the same commit
for test coverage reasons).
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Combine nginx and Django middlware to stop putting misleading warnings
about `CSRF_TRUSTED_ORIGINS` when the issue is untrusted proxies.
This attempts to, in the error logs, diagnose and suggest next steps
to fix common proxy misconfigurations.
See also #24599 and zulip/docker-zulip#403.
Having exactly 17 or 18 middlewares, on Python 3.11.0 and above,
causes python to segfault when running tests with coverage; see
https://github.com/python/cpython/issues/106092
Work around this by adding one or two no-op middlewares if we would
hit those unlucky numbers. We only add them in testing, since
coverage is a requirement to trigger it, and there is no reason to
burden production with additional wrapping.
streaming_content is an iterator. Consuming it within middleware
prevents it from being sent to the browser.
https://docs.djangoproject.com/en/4.2/ref/request-response/#streaminghttpresponse-objects
“The StreamingHttpResponse … has no content attribute. Instead, it has
a streaming_content attribute. This can be used in middleware to wrap
the response iterable, but should not be consumed.”
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Black 23 enforces some slightly more specific rules about empty line
counts and redundant parenthesis removal, but the result is still
compatible with Black 22.
(This does not actually upgrade our Python environment to Black 23
yet.)
Signed-off-by: Anders Kaseorg <anders@zulip.com>
These were useful as a transitional workaround to ignore type errors
that only show up with django-stubs, while avoiding errors about
unused type: ignore comments without django-stubs. Now that the
django-stubs transition is complete, switch to type: ignore comments
so that mypy will tell us if they become unnecessary. Many already
have.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
zerver/migrations/0240_usermessage_migrate_bigint_id_into_id.py needs
to be updated to account for Django 4.1 creating AutoField as an
identity column rather than a serial column.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
This removes everything from SCIMClient except the "is_authenticated`
method. Previously, "realm" and "name" were only needed for logging
purposes. It is the best to keep SCIMClient as minimal as possible, as
it is only intended to be used for authenticating requests to SCIM
views.
This change also gurantees that the "LogRequests" middleware will not
rely on the type unsafe access of the format_requestor_for_logs method
on SCIMClient.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
This is a type-unsafe workaround before we can fix the problem that
django_scim2 relies on request.user being present to authenticate
requests.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
SCIMClient is a type-unsafe workaround for django-scim2’s conflation
of SCIM users with Django users. Given that a SCIMClient is not a
UserProfile, it might as well not be a model at all, since it’s only
used to satisfy django-scim2’s request.user.is_authenticated queries.
This doesn’t solve the type safety issue with assigning a SCIMClient
to request.user, nor the performance issue with running the SCIM
middleware on non-SCIM requests. But it reduces the risk of potential
consequences worse than crashing, since there’s no longer a
request.user.id for Django to confuse with the ID of an actual
UserProfile.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
django.request logs responses with 5xx response codes (our configuration
of the logger prevents it from logging 4xx as well which it normally
does too). However, it does it without the traceback which results in
quite unhelpful log message that look like
"Bad Gateway:/api/v1/users/me/apns_device_token" - particularly
confusing when sent via email to server admins.
The solution here is to do the logging ourselves, using Django's
log_response() (which is meant for this purpose), and including the
traceback. Django tracks (via response._has_been_logged attribute) that
the response has already been logged, and knows to not duplicate that
action. See log_response() in django's codebase for these details.
Fixes#19596.
This removes ViewFuncT and all the associated type casts with ParamSpec
and Concatenate. This provides more accurate type annotation for
decorators at the cost of making the concatenated parameters
positional-only. This change does not intend to introduce any other
behavioral difference. Note that we retype args in process_view as
List[object] because the view functions can not only be called with
arguments of type str.
Note that the first argument of rest_dispatch needs to be made
positional-only because of the presence of **kwargs.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
This eliminates the possibility of having `request.user` as
`RemoteZulipServer` by refactoring it as an attribute of `RequestNotes`.
So we can effectively narrow the type of `request.user` by testing
`user.is_authenticated` in most cases (except that of `SCIMClient`) in
code paths that require access to `.format_requestor_for_logs` where we
previously expect either `UserProfile` or `RemoteZulipServer` backed by
the implied polymorphism.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
StreamingHttpResponse is inferred without the isinstance check in the
else branch. We refactor this is shorten the code and also type narrow
it appropriately.
`request.method` is not `None` in normal use cases, unless an
`HttpRequest` is directly instantiated without the method being set.
This situation does not apply to `WSGIRequest` at all.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
Asserting response.stream is False is just suggesting the response being
an `HttpResponse`. This removes `StreamingHttpResponse` with the more
generic `HttpResponseBase` with an isinstance-check.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
Similar to the previous commit, we should access request.user only
after it has been initialized, rather than having awkward hasattr
checks.
With updates to the settings comments about LogRequests by tabbott.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
If an API request specified a `client` parameter, we were
already prioritizing that value over parsing the UserAgent.
In order to have these parameters logged in the `RequestNotes`
as processed parameters instead of ignored parameters, we add
the `has_request_variables` decorator to `parse_client` and
then process the potential `client` parameter through the REQ
framework.
Co-authored by: Tim Abbott <tabbott@zulip.com>
Requests to the root subdomain weren't getting request_notes.realm set
even if a realm exists on the root subdomain - which is actually a
common scenario, because simply having one organization, on the root
subdomain, is the simplest and common way for self-hosted deployments.
SOCIAL_AUTH_SUBDOMAIN was potentially very confusing when opened by a
user, as it had various Login/Signup buttons as if there was a realm on
it. Instead, we want to display a more informative page to the user
telling them they shouldn't even be there. If possible, we just redirect
them to the realm they most likely came from.
To make this possible, we have to exclude the subdomain from
ROOT_SUBDOMAIN_ALIASES - so that we can give it special behavior.