We raise two types of json_unauthorized when
MissingAuthenticationError is raised. Raising the one
with www_authenticate let's the client know that user needs
to be logged in to access the requested content.
Sending `www_authenticate='session'` header with the response
also stops modern web-browsers from showing a login form to the
user and let's the client handle it completely.
Structurally, this moves the handling of common authentication errors
to a single shared middleware exception handler.
django.security.DisallowedHost is only one of a set of exceptions that
are "SuspiciousOperation" exceptions; all return a 400 to the user
when they bubble up[1]; all of them are uninteresting to Sentry.
While they may, in bulk, show a mis-configuration of some sort of the
application, such a failure should be detected via the increase in
400's, not via these, which are uninteresting individually.
While all of these are subclasses of SuspiciousOperation, we enumerate
them explicitly for a number of reasons:
- There is no one logger we can ignore that captures all of them.
Each of the errors uses its own logger, and django does not supply
a `django.security` logger that all of them feed into.
- Nor can we catch this by examining the exception object. The
SuspiciousOperation exception is raised too early in the stack for
us to catch the exception by way of middleware and check
`isinstance`. But at the Sentry level, in `add_context`, it is no
longer an exception but a log entry, and as such we have no
`isinstance` that can be applied; we only know the logger name.
- Finally, there is the semantic argument that while we have decided
to ignore this set of security warnings, we _may_ wish to log new
ones that may be added at some point in the future. It is better
to opt into those ignores than to blanket ignore all messages from
the security logger.
This moves the DisallowedHost `ignore_logger` to be adjacent to its
kin, and not on the middleware that may trigger it. Consistency is
more important than locality in this case.
Of these, the DisallowedHost logger if left as the only one that is
explicitly ignored in the LOGGING configuration in
`computed_settings.py`; it is by far the most frequent, and the least
likely to be malicious or impactful (unlike, say, RequestDataTooBig).
[1] https://docs.djangoproject.com/en/3.0/ref/exceptions/#suspiciousoperation
It is more suited for `process_request`, since it should stop
execution of the request if the domain is invalid. This code was
likely added as a process_response (in ea39fb2556) because there was
already a process_response at the time (added 7e786d5426, and no
longer necessary since dce6b4a40f).
It quiets an unnecessary warning when logging in at a non-existent
realm.
This stops performing unnecessary work when we are going to throw it
away and return a 404. The edge case to this is if the request
_creates_ a realm, and is made using the URL of the new realm; this
change would prevent the request before it occurs. While this does
arise in tests, the tests do not reflect reality -- real requests to
/accounts/register/ are made via POST to the same (default) realm,
redirected there from `confirm-preregistrationuser`. The tests are
adjusted to reflect real behavior.
Tweaked by tabbott to add a block comment in HostDomainMiddleware.
This commit is first of few commita which aim to change all the
bugdown references to markdown. This commits rename the files,
file path mentions and change the imports.
Variables and other references to bugdown will be renamed in susequent
commits.
Fixes#2665.
Regenerated by tabbott with `lint --fix` after a rebase and change in
parameters.
Note from tabbott: In a few cases, this converts technical debt in the
form of unsorted imports into different technical debt in the form of
our largest files having very long, ugly import sequences at the
start. I expect this change will increase pressure for us to split
those files, which isn't a bad thing.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Automatically generated by the following script, based on the output
of lint with flake8-comma:
import re
import sys
last_filename = None
last_row = None
lines = []
for msg in sys.stdin:
m = re.match(
r"\x1b\[35mflake8 \|\x1b\[0m \x1b\[1;31m(.+):(\d+):(\d+): (\w+)", msg
)
if m:
filename, row_str, col_str, err = m.groups()
row, col = int(row_str), int(col_str)
if filename == last_filename:
assert last_row != row
else:
if last_filename is not None:
with open(last_filename, "w") as f:
f.writelines(lines)
with open(filename) as f:
lines = f.readlines()
last_filename = filename
last_row = row
line = lines[row - 1]
if err in ["C812", "C815"]:
lines[row - 1] = line[: col - 1] + "," + line[col - 1 :]
elif err in ["C819"]:
assert line[col - 2] == ","
lines[row - 1] = line[: col - 2] + line[col - 1 :].lstrip(" ")
if last_filename is not None:
with open(last_filename, "w") as f:
f.writelines(lines)
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
While this functionality to post slow queries to a Zulip stream was
very useful in the early days of Zulip, when there were only a few
hundred accounts, it's long since been useless since (1) the total
request volume on larger Zulip servers run by Zulip developers, and
(2) other server operators don't want real-time notifications of slow
backend queries. The right structure for this is just a log file.
We get rid of the queue and replace it with a "zulip.slow_queries"
logger, which will still log to /var/log/zulip/slow_queries.log for
ease of access to this information and propagate to the other logging
handlers. Reducing the amount of queues is good for lowering zulip's
memory footprint and restart performance, since we run at least one
dedicated queue worker process for each one in most configurations.
Generated by autopep8, with the setup.cfg configuration from #14532.
I’m not sure why pycodestyle didn’t already flag these.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
Since commit 1d72629dc4, we have been
maintaining a patched copy of Django’s
SessionMiddleware.process_response in order to unconditionally ignore
our own optional cookie_domain setting that we don’t set.
Instead, let’s not do that.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
Generated by `pyupgrade --py3-plus --keep-percent-format` on all our
Python code except `zthumbor` and `zulip-ec2-configure-interfaces`,
followed by manual indentation fixes.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
The information used to be stored in a request._ratelimit dict, but
there's no need for that, and a list is a simpler structure, so this
allows us to simplify the plumbing somewhat.
That's the value that matters to the code that catches the exception,
and this change allows simplifying the plumbing somewhat, and gets rid
of the get_rate_limit_result_from_request function.
This returns us to a consistent logging format regardless of whether
the request is authenticated.
We also update some log examples in docs to be consistent with the new
style.
Instead of trying to set the _requestor_for_logs attribute in all the
relevant places, we try to use request.user when possible (that will be
when it's a UserProfile or RemoteZulipServer as of now). In other
places, we set _requestor_for_logs to avoid manually editing the
request.user attribute, as it should mostly be left for Django to manage
it.
In places where we remove the "request._requestor_for_logs = ..." line,
it is clearly implied by the previous code (or the current surrounding
code) that request.user is of the correct type.
Profiling suggests this saves about 600us in the runtime of every GET
/events request attempting to resolve URLs to determine whether we
need to do the APPEND_SLASH behavior.
It's possible that we end up doing the same URL resolution work later
and we're just moving around some runtime, but I think even if we do,
Django probably doesn't do any fancy caching that would mean doing
this query twice doesn't just do twice the work.
In any case, we probably want to extend this behavior to our whole API
because the APPEND_SLASH redirect behavior is essentially a bug there.
That is a more involved refactor, however.
Django 2.2.x is the next LTS release after Django 1.11.x; I expect
we'll be on it for a while, as Django 3.x won't have an LTS release
series out for a while.
Because of upstream API changes in Django, this commit includes
several changes beyond requirements and:
* urls: django.urls.resolvers.RegexURLPattern has been replaced by
django.urls.resolvers.URLPattern; affects OpenAPI code and related
features which re-parse Django's internals.
https://code.djangoproject.com/ticket/28593
* test_runner: Change number to suffix. Django changed the name in this
ticket: https://code.djangoproject.com/ticket/28578
* Delete now-unnecessary SameSite cookie code (it's now the default).
* forms: urlsafe_base64_encode returns string in Django 2.2.
https://docs.djangoproject.com/en/2.2/ref/utils/#django.utils.http.urlsafe_base64_encode
* upload: Django's File.size property replaces _get_size().
https://docs.djangoproject.com/en/2.2/_modules/django/core/files/base/
* process_queue: Migrate to new autoreload API.
* test_messages: Add an extra query caused by .refresh_from_db() losing
the .select_related() on the Realm object.
* session: Sync SessionHostDomainMiddleware with Django 2.2.
There's a lot more we can do to take advantage of the new release;
this is tracked in #11341.
Many changes by Tim Abbott, Umair Waheed, and Mateusz Mandera squashed
are squashed into this commit.
Fixes#10835.
Since essentially the first use of Tornado in Zulip, we've been
maintaining our Tornado+Django system, AsyncDjangoHandler, with
several hundred lines of Django code copied into it.
The goal for that code was simple: We wanted a way to use our Django
middleware (for code sharing reasons) inside a Tornado process (since
we wanted to use Tornado for our async events system).
As part of the Django 2.2.x upgrade, I looked at upgrading this
implementation to be based off modern Django, and it's definitely
possible to do that:
* Continue forking load_middleware to save response middleware.
* Continue manually running the Django response middleware.
* Continue working out a hack involving copying all of _get_response
to change a couple lines allowing us our Tornado code to not
actually return the Django HttpResponse so we can long-poll. The
previous hack of returning None stopped being viable with the Django 2.2
MiddlewareMixin.__call__ implementation.
But I decided to take this opportunity to look at trying to avoid
copying material Django code, and there is a way to do it:
* Replace RespondAsynchronously with a response.asynchronous attribute
on the HttpResponse; this allows Django to run its normal plumbing
happily in a way that should be stable over time, and then we
proceed to discard the response inside the Tornado `get()` method to
implement long-polling. (Better yet might be raising an
exception?). This lets us eliminate maintaining a patched copy of
_get_response.
* Removing the @asynchronous decorator, which didn't add anything now
that we only have one API endpoint backend (with two frontend call
points) that could call into this. Combined with the last bullet,
this lets us remove a significant hack from our
never_cache_responses function.
* Calling the normal Django `get_response` method from zulip_finish
after creating a duplicate request to process, rather than writing
totally custom code to do that. This lets us eliminate maintaining
a patched copy of Django's load_middleware.
* Adding detailed comments explaining how this is supposed to work,
what problems we encounter, and how we solve various problems, which
is critical to being able to modify this code in the future.
A key advantage of these changes is that the exact same code should
work on Django 1.11, Django 2.2, and Django 3.x, because we're no
longer copying large blocks of core Django code and thus should be
much less vulnerable to refactors.
There may be a modest performance downside, in that we now run both
request and response middleware twice when longpolling (once for the
request we discard). We may be able to avoid the expensive part of
it, Zulip's own request/response middleware, with a bit of additional
custom code to save work for requests where we're planning to discard
the response. Profiling will be important to understanding what's
worth doing here.
We will want to raise RateLimited in authenticate() in rate limiting
code - Django's authenticate() mechanism catches PermissionDenied, which
we don't want for RateLimited. We want RateLimited to propagate to our
code that called the authenticate() function.
As more types of rate limiting of requests are added, one request may
end up having various limits applied to it - and the middleware needs to
be able to handle that. We implement that through a set_response_headers
function, which sets the X-RateLimit-* headers in a sensible way based
on all the limits that were applied to the request.
We were seeing errors when pubishing typical events in the form of
`Dict[str, Any]` as the expected type to be a `Union`. So we instead
change the only non-dictionary call, to pass a dict instead of `str`.
This makes the implementation of `get_realm` consistent with its
declared return type of `Realm` rather than `Optional[Realm]`.
Fixes#12263.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
The entire idea of doing this operation with unchecked string
replacement in a middleware class is in my opinion extremely
ill-conceived, but this fixes the most pressing problem with it
generating invalid HTML.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
Apparently, our invalid realm error page had HTTP status 200, which
could be confusing and in particular broken our mobile app's error
handling for this case.
This saves about 8% of the runtime of our total response middleware,
or equivalently close to 2% of the total Tornado response time. Which
is pretty significant given that we're not sure anyone is using statsd
in production.
It's also useful outside Tornado, but the effect is particularly
significant because of how important Tornado performance is.
This avoids parsing these functions on every request, which was
adding roughly 350us to our per-request response times.
The overall impact was more than 10% of basic Tornado response
runtime.
Our html collects extra spaces in a couple of places. The most prominent is
paragraphs that look like the following in the .md file:
* some text
continued
The html will have two spaces before "continued".
Refactor the potentially expensive work done by Beautiful Soup into a
function that is called by the alter_content function, so that we can
cache the result. Saves a significant portion of the runtime of
loading of all of our /help/ and /api/ documentation pages (e.g. 12ms
for /api).
Fixes#11088.
Tweaked by tabbott to use the URL path as the cache key, clean up
argument structure, and use a clearer name for the function.
This is somewhat hacky, in that in order to do what we're doing, we
need to parse the HTML of the rendered page to extract the first
paragraph to include in the open graph description field. But
BeautifulSoup does a good job of it.
This carries a nontrivial performance penalty for loading these pages,
but overall /help/ is a low-traffic site compared to the main app, so
it doesn't matter much.
(As a sidenote, it wouldn't be a bad idea to cache this stuff).
There's lots of things we can improve in this, largely through editing
the articles, but we can deal with that over time.
Thanks to Rishi for writing all the tests.
Until we resolve https://github.com/zulip/zulip/issues/10832, we will
need to maintain our own forked copy of Django's SessionMiddleware.
We apparently let this get out of date.
This fixes a few subtle bugs involving the user logout experience that
were throwing occasional exceptions (e.g. the UpdateError fix you can
see).
Apparently, we weren't resetting the query counters inside the
websockets codebase, resulting in broken log results like this:
SOCKET 403 2ms (db: 1ms/2q) /socket/auth [transport=websocket] (unknown via ?)
SOCKET 403 5ms (db: 2ms/3q) /socket/auth [transport=websocket] (unknown via ?)
SOCKET 403 2ms (db: 3ms/4q) /socket/auth [transport=websocket] (unknown via ?)
SOCKET 403 2ms (db: 3ms/5q) /socket/auth [transport=websocket] (unknown via ?)
SOCKET 403 2ms (db: 4ms/6q) /socket/auth [transport=websocket] (unknown via ?)
SOCKET 403 2ms (db: 5ms/7q) /socket/auth [transport=websocket] (unknown via ?)
SOCKET 403 2ms (db: 5ms/8q) /socket/auth [transport=websocket] (unknown via ?)
SOCKET 403 3ms (db: 6ms/9q) /socket/auth [transport=websocket] (unknown via ?)
The correct fix for this is to call reset_queries at the start of each
endpoint within the websockets system. As it turns out, we're already
calling record_request_start_data there, and in fact should be calling
`reset_queries` in all code paths that use that function (the other
code paths, in zerver/middleware.py, do it manually with
connection.connection.queries = []).
So we can clean up the code in a way that reduces risk for similar
future issues and fix this logging bug with this simple refactor.
Previously, these timer accounting functions could be easily mistaken
for referring to starting/stopping the request. By adding timer to
the name, we make the code easier for the casual observer to read and
understand.
When our code raises an exception and Django converts it to a 500
response (in django.core.handlers.exception.handle_uncaught_exception),
it attaches the request to the log record, and we use this in our
AdminNotifyHandler to include data like the user and the URL path
in the error email sent to admins.
On this line, when our code raises an exception but we've decided (in
`TagRequests`) to format any errors as JSON errors, we suppress the
exception so we have to generate the log record ourselves. Attach the
request here, just like Django does when we let it do the job.
This still isn't an awesome solution, in that there are lots of other
places where we call `logging.error` or `logging.exception` while
inside a request; this just covers one of them. This is one of the
most common, though, so it's a start.
Originally this used signals, namely SIGRTMIN. But in prod, the
signal handler never fired; I debugged fruitlessly for a while, and
suspect uwsgi was foiling it in a mysterious way (which is kind of
the only way uwsgi does anything.)
So, we listen on a socket. Bit more code, and a bit trickier to
invoke, but it works.
This was developed for the investigation of memory-bloating on
chat.zulip.org that led to a331b4f64 "Optimize query_all_subs_by_stream()".
For usage instructions, see docstring.
These are just instances that jumped out at me while working on the
subdomains code, mostly while grepping for get_subdomain call sites.
I haven't attempted a comprehensive search, and there are likely
still others left.
We originally wrote this because when testing subdomains, you wanted
to be sure you were actually testing subdomains. Now that subdomains
is the default, doesn't seem to actually be a good reason why we
should need this.
Previously, this would always send one to homepage, making visiting
the /help/ documentation in the development environment using the
localhost URL unpleasant.
While this fixes the proximal bug, it's not clear to me that we need
this redirect logic at all, so I'm going to try removing it soon.
This allows us to reliably parse the error in code, rather than
attempt to parse the error text. Because the error text gets
translated into the user's language, this error-handling path
wasn't functioning at all for users using Zulip in any of the
seven non-English languages for which we had a translation for
this string.
Together with 709c3b50f which fixed a similar issue in a
different error-handling path, this fixes#5598.
This provides the main infrastructure for fixing #5598. From here,
it's a matter of on the one hand upgrading exception handlers -- the
many except-blocks in the codebase that look for JsonableError -- to
look beyond the string `msg` and pass on the machine-readable full
error information to their various downstream recipients, and on the
other hand adjusting places where we raise errors to take advantage
of this mechanism to give the errors structured details.
In an ideal future, I think all exception handlers that look (or
should look) for a JsonableError would use its contents in structured
form, never mentioning `msg`; but the majority of error sites might
continue to just instantiate JsonableError with a string message. The
latter is the simplest thing to do, and probably most error types will
never have code looking for them specifically.
Because the new API refactors the `to_json_error_msg` method which was
designed for subclasses to override, update the 4 subclasses that did
so to take full advantage of the new API instead.
With #5598 there will soon be an application-level error code
optionally associated with a `JsonableError`, so rename this
field to make clear that it specifically refers to an
HTTP status code.
Also take this opportunity to eliminate most of the places
that refer to it, which only do so to repeat the default value.
In order to benefit from the modern conveniences of type-checking,
add concrete, non-Any types to the interface for JsonableError.
Relatedly, there's no need at this point to duck-type things at
the places where we receive a JsonableError and try to use it.
Simplify those by using straightforward standard typing.
This adds helpful email notifications for users who just logged into a
Zulip server, as a security protection against accounts being hacked.
Text tweaked by tabbott.
Fixes#2182.
Django 1.10 has changed the implementation of this function to
match our custom implementation; in addition to this, we prefer
render().
Fixes#1914 via #4093.
Finishes the refactoring started in c1bbd8d. The goal of the refactoring is
to change the argument to get_realm from a Realm.domain to a
Realm.string_id. The steps were
* Add a new function, get_realm_by_string_id.
* Change all calls to get_realm to use get_realm_by_string_id instead.
* Remove get_realm.
* (This commit) Rename get_realm_by_string_id to get_realm.
Part of a larger migration to remove the Realm.domain field entirely.