Apparently, our invalid realm error page had HTTP status 200, which
could be confusing and in particular broken our mobile app's error
handling for this case.
This saves about 8% of the runtime of our total response middleware,
or equivalently close to 2% of the total Tornado response time. Which
is pretty significant given that we're not sure anyone is using statsd
in production.
It's also useful outside Tornado, but the effect is particularly
significant because of how important Tornado performance is.
This avoids parsing these functions on every request, which was
adding roughly 350us to our per-request response times.
The overall impact was more than 10% of basic Tornado response
runtime.
Our html collects extra spaces in a couple of places. The most prominent is
paragraphs that look like the following in the .md file:
* some text
continued
The html will have two spaces before "continued".
Refactor the potentially expensive work done by Beautiful Soup into a
function that is called by the alter_content function, so that we can
cache the result. Saves a significant portion of the runtime of
loading of all of our /help/ and /api/ documentation pages (e.g. 12ms
for /api).
Fixes#11088.
Tweaked by tabbott to use the URL path as the cache key, clean up
argument structure, and use a clearer name for the function.
This is somewhat hacky, in that in order to do what we're doing, we
need to parse the HTML of the rendered page to extract the first
paragraph to include in the open graph description field. But
BeautifulSoup does a good job of it.
This carries a nontrivial performance penalty for loading these pages,
but overall /help/ is a low-traffic site compared to the main app, so
it doesn't matter much.
(As a sidenote, it wouldn't be a bad idea to cache this stuff).
There's lots of things we can improve in this, largely through editing
the articles, but we can deal with that over time.
Thanks to Rishi for writing all the tests.
Until we resolve https://github.com/zulip/zulip/issues/10832, we will
need to maintain our own forked copy of Django's SessionMiddleware.
We apparently let this get out of date.
This fixes a few subtle bugs involving the user logout experience that
were throwing occasional exceptions (e.g. the UpdateError fix you can
see).
Apparently, we weren't resetting the query counters inside the
websockets codebase, resulting in broken log results like this:
SOCKET 403 2ms (db: 1ms/2q) /socket/auth [transport=websocket] (unknown via ?)
SOCKET 403 5ms (db: 2ms/3q) /socket/auth [transport=websocket] (unknown via ?)
SOCKET 403 2ms (db: 3ms/4q) /socket/auth [transport=websocket] (unknown via ?)
SOCKET 403 2ms (db: 3ms/5q) /socket/auth [transport=websocket] (unknown via ?)
SOCKET 403 2ms (db: 4ms/6q) /socket/auth [transport=websocket] (unknown via ?)
SOCKET 403 2ms (db: 5ms/7q) /socket/auth [transport=websocket] (unknown via ?)
SOCKET 403 2ms (db: 5ms/8q) /socket/auth [transport=websocket] (unknown via ?)
SOCKET 403 3ms (db: 6ms/9q) /socket/auth [transport=websocket] (unknown via ?)
The correct fix for this is to call reset_queries at the start of each
endpoint within the websockets system. As it turns out, we're already
calling record_request_start_data there, and in fact should be calling
`reset_queries` in all code paths that use that function (the other
code paths, in zerver/middleware.py, do it manually with
connection.connection.queries = []).
So we can clean up the code in a way that reduces risk for similar
future issues and fix this logging bug with this simple refactor.
Previously, these timer accounting functions could be easily mistaken
for referring to starting/stopping the request. By adding timer to
the name, we make the code easier for the casual observer to read and
understand.
When our code raises an exception and Django converts it to a 500
response (in django.core.handlers.exception.handle_uncaught_exception),
it attaches the request to the log record, and we use this in our
AdminNotifyHandler to include data like the user and the URL path
in the error email sent to admins.
On this line, when our code raises an exception but we've decided (in
`TagRequests`) to format any errors as JSON errors, we suppress the
exception so we have to generate the log record ourselves. Attach the
request here, just like Django does when we let it do the job.
This still isn't an awesome solution, in that there are lots of other
places where we call `logging.error` or `logging.exception` while
inside a request; this just covers one of them. This is one of the
most common, though, so it's a start.
Originally this used signals, namely SIGRTMIN. But in prod, the
signal handler never fired; I debugged fruitlessly for a while, and
suspect uwsgi was foiling it in a mysterious way (which is kind of
the only way uwsgi does anything.)
So, we listen on a socket. Bit more code, and a bit trickier to
invoke, but it works.
This was developed for the investigation of memory-bloating on
chat.zulip.org that led to a331b4f64 "Optimize query_all_subs_by_stream()".
For usage instructions, see docstring.
These are just instances that jumped out at me while working on the
subdomains code, mostly while grepping for get_subdomain call sites.
I haven't attempted a comprehensive search, and there are likely
still others left.
We originally wrote this because when testing subdomains, you wanted
to be sure you were actually testing subdomains. Now that subdomains
is the default, doesn't seem to actually be a good reason why we
should need this.
Previously, this would always send one to homepage, making visiting
the /help/ documentation in the development environment using the
localhost URL unpleasant.
While this fixes the proximal bug, it's not clear to me that we need
this redirect logic at all, so I'm going to try removing it soon.
This allows us to reliably parse the error in code, rather than
attempt to parse the error text. Because the error text gets
translated into the user's language, this error-handling path
wasn't functioning at all for users using Zulip in any of the
seven non-English languages for which we had a translation for
this string.
Together with 709c3b50f which fixed a similar issue in a
different error-handling path, this fixes#5598.
This provides the main infrastructure for fixing #5598. From here,
it's a matter of on the one hand upgrading exception handlers -- the
many except-blocks in the codebase that look for JsonableError -- to
look beyond the string `msg` and pass on the machine-readable full
error information to their various downstream recipients, and on the
other hand adjusting places where we raise errors to take advantage
of this mechanism to give the errors structured details.
In an ideal future, I think all exception handlers that look (or
should look) for a JsonableError would use its contents in structured
form, never mentioning `msg`; but the majority of error sites might
continue to just instantiate JsonableError with a string message. The
latter is the simplest thing to do, and probably most error types will
never have code looking for them specifically.
Because the new API refactors the `to_json_error_msg` method which was
designed for subclasses to override, update the 4 subclasses that did
so to take full advantage of the new API instead.
With #5598 there will soon be an application-level error code
optionally associated with a `JsonableError`, so rename this
field to make clear that it specifically refers to an
HTTP status code.
Also take this opportunity to eliminate most of the places
that refer to it, which only do so to repeat the default value.
In order to benefit from the modern conveniences of type-checking,
add concrete, non-Any types to the interface for JsonableError.
Relatedly, there's no need at this point to duck-type things at
the places where we receive a JsonableError and try to use it.
Simplify those by using straightforward standard typing.
This adds helpful email notifications for users who just logged into a
Zulip server, as a security protection against accounts being hacked.
Text tweaked by tabbott.
Fixes#2182.
Django 1.10 has changed the implementation of this function to
match our custom implementation; in addition to this, we prefer
render().
Fixes#1914 via #4093.
Finishes the refactoring started in c1bbd8d. The goal of the refactoring is
to change the argument to get_realm from a Realm.domain to a
Realm.string_id. The steps were
* Add a new function, get_realm_by_string_id.
* Change all calls to get_realm to use get_realm_by_string_id instead.
* Remove get_realm.
* (This commit) Rename get_realm_by_string_id to get_realm.
Part of a larger migration to remove the Realm.domain field entirely.
This adds support for running a Zulip production server with each
realm on its own unique subdomain, e.g. https://realm_name.example.com.
This patch includes a ton of important features:
* Configuring the Zulip sesion middleware to issue cookier correctly
for the subdomains case.
* Throwing an error if the user tries to visit an invalid subdomain.
* Runs a portion of the Casper tests with REALMS_HAVE_SUBDOMAINS
enabled to test the subdomain signup process.
* Updating our integrations documentation to refer to the current subdomain.
* Enforces that users can only login to the subdomain of their realm
(but does not restrict the API; that will be tightened in a future commit).
Note that toggling settings.REALMS_HAVE_SUBDOMAINS on a live server is
not supported without manual intervention (the main problem will be
adding "subdomain" values for all the existing realms).
[substantially modified by tabbott as part of merging]
In write_log_line, error_content can be binary_type and
error_content_iter can be a Sequence of binary_type. Handle
this this in a python 3 compatible way. Also change annotations
to reflect this fact.
This will make it slightly easier to consume the data from our clients.
Ref:
RFC 6585 §4
(imported from commit 6d323dc25db78a6d84a163add950f039e03e73d3)
Previously, we counted not just the time required to process a
particular request, but also the time required to import+find the view
function via urls.py. The latter is usually fast, but when a new
Django thread starts up, it can take significant time, resulting in us
flagging slow requests on each server restart and also when a new
Django thread starts up on prod to handle requests.
This change means that we no longer include that startup time as part
of request processing time -- but we still log it in the case that it
was more than 5ms, so that we can identify why a particular request
was slower than expected if high startup times become a problem. We
annotate the time in log lines as "+start" rather than just "start" to
make clear that it's not counted in the total.
(imported from commit c677682e23b26005060390d85d386234f4f463ad)
This is useful for the occasional case where we cannot figure out what
is causing a particular problem, but it can be easily reproduced on
staging.
(imported from commit 8b51184a8b686814f2c6ff103ba355538463ceb0)
We also now separate out the times for the socket overhead, the
request service time, and the queuing delays.
(imported from commit e1683f7f28b968b86ebb701b0ac29b00ac6d67c3)
I believe with this change the log lines will fit much better into
Zulip, and the Client string was I suspect rarely important for
responding to slow queries (and is always available in the main log
anyway).
(imported from commit ad56f446bf3fb96a14a56b825f46c1dad9b6babe)
This cache was created to make recipient lookups within a single
request (e.g. when fetching old messages) cheaper. To support stream
name changes, we need to invalidate this cache on every request so
that users get a consistent view of the name change.
(imported from commit 801051b9f6a108c1f50be7eca9a1242d661919b1)
This includes a hack to preserve humbug/backends.py as a symlink, so
that we don't need to regenerate all our old sessions.
(imported from commit b7918988b31c71ec01bbdc270db7017d4069221d)
This needs to be deployed to both staging and prod at the same
off-peak time (and the schema migration run).
At the time it is deployed, we need to make a few changes directly in
the database:
(1) UPDATE django_content_type set app_label='zerver' where app_label='zephyr';
(2) UPDATE south_migrationhistory set app_name='zerver' where app_name='zephyr';
(imported from commit eb3fd719571740189514ef0b884738cb30df1320)