We have a very useful piece of code, _RateLimitFilter, which is
designed to avoid sending us a billion error emails in the event that
a Zulip production server is down in a way that throws the same
exception a lot. The code uses memcached to ensure we send each
traceback roughly once per Zulip server per 10 minutes (or if
memcached is unavailable, at most 1/process/10 minutes, since we use
memcached to coordinate between processes)
However, if memcached is down, there is a logging.error call internal
to the Django/memcached setup that happens inside the cache.set() call,
and those aren't caught by the `except Exception` block around it.
This ends up resulting in infinite recursion, eventually leading to
Fatal Python error: Cannot recover from stack overflow., since this
handler is configured to run for logging.error in addition to
logging.exception.
We fix this using a thread-local variable to detect whether we are
being called recursively.
This change should prevent some nasty failure modes we've had in the
past where memcached being down resulted in infinite recursion
(resulting in extra resources being consumed by our error
notifications code, and most importantly, the error notifications not
being sent).
Fixes#12595.
This gives us access to typing_extensions.Deque, which was not added
to typing until 3.5.4.
(PROVISION_VERSION is not bumped because the transitive dependency set
in dev.txt hasn’t changed.)
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
For some webhook endpoints where the third-party API requires us to do
this, the user's API key might appear in error emails through
appearing in the `QUERY_STRING` parameter. Fix that by filtering any
actual content from those; what we usually need for debugging is just
what set of parameters were provided.
This cuts out about 11 calls to `git describe`. In a nice fast LXC
container following our instructions for development on a Linux host,
this might save "only" about 1.5s; in a dev environment on a Windows
host, the savings have been clocked at 49s, presumably due to an
extremely slow filesystem in the VM.
The tests weren't doing much with this codepath as they were, and
there isn't a lot of value to be gained by testing it anyway; it's
totally non-critical and rarely changes.
[Commit message rewritten by greg.]
This diff is nothing but dedentation -- it's empty under
`git diff -b`. These with-statements are only needed for
a pretty narrow scope of code, so make that clear in the
source.
There are two different things you need to patch in order to get error
emails (at `/emails`) in dev. Flip one of them in dev all the time,
and make the comment on the other a bit more explicit.
This helps prevent them from diverging and getting different sets of
features and fixes. As a bonus, the email path gets a nice tweak that
the Zulip path has had for years, since f7f2ec0ac, which makes the
emails clearer and less broken-looking when logging a message with no
stack trace.
This name hasn't been right since f7f2ec0ac back in 2013; this handler
sends the log record to a queue, whose consumer will not only maybe
send a Zulip message but definitely send an email. I found this
pretty confusing when I first worked on this logging code and was
looking for how exception emails got sent; so now that I see exactly
what's actually happening here, fix it.
By default, Django sets up two handlers on this logger, one of them
its AdminEmailHandler. We have our own handler for sending email on
error, and we want to stick to that -- we like the format somewhat
better, and crucially we've given it some rate-limiting through
ZulipLimiter.
Since we cleaned out our logging config in e0a5e6fad, though, we've
been sending error emails through both paths. The config we'd had
before that for `django` was redundant with the config on the root --
but having *a* config there was essential for causing
`logging.config.dictConfig`, when Django passes it our LOGGING dict,
to clear out that logger's previous config. So, give it an empty
config.
Django by default configures two loggers: `django` and
`django.server`. We have our own settings for `django.server`
anyway, so this is the only one we need to add.
The stdlib `logging` and `logging.config` docs aren't 100% clear, and
while the source of `logging` is admirably straightforward the source
of `logging.config` is a little twisty, so it's not easy to become
totally confident that this has the right effect just by reading.
Fortunately we can put some of that source-diving to work in writing
a test for it.
This should make it a little easier to understand our logging config
and make changes to it with confidence.
Many of these items that are now redundant used to be required when we
were setting disable_existing_loggers to True (before 500d81bf2), in
order to exempt those loggers from being cleared out. Now they're not.
One bit of test code needed a tweak to how it got its hands on the
AdminZulipHandler instance; it can do it from the list on the root
logger just as well as on the `django` logger.
mypy will error because of the attribute "request" on the LogRecord
object. Since this field is added in our tests dynamically and is not
on the base object, for now we will ignore the type.