The goal is to reduce load on Sentry if the service is timing out, and
to reduce uwsgi load from long requests. This circuit breaker is
per-Django-process, so it may require more than 2 failures overall
before it trips, and it may also "partially" trip for some (but not
all) workers. Since all of this is best-effort, this is fine.
Because this is only for load reduction, the circuit breaker trips
only on timeouts, and not on unexpected HTTP response codes or the
like.
See also #26229, which would move all browser-submitted Sentry
reporting into a single process and thereby allow circuit-breaking to
be more effective.
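
A minimal sketch of a per-process breaker that counts only timeouts is
shown below. The class name, the recovery window, and the injectable
clock are assumptions made for illustration; the threshold of 2 follows
the figure mentioned above. It is not the code from this change.

```python
import time
from typing import Callable, Optional


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the breaker is open."""


class TimeoutCircuitBreaker:
    """Hypothetical per-process breaker; its state is not shared across workers."""

    def __init__(
        self,
        failure_threshold: int = 2,
        recovery_seconds: float = 30.0,  # assumed value, not from the source
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_seconds = recovery_seconds
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, func: Callable[[], object]) -> object:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.recovery_seconds:
                # Still open: skip the request entirely to shed load.
                raise CircuitOpenError("skipping Sentry submission")
            # Recovery window elapsed: let one trial call through, and
            # reopen immediately if that call times out as well.
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            # Exceptions other than TimeoutError propagate from here
            # without changing the breaker state.
            result = func()
        except TimeoutError:
            # Only timeouts count toward tripping the breaker.
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result
```

Because each worker process would hold its own instance, one worker
can be open while another is still closed, which is the "partial"
trip described above.
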
This prevents a failure to submit a client-side Sentry trace from
turning into a server-side trace. If Sentry is down, we merely log the
error to our error logs and carry on.
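
The "log and carry on" behavior could look roughly like the sketch
below. The function name, the `send` callable, and the log level are
hypothetical and chosen for illustration only.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def submit_best_effort(send: Callable[[], None]) -> bool:
    """Run send(); on failure, log it and return False instead of raising."""
    try:
        send()
    except Exception as e:
        # Swallow the error after logging it, so that a failed forward
        # does not surface as an unhandled server-side exception.
        logger.warning("Failed to forward report to Sentry: %s", e)
        return False
    return True
```

Catching `Exception` broadly is a deliberate choice in this sketch,
since the submission is best-effort; a real handler might narrow it to
network errors.
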
This helps reduce the impact on busy uwsgi processes in case Sentry
servers fail slowly by timing out. The p99 latency is less than 300ms,
and the p99.9 per day peaks at around 1s, so this will not affect more
than 0.1% of requests in normal operation.
This is not a complete solution (see #26229); it is merely a stop-gap
mitigation.
The default for JavaScript reporting is that Sentry sets the IP
address of the user to the IP address that the report was observed to
come from[^1]. Since all reports come through the Zulip server, this
results in all reports being "from" one IP address, thus undercounting
the number of affected unauthenticated users, and making it difficult
to correlate Sentry reports with server logs.
Consume the Sentry Envelope format[^2] to inject the submitting
client's observed IP address, when possible. This ensures that Sentry
reports contain the same IP address that Zulip's server logs do.
[^1]: https://docs.sentry.io/platforms/python/guides/logging/enriching-events/identify-user/
[^2]: https://develop.sentry.dev/sdk/envelopes/
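
As a rough illustration of the envelope rewriting described above, the
sketch below sets `user.ip_address` on the first item of an envelope.
The function name is hypothetical. Handling only the first item, and
treating only `event` and `transaction` items as rewritable, are
simplifying assumptions. Anything that cannot be parsed is passed
through unchanged, matching the "when possible" wording.

```python
import json


def inject_client_ip(envelope: bytes, client_ip: str) -> bytes:
    """Set user.ip_address on the first item; return input unchanged if unparseable."""
    try:
        # Envelope layout: envelope header line, item header line, payload.
        envelope_header, item_header_line, rest = envelope.split(b"\n", 2)
        item_header = json.loads(item_header_line)
        if not isinstance(item_header, dict):
            return envelope
        if item_header.get("type") not in ("event", "transaction"):
            return envelope
        # The payload ends at the declared length, or at the next newline.
        length = item_header.get("length")
        if not isinstance(length, int):
            newline_at = rest.find(b"\n")
            length = len(rest) if newline_at == -1 else newline_at
        payload, tail = rest[:length], rest[length:]
        event = json.loads(payload)
        if not isinstance(event, dict):
            return envelope
    except ValueError:
        # Too few lines, invalid JSON, or undecodable bytes.
        return envelope

    user = event.get("user")
    if not isinstance(user, dict):
        user = {}
    user["ip_address"] = client_ip
    event["user"] = user

    new_payload = json.dumps(event).encode()
    if "length" in item_header:
        # Keep the declared payload length in sync with the rewritten body.
        item_header["length"] = len(new_payload)
    new_item_header = json.dumps(item_header).encode()
    return b"\n".join([envelope_header, new_item_header, new_payload]) + tail
```

A caller would pass the address that the server itself logged for the
request, so that the Sentry report and the server log line agree.
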
Some well-intentioned adblockers also block Sentry client-side error
reporting. Provide an endpoint on the Zulip server which forwards to
the Sentry server, so that these requests are not blocked.
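
One piece of such a forwarding endpoint is working out where to send
the envelope. The sketch below derives the upstream URL from the `dsn`
field that the client SDK places in the envelope header when tunneling,
and rejects any DSN other than the configured one. The function name
and the `allowed_dsn` parameter are hypothetical, and the Django view
and the HTTP request that would surround this are omitted.

```python
import json
from urllib.parse import urlparse


def upstream_envelope_url(envelope: bytes, allowed_dsn: str) -> str:
    """Return the Sentry ingest URL for an envelope, or raise ValueError."""
    # The first line of an envelope is its JSON header.
    header_line = envelope.split(b"\n", 1)[0]
    header = json.loads(header_line)
    if not isinstance(header, dict) or header.get("dsn") != allowed_dsn:
        # Refuse to act as an open relay for arbitrary Sentry projects.
        raise ValueError("envelope DSN does not match the configured DSN")
    parsed = urlparse(allowed_dsn)
    project_id = parsed.path.strip("/")
    return f"https://{parsed.hostname}/api/{project_id}/envelope/"
```

Comparing against a single configured DSN keeps the endpoint from
forwarding traffic to projects the server does not own.
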