zulip

Commit Graph

Author	SHA1	Message	Date
Alex Vandiver	596cf2580b	sentry: Ignore all SuspiciousOperation loggers. django.security.DisallowedHost is only one of a set of exceptions that are "SuspiciousOperation" exceptions; all return a 400 to the user when they bubble up[1]; all of them are uninteresting to Sentry. While they may, in bulk, show a mis-configuration of some sort of the application, such a failure should be detected via the increase in 400's, not via these, which are uninteresting individually. While all of these are subclasses of SuspiciousOperation, we enumerate them explicitly for a number of reasons: - There is no one logger we can ignore that captures all of them. Each of the errors uses its own logger, and django does not supply a `django.security` logger that all of them feed into. - Nor can we catch this by examining the exception object. The SuspiciousOperation exception is raised too early in the stack for us to catch the exception by way of middleware and check `isinstance`. But at the Sentry level, in `add_context`, it is no longer an exception but a log entry, and as such we have no `isinstance` that can be applied; we only know the logger name. - Finally, there is the semantic argument that while we have decided to ignore this set of security warnings, we _may_ wish to log new ones that may be added at some point in the future. It is better to opt into those ignores than to blanket ignore all messages from the security logger. This moves the DisallowedHost `ignore_logger` to be adjacent to its kin, and not on the middleware that may trigger it. Consistency is more important than locality in this case. Of these, the DisallowedHost logger is left as the only one that is explicitly ignored in the LOGGING configuration in `computed_settings.py`; it is by far the most frequent, and the least likely to be malicious or impactful (unlike, say, RequestDataTooBig). [1] https://docs.djangoproject.com/en/3.0/ref/exceptions/#suspiciousoperation	2020-08-12 16:08:38 -07:00
Alex Vandiver	28c627452f	sentry: Ignore DisallowedHost messages. This is a misconfiguration of the client, not the server.	2020-08-11 10:38:14 -07:00
Alex Vandiver	f00ff1ef62	middleware: Make HostDomain into a process_request, not process_response. It is more suited for `process_request`, since it should stop execution of the request if the domain is invalid. This code was likely added as a process_response (in `ea39fb2556`) because there was already a process_response at the time (added in `7e786d5426`, and no longer necessary since `dce6b4a40f`). It quiets an unnecessary warning when logging in at a non-existent realm. This stops performing unnecessary work when we are going to throw it away and return a 404. The edge case here is a request that _creates_ a realm and is made using the URL of the new realm; this change would reject the request before the realm is created. While this does arise in tests, the tests do not reflect reality -- real requests to /accounts/register/ are made via POST to the same (default) realm, redirected there from `confirm-preregistrationuser`. The tests are adjusted to reflect real behavior. Tweaked by tabbott to add a block comment in HostDomainMiddleware.	2020-08-11 10:37:55 -07:00
Alex Vandiver	9266315a1f	middleware: Stop shadowing top-level logger definition on line 33.	2020-07-27 16:46:13 -07:00
Alex Vandiver	1b2d0271af	sentry: Prevent double-logging of JSON-formatted errors. Capture and report the initial exception, not the formatted text-only message traceback.	2020-07-27 11:07:55 -07:00
Mohit Gupta	44d68c1840	refactor: Rename bugdown words to markdown in stats-related functions. This commit is part of a series of commits aimed at renaming bugdown to markdown.	2020-06-26 17:20:40 -07:00
Mohit Gupta	3f5fc13491	refactor: Rename zerver.lib.bugdown to zerver.lib.markdown. This commit is the first of a few commits which aim to change all the bugdown references to markdown. This commit renames the files and file path mentions and changes the imports. Variables and other references to bugdown will be renamed in subsequent commits.	2020-06-26 17:08:37 -07:00
Anders Kaseorg	5dc9b55c43	python: Manually convert more percent-formatting to f-strings. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-14 23:27:22 -07:00
Anders Kaseorg	365fe0b3d5	python: Sort imports with isort. Fixes #2665. Regenerated by tabbott with `lint --fix` after a rebase and change in parameters. Note from tabbott: In a few cases, this converts technical debt in the form of unsorted imports into different technical debt in the form of our largest files having very long, ugly import sequences at the start. I expect this change will increase pressure for us to split those files, which isn't a bad thing. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-11 16:45:32 -07:00
Anders Kaseorg	69730a78cc	python: Use trailing commas consistently. Automatically generated by the following script, based on the output of lint with flake8-comma: import re import sys last_filename = None last_row = None lines = [] for msg in sys.stdin: m = re.match( r"\x1b\[35mflake8 \\|\x1b\[0m \x1b\[1;31m(.+):(\d+):(\d+): (\w+)", msg ) if m: filename, row_str, col_str, err = m.groups() row, col = int(row_str), int(col_str) if filename == last_filename: assert last_row != row else: if last_filename is not None: with open(last_filename, "w") as f: f.writelines(lines) with open(filename) as f: lines = f.readlines() last_filename = filename last_row = row line = lines[row - 1] if err in ["C812", "C815"]: lines[row - 1] = line[: col - 1] + "," + line[col - 1 :] elif err in ["C819"]: assert line[col - 2] == "," lines[row - 1] = line[: col - 2] + line[col - 1 :].lstrip(" ") if last_filename is not None: with open(last_filename, "w") as f: f.writelines(lines) Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2020-06-11 16:04:12 -07:00
Anders Kaseorg	67e7a3631d	python: Convert percent formatting to Python 3.6 f-strings. Generated by pyupgrade --py36-plus. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-10 15:02:09 -07:00
Mateusz Mandera	dd40649e04	queue_processors: Remove the slow_queries queue. While this functionality to post slow queries to a Zulip stream was very useful in the early days of Zulip, when there were only a few hundred accounts, it has long since been useless because of (1) the total request volume on larger Zulip servers run by Zulip developers, and (2) the fact that other server operators don't want real-time notifications of slow backend queries. The right structure for this is just a log file. We get rid of the queue and replace it with a "zulip.slow_queries" logger, which will still log to /var/log/zulip/slow_queries.log for ease of access to this information and propagate to the other logging handlers. Reducing the number of queues is good for lowering Zulip's memory footprint and restart performance, since we run at least one dedicated queue worker process for each one in most configurations.	2020-05-11 00:45:13 -07:00
Tim Abbott	a702894e0e	middleware: Stop using X_REAL_IP. The comment was wrong, in that REMOTE_ADDR is where the real external IP was; X_REAL_IP was the loadbalancer's IP.	2020-05-08 11:40:54 -07:00
Anders Kaseorg	bdc365d0fe	logging: Pass format arguments to logging. https://docs.python.org/3/howto/logging.html#optimization Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-05-02 10:18:02 -07:00
Anders Kaseorg	fead14951c	python: Convert assignment type annotations to Python 3.6 style. This commit was split by tabbott; this piece covers the vast majority of files in Zulip, but excludes scripts/, tools/, and puppet/ to help ensure we at least show the right error messages for Xenial systems. We can likely further refine the remaining pieces with some testing. Generated by com2ann, with whitespace fixes and various manual fixes for runtime issues: - invoiced_through: Optional[LicenseLedger] = models.ForeignKey( + invoiced_through: Optional["LicenseLedger"] = models.ForeignKey( -_apns_client: Optional[APNsClient] = None +_apns_client: Optional["APNsClient"] = None - notifications_stream: Optional[Stream] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE) - signup_notifications_stream: Optional[Stream] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE) + notifications_stream: Optional["Stream"] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE) + signup_notifications_stream: Optional["Stream"] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE) - author: Optional[UserProfile] = models.ForeignKey('UserProfile', blank=True, null=True, on_delete=CASCADE) + author: Optional["UserProfile"] = models.ForeignKey('UserProfile', blank=True, null=True, on_delete=CASCADE) - bot_owner: Optional[UserProfile] = models.ForeignKey('self', null=True, on_delete=models.SET_NULL) + bot_owner: Optional["UserProfile"] = models.ForeignKey('self', null=True, on_delete=models.SET_NULL) - default_sending_stream: Optional[Stream] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE) - default_events_register_stream: Optional[Stream] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE) + default_sending_stream: Optional["Stream"] = models.ForeignKey('zerver.Stream', 
null=True, related_name='+', on_delete=CASCADE) + default_events_register_stream: Optional["Stream"] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE) -descriptors_by_handler_id: Dict[int, ClientDescriptor] = {} +descriptors_by_handler_id: Dict[int, "ClientDescriptor"] = {} -worker_classes: Dict[str, Type[QueueProcessingWorker]] = {} -queues: Dict[str, Dict[str, Type[QueueProcessingWorker]]] = {} +worker_classes: Dict[str, Type["QueueProcessingWorker"]] = {} +queues: Dict[str, Dict[str, Type["QueueProcessingWorker"]]] = {} -AUTH_LDAP_REVERSE_EMAIL_SEARCH: Optional[LDAPSearch] = None +AUTH_LDAP_REVERSE_EMAIL_SEARCH: Optional["LDAPSearch"] = None Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2020-04-22 11:02:32 -07:00
Anders Kaseorg	1cf63eb5bf	python: Whitespace fixes from autopep8. Generated by autopep8, with the setup.cfg configuration from #14532. I’m not sure why pycodestyle didn’t already flag these. Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2020-04-21 17:58:09 -07:00
Anders Kaseorg	dce6b4a40f	middleware: Remove unused cookie_domain setting. Since commit `1d72629dc4`, we have been maintaining a patched copy of Django’s SessionMiddleware.process_response in order to unconditionally ignore our own optional cookie_domain setting that we don’t set. Instead, let’s not do that. Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2020-04-12 11:55:55 -07:00
Anders Kaseorg	c734bbd95d	python: Modernize legacy Python 2 syntax with pyupgrade. Generated by `pyupgrade --py3-plus --keep-percent-format` on all our Python code except `zthumbor` and `zulip-ec2-configure-interfaces`, followed by manual indentation fixes. Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2020-04-09 16:43:22 -07:00
Mateusz Mandera	0155193140	rate_limiter: Change type of the RateLimitResult.remaining to int. This is cleaner than it being Optional[int], as the value of None for this object has been synonymous to 0.	2020-04-08 10:29:18 -07:00
Mateusz Mandera	e86cfbdbd7	rate_limiter: Store data in request._ratelimits_applied list. The information used to be stored in a request._ratelimit dict, but there's no need for that, and a list is a simpler structure, so this allows us to simplify the plumbing somewhat.	2020-04-08 10:29:18 -07:00
Mateusz Mandera	9911c6a0f0	rate_limiter: Put secs_to_freedom as message when raising RateLimited. That's the value that matters to the code that catches the exception, and this change allows simplifying the plumbing somewhat, and gets rid of the get_rate_limit_result_from_request function.	2020-04-08 10:29:18 -07:00
Mateusz Mandera	eb0216c5a8	middleware: Log <user.id>@subdomain instead of subdomain/<user.id>. It was decided that the new format is preferable.	2020-03-24 10:25:01 -07:00
Mateusz Mandera	85df6201f6	rate_limit: Move functions called by external code to RateLimitedObject.	2020-03-22 18:42:35 -07:00
Mateusz Mandera	2b51b3c6c5	middleware: Also log request subdomain when logging "unauth" request. This returns us to a consistent logging format regardless of whether the request is authenticated. We also update some log examples in docs to be consistent with the new style.	2020-03-22 18:32:04 -07:00
Mateusz Mandera	89394fc1eb	middleware: Use request.user for logging when possible. Instead of trying to set the _requestor_for_logs attribute in all the relevant places, we try to use request.user when possible (that will be when it's a UserProfile or RemoteZulipServer as of now). In other places, we set _requestor_for_logs to avoid manually editing the request.user attribute, as it should mostly be left for Django to manage it. In places where we remove the "request._requestor_for_logs = ..." line, it is clearly implied by the previous code (or the current surrounding code) that request.user is of the correct type.	2020-03-09 13:54:58 -07:00
Mateusz Mandera	0255ca9b6a	middleware: Log user.id/realm.string_id instead of _email.	2020-03-09 13:54:58 -07:00
Tim Abbott	229090a3a5	middleware: Avoid running APPEND_SLASH logic in Tornado. Profiling suggests this saves about 600us in the runtime of every GET /events request attempting to resolve URLs to determine whether we need to do the APPEND_SLASH behavior. It's possible that we end up doing the same URL resolution work later and we're just moving around some runtime, but I think even if we do, Django probably doesn't do any fancy caching that would mean doing this query twice doesn't just do twice the work. In any case, we probably want to extend this behavior to our whole API because the APPEND_SLASH redirect behavior is essentially a bug there. That is a more involved refactor, however.	2020-02-14 16:15:57 -08:00
rht	41e3db81be	dependencies: Upgrade to Django 2.2.10. Django 2.2.x is the next LTS release after Django 1.11.x; I expect we'll be on it for a while, as Django 3.x won't have an LTS release series out for a while. Because of upstream API changes in Django, this commit includes several changes beyond requirements and: * urls: django.urls.resolvers.RegexURLPattern has been replaced by django.urls.resolvers.URLPattern; affects OpenAPI code and related features which re-parse Django's internals. https://code.djangoproject.com/ticket/28593 * test_runner: Change number to suffix. Django changed the name in this ticket: https://code.djangoproject.com/ticket/28578 * Delete now-unnecessary SameSite cookie code (it's now the default). * forms: urlsafe_base64_encode returns string in Django 2.2. https://docs.djangoproject.com/en/2.2/ref/utils/#django.utils.http.urlsafe_base64_encode * upload: Django's File.size property replaces _get_size(). https://docs.djangoproject.com/en/2.2/_modules/django/core/files/base/ * process_queue: Migrate to new autoreload API. * test_messages: Add an extra query caused by .refresh_from_db() losing the .select_related() on the Realm object. * session: Sync SessionHostDomainMiddleware with Django 2.2. There's a lot more we can do to take advantage of the new release; this is tracked in #11341. Many changes by Tim Abbott, Umair Waheed, and Mateusz Mandera are squashed into this commit. Fixes #10835.	2020-02-13 16:27:26 -08:00
Tim Abbott	1ea2f188ce	tornado: Rewrite Django integration to duplicate less code. Since essentially the first use of Tornado in Zulip, we've been maintaining our Tornado+Django system, AsyncDjangoHandler, with several hundred lines of Django code copied into it. The goal for that code was simple: We wanted a way to use our Django middleware (for code sharing reasons) inside a Tornado process (since we wanted to use Tornado for our async events system). As part of the Django 2.2.x upgrade, I looked at upgrading this implementation to be based off modern Django, and it's definitely possible to do that: * Continue forking load_middleware to save response middleware. * Continue manually running the Django response middleware. * Continue working out a hack involving copying all of _get_response to change a couple lines allowing us our Tornado code to not actually return the Django HttpResponse so we can long-poll. The previous hack of returning None stopped being viable with the Django 2.2 MiddlewareMixin.__call__ implementation. But I decided to take this opportunity to look at trying to avoid copying material Django code, and there is a way to do it: * Replace RespondAsynchronously with a response.asynchronous attribute on the HttpResponse; this allows Django to run its normal plumbing happily in a way that should be stable over time, and then we proceed to discard the response inside the Tornado `get()` method to implement long-polling. (Better yet might be raising an exception?). This lets us eliminate maintaining a patched copy of _get_response. * Removing the @asynchronous decorator, which didn't add anything now that we only have one API endpoint backend (with two frontend call points) that could call into this. Combined with the last bullet, this lets us remove a significant hack from our never_cache_responses function. 
* Calling the normal Django `get_response` method from zulip_finish after creating a duplicate request to process, rather than writing totally custom code to do that. This lets us eliminate maintaining a patched copy of Django's load_middleware. * Adding detailed comments explaining how this is supposed to work, what problems we encounter, and how we solve various problems, which is critical to being able to modify this code in the future. A key advantage of these changes is that the exact same code should work on Django 1.11, Django 2.2, and Django 3.x, because we're no longer copying large blocks of core Django code and thus should be much less vulnerable to refactors. There may be a modest performance downside, in that we now run both request and response middleware twice when longpolling (once for the request we discard). We may be able to avoid the expensive part of it, Zulip's own request/response middleware, with a bit of additional custom code to save work for requests where we're planning to discard the response. Profiling will be important to understanding what's worth doing here.	2020-02-13 16:13:11 -08:00
Mateusz Mandera	335b804510	exceptions: RateLimited shouldn't inherit from PermissionDenied. We will want to raise RateLimited in authenticate() in rate limiting code - Django's authenticate() mechanism catches PermissionDenied, which we don't want for RateLimited. We want RateLimited to propagate to our code that called the authenticate() function.	2020-02-02 19:15:00 -08:00
Mateusz Mandera	a6a2d70320	rate_limiter: Handle multiple types of rate limiting in middleware. As more types of rate limiting of requests are added, one request may end up having various limits applied to it - and the middleware needs to be able to handle that. We implement that through a set_response_headers function, which sets the X-RateLimit-* headers in a sensible way based on all the limits that were applied to the request.	2020-02-02 19:15:00 -08:00
Wyatt Hoodes	b807c4273e	middleware: Fix exception typing. Mypy seems to have trouble understanding `Exception` inheritance here, so we create a `Union` for the only `Exception` we are looking for.	2019-07-31 12:23:20 -07:00
Anders Kaseorg	0bcae0be55	write_log_line: Fix logging of 4xx error data. Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2019-07-25 14:42:52 -07:00
Wyatt Hoodes	5686821150	middleware: Change write_log_line to publish as a dict. We were seeing errors when publishing typical events in the form of `Dict[str, Any]`, because the expected type was a `Union`. So we instead change the only non-dictionary call to pass a dict instead of a `str`.	2019-07-22 17:06:41 -07:00
Mateusz Mandera	f73600c82c	rate_limiter: Create a general rate_limit_request_by_entity function.	2019-05-30 16:50:11 -07:00
Anders Kaseorg	9efda71a4b	get_realm: raise DoesNotExist instead of returning None. This makes the implementation of `get_realm` consistent with its declared return type of `Realm` rather than `Optional[Realm]`. Fixes #12263. Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2019-05-06 21:58:16 -07:00
Puneeth Chaganti	a653fcca93	html_to_text: Escape text when using as description.	2019-04-25 15:29:16 -07:00
Puneeth Chaganti	7d7134d45d	html_to_text: Extract code for html to plain text conversion.	2019-04-25 15:29:16 -07:00
Anders Kaseorg	21dc34cc52	open graph: HTML-escape og:description, twitter:description. The entire idea of doing this operation with unchecked string replacement in a middleware class is in my opinion extremely ill-conceived, but this fixes the most pressing problem with it generating invalid HTML. Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2019-04-23 15:53:59 -07:00
Anders Kaseorg	643bd18b9f	lint: Fix code that evaded our lint checks for string % non-tuple. Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2019-04-23 15:21:37 -07:00
Tim Abbott	983e24a7f5	auth: Use HTTP status 404 for invalid realms. Apparently, our invalid realm error page had HTTP status 200, which could be confusing and in particular broke our mobile app's error handling for this case.	2019-03-14 13:50:09 -07:00
Tim Abbott	de6f724bc5	middleware: Avoid doing work for statsd when not enabled. This saves about 8% of the runtime of our total response middleware, or equivalently close to 2% of the total Tornado response time. Which is pretty significant given that we're not sure anyone is using statsd in production. It's also useful outside Tornado, but the effect is particularly significant because of how important Tornado performance is.	2019-02-27 17:53:15 -08:00
Tim Abbott	c955b20131	middleware: Don't repeatedly regenerate open graph functions. This avoids parsing these functions on every request, which was adding roughly 350us to our per-request response times. The overall impact was more than 10% of basic Tornado response runtime.	2019-02-27 17:53:13 -08:00
Rishi Gupta	028874bab3	open graph: Remove extraneous spaces from descriptions. Our html collects extra spaces in a couple of places. The most prominent is paragraphs that look like the following in the .md file: * some text continued The html will have two spaces before "continued".	2019-02-11 12:05:19 -08:00
Rishi Gupta	d3125f59e1	open graph: Omit .code-section navigation from open graph.	2019-02-11 12:05:19 -08:00
Rishi Gupta	e1f02dc6f2	open graph: Include multiple paragraphs in description tags.	2019-02-11 12:05:19 -08:00
Anders Kaseorg	f0ecb93515	zerver core: Remove unused imports. Signed-off-by: Anders Kaseorg <andersk@mit.edu>	2019-02-02 17:41:24 -08:00
Wyatt Hoodes	8eac361fb5	docs: Refactor BS work with use of cache_with_key. Refactor the potentially expensive work done by Beautiful Soup into a function that is called by the alter_content function, so that we can cache the result. Saves a significant portion of the runtime of loading of all of our /help/ and /api/ documentation pages (e.g. 12ms for /api). Fixes #11088. Tweaked by tabbott to use the URL path as the cache key, clean up argument structure, and use a clearer name for the function.	2019-01-28 15:21:52 -08:00
Tim Abbott	9c3f38a564	docs: Automatically construct OpenAPI metadata for help center. This is somewhat hacky, in that in order to do what we're doing, we need to parse the HTML of the rendered page to extract the first paragraph to include in the open graph description field. But BeautifulSoup does a good job of it. This carries a nontrivial performance penalty for loading these pages, but overall /help/ is a low-traffic site compared to the main app, so it doesn't matter much. (As a sidenote, it wouldn't be a bad idea to cache this stuff.) There are lots of things we can improve in this, largely through editing the articles, but we can deal with that over time. Thanks to Rishi for writing all the tests.	2018-12-19 10:18:20 -08:00
Tim Abbott	ae6fc0a471	sessions: Resync session middleware from Django upstream. Until we resolve https://github.com/zulip/zulip/issues/10832, we will need to maintain our own forked copy of Django's SessionMiddleware. We apparently let this get out of date. This fixes a few subtle bugs involving the user logout experience that were throwing occasional exceptions (e.g. the UpdateError fix you can see).	2018-11-14 15:16:12 -08:00
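
The reasoning in commit 596cf2580b (enumerate the loggers to ignore by exact name, rather than blanket-ignoring everything under the security logger) can be illustrated with a minimal sketch. This is not the code from that commit: it uses only the standard library `logging` module in place of Sentry's `ignore_logger`, and the list of names, the `IgnoreNamedLoggers` filter, the `ListHandler` class, and the `django.security.SomeFutureWarning` logger are all assumptions made up for illustration.

```python
import logging

# Hypothetical subset of per-exception logger names. The commit enumerates its
# own list, which is not reproduced in the log above.
IGNORED_LOGGERS = frozenset({
    "django.security.DisallowedHost",
    "django.security.RequestDataTooBig",
})


class IgnoreNamedLoggers(logging.Filter):
    """Drop records whose logger name is explicitly listed (exact match)."""

    def filter(self, record: logging.LogRecord) -> bool:
        return record.name not in IGNORED_LOGGERS


class ListHandler(logging.Handler):
    """Collect records in memory so the effect of the filter is visible."""

    def __init__(self) -> None:
        super().__init__()
        self.records = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


handler = ListHandler()
handler.addFilter(IgnoreNamedLoggers())
logging.getLogger().addHandler(handler)

# A listed logger is dropped by the filter.
logging.getLogger("django.security.DisallowedHost").error("Invalid HTTP_HOST header")
# An unlisted sibling (made-up name) still gets through: ignoring is opt-in.
logging.getLogger("django.security.SomeFutureWarning").error("not opted out")

seen = [record.name for record in handler.records]
print(seen)
```

Because the match is on the exact logger name, a security logger added later is reported until someone decides to list it, which is the opt-in behavior the commit message argues for.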

162 Commits