zulip

Commit Graph

Author	SHA1	Message	Date
Anders Kaseorg	61d0417e75	python: Replace ujson with orjson. Fixes #6507. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-08-11 10:55:12 -07:00
Anders Kaseorg	60a25b2721	docs: Fix spelling errors caught by codespell. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-08-11 10:23:06 -07:00
Alex Vandiver	2928bbc8bd	logging: Report stack_info on logging.exception calls. The exception trace only goes from where the exception was thrown up to where the `logging.exception` call is; any context as to where _that_ was called from is lost, unless `stack_info` is passed as well. Having the stack is particularly useful for Sentry exceptions, which gain the full stack trace. Add `stack_info=True` on all `logging.exception` calls with a non-trivial stack; we omit `wsgi.py`. Adjusts tests to match.	2020-08-11 10:16:54 -07:00
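Here is a minimal sketch, using only the standard `logging` module, of the difference this commit describes. The traceback from `exc_info` (which `logging.exception` sets) stops at the frame that caught the exception. `stack_info=True` additionally records the call stack that led to the logging call. The logger name `stack_info_demo` and the functions `risky`/`caller` are illustrative only.

```python
import io
import logging


def make_logger(stream: io.StringIO) -> logging.Logger:
    # Route records from a throwaway logger into an in-memory stream.
    logger = logging.getLogger("stack_info_demo")
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    return logger


def risky() -> None:
    raise ValueError("boom")


def caller(logger: logging.Logger) -> None:
    try:
        risky()
    except ValueError:
        # exc_info (implied by .exception) covers risky() -> caller();
        # stack_info=True also records how caller() itself was reached.
        logger.exception("Problem in caller", stack_info=True)


buf = io.StringIO()
caller(make_logger(buf))
output = buf.getvalue()
```

`output` contains a `Traceback (most recent call last):` section for the `ValueError`. It is followed by a `Stack (most recent call last):` section that reaches back past `caller()` to whatever invoked it.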
Mateusz Mandera	a7039c815e	queue_processors: Fix UnboundLocalError in QueueProcessingWorker. consume_time_seconds wasn't properly defined at the beginning, so when a BaseException that isn't a subclass of Exception is thrown, the finally: block could be entered with it still undefined.	2020-08-11 10:09:42 -07:00
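The failure mode fixed here is a general Python pitfall. An `except Exception:` clause does not catch `BaseException` subclasses such as `KeyboardInterrupt`, but `finally:` still runs. Below is a simplified sketch of the safe pattern. The `do_consume` signature and the `timings` list are hypothetical, not Zulip's actual worker code.

```python
import logging
import time
from typing import Any, Callable, Dict, List, Optional


def do_consume(
    consume_func: Callable[[Dict[str, Any]], None],
    event: Dict[str, Any],
    timings: List[Optional[float]],
) -> None:
    # Bind the name before the try block: if consume_func raises a
    # BaseException that is not an Exception (e.g. KeyboardInterrupt),
    # the except clause is skipped, but finally still runs.
    consume_time_seconds: Optional[float] = None
    try:
        start = time.perf_counter()
        consume_func(event)
        consume_time_seconds = time.perf_counter() - start
    except Exception:
        logging.exception("Problem handling event")
    finally:
        # Safe: consume_time_seconds is always bound at this point.
        timings.append(consume_time_seconds)
```

Without the up-front `None`, an exception raised before the assignment would make the `finally` block raise `UnboundLocalError`, masking the original exception.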
Anders Kaseorg	8e6a439529	queue_processors: Fix strict_optional errors. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-07-06 11:25:48 -07:00
Tim Abbott	52676c0670	lint: Work around a pyflakes bug. Without this change, pyflakes reports this exception: pyflakes | zerver/worker/queue_processors.py:152:9 local variable 'e' is assigned to but never used pyflakes | zerver/worker/queue_processors.py:155:81 undefined name 'e'	2020-07-03 17:24:36 -07:00
Mateusz Mandera	d51afcf485	emails: Improve handling of timeouts when sending. We use the EMAIL_TIMEOUT Django setting to time out after 15s of trying to send an email. This will nicely lead to retries in the email_senders queue, due to the retry_send_email_failures decorator. The smtplib documentation suggests that socket.timeout can be raised as the result of timing out, though in my attempts I'm getting smtplib.SMTPServerDisconnected. Either way, it seems appropriate to add socket.timeout to the exceptions that we catch.	2020-07-03 16:52:50 -07:00
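Here is a hedged sketch of catching `socket.timeout` alongside the `smtplib` errors. The decorator name `retry_on_email_failure` and the inline retry loop are hypothetical simplifications. Per the commit message, Zulip's `retry_send_email_failures` retries by way of the `email_senders` queue rather than looping in place.

```python
import smtplib
import socket
from functools import wraps
from typing import Any, Callable, TypeVar, cast

F = TypeVar("F", bound=Callable[..., Any])

# Errors treated as transient; socket.timeout is listed explicitly
# alongside the smtplib errors.
TRANSIENT_EMAIL_ERRORS = (socket.gaierror, socket.timeout, smtplib.SMTPException)


def retry_on_email_failure(max_attempts: int = 3) -> Callable[[F], F]:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    def decorator(func: F) -> F:
        @wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except TRANSIENT_EMAIL_ERRORS:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the error
            raise AssertionError("unreachable")

        return cast(F, wrapper)

    return decorator
```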
Vishnu KS	0a36f04c20	i18n: Mark notification bot message in queue_processors for translation.	2020-06-26 14:57:18 -07:00
Mateusz Mandera	85d4536486	docs: Update some comments for the new release versioning scheme. With the new scheme, the equivalent of 2.3 is 4.0.	2020-06-25 10:33:03 -07:00
Anders Kaseorg	579f05f3ed	queue_processors: Avoid unchecked casts. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-22 17:18:19 -07:00
Tim Abbott	4d7550d705	views: Extract message_edit.py for message editing views. This is a pretty clean extraction of code that lets us shrink one of our largest files.	2020-06-22 15:08:34 -07:00
Mateusz Mandera	8d2d64c100	CVE-2020-14215: Fix validation in PreregistrationUser queries. The most important change here is the one in the maybe_send_to_registration codepath, as the insufficient validation there could lead to fetching an expired PreregistrationUser that was invited as an administrator, even years ago, so that the registration ended up with the new user being a realm administrator. Combined with the buggy migration in 0198_preregistrationuser_invited_as.py, this led to users incorrectly joining as organization administrators by accident. But even without that bug, this issue could have allowed a user who was invited as an administrator, whose invitation then expired, and who then joined via social authentication, to incorrectly join as an organization administrator. The second change is in ConfirmationEmailWorker, where this wasn't a security problem; but if the server was stopped for long enough while some invitation emails were still waiting in the queue, then after it started up again, the queue worker would send out emails for invites that had already expired.	2020-06-16 23:35:39 -07:00
Anders Kaseorg	5dc9b55c43	python: Manually convert more percent-formatting to f-strings. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-14 23:27:22 -07:00
Anders Kaseorg	1ed2d9b4a0	logging: Use logging.exception and exc_info for unexpected exceptions. logging.exception() and logging.debug(exc_info=True), etc. automatically include a traceback. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-14 23:27:22 -07:00
Anders Kaseorg	a803e68528	email-mirror-postfix: Handle 8-bit messages correctly. Since JSON can’t represent bytes, we encode them with base64. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-14 20:24:06 -07:00
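Because JSON has no bytes type, a raw 8-bit message can travel through a JSON queue payload as base64 text. Here is a minimal sketch using the standard `base64` and `json` modules. The `msg_base64` key is an assumed name, not necessarily the one Zulip uses.

```python
import base64
import json


def encode_for_queue(raw_message: bytes) -> str:
    # base64 turns arbitrary bytes (including 8-bit data) into ASCII text,
    # which JSON can represent as an ordinary string.
    payload = {"msg_base64": base64.b64encode(raw_message).decode("ascii")}
    return json.dumps(payload)


def decode_from_queue(data: str) -> bytes:
    # Reverse the steps: parse the JSON, then decode the base64 text.
    payload = json.loads(data)
    return base64.b64decode(payload["msg_base64"])
```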
Anders Kaseorg	bff3dcadc8	email: Migrate to new Python ≥ 3.3 email API. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-14 20:24:06 -07:00
Anders Kaseorg	365fe0b3d5	python: Sort imports with isort. Fixes #2665. Regenerated by tabbott with `lint --fix` after a rebase and change in parameters. Note from tabbott: In a few cases, this converts technical debt in the form of unsorted imports into different technical debt in the form of our largest files having very long, ugly import sequences at the start. I expect this change will increase pressure for us to split those files, which isn't a bad thing. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-11 16:45:32 -07:00
Anders Kaseorg	69730a78cc	python: Use trailing commas consistently. Automatically generated by the script reproduced after this entry, based on the output of lint with flake8-comma. Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2020-06-11 16:04:12 -07:00

```python
import re
import sys

last_filename = None
last_row = None
lines = []

for msg in sys.stdin:
    m = re.match(
        r"\x1b\[35mflake8 \|\x1b\[0m \x1b\[1;31m(.+):(\d+):(\d+): (\w+)", msg
    )
    if m:
        filename, row_str, col_str, err = m.groups()
        row, col = int(row_str), int(col_str)

        if filename == last_filename:
            assert last_row != row
        else:
            if last_filename is not None:
                with open(last_filename, "w") as f:
                    f.writelines(lines)

            with open(filename) as f:
                lines = f.readlines()
            last_filename = filename
        last_row = row

        line = lines[row - 1]
        if err in ["C812", "C815"]:
            lines[row - 1] = line[: col - 1] + "," + line[col - 1 :]
        elif err in ["C819"]:
            assert line[col - 2] == ","
            lines[row - 1] = line[: col - 2] + line[col - 1 :].lstrip(" ")

if last_filename is not None:
    with open(last_filename, "w") as f:
        f.writelines(lines)
```
Graham Bleaney	461d5b1a3e	pysa: Introduce sanitizers, models, and inline marking safe. This commit adds three `.pysa` model files: `false_positives.pysa` for ruling out false positive flows with `Sanitize` annotations, `req_lib.pysa` for educating pysa about Zulip's `REQ()` pattern for extracting user input, and `redirects.pysa` for capturing the risk of open redirects within Zulip code. Additionally, this commit introduces `mark_sanitized`, an identity function which can be used to selectively clear taint in cases where `Sanitize` models will not work. This commit also puts `mark_sanitized` to work removing known false positive flows.	2020-06-11 12:57:49 -07:00
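As the commit describes it, `mark_sanitized` is an identity function. Its effect comes from the static analyzer, not from runtime behavior. Here is a minimal typed sketch. It assumes the analyzer is separately told, for example through a `Sanitize` model in a `.pysa` file, to treat the return value as untainted.

```python
from typing import TypeVar

T = TypeVar("T")


def mark_sanitized(arg: T) -> T:
    # Identity at runtime: the value passes through unchanged, and the
    # TypeVar keeps its static type intact for mypy. Any taint clearing
    # happens only in the analyzer's model of this function.
    return arg
```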
Anders Kaseorg	67e7a3631d	python: Convert percent formatting to Python 3.6 f-strings. Generated by pyupgrade --py36-plus. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-10 15:02:09 -07:00
Anders Kaseorg	8dd83228e7	python: Convert "".format to Python 3.6 f-strings. Generated by pyupgrade --py36-plus --keep-percent-format, but with the NamedTuple changes reverted (see commit `ba7906a3c6`, #15132). Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-08 15:31:20 -07:00
Anders Kaseorg	19cc22e5ab	queue: Fix types to reflect that Pika channels transmit bytes. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-07 11:09:24 -07:00
Mateusz Mandera	200ce821a2	user_activity: Put client id instead of name in event dicts. This saves the completely unnecessary work of mapping the Client name to its ID. Because we had in-process caching of the immutable Client objects, this isn't a material performance win, but it will eventually let us delete that caching logic and have a simpler system.	2020-05-29 15:19:55 -07:00
Mateusz Mandera	e2262b0b64	queue_processors: Log time spent getting data for url in embed_links.	2020-05-21 12:13:46 -07:00
Mateusz Mandera	dd40649e04	queue_processors: Remove the slow_queries queue. While this functionality to post slow queries to a Zulip stream was very useful in the early days of Zulip, when there were only a few hundred accounts, it has long since become useless because (1) the total request volume on larger Zulip servers run by Zulip developers is too high, and (2) other server operators don't want real-time notifications of slow backend queries. The right structure for this is just a log file. We get rid of the queue and replace it with a "zulip.slow_queries" logger, which will still log to /var/log/zulip/slow_queries.log for ease of access to this information and propagate to the other logging handlers. Reducing the number of queues is good for lowering Zulip's memory footprint and improving restart performance, since we run at least one dedicated queue worker process for each one in most configurations.	2020-05-11 00:45:13 -07:00
Anders Kaseorg	bdc365d0fe	logging: Pass format arguments to logging. https://docs.python.org/3/howto/logging.html#optimization Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-05-02 10:18:02 -07:00
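The linked logging HOWTO recommends passing format arguments separately instead of pre-formatting the string. Formatting is then deferred until a handler actually emits the record. Below is a small self-contained sketch; the `ListHandler` class and the logger name are illustrative.

```python
import logging
from typing import List


class ListHandler(logging.Handler):
    """Collect records in memory so they can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.records: List[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


logger = logging.getLogger("lazy_format_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

# Arguments are stored on the record; the message string is only built
# when something calls getMessage() (e.g. a formatter).
logger.info("Processed %s events from %s", 42, "email_senders")
# Below the INFO threshold: the call returns early, no record is created.
logger.debug("Expensive detail: %s", "unused")

record = handler.records[0]
```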
Wyatt Hoodes	82e7ad8e25	data exports: Handle pending and failed exports. Prior to this change, there were reports of 500s in production due to `export.extra_data` being None. This was reproducible using the S3 backend in development when a row was created in the `RealmAuditLog` table, but the export failed in the `DeferredWorker`. This left an entry lying about that was never updated with an `extra_data` field. To fix this, we catch any exceptions in the `DeferredWorker`, and then update `extra_data` to encode the failure. We also fix the fact that we never updated the export UI table with pending exports. These changes also removed the need for the somewhat hacky `clear_success_banner` logic.	2020-04-30 13:00:59 -07:00
Anders Kaseorg	fead14951c	python: Convert assignment type annotations to Python 3.6 style. This commit was split by tabbott; this piece covers the vast majority of files in Zulip, but excludes scripts/, tools/, and puppet/ to help ensure we at least show the right error messages for Xenial systems. We can likely further refine the remaining pieces with some testing. Generated by com2ann, with whitespace fixes and various manual fixes for runtime issues (diff reproduced after this entry). Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2020-04-22 11:02:32 -07:00

```diff
-invoiced_through: Optional[LicenseLedger] = models.ForeignKey(
+invoiced_through: Optional["LicenseLedger"] = models.ForeignKey(

-_apns_client: Optional[APNsClient] = None
+_apns_client: Optional["APNsClient"] = None

-notifications_stream: Optional[Stream] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE)
-signup_notifications_stream: Optional[Stream] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE)
+notifications_stream: Optional["Stream"] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE)
+signup_notifications_stream: Optional["Stream"] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE)

-author: Optional[UserProfile] = models.ForeignKey('UserProfile', blank=True, null=True, on_delete=CASCADE)
+author: Optional["UserProfile"] = models.ForeignKey('UserProfile', blank=True, null=True, on_delete=CASCADE)

-bot_owner: Optional[UserProfile] = models.ForeignKey('self', null=True, on_delete=models.SET_NULL)
+bot_owner: Optional["UserProfile"] = models.ForeignKey('self', null=True, on_delete=models.SET_NULL)

-default_sending_stream: Optional[Stream] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE)
-default_events_register_stream: Optional[Stream] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE)
+default_sending_stream: Optional["Stream"] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE)
+default_events_register_stream: Optional["Stream"] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE)

-descriptors_by_handler_id: Dict[int, ClientDescriptor] = {}
+descriptors_by_handler_id: Dict[int, "ClientDescriptor"] = {}

-worker_classes: Dict[str, Type[QueueProcessingWorker]] = {}
-queues: Dict[str, Dict[str, Type[QueueProcessingWorker]]] = {}
+worker_classes: Dict[str, Type["QueueProcessingWorker"]] = {}
+queues: Dict[str, Dict[str, Type["QueueProcessingWorker"]]] = {}

-AUTH_LDAP_REVERSE_EMAIL_SEARCH: Optional[LDAPSearch] = None
+AUTH_LDAP_REVERSE_EMAIL_SEARCH: Optional["LDAPSearch"] = None
```
Mateusz Mandera	fe8f57b8b7	queue_processors: Write a newline char at the end of stats files.	2020-04-10 13:48:16 -07:00
Mateusz Mandera	5252b081bd	queue_processors: Gather statistics on queue worker operations.	2020-04-01 16:44:06 -07:00
arpit551	8f7733cb20	emails: Add placeholder strings in FromAddress. We've had a bug for a while where, if any ScheduledEmail objects get created with the wrong email sender address, even after the sysadmin corrects the problem, they'll still get errors because of the objects stored with the wrong format. We solve this by using FromAddress placeholder strings in the send_future_email function, so that ScheduledEmail objects end up setting the final `from_address` value when the mail is actually sent, using the setting in effect at that time. Fixes #11008.	2020-03-27 16:41:02 -07:00
Mateusz Mandera	5da2f80140	queue_processors: Extract a duplicated logic block into do_consume.	2020-03-22 18:45:46 -07:00
Tim Abbott	783a77c532	queue processors: Flush per-request caches after each item. Several of our queues are capable of doing work that includes rendering Markdown (outgoing_webhook, embedded_bots, embed_links, and email_mirror). As a result, it's essential that these don't cache per-request data (specifically, realm filters) longer than they should; otherwise, editing or deleting linkifiers could keep using old settings until the relevant process was restarted. Flushing these caches is extremely cheap (just clearing two dictionaries) and thus is reasonable to do after every queue event, rather than trying to do it only for the ~1/3 of queues that specifically do Markdown processing. We do the same in our middleware for reset_queries. It's not worth writing a test for this because it's very difficult to create the test setup situation for this bug with a single test worker process; one needs to edit the linkifier configuration in a different process than the one sending the message in order to see the bug. This was a much more visible bug on Zulip 2.1.x, where the presence of the message_sender queue meant that this would apply to messages sent via a browser. Fixes #14095.	2020-03-03 15:29:11 -08:00
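Here is a minimal sketch of flushing an in-memory, per-request cache after every queue event, so that a long-lived worker sees settings edited by another process. The names `per_request_cache`, `get_linkifiers`, `flush_per_request_caches`, and `consume_events` are illustrative assumptions, not Zulip's actual code.

```python
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]

# Hypothetical per-request cache, e.g. linkifier rules keyed by realm ID.
per_request_cache: Dict[int, List[str]] = {}


def flush_per_request_caches() -> None:
    # Cheap: this only empties an in-memory dictionary.
    per_request_cache.clear()


def get_linkifiers(realm_id: int, load: Callable[[int], List[str]]) -> List[str]:
    # Load from the source of truth once, then reuse within the same event.
    if realm_id not in per_request_cache:
        per_request_cache[realm_id] = load(realm_id)
    return per_request_cache[realm_id]


def consume_events(events: List[Event], handle: Callable[[Event], None]) -> None:
    for event in events:
        try:
            handle(event)
        finally:
            # Flush after every event, so the next event reloads settings
            # that may have been edited by another process.
            flush_per_request_caches()
```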
Steve Howell	2e8dec233e	slow queries: Use internal_send_stream_message(). Note that while the test mocks the actual message send, we now have a `get_stream` call in the queue worker, so we have to set up a real stream for testing (or we could have mocked that as well, but it didn't seem necessary). The setup queries add to the amount of queries reported by the test, plus the `get_stream` call. I just made the query count a digits regex, which is a little bit lame, but I don't think it's worth risking test flakes for this.	2020-02-11 12:20:54 -08:00
Mateusz Mandera	4c5a8e6f0c	queue: Remove missedmessage_email_senders.	2020-01-31 12:13:51 -08:00
Tim Abbott	d70e799466	bots: Remove FEEDBACK_BOT implementation. This legacy cross-realm bot hasn't been used in several years, as far as I know. If we wanted to re-introduce it, I'd want to implement it as an embedded bot using those common APIs, rather than the totally custom hacky code used for it that involves unnecessary queue workers and similar details. Fixes #13533.	2020-01-25 22:41:39 -08:00
Anders Kaseorg	ea6934c26d	dependencies: Remove WebSockets system for sending messages. Zulip has had a small use of WebSockets (specifically, for the code path of sending messages, via the webapp only) since ~2013. We originally added this use of WebSockets in the hope that the latency benefits of doing so would allow us to avoid implementing a Markdown local echo; they were not. Further, HTTP/2 may have eliminated the latency difference we hoped to exploit by using WebSockets in any case. While we’d originally imagined using WebSockets for other endpoints, there was never a good justification for moving more components to the WebSockets system. This WebSockets code path had a lot of downsides/complexity, including:
* The messy hack involving constructing an emulated request object to hook into doing Django requests.
* The `message_senders` queue processor system, which increases RAM needs and must be provisioned independently from the rest of the server.
* A duplicate check_send_receive_time Nagios test specific to WebSockets.
* The requirement for users to have their firewalls/NATs allow WebSocket connections, and a setting to disable them for networks where WebSockets don’t work.
* Dependencies on the SockJS family of libraries, which has at times been poorly maintained, and periodically throws random JavaScript exceptions in our production environments without a deep enough traceback to effectively investigate.
* A total of about 1600 lines of our code related to the feature.
* Increased load on the Tornado system, especially around a Zulip server restart, and especially for large installations like zulipchat.com, resulting in extra delay before messages can be sent again.
As detailed in https://github.com/zulip/zulip/pull/12862#issuecomment-536152397, it appears that removing WebSockets moderately increases the time it takes for the `send_message` API query to return from the server, but does not significantly change the time between when a message is sent and when it is received by clients. We don’t understand the reason for that change (suggesting the possibility of a measurement error), and even if it is a real change, we consider that potential small latency regression to be acceptable. If we later want WebSockets, we’ll likely want to just use Django Channels. Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2020-01-14 22:34:00 -08:00
Mateusz Mandera	89046ea1a9	email_mirror: Give extract_and_validate a more descriptive name.	2020-01-12 11:30:18 -08:00
Mateusz Mandera	c011d2c6d3	email_mirror: Migrate missed message addresses from Redis to database. Addresses point 1 of #13533. MissedMessageEmailAddress objects get tied to the specific message that was missed by the user. A useful benefit of that is that an email message sent to that address will handle topic changes: if the message that was missed gets its topic changed, the email response will get posted under the new topic, while in the old model it would get posted under the old topic, which could potentially be confusing. Migrating Redis data to this new model is a bit tricky, so the migration code has comments explaining some of the compromises made there, and test_migrations.py tests handling of the various possible cases that could arise.	2020-01-07 13:03:22 -08:00
Mateusz Mandera	e90866876c	queue: Take advantage of ABC for defining abstract worker base classes. QueueProcessingWorker and LoopQueueProcessingWorker are abstract classes meant to be subclassed by a class that will define its own consume() or consume_batch() method. ABCs are suited for that, and we can tag consume/consume_batch with the @abstractmethod decorator, which makes subclasses that don't define these methods properly impossible to even instantiate (as opposed to only crashing once consume() is called). It's also nicely detected by mypy, which will throw errors such as these on invalid use: error: Only concrete class can be given where "Type[TestWorker]" is expected error: Cannot instantiate abstract class 'TestWorker' with abstract attribute 'consume' Because mypy detects this, we can remove the test test_worker_noconsume, which just tested the old version of this: raising an exception when the unimplemented consume() gets called. Now it can be handled at the linter level.	2019-12-28 10:52:17 -08:00
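Here is a minimal sketch of the `abc` behavior this commit relies on. `QueueWorkerBase`, `EchoWorker`, and `BrokenWorker` are hypothetical stand-ins for `QueueProcessingWorker` and its subclasses.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class QueueWorkerBase(ABC):
    """Hypothetical stand-in for an abstract worker such as QueueProcessingWorker."""

    @abstractmethod
    def consume(self, event: Dict[str, Any]) -> None:
        """Process one event; every concrete subclass must implement this."""


class EchoWorker(QueueWorkerBase):
    def __init__(self) -> None:
        self.seen: List[Dict[str, Any]] = []

    def consume(self, event: Dict[str, Any]) -> None:
        self.seen.append(event)


class BrokenWorker(QueueWorkerBase):
    """Forgot to define consume(), so it stays abstract."""
```

Calling `BrokenWorker()` raises `TypeError` at runtime, and mypy flags the same call statically. That is why a dedicated runtime test for the missing method becomes unnecessary.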
Mateusz Mandera	a54640fc68	queue: Share exception handling code between loop and normal workers. LoopQueueProcessingWorker can handle exceptions inside consume_batch in a similar manner to how QueueProcessingWorker handles exceptions inside consume.	2019-12-28 10:47:36 -08:00
Tim Abbott	1465628c95	queue workers: Use self.queue_name in retry_event calls. This just adds a bit of robustness if we ever end up renaming queues.	2019-12-04 10:08:48 -08:00
Mateusz Mandera	7d0444f903	push_notifs: Improve handling of errors when talking to the bouncer. We use the plumbing introduced in a previous commit, to now raise PushNotificationBouncerRetryLaterError in send_to_push_bouncer in case of issues with talking to the bouncer server. That's a better way of dealing with the errors than the previous approach of returning a "failed" boolean, which generally wasn't checked in the code anyway and did nothing. The PushNotificationBouncerRetryLaterError exception will be nicely handled by queue processors to retry sending again, and due to being a JsonableError, it will also communicate the error to API users.	2019-12-04 09:58:22 -08:00
Mateusz Mandera	20b30e1503	push_notifs: Set up plumbing for retrying in case of bouncer error. We add PushNotificationBouncerRetryLaterError as an exception to signal that an error occurred when trying to communicate with the bouncer and that the operation should be retried. We use JsonableError as the base class, because this signal will need to work in two roles: 1. When the push notification was being issued by the queue worker PushNotificationsWorker, it will signal to the worker to requeue the event and try again later. 2. The exception will also possibly be raised (this will be added in the next commit) on codepaths coming from a request to an API endpoint (for example to add a token, to users/me/apns_device_token). In that case, it will need to provide a good error to the API user, and basing this exception on JsonableError will allow that.	2019-12-04 09:58:22 -08:00
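The two roles described in this commit can be sketched as follows. This `JsonableError` is a hypothetical minimal stand-in; Zulip's real class is more elaborate. The `{"result": "error", "msg": ...}` payload shape and the helpers `handle_push_event` and `api_view` are also illustrative assumptions.

```python
import json
from typing import Any, Callable, Dict

Event = Dict[str, Any]


class JsonableError(Exception):
    """Hypothetical minimal stand-in: an error that can render itself as a
    JSON API response."""

    def __init__(self, msg: str) -> None:
        super().__init__(msg)
        self.msg = msg

    def to_json(self) -> str:
        return json.dumps({"result": "error", "msg": self.msg})


class PushNotificationBouncerRetryLaterError(JsonableError):
    pass


def handle_push_event(
    event: Event,
    send: Callable[[Event], None],
    requeue: Callable[[Event], None],
) -> None:
    # Role 1: in a queue worker, the error means "put it back, retry later".
    try:
        send(event)
    except PushNotificationBouncerRetryLaterError:
        requeue(event)


def api_view(event: Event, send: Callable[[Event], None]) -> str:
    # Role 2: on an API code path, any JsonableError becomes an error response.
    try:
        send(event)
    except JsonableError as e:
        return e.to_json()
    return json.dumps({"result": "success", "msg": ""})
```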
Tim Abbott	6407d0b1f9	push_notifications: Clear PushDeviceToken on API key change. This includes adding a new endpoint to the push notification bouncer interface, and code to call it appropriately after resetting a user's personal API key. When we add support for a user having multiple API keys, we may need to add an additional key here to support removing keys associated with just one client.	2019-11-19 15:37:43 -08:00
Tim Abbott	bb64b0fa4d	queue processors: Switch SignupWorker to logging user IDs. This is a better setup than logging emails, especially with EMAIL_ADDRESS_VISIBILITY_ADMINS.	2019-11-15 17:07:24 -08:00
Tim Abbott	d2970a56c2	lint: Remove some unused imports. These were introduced in `ae5bc92602`.	2019-10-10 18:06:30 -07:00
Vishnu KS	ae5bc92602	queue: Don't create confirmation objects twice during invite. A confirmation object is already created when do_send_confirmation_email is called just above. Tweaked by tabbott to remove an unnecessary somewhat hacky database query.	2019-10-10 16:19:42 -07:00
Tim Abbott	1c73ce2450	user_activity: Use LoopQueueProcessingWorker strategy. This should dramatically improve the queue processor's performance in cases where there's a very high volume of requests on a given endpoint by a given user, as described in the new docstring. Until we test this more broadly in production, we won't know if this is a full solution to the problem, but I think it's likely. We've never seen the UserActivityInterval worker end up backlogged without a total queue processor outage, and it should have a similar workload. Fixes #13180.	2019-09-21 11:48:24 -07:00
Tim Abbott	f0d8951035	do_update_user_activity: Refactor to support passing a count. We'll use this in upcoming commits.	2019-09-21 11:47:14 -07:00
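Together, these two commits let the worker collapse a batch of events into one counted update per key. Below is a hedged sketch of that aggregation step. The event field names (`user_profile_id`, `client_id`, `query`, `time`) are assumptions. Writing the results to the database, for example by passing the count to `do_update_user_activity`, is left out.

```python
from typing import Any, Dict, List, Tuple

# Assumed key: (user_profile_id, client_id, query).
ActivityKey = Tuple[int, int, str]


def aggregate_activity(
    events: List[Dict[str, Any]],
) -> Dict[ActivityKey, Tuple[int, float]]:
    """Collapse a batch into (count, latest_time) per key, so each key needs
    one database update instead of one per event."""
    counts: Dict[ActivityKey, int] = {}
    latest: Dict[ActivityKey, float] = {}
    for event in events:
        key = (event["user_profile_id"], event["client_id"], event["query"])
        counts[key] = counts.get(key, 0) + 1
        latest[key] = max(latest.get(key, event["time"]), event["time"])
    return {key: (counts[key], latest[key]) for key in counts}
```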


301 Commits