zulip

Commit Graph

Author	SHA1	Message	Date
Alex Vandiver	a8a1f10f3c	digest: Clear the cache once we move to a new realm / cutoff value.	2023-09-13 13:25:59 -07:00
Alex Vandiver	b9f72bdd68	digest: Switch loop to early-abort for clarity.	2023-09-13 13:25:59 -07:00
Alex Vandiver	b555d3f553	digest: Cache per-stream recent topics, rather than batching. The query plan for fetching recent messages from the arbitrary set of streams formed by the intersection of 30 random users can be quite bad, and can descend into a sequential scan on `zerver_recipient`. Worse, this work of pulling recent messages out is redone if the stream appears in the next batch of 30 users. Instead, pull the recent messages for a stream on a one-by-one basis, but cache them in an in-memory cache. Since digests are enqueued in 30-user batches but still one-realm-at-a-time, work will be saved both in terms of faster query plans whose results can also be reused across batches. This requires that we pull the stream-id to stream-name mapping for _all_ streams in the realm at once, but that is well-indexed and unlikely to cause performance issues -- in fact, it may be faster than pulling a random subset of the streams in the realm.	2023-09-13 13:25:59 -07:00
Alex Vandiver	bca9821c89	digest: Rename get_recent_streams for clarity.	2023-09-13 13:25:59 -07:00
Alex Vandiver	524d4913b3	digest: Filter out users who have joined recently in SQL.	2023-09-13 13:25:59 -07:00
Alex Vandiver	584c202d36	digest: Remove unnecessary should_process_digest function.	2023-09-13 13:25:59 -07:00
Steve Howell	751b8b5bb5	tests: Flush per-request caches automatically for query counts.	2023-08-11 11:09:34 -07:00
Steve Howell	549891266d	tests: Add assert_memcached_count. We use a specific name to distinguish from other caches like per-request caches.	2023-08-11 11:09:34 -07:00
Anders Kaseorg	df001db1a9	black: Reformat with Black 23. Black 23 enforces some slightly more specific rules about empty line counts and redundant parenthesis removal, but the result is still compatible with Black 22. (This does not actually upgrade our Python environment to Black 23 yet.) Signed-off-by: Anders Kaseorg <anders@zulip.com>	2023-02-02 10:40:13 -08:00
Zixuan James Li	46329a2710	test_classes: Create a dedicated helper for query count check. This adds a helper based on testing patterns of using the "queries_captured" context manager with "assert_length" to check the number of queries executed for preventing performance regression. It explains the rationale of checking the query count through an "AssertionError" and prints the queries captured as assert_length does, but with a format optimized for displaying the queries in a more readable manner. Signed-off-by: Zixuan James Li <p359101898@gmail.com>	2022-10-17 11:32:52 -07:00
Mateusz Mandera	00b3546c9f	models: Add denormalized .realm column to Message. This commit adds the OPTIONAL .realm attribute to Message (and ArchivedMessage), with the server changes for making new Messages have this set. Old Messages still have to be migrated to backfill this, before it can be non-nullable. Appropriate test changes to correctly set .realm for Messages the tests manually create are included here as well.	2022-10-07 10:09:38 -07:00
Mateusz Mandera	5850c38f4e	test_digest: Use proper stream.id in test_get_hot_topics. Just using values 1 and 2 as stream ids is not good, because there's no idea in which realm these streams are (or hypothetically if they exist). This can create weird Messages with sender being a user of "zulip" realm and the stream being in another realm - which would be a corrupted state.	2022-09-28 16:45:25 +02:00
Adam Sah	ba5cf331a2	testing: 100% coverage for zerver/tests/test_digest.py.	2022-06-01 16:09:13 -07:00
Anders Kaseorg	b572b18e70	test_digest: Modernize set literal syntax. Generated by pyupgrade. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2022-04-27 12:57:49 -07:00
Mateusz Mandera	fcf82bf047	digest: Don't send emails to deactivated users, even if queued.	2022-04-15 14:32:55 -07:00
Mateusz Mandera	7a13836d26	test_digest: Fix typo in a comment.	2022-04-15 14:32:55 -07:00
Anders Kaseorg	cbad5739ab	actions: Split out zerver.actions.create_user. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2022-04-14 17:14:35 -07:00
Anders Kaseorg	b0ce4f1bce	docs: Fix many spelling mistakes. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2022-02-07 18:51:06 -08:00
Anders Kaseorg	90e202cd38	docs: Consistently hyphenate “web-public”. In English, compound adjectives should essentially always be hyphenated. This makes them easier to parse, especially for users who might not recognize that the words “web public” go together as a phrase. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2022-01-28 17:45:45 -08:00
Steve Howell	2902f8b931	tests: Ensure stream senders get a UserMessage row. We now complain if a test author sends a stream message that does not result in the sender getting a UserMessage row for the message. This is basically 100% equivalent to complaining that the author failed to subscribe the sender to the stream as part of the test setup, as far as I can tell, so the AssertionError instructs the author to subscribe the sender to the stream. We exempt bots from this check, although it is plausible we should only exempt the system bots like the notification bot. I considered auto-subscribing the sender to the stream, but that can be a little more expensive than the current check, and we generally want test setup to be explicit. If there is some legitimate way that a subscribed human sender can't get a UserMessage, then we probably want an explicit test for that, or we may want to change the backend to just write a UserMessage row in that hypothetical situation. For most tests, including almost all the ones fixed here, the author just wants their test setup to realistically reflect normal operation, and often devs may not realize that Cordelia is not subscribed to Denmark or not realize that Hamlet is not subscribed to Scotland. Some of us don't remember our Shakespeare from high school, and our stream subscriptions don't even necessarily reflect which countries the Bard placed his characters in. There may also be some legitimate use case where an author wants to simulate sending a message to an unsubscribed stream, but for those edge cases, they can always set allow_unsubscribed_sender to True.	2021-12-10 09:40:04 -08:00
Anders Kaseorg	3665deb93a	python: Remove unnecessary intermediate lists. Generated automatically by pyupgrade. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-08-02 15:53:52 -07:00
akshatdalton	e203112fd4	refactor: Use `assert_length` helper instead of `assertTrue/assertEqual`.	2021-07-13 13:03:38 -07:00
shanukun	4b67946605	refactor: Make acting_user a mandatory kwarg for do_create_user.	2021-02-25 17:58:00 -08:00
Anders Kaseorg	6e4c3e41dc	python: Normalize quotes with Black. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-02-12 13:11:19 -08:00
Anders Kaseorg	11741543da	python: Reformat with Black, except quotes. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-02-12 13:11:19 -08:00
Vishnu KS	5c026d67e3	digest: Sort topics in descending order in get_hot_topics. We want topics with high diversity and large lengths. So they should be sorted with reverse=True. This bug seems to be introduced in `936171d258`	2021-02-09 10:35:47 -08:00
Alex Vandiver	d0f0c2f2ed	digest: Fix the structure that we enqueue across when digesting. This rename was missed in `bfa0bdf3d6`. Without this fix, digest messages fail to send.	2021-02-08 17:28:59 -08:00
Steve Howell	1040fb7219	email digests: Remove handle_digest_email shim. The previous commit made it so we only call the shim in tests, so now we completely remove it.	2021-01-17 11:28:30 -08:00
Steve Howell	bfa0bdf3d6	email digests: Process users in chunks of 30. This should make the queue empty more quickly, because we do bulk queries to prevent database hops.	2021-01-17 11:28:30 -08:00
Steve Howell	e0b451730a	email digests: Extract get_new_streams. This makes us more efficient when handling multiple users. We don't have to keep sending the same two queries to the database. Note that as part of this we eliminated a failure mode for the obscure population of users for whom both `user.is_guest` and `user.can_access_public_streams()` return False. We know this would have only affected Zephyr users (by looking at the code), and we know we don't actually process Zephyr users for email digests (or else we would have raised exceptions in the old code).	2021-01-17 11:28:30 -08:00
Steve Howell	23de94504f	email digests: Query streams for messages up front. This should save us many hops to the database when we process users in bulk.	2021-01-17 11:28:30 -08:00
Steve Howell	f8bbb7fea9	email digests: Use select_related("realm"). We mostly need realm_id, but when we go to build message lists, we need realm.uri. We could probably be more aggressive about using `only` here, but for now I am just trying to reduce hops to the database.	2021-01-17 11:28:29 -08:00
Steve Howell	52e2d5a733	email digests: Avoid long_term_idle check. We want to exclude users with recent subscription activity from emails, regardless of whether the long_term_idle flag is set.	2021-01-17 11:28:29 -08:00
Steve Howell	162b372b93	email digests: Do one query for recent streams. This is another way to limit hops to the database when we process users in bulk.	2021-01-17 11:28:29 -08:00
Steve Howell	e2e0f06b2a	email digests: Call get_recent_topics once per batch. Once we start processing digests in batch, this will let us amortize the expense of the message query over multiple users.	2020-11-16 08:59:29 -08:00
Steve Howell	1d1e45e9ec	digests: Use UserActivityInterval for user activity. Note that we are much more efficient about finding active users here: - we do one query per realm (instead of per-user) - we pass the cutoff date to the database - we get back just a list of distinct ids	2020-11-16 08:59:29 -08:00
Steve Howell	b52f56080e	performance: Just get user_ids to queue digest emails.	2020-11-16 08:59:29 -08:00
Steve Howell	d0260392f7	digests: Get user objects from the database. The query counts increase here for somewhat contrived reasons. The tests before this commit reflected a successful trip to the UserProfile cache, but that's not actually realistic in practice.	2020-11-16 08:59:29 -08:00
Steve Howell	7737413cec	digest tests: Improve gather_new_streams test. We don't need to mock the dates here. We also explicitly clear out all streams first, and then we explicitly test with both the stream being current and the stream being old.	2020-11-16 08:59:28 -08:00
Steve Howell	9538edde06	digest tests: Simplify bots test. We can use the _enqueue_emails_for_realm helper to avoid all the Tuesday-related logic here. We also don't bother to create UserActivity records, since the bot gets excluded by virtue of its being a bot. (Also, the date ranges here were sketchy due to the time mocking.)	2020-11-16 08:59:28 -08:00
Steve Howell	0624833af6	digest tests: Improve Tuesday tests. If we're mocking time, we should do it consistently.	2020-11-16 08:59:28 -08:00
Steve Howell	2f4d7a6171	tests: Fix test_inactive_users_queued_for_digest. We can avoid all the date mocking now for all but a couple tests that exercise the is-it-Tuesday logic. And this test now correctly tests that we exclude recently active users. And this allows us to remove the other test.	2020-11-16 08:59:28 -08:00
Steve Howell	cf6bcfb84a	digest emails: Exclude users who had recent digests. This code protects us in case we ever need to re-run email digests twice in the same day.	2020-11-16 08:59:28 -08:00
Steve Howell	fb3d4c1618	digest tests: Avoid warnings about naive time.	2020-11-16 08:59:28 -08:00
Steve Howell	4271442fba	email digests: Write RealmAuditLog rows.	2020-11-16 08:59:28 -08:00
Steve Howell	c5dc9d386f	refactor: Use sets of stream_ids for email digests. I now use sets for stream_ids in more of the digest code. As part of this I replaced exclude_subscription_modified_streams with streams_recently_modified_for_user. It's easier for the caller to just ask for ids to delete from its callee than it is to pass in a set/list to mutate. The simpler boundary between the functions makes the tests easier to write--you can see the `filtered_streams` logic goes away in this diff. I also make the tests a bit more thorough by using combinations of Cordelia/Othello and Verona/Denmark to try to find multiple possible flaws. And I make the time intervals longer than 1s to avoid false negatives from slow CI boxes.	2020-11-05 17:42:43 -08:00
Steve Howell	88a57ed4ac	bulk digest: Get stream subscriptions in bulk. If we have multiple users, this reduces the amount of queries we need to do, because we get all subscriptions for all users in a single query to Subscription. For the single-user case, we are introducing an extra query hop, but the database is doing roughly the same work, because we are just breaking up this complex query into two hops: messages = select ... from message where recipient__type_id in ( select stream_id from subscription where ... ) Now it's more like: stream_ids = select stream_id from subscription where ... messages = select ... from message where recipient__type_id in stream_ids	2020-11-05 09:36:59 -08:00
Steve Howell	c83db37161	email digests: Introduce bulk methods for digest. Note that we are not changing anything semantically or algorithmically yet. The only overhead here for the single-user case is boxing and unboxing data into single-item dicts and lists. The interfaces for callers in the view and the queue processor remain the same for now.	2020-11-05 09:36:59 -08:00
Steve Howell	0e2d02b0a2	digest tests: Count cache tries.	2020-11-05 09:36:59 -08:00
Steve Howell	127f4e1291	digest tests: Add more users to bulk digest test.	2020-11-05 09:36:59 -08:00


88 Commits