Commit Graph

61 Commits

Author SHA1 Message Date
Steve Howell 1040fb7219 email digests: Remove handle_digest_email shim.
The previous commit made it so we only call the
shim in tests, so now we completely remove it.
2021-01-17 11:28:30 -08:00
Steve Howell bfa0bdf3d6 email digests: Process users in chunks of 30.
This should make the queue empty more quickly,
because we do bulk queries to prevent database
hops.
2021-01-17 11:28:30 -08:00
Steve Howell e0b451730a email digests: Extract get_new_streams.
This makes us more efficient when handling
multiple users.  We don't have to keep
sending the same two queries to the database.

Note that as part of this we eliminated
a failure mode for the obscure population
of users from whom both `user.is_guest` and
`user.can_access_public_streams()` returns
False.  We know this would have only affected
Zephyr users (by looking at the code), and
we know we don't actually process Zephyr
users for email digests (or else we would
have raised exceptions in the old code).
2021-01-17 11:28:30 -08:00
Steve Howell 23de94504f email digests: Query streams for messages up front.
This should save us many hops to the database when
we process users in bulk.
2021-01-17 11:28:30 -08:00
Steve Howell f8bbb7fea9 email digests: Use select_related("realm").
We mostly need realm_id, but when we go to build
message lists, we need realm.uri.

We could probably be more aggresive about using
`only` here, but for now I am just trying to
reduce hops to the database.
2021-01-17 11:28:29 -08:00
Steve Howell 52e2d5a733 email digests: Avoid long_term_idle check.
We want to exclude users with recent subscription
activity from emails, regardless of whether
the long_term_idle flag is set.
2021-01-17 11:28:29 -08:00
Steve Howell 162b372b93 email digests: Do one query for recent streams.
This is another way to limit hops to the database
when we process users in bulk.
2021-01-17 11:28:29 -08:00
Steve Howell e2e0f06b2a email digests: Call get_recent_topics once per batch.
Once we start processing digests in batch, this will
let us amortize the expense of the message query
over multiple users.
2020-11-16 08:59:29 -08:00
Steve Howell 1d1e45e9ec digests: Use UserActivityInterval for user activity.
Note that we are much more efficient about finding
active users here:

    - we do one query per realm (instead of per-user)
    - we pass the cutoff date to the database
    - we get back just a list of distinct ids
2020-11-16 08:59:29 -08:00
Steve Howell b52f56080e performance: Just get user_ids to queue digest emails. 2020-11-16 08:59:29 -08:00
Steve Howell d0260392f7 digests: Get user objects from the database.
The query counts increase here for somewhat
contrived reasons.  The tests before this
commit reflected a successful trip to the
UserProfile cache, but that's not actually
realistic in practice.
2020-11-16 08:59:29 -08:00
Steve Howell 7737413cec digest tests: Improve gather_new_streams test.
We don't need to mock the dates here.  We also
explicitly clear out all streams first, and then
we explicitly test with both the stream being
current and the stream being old.
2020-11-16 08:59:28 -08:00
Steve Howell 9538edde06 digest tests: Simplify bots test.
We can use the _enqueue_emails_for_realm helper
to avoid all the Tuesday-related logic here.

We also don't bother to create UserActivity
records, since the bot gets excluded by virtue
of its being a bot.  (Also, the date ranges
here were sketchy due to the time mocking.)
2020-11-16 08:59:28 -08:00
Steve Howell 0624833af6 digest tests: Improve Tuesday tests.
If we're mocking time, we should do it consistently.
2020-11-16 08:59:28 -08:00
Steve Howell 2f4d7a6171 tests: Fix test_inactive_users_queued_for_digest.
We can avoid all the date mocking now for all
but a couple tests that exercise the is-it-Tuesday
logic.

And this test now correctly tests that we exclude
recently active users.

And this allows us to remove the other test.
2020-11-16 08:59:28 -08:00
Steve Howell cf6bcfb84a digest emails: Exclude users who had recent digests.
This code protects us in case we ever need to re-run
email digests twice in the same day.
2020-11-16 08:59:28 -08:00
Steve Howell fb3d4c1618 digest tests: Avoid warnings about naive time. 2020-11-16 08:59:28 -08:00
Steve Howell 4271442fba email digests: Write RealmAuditLog rows. 2020-11-16 08:59:28 -08:00
Steve Howell c5dc9d386f refactor: Use sets of stream_ids for email digests.
I now use sets for stream_ids in more of the digest
code.

As part of this I replaced exclude_subscription_modified_streams
with streams_recently_modified_for_user.

It's easier for the caller to just ask for ids
to delete from its callee than it is to pass
in a set/list to mutate.

The simpler boundary between the functions makes
the tests easier to write--you can see the
`filtered_streams` logic goes away in this diff.

I also make the tests a bit more thorough by using
combinations of Cordelia/Othello and Verona/Denmark
to try to find multiple possible flaws.

And I make the time intervals longer than 1s to
avoid false negatives from slow CI boxes.
2020-11-05 17:42:43 -08:00
Steve Howell 88a57ed4ac bulk digest: Get stream subscriptions in bulk.
If we have multiple users, this reduces the amount
of queries we need to do, because we get all
subscriptions for all users in a single query
to Subscription.

For the single-user case, we are introducing an
extra query hop, but the database is doing
roughly the same work, because we are just breaking
up this complex query into two hops:

    messages =
        select ...  from message
        where recipient__type_id in (
            select stream_id from subscription
            where ...
        )

Now it's more like:

    stream_ids =
        select stream_id from subscription
        where ...

    messages =
        select ... from message
        where recipient__type_id in stream_ids
2020-11-05 09:36:59 -08:00
Steve Howell c83db37161 email digests: Introduce bulk methods for digest.
Note that we are not changing anything semantically
or algorithmically yet.  The only overhead here
for the single-user case is boxing and unboxing
data into single-item dicts and lists.

The interfaces for callers in the view and the
queue processor remain the same for now.
2020-11-05 09:36:59 -08:00
Steve Howell 0e2d02b0a2 digest tests: Count cache tries. 2020-11-05 09:36:59 -08:00
Steve Howell 127f4e1291 digest tests: Add more users to bulk digest test. 2020-11-05 09:36:59 -08:00
Steve Howell 89cb3fa841 digest tests: Localize mocks.
We didn't need the enough-traffic mock.

We also continue to prep for testing multiple users.

I also finally remove a comment that is about to
be addressed (and which inaccurately refers to huddles).
2020-11-05 09:36:59 -08:00
Steve Howell 1ec16dd1da digest tests: Prep to test bulk digests.
All this does, essentially, is put the logic
we used to test for othello inside of a loop.

We'll add more users in the next commit.
2020-11-05 09:36:59 -08:00
Steve Howell 637f596751 tests: Fix queries_captured to clear cache up front.
Before this change we were clearing the cache on
every SQL usage.

The code to do this was added in February 2017
in 6db4879f9c.

Now we clear the cache just one time, but before
the action/request under test.

Tests that want to count queries with a warm
cache now specify keep_cache_warm=True.  Those
tests were particularly flawed before this change.

In general, the old code both over-counted and
under-counted queries.

It under-counted SQL usage for requests that were
able to pull some data out of a warm cache before
they did any SQL.  Typically this would have bypassed
the initial query to get UserProfile, so you
will see several off-by-one fixes.

The old code over-counted SQL usage to the extent
that it's a rather extreme assumption that during
an action itself, the entries that you put into
the cache will get thrown away.  And that's essentially
what the prior code simulated.

Now, it's still bad if an action keeps hitting the
cache for no reason, but it's not as bad as hitting
the database.  There doesn't appear to be any evidence
of us doing something silly like fetching the same
data from the cache in a loop, but there are
opportunities to prevent second or third round
trips to the cache for the same object, if we
can re-structure the code so that the same caller
doesn't have two callees get the same data.

Note that for invites, we have some cache hits
that are due to the nature of how we serialize
data to our queue processor--we generally just
serialize ids, and then re-fetch objects when
we pop them off the queue.
2020-11-05 09:35:15 -08:00
Clara Dantas 8674287192 digest: Support digest of web public streams for guest users.
This change requires some basic plumbing for test code creating
web-public streams.
2020-09-25 16:11:04 -07:00
Anders Kaseorg 022c4fbfc7 Revert "digest: Support digest of web public streams for guest users."
This reverts commit c3779338c6 (part
of #14638), which incorrectly depended on commits from the future,
with the effect of either halting the flow of entropic time in an
irresolvable temporal paradox, summoning extradimensional beings to
rain destruction on the galaxy, or failing CI.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-07-29 21:05:59 -07:00
Clara Dantas c3779338c6 digest: Support digest of web public streams for guest users. 2020-07-29 17:52:36 -07:00
Steve Howell c44500175d database: Remove short_name from UserProfile.
A few major themes here:

    - We remove short_name from UserProfile
      and add the appropriate migration.

    - We remove short_name from various
      cache-related lists of fields.

    - We allow import tools to continue to
      write short_name to their export files,
      and then we simply ignore the field
      at import time.

    - We change functions like do_create_user,
      create_user_profile, etc.

    - We keep short_name in the /json/bots
      API.  (It actually gets turned into
      an email.)

    - We don't modify our LDAP code much
      here.
2020-07-17 11:15:15 -07:00
Anders Kaseorg 365fe0b3d5 python: Sort imports with isort.
Fixes #2665.

Regenerated by tabbott with `lint --fix` after a rebase and change in
parameters.

Note from tabbott: In a few cases, this converts technical debt in the
form of unsorted imports into different technical debt in the form of
our largest files having very long, ugly import sequences at the
start.  I expect this change will increase pressure for us to split
those files, which isn't a bad thing.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-11 16:45:32 -07:00
Anders Kaseorg 8dd83228e7 python: Convert "".format to Python 3.6 f-strings.
Generated by pyupgrade --py36-plus --keep-percent-format, but with the
NamedTuple changes reverted (see commit
ba7906a3c6, #15132).

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-08 15:31:20 -07:00
Anders Kaseorg 8d20d1e632 models: Set a timezone on the MutedTopic.date_muted default.
Fixes warnings like this:

/srv/zulip-py3-venv/lib/python3.8/site-packages/django/db/models/fields/__init__.py:1424: RuntimeWarning: DateTimeField MutedTopic.date_muted received a naive datetime (2020-01-01 00:00:00) while time zone support is active.
 warnings.warn("DateTimeField %s received a naive datetime (%s)"

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-05 09:34:17 -07:00
Anders Kaseorg 840cf4b885 requirements: Drop direct dependency on mock.
mock is just a backport of the standard library’s unittest.mock now.

The SAMLAuthBackendTest change is needed because
MagicMock.call_args.args wasn’t introduced until Python
3.8 (https://bugs.python.org/issue21269).

The PROVISION_VERSION bump is skipped because mock is still an
indirect dev requirement via moto.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-05-26 11:40:42 -07:00
Udit107710 16218d6de3 streams: Remove dependency of streams on actions.
Refactored code in actions.py and streams.py to move stream related
functions into streams.py and remove the dependency on actions.py.

validate_sender_can_write_to_stream function in actions.py was renamed
to access_stream_for_send_message in streams.py.
2020-04-18 16:56:59 -07:00
Anders Kaseorg c734bbd95d python: Modernize legacy Python 2 syntax with pyupgrade.
Generated by `pyupgrade --py3-plus --keep-percent-format` on all our
Python code except `zthumbor` and `zulip-ec2-configure-interfaces`,
followed by manual indentation fixes.

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-04-09 16:43:22 -07:00
Steve Howell 1b16693526 tests: Limit email-based logins.
We now have this API...

If you really just need to log in
and not do anything with the actual
user:

    self.login('hamlet')

If you're gonna use the user in the
rest of the test:

    hamlet = self.example_user('hamlet')
    self.login_user(hamlet)

If you are specifically testing
email/password logins (used only in 4 places):

    self.login_by_email(email, password)

And for failures uses this (used twice):

    self.assert_login_failure(email)
2020-03-11 17:10:22 -07:00
Steve Howell 00dc976379 tests: Use users for common_subscribe_to_streams.
We also use users for get_streams().
2020-03-11 14:18:29 -07:00
Steve Howell 5e2a32c936 tests: Use users in send_*_message.
This commit mostly makes our tests less
noisy, since emails are no longer an important
detail of sending messages (they're not even
really used in the API).

It also sets us up to have more scrutiny
on delivery_email/email in the future
for things that actually matter.  (This is
a prep commit for something along those
lines, kind of hard to explain the full
plan.)
2020-03-07 18:30:13 -08:00
Mateusz Mandera dbe508bb91 models: Migration of Message.pub_date to date_sent, part 2.
Fixes #1727.

With the server down, apply migrations 0245 and 0246. 0246 will remove
the pub_date column, so it's essential that the previous migrations
ran correctly to copy data before running this.
2019-10-05 19:01:34 -07:00
Vishnu Ks 66ee5b870a emails: Extract get_narrow_url into a function.
Reorganized a bit by tabbott to avoid doing a bunch of database
queries twice.
2019-06-28 11:38:17 -07:00
Puneeth Chaganti 62d9ad534c digest: Trigger additional query to make tests more deterministic.
A couple of tests asserted that the number of queries were within a range,
because they ran one additional query when they were run individually, as
compared to running all the tests in `TestDigestEmailMessages`. We now trigger
these additional queries within the tests, to make the tests deterministic and
assert that the number of queries is a number, instead of a range.
2019-05-09 15:10:05 -07:00
Puneeth Chaganti ab2850c225 digest: Re-enable digest emails for soft deactivated users.
Digest emails were disabled for soft deactivated users, since UserMessage
objects are created for such users lazily when they return.

We now compute the message list for gathering hot conversations by looking at
all the messages sent to the streams where the user is subscribed, while they
were subscribed.

Fixes #6297
2019-05-09 15:10:05 -07:00
Puneeth Chaganti 6abed82fb9 digest: Use one hour cutoff to generate digest emails in test.
Otherwise, the test may flake on a slow/hosed machine, where simulating a
conversation takes longer than 1 sec.
2019-05-09 15:10:05 -07:00
Puneeth Chaganti d474a41c03 digest: Turn off digest_emails_enabled flag for realms by default. 2019-05-08 14:39:12 -07:00
Puneeth Chaganti 735b6cb761 digest: Remove code to gather new users and unread pms. 2019-05-06 17:43:53 -07:00
Puneeth Chaganti be762f9485 digest: Strip down the digest email removing a lot of fluff. 2019-05-06 17:43:52 -07:00
Puneeth Chaganti cb5e9107f4 digest: Directly fetch recipient ids from the DB.
Instead of iterating over Subscriptions and creating the list of home view
recipients, the query now only fetches recipient IDs from the DB.
2019-03-09 23:25:26 -08:00
Roman Godov 9c8eeaed85 digest_email: Add endpoint for rendering digest to the web.
Adds "/digest/" endpoint for rendering content of digest email
to the web.

Fixes #9974
2018-12-11 13:38:30 -08:00
Raymond Akornor 92dc3637df send_email: Add support for multiple recipients.
This adds a function that sends provided email to all administrators
of a realm, but in a single email. As a result, send_email now takes
arguments to_user_ids and to_emails instead of to_user_id and
to_email.

We adjust other APIs to match, but note that send_future_email does
not yet support the multiple recipients model for good reasons.

Tweaked by tabbott to modify `manage.py deliver_email` to handle
backwards-compatibily for any ScheduledEmail objects already in the
database.

Fixes #10896.
2018-12-03 15:12:11 -08:00