801 Commits

Author SHA1 Message Date
Tim Abbott 28eaf5620e emails: Use common_context for email change notifications.
This also lets us remove `realm_uri`.
2017-09-27 16:48:18 -07:00
Steve Howell 1b518f1983 Return mentioned users in get_user_info_for_message_updates().
The dictionary result for get_user_info_for_message_updates()
now has a `mention_user_ids` field that is a set of user ids
who were mentioned in a message.
2017-09-27 16:01:50 -07:00
Steve Howell 646abb57b7 refactor: Extract get_user_info_for_message_updates.
We'll want to expand this to get users that were mentioned in
the prior message, but this commit is just a refactoring.
2017-09-27 16:01:50 -07:00
Steve Howell 5abf52de71 Extract get_idle_userids().
This function will help us send missed-message mails for
updates, in a future commit.
2017-09-27 16:01:50 -07:00
rht f43e54d352 zerver/lib: Remove absolute_import. 2017-09-27 10:00:39 -07:00
Steve Howell b340b28055 Extract get_service_bot_events().
There are several reasons to extract this function:

    * It's easy to unit test without extensive mocking.
    * It will show up when we profile code.
    * It is something that you can mostly ignore for
      most messages.

The main reason to extract this, though, is that we are about
to do some fairly complex splicing of data for the use case
of mentioning service bots on streams they are not subscribed to,
and we want to localize the complexity.
2017-09-26 18:49:03 -07:00
Tim Abbott c11b1623dd addressee: Accept a realm object in legacy_build.
This fixes a bug where the internal_prep_message code path would
incorrectly ignore the `realm` that was passed into it.  As a result,
attempts to send messages using the system bots with this code path
would crash.

As a side note, we really need to make our test system consistent with
production in terms of whether the user's realm is the same as the
system realm.
2017-09-25 14:06:29 -07:00
Tim Abbott 14c1660a55 addressee: Pass realm into get_user_profiles.
We don't access any attributes of the sender other than the realm, and
as it turns out, we in some cases want to use a different realm than
the sender's.
2017-09-25 14:00:46 -07:00
Tim Abbott 8e2c91b09c actions: Use internal_send_private_message. 2017-09-25 13:59:04 -07:00
Tim Abbott 1edd137263 RealmAuditLog: Pass acting_user to do_reactivate_user. 2017-09-22 07:33:02 -07:00
Rishi Gupta 6ec3595b77 emails: Change enqueue_welcome_emails to take a user rather than user_id. 2017-09-22 06:20:33 -07:00
Rishi Gupta a7c8770f97 emails: Move enqueue_welcome_emails outside of signups queue.
The only thing this queue should do is sign you up for the newsletter, since
it is only populated if newsletter_data is not None.
2017-09-22 06:20:33 -07:00
Tim Abbott f706f657c0 signup: Fix invitation emails not being cleared properly.
Previously, invitation reminder emails were only being cleared after a
successful signup if newsletter_data was available, since that was the
circumstance in which we were calling the relevant queue processor
code.  Now, we (1) clear them when a human user finishes signing up
and (2) correctly clear them using the 'address' field of
ScheduleEmail, not user_id.
2017-09-21 06:15:11 -07:00
Steve Howell 428d3027c2 Only require ids for finding DefaultStream objects.
We don't need full Realm objects to find DefaultStream
objects for a realm.  So now a few functions related to
adding/removing default streams use realm_id for lookups.

Similarly, we don't need a full Stream object to find
out if a stream exists in DefaultStream, so we do id
lookups there as well.

This sets us up to use thinner objects in callers.
2017-09-20 10:31:33 -07:00
Steve Howell fd58d472a5 Use update_fields in do_deactivate_stream.
We are generally explicit about which fields we save.
2017-09-20 10:31:33 -07:00
Steve Howell 7b2340decd Extract stream_name_in_use().
Checking to see if a stream exists is more idiomatic
if we just use exists() from Django.  We encapsulate it
for case insensitivity purposes.
2017-09-20 10:31:33 -07:00
Steve Howell 9046efb71a Use `prereg.users.set()` in do_invite_users.
This is a bit more idiomatic for many-to-many relationships.
2017-09-20 10:31:33 -07:00
Steve Howell 8ad7133351 Cache active_user_ids() more directly.
We now have a dedicated cache for active_user_ids() that only
stores a list of user_ids.

Before this commit, active_user_ids() used a cache of UserProfile
dictionaries, so it incurred unnecessary deserialization costs for
all the user fields that it sliced away in a list comprehension.

Because the cache is skinnier here, we also need to invalidate it
less frequently.  Basically, all we care about is new users, realm
deactivations, and user deactivations.

It's hard to measure how much this will improve performance, because
the speedup for any operation here is pretty minor, but we use this
function a lot, so hopefully it will make the overall system more
healthy.
2017-09-20 10:31:33 -07:00
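The idea behind 8ad7133351 (a dedicated cache that holds only user ids and is invalidated on a small set of events) can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the `ActiveUserIdsCache` class, the in-process dictionary, and the `query_active_ids` stand-in are all assumptions made for the example.

```python
from typing import Callable, Dict, List


class ActiveUserIdsCache:
    """Hypothetical per-realm cache that stores only lists of user ids."""

    def __init__(self, query: Callable[[int], List[int]]) -> None:
        self._query = query  # stand-in for the real database query
        self._by_realm: Dict[int, List[int]] = {}

    def active_user_ids(self, realm_id: int) -> List[int]:
        # Only hit the "database" when the realm is not cached yet.
        if realm_id not in self._by_realm:
            self._by_realm[realm_id] = self._query(realm_id)
        return self._by_realm[realm_id]

    def invalidate(self, realm_id: int) -> None:
        # Called on new users, user deactivations, realm deactivations.
        self._by_realm.pop(realm_id, None)


db = {1: [10, 11]}  # hypothetical realm_id -> active user ids
queries = []        # records every realm_id that reached the "database"


def query_active_ids(realm_id: int) -> List[int]:
    queries.append(realm_id)
    return list(db[realm_id])


cache = ActiveUserIdsCache(query_active_ids)
first = cache.active_user_ids(1)
second = cache.active_user_ids(1)  # served from the cache
db[1].append(12)                   # a new user joins the realm
cache.invalidate(1)
third = cache.active_user_ids(1)   # re-queried after invalidation
```

Because the cached value contains nothing but ids, changes to other user fields never require invalidation in this sketch.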
Steve Howell cad3a35b6a Only require realm_id for active_user_ids().
This is mostly a preparatory commit for an upcoming optimization
related to stream data, but it probably does save us an
occasional DB hop to the realm table.
2017-09-20 10:31:33 -07:00
Steve Howell 26735eeeac Only require realm_id for get_active_user_dicts_in_realm().
This is a preparatory commit that will eventually allow us
to avoid fetching realm info that we don't need, in other
parts of the codebase.
2017-09-20 10:31:33 -07:00
Steve Howell 0966bf1a48 Simplify get_stream_cache_key().
Before this commit, we could pass in either a Realm object
or a realm_id to get_stream_cache_key().  Now we consistently
pass it a realm_id.
2017-09-20 10:31:33 -07:00
Greg Price 0c7dbd2e8a message send: Cut is_active from the values query in get_recipient_info.
This is unused since the query started filtering on is_active=True, in
51d4f16fe "Ignore inactive users in get_recipient_info()."
2017-09-19 20:08:39 -07:00
Steve Howell c4b17f2f80 Optimize get_recipient_info() with get_ids_for().
The get_ids_for() function produces a 2x speedup for
1000 users.
2017-09-16 03:07:13 -07:00
Steve Howell 84041d3195 Use itertools.groupby in bulk_get_subscriber_user_ids().
This results in about a 20% speedup by making more O(N)
things happen in C vs. Python.
2017-09-15 10:44:32 -07:00
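As a rough illustration of the `itertools.groupby` technique named in 84041d3195, the sketch below groups `(recipient_id, user_id)` rows into per-recipient lists. The row shape and the function name `group_user_ids_by_recipient` are assumptions for the example, not the code from the commit.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, List, Tuple


def group_user_ids_by_recipient(
    rows: Iterable[Tuple[int, int]]
) -> Dict[int, List[int]]:
    """Group (recipient_id, user_id) rows by recipient_id.

    The rows must already be sorted by recipient_id (for example via
    ORDER BY), because groupby only merges adjacent items with equal keys.
    """
    result: Dict[int, List[int]] = {}
    for recipient_id, group in groupby(rows, key=itemgetter(0)):
        result[recipient_id] = [user_id for _, user_id in group]
    return result


rows = [(10, 1), (10, 2), (11, 1), (12, 3), (12, 4)]
grouped = group_user_ids_by_recipient(rows)
```

The grouping loop itself runs inside `groupby`, which is implemented in C in CPython; only the per-group list construction remains in Python.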
Steve Howell 24b9f72b22 Use raw SQL in bulk_get_subscriber_user_ids().
This leads to more than a 2x speedup when tested with
20k+ total subscribers.  (For large realms with lots of default
streams, this function deals with LOTS of data, so it is important
to optimize.)
2017-09-15 10:44:32 -07:00
Steve Howell 1553dc00e0 Introduce StreamRecipient class.
This class encapsulates the mapping of stream ids to
recipient ids, and it is optimized for bulk use and
repeated use (i.e. it remembers values it already fetched).

This particular commit barely improves the performance
of gather_subscriptions_helper, but it sets us up for
further optimizations.

Long term, we may try to denormalize stream_id on to the
Subscriber table or otherwise modify the database so we
don't have to jump through hoops to do this kind of mapping.
This commit will help enable those changes, because we
isolate the mapping to this one new class.
2017-09-15 10:44:32 -07:00
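The description of 1553dc00e0 (a mapping of stream ids to recipient ids that is optimized for bulk use and remembers values it already fetched) might look something like the sketch below. The class name `StreamRecipientMap`, its method names, and the `fake_fetch` stand-in for the database lookup are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Iterable, List


class StreamRecipientMap:
    """Hypothetical sketch: stream_id -> recipient_id with memoization."""

    def __init__(self, fetch: Callable[[List[int]], Dict[int, int]]) -> None:
        self._fetch = fetch  # stand-in for one bulk database query
        self._stream_to_recip: Dict[int, int] = {}

    def populate_for_stream_ids(self, stream_ids: Iterable[int]) -> None:
        # Fetch only the ids we have not seen, in a single bulk call.
        missing = sorted(set(stream_ids) - set(self._stream_to_recip))
        if missing:
            self._stream_to_recip.update(self._fetch(missing))

    def recipient_id_for(self, stream_id: int) -> int:
        if stream_id not in self._stream_to_recip:
            self.populate_for_stream_ids([stream_id])
        return self._stream_to_recip[stream_id]


fetch_calls = []


def fake_fetch(stream_ids: List[int]) -> Dict[int, int]:
    fetch_calls.append(list(stream_ids))
    return {stream_id: stream_id + 100 for stream_id in stream_ids}


stream_map = StreamRecipientMap(fake_fetch)
stream_map.populate_for_stream_ids([1, 2, 3])
stream_map.populate_for_stream_ids([2, 3, 4])  # only stream 4 is fetched
recipient_for_4 = stream_map.recipient_id_for(4)  # already remembered
```

Keeping the mapping behind one class means a later schema change would only need to touch the fetch function in a design like this.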
Steve Howell fc2e485ca7 Sort emails in gather_subscriptions().
This helps make the tests more deterministic.
2017-09-15 10:44:32 -07:00
Steve Howell 51d4f16fe0 Ignore inactive users in get_recipient_info().
We were mostly excluding inactive users before this fix, but
now we completely ignore them.

This potentially changes some of the data we return from
get_recipient_info(), but the extra user ids before this fix
were effectively ignored by the caller.
2017-09-15 03:08:52 -07:00
Steve Howell 1759137e4f Don't queue feedback unless the bot is active.
The prior code would queue up feedback messages even if the
feedback bot was deactivated, most likely due to an oversight.
(People probably rarely disable the feedback bot, but they should
have that option.)
2017-09-15 03:08:52 -07:00
Tim Abbott 5722237f59 push: Rename received_pm to private_message.
This is a clearer name for this now more broadly used interface.
2017-09-14 05:41:37 -07:00
Sarah c3a8138f74 user_settings: Add push notifications for all stream messages.
Add setting to enable push notifications for all stream messages.
2017-09-14 05:41:37 -07:00
Steve Howell 41e3a819da Inline get_recipient_user_ids() into two callers.
This sets us up for a subsequent commit where we need more data
from the Subscription table to build recipient info, so the
function boundary doesn't work any more for get_recipient_info,
which is part of the heavily optimized send-message
path.

We used to share code here with typing notifications, but
typing notifications need a lot less data than the
send-message path, so it's useful to decouple these two
things.  The idioms that are duplicated here are pretty simple
one-liners.
2017-09-14 05:13:58 -07:00
Steve Howell 6c90940f84 performance: Add UserMessageLite class.
This speeds up sending messages significantly.

For 1000 users, this speeds up create_user_messages from
0.652s to 0.0558s, so basically a 10x speedup.
2017-09-12 04:22:55 -07:00
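One plausible reading of the `UserMessageLite` idea in 6c90940f84 is a plain Python class that carries only the fields needed for insertion, so that no ORM model instances are built. The sketch below is an assumption about the general shape, not the class from the commit; the `__slots__` layout, the `MENTIONED_FLAG` bit value, and the helper `create_user_messages_lite` are all hypothetical.

```python
from typing import Iterable, List, Set

MENTIONED_FLAG = 1 << 3  # hypothetical bit value, for illustration only


class UserMessageLite:
    """Hypothetical lightweight stand-in for an ORM row."""

    __slots__ = ("user_profile_id", "message_id", "flags")

    def __init__(self, user_profile_id: int, message_id: int, flags: int = 0) -> None:
        self.user_profile_id = user_profile_id
        self.message_id = message_id
        self.flags = flags


def create_user_messages_lite(
    message_id: int, user_ids: Iterable[int], mentioned_user_ids: Set[int]
) -> List[UserMessageLite]:
    # Build one small object per recipient without touching any ORM machinery.
    user_messages = []
    for user_id in user_ids:
        flags = MENTIONED_FLAG if user_id in mentioned_user_ids else 0
        user_messages.append(UserMessageLite(user_id, message_id, flags))
    return user_messages


user_messages = create_user_messages_lite(7, [1, 2, 3], {2})
flags_by_user = {um.user_profile_id: um.flags for um in user_messages}
```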
Steve Howell 811fcf51ee Extract create_user_messages.
The logic to create UserMessage rows when you create a message
is very self-contained, and it's helpful to be able to profile it.
2017-09-12 04:22:55 -07:00
Steve Howell 7fbffb8e30 Optimize bulk inserts for UserMessage rows.
Avoiding ORM overhead makes inserting UserMessage rows
about 15 times faster.
2017-09-12 04:22:55 -07:00
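To illustrate the general technique in 7fbffb8e30 (inserting rows with raw SQL rather than through per-object ORM calls), here is a small sketch using the standard-library `sqlite3` module. The table name, the column names, and the use of SQLite are assumptions made so the example is self-contained; the commit itself does not specify them here.

```python
import sqlite3
from typing import Iterable


def bulk_insert_user_messages(
    conn: sqlite3.Connection, message_id: int, user_ids: Iterable[int], flags: int = 0
) -> None:
    # Build plain tuples and hand them to the driver in one call,
    # instead of creating and saving one model object per row.
    rows = [(user_id, message_id, flags) for user_id in user_ids]
    conn.executemany(
        "INSERT INTO user_message (user_profile_id, message_id, flags) "
        "VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_message ("
    "user_profile_id INTEGER, message_id INTEGER, flags INTEGER)"
)
bulk_insert_user_messages(conn, message_id=7, user_ids=range(1000))
row_count = conn.execute("SELECT COUNT(*) FROM user_message").fetchone()[0]
```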
Steve Howell d723be125a Optimize get_recipient_info() for sending messages.
This commit makes get_recipient_info() faster by never creating
Django ORM objects.  We use the ORM to create a values query
instead, and then we iterate over the rows to create various
collections of ids.

In order to avoid lots of code duplication, this commit unifies
how we query UserProfile for PMs and streams.  Prior to this
commit we were getting "wide" UserProfile objects out of
our memcached cache.  Now we just go to the database with our
list of userids.  The new approach at worst adds one hop to the
database for PMs, which aren't really a performance bottleneck
(compared to streams).  And the new approach actually saves a
hop when both partners aren't in cache (plus we don't pay the
penalty of hitting the cache itself).

The performance improvement here is easy to measure for messages
to streams with many users, even with all the other activity
that goes on inside do_send_messages().  I took test_performance()
in test_messages.py, set num_extra_users to 3000, and consistently
measured a ~20% speedup in do_send_messages().

This commit also eliminates fetching of emails.  We probably
could have done that in a prior commit, but in this commit it
is very explicit that we don't need it.  While removing email
from the query is a no-brainer, it actually had a negligible
impact on performance.  Almost all the savings here come from
not creating UserProfile objects.
2017-09-12 04:22:55 -07:00
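The approach described in d723be125a (iterate over the rows of a values query and build collections of ids, never creating model objects) could be sketched as below. The rows are modeled as plain dictionaries, which is what a Django `values()` query yields; the specific field names and the function name `triage_recipient_rows` are assumptions for the example.

```python
from typing import Any, Dict, Iterable, List, Set, Tuple


def triage_recipient_rows(rows: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Sort recipients into id collections using plain dict rows."""
    user_ids: Set[int] = set()
    push_notify_user_ids: Set[int] = set()
    service_bot_tuples: List[Tuple[int, int]] = []

    for row in rows:
        user_ids.add(row["id"])
        if row["enable_online_push_notifications"]:
            push_notify_user_ids.add(row["id"])
        if row["is_bot"] and row["bot_type"] is not None:
            service_bot_tuples.append((row["id"], row["bot_type"]))

    return dict(
        user_ids=user_ids,
        push_notify_user_ids=push_notify_user_ids,
        service_bot_tuples=service_bot_tuples,
    )


# Hypothetical rows, shaped like the output of a values query.
rows = [
    dict(id=1, enable_online_push_notifications=True, is_bot=False, bot_type=None),
    dict(id=2, enable_online_push_notifications=False, is_bot=False, bot_type=None),
    dict(id=3, enable_online_push_notifications=False, is_bot=True, bot_type=4),
]
info = triage_recipient_rows(rows)
```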
Steve Howell d00c001b5f Create get_recipient_info().
This function returns a summary of recipient data for a message
that's being sent.  It's mostly just moving code into the
old function called get_recipient_user_profiles().
2017-09-12 04:22:55 -07:00
Steve Howell b562dedb53 Avoid using email to detect that the feedback bot is addressed.
This commit is necessary to prevent bringing back emails from the
DB for all N recipients of a message just to see if the feedback
bot is being invoked.
2017-09-12 04:22:55 -07:00
Steve Howell 6f0289ae79 do_send_messages(): Extract internal push_notify_user_ids set.
This is one more step toward not needing UserProfile objects.
2017-09-12 04:22:55 -07:00
Steve Howell 82b2bd8b65 Take user_ids in get_userids_for_missed_messages().
This helps us phase out the need for getting lots of UserProfile
objects.
2017-09-12 04:22:55 -07:00
Steve Howell 06c388774f do_send_messages(): Clean up service bot code.
We calculate `service_bot_tuples` earlier in the function, so that
we don't need "full" UserProfile objects later in the function.

This is part of consolidating code that basically just needs to
triage user_ids.
2017-09-12 04:22:55 -07:00
Steve Howell a22a22966f do_send_messages(): Create UserMessage objects with user_id.
This starts to phase out the need for UserProfile objects in
do_send_messages().  UserProfile objects are expensive to create
for large streams with lots of users.  The objects in the code
before this commit aren't even full UserProfile objects.

This change mostly sets up future performance improvements, but
we also get a minor speedup here when we run a test with 3000
stream subscribers.
2017-09-12 04:22:55 -07:00
Steve Howell ba397b5109 Use user_ids, not full objects, in render path.
There is no reason for either render_incoming_message() or
render_markdown() to require full UserProfile objects just to
triage alert words.

By only asking for user_ids, we save extra queries in two
callpaths and we make it easier to start using user_ids in
do_send_messages().
2017-09-12 04:22:55 -07:00
Steve Howell 9e8c24168d Extract get_typing_user_profiles().
This function is essentially a copy of get_recipient_user_profiles,
which is about to go away. The new function enforces the contract of
typing indicators, which is that they don't apply to streams, which
allows us to use a relatively simple approach for getting user
profile objects.

We are diverging this code, because the send-message path needs
more optimizations.
2017-09-12 04:22:55 -07:00
Steve Howell c87cc1447f Extract get_recipient_user_ids. 2017-09-12 04:22:55 -07:00
Steve Howell 56a552eec3 Get UserProfile objects directly for stream messages.
This change introduces an extra hop to the database, but it is
generally faster due to nuances of the DB and the ORM.  It
also sets us up to optimize get_recipient_user_profiles() by
avoiding creating ORM objects.

I measured the impact of this using a stream with 3000
subscribers, half of whom were idle, and it speeds things up
by 10%.
2017-09-12 04:22:55 -07:00
Steve Howell f5edeb01ae Calculate idle users more efficiently when sending messages.
Usually a small minority of users are eligible to receive missed
message emails or mobile notifications.

We now filter users first before hitting UserPresence to find idle
users.  We also simply check for the existence of recent activity
rather than borrowing the more complicated data structures that we
use for the buddy list.
2017-09-07 06:59:44 -07:00
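The ordering described in f5edeb01ae (filter down to eligible users first, then check only those users for recent activity) can be sketched as follows. The field names `wants_email` and `wants_push`, the function `get_idle_user_ids`, and the `fake_recently_active` stand-in for the presence lookup are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List, Set


def get_idle_user_ids(
    recipients: Iterable[Dict[str, Any]],
    recently_active: Callable[[List[int]], Set[int]],
) -> List[int]:
    # Step 1: keep only users who could receive a missed-message
    # email or a mobile notification at all.
    eligible = [r["id"] for r in recipients if r["wants_email"] or r["wants_push"]]
    if not eligible:
        return []  # no presence lookup needed
    # Step 2: ask about recent activity for the filtered users only.
    active = recently_active(eligible)
    return sorted(user_id for user_id in eligible if user_id not in active)


presence_queries = []  # records which ids reached the presence lookup


def fake_recently_active(user_ids: List[int]) -> Set[int]:
    presence_queries.append(sorted(user_ids))
    return {user_id for user_id in user_ids if user_id == 2}


recipients = [
    dict(id=1, wants_email=True, wants_push=False),
    dict(id=2, wants_email=False, wants_push=True),
    dict(id=3, wants_email=False, wants_push=False),
]
idle_user_ids = get_idle_user_ids(recipients, fake_recently_active)
```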
Steve Howell 97c5f085e7 minor: Extract locals in do_send_messages().
This is a preparatory commit for another refactoring.
2017-09-07 06:59:44 -07:00
Steve Howell 4ac6bc46c7 Add MutedTopic model.
This commit completely switches us over to using a
dedicated model called MutedTopic to track which topics
a user has muted.

This includes the necessary migrations to create the
table and populate it from legacy data in UserProfile.

A subsequent commit will actually remove the old field
in UserProfile.
2017-09-02 09:19:51 -07:00
Steve Howell 0501570cd1 Remove POST-based API for setting topic mutes. 2017-08-29 16:53:38 -04:00