This decouples us from Chrome notifications, which gives us cross-platform
support in at least modern browsers.
We log this action so it's replayable in our message logs.
This implements the model change indicated by the previous schema commit.
(imported from commit b21213cdde54f43670bbb0bf1f607147fc732b38)
In repeated trials, the initial data fetch used to take about 1100ms.
In practice, it was often taking >2000ms, probably due to caching
effects. This commit cuts the time down to about 300ms in repeated
trials.
Note that the semantics are changed slightly in that we may no longer
get exactly 25000 messages. However, holes in the message_id
sequence are currently very rare or non-existent, so this shouldn't be
a problem, and we don't care about the exact number of messages anyway.
I believe the problem was that the query planner was unable to
effectively use the LIMIT clause to figure out that only a small
subset of zephyr_message was going to be needed. Thus, it planned
for operating on the entire table and decided it could not use a more
efficient plan because work_mem, although large, would not be large
enough to execute the query over all of zephyr_message.
The original query was:
SELECT "zephyr_message"."id", "zephyr_message"."sender_id", "zephyr_message"."recipient_id", "zephyr_message"."subject", "zephyr_message"."content", "zephyr_message"."rendered_content", "zephyr_message"."rendered_content_version", "zephyr_message"."pub_date", "zephyr_message"."sending_client_id", "zephyr_userprofile"."id", "zephyr_userprofile"."password", "zephyr_userprofile"."last_login", "zephyr_userprofile"."email", "zephyr_userprofile"."is_staff", "zephyr_userprofile"."is_active", "zephyr_userprofile"."date_joined", "zephyr_userprofile"."full_name", "zephyr_userprofile"."short_name", "zephyr_userprofile"."pointer", "zephyr_userprofile"."last_pointer_updater", "zephyr_userprofile"."realm_id", "zephyr_userprofile"."api_key", "zephyr_userprofile"."enable_desktop_notifications", "zephyr_userprofile"."enter_sends", "zephyr_userprofile"."tutorial_status", "zephyr_realm"."id", "zephyr_realm"."domain", "zephyr_realm"."restricted_to_domain", "zephyr_recipient"."id", "zephyr_recipient"."type_id", "zephyr_recipient"."type", "zephyr_client"."id", "zephyr_client"."name" FROM "zephyr_message" INNER JOIN "zephyr_userprofile" ON ( "zephyr_message"."sender_id" = "zephyr_userprofile"."id" ) INNER JOIN "zephyr_realm" ON ( "zephyr_userprofile"."realm_id" = "zephyr_realm"."id" ) INNER JOIN "zephyr_recipient" ON ( "zephyr_message"."recipient_id" = "zephyr_recipient"."id" ) INNER JOIN "zephyr_client" ON ( "zephyr_message"."sending_client_id" = "zephyr_client"."id" ) ORDER BY "zephyr_message"."id" DESC LIMIT 25000;
with query plan:
Limit (cost=0.00..27120.95 rows=25000 width=362) (actual time=0.051..1121.282 rows=25000 loops=1)
-> Nested Loop (cost=0.00..5330872.99 rows=4913981 width=362) (actual time=0.048..1081.014 rows=25000 loops=1)
-> Nested Loop (cost=0.00..3932643.31 rows=4913981 width=344) (actual time=0.042..926.398 rows=25000 loops=1)
-> Nested Loop (cost=0.00..2550275.29 rows=4913981 width=334) (actual time=0.035..752.524 rows=25000 loops=1)
Join Filter: (zephyr_message.sending_client_id = zephyr_client.id)
-> Nested Loop (cost=0.00..1739467.29 rows=4913981 width=320) (actual time=0.024..217.348 rows=25000 loops=1)
-> Index Scan Backward using zephyr_message_pkey on zephyr_message (cost=0.00..362510.09 rows=4913981 width=156) (actual time=0.014..42.097 rows=25000 loops=1)
-> Index Scan using zephyr_userprofile_pkey on zephyr_userprofile (cost=0.00..0.27 rows=1 width=164) (actual time=0.003..0.004 rows=1 loops=25000)
Index Cond: (id = zephyr_message.sender_id)
-> Materialize (cost=0.00..1.17 rows=11 width=14) (actual time=0.001..0.010 rows=11 loops=25000)
-> Seq Scan on zephyr_client (cost=0.00..1.11 rows=11 width=14) (actual time=0.002..0.010 rows=11 loops=1)
-> Index Scan using zephyr_recipient_pkey on zephyr_recipient (cost=0.00..0.27 rows=1 width=10) (actual time=0.002..0.003 rows=1 loops=25000)
Index Cond: (id = zephyr_message.recipient_id)
-> Index Scan using zephyr_realm_pkey on zephyr_realm (cost=0.00..0.27 rows=1 width=18) (actual time=0.002..0.003 rows=1 loops=25000)
Index Cond: (id = zephyr_userprofile.realm_id)
Total runtime: 1141.408 ms
In the new code, we do two queries:
SELECT "zephyr_message"."id" FROM "zephyr_message" ORDER BY "zephyr_message"."id" DESC LIMIT 1
followed by:
SELECT "zephyr_message"."id", "zephyr_message"."sender_id", "zephyr_message"."recipient_id", "zephyr_message"."subject", "zephyr_message"."content", "zephyr_message"."rendered_content", "zephyr_message"."rendered_content_version", "zephyr_message"."pub_date", "zephyr_message"."sending_client_id", "zephyr_userprofile"."id", "zephyr_userprofile"."password", "zephyr_userprofile"."last_login", "zephyr_userprofile"."email", "zephyr_userprofile"."is_staff", "zephyr_userprofile"."is_active", "zephyr_userprofile"."date_joined", "zephyr_userprofile"."full_name", "zephyr_userprofile"."short_name", "zephyr_userprofile"."pointer", "zephyr_userprofile"."last_pointer_updater", "zephyr_userprofile"."realm_id", "zephyr_userprofile"."api_key", "zephyr_userprofile"."enable_desktop_notifications", "zephyr_userprofile"."enter_sends", "zephyr_userprofile"."tutorial_status", "zephyr_realm"."id", "zephyr_realm"."domain", "zephyr_realm"."restricted_to_domain", "zephyr_recipient"."id", "zephyr_recipient"."type_id", "zephyr_recipient"."type", "zephyr_client"."id", "zephyr_client"."name" FROM "zephyr_message" INNER JOIN "zephyr_userprofile" ON ( "zephyr_message"."sender_id" = "zephyr_userprofile"."id" ) INNER JOIN "zephyr_realm" ON ( "zephyr_userprofile"."realm_id" = "zephyr_realm"."id" ) INNER JOIN "zephyr_recipient" ON ( "zephyr_message"."recipient_id" = "zephyr_recipient"."id" ) INNER JOIN "zephyr_client" ON ( "zephyr_message"."sending_client_id" = "zephyr_client"."id" ) WHERE "zephyr_message"."id" > 4941883
with the message id filled in as the result of the first query. The
new query differs from the original only in that its ORDER BY and
LIMIT clauses are replaced by a WHERE clause. The second query has
query plan:
Hash Join (cost=709.30..28048.18 rows=20544 width=365) (actual time=41.678..279.261 rows=25041 loops=1)
Hash Cond: (zephyr_message.recipient_id = zephyr_recipient.id)
-> Hash Join (cost=102.98..27056.66 rows=20544 width=355) (actual time=3.686..190.730 rows=25041 loops=1)
Hash Cond: (zephyr_message.sending_client_id = zephyr_client.id)
-> Hash Join (cost=101.73..26772.94 rows=20544 width=341) (actual time=3.649..143.695 rows=25041 loops=1)
Hash Cond: (zephyr_userprofile.realm_id = zephyr_realm.id)
-> Hash Join (cost=99.99..26488.71 rows=20544 width=323) (actual time=3.578..96.746 rows=25041 loops=1)
Hash Cond: (zephyr_message.sender_id = zephyr_userprofile.id)
-> Index Scan using zephyr_message_pkey on zephyr_message (cost=0.00..26106.24 rows=20544 width=159) (actual time=0.017..41.980 rows=25041 loops=1)
Index Cond: (id > 4941883)
-> Hash (cost=83.33..83.33 rows=1333 width=164) (actual time=3.548..3.548 rows=1333 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 275kB
-> Seq Scan on zephyr_userprofile (cost=0.00..83.33 rows=1333 width=164) (actual time=0.006..1.646 rows=1333 loops=1)
-> Hash (cost=1.33..1.33 rows=33 width=18) (actual time=0.064..0.064 rows=33 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 2kB
-> Seq Scan on zephyr_realm (cost=0.00..1.33 rows=33 width=18) (actual time=0.003..0.033 rows=33 loops=1)
-> Hash (cost=1.11..1.11 rows=11 width=14) (actual time=0.027..0.027 rows=11 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> Seq Scan on zephyr_client (cost=0.00..1.11 rows=11 width=14) (actual time=0.003..0.013 rows=11 loops=1)
-> Hash (cost=335.03..335.03 rows=21703 width=10) (actual time=37.974..37.974 rows=21761 loops=1)
Buckets: 4096 Batches: 1 Memory Usage: 893kB
-> Seq Scan on zephyr_recipient (cost=0.00..335.03 rows=21703 width=10) (actual time=0.004..18.443 rows=21761 loops=1)
Total runtime: 299.300 ms
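In Django ORM terms, the new approach looks roughly like the sketch
below; the model name, import path, and the exact way the cutoff id is
derived from the first query's result are assumptions based on the SQL
above.

    # Rough sketch only; model/field names are inferred from the SQL above.
    from zephyr.models import Message  # assumed import path

    def get_latest_messages(batch_size=25000):
        # Query 1: a trivial index-only lookup for the newest message id.
        max_id = Message.objects.order_by("-id").values_list("id", flat=True)[0]

        # Query 2: a simple range filter the planner can satisfy with an
        # index scan on the primary key, instead of planning for the whole
        # table. The cutoff choice here is an assumption.
        cutoff = max_id - batch_size
        return Message.objects.select_related().filter(id__gt=cutoff)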
(imported from commit b2a70cccc47be7970df407c6be00eccd2e8be82a)
The fact that we were dumping this cache and not refilling it seems to
be one of the causes of Tornado restarts being a lot slower on prod
than on local systems.
(imported from commit a32a759f4dfb591706ede1cce2d38f5c3704193c)
On my laptop, this saves about 80 milliseconds per 1000 messages
requested via get_old_messages queries. Since we only have one
memcached process and it does not run with special priority, this
might have a significant impact on load during server restarts.
(imported from commit 06ad13f32f4a6d87a0664c96297ef9843f410ac5)
Timing out within the Twitter portion of the render causes the message
to still go through (without a preview). If we don't timeout here, it
causes the entire Markdown render to timeout, which rejects the
message in its entirety -- a far worse outcome.
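A minimal sketch of the idea (not the actual implementation): bound only
the Twitter lookup with its own, shorter timeout, so a slow API call
degrades to a preview-less message instead of failing the whole Markdown
render. The fetch function here is hypothetical.

    from concurrent.futures import ThreadPoolExecutor

    _twitter_pool = ThreadPoolExecutor(max_workers=1)

    def fetch_tweet_or_none(fetch_func, tweet_id, timeout=3.0):
        # fetch_func is a hypothetical callable that talks to the Twitter API.
        future = _twitter_pool.submit(fetch_func, tweet_id)
        try:
            return future.result(timeout=timeout)
        except Exception:
            # Timed out or errored: render the message without a preview
            # rather than rejecting it entirely.
            return None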
(imported from commit f510a56f48afa46da8ec6277496fa03374cdb042)
See PEP 328[1] for details. This feature was introduced in Python 2.5 and
will become mandatory in Python 3.
[1]: http://www.python.org/dev/peps/pep-0328
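In practice this typically means enabling the new behavior per module
with the __future__ import, e.g.:

    # Opt in to PEP 328 semantics on Python 2.5+ (the default in Python 3).
    from __future__ import absolute_import

    # `import json` now always means the standard-library json, even if a
    # sibling module named json.py exists; intra-package imports must be
    # written explicitly, e.g. `from . import models`.
    import json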
(imported from commit 7444eeba8a08d5f91b94c7921848f2274979bd76)
Otherwise these logs will end up all getting split up when we switch
to the new deployment model.
(imported from commit 0514c296470be7113cab6c2f48e8dd33f1b9353d)
This commit will incorrectly list users who were previously online as
active, a shortcoming that is addressed in the next commit.
(imported from commit b018767df686f88c0ca939c067c573e4d7cea357)
This avoids 10s of seconds of delay when you invite several people at
once through the web UI.
(imported from commit 75acdbdb04caf62bbb08affc7796330246d8a00e)
The previous code for adding users to default streams wouldn't do so
if the user didn't have a PreregistrationUser row.
(imported from commit 25f1383f6771319542d07660b29d891368889212)
This is preparatory work for removing the StreamColor model, so we also
set things up so that anything changing the StreamColor model changes
the Subscription model too.
The manual task is to run the copy_colors.py management command after
deployment to each of staging and prod.
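A rough sketch of what the copy step might look like; the field names
(a subscription foreign key and a color field on StreamColor) and the
import path are assumptions, not the real copy_colors.py.

    from django.core.management.base import BaseCommand
    from zephyr.models import StreamColor  # assumed import path

    class Command(BaseCommand):
        help = "Copy colors from StreamColor rows onto their Subscription rows."

        def handle(self, *args, **options):
            # Assumes StreamColor has a `subscription` foreign key and a
            # `color` field mirrored on Subscription.
            for stream_color in StreamColor.objects.select_related("subscription"):
                subscription = stream_color.subscription
                subscription.color = stream_color.color
                subscription.save()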
(imported from commit 1be7523ca59f5266eb2c4dc2009e31209ed49635)
I think all one needs to do to deploy this commit is to run
`generate-fixtures --force` on developer laptops.
(imported from commit 34916341435fef0875b5a2c7f53c2f5606cd16cd)
When this is deployed to staging, we need to run
./manage.py logout_all_users --realm=humbughq.com
When this is deployed to prod, we need to run
./manage.py logout_all_users
(imported from commit d6c6ea4b1c347f3d9122742db23c7b67767a7349)
This is intended to be used for logging out users during our deployment of
the UserProfile merge, but it could be useful for other things too.
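A minimal sketch of such a command, assuming database-backed sessions;
restricting it to one realm (as with the --realm option above) would
additionally require decoding each session to find its user, which is
omitted here.

    from django.contrib.sessions.models import Session
    from django.core.management.base import BaseCommand

    class Command(BaseCommand):
        help = "Log out every user by deleting their sessions."

        def handle(self, *args, **options):
            # Dropping the session rows forces every browser to log in again.
            Session.objects.all().delete()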
(imported from commit bfe896d854f997f7a4d06e5bc0f19ec5b1aa5e69)
Previously, we weren't clearing the users out of memcached (we just
killed them in the database), so in fact users were not logged out
until about an hour after we deactivated them (when the memcached
caches would expire).
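A sketch of the fix's shape, assuming a per-user cache entry keyed by
email; the key format and helper name are assumptions.

    from django.core.cache import cache

    def deactivate_user(user):
        user.is_active = False
        user.save()
        # Also evict the cached copy, so the deactivation takes effect
        # immediately rather than whenever the cache entry expires.
        cache.delete("user_by_email:%s" % (user.email,))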
(imported from commit 0f0a2f70e003c184106c73b22b876f57c1ef3371)
The associated function was moved into zephyr.lib, but the file
location was never updated.
(imported from commit 24c3348533324b0af7c52d6a121eef8b00615275)
And keep the fields updated, by copying them on UserProfile creation and
updating the UserProfile object whenever we update the User object; also
add management commands to (1) initially ensure that they match and
(2) check that they still match (i.e., that the updating code is
working).
The copy_user_to_userprofile migration needs to be run after this is
deployed to prod.
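A sketch of the copying helper; the field list is an assumption based on
the zephyr_userprofile columns visible in the SQL earlier in this log.

    # Fields duplicated from User onto UserProfile (assumed list).
    COPIED_FIELDS = ["password", "last_login", "email", "is_staff",
                     "is_active", "date_joined"]

    def copy_user_to_userprofile(user, user_profile):
        for field in COPIED_FIELDS:
            setattr(user_profile, field, getattr(user, field))
        user_profile.save()

    def user_matches_userprofile(user, user_profile):
        # Used by a consistency-checking management command.
        return all(getattr(user, f) == getattr(user_profile, f)
                   for f in COPIED_FIELDS)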
(imported from commit 0a598d2e10b1a7a2f5c67dd5140ea4bb8e1ec0b8)
This way we're not directly manipulating user.password() in random
management commands.
(imported from commit e6e32ae422015ab55184d5d8111148793a8aca36)
The previous situation was bad for two reasons:
(1) It had a lot of copies of the code, some of them missing pieces:
UserProfile.objects.get(user__email__iexact=foo)
This was in particular going to be inconvenient since we are dropping
the __user part of that.
(2) It didn't take advantage of our memcached caching.
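A sketch of the consolidated, cached lookup this commit points toward;
the cache key format and function name are assumptions.

    from django.core.cache import cache
    from zephyr.models import UserProfile  # assumed import path

    def get_user_profile_by_email(email):
        key = "user_profile_by_email:%s" % (email.lower(),)
        user_profile = cache.get(key)
        if user_profile is None:
            user_profile = UserProfile.objects.select_related().get(
                user__email__iexact=email)
            cache.set(key, user_profile)
        return user_profile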
(imported from commit 2325795f288a7cf306cdae191f5d3080aac0651a)
Only a few of them took a User as an argument anyway.
This is preparatory work for merging the User and UserProfile models.
(imported from commit 65b2bd2453597531bcf135ccf24d2a4615cd0d2a)
The previous version of our code only worked with python-requests <
1.0 (as is the case on our servers); the new version will work with
any python-requests new enough to have a .json at all.
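One way to support both APIs is a small shim, since requests < 1.0
exposes .json as a property while newer releases make it a method (a
sketch, not necessarily the exact change):

    def extract_json(response):
        # requests >= 1.0: .json is a bound method; requests < 1.0: a property.
        if callable(response.json):
            return response.json()
        return response.json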
(imported from commit 77ffe3e0d890fe88776c313e0e3289aee1bb30ea)
Clients can now request to receive only certain kinds of events,
although they always receive restart events.
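A sketch of the filtering rule described above; the function and field
names are assumptions.

    def filter_events(events, requested_types):
        # requested_types is None when the client wants everything.
        if requested_types is None:
            return list(events)
        allowed = set(requested_types)
        allowed.add("restart")  # restart events are always delivered
        return [event for event in events if event["type"] in allowed]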
(imported from commit 1e72981f8fe763829ab2abde1e35f94cad5c34e4)
This version has several limitations that are addressed in later
commits in this series.
(imported from commit 5d452b312d4204935059c4d602af0b9a8be1a009)
When we added rabbitmq usage within Tornado, we inadvertently caused
the Tornado ioloop to be initialized in runtornado.py's imports,
before we overwrote the _poll method. The end result was that we
weren't running our instrumented Tornado poll function.
Fix this by moving that code to its own file which we import at the
top of runtornado.py, and adding comments documenting the situation so
we don't break this in some future import reorganization.
(imported from commit 016717476f10566fef4ed2b656f29f865d2084db)
This can result in a significant performance benefit because we only
need to update the columns that changed.
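For example, assuming Django's update_fields mechanism is what's in play
here, a pointer update would only write that one column:

    def update_pointer(user_profile, new_pointer):
        user_profile.pointer = new_pointer
        # Writes only the pointer column instead of every field on the row.
        user_profile.save(update_fields=["pointer"])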
(imported from commit 42bef1fcc58ad79bd864f89263fe82e90743ee5b)
The policy this implements is:
* 1 week for most persistent data (Clients, etc.)
* 1 day for messages
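A sketch of how the policy might look when writing entries; the constant
and key names are hypothetical.

    from django.core.cache import cache

    WEEK_IN_SECONDS = 7 * 24 * 60 * 60  # most persistent data (Clients, etc.)
    DAY_IN_SECONDS = 24 * 60 * 60       # messages

    def cache_client(client):
        cache.set("client:%s" % (client.name,), client, timeout=WEEK_IN_SECONDS)

    def cache_message(message):
        cache.set("message:%d" % (message.id,), message, timeout=DAY_IN_SECONDS)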
(imported from commit d57bb2c6b9626ffa2155c6d0ef9b60827d1f2381)
This saves 2 database queries per user in the huddle when sending the
first message to a particular huddle.
(imported from commit f71aa32df846fb4b82651a93ff9608087ffcaa5a)
Previously, we had around four copies of the logic for deciding whether
we should publish data via a SimpleQueueClient queue, via a
TornadoQueueClient queue, or by handling the operation directly, which
resulted in them getting out of sync and becoming buggy (see e.g. the
previous commit).
We need to add a lock around adding things to the queue to work around
a bug with pika's BlockingConnection.
I should note that the previous logic in some places had a bunch of
tests of the form "elif settings.TEST_SUITE" for doing the work that
would have been done by the queue processor directly; these should
have just been "else" clauses, since we generally want that code to
run in development environments whether or not the test suite is
currently running.
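A sketch of the consolidated helper's shape (the settings flag, client
accessor, and method names are assumptions): publish to the queue when a
queue is configured, otherwise do the work inline, with a lock
serializing access to pika's BlockingConnection.

    import threading

    from django.conf import settings

    queue_lock = threading.Lock()

    def queue_json_publish(queue_name, event, processor):
        if settings.USING_RABBITMQ:  # assumed settings flag
            with queue_lock:
                # The lock works around a bug with pika's BlockingConnection.
                get_queue_client().json_publish(queue_name, event)
        else:
            # No queue configured (e.g. development or tests): handle it
            # directly, regardless of whether the test suite is running.
            processor(event)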
(imported from commit 16bdbed4fff04b1bda6fde3b16bee7359917720b)
Previously we had several files which initialized SimpleQueueClient()
for sending items to the UserActivity queue, even though those code
paths aren't used outside Tornado. This resulted in slower Tornado
startup times.
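One way to avoid paying that cost at import time is to create a shared
client lazily, as in this sketch (import path assumed):

    from zephyr.lib.queue import SimpleQueueClient  # assumed import path

    _queue_client = None

    def get_queue_client():
        # Connect on first use, so processes that never publish don't pay
        # the connection cost at startup.
        global _queue_client
        if _queue_client is None:
            _queue_client = SimpleQueueClient()
        return _queue_client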
(imported from commit ad97021ec18d3927233744037c548c22db33c321)
This fixes a bug we experienced where you couldn't subscribe to a stream
with non-ASCII characters in its name (failing with a
UnicodeEncodeError), as well as many other potential bugs.
(imported from commit f084a4b4b597b85935655097a7b5a163811c4d71)
This is required by Pika 0.9.8. We need at least 0.9.6 for the next
commit; I had been testing with 0.9.5 previously. Anyway this way
seems more correct as well.
(imported from commit bfb9e9e78938073001f70c4d28a5e07cc4ebac32)
This will automatically fix bugs such as one in which
internal_send_message didn't properly strip() the subject argument
before sending a message.
We change the recipient_type argument to internal_send_message to take
the recipient type name (e.g. 'stream') both to better fit the API and
also because the previous code incorrectly handled huddles.
(imported from commit 78c2596d328f6bb1ce2eaa3eed9a9e48146e3b6a)
This cache should save 2 database queries whenever we send a private
message. However, previously it was per-process (which meant it was
mostly useless) and also buggy (it never stored anything in the cache,
so that it was completely useless). Switching this to our standard
memcached setup will address both problems.
(imported from commit 1d807f30704bccf28de33a80523488aedc58a9be)
Previously we only used these caches for Tornado requests, because we
were not updating memcached when e.g. the user's pointer changed, and
so functions like update_pointer would not work correctly.
Now that we are updating memcached when the User and UserProfile
objects change, we can use these for all requests.
This saves 2 database queries on every Django request to the server.
(imported from commit aa5bffd885d14bde38b95e80a226bd5ab66f253d)
This should substantially decrease the amount of server load generated
by the userpresence system.
I tested that this was indeed saving one query on
/json/update_active_status requests on my laptop with 2 users from the
humbughq.com realm logged in.
(imported from commit 03e9d4eb95b9f664d489862684ae162db2076e08)
This cache filling code takes about 5 seconds to run, which means it
will finish before Tornado starts reading from this cache, but if that
were to change, it would be much better to have made at least some
progress.
(imported from commit 60a3420cdb9ddf331d83573a3fdb6be1a5ee5a4f)
Previously we were calling select_related() on Message.client, which
doesn't exist. It seems kinda poor that this doesn't raise an
exception.
I believe this issue was causing us to do very large numbers of
database queries during get_updates calls during server restarts.
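For illustration, the corrected call should name foreign keys that
actually exist on Message (per the SQL earlier in this log), e.g.:

    from zephyr.models import Message  # assumed import path

    # select_related("client") silently did nothing, since Message has no
    # "client" field; "sending_client" is the real foreign key.
    messages = Message.objects.select_related(
        "sending_client", "sender", "recipient")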
(imported from commit b79bd698820fbd9dd82bd61fc175c32cd5ce6d05)
Since we flush memcached when we do a server restart, the flurry of
get_updates requests that fly in afterwards are all cache misses for
getting the User/UserProfile objects, so Tornado ends up spending
around 70ms per get_updates request rather than the usual 1-2ms.
So this should substantially improve our Tornado performance around
server restarts.
(imported from commit 07b8126bdfd4ff14e4c3362f9eda1fe5fd571c5b)
Sometimes Dropbox share links use /s/ and sometimes /sh/, and I'm not
sure what controls which, but we should deal with both.
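A sketch of handling both forms (the exact URL pattern is an assumption):

    import re

    # Match both styles of Dropbox share links: /s/... and /sh/...
    DROPBOX_SHARE_RE = re.compile(r"https?://(www\.)?dropbox\.com/(s|sh)/")

    def is_dropbox_share_link(url):
        return DROPBOX_SHARE_RE.match(url) is not None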
(imported from commit 2222450f25c418b5fbd60ab2c30477467e34c0d1)
This avoids our repeatedly retrying to fetch a tweet that doesn't
exist from the Twitter API.
(imported from commit b4ca1060d03da21e7e59e5b99e682d2e8457df15)
This should substantially improve the repeat-rendering time for pages
with large numbers of tweets, since we don't need to go all the way to
twitter.com (which can take around a second) to render tweets properly.
To deploy this commit properly, one needs to run
./manage.py createcachetable third_party_api_results
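A sketch of using that database-backed cache for tweet lookups; the
cache alias, key format, and fetch helper are assumptions.

    from django.core.cache import get_cache

    # Assumed to be configured in settings.CACHES, backed by the
    # third_party_api_results table created above.
    tweet_cache = get_cache("database")

    def get_tweet(tweet_id):
        key = "tweet:%s" % (tweet_id,)
        cached = tweet_cache.get(key)
        if cached is not None:
            return cached["tweet"]  # may be None for a nonexistent tweet
        tweet = fetch_tweet_from_twitter(tweet_id)  # hypothetical API helper
        # Cache misses too, so a deleted or nonexistent tweet isn't
        # re-fetched on every render.
        tweet_cache.set(key, {"tweet": tweet})
        return tweet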
(imported from commit 01b528e61f9dde2ee718bdec0490088907b6017e)