Commit Graph

264 Commits

Author SHA1 Message Date
Leo Franchi 5ef7c4e6db Add a management command for active user stats
(imported from commit a4227858b422c48e272700880e0c21889c7ce566)
2013-05-01 11:17:18 -04:00
Tim Abbott e3bb1bc8ec bugdown: Fix tweet ID extraction from twitter urls.
(imported from commit 88b9882527a5317bf30bcc5f0d1255e819ea149c)
2013-04-30 10:43:17 -04:00
Tim Abbott 7c001822f2 Use bulk requests for updating memcached in get_old_messages.
Otherwise we end up doing 1000 requests to memcached, which can be
quite expensive.

(imported from commit be247f63b5fb88c6f4a45326261b66ea67fe1028)
2013-04-25 14:43:37 -04:00
Zev Benjamin 3e1ec5d7c9 Increase size of initial message cache fetch and exclude tabbott/extra's messages
(imported from commit 59544aa3adfb05f50ca69e56f37f57944dfa0b81)
2013-04-25 13:33:51 -04:00
Zev Benjamin a1634b12d3 Increase efficiency of initial message cache query
In repeated trials, the initial data fetch used to take about 1100ms.
In practice, it was often taking >2000ms, probably due to caching
effects.  This commit cuts the time down to about 300ms in repeated
trials.

Note that the semantics are changed slightly in that we may no longer
get exactly 25000 messages.  However, holes in the message_id
sequence are currently very rare or non-existent, so this shouldn't be
a problem, and we don't care about the exact number of messages
anyway.

I believe the problem was that the query planner was unable to
effectively use the LIMIT clause to figure out that only a small
subset of zephyr_message was going to be needed.  Thus, it planned
for operating on the entire table and decided it could not use a more
efficient plan because work_mem, although large, would not be large
enough to execute the query over all of zephyr_message.

The original query was:

SELECT "zephyr_message"."id", "zephyr_message"."sender_id", "zephyr_message"."recipient_id", "zephyr_message"."subject", "zephyr_message"."content", "zephyr_message"."rendered_content", "zephyr_message"."rendered_content_version", "zephyr_message"."pub_date", "zephyr_message"."sending_client_id", "zephyr_userprofile"."id", "zephyr_userprofile"."password", "zephyr_userprofile"."last_login", "zephyr_userprofile"."email", "zephyr_userprofile"."is_staff", "zephyr_userprofile"."is_active", "zephyr_userprofile"."date_joined", "zephyr_userprofile"."full_name", "zephyr_userprofile"."short_name", "zephyr_userprofile"."pointer", "zephyr_userprofile"."last_pointer_updater", "zephyr_userprofile"."realm_id", "zephyr_userprofile"."api_key", "zephyr_userprofile"."enable_desktop_notifications", "zephyr_userprofile"."enter_sends", "zephyr_userprofile"."tutorial_status", "zephyr_realm"."id", "zephyr_realm"."domain", "zephyr_realm"."restricted_to_domain", "zephyr_recipient"."id", "zephyr_recipient"."type_id", "zephyr_recipient"."type", "zephyr_client"."id", "zephyr_client"."name" FROM "zephyr_message" INNER JOIN "zephyr_userprofile" ON ( "zephyr_message"."sender_id" = "zephyr_userprofile"."id" ) INNER JOIN "zephyr_realm" ON ( "zephyr_userprofile"."realm_id" = "zephyr_realm"."id" ) INNER JOIN "zephyr_recipient" ON ( "zephyr_message"."recipient_id" = "zephyr_recipient"."id" ) INNER JOIN "zephyr_client" ON ( "zephyr_message"."sending_client_id" = "zephyr_client"."id" ) ORDER BY "zephyr_message"."id" DESC LIMIT 25000;

with query plan:
 Limit  (cost=0.00..27120.95 rows=25000 width=362) (actual time=0.051..1121.282 rows=25000 loops=1)
   ->  Nested Loop  (cost=0.00..5330872.99 rows=4913981 width=362) (actual time=0.048..1081.014 rows=25000 loops=1)
         ->  Nested Loop  (cost=0.00..3932643.31 rows=4913981 width=344) (actual time=0.042..926.398 rows=25000 loops=1)
               ->  Nested Loop  (cost=0.00..2550275.29 rows=4913981 width=334) (actual time=0.035..752.524 rows=25000 loops=1)
                     Join Filter: (zephyr_message.sending_client_id = zephyr_client.id)
                     ->  Nested Loop  (cost=0.00..1739467.29 rows=4913981 width=320) (actual time=0.024..217.348 rows=25000 loops=1)
                           ->  Index Scan Backward using zephyr_message_pkey on zephyr_message  (cost=0.00..362510.09 rows=4913981 width=156) (actual time=0.014..42.097 rows=25000 loops=1)
                           ->  Index Scan using zephyr_userprofile_pkey on zephyr_userprofile  (cost=0.00..0.27 rows=1 width=164) (actual time=0.003..0.004 rows=1 loops=25000)
                                 Index Cond: (id = zephyr_message.sender_id)
                     ->  Materialize  (cost=0.00..1.17 rows=11 width=14) (actual time=0.001..0.010 rows=11 loops=25000)
                           ->  Seq Scan on zephyr_client  (cost=0.00..1.11 rows=11 width=14) (actual time=0.002..0.010 rows=11 loops=1)
               ->  Index Scan using zephyr_recipient_pkey on zephyr_recipient  (cost=0.00..0.27 rows=1 width=10) (actual time=0.002..0.003 rows=1 loops=25000)
                     Index Cond: (id = zephyr_message.recipient_id)
         ->  Index Scan using zephyr_realm_pkey on zephyr_realm  (cost=0.00..0.27 rows=1 width=18) (actual time=0.002..0.003 rows=1 loops=25000)
               Index Cond: (id = zephyr_userprofile.realm_id)
 Total runtime: 1141.408 ms

In the new code, we do two queries:

SELECT "zephyr_message"."id" FROM "zephyr_message" ORDER BY "zephyr_message"."id" DESC LIMIT 1

followed by:

SELECT "zephyr_message"."id", "zephyr_message"."sender_id", "zephyr_message"."recipient_id", "zephyr_message"."subject", "zephyr_message"."content", "zephyr_message"."rendered_content", "zephyr_message"."rendered_content_version", "zephyr_message"."pub_date", "zephyr_message"."sending_client_id", "zephyr_userprofile"."id", "zephyr_userprofile"."password", "zephyr_userprofile"."last_login", "zephyr_userprofile"."email", "zephyr_userprofile"."is_staff", "zephyr_userprofile"."is_active", "zephyr_userprofile"."date_joined", "zephyr_userprofile"."full_name", "zephyr_userprofile"."short_name", "zephyr_userprofile"."pointer", "zephyr_userprofile"."last_pointer_updater", "zephyr_userprofile"."realm_id", "zephyr_userprofile"."api_key", "zephyr_userprofile"."enable_desktop_notifications", "zephyr_userprofile"."enter_sends", "zephyr_userprofile"."tutorial_status", "zephyr_realm"."id", "zephyr_realm"."domain", "zephyr_realm"."restricted_to_domain", "zephyr_recipient"."id", "zephyr_recipient"."type_id", "zephyr_recipient"."type", "zephyr_client"."id", "zephyr_client"."name" FROM "zephyr_message" INNER JOIN "zephyr_userprofile" ON ( "zephyr_message"."sender_id" = "zephyr_userprofile"."id" ) INNER JOIN "zephyr_realm" ON ( "zephyr_userprofile"."realm_id" = "zephyr_realm"."id" ) INNER JOIN "zephyr_recipient" ON ( "zephyr_message"."recipient_id" = "zephyr_recipient"."id" ) INNER JOIN "zephyr_client" ON ( "zephyr_message"."sending_client_id" = "zephyr_client"."id" ) WHERE "zephyr_message"."id" > 4941883

with the message id filled in as the result of the first query.  The
new query differs from the original only in that its ORDER BY and
LIMIT clauses are replaced by a WHERE clause.  The second query has
query plan:

 Hash Join  (cost=709.30..28048.18 rows=20544 width=365) (actual time=41.678..279.261 rows=25041 loops=1)
   Hash Cond: (zephyr_message.recipient_id = zephyr_recipient.id)
   ->  Hash Join  (cost=102.98..27056.66 rows=20544 width=355) (actual time=3.686..190.730 rows=25041 loops=1)
         Hash Cond: (zephyr_message.sending_client_id = zephyr_client.id)
         ->  Hash Join  (cost=101.73..26772.94 rows=20544 width=341) (actual time=3.649..143.695 rows=25041 loops=1)
               Hash Cond: (zephyr_userprofile.realm_id = zephyr_realm.id)
               ->  Hash Join  (cost=99.99..26488.71 rows=20544 width=323) (actual time=3.578..96.746 rows=25041 loops=1)
                     Hash Cond: (zephyr_message.sender_id = zephyr_userprofile.id)
                     ->  Index Scan using zephyr_message_pkey on zephyr_message  (cost=0.00..26106.24 rows=20544 width=159) (actual time=0.017..41.980 rows=25041 loops=1)
                           Index Cond: (id > 4941883)
                     ->  Hash  (cost=83.33..83.33 rows=1333 width=164) (actual time=3.548..3.548 rows=1333 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 275kB
                           ->  Seq Scan on zephyr_userprofile  (cost=0.00..83.33 rows=1333 width=164) (actual time=0.006..1.646 rows=1333 loops=1)
               ->  Hash  (cost=1.33..1.33 rows=33 width=18) (actual time=0.064..0.064 rows=33 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 2kB
                     ->  Seq Scan on zephyr_realm  (cost=0.00..1.33 rows=33 width=18) (actual time=0.003..0.033 rows=33 loops=1)
         ->  Hash  (cost=1.11..1.11 rows=11 width=14) (actual time=0.027..0.027 rows=11 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 1kB
               ->  Seq Scan on zephyr_client  (cost=0.00..1.11 rows=11 width=14) (actual time=0.003..0.013 rows=11 loops=1)
   ->  Hash  (cost=335.03..335.03 rows=21703 width=10) (actual time=37.974..37.974 rows=21761 loops=1)
         Buckets: 4096  Batches: 1  Memory Usage: 893kB
         ->  Seq Scan on zephyr_recipient  (cost=0.00..335.03 rows=21703 width=10) (actual time=0.004..18.443 rows=21761 loops=1)
 Total runtime: 299.300 ms
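
The two-query approach described above can be sketched as follows. This is a minimal illustration, not the code from this commit: it uses an in-memory SQLite database, a hypothetical `message` table, and a hypothetical helper `fetch_recent_messages`, and it assumes the threshold is the newest id minus the desired count. It also shows why the result may hold fewer rows than requested when the id sequence has holes.

```python
import sqlite3

def fetch_recent_messages(conn, count):
    """Fetch roughly the newest `count` rows using an id-range filter."""
    # Query 1: look up the newest id; this only touches the primary key.
    row = conn.execute(
        "SELECT id FROM message ORDER BY id DESC LIMIT 1"
    ).fetchone()
    if row is None:
        return []
    # Assumption: the cutoff is simply newest_id - count.
    threshold = row[0] - count
    # Query 2: a WHERE clause on id replaces ORDER BY ... LIMIT.
    return conn.execute(
        "SELECT id, content FROM message WHERE id > ?", (threshold,)
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE message (id INTEGER PRIMARY KEY, content TEXT)")
conn.executemany(
    "INSERT INTO message (id, content) VALUES (?, ?)",
    # ids 1..100 with a deliberate hole at id 95
    [(i, "msg %d" % i) for i in range(1, 101) if i != 95],
)
recent = fetch_recent_messages(conn, 10)
recent_ids = sorted(r[0] for r in recent)
```

Because id 95 is missing, asking for 10 messages yields the 9 rows with ids above 90, which mirrors the changed semantics noted in the commit message.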

(imported from commit b2a70cccc47be7970df407c6be00eccd2e8be82a)
2013-04-25 13:25:15 -04:00
Tim Abbott 102988e430 Refill the Session cache after restarting the server.
The fact that we were dumping this cache and not refilling it seems to
be one of the causes of Tornado restarts being a lot slower on prod
than on local systems.

(imported from commit a32a759f4dfb591706ede1cce2d38f5c3704193c)
2013-04-24 10:44:56 -04:00
Tim Abbott 9b8f0fab0f Retrieve message objects from memcached in a bulk request.
On my laptop, this saves about 80 milliseconds per 1000 messages
requested via get_old_messages queries.  Since we only have one
memcached process and it does not run with special priority, this
might have significant impact on load during server restarts.
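
As a rough illustration of why a bulk request helps, the sketch below counts round trips against an in-memory stand-in for a cache client. `CountingCache`, `fetch_one_by_one`, and `fetch_bulk` are hypothetical names invented for this example; the only assumption about a real client is that it offers some multi-key fetch comparable to `get_many`.

```python
class CountingCache(object):
    """In-memory stand-in for a cache client that counts round trips."""

    def __init__(self):
        self.store = {}
        self.requests = 0

    def get(self, key):
        # One round trip per key.
        self.requests += 1
        return self.store.get(key)

    def get_many(self, keys):
        # One round trip for the whole batch; missing keys are omitted.
        self.requests += 1
        return {k: self.store[k] for k in keys if k in self.store}


def fetch_one_by_one(cache, keys):
    """Fetch each key with its own request."""
    found = {}
    for key in keys:
        value = cache.get(key)
        if value is not None:
            found[key] = value
    return found


def fetch_bulk(cache, keys):
    """Fetch all keys with a single request."""
    return cache.get_many(keys)


cache = CountingCache()
keys = ["message:%d" % i for i in range(1000)]
for key in keys:
    cache.store[key] = "payload for " + key

slow_result = fetch_one_by_one(cache, keys)
slow_requests = cache.requests

cache.requests = 0
fast_result = fetch_bulk(cache, keys)
fast_requests = cache.requests
```

Both functions return the same data; the difference is 1000 requests versus one, which is where the per-request overhead adds up.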

(imported from commit 06ad13f32f4a6d87a0664c96297ef9843f410ac5)
2013-04-24 10:44:56 -04:00
Tim Abbott 66b3c1fbff Log time spent querying memcached when it exceeds 5ms.
(imported from commit a4de15026d24526a446b724500d1194dce824d1a)
2013-04-24 10:44:56 -04:00
Tim Abbott 5e22778843 Use a function for stopping/restarting time logging for longpolling.
(imported from commit 11b772deaa126fcc7e7605d467022b22d9e98cb0)
2013-04-24 10:44:56 -04:00
Waseem Daher 20233cfc96 Time out on Twitter rendering if it takes too long.
Timing out within the Twitter portion of the render causes the message
to still go through (without a preview). If we don't time out here, it
causes the entire Markdown render to time out, which rejects the
message in its entirety -- a far worse outcome.
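
The idea of bounding only the preview step can be sketched like this. It is not the mechanism used in this commit: a thread-based timeout from `concurrent.futures` is swapped in as an assumption, and `render_preview`, `render_message`, and the `fetch` callables are hypothetical names for illustration.

```python
import concurrent.futures
import time

# Shared worker pool for preview fetches (size chosen arbitrarily).
_executor = concurrent.futures.ThreadPoolExecutor(max_workers=2)


def render_preview(fetch, url, timeout_seconds):
    """Return fetch(url), or None if it does not finish in time."""
    future = _executor.submit(fetch, url)
    try:
        return future.result(timeout=timeout_seconds)
    except concurrent.futures.TimeoutError:
        # The worker thread keeps running in the background; we simply
        # stop waiting for it and render without a preview.
        return None


def render_message(content, url, fetch, timeout_seconds=0.05):
    """Render the message, appending a preview only if one arrived."""
    preview = render_preview(fetch, url, timeout_seconds)
    if preview is None:
        return content
    return content + "\n" + preview


def slow_fetch(url):
    time.sleep(0.3)  # simulates an unresponsive remote service
    return "[preview of %s]" % url


def fast_fetch(url):
    return "[preview of %s]" % url


without_preview = render_message("hello", "http://example.com/1", slow_fetch)
with_preview = render_message("hello", "http://example.com/1", fast_fetch)
```

The slow fetch degrades to a plain message instead of failing the whole render, which is the trade-off the commit message describes.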

(imported from commit f510a56f48afa46da8ec6277496fa03374cdb042)
2013-04-23 12:56:34 -04:00
Luke Faraone 71a91197fa Enable absolute imports.
See PEP 328[1] for details. This feature was introduced in Python 2.5 and
will become mandatory in Python 3.

[1]: http://www.python.org/dev/peps/pep-0328

(imported from commit 7444eeba8a08d5f91b94c7921848f2274979bd76)
2013-04-23 09:51:17 -07:00
Leo Franchi 7b0423efc1 Use incr instead of gauge when sending events to drawAsInfinite to statsd
(imported from commit 08a4b6920c7a4a8f472f147ddce7c04710fe5c0a)
2013-04-19 09:56:41 -04:00
Leo Franchi 652b821d64 Add a bunch of statsd logging
(imported from commit 8c17a76fed08c34abab3cc45d76a0fb30774d73a)
2013-04-18 18:05:52 -04:00
Leo Franchi 46415d4984 Add statsd helpers and wrappers
(imported from commit 9d5b805ae416a65ac49dda8e8e11d9831308116c)
2013-04-18 18:05:52 -04:00
Tim Abbott 5afe06e8cb Decrease idle event queue timeout back to 10 minutes.
(imported from commit 1ca1c99c013f3e7f7e70e1fd9c5386b0d5a27b98)
2013-04-18 16:58:31 -04:00
Leo Franchi aa75f51d5e Catch IncompatibleProtocolErrors as well, since our failure to connect might happen during the initial handshake phase
(imported from commit 55115f19a5a101676e3ce1ca2a7b9cd2a2d5b028)
2013-04-18 15:46:43 -04:00
Leo Franchi fb2b3ae21a Handle multiple preregistration user objects when choosing streams
(imported from commit 52faa0256a719bed8a8ccc120f8177cce20450e2)
2013-04-17 15:48:30 -04:00
Leo Franchi 0aa20cb594 Rework saved consumer logic in TornadoQueue to always reconnect consumers
(imported from commit 0627d769349077c1e795db9215b17f538e9ec75c)
2013-04-17 12:11:28 -04:00
Leo Franchi 3681b77f22 Patch TornadoConnection to catch exceptions and continue reconnection
(imported from commit 6bf9086b6bdc35321b23bb92b35679e2a21f6333)
2013-04-17 11:29:08 -04:00
Leo Franchi 4adf2d5c26 Add on-close callback immediately after creating
(imported from commit 221f8c6306ef9b6c658d10b72e15dcfba83017e0)
2013-04-17 11:20:01 -04:00
Leo Franchi d7a33485ad Register a tornado atexit handler to disconnect from rabbitmq
(imported from commit b70650070f1df548794a9e3ff2948d134fd0c5de)
2013-04-16 11:49:03 -04:00
Leo Franchi 79a94a8e79 Delay queue creation if we're not connected in the TornadoQueueClient
(imported from commit c583693783322136927ae1a1018a61b2ffa6597f)
2013-04-16 10:04:48 -04:00
Leo Franchi befe7c26d3 Don't send on-demand presence information for mit users
(imported from commit 711a197b9a8c1e6c66d768b240c7bce7595e5b3b)
2013-04-16 09:37:25 -04:00
Leo Franchi c024653331 Build presence update for missed events properly
(imported from commit 15d75a2e0f5c5e1035b526df3aca443a2cffdf25)
2013-04-16 09:32:46 -04:00
Zev Benjamin 858d32b3c4 Increase event queue lifetime and decrease event queue GC frequency
(imported from commit 6328c0659e2144a8d7898cbb54eac25f1c21c983)
2013-04-12 15:32:51 -04:00
Tim Abbott 04c4321d90 Move PERSISTENT_QUEUE_FILENAME into settings.py.
(imported from commit e7d1378fd0cb3f3d894ff4a5b6ee44212bf3ce34)
2013-04-12 12:06:53 -04:00
Tim Abbott f7406b9c7d Don't write logs to the server's working directory when DEPLOYED.
Otherwise these logs will end up all getting split up when we switch
to the new deployment model.

(imported from commit 0514c296470be7113cab6c2f48e8dd33f1b9353d)
2013-04-12 11:54:50 -04:00
Leo Franchi 916c235d8c Only show active users in presence list
(imported from commit 73c0347aa10b52f13f41bbd93ff5372750ffcd3e)
2013-04-12 09:53:50 -04:00
Leo Franchi 302cfcd48c Send client information for initial presence and process time differential
(imported from commit 99a51b7cc8b6c51c4e82757a984d07603b2980e3)
2013-04-12 09:11:40 -04:00
Leo Franchi 5d4b2305fe Send presence updates when a new user logs in for the first time, and when returning from inactive
This commit will incorrectly list past-online users as active, a shortcoming that is
addressed in the next commit.

(imported from commit b018767df686f88c0ca939c067c573e4d7cea357)
2013-04-12 09:11:40 -04:00
Jessica McKellar 7175dc534a Send invitation e-mails asynchronously through RabbitMQ.
This avoids 10s of seconds of delay when you invite several people at
once through the web UI.

(imported from commit 75acdbdb04caf62bbb08affc7796330246d8a00e)
2013-04-10 16:57:49 -04:00
Zev Benjamin f6a6a6b220 Add per-stream desktop notifications
(imported from commit b4a0576847b3aec1495f017ca9805febe80c9275)
2013-04-10 16:11:27 -04:00
Zev Benjamin a2010871e3 Make subscription properties less free-form
(imported from commit eda607c2abfa51d2dadddc7b9ecba3e2d0b5be4d)
2013-04-10 16:11:27 -04:00
Zev Benjamin 5e307f9cce Fix calculation of number of active users
(imported from commit 0a74f1d8db51988ec806deb6af7cd8a6ef18d08c)
2013-04-09 13:35:09 -04:00
Tim Abbott ea95a8b167 Future-proof adding new users to default streams.
The previous code for adding users to default streams wouldn't do so
if the user didn't have a PreregistrationUser row.

(imported from commit 25f1383f6771319542d07660b29d891368889212)
2013-04-09 11:58:07 -04:00
Tim Abbott 1b11eeb2bc Simplify the default_subscriptions code path.
(imported from commit 62894a5949621465fcfd8d25372316d7ab495252)
2013-04-09 11:58:07 -04:00
Keegan McAllister 3c40dd3bf3 bugdown: Fix fenced_code for Python-Markdown 2.3
(imported from commit 3954444708e222217407df228f07d2cad402a02b)
2013-04-05 13:14:00 -04:00
Tim Abbott fdefa06190 Eliminate use of old StreamColor model.
(imported from commit c72a06bdc44f30fb6bca299463e259262367e8c2)
2013-04-04 17:48:51 -04:00
Tim Abbott f6affa8802 gather_subscriptions: Use the colors from the Subscription table.
(imported from commit c23829ad4141a97c61e21b970e5031eae20e24b4)
2013-04-04 17:48:51 -04:00
Leo Franchi 8fe82085c4 [schema][manual] Automatically subscribe users to default streams only after tutorial
(imported from commit 6511851c0aee2628bef597bf1310d6f96b0fd1d4)
2013-04-04 17:11:39 -04:00
Tim Abbott 0ee684a4b5 [schema] [manual] Add colors to the subscription model.
This is preparatory for removing the StreamColor model, so we also set
things up so anything changing the StreamColor model changes the
Subscription model too.

The manual task is to run the copy_colors.py management command after
deployment to each of staging and prod.

(imported from commit 1be7523ca59f5266eb2c4dc2009e31209ed49635)
2013-04-04 14:17:01 -04:00
Leo Franchi 0055107cfd Use IANA's TLD list to detect URLs for auto-linkification
(imported from commit 9103fdc92405b92300a793bd1d4f493df64b5b9c)
2013-04-03 09:58:17 -04:00
Leo Franchi a5643efa14 Allow @ in urls
(imported from commit cb2ffe4a8f050e732bb06ab4609997be35577417)
2013-04-02 18:38:38 -04:00
Leo Franchi d127d6f19f Support up to one level deep of nested parens in urls
(imported from commit 3f314b16a47b5267ddb0d18aa6c5456656895f77)
2013-04-02 18:38:38 -04:00
Keegan McAllister 191231ab3d bugdown: Whitelist URL schemes
(imported from commit 76e22cec3918c00faaa903baae74915cc5e64264)
2013-04-02 18:38:38 -04:00
Keegan McAllister 5d538d7a2a bugdown: Allow colons in URLs
(imported from commit b57fc21f4508f2bff3cbc32a6359de686aa3a96e)
2013-04-02 18:38:38 -04:00
Tim Abbott fa20696230 do_add_subscription: Don't unnecessarily fetch subscription from the database.
(imported from commit ffe2c8d2026b60a91dd54f10cfd9df0adbfd7acd)
2013-04-02 14:01:54 -04:00
Tim Abbott 2a46c46fa8 set_stream_color: Pass color to get_or_create.
(imported from commit 0d5f1fd227fd6dc337291d2d07ba24f96080e9e2)
2013-04-02 14:01:54 -04:00
Tim Abbott 1cec86eb2d [manual] Remove now-unused User model.
I think the only step needed to deploy this commit is to run
`generate-fixtures --force` on developer laptops.

(imported from commit 34916341435fef0875b5a2c7f53c2f5606cd16cd)
2013-04-02 12:57:10 -04:00
Tim Abbott a8e89962d8 Remove remaining direct usage of the User model.
(imported from commit c494b4e32761e9ce57115da918a86a1d6a0b6971)
2013-04-02 12:07:08 -04:00