We HTML-escape the subject in Postgres to avoid a server round-trip.
Unlike the rendered_content, which is already escaped and cached on
zephyr_message, we normally escape subjects client-side. Escaping in
Django would require fetching the messages that match the query,
escaping the subjects, and then making a second query to Postgres to
insert the markup. We could instead fetch the messages with subjects
marked up using non-HTML (some unique string) that is later converted
into the correct markup either in Django or client-side, but then the
escaping problem would just be with some random string instead of
HTML. Since the function is pretty simple, doing the escaping in
Postgres itself is the least painful option.
(imported from commit 004931d8e496697c18650aee97b1a74c55a04cb2)
In addition to changing the trigger that updates
zephyr_message.search_tsvector to use our new text search
configuration, it also now builds the tsvector on rendered_content
instead of content and fires on update of only the subject or
rendered_content columns.
This migration is expected to take a long time. The
checkpoint_segments parameter in postgresql.conf should be
temporarily raised (probably to 32) while it is running.
(imported from commit 4535438bb33ce1db2a74ecbe91efc52afdb568f1)
Text search was not that great partially because Postgres wasn't
using a ispell dictionary (Postgres term) before. We now pull in
Hunspell and use its dictionary and affix rules.
It is Ok to run with this new configuration before updating our full
text column and index that will be coming in the next few commits.
Manual steps for deploy:
1) On both postgres0 and postgres1 (both before moving on to step 2),
install the hunspell-en-us package
2) On staging, run migration 0022
3) On both postgres0 and postgres1, copy the appropriate postgresql.conf
file over
4) On both postgres0 and postgres1, run `pg_ctlcluster 9.1 main reload`
(imported from commit 706bf0f6ecc46c712cea10b73c34fd9d1dfd4767)
There's still a lot to do here. For example, the external code
should probably go through the new Filter object directly instead of
indirectly through the narrow module.
(imported from commit 22dcd31cdebd51453f1658af52a4432b2fe7a4cb)
In the case where we're getting old messages for a narrowed view, the
anchor message id might not actually be in the result set so there's
no reason to fetch an extra message.
(imported from commit e610d1f2cb95be3ff9fce6dc95e40c560bc5bf84)
In particular, I added absolute positioning and hidden overflow,
which ensures that if an element has a persistent min-width
(like a file input field apparently does), it doesn't affect its
parent.
(imported from commit 72e7a5bee2775fb6f229899ba849292eee76aa4a)
In repeated trials, the initial data fetch used to take about 1100ms.
In practice, it was often taking >2000ms, probably due to caching
effects. This commit cuts the time down to about 300ms in repeated
trials.
Note that the semantics are changed slightly in that we may no longer
get exactly 25000 messages. However, holes in the message_id
sequence are currently very rare or non-existent so this shouldn't be
a problem and we don't care about the exact number of messages
anyway.
I believe the problem was that the query planner was unable to
effectively use the LIMIT clause to figure out that only a small
subset of zephyr_message was going to be needed. Thus, it planned
for operating on the entire table and decided it could not use a more
efficient plan because work_mem, although large, would not be large
enough to execute the query over all of zephyr_message.
The original query was:
SELECT "zephyr_message"."id", "zephyr_message"."sender_id", "zephyr_message"."recipient_id", "zephyr_message"."subject", "zephyr_message"."content", "zephyr_message"."rendered_content", "zephyr_message"."rendered_content_version", "zephyr_message"."pub_date", "zephyr_message"."sending_client_id", "zephyr_userprofile"."id", "zephyr_userprofile"."password", "zephyr_userprofile"."last_login", "zephyr_userprofile"."email", "zephyr_userprofile"."is_staff", "zephyr_userprofile"."is_active", "zephyr_userprofile"."date_joined", "zephyr_userprofile"."full_name", "zephyr_userprofile"."short_name", "zephyr_userprofile"."pointer", "zephyr_userprofile"."last_pointer_updater", "zephyr_userprofile"."realm_id", "zephyr_userprofile"."api_key", "zephyr_userprofile"."enable_desktop_notifications", "zephyr_userprofile"."enter_sends", "zephyr_userprofile"."tutorial_status", "zephyr_realm"."id", "zephyr_realm"."domain", "zephyr_realm"."restricted_to_domain", "zephyr_recipient"."id", "zephyr_recipient"."type_id", "zephyr_recipient"."type", "zephyr_client"."id", "zephyr_client"."name" FROM "zephyr_message" INNER JOIN "zephyr_userprofile" ON ( "zephyr_message"."sender_id" = "zephyr_userprofile"."id" ) INNER JOIN "zephyr_realm" ON ( "zephyr_userprofile"."realm_id" = "zephyr_realm"."id" ) INNER JOIN "zephyr_recipient" ON ( "zephyr_message"."recipient_id" = "zephyr_recipient"."id" ) INNER JOIN "zephyr_client" ON ( "zephyr_message"."sending_client_id" = "zephyr_client"."id" ) ORDER BY "zephyr_message"."id" DESC LIMIT 25000;
with query plan:
Limit (cost=0.00..27120.95 rows=25000 width=362) (actual time=0.051..1121.282 rows=25000 loops=1)
-> Nested Loop (cost=0.00..5330872.99 rows=4913981 width=362) (actual time=0.048..1081.014 rows=25000 loops=1)
-> Nested Loop (cost=0.00..3932643.31 rows=4913981 width=344) (actual time=0.042..926.398 rows=25000 loops=1)
-> Nested Loop (cost=0.00..2550275.29 rows=4913981 width=334) (actual time=0.035..752.524 rows=25000 loops=1)
Join Filter: (zephyr_message.sending_client_id = zephyr_client.id)
-> Nested Loop (cost=0.00..1739467.29 rows=4913981 width=320) (actual time=0.024..217.348 rows=25000 loops=1)
-> Index Scan Backward using zephyr_message_pkey on zephyr_message (cost=0.00..362510.09 rows=4913981 width=156) (actual time=0.014..42.097 rows=25000 loops=1)
-> Index Scan using zephyr_userprofile_pkey on zephyr_userprofile (cost=0.00..0.27 rows=1 width=164) (actual time=0.003..0.004 rows=1 loops=25000)
Index Cond: (id = zephyr_message.sender_id)
-> Materialize (cost=0.00..1.17 rows=11 width=14) (actual time=0.001..0.010 rows=11 loops=25000)
-> Seq Scan on zephyr_client (cost=0.00..1.11 rows=11 width=14) (actual time=0.002..0.010 rows=11 loops=1)
-> Index Scan using zephyr_recipient_pkey on zephyr_recipient (cost=0.00..0.27 rows=1 width=10) (actual time=0.002..0.003 rows=1 loops=25000)
Index Cond: (id = zephyr_message.recipient_id)
-> Index Scan using zephyr_realm_pkey on zephyr_realm (cost=0.00..0.27 rows=1 width=18) (actual time=0.002..0.003 rows=1 loops=25000)
Index Cond: (id = zephyr_userprofile.realm_id)
Total runtime: 1141.408 ms
In the new code, we do two queries:
SELECT "zephyr_message"."id" FROM "zephyr_message" ORDER BY "zephyr_message"."id" DESC LIMIT 1
followed by:
SELECT "zephyr_message"."id", "zephyr_message"."sender_id", "zephyr_message"."recipient_id", "zephyr_message"."subject", "zephyr_message"."content", "zephyr_message"."rendered_content", "zephyr_message"."rendered_content_version", "zephyr_message"."pub_date", "zephyr_message"."sending_client_id", "zephyr_userprofile"."id", "zephyr_userprofile"."password", "zephyr_userprofile"."last_login", "zephyr_userprofile"."email", "zephyr_userprofile"."is_staff", "zephyr_userprofile"."is_active", "zephyr_userprofile"."date_joined", "zephyr_userprofile"."full_name", "zephyr_userprofile"."short_name", "zephyr_userprofile"."pointer", "zephyr_userprofile"."last_pointer_updater", "zephyr_userprofile"."realm_id", "zephyr_userprofile"."api_key", "zephyr_userprofile"."enable_desktop_notifications", "zephyr_userprofile"."enter_sends", "zephyr_userprofile"."tutorial_status", "zephyr_realm"."id", "zephyr_realm"."domain", "zephyr_realm"."restricted_to_domain", "zephyr_recipient"."id", "zephyr_recipient"."type_id", "zephyr_recipient"."type", "zephyr_client"."id", "zephyr_client"."name" FROM "zephyr_message" INNER JOIN "zephyr_userprofile" ON ( "zephyr_message"."sender_id" = "zephyr_userprofile"."id" ) INNER JOIN "zephyr_realm" ON ( "zephyr_userprofile"."realm_id" = "zephyr_realm"."id" ) INNER JOIN "zephyr_recipient" ON ( "zephyr_message"."recipient_id" = "zephyr_recipient"."id" ) INNER JOIN "zephyr_client" ON ( "zephyr_message"."sending_client_id" = "zephyr_client"."id" ) WHERE "zephyr_message"."id" > 4941883
with the message id filled in as the result of the first query. The
new query differs from the original only in that its ORDER BY and
LIMIT clauses are replaced by a WHERE clause. The second query has
query plan:
Hash Join (cost=709.30..28048.18 rows=20544 width=365) (actual time=41.678..279.261 rows=25041 loops=1)
Hash Cond: (zephyr_message.recipient_id = zephyr_recipient.id)
-> Hash Join (cost=102.98..27056.66 rows=20544 width=355) (actual time=3.686..190.730 rows=25041 loops=1)
Hash Cond: (zephyr_message.sending_client_id = zephyr_client.id)
-> Hash Join (cost=101.73..26772.94 rows=20544 width=341) (actual time=3.649..143.695 rows=25041 loops=1)
Hash Cond: (zephyr_userprofile.realm_id = zephyr_realm.id)
-> Hash Join (cost=99.99..26488.71 rows=20544 width=323) (actual time=3.578..96.746 rows=25041 loops=1)
Hash Cond: (zephyr_message.sender_id = zephyr_userprofile.id)
-> Index Scan using zephyr_message_pkey on zephyr_message (cost=0.00..26106.24 rows=20544 width=159) (actual time=0.017..41.980 rows=25041 loops=1)
Index Cond: (id > 4941883)
-> Hash (cost=83.33..83.33 rows=1333 width=164) (actual time=3.548..3.548 rows=1333 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 275kB
-> Seq Scan on zephyr_userprofile (cost=0.00..83.33 rows=1333 width=164) (actual time=0.006..1.646 rows=1333 loops=1)
-> Hash (cost=1.33..1.33 rows=33 width=18) (actual time=0.064..0.064 rows=33 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 2kB
-> Seq Scan on zephyr_realm (cost=0.00..1.33 rows=33 width=18) (actual time=0.003..0.033 rows=33 loops=1)
-> Hash (cost=1.11..1.11 rows=11 width=14) (actual time=0.027..0.027 rows=11 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> Seq Scan on zephyr_client (cost=0.00..1.11 rows=11 width=14) (actual time=0.003..0.013 rows=11 loops=1)
-> Hash (cost=335.03..335.03 rows=21703 width=10) (actual time=37.974..37.974 rows=21761 loops=1)
Buckets: 4096 Batches: 1 Memory Usage: 893kB
-> Seq Scan on zephyr_recipient (cost=0.00..335.03 rows=21703 width=10) (actual time=0.004..18.443 rows=21761 loops=1)
Total runtime: 299.300 ms
(imported from commit b2a70cccc47be7970df407c6be00eccd2e8be82a)
When you create a stream that you'd previously created (then unsubscribed from),
it was possible to end up in the subscribers list twice. Once came from loading
the subscribers list from the backend, and once came from a bit of mark_subscribed
logic that only gets called if you've subscribed to that stream at least once before
in the current session.
resolves trac #1196
(imported from commit e47ff139a9c25b1b8689ea6795dfad96ae8d2591)
If the pasted content has strings, we don't upload included files and instead
allow the default behavior to take place. This deals with a quirky behavior of
pastes from MS Word, which in addition to the formatted string content also
includes a thumbnail of it. Images still paste as usual.
(imported from commit 60c4f8dd90ac2e8e38940fb302cc9d1ebeecfdf3)
This allows users on signup-eligible domains to sign up for Humbug using
Google Apps.
As part of this, we wrap the openid done view in our own code in order to
handle the "Unknown user" error. Therein, we create a PreregistrationUser
and then shunt the user through the rest of the confirmation process, pre-
filling in their name.
(imported from commit 066d9a1021384a6da2662352e62a701451bd6f44)
This allows us to keep a record of the user's name as returned by Google
Apps authentication.
(imported from commit cbfe383a51b480400b8f0e5f40c725562ffc6a66)
Changes include:
* New markup for the button in compose.html
* A hidden file input field in compose.html
* Added reference to the file input field in filedrop
initialization in compose.js
* A feature test and a click event binding for
the "Attach files" button in ui.js
* New paperclip icon reference in fonts.css
* New general hidden display classes in zephyr.css
* New composition pane button classes in zephyr.css
Fixes to the "Attach files" button commit e673bda...
Changes include:
* Fixed the feature test for (new XMLHttpRequest).upload so
it works in Firefox.
* Renamed .button to .message-control-button
* Removed stray newlines
(imported from commit c1f0834b74fd7120ec27db64ec380ffb3fa34633)
Having a message ID range significantly improves the query
performance because the number of messages Postgres has to consider
is much smaller.
(imported from commit 9b007457712f1c1502d526abea1b6fd742bd911d)
The fact that we were dumping this cache and not refilling it seems to
be one of the causes of Tornado restarts being a lot slower on prod
than on local systems.
(imported from commit a32a759f4dfb591706ede1cce2d38f5c3704193c)
Previously, our check for whether we needed to call load_old_messages
a second time on page load to get up to the present caused us to
basically always do such a call.
(imported from commit b599041e8c0853b4c8c9ab2def6679142302523e)
On my laptop, this saves about 80 milliseconds per 1000 messages
requested via get_old_messages queries. Since we only have one
memcached process and it does not run with special priority, this
might have significant impact on load during server restarts.
(imported from commit 06ad13f32f4a6d87a0664c96297ef9843f410ac5)
The internal format of 'message' had changed, so prior to this commit,
the tutorial was receiving (a) internally inconsistent, and (b)
not-what-it-expected versions of the message.
(imported from commit 233b934e6b600bd59125d133fdf7443fd8f6bbf8)
It's subtle, but the slice was in the wrong place and wasn't
actually truncating the stream name at all, so the client and
server disagreed about where the tutorial messages should go.
(It might be the case that we should accept the tutorial stream
name from the client directly, rather than computing it in two
places.)
(imported from commit 8273223f182e8ad36eaea1cbf75e1426fcfdfbab)
If the system was waiting for you to reply and you replied 'exit', the
tutorial would stop -- but our thing that was waiting for you to reply
would continue waiting. It would eventually timeout and send you the
heartbroken "I didn't hear from you so I stopped waiting" message.
Chances are, you were unsubscribed so you didn't see it, but we
should still just not send it.
(imported from commit 694e442bc29b32efd59f08b4b8b5f573768aea21)