Here we introduce a new Django app, zilencer. The intent is to not have
this app enabled on LOCALSERVER instances, and for it to grow to include
all the functionality we want to have in our central server that isn't
relevant for local deployments.
Currently we have to modify functions in zerver/* to match; in the
future, it would be cool to have the relevant shared code broken out
into a separate library.
This commit includes both the migration to create the models and a
data migration that (for non-LOCALSERVER) creates a single default
Deployment for zulip.com.
To apply this migration to your system, run:
./manage.py migrate zilencer
(imported from commit 86d5497ac120e03fa7f298a9cc08b192d5939b43)
Trac #1734
This is implemented by bouncing uploaded file links through a view
that checks authentication and redirects to an expiring S3 URL.
This makes file uploads return a domain-relative URI. The client converts
this to an absolute URI when it's in the composebox, then back to relative
when it's submitted to the server.
We need the relative URI because the same message may be viewed across
{staging,www,zephyr}.zulip.com, which have different cookies.
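The round trip between relative and absolute URIs can be sketched as
follows. This is a minimal illustration in Python, not the actual client
code; the function names and the /user_uploads/ path used in the usage
note are assumptions made for the example.

```python
from urllib.parse import urljoin, urlsplit


def to_absolute(relative_uri: str, origin: str) -> str:
    """Resolve a domain-relative URI against the origin the user is viewing."""
    return urljoin(origin, relative_uri)


def to_relative(uri: str, origin: str) -> str:
    """Strip the origin again before the message is submitted.

    URIs that point at a different origin are returned unchanged.
    """
    target, base = urlsplit(uri), urlsplit(origin)
    if (target.scheme, target.netloc) != (base.scheme, base.netloc):
        return uri
    relative = target.path or "/"
    if target.query:
        relative += "?" + target.query
    if target.fragment:
        relative += "#" + target.fragment
    return relative
```

For example, to_absolute("/user_uploads/1/a.txt", "https://staging.zulip.com")
yields a link on staging, while the stored message keeps only the relative
form, so the same message resolves correctly on any of the domains.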
(imported from commit 33acb2abaa3002325f389d5198fb20ee1b30f5fa)
This has a small bug where we don't actually filter the message out of
the home view; fixing that requires adding an index on the "flags"
field of UserMessage.
(imported from commit 492c99d0a8e87b253e577be6564bec12099bd8e9)
Because our authentication system reads cookies from the initial
connection attempt, several SockJS transports can't be used.
(imported from commit 34b9571225d39072985b8223fb12c43c7235841f)
New dependency: sockjs-tornado
One known limitation is that we don't clean up sessions for
non-websockets transports. This is a bug in Tornado so I'm going to
look at upgrading us to the latest version:
https://github.com/mrjoes/sockjs-tornado/issues/47
(imported from commit 31cdb7596dd5ee094ab006c31757db17dca8899b)
gather_subscriptions_helper() now does a separate query to
get emails from user_ids, and it returns an email_dict to its
caller.
This may seem like a step backward, since gather_subscriptions()
now needs to do an additional query, but there is some benefit
in passing fewer redundant emails over the wire from the DB.
The real payoff, though, will come in subsequent commits, where
we will reduce the amount of data going over the wire to the browser,
which will benefit users with slow connections.
(imported from commit bf1cc5828a4c5f68cafd052ea29a177837970206)
Arguably the nl2br extension should be doing this for us. Given that
we're using nl2br, the "two spaces at the end of a line makes a line
break" rule doesn't make any sense (since every newline leads to a
linebreak), so we disable it.
(imported from commit 5ffa2ac8a825642ad31e085c532091e076665710)
clear_followup_emails_queue now filters by from_email too
send_local_email_template_with_delay passes the template_payload into the subject template
(imported from commit 8044fe2ebad90a9d6d5c67cdfdd08801760fd7f7)
The current version should only be used for testing; for example,
if you want to create a bunch of streams for stress testing, you
can run this in a loop.
(imported from commit ec51a431fb9679fc18379e4c6ecdba66bc75a395)
It makes the event queue return all messages on public streams, rather
than only the user's subscriptions. It's meant for use with chat bots.
(imported from commit 12d7e9e9586369efa7e7ff9eb060f25360327f71)
Trac #1162
The process_fence method replaces code blocks with placeholders, so
indexes stored before the replacement are incorrect. However, because
the closed code blocks have been replaced, we can simply search the
whole string for any remaining opening code block markers.
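The idea can be sketched as follows. This is not the process_fence code
itself: treating "~~~" as the only fence marker, the placeholder format,
and both function names are assumptions made for the illustration.

```python
import re

# Assumption: a fence is a line starting with "~~~"; the real markers may differ.
CLOSED_FENCE = re.compile(r"^~~~[^\n]*\n.*?^~~~[ ]*$", re.MULTILINE | re.DOTALL)
OPEN_FENCE = re.compile(r"^~~~", re.MULTILINE)


def replace_closed_fences(text):
    """Swap every closed code block for a placeholder and stash the original."""
    stash = []

    def _stash(match):
        stash.append(match.group(0))
        return "\x02fence:%d\x03" % (len(stash) - 1)

    return CLOSED_FENCE.sub(_stash, text), stash


def find_unclosed_fence(processed_text):
    """Search the whole processed string for a leftover opening marker.

    Offsets computed before the replacement would be stale here, because
    the placeholders have a different length than the blocks they replace.
    """
    match = OPEN_FENCE.search(processed_text)
    return match.start() if match else None
```

Since every closed block is gone after the replacement, any marker that
the search still finds must belong to an unclosed block.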
(imported from commit 6a9e6924840f8f3ca5175da7c52a905e27c1fabd)
I added filter() statements to do_update_message_flags().
Here is some context:
Steve Howell: Case 1, have AND clause to reduce work for DB.
humbug=> update zerver_usermessage set flags = (flags & ~1) where id > 9000;
UPDATE 382
humbug=> select count(*) from zerver_usermessage where (flags & 1) = 0;
count
-------
382
(1 row)
humbug=> explain analyze update zerver_usermessage set flags = (flags | 1) where (flags & 1) = 0;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Update on zerver_usermessage (cost=0.00..266.85 rows=47 width=27) (actual time=5.727..5.727 rows=0 loops=1)
-> Seq Scan on zerver_usermessage (cost=0.00..266.85 rows=47 width=27) (actual time=0.045..2.751 rows=382 loops=1)
Filter: ((flags & 1::bigint) = 0)
Rows Removed by Filter: 9000
Total runtime: 5.759 ms
(5 rows)
humbug=> select count(*) from zerver_usermessage where (flags & 1) = 0;
count
-------
0
(1 row)
Leo Franchi: Sounds reasonable, but I know way less than zev about DBs so I'll defer to his judgement :)
Steve Howell: Case 2, how the code works now:
humbug=> update zerver_usermessage set flags = (flags & ~1) where id > 9000;
UPDATE 382
humbug=> select count(*) from zerver_usermessage where (flags & 1) = 0;
count
-------
382
(1 row)
humbug=> explain analyze update zerver_usermessage set flags = (flags | 1);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Update on zerver_usermessage (cost=0.00..243.28 rows=9382 width=27) (actual time=362.075..362.075 rows=0 loops=1)
-> Seq Scan on zerver_usermessage (cost=0.00..243.28 rows=9382 width=27) (actual time=0.008..6.138 rows=9382 loops=1)
Total runtime: 362.105 ms
(3 rows)
humbug=> select count(*) from zerver_usermessage where (flags & 1) = 0;
count
-------
0
(1 row)
Steve Howell: In both trials, we set it up so that only 382 of 9382 rows need to be updated. The first trial runs about 63x as fast. The second trial, if my theory is correct, is doing 24x as many writes as it needs. Both trials are reading all 9382 rows.
Steve Howell: The expense of the update statement seems to be proportional to the number of rows you "update", not the number of rows that you actually change.
Steve Howell: For now I created #1869.
Zev Benjamin: That sounds like a reasonable explanation. The disk IO can be expensive
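The row counts from the two trials can be reproduced with a small
stand-alone sketch. It uses an in-memory SQLite database rather than
PostgreSQL, and the table and function names are assumptions, so it only
illustrates how many rows each statement touches, not the timings above.

```python
import sqlite3


def rows_touched_by_update(filtered: bool) -> int:
    """Return how many rows an UPDATE touches, with or without the AND clause."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE usermessage (id INTEGER PRIMARY KEY, flags INTEGER NOT NULL)"
    )
    # 9382 rows in total; only ids above 9000 (382 rows) have the flag cleared.
    conn.executemany(
        "INSERT INTO usermessage (id, flags) VALUES (?, ?)",
        [(i, 1 if i <= 9000 else 0) for i in range(1, 9383)],
    )
    sql = "UPDATE usermessage SET flags = flags | 1"
    if filtered:
        # Case 1: skip rows that already have the flag set.
        sql += " WHERE (flags & 1) = 0"
    cursor = conn.execute(sql)
    touched = cursor.rowcount
    conn.close()
    return touched
```

With the filter the statement touches 382 rows; without it, all 9382 rows
are rewritten even though 9000 of them do not change.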
(imported from commit d9090daee1f81cad76c430de0956f9bd504da075)
Handled by the queue processor for signups. Added a management command
that accomplishes the same task, in case it's needed for manually added users,
or in case we goof and need to remove queued emails for a given user.
This addresses Trac #1807
(imported from commit 6727b82a07fa6a3ea3d827860c9e60fd0602297a)
We want to avoid opening a DB connection in the markdown thread,
as its DB connection might live for a long time.
(imported from commit 7700b2ca793ee5e9add7f071b92f22a4bf576b3d)
This will hopefully incentivize people to click one and get back into
the app.
We'll also need this for digest emails.
(imported from commit 57191c3fcca3b12df93a81e4692bb7eb8ccc83b2)
The text of a manual link is already an AtomicString, so linkified
strings should be too.
Moves emoji detection to happen after linkification, so the emoji rule
won't look at links.
(imported from commit 9c56bce6a0e873b398255e0762dfb312a4a9a64e)
InlinePatterns should return None on failure, not text that may
have placeholders in it.
(imported from commit f9d8d22b2b8cfa7a92ecf3e52a6c76b48e6f0175)
This function doesn't require the whole UserProfile object to
create the avatar url, and we call it from Message.to_dict_uncached().
(imported from commit e814caab101c4fedd1ba66df041a3408014e4085)
The realm should always be the realm of the stream, and we should
always pass in a stream rather than sometimes passing in a stream name
and other times passing in a stream.
(imported from commit a098d6ed3db218a37c1b6b7c956e847c316c2d13)
We have been persisting muting preferences on the back end for
a while, but we haven't been adding them to page_params for the
client to have at reload/startup time.
(imported from commit d9ca68aa0e4d22bfb0e6ce67fc0bc63981175c8b)
We now bulk-fetch subscription information once from the database
and use it throughout bulk_add_subscriptions in order to avoid
hitting the db O(streams) times.
On my machine this cut the accounts_register API call from
66 queries to 37 queries.
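The pattern can be sketched without Django as follows. CountingStore is a
hypothetical stand-in for the database that only counts queries; none of
these names come from bulk_add_subscriptions itself.

```python
from collections import defaultdict


class CountingStore:
    """Hypothetical stand-in for the DB that counts how often it is queried."""

    def __init__(self, subscriptions):
        self._subscriptions = list(subscriptions)  # (stream_id, user_id) pairs
        self.queries = 0

    def subscribers_of(self, stream_id):
        self.queries += 1
        return {user for stream, user in self._subscriptions if stream == stream_id}

    def subscriptions_for(self, stream_ids):
        self.queries += 1
        wanted = set(stream_ids)
        return [(s, u) for s, u in self._subscriptions if s in wanted]


def subscribers_per_stream_naive(store, stream_ids):
    """One query per stream: O(streams) round trips."""
    return {sid: store.subscribers_of(sid) for sid in stream_ids}


def subscribers_per_stream_bulk(store, stream_ids):
    """One query for all streams, then group the rows in memory."""
    by_stream = defaultdict(set)
    for stream_id, user_id in store.subscriptions_for(stream_ids):
        by_stream[stream_id].add(user_id)
    return {sid: by_stream[sid] for sid in stream_ids}
```

Both helpers return the same mapping; the bulk version issues a single
query regardless of how many streams are involved.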
(imported from commit 5dd5ad3f50b2a6edf85b5f1d55ebd697a1c60647)
When we send a message, we send some presence information to Tornado
to help it figure out how to generate emails for idle recipients of
a message. This change limits the presence info to being the
intersection of present users and recipients of the message. It is
just an internal optimization to avoid queueing up unneeded data.
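A minimal sketch of that filtering step, assuming presence info is a dict
keyed by user id (the function name and data shapes are illustrative, not
taken from the actual code):

```python
def presence_for_recipients(presences, recipient_ids):
    """Keep presence entries only for users who receive the message."""
    recipients = set(recipient_ids)
    return {
        user_id: info
        for user_id, info in presences.items()
        if user_id in recipients
    }
```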
The history behind this feature is that I implemented it a while
back, but I think I made a rebase mistake that sent all the presence
data over the wire, despite having code to filter on recipients.
It was mostly harmless, just leading to some inefficiency which is
now fixed.
(imported from commit 7c8e97705afb299c67b99053909e952fbc823551)
For a 4-person stream, we were hitting the DB 8 times, and 4 of
those queries were to lazily get user.email for the 4 recipients
due to upstream code using only(). I added user_profile__email
to the only() call.
I believe this regression started 9/18, and after pushing this
to prod, we should look at this graph:
https://stats1.zulip.net/graphs/8274cd84588
(imported from commit 70629cb69fe5955c674ba76482609dfe78e5faaf)
Use stream.num_subscribers() in check_if_a_bot_is_sending_a_message_to_an_empty_stream().
The num_subscribers() function uses Django's count() method, which returns
a single row, instead of calling len() on an iterator of query rows.
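The difference can be illustrated with plain SQL. The sketch below uses an
in-memory SQLite table with assumed names instead of the Django ORM; each
helper returns the subscriber count together with the number of rows the
database had to hand back.

```python
import sqlite3


def make_db(num_subscribers):
    """Build a throwaway table with num_subscribers rows for stream 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE subscription (id INTEGER PRIMARY KEY, stream_id INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO subscription (stream_id) VALUES (?)", [(1,)] * num_subscribers
    )
    return conn


def count_in_database(conn, stream_id):
    """Like count(): the database does the counting and returns one row."""
    rows = conn.execute(
        "SELECT COUNT(*) FROM subscription WHERE stream_id = ?", (stream_id,)
    ).fetchall()
    return rows[0][0], len(rows)


def count_in_python(conn, stream_id):
    """Like len() on the query rows: every matching row is transferred first."""
    rows = conn.execute(
        "SELECT id FROM subscription WHERE stream_id = ?", (stream_id,)
    ).fetchall()
    return len(rows), len(rows)
```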
(imported from commit 6157fe248945e9288ee71d8cc39fb6dda4e9a247)
Some bots created by us do not have owners. Don't try to send a
message to the nonexistent owner.
(imported from commit ab952eccd7d6c4728e9477a106142214b5c81ca9)
Instead just rely on the 2-minute delay in the management command to
batch conversations.
We've had people report being confused or thinking the feature was
broken when they didn't get e-mails because of our rate-limiting, so
let's see if this is not too overwhelming.
(imported from commit 706ddb07b906b5c2edea1159c04acc2ee6f06e29)
Don't send peer_add notifications to users who are already
getting add notifications, because they will already know
about subscribers.
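In set terms, a minimal sketch (the function and parameter names are
assumptions, not the names used in the code):

```python
def peer_add_recipients(subscriber_ids, newly_added_ids):
    """Users who should get peer_add: subscribers not already getting add."""
    return set(subscriber_ids) - set(newly_added_ids)
```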
(imported from commit 726b54ae0e30b71440b17d9c51b026872ea96218)
It only grabs the user_profile_id column now. Grabbing small
1-column dictionaries instead of large ORM objects gives a speedup
of about 16x.
(imported from commit 95150bff3fdcbe250b04f014062224af42a6644f)
Splitting out notify_peers() will give us flexibility for cleaning
up how we notify peers for bulk adds.
(imported from commit e108fa2c432cc1fe54d788c58c82c983e0f2394e)
If you expand subscribers on your settings page, you will now see
a query like this in your postgres logs:
SELECT "zerver_userprofile"."email"
FROM "zerver_subscription"
INNER JOIN "zerver_recipient" ON ("zerver_subscription"."recipient_id" = "zerver_recipient"."id")
INNER JOIN "zerver_userprofile" ON ("zerver_subscription"."user_profile_id" = "zerver_userprofile"."id")
WHERE ("zerver_recipient"."type" = 2 AND "zerver_subscription"."active" = true AND "zerver_recipient"."type_id" = 40 AND "zerver_userprofile"."is_active" = true )
The join's still complicated, but the list of fields is one instead of 40+.
(imported from commit 48de1f888193a4d23fcea52d0b633d134e4a3ff7)
get_subscribers_backend() now calls the new get_subscriber_emails()
function, which just queries the email field:
"zerver_userprofile"."email"
...instead of querying about 40 fields that it never uses.
I was able to verify the query slimming by watching my postgres server log.
Also, you can verify that the ORM does roughly 16x less work using values():
>>> def f(): return [sub.user_profile.email for sub in list(Subscription.objects.all().select_related())]
...
>>> def g(): return [row['user_profile__email'] for row in list(Subscription.objects.all().values('user_profile__email'))]
...
>>> import time
>>> def timeit(func): t = time.time(); func(); return time.time() - t
...
>>> timeit(f)
0.045198917388916016
>>> timeit(g)
0.002752065658569336
(imported from commit a69f690a96d076b323fdfc2f4821b0548bdfac7f)
LinkPattern returned a string which contained a placeholder if the URL was
considered invalid. AtomicLinkPattern wrapped this in an AtomicString,
where the placeholder doesn't get removed properly.
m.group(0) is always incorrect because python-markdown modifies your regex
to include more than you specified (this is why part of the message got
duplicated).
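The effect on m.group(0) can be reproduced with the re module alone. The
sketch assumes python-markdown wraps an inline pattern roughly as
"^(.*?)" + pattern + "(.*)$"; the exact wrapping depends on the library
version, and the helper name and LINK pattern are made up for the example.

```python
import re

# Assumption: a simplified link pattern with one capturing group.
LINK = r"(https?://\S+)"


def wrap_like_inline_pattern(pattern):
    """Wrap a pattern with leading and trailing catch-all groups."""
    return re.compile(r"^(.*?)%s(.*)$" % pattern, re.DOTALL)


def describe_match(text):
    """Return group(0) and the URL group for the wrapped LINK pattern."""
    match = wrap_like_inline_pattern(LINK).match(text)
    if match is None:
        return None
    # group(1) is the text before the link, so the URL moves to group(2).
    return match.group(0), match.group(2)
```

With this wrapping, group(0) spans the entire input rather than just the
link, which is consistent with part of the message being duplicated when
group(0) is returned on failure.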
(imported from commit 576bdf09c2b677cf4bc56484c363eb05f2110158)
We have to read the data anyway, and we don't have a convenient file
handle for uploads from attachments sent through the e-mail gateway.
(imported from commit 86260a4eaceef85c82707929a80558e11dc54ef6)