A few "slow" tests aren't as slow any more, for whatever reason,
so we're setting a higher bar going forward.
(imported from commit 642137cebb7826f4512b5635da9d7b75bd5c35f4)
The text of manual links are already AtomicStrings, so linkified strings
should be too.
Moves emoji detection to happen after linkification, so the emoji rule
won't look at links.
(imported from commit 9c56bce6a0e873b398255e0762dfb312a4a9a64e)
InlinePatterns should return None on failure, not text that may
have placeholders in it.
(imported from commit f9d8d22b2b8cfa7a92ecf3e52a6c76b48e6f0175)
We want the UserActivity.query field to reflect the name of the
function for REST calls, not the URL, and we accomplish this by
setting request._query to target_function.__name__.
(imported from commit 9df05fef0dffb34483b182b95f8cbc4409083eed)
If request._query is set in the call to update_user_activity(),
we will use that instead of request.META['PATH_INFO'] for
the query field of the UserActivity row we write.
(imported from commit fcee30098e1c7c5cb4195a1e5905fc7b88af804f)
This results in some small behavior changes. First, if a user
has both malformed JSON and an invalid API key, they will now
be informed of the invalid API key, not the malformed JSON,
because the decorator wrapping code executing first. Second,
we call process_client(), which basically builds us a
UserActivity record with the client "API".
(imported from commit fadb523db9bdc82984bdae61833c5c99f1ebd1c0)
This is for webhook API endpoints that only get passed in an api_key,
not an email. An example would be api_jira_webhook, and some of
the code is borrowed from there. The rest of the code is from
authenticated_api_view().
(imported from commit b5b2a4ea52f9b317f00357ef3142c76534fabf20)
Add the number of person-minutes for the last 24 hours to the
realm report on the main tab of /activity.
(imported from commit 2ff46eacc4c8276ab0407fc6ff9f28f5137f1ed2)
When decoding an operand, a + can be converted to a space
only if the operand is not an email address.
(imported from commit 08fc36a579bbe6409137c60c0fa9579fe3ab2c43)
This tab shows how long each user has been on during the last 24
hours, using data from UserActivityInterval. Much of the code
is borrowed from analyze_user_activity.py, but in this version
we set the time interval to be the last 24 hours and sort by
realm and email. I also ensure that it only executes one
query to get all the data (and there's test coverage for that).
(imported from commit 7a2b80f52679054b03c5f5f42b2cda07d5599432)
Waseem is ok with removing the client-specific tabs on the
main /activity page. This reduces the number of queries from
25 to 1. We might eventually restore some of that logic, but
we will do it more efficiently. A lot of the data for
non-website clients is kind of unreliable, anyway.
The page looks kind of funny with only one tab, but that
will be fixed in the next commit.
(imported from commit 54f08f89d5242ad3e045d8ca0d97b86617c15380)
When we don't already have old messages in cache, we need to
fetch data from the database and create dictionaries for the
cache. This commit makes that process work in 50ms, instead
of 130ms, for the data set in test_bulk_message_fetching(),
which is 602 records. Before this commit we had about 132
microseconds of unnecessary churn per message, because we
were fetching DB fields we didn't need and incurring the cost
of the Django ORM. Now we use values() to get only the columns
we need, and we take advantage of previous commits that make
our code less OO and more function-driven, so we can pass the
values directly to build_message_dict() without having to create
objects.
A couple caveats on this commit:
1) I haven't been able to get good measurements on the overall
effect on get_old_messages_backend(). If you kill the cache to
force DB queries, you introduce noise related to sessions and
user profiles.
2) Look at the long comment in this commit related to
re-rendering messages in this codepath. The problem precedes
this commit.
(imported from commit dcb64aa9416f0e9583355ddd6dc3adfa746b9fc7)
Only call a function on the message object in the unfortunate
situation that we are rendering new content in to_dict_uncached().
Long term, it would be nice if this function didn't have side
effects, and we had a better strategy for upgrading rendered
content when bugdown versions change.
(imported from commit 2a323f52af37a6d651c171cb8234fbfa3d25d561)
This function doesn't require the whole UserProfile object to
create the avatar url, and we call it from Message.to_dict_uncached().
(imported from commit e814caab101c4fedd1ba66df041a3408014e4085)
For a bunch of self-dot references, move them to the top. (This
is kind of funny out of context, but it sets us up for future
refactorings.)
(imported from commit 4ebc1c44a633d86772df1828c51180707769c3dc)
If this line of code were ever called, it would crash anyway,
because it would be an unknown type, and Recipient.type_name()
would raise a KeyError.
(imported from commit db38c5f71fb2f0b044a832eb88e53fceb0d8a9cf)
This is a variation of get_display_recipient that takes
values instead of an object, so that it is decoupled from
the Django object system.
(imported from commit 25bed43ecd62f1fe0176d517b7003e7f4c78bc37)
If it's ok for the tests to use memcached, it should be ok
for them to use the in-process cache too.
(imported from commit be43879c3c48f3780317fd5b4139b44d4a1f0ed3)
This is a harmless extraction designed to allow subclasses override
the behavior of how rendered content gets saved.
(imported from commit 9df4ed9f86c857897fcb5f2b6781bfc5a0813766)
The realm should always be the realm of the stream, and we should
always pass in a stream rather than sometimes passing in a stream name
and other times passing in a stream.
(imported from commit a098d6ed3db218a37c1b6b7c956e847c316c2d13)
There is a scenario where we call process_read_message()
for a message that we haven't recorded as unread before.
I'm not sure how it happens, but I put back code to
guard against crashing. The regression happened in
5752458c821.
(imported from commit 5ce15d2e236b738b445ed88f1733aa0612be0ff3)
We have been persisting muting preferences on the back end for
a while, but we haven't been adding them to page_params for the
client to have at reload/startup time.
(imported from commit d9ca68aa0e4d22bfb0e6ce67fc0bc63981175c8b)
Update get_counts() so that it ignores counts for muted topics
when calculating stream/home unread counts.
(imported from commit 9b4e4da4346c225c535e97d709d3dee032603cc5)
The indirection was more confusing than helpful, especially
since the function had side effects, despite its getter-like
name.
(imported from commit 85d9cf642b4177f62488136f0e0f7f6c9304942e)
Empirically, we only get these for malformed emails where the charset
specified in a message part header does not match the true encoding of
the part. I checked what the resulting Zulip looked like for the
original offender, and it looked find with ignoring errors.
(imported from commit ac6ba65b611cb22d4ec547b75a585abce6fc50b0)
We now bulk-fetch subscription information once from the database
and use it throughout bulk_add_subscriptions in order to avoid
hitting the db O(streams) times.
On my machine this shaved the accounts_register API call from making
66 queries to making 37 queries.
(imported from commit 5dd5ad3f50b2a6edf85b5f1d55ebd697a1c60647)
We have a handy bulk_add_subscriptions function to make cases
like this fast, so lets use it.
On my machine this reduces the number of db queries during account_register
from 112 to 66.
(imported from commit 21a6b31d0f229998d095735b8c581a50ca6aab66)
It shows domains and how many active users they have. A
user is consider active if they have done something at least
as active as updating their pointer in the last day. Domains
with no meaningful activity in the last two weeks are excluded
from the report.
(imported from commit 700cecfc7f1732e9ac3ea590177da18f75b01303)
A small functional change here was to eliminate an enormous "Usage"
headline that was already implicit from the tabs. It would have
complicated the refactoring to try to preserve it, and I don't think
anyone will miss it.
Extracting this template will give us a little more flexibility
to customize future tabs in the /activity page.
(imported from commit bdb0b7030c8ec1e20d4451dc059830c3f5ea7632)
We are still showing the same data points, but the logic to drill
down on details for a particular realm is now all server side,
not client side, and we are smarter about omitting fields. In
summary mode, we don't show empty Name or Email columns. In
detailed mode, we show the realm as a headline instead of a column.
In this version you do lose the ability to see all system users in
the same view, but Waseem is ok with this.
(imported from commit edd2e646ab4cf5783ea64232d0cd621debece8d4)
Before, it was trying to use connection.queries, but Django
could pull the rug out from under us. Now we monkeypatch
the CursorDebugWrapper methods instead.
(imported from commit 25d5bab47673f2b66a6325f48d33e66c31055ab3)
When we send a message, we send some presence information to Tornado
to help it figure out how to generate emails for idle recipients of
a message. This change limits the presence info to being the
intersection of present users and recipients of the message. It is
just an internal optimization to avoid queueing up unneeded data.
The history behind this feature is that I implemented it a while
back, but I think I made a rebase mistake that sent all the presence
data over the wire, despite having code to filter on recipients.
It was mostly harmless, just leading to some inefficiency which is
now fixed.
(imported from commit 7c8e97705afb299c67b99053909e952fbc823551)
When you load the activity report, it will just show summary
counts for realms, but if you click on a realm, you will see
details about users in the realms. You can also click "Show all"
to see an interleaved view of realms and users.
(imported from commit b106557b1fae64d525071afc124b5a8aed319086)
Add rows to the activity report that roll up counts for all
users on each realm, to go along with individual users.
(imported from commit 8104f3ef7fbe406fe0fd2ba1bb00ce76a1ccbee5)
The overall message charset may be null or not match the part's
charset. Even though it's unclear from the documentation,
experimentally using the charset for a message part seems to give you
the charset even for non-multipart emails.
(imported from commit 0e1d23073f4c53041f9760e66a6635f8a94893d1)
For a 4-person stream, we were hitting the DB 8 times, and 4 of
those queries were to lazily get user.email for the 4 recipients
due to upstream code using only(). I added user_profile__email
to the only() call.
I believe this regression started 9/18, and after pushing this
to prod, we would should look at this graph:
https://stats1.zulip.net/graphs/8274cd84588
(imported from commit 70629cb69fe5955c674ba76482609dfe78e5faaf)
This is useful in debugging when you just want to discard all the
messages in a queue because they have the wrong structure.
(imported from commit 8559ac74f11841430b4d0c801d5506ebcb74c3eb)
Instead of collapsing muted messages, just hide them altogether
in view where it makes sense to hide them.
(imported from commit 1c2c987ff302ceb135a025753cf421b4de1aea71)
Use stream.num_subscribers() in check_if_a_bot_is_sending_a_message_to_an_empty_stream().
The num_subscribers() function using Django's count() method, which returns
a single row, vs. len() on an iterator of query rows.
(imported from commit 6157fe248945e9288ee71d8cc39fb6dda4e9a247)
This tests that a bot's owner gets sent a message if the bot
sends a message to a stream with no subscribers. (Presumably
the message will be a PM; we could make the test more precise
in the future.)
(imported from commit 0aaf931a90cb9c7bc3fde8ac545c6b6ad0a55668)
This includes:
* Merging the pull request and issue subject creation functions
* Factoring out pull request and issue content creation
(imported from commit 9cf90e999482a1998431e6483788522101607167)
Some bots created by us do not have owners. Don't try to send a
message to the nonexistent owner.
(imported from commit ab952eccd7d6c4728e9477a106142214b5c81ca9)
Instead just rely on the 2-minute delay in the management command to
batch conversations.
We've had people report being confused or thinking the feature was
broken when they didn't get e-mails because of our rate-limiting, so
let's see if this is not too overwhelming.
(imported from commit 706ddb07b906b5c2edea1159c04acc2ee6f06e29)
Warn inside these functions when you get data on streams that you
are not subscribed to:
add_subscriber
remove_subscriber
user_is_subscribed
The back end should be smart enough not to spam us with subscriber
info that we don't care about.
(imported from commit b27644be2abc37c11ddff884ef392ea208bd1bd3)
Don't send peer_add notifications to users who are already
getting add notifications, because they will already know
about subscribers.
(imported from commit 726b54ae0e30b71440b17d9c51b026872ea96218)
It only grabs the user_profile_id column now. This leads to a
speedup of about 16x between grabbing large ORM objects vs.
small 1-column dictionaries.
(imported from commit 95150bff3fdcbe250b04f014062224af42a6644f)
Splitting out notify_peers() will give us flexibility for cleaning
up how we notify peers for bulk adds.
(imported from commit e108fa2c432cc1fe54d788c58c82c983e0f2394e)
If you expand subscribers on your settings page, you will now see
a query like this in your postgres logs:
SELECT "zerver_userprofile"."email"
FROM "zerver_subscription" INNER JOIN "zerver_recipient" ON ("zerver_subscription"."recipient_id" = "zerver_recipient"."id") INNER JOIN "zerver_userprofile" ON ("zerver_subscription"."user_profile_id" = "zerver_userprofile"."id") WHERE ("zerver_recipient"."type" = 2 AND "zerver_subscription"."active" = true AND "zerver_recipient"."type_id" = 40 AND "zerver_userprofile"."is_active" = true )
The join's still complicated, but the list of fields is one instead of 40+.
(imported from commit 48de1f888193a4d23fcea52d0b633d134e4a3ff7)
get_subscribers_backend() now calls the new get_subscriber_emails()
function, which just queries the email field:
"zerver_userprofile"."email"
...instead of querying about 40 fields that it never uses.
I was able to verify the query slimming by watching my postgres server log.
Also, you can verify that the ORM does roughly 16x less work using values():
>>> def f(): return [sub.user_profile.email for sub in list(Subscription.objects.all().select_related())]
...
>>> def g(): return [row['user_profile__email'] for row in list(Subscription.objects.all().values('user_profile__email'))]
...
>>> def timeit(func): t = time.time(); func(); return time.time() - t
...
>>> timeit(f)
0.045198917388916016
>>> timeit(g)
0.002752065658569336
(imported from commit a69f690a96d076b323fdfc2f4821b0548bdfac7f)
LinkPattern returned a string which contained a placeholder if the URL was
considered invalid. AtomicLinkPattern wrapped this in an AtomicString,
where the placeholder doesn't get removed properly.
m.group(0) is always incorrect because python-markdown modifies your regex
to include more than you specified (this is why part of the message got
duplicated).
(imported from commit 576bdf09c2b677cf4bc56484c363eb05f2110158)