We have historically cached two types of values
on a per-request basis inside of memory:
* linkifiers
* display recipients
Both of these caches were hand-written, and they
both actually cache values that are also in memcached,
so the per-request cache essentially only saves us
from a few memcached hits.
I think the linkifier per-request cache is a necessary
evil. It's an important part of message rendering, and
it's not super easy to structure the code to just get
a single value up front and pass it down the stack.
I'm not so sure we even need the display recipient
per-request cache any more, as we are generally pretty
smart now about hydrating recipient data in terms of
how the code is organized. But I haven't done thorough
research on that hypotheseis.
Fortunately, it's not rocket science to just write
a glorified memoize decorator and tie it into key
places in the code:
* middleware
* tests (e.g. asserting db counts)
* queue processors
That's what I did in this commit.
This commit definitely reduces the amount of code
to maintain. I think it also gets us closer to
possibly phasing out this whole technique, but that
effort is beyond the scope of this PR. We could
add some instrumentation to the decorator to see
how often we get a non-trivial number of saved
round trips to memcached.
Note that when we flush linkifiers, we just use
a big hammer and flush the entire per-request
cache for linkifiers, since there is only ever
one realm in the cache.
This swaps out url_format_string from all of our APIs and replaces it
with url_template. Note that the documentation changes in the following
commits will be squashed with this commit.
We change the "url_format" key to "url_template" for the
realm_linkifiers events in event_schema, along with updating
LinkifierDict. "url_template" is the name chosen to normalize
mixed usages of "url_format_string" and "url_format" throughout
the backend.
The markdown processor is updated to stop handling the format string
interpolation and delegate the task template expansion to the uri_template
library instead.
This change affects many test cases. We mostly just replace "%(name)s"
with "{name}", "url_format_string" with "url_template" to make sure that
they still pass. There are some test cases dedicated for testing "%"
escaping, which aren't relevant anymore and are subject to removal.
But for now we keep most of them as-is, and make sure that "%" is always
escaped since we do not use it for variable substitution any more.
Since url_format_string is not populated anymore, a migration is created
to remove this field entirely, and make url_template non-nullable since
we will always populate it. Note that it is possible to have
url_template being null after migration 0422 and before 0424, but
in practice, url_template will not be None after backfilling and the
backend now is always setting url_template.
With the removal of url_format_string, RealmFilter model will now be cleaned
with URL template checks, and the old checks for escapes are removed.
We also modified RealmFilter.clean to skip the validation when the
url_template is invalid. This avoids raising mulitple ValidationError's
when calling full_clean on a linkifier. But we might eventually want to
have a more centric approach to data validation instead of having
the same validation in both the clean method and the validator.
Fixes#23124.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
The Django convention is for __repr__ to include the type and __str__
to omit it. In fact its default __repr__ implementation for models
automatically adds a type prefix to __str__, which has resulted in the
type being duplicated:
>>> UserProfile.objects.first()
<UserProfile: <UserProfile: emailgateway@zulip.com <Realm: zulipinternal 1>>>
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Black 23 enforces some slightly more specific rules about empty line
counts and redundant parenthesis removal, but the result is still
compatible with Black 22.
(This does not actually upgrade our Python environment to Black 23
yet.)
Signed-off-by: Anders Kaseorg <anders@zulip.com>
This adds a helper based on testing patterns of using the "queries_captured"
context manager with "assert_length" to check the number of queries
executed for preventing performance regression.
It explains the rationale of checking the query count through an
"AssertionError" and prints the queries captured as assert_length does,
but with a format optimized for displaying the queries in a more
readable manner.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
This commit adds the OPTIONAL .realm attribute to Message
(and ArchivedMessage), with the server changes for making new Messages
have this set. Old Messages still have to be migrated to backfill this,
before it can be non-nullable.
Appropriate test changes to correctly set .realm for Messages the tests
manually create are included here as well.
We now complain if a test author sends a stream message
that does not result in the sender getting a
UserMessage row for the message.
This is basically 100% equivalent to complaining that
the author failed to subscribe the sender to the stream
as part of the test setup, as far as I can tell, so the
AssertionError instructs the author to subscribe the
sender to the stream.
We exempt bots from this check, although it is
plausible we should only exempt the system bots like
the notification bot.
I considered auto-subscribing the sender to the stream,
but that can be a little more expensive than the
current check, and we generally want test setup to be
explicit.
If there is some legitimate way than a subscribed human
sender can't get a UserMessage, then we probably want
an explicit test for that, or we may want to change the
backend to just write a UserMessage row in that
hypothetical situation.
For most tests, including almost all the ones fixed
here, the author just wants their test setup to
realistically reflect normal operation, and often devs
may not realize that Cordelia is not subscribed to
Denmark or not realize that Hamlet is not subscribed to
Scotland.
Some of us don't remember our Shakespeare from high
school, and our stream subscriptions don't even
necessarily reflect which countries the Bard placed his
characters in.
There may also be some legitimate use case where an
author wants to simulate sending a message to an
unsubscribed stream, but for those edge cases, they can
always set allow_unsubscribed_sender to True.
A bug in the implementation of the can_forge_sender permission
(previously is_api_super_user) resulted in users with this permission
being able to send messages appearing as if sent by a system bots,
including to other organizations hosted by the same Zulip installation.
- The send message API had a bug allowing an api super user to
use forging to send messages to other realms' streams, as a
cross-realm bot. We fix this most directly by eliminating the
realm_str parameter - it is not necessary for any valid current use
case. The email gateway doesn't use this API despite the comment in
that block suggesting otherwise.
- The conditionals inside access_stream_for_send_message are changed up
to improve security. They were generally not ordered very well,
allowing the function to successfully return due to very weak
acceptance conditions - skipping the higher importance checks that
should lead to raising an error.
- The query count in test_subs is decreased because
access_stream_for_send_message returns earlier when doing its check
for a cross-realm bot sender - some subscription checking queries are
skipped.
- A linkifier test in test_message_dict needs to be changed. It didn't
make much sense in the first place, because it was creating a message
by a normal user, to a stream outside of the user's realm. That
shouldn't even be allowed.
Clock time checks lead to tests that nondeterministically fail when
the CI container is super slow, and there's no good reason this test
in particular needs to do that sort of test in addition to our
standard database query count check (which is already does).
This commit moves TestMessageForIdsDisplayRecipientFetching class which
have tests regarding display_recipient filled in by MessageDict to
test_message_dict.py.