The `get_link_embed_data` / `link_embed_data_from_cache` pair as
introduced in c93f1d4eda uses the cache
as a temporary store inside of the `embed_links` worker; this means
that it must be durable storage, or the worker will stall and re-fetch
the same links to preview them.
Switch to plumbing through the fetched URL embed data as an parameter
to the Markdown evaluation which uses them, rather than using the
cache as an intermediary. This frees up the cache to be merely a
non-durable cache.
As a side-effect, this removes get_cache_with_key, and
link_embed_data_from_cache which was its only callsite.
This fixes a batch of mypy errors of the following format:
'Item "None" of "Optional[Something]" has no attribute "abc"
Since we have already been recklessly using these attritbutes
in the tests, adding assertions beforehand is justified presuming
that they oughtn't to be None.
This commit migrates some of the backend tests to use assertLogs(),
instead of mock.patch() as planned in #15331.
Tweaked by tabbott to avoid tautological assertions.
There were some tests that had mock patches for logging, although no
logging was actually happening there. This commit removes such patches
in `corporate/tests/test_stripe.py`, `zerver/tests/test_cache.py`,
`zerver/tests/test_queue_worker.py`,
and `zerver/tests/test_signup.py`.
Before this change we were clearing the cache on
every SQL usage.
The code to do this was added in February 2017
in 6db4879f9c.
Now we clear the cache just one time, but before
the action/request under test.
Tests that want to count queries with a warm
cache now specify keep_cache_warm=True. Those
tests were particularly flawed before this change.
In general, the old code both over-counted and
under-counted queries.
It under-counted SQL usage for requests that were
able to pull some data out of a warm cache before
they did any SQL. Typically this would have bypassed
the initial query to get UserProfile, so you
will see several off-by-one fixes.
The old code over-counted SQL usage to the extent
that it's a rather extreme assumption that during
an action itself, the entries that you put into
the cache will get thrown away. And that's essentially
what the prior code simulated.
Now, it's still bad if an action keeps hitting the
cache for no reason, but it's not as bad as hitting
the database. There doesn't appear to be any evidence
of us doing something silly like fetching the same
data from the cache in a loop, but there are
opportunities to prevent second or third round
trips to the cache for the same object, if we
can re-structure the code so that the same caller
doesn't have two callees get the same data.
Note that for invites, we have some cache hits
that are due to the nature of how we serialize
data to our queue processor--we generally just
serialize ids, and then re-fetch objects when
we pop them off the queue.
This was hiding an actual type error in test_cache: a mismatch between
the object ID type, which is str, and the default id_fetcher, which
returns int.
Mypy’s insufficient support for default generic arguments basically
means we can’t use them without a lot of overloading, and there are
not enough callers here to justify that.
https://github.com/python/mypy/issues/3737
We avoid this being super messy where the code calls this by adding
some less generic wrappers for generic_bulk_cached_fetch.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Fixes#2665.
Regenerated by tabbott with `lint --fix` after a rebase and change in
parameters.
Note from tabbott: In a few cases, this converts technical debt in the
form of unsorted imports into different technical debt in the form of
our largest files having very long, ugly import sequences at the
start. I expect this change will increase pressure for us to split
those files, which isn't a bad thing.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Automatically generated by the following script, based on the output
of lint with flake8-comma:
import re
import sys
last_filename = None
last_row = None
lines = []
for msg in sys.stdin:
m = re.match(
r"\x1b\[35mflake8 \|\x1b\[0m \x1b\[1;31m(.+):(\d+):(\d+): (\w+)", msg
)
if m:
filename, row_str, col_str, err = m.groups()
row, col = int(row_str), int(col_str)
if filename == last_filename:
assert last_row != row
else:
if last_filename is not None:
with open(last_filename, "w") as f:
f.writelines(lines)
with open(filename) as f:
lines = f.readlines()
last_filename = filename
last_row = row
line = lines[row - 1]
if err in ["C812", "C815"]:
lines[row - 1] = line[: col - 1] + "," + line[col - 1 :]
elif err in ["C819"]:
assert line[col - 2] == ","
lines[row - 1] = line[: col - 2] + line[col - 1 :].lstrip(" ")
if last_filename is not None:
with open(last_filename, "w") as f:
f.writelines(lines)
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
Generated by pyupgrade --py36-plus --keep-percent-format, but with the
NamedTuple changes reverted (see commit
ba7906a3c6, #15132).
Signed-off-by: Anders Kaseorg <anders@zulip.com>
mock is just a backport of the standard library’s unittest.mock now.
The SAMLAuthBackendTest change is needed because
MagicMock.call_args.args wasn’t introduced until Python
3.8 (https://bugs.python.org/issue21269).
The PROVISION_VERSION bump is skipped because mock is still an
indirect dev requirement via moto.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
We try to use the correct variation of `email`
or `delivery_email`, even though in some
databases they are the same.
(To find the differences, I temporarily hacked
populate_db to use different values for email
and delivery_email, and reduced email visibility
in the zulip realm to admins only.)
In places where we want the "normal" realm
behavior of showing emails (and having `email`
be the same as `delivery_email`), we use
the new `reset_emails_in_zulip_realm` helper.
A couple random things:
- I fixed any error messages that were leaking
the wrong email
- a test that claimed to rely on the order
of emails no longer does (we sort user_ids
instead)
- we now use user_ids in some place where we used
to use emails
- for IRC mirrors I just punted and used
`reset_emails_in_zulip_realm` in most places
- for MIT-related tests, I didn't fix email
vs. delivery_email unless it was obvious
I also explicitly reset the realm to a "normal"
realm for a couple tests that I frankly just didn't
have the energy to debug. (Also, we do want some
coverage on the normal case, even though it is
"easier" for tests to pass if you mix up `email`
and `delivery_email`.)
In particular, I just reset data for the analytics
and corporate tests.
Fixes#13504.
This commit is purely an improvement in error handling.
We used to not do any validation on keys before passing them to
memcached, which meant for invalid keys, memcached's own key
validation would throw an exception. Unfortunately, the resulting
error messages are super hard to read; the traceback structure doesn't
even show where the call into memcached happened.
In this commit we add validation to all the basic cache_* functions, and
appropriate handling in their callers.
We also add a lot of tests for the new behavior, which has the nice
effect of giving us decent coverage of all these core caching
functions which previously had been primarily tested manually.
In the unlikely event that someone edited the properties of a system
bot and then saved the result, we were still caching the old version
indefinitely in the get_system_bot cache.
This led to a confusing case where a newly installed Zulip server
didn't have is_api_super_user properly set on its EMAIL_GATEWAY_BOT in
memcached.
Co-authored-by: Mateusz Mandera <mateusz.mandera@protonmail.com>