The main purpose of this new function is to allow
us to validate emails in bulk, which we don't do
yet (still setting the stage for that).
This is still a speedup, though, since in our
caller we grab only three fields now.
And other than that, we're essentially doing
the same query for the single-email case, just
outside the loop.
We are trying to kill off `validate_email`, so
we no longer call it from these tests.
These tests are already kind of low-level in
nature, so testing the more specific helpers
here should be fine.
Note that we also make the third parameter
to `validate_email` non-optional in this commit,
to preserve 100% coverage. This is really just
refactoring noise--we will soon eliminate the
entire function, but I didn't want to do everything
in a huge commit.
This is a prep commit that will allow us
to more efficiently validate a bunch of
emails in the invite UI.
This commit does not yet change any
behavior or performance.
A secondary goal of this commit is to
prepare us to eliminate some hackiness
related to how we construct
`ValidationError` exceptions.
It preserves some quirks of the prior
implementation:
- the strings we decided to translate
here appear haphazard (and often
get ignored anyway)
- we use `msg` in most codepaths,
but use `code` for invites
Right now we never actually call this with
more than one email, but that will change
soon.
Note that part of the rationale for the inner
method here is to avoid a test coverage bug
with `continue` in loops.
This has two goals:
- sets up a future commit to bulk-validate
emails
- the extracted function is more simple,
since it just has errors, and no codes
or deactivated flags
This commit leaves us in a somewhat funny
intermediate state where we have
`action.validate_email` being a glorified
two-line function with strange parameters,
but subsequent commits will clean this up:
- we will eliminate validate_email
- we will move most of the guts of its
other callee to lib/email_validation.py
To be clear, the code is correct here, just
kinda in an ugly, temporarily-disorganized
intermediate state.
We now use the `get_realm_email_validator()`
helper to build an email validator outside
the loop of emails in our invite list.
This allows us to perform RealmDomain queries
only once per request, instead of once per
email.
Now called:
validate_email_not_already_in_realm
We have a separate validation function that
makes sure that the email fits into a realm's
domain scheme, and we want to avoid naming
confusion here.
Without the fix here, you will get an exception
similar to below if you try to invite one of the
cross realm bots. (The actual exception is
a bit different due to some rebasing on my branch.)
File "/home/zulipdev/zulip/zerver/lib/request.py", line 368, in _wrapped_view_func
return view_func(request, *args, **kwargs)
File "/home/zulipdev/zulip/zerver/views/invite.py", line 49, in invite_users_backend
do_invite_users(user_profile, invitee_emails, streams, invite_as)
File "/home/zulipdev/zulip/zerver/lib/actions.py", line 5153, in do_invite_users
email_error, email_skipped, deactivated = validate_email(user_profile, email)
File "/home/zulipdev/zulip/zerver/lib/actions.py", line 5069, in validate_email
return None, (error.code), (error.params['deactivated'])
TypeError: 'NoneType' object is not subscriptable
Obviously, you shouldn't try to invite a cross
realm bot to your realm, but we want a reasonable
error message.
RESOLUTION:
Populate the `code` parameter for `ValidationError`.
BACKGROUND:
Most callers to `validate_email_for_realm` simply catch
the `ValidationError` and then report a more generic error.
That's also what `do_invite_users` does, but it has the
somewhat convoluted codepath through `validate_email`
that triggers this code:
try:
validate_email_for_realm(user_profile.realm, email)
except ValidationError as error:
return None, (error.code), (error.params['deactivated'])
The way that we're using the `code` parameter for
`ValidationError` feels hacky to me. The intention
behind `code` is to provide a descriptive error to
calling code, and it's not intended for humans, and
it feels strange that we actually translate this in
other places. Here are the Django docs:
https://docs.djangoproject.com/en/3.0/ref/forms/validation/
And then here's an example of us actually translating
a code (not part of this commit, just providing context):
raise ValidationError(_('%s already has an account') %
(email,), code = _("Already has an account."),
params={'deactivated': False})
Those codes eventually get put into InvitationError, which
inherits from JsonableError, and we do actually display
these errors in the webapp:
if skipped and len(skipped) == len(invitee_emails):
# All e-mails were skipped, so we didn't actually invite anyone.
raise InvitationError(_("We weren't able to invite anyone."),
skipped, sent_invitations=False)
I will try to untangle this somewhat in upcoming commits.
Previously, the input:
====================
- One
- Two
Two continued
====================
Would produce the same output as:
====================
- One
- Two
```
Two continued
```
====================
This was because our CodeBlockProcessor had a higher priority than
the ListIndentProcessor. This issue was discussed here:
https://chat.zulip.org/#narrow/stream/9-issues/topic/continuation.20paragraphs.20in.20list.20items.
/delete_topic endpoint could be used to request the deletion of a topic,
that would cause do_delete_messages to be called with an empty set in
these cases:
1. Requesting deletion of an empty stream.
2. Requesting deletion of a topic in a private stream with history not
public to subscribers, if the requesting admin doesn't have access to
any of the messages in that topic.
This function slims down the data that we get
from the database in order to create the
streams part of our client payload.
We also fix a typo.
We also clearly distinguish between queries
and lists here.
This new method prevents us from getting fat
objects from the database.
Instead, now we just get ids from the database
to build our subqueries.
Note that we could also technically eliminate
the `set(...)` wrappers in this code to have
Django make a subquery and save a round trip.
I am postponing that for another commit (since
it's still somewhat coupled to some other
complexity in `do_get_streams` that I am trying
to cut through, plus it's not the main point
of this commit.)
BEFORE:
# old, still in use for other codepaths
def get_stream_subscriptions_for_user(user_profile: UserProfile) -> QuerySet:
# TODO: Change return type to QuerySet[Subscription]
return Subscription.objects.filter(
user_profile=user_profile,
recipient__type=Recipient.STREAM,
)
user_subs = get_stream_subscriptions_for_user(user_profile).filter(
active=True,
).select_related('recipient')
recipient_check = Q(id__in=[sub.recipient.type_id for sub in user_subs])
AFTER:
# newly added
def get_subscribed_stream_ids_for_user(user_profile: UserProfile) -> QuerySet:
return Subscription.objects.filter(
user_profile_id=user_profile,
recipient__type=Recipient.STREAM,
active=True,
).values_list('recipient__type_id', flat=True)
subscribed_stream_ids = get_subscribed_stream_ids_for_user(user_profile)
recipient_check = Q(id__in=set(subscribed_stream_ids))
We calculate `max_message_id` for the mobile client.
Our query now no longer joins to the Message table
and just grabs one value instead of fat objects.
For historical reasons we were creating Recipient
objects at some point in the typing-notifications
codepath. Now we just work with UserProfiles.
This removes some queries, as indicated by
the change to `len(queries)` in a couple of the
tests.
The one subtle thing that changes here is huddles.
If user 10 sends a typing notification that they
are talking to users 20 and 30, there might not
actually be a huddle for users 10/20/30, but
we were actually creating huddles on the fly!
There is no need to create huddles just for
typing notifications, since we don't even
share huddle ids with our clients. The clients
just infer the huddles.
Some of the code that gets killed off here as
somewhat "collateral damage" is some
defensive code related to formerly supporting streams
in typing indicators. The support for streams
was killed off almost as soon as we released
the feature, and the codepath is pretty clearly
user-centric at this point.
I actually like this pattern:
def check_send_typing_notification(...):
typing_notification = check_typing_notification(...)
do_send_typing_notification(...)
It can help divide responsibilities nicely and make it easy
to write detailed unit tests against each of the two helpers.
Unfortunately, the good things didn't really happen here, and
instead we got the worst aspects of the pattern:
- The responsibilities for validation leaked into
the second function.
- Both functions were doing sane things individually
that became not-so-sane in the big picture (namely,
we ended up making Recipient objects for no reason,
but if you read each of the helpers, it was just one
step that seemed reasonable).
- Passing around dictionaries for results can be annoying.
Also, the pattern made a lot more sense when the validation
for typing was a lot more complicated. My prior commit makes
it so that we only ever deal with a list of user_ids.
Anyway, now I'm inlining it. :)
Subsequent commits will clean up the more substantive issue
here, which is that we are building Recipients for no reason.
Before the Django 2.x upgrade, the DatabaseCreation
argument took an integer value. To deal with running
mulitple test instances, we created a random start
range that could count up 100 workers until the next
random id. Arbitrarily limiting the number of workers
to 100.
Post upgrade, we can now use string values. Enabling
the database + worker numbers to be more readable, as
well as removing the cap on the worker count.
This field wasn't accessed by any clients and was a less robust
version of the user_id field. Any client hoping to be interested in
who did message edits should be able to handle working with user IDs
rather than email addresses.
This is preparation for supporting moving messages between streams in
some cases.
It doesn't actually have any functional effect, since flush_message
clears the message unconditionally anyway.
This is mostly refactoring, but we also prevent a new
type of value error (list of non-int-or-string). The
new test code helps enforce that.
Cleanup includes:
- Use early-exit for email case.
- Rename helpers to get_validate_*.
- Avoid clumsy rebuilding of lists in helpers.
- Avoid the confusing `recipient` name (which
can be confused with the model by the same
name).
- Just delegate duplicate-id/email-removal to
the helpers.
The cleaner structure allows us to elminate a couple
mypy workarounds.
Credits to @xpac1985 for reporting, debugging and proposing fix to the
issue. The proposed fix was modified slightly by @hackerkid to set the
correct value for max_invites and upload_quota_gb. Tests added by
@hackerkid.
Fixes#13974
This fixes a confusing aspect of how our automated tests worked
previously, where we'd almost all HTTP requests in the unlikely
configuration with no User-Agent string specified.
We need to adjust query counts in a few tests that now are a bit
cheaper because they now can take advantage of a Client object created
in server_initialization.py in `process_client`.
To avoid some hidden bugs in tests caused by every ldap user having the
same password, we give each user a different password, generated based
on their uids (to avoid some ugly hard-coding in a bunch of places).
‘req_var in request.GET’ was previously believed to be slow from
profiling results. However, the real explanation for those profiling
results is that WSGIRequest.GET is a lazy cached property, so there’s
no reason to avoid it if we’re accessing request.GET anyway.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
In https://github.com/zulip/zulip/pull/12823 some changes to the realms
structure have been made, so now both in production and development
cross-realm bots live in the realm with string_id "zulipinternal".
There was a TODO in retention code to eliminate a conditional in a query
that became redundant with this change, and also the zulipinternal realm
should be omitted from the archiving process in archive_messages().
We use this single regular expression for processing essentially every
request, so it's definitely worth hinting to Python that we're going
to do so by compiling it. Saves about 40us per request.
A sloppy implementation of the main has_request_variables wrapper
function meant that it did two very inefficient things:
* To combine together the GET and POST parameters, it would make a
copy of the request.GET QueryDict object, which combined with the
fact that these objects are slow to access, consumed about 90us per
argument.
* Doing this in a loop (one time per argument), rather than once,
which resulted in us doing this 11 times for a `GET /events` query.
Fixing this to just make a dictionary and combine things with some
small loops saved about 1 millisecond from the total runtime of GET
/events (for comparison, the total actual work of that view function
is about 700ms).
We need to fix at least one test that used a bad mock HttpRequest
object that didn't have a .GET property.
The comment explains this issue, but effectively, the upgrade to
Django 2.x means that Django's built-in django.request logger was
writing to our errors logs WARNING-level data for every 404 and 400
error. We don't consider user errors to be a problem worth
highlighting in that log file.