In templates/zerver/api/main.html, since the current context isn't
passed to render_markdown_path when rendering an article,
render_markdown_path doesn't have the context to render values such
as api_url. This commit makes sure that it does by passing a dict
called api_uri_context to render_markdown_path when rendering an
article.
This commit puts the guts of parse_usermessage_flags into
UserMessage.flags_list_for_flags, since it was slightly faster
than the old implementation and produced the same results.
(Both algorithms were super fast, actually.)
And then all callers use the model method now.
The logic to set search_fields was essentially the same for both
sides of the include_history conditional.
Now we have just one code block that sets search_fields, and we
can quickly short-circuit the loop when is_search is False.
This change affects realm_users and realm_non_active_users.
Note that we still send full avatar urls in realm_user/add
events, so apply_events has to do something mildly hacky to
turn the avatar_url to None in that case.
Fixing the event is probably not worth the trouble, as single
urls are not bandwidth hogs; we only need this optimization
for bulk data.
This change affects these values:
* page_params.avatar_url
* page_params.avatar_url_medium
It requires passing the client_gravatar flag through this
codepath:
* home_real
* do_events_register
* fetch_initial_state_data
* avatar_url
Seems like the more logical check. Also, the previous code makes it feel
like there is a potential vulnerability where one could get an email change
object in a realm where email changes are disabled, and then open that link
while logged in to a different realm.
While we're at it, remove the unnecessary check that the user is
logged in when clicking the confirmation link; that creates
unnecessary trouble for users who use multiple browsers.
Removes an assert, which at this point is there just for readability, since
the second argument to
get_object_from_key(confirmation_key, Confirmation.EMAIL_CHANGE)
ensures that the returned object is of the correct type.
This commit allows clients to register client_gravatar=True, and
then we recognize that flag for message events. If the flag is
True, we will not calculate gravatar URLs and let the clients do
it themselves. (Clients can calculate gravatar URLs based on
emails with just a little bit of code.)
This refactoring doesn't change behavior, but it sets us up
to more easily handle a register setting for `client_gravatar`,
which will allow clients to tell us they're going to compute
their own gravatar URLs.
The `client_gravatar` flag already exists in our code, but it
is only used for Django views (users/messages) but not for
Zulip events.
The main change is to move the call to `set_sender_avatar` into
`finalize_payload`, which adds the boolean `client_gravatar`
parameter to that function. And then we update various callers
to supply that flag.
One small performance benefit of this change is that we now
lazily compute the client message payloads in
`event_queue.process_message_event` now, so this will improve
performance if all interested clients have the same value of
`apply_markdown`. But the change here is really preparing us
for the additional boolean parameter, which will cause us to
have four variations of the payload.
This gets used when we call `process_client`, which we generally do at
some kind of login; and in particular, we do in the shared auth
codepath `login_or_register_remote_user`. Add a decorator to make it
easy, and use it on the various views that wind up there.
In particular, this ensures that the `query` is some reasonable
constant corresponding to the view, as intended. When not set, we
fall back in `update_user_activity` on the URL path, but in particular
for `log_into_subdomain` that can now contain a bunch of
request-specific data, which makes it (a) not aggregate properly, and
(b) not even fit in the `CHARACTER VARYING(50)` database field we've
allotted it.
The only place this attribute is used is in `update_user_activity`,
called only in `process_client`, which won't happen if we end up
returning a redirect just below. If we don't, we go and call
`add_logging_data` just after, which takes care of this already.
This won't work for all call paths without deeper refactoring,
but for at least some paths we can make this more direct -- function
arguments, rather than mutating a request attribute -- so it's easier
to see how the data is flowing.
I remember being really confused by this function in the past, and I finally
figured it out. It should be removed, and the dev_url added by
00-realm-creation should call a function that just gets the confirmation_key
from outbox like all of the backend tests, but until then this comment
should help.
This change:
* Prevents weird potential attacks like taking a valid confirmation link
(say an unsubscribe link), and putting it into the URL of a multiuse
invite link. I don't know of any such attacks one could do right now, but
reasoning about it is complicated.
* Makes the code easier to read, and in the case of confirmation/views.py,
exposes something that needed refactoring anyway (USER_REGISTRATION and
INVITATION should have different endpoints, and both of those endpoints
should be in zerver/views/registration, not this file).
This test helper method duplicated a bunch of logic in
`zerver/worker/queue_processors.py` in a specialized fashion for the
tests. Now that we're using `call_consume_in_tests` in this code
path, we don't need it.
Before this commit, ResponseMock() was initialized
with a data attribute, which isn't used in the tests
and does not occur in the outgoing webhook code.
The main limitation of this version is that it's controlled entirely
from settings, with nothing in the database and no web UI or even
management command to control it. That makes it a bit more of a
burden for the server admins than it'd ideally be, but that's fine
for now.
Relatedly, the web flow for realm creation still requires choosing a
subdomain even if the realm is destined to live at an alias domain.
Specific to the dev environment, there is an annoying quirk: the
special dev login flow doesn't work on a REALM_HOSTS realm. Also,
in this version the `add_new_realm` and `add_new_user` management
commands, which are intended for use in development environments only,
don't support this feature.
In manual testing, I've confirmed that a REALM_HOSTS realm works for
signup and login, with email/password, Google SSO, or GitHub SSO.
Most of that was in dev; I used zulipstaging.com to also test
* logging in with email and password;
* logging in with Google SSO... far enough to correctly determine
that my email address is associated with some other realm.
The original PR to allow generic bots to be mentioned had
some merge issues that we detected about a week after the
fact. This commit restores the logic from the original PR.
The reason we didn't detect this bug earlier is that the
merge issues didn't break any existing behavior. Instead,
they made it so that only UserMessage rows got written for
bots, but no events were being set. The part of the commit
that got lost is restored here, so now events get sent as
well.
Thanks to @derAnfaenger for reporting this and being patient
as we tracked it down.
Fixes#7140
This adds the data model and bugdown support for the new UserGroup
mention feature.
Before it'll be fully operational, we'll still need:
* A backend API for making these.
* A UI for interacting with that API.
* Typeahead on the frontend.
* CSS to make them look pretty and see who's in them.
tsearch_extras returns search offsets in bytes but our highlight
function treated them as character offsets. Added a check to subtract
extra bytes if the tsearch search backend is being used.
Fixes#4084.
Fixes#7021.
This commit allows for the /api-new/ page to rendered similarly to our
/help pages. It's based on the old content for /api, but we're not
replacing the old content yet, to give a bit of time to restructure
things reasonably.
Tweaked by eeshangarg and tabbott.
Because this is for tests, a heuristic like this that's right in most
situations is actually fine; we can override it in the few cases where
a test might set up a situation where it fails.
So just make it clear for the next reader that that's what's going on,
and also adjust the helper's interface slightly so that its callers
do have that flexibility.
The "subdomain" label is redundant, to the extent it's even
accurate -- this is really just the URL we want to display,
which may or may not involve a subdomain. Similarly "external".
The former `external_api_path_subdomain` was never a path -- it's a
host, followed by a path, which together form a scheme-relative URL.
I'm not quite convinced that value is actually the right thing in
2 of the 3 places we use it, but fixing that can start by giving an
accurate name to the thing we have.
This was part of the logic to handle EXTERNAL_API_PATH varying.
But also it was already no longer used -- it was only ever passed
into template contexts, as `external_api_uri`, and it'd been
overtaken there by `external_api_uri_subdomain`.
So, update our dev docs to reflect that, and eliminate the variable.
This setting isn't documented at all, and I believe nobody has used it
since the end of api.zulip.com in 2016. So we get to complete the
cleanup of this logic.
We extract get_bulk_stream_subscriber_info() from this
function to remove some of the complexity. Also, in that
new function we avoid a hop to the database by querying
on stream ids instead of recipient ids. The query that
gets changed here does require a join to the recipient
table (to get the stream id), so it's a little bit of a
tradeoff.
There's an implicit assumption in bulk_remove_subscriptions
that all users belong to the same realm. We use the realm
for things like comparing occupied streams before and
after our main operation of deactivating streams.
Before this change, we just used the user_profile variable
that leaked from some prior loop to look up the realm, which
was super brittle.
Now we're a bit more explicit.
We were using Google's diff-match-patch library to diff HTML. The
problem with that approach is that it is a text differ, not an HTML
differ and so it ends up messing up the HTML tags. `lxml` is a safer
option.
Fixes: #7219.
Note that this code leads to a slightly different query, because
we join to one row in the small Recipient table to match
stream_id to recipient.type_id.
The first method we extract to this library is
get_active_subscriptions_for_stream_id().
We also move num_subscribers_for_stream_id() to here, which
is slightly annoying (having the method on Stream was nice)
but avoids some circular dependency issues.
The HelpView class will render a directory as markdown with an index HTML
page. This however can also be used for other generics and applied to
the API pages as well, so change the class to a generic class and
specify the path templates and names.
Tweaked by tabbott and Eeshan Garg.
FuncT was unused in decorator.py, and only imported into profile.py.
The @profiled decorator is now more strongly typed on return-type.
Annotations were converted to python3 format.
This extraction moves all the huddle logic into models.py, which
hopefully can reduce friction for things like re-organizing our
caches (there are two cache entries for every huddle) and/or
just putting huddle_id on Message directly.
Do you call get_recipient(Recipient.STREAM, stream_id) or
get_recipient(stream_id, Recipient.STREAM)? I could never
remember, and it was not very type safe, since both parameters
are integers.
Almost all callers to do_create_user were trying to
create active users, except for one test. The
active=False codepath was kind of broken (things
like sending welcome messages had sort of undefined
behavior there), so instead of trying to maintain it,
we just update the one test (`test_people`) to flip the
`is_active` flag manually.
Fixes#7197
We mostly introduce these functions (as part of a big
code sweep):
send_stream_message
send_personal_message
send_huddle_message
In two cases, where we want to specifically manipulate
queue ids, we now call check_send_message directly. (The
above three functions deliberately don't support kwargs
to ensure simple code and better type safety.)
Along with fixing some minor bugs, this requires extracting out the
default functions so that we can do type: ignores on them properly.
While we're at it, we switch to the Python 3 syntax.
If a Zulip install at example.org got a request at an HTTP `Host`
like foo.example.org.evil.com (or even foo.example.orgevil.com),
we would accept it as subdomain foo. This isn't likely to happen
in practice because it shouldn't pass ALLOWED_HOSTS, and it's not
obvious to me that anything untoward could be done with it even
if ALLOWED_HOSTS were set wide open, but if nothing else it
multiplies the cases in analyzing this logic.
The reason we had a loose match like this, I assume, is to allow
the user to come from arbitrary ports -- especially in development.
So tighten the pattern to allow just that, and add some tests for
that behavior and a comment explaining why this complication is
needed.
The cookie mechanism only works when passing the login token to a
subdomain. URLs work across domains, which is why they're the
standard transport for SSO on the web. Switch to URLs.
Tweaked by tabbott to add a test for an expired token.
This makes the tests a little cleaner in itself, and also prepares
them to adjust with less churn when we change how
redirect_and_log_into_subdomain passes the signed token.
Most of these have more to do with authentication in general than with
registering a new account. `create_preregistration_user` could go
either way; we move it to `auth` so we can make the imports go only in
one direction.
Lets administrators view a list of open(unconfirmed) invitations and
resend or revoke a chosen invitation.
There are a few changes that we can expect for the future:
* It is currently possible to invite an email that you have already
invited, it might make sense to change this behavior.
* Resend currently sends an invite reminder instead of resending the
original invite, this is because 'custom_body' was not stored when
the first invite was sent.
Tweaked in various minor ways, primarily in the backend, by tabbott,
mostly for style consistency with the rest of the codebase.
Fixes: #1180.
Tweaked by tabbott to have the field before the invitation is
completed be called invite_as_admins, not invited_as_admins, for
readability.
Fixes#6834.
The tighter interface prevents the need to specify
Recipient.PERSONAL (which can often be inaccurate in the
huddle case, anyway), and it prevents tests from confusingly
specifying a "subject" field for PMs.
Having send_stream_message() avoids the need to supply
Recipient.STREAM as a parameter, and it also uses the more
modern name of `topic_name` for topics. Under the hood, it
avoids some annoying steps for re-formatting the recipients,
since we just have a single stream name.
When possible, we want to use direct APIs for sending
stream messages.
This changes the codepath slightly, by not using
forwarded_user_profile, but it doesn't impact the number
of queries, and it's a simple check.
We also remove a couple "subject" references here.
This change allows normal bots to get UserMessage rows when
they are mentioned on a stream, even if they are not actually
subscribed to the stream.
Fixes#7140.
We now find all (possibly) relevant service bots for a message
in the call to get_recipient_info. This allows us to eliminate
some code that would patch them after we rendered.
The get_service_bot_events() function will ignore any service
bots that weren't actually mentioned in the message (due to
backticks) or part of the active user ids.
We now have a MentionData class that encapsulates
the users who are possibly mentioned in a message.
Not that the rendering code may not keep all the mentions,
since things like backticks will suppress the mention.
We populate this now in do_send_messages, so that we can use
the info earlier in the message-sending process. This info
now gets passed down the call stack as an optional parameter.
Note that bugdown.convert() still populates the data when its
callers decline to pass in a MentionData object.
This is mostly a preparatory commit, as we don't take advantage
of the data yet in do_send_messages.
In do_send_messages, we only produce one dictionary for
the event queues, instead of different flavors for text
vs. html. This prevents two unnecessary queries to the
database.
It also means we only put one dictionary on the "message"
event queue instead of two, albeit a wider one that has
some values that won't be sent to the actual clients.
This wider dictionary from MessageDict.wide_dict is also
used for the `feedback_messages` queue and service bot
queues. Since the extra fields are possibly useful down
the road, and they'll just be ignored for now, we don't
bother to remove them. Also, those queue processors won't
have access to `content_type`, which they shouldn't need.
Fixes#6947
Before this change, we populated two cache entries for each
message that we sent. The entries were largely redundant,
with the only difference being whether we sent the content
as raw markdown or as the rendered HTML.
This commit makes it so we only have one cache entry per
message, and it includes both content and rendered_content.
One legacy source on confusion here is that `content`
changes meaning when you're on the front end. Here is the
situation going forward:
database:
content = raw
rendered_contented = rendered
cache entry:
content = raw
rendered_contented = rendered
payload for the frontend:
content = raw (for apply_markdown=False)
content = rendered (for apply_markdown=True)
Wherever possible, we always want to move checking for error
conditions to the views code, so that we don't need to worry about
handling failures with (in this case) a user that's half-created
because a DefaultStreamGroup doesn't exist.
This effectively implements the feature of default stream groups,
except for a UI, nice styling, etc.
Note that we're careful to not have this do anything in an
organization that doesn't have any default stream groups.
This has exactly the same behavior so long as self.subdomain contains
no colon character, ':'; and of course we don't allow those in
subdomains, because they aren't allowed by DNS.
These are just instances that jumped out at me while working on the
subdomains code, mostly while grepping for get_subdomain call sites.
I haven't attempted a comprehensive search, and there are likely
still others left.
Now that the old `check_subdomain` has no callers except in
implementing the new, improved interface `user_matches_subdomain`,
inline it into that. Also simplify the Boolean logic a bit.
Now that every call site of check_subdomain produces its second
argument in exactly the same way, push that shared bit of logic
into a new wrapper for check_subdomain.
Also give that new function a name that says more specifically what
it's checking -- which I think is easier to articulate for this
interface than for that of check_subdomain.
This should be a pure refactor: the only asymmetry in the behavior
of `check_subdomain` between its two arguments is if one of them
is None, and in this case we have a non-nullable model field on
one side and the return value from `get_subdomain` on the other.
With these swapped, this call site now matches all other
`check_subdomain` call sites in having the second argument come as
the subdomain of some user's realm.
The type of get_subdomain's parameter is non-Optional, and
in fact if passed an argument of None it would promptly
blow up. So this `getattr` can't be serving any purpose.
Adds support to add "Embedded bot" Service objects. This service
handles every embedded bot.
Extracted from "Embedded bots: Add support to add embedded bots from
UI" by Robert Honig.
Tweaked by tabbott to be disabled by default.
This fixes an exception occurring when engaging an embedded
bot in a PM, makes it respond as itself instead of the sender,
and makes it respond to the PM conversation it is engaded in.
Every time we updated a UserProfile object, we were calling
delete_display_recipient_cache(), which churns the cache and
does an extra database hop to find subscriptions. This was
due to saying `updated_fields` instead of `update_fields`.
This made us prone to cache churn for fields like UserProfile.pointer
that are fairly volatile.
Now we use the helper function changed(). To prevent the
opposite problem, we use all the fields that could invalidate
the cache.
This test had a little bug, where we weren't actually
verifying `realm_bots` before, because we weren't using
`field` to look it up.
This commit fixes that bug and adds additional checks,
particularly for the recently added `realm_non_active_users'.
We now add `realm_non_active_users` to the result of
`do_events_register` (and thus `page_params`). It has
the same structure as `realm_users`, but it's for
non-active users. Clients need data on non-active users
when they process old messages that were sent by those
users when they were active. Clients can currently get
most of the data they need in the message events, but it
makes for ugly client code.
Fixes#4322
This is a prepatory commit that adds non-active users to
the realm user cache. It mostly involves name changes and
removing an `is_active` filter from the relevant DB query.
The only consumer of this cache is `get_raw_user_data`, which
now filters on `is_active` in a dictionary comprehension (but
this will get moved around a bit in a subsequent commit).
We make a few things cleaner for populating `realm_users`
in `do_event_register` and `apply_events`:
* We have a `raw_users` intermediate dictionary that
makes event updates O(1) and cleaner to read.
* We extract an `is_me` section for all updates that
apply to the current user.
* For `update` events, we do a more surgical copying
of fields from the event into our dict. This
prevents us from mutating fields in the event,
which was sketchy (at least in test mode). In
particular, this allowed us to remove some ugly
`del` code related to avatars.
* We introduce local vars `was_admin` and `now_admin`.
The cleanup had two test implications:
* We no longer need to normalize `realm_users`, since
`apply_events` now sees `raw_users` instead. Since
`raw_users` is a dict, there is no need to normalize
it, unlike lists with possibly random order.
* We updated the schema for avatar updates to include
the two fields that we used to hackily delete from
an event.
While it's totally fine to put a leading '.' before the cookie domain
for normal hostnames and browsers will just strip them, if you're
using an IP address, it doesn't work, because .127.0.0.1 (for example)
is just invalid, and the cookie won't be set.
This fixes an issue where after installing with an IP address, realm
creation would end with being stuck at a blank page for
/accounts/login/subdomain/.
If an organization doesn't have the EmailAuthBackend (which allows
password auth) enabled, then our password reset form doesn't do
anything, so we should hide it in the UI.
We're going to end up deleting most of this in the next few commits;
the main goal here is to make it easy to code-review whether we're
breaking anything in replacing the built-in Django form's logic.
While our recent changing to hide /register means we don't need a nice
pretty error message here, eventually we'll want to clean up the error
message.
Fixes#7047.
This new function extractions the bit of logic we use after creating a
new user account to log them in and send them to the home page,
without emailing the user about their new login.
This fixes a problem we've seen where LDAP users were not getting this
part of the onboarding process, and a similar problem for human users
created via the API.
Ideally, we would have put these fixes in process_new_human_user, but
that would cause import loop problems.
Now we use 'git ls-files' to get the list of locales that we actually
track. Previously we were using os.listdir to get the contents of the
static/locale directory. This could also return locales which were
present in the directory but are not supported by us, e.g. zh_CN.
Historically, we'd just use the default Django version of this
function. However, since we did the big subdomains migration, it's
now the case that we have to pass in the subdomain to authenticate
(i.e. there's no longer a fallback to just looking up the user by
email).
This fixes a problem with user creation in an LDAP realm, because
previously, the user creation flow would just pass in the username and
password (after validating the subdomain).
This new test solves the problem that when we
made changes to the page-load codepath in the past,
it's been hard to identify what new code caused
more database queries. Now you can see query
counts broken out by event type.
This requires a small, harmless change to extract
an `always_want` function in `lib/events.py`.
Clients fetching messages can now specify that they are able
to compute their avatar, and if they set client_gratavar to
True in the request (w/our normal encoding scheme), then the
backend will not compute it, and the payload will be smaller.
The fix starts with get_messages_backend. The flag gets
passed down through these functions:
* MessageDict.post_process_dicts.
* MessageDict.set_sender_avatar.
We also fix up the callers for post_process_dicts to explicitly
pass in the client_gravatar path, but for now they all just hard
code the value to False.
We have been assigning locale to language code. Mostly code and locale
are same but for languages like zh-Hans, locale is zh_Hans and code is
zh-hans.
After this commit, compilemessages command should be run.
Previously we were using regexes to extract the language from our
locale files. Now we use LANG_INFO data structure provided by Django
to do the same and fallback to PO files only when language code is not
present in the Django data structure.
In the UI we use locale as the code for the language. Django expects
language code. For Simplified Chinese, 'zh_Hans' is the locale which
maps to a directaory under static/locale, and 'zh-hans' is the language
code, which is used in settings.LANGUAGES setting found in Django.
This replaces the former non-functional StateHandler
stub with a dictionary-like state object. Accessing it will
will read and store strings in the BotUserStateData model.
Each bot has a limited state size. To enforce this limit while
keeping data updates efficient, StateHandler caches the expensive
query for getting a bot's total state size. Assignments to a key
then only need to fetch that entry's previous size, if any, and
compare it to the new entry's size.
Some bots have class names that differ from their module name,
e.g. `helloworld.py` vs. `HelloWorld`. Our tests should accept
all of these, as long as a handler class is present.
I think an hour after signup is not the right time to try to get someone to
re-engage with a product.
This also makes the day1 email clearly a transactional email both in
experiencing the product and in the eyes of various anti-spam laws, and
allows us to remove the unsubscribe link.
The rules here are fuzzy, and it's quite possible none of Zulip's emails
need an address at all. Every country has its own rules though, which makes
it hard to tell. In general, transactional emails do not need an address,
and marketing emails do.
This modifies the realm creation form to (1) support a
realm_in_root_domain flag and (2) clearly check whether the root
domain is available inside check_subdomain_available before trying to
create a realm with it; this should avoid IntegrityErrors.
We were doing an unnecessary database query on every user registration
checking the availability of the user's subdomain, when in fact this
is only required for realm creation.
This removes the utterly unnecessary `triggers` dict (which always was
a dict with exactly one value True) in favor of a single field,
'trigger'.
Inspired by Kunal Gupta's work in #6659.
Since a user could use the same installation of the Zulip mobile app
with multiple Zulip servers, correct behavior is to allow reusing the
same token with multiple Zulip servers in the RemotePushDeviceToken
model.
While the missedmessage_hook logic originally did a reasonably good
job of avoiding double-sending notifications, there was a corner case
it didn't handle, namely a user who had been presence-idle when a
message was sent and became also event-queue-idle as well within the
next 10 minutes. For those users, they got a notification at message
send time, and the missedmessage_hook would deliver it a second time.
We fix this by just checking the conveniently available push_notified
and email_notified variables that indicate whether the message already
had a notification triggered.
Fixes#7031.
This should mean that maintaining two Zulip development environments
using the same Git checkout no longer has caching problems keeping
track of the migration status.
Previously, to check whether a logo file existed, we simply took
the static/ URL for the logo and treated it as a file path. This
led to problems when static/* was not the correct parent directory
for our static files (for example, when settings.PRODUCTION = True).
Now, we treat URLs and file paths differently and the logo file
path is constructed by joining settings.STATIC_ROOT and the
relative path to the logo file.
Fixes#7018.
This change makes the cache entries smaller for message
dictionaries. It also ensures we get valid data put into
message dictionaries if, for example, the sender's avatar
changes.
After this change, all of the attributes for a message
sender are only fetched during post-processing with two
exceptions:
* We get sender_id for "free" from the message,
and it's the primary key that we need to figure
out which data to fetch in post-processing.
* We need sender_realm_id to be able to cache topic
links, and a sender's realm id will never change,
so it's not a concern for invalidating cache rows.
All the other attributes are either likely to change (e.g.
sender avatar_version) and/or impact the size of cache
entries more severely than the two small id fields above.
This change should improve our overall system performance
by reducing the amount of memory used by every N message
rows we cache, and typically N will be in the thousands or
so on a large realm.
The other major implication of this change is that when
a user changes their avatar, and then later messages that
the user sent are fetched, all of the fields that go into
computing the avatar url will be pulled from the database,
not from cache.
Message.get_raw_db_rows is moved to MessageDict, since its
implementation details are highly coupled to other methods
in MessageDict.
And then sew_messages_and_reactions comes along for the
ride.
We eventually want to move Reaction.get_raw_db_rows to there
as well.
We now populate the avatar url as part of the post
processing step of building message dictionaries,
so that the avatar url is no longer in cache.
This change makes the cache slimmer, because instead
of caching the avatar url (which often includes a long
hash), we just cache the smaller fields that are used
to compute the url.
Note that this commit still has the problem that we're
essentially computing the avatar url from cached fields
that can be invalid. We will address that a few commits
later.
An immediate benefit of this change is that how we compute
avatar urls (or whether we compute them all) is now decoupled
from caching concerns. We will address this later as
well. (Some clients will be capable of computing their
own gravatar urls, for example.)
We're about to have multiple post-processing stages for building
message dictionaries. Rather than having individual "hydration"
methods remove intermediate values, we just wait until the end.
This decouples the hyrdration steps. The potentional problem
here is that we may have a field like sender_is_mirror_dummy
that isn't part of the final payload, but we need it for
calculating display recipients and avatars. We don't want to
delete it too early from the objects.
This makes tests of queue processors more realistic,
by adding a parameter to `queue_json_publish` that
calls a queue's consumer function if accessed in a test.
Fixes part of #6542.
By default, Django sets up two handlers on this logger, one of them
its AdminEmailHandler. We have our own handler for sending email on
error, and we want to stick to that -- we like the format somewhat
better, and crucially we've given it some rate-limiting through
ZulipLimiter.
Since we cleaned out our logging config in e0a5e6fad, though, we've
been sending error emails through both paths. The config we'd had
before that for `django` was redundant with the config on the root --
but having *a* config there was essential for causing
`logging.config.dictConfig`, when Django passes it our LOGGING dict,
to clear out that logger's previous config. So, give it an empty
config.
Django by default configures two loggers: `django` and
`django.server`. We have our own settings for `django.server`
anyway, so this is the only one we need to add.
The stdlib `logging` and `logging.config` docs aren't 100% clear, and
while the source of `logging` is admirably straightforward the source
of `logging.config` is a little twisty, so it's not easy to become
totally confident that this has the right effect just by reading.
Fortunately we can put some of that source-diving to work in writing
a test for it.
Nobody has used this feature in years, and it causes certain types of
markdown issues in development to completely DoS the development
environment by making it possible for the "Bugdown timeout" exception
handler to timeout in bugdown.
Since we already send an email to the server administrators, there's
no need to replace this feature with anything.
This function is designed to replace avatar_url() and
avatar_url_from_dict() over time.
There are a few things new about it:
* We make the parameters more explicit, rather than
passing in an opaque dictionary or requiring a
UserProfile object. (A lot of our callers want
to use `values()` for efficiency sake, since we
are often doing bulk user operations.)
* We start to support the client_gravatar option.
This works around a bug in Django in handling the error case of a
client sending an inappropriate HTTP `Host:` header. Various
internal Django machinery expects to be able to casually call
`request.get_host()`, which will attempt to parse that header, so an
exception will be raised. The exception-handling machinery attempts
to catch that exception and just turn it into a 400 response... but
in a certain case, that machinery itself ends up trying to call
`request.get_host()`, and we end up with an uncaught exception that
causes a 500 response, a chain of tracebacks in the logs, and an email
to the server admins. See example below.
That `request.get_host` call comes in the midst of some CSRF-related
middleware, which doesn't even serve any function unless you have a
form in your 400 response page that you want CSRF protection for.
We use the default 400 response page, which is a 26-byte static
HTML error message. So, just send that with no further ado.
Example exception from server logs (lightly edited):
2017-10-08 09:51:50.835 ERR [django.security.DisallowedHost] Invalid HTTP_HOST header: 'example.com'. You may need to add 'example.com' to ALLOWED_HOSTS.
2017-10-08 09:51:50.835 ERR [django.request] Internal Server Error: /loginWithSetCookie
Traceback (most recent call last):
File ".../django/core/handlers/exception.py", line 41, in inner
response = get_response(request)
File ".../django/utils/deprecation.py", line 138, in __call__
response = self.process_request(request)
File ".../django/middleware/common.py", line 57, in process_request
host = request.get_host()
File ".../django/http/request.py", line 113, in get_host
raise DisallowedHost(msg)
django.core.exceptions.DisallowedHost: Invalid HTTP_HOST header: 'example.com'. You may need to add 'example.com' to ALLOWED_HOSTS.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File ".../django/core/handlers/exception.py", line 109, in get_exception_response
response = callback(request, **dict(param_dict, exception=exception))
File ".../django/utils/decorators.py", line 145, in _wrapped_view
result = middleware.process_view(request, view_func, args, kwargs)
File ".../django/middleware/csrf.py", line 276, in process_view
good_referer = request.get_host()
File ".../django/http/request.py", line 113, in get_host
raise DisallowedHost(msg)
django.core.exceptions.DisallowedHost: Invalid HTTP_HOST header: 'example.com'. You may need to add 'example.com' to ALLOWED_HOSTS.
This field would get overwritten with an improper value when
we looped over multiple clients, due to not making full copies
of the message dictionary. This failure would be somewhat
random depending on how clients were ordered in the loop.
The only consumers of this field were the mobile app and the
apply-events-to-unread-counts logic. Both of these will now
use `flags` instead.
The `is_mentioned` flag in message events was buggy. We now
look directly at flags.
We will kill off `is_mentioned` in a subsequent commit.
We also remove some debugging code in the test that was failing
before this fix. The test would only fail when `is_mentioned`
was wrong, which never happened when you ran a single test, and
which would happen randomly when you ran multiple tests.
Add this field to the Stream model will prevent us from having
to look at realm data for several types of stream operations, which
can be prone to either doing extra database lookups or making
our cached data bloated.
Going forward, we'll set stream.is_zephyr to True whenever the
realm's string id is "zephyr".
This removes sender names from the message cache, since
they aren't guaranteed to be valid, and they're inexpensive
to add.
This commit will make the message cache entries smaller
by removing sender___full_name and sender__short_name
fields.
Then we add in the sender fields to the message payloads
by doing a query against the unique sender ids of the
messages we are processing.
This change leads to 2 extra database hops for most of
our message-related codepaths. The reason there are 2 hops
instead of 1 is that we basically re-calculate way too
much data to get a no-markdown dictionary.
Introduce MessageDict.post_process_dicts() will allow us
the ability to do the following:
* use less memory in the cache for repeated data
* prevent cache invalidation
* format data according to different client needs
The first use of this function is pretty inconsequential, but
it sets us up for more consequential changes.
In this commit we defer the MessageDict.hydrate_recipient_info
step until after we pull data out of the cache. This impacts
cache size as follows:
* streams - negligibly bigger
* PMs/huddles - slimmer due to not needing to repeat
sender data like email/full_name
Again, the main point of this change is to start setting up
the infrastructure to do post-processing.
This is a first step to eventually slimming the message cache,
but there are still some moving parts there to be worked through.
The more immediate benefit of extracting this function is that
we can put tests on it. Also, it isolates some functionality
that may go away as our clients gets smarter.
This endpoint is about to become an API-style route and have the legacy
decorator removed from its view. The json/fetch_api_key endpoint will be
used in tests instead of it.
We now use a `.values` query to get just the fields we need
in order to fulfill '/json/users' requests.
The main benefit is that we don't do O(N) queries for bot
owners, but we also have less data on UserProfile to process.
On receiving a request for deleting a reaction, just check if such
a reaction exists or not. If it exists then just delete the reaction
otherwise send an error message that such a reaction doesn't exist.
It doesn't make sense to check whether an emoji name is valid or not.
This commit prepares us to introduce a StreamLite class. For
these tests, we don't care about the actual contents of the
Stream, just the right stream is there.
Since subscribed_to_stream is only doing an id lookup
on the Stream model to find out if a user is subscribed to
a stream, there's no reason to require a full Stream object.
It's currently the case that all callers do have full Stream
objects handy to pass in to this function, but it's still a
good practice to have functions only ask for objects that they
need.
The original "quality score" was invented purely for populating
our password-strength progress bar, and isn't expressed in terms
that are particularly meaningful. For configuration and the core
accept/reject logic, it's better to use units that are readily
understood. Switch to those.
I considered using "bits of entropy", defined loosely as the log
of this number, but both the zxcvbn paper and the linked CACM
article (which I recommend!) are written in terms of the number
of guesses. And reading (most of) those two papers made me
less happy about referring to "entropy" in our terminology.
I already knew that notion was a little fuzzy if looked at
too closely, and I gained a better appreciation of how it's
contributed to confusion in discussing password policies and
to adoption of perverse policies that favor "Password1!" over
"derived unusual ravioli raft". So, "guesses" it is.
And although the log is handy for some analysis purposes
(certainly for a graph like those in the zxcvbn paper), it adds
a layer of abstraction, and I think makes it harder to think
clearly about attacks, especially in the online setting. So
just use the actual number, and if someone wants to set a
gigantic value, they will have the pleasure of seeing just
how many digits are involved.
(Thanks to @YJDave for a prototype that the code changes in this
commit are based on.)
We now return user_ids for subscribers to streams in add-stream
events. This allows us to eliminate the UserLite class for
both bulk adds and bulk removes. It also simplifies some JS
code that already wanted to use user_ids, not emails.
Fixes#6898