This modifies the realm creation form to (1) support a
realm_in_root_domain flag and (2) clearly check whether the root
domain is available inside check_subdomain_available before trying to
create a realm with it; this should avoid IntegrityErrors.
We were doing an unnecessary database query on every user registration
checking the availability of the user's subdomain, when in fact this
is only required for realm creation.
This removes the utterly unnecessary `triggers` dict (which always was
a dict with exactly one value True) in favor of a single field,
'trigger'.
Inspired by Kunal Gupta's work in #6659.
While the missedmessage_hook logic originally did a reasonably good
job of avoiding double-sending notifications, there was a corner case
it didn't handle, namely a user who had been presence-idle when a
message was sent and became also event-queue-idle as well within the
next 10 minutes. For those users, they got a notification at message
send time, and the missedmessage_hook would deliver it a second time.
We fix this by just checking the conveniently available push_notified
and email_notified variables that indicate whether the message already
had a notification triggered.
Fixes#7031.
Message.get_raw_db_rows is moved to MessageDict, since its
implementation details are highly coupled to other methods
in MessageDict.
And then sew_messages_and_reactions comes along for the
ride.
We eventually want to move Reaction.get_raw_db_rows to there
as well.
We're about to have multiple post-processing stages for building
message dictionaries. Rather than having individual "hydration"
methods remove intermediate values, we just wait until the end.
This decouples the hyrdration steps. The potentional problem
here is that we may have a field like sender_is_mirror_dummy
that isn't part of the final payload, but we need it for
calculating display recipients and avatars. We don't want to
delete it too early from the objects.
By default, Django sets up two handlers on this logger, one of them
its AdminEmailHandler. We have our own handler for sending email on
error, and we want to stick to that -- we like the format somewhat
better, and crucially we've given it some rate-limiting through
ZulipLimiter.
Since we cleaned out our logging config in e0a5e6fad, though, we've
been sending error emails through both paths. The config we'd had
before that for `django` was redundant with the config on the root --
but having *a* config there was essential for causing
`logging.config.dictConfig`, when Django passes it our LOGGING dict,
to clear out that logger's previous config. So, give it an empty
config.
Django by default configures two loggers: `django` and
`django.server`. We have our own settings for `django.server`
anyway, so this is the only one we need to add.
The stdlib `logging` and `logging.config` docs aren't 100% clear, and
while the source of `logging` is admirably straightforward the source
of `logging.config` is a little twisty, so it's not easy to become
totally confident that this has the right effect just by reading.
Fortunately we can put some of that source-diving to work in writing
a test for it.
This function is designed to replace avatar_url() and
avatar_url_from_dict() over time.
There are a few things new about it:
* We make the parameters more explicit, rather than
passing in an opaque dictionary or requiring a
UserProfile object. (A lot of our callers want
to use `values()` for efficiency sake, since we
are often doing bulk user operations.)
* We start to support the client_gravatar option.
This works around a bug in Django in handling the error case of a
client sending an inappropriate HTTP `Host:` header. Various
internal Django machinery expects to be able to casually call
`request.get_host()`, which will attempt to parse that header, so an
exception will be raised. The exception-handling machinery attempts
to catch that exception and just turn it into a 400 response... but
in a certain case, that machinery itself ends up trying to call
`request.get_host()`, and we end up with an uncaught exception that
causes a 500 response, a chain of tracebacks in the logs, and an email
to the server admins. See example below.
That `request.get_host` call comes in the midst of some CSRF-related
middleware, which doesn't even serve any function unless you have a
form in your 400 response page that you want CSRF protection for.
We use the default 400 response page, which is a 26-byte static
HTML error message. So, just send that with no further ado.
Example exception from server logs (lightly edited):
2017-10-08 09:51:50.835 ERR [django.security.DisallowedHost] Invalid HTTP_HOST header: 'example.com'. You may need to add 'example.com' to ALLOWED_HOSTS.
2017-10-08 09:51:50.835 ERR [django.request] Internal Server Error: /loginWithSetCookie
Traceback (most recent call last):
File ".../django/core/handlers/exception.py", line 41, in inner
response = get_response(request)
File ".../django/utils/deprecation.py", line 138, in __call__
response = self.process_request(request)
File ".../django/middleware/common.py", line 57, in process_request
host = request.get_host()
File ".../django/http/request.py", line 113, in get_host
raise DisallowedHost(msg)
django.core.exceptions.DisallowedHost: Invalid HTTP_HOST header: 'example.com'. You may need to add 'example.com' to ALLOWED_HOSTS.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File ".../django/core/handlers/exception.py", line 109, in get_exception_response
response = callback(request, **dict(param_dict, exception=exception))
File ".../django/utils/decorators.py", line 145, in _wrapped_view
result = middleware.process_view(request, view_func, args, kwargs)
File ".../django/middleware/csrf.py", line 276, in process_view
good_referer = request.get_host()
File ".../django/http/request.py", line 113, in get_host
raise DisallowedHost(msg)
django.core.exceptions.DisallowedHost: Invalid HTTP_HOST header: 'example.com'. You may need to add 'example.com' to ALLOWED_HOSTS.
This field would get overwritten with an improper value when
we looped over multiple clients, due to not making full copies
of the message dictionary. This failure would be somewhat
random depending on how clients were ordered in the loop.
The only consumers of this field were the mobile app and the
apply-events-to-unread-counts logic. Both of these will now
use `flags` instead.
The `is_mentioned` flag in message events was buggy. We now
look directly at flags.
We will kill off `is_mentioned` in a subsequent commit.
We also remove some debugging code in the test that was failing
before this fix. The test would only fail when `is_mentioned`
was wrong, which never happened when you ran a single test, and
which would happen randomly when you ran multiple tests.
Add this field to the Stream model will prevent us from having
to look at realm data for several types of stream operations, which
can be prone to either doing extra database lookups or making
our cached data bloated.
Going forward, we'll set stream.is_zephyr to True whenever the
realm's string id is "zephyr".
This removes sender names from the message cache, since
they aren't guaranteed to be valid, and they're inexpensive
to add.
This commit will make the message cache entries smaller
by removing sender___full_name and sender__short_name
fields.
Then we add in the sender fields to the message payloads
by doing a query against the unique sender ids of the
messages we are processing.
This change leads to 2 extra database hops for most of
our message-related codepaths. The reason there are 2 hops
instead of 1 is that we basically re-calculate way too
much data to get a no-markdown dictionary.
Introduce MessageDict.post_process_dicts() will allow us
the ability to do the following:
* use less memory in the cache for repeated data
* prevent cache invalidation
* format data according to different client needs
The first use of this function is pretty inconsequential, but
it sets us up for more consequential changes.
In this commit we defer the MessageDict.hydrate_recipient_info
step until after we pull data out of the cache. This impacts
cache size as follows:
* streams - negligibly bigger
* PMs/huddles - slimmer due to not needing to repeat
sender data like email/full_name
Again, the main point of this change is to start setting up
the infrastructure to do post-processing.
This is a first step to eventually slimming the message cache,
but there are still some moving parts there to be worked through.
The more immediate benefit of extracting this function is that
we can put tests on it. Also, it isolates some functionality
that may go away as our clients gets smarter.
This endpoint is about to become an API-style route and have the legacy
decorator removed from its view. The json/fetch_api_key endpoint will be
used in tests instead of it.
We now use a `.values` query to get just the fields we need
in order to fulfill '/json/users' requests.
The main benefit is that we don't do O(N) queries for bot
owners, but we also have less data on UserProfile to process.
On receiving a request for deleting a reaction, just check if such
a reaction exists or not. If it exists then just delete the reaction
otherwise send an error message that such a reaction doesn't exist.
It doesn't make sense to check whether an emoji name is valid or not.
This commit prepares us to introduce a StreamLite class. For
these tests, we don't care about the actual contents of the
Stream, just the right stream is there.
The original "quality score" was invented purely for populating
our password-strength progress bar, and isn't expressed in terms
that are particularly meaningful. For configuration and the core
accept/reject logic, it's better to use units that are readily
understood. Switch to those.
I considered using "bits of entropy", defined loosely as the log
of this number, but both the zxcvbn paper and the linked CACM
article (which I recommend!) are written in terms of the number
of guesses. And reading (most of) those two papers made me
less happy about referring to "entropy" in our terminology.
I already knew that notion was a little fuzzy if looked at
too closely, and I gained a better appreciation of how it's
contributed to confusion in discussing password policies and
to adoption of perverse policies that favor "Password1!" over
"derived unusual ravioli raft". So, "guesses" it is.
And although the log is handy for some analysis purposes
(certainly for a graph like those in the zxcvbn paper), it adds
a layer of abstraction, and I think makes it harder to think
clearly about attacks, especially in the online setting. So
just use the actual number, and if someone wants to set a
gigantic value, they will have the pleasure of seeing just
how many digits are involved.
(Thanks to @YJDave for a prototype that the code changes in this
commit are based on.)
We now return user_ids for subscribers to streams in add-stream
events. This allows us to eliminate the UserLite class for
both bulk adds and bulk removes. It also simplifies some JS
code that already wanted to use user_ids, not emails.
Fixes#6898