Wherever possible, we always want to move checking for error
conditions to the views code, so that we don't need to worry about
handling failures with (in this case) a user that's half-created
because a DefaultStreamGroup doesn't exist.
This effectively implements the feature of default stream groups,
except for a UI, nice styling, etc.
Note that we're careful to not have this do anything in an
organization that doesn't have any default stream groups.
This has exactly the same behavior so long as self.subdomain contains
no colon character, ':'; and of course we don't allow those in
subdomains, because they aren't allowed by DNS.
These are just instances that jumped out at me while working on the
subdomains code, mostly while grepping for get_subdomain call sites.
I haven't attempted a comprehensive search, and there are likely
still others left.
Now that the old `check_subdomain` has no callers except in
implementing the new, improved interface `user_matches_subdomain`,
inline it into that. Also simplify the Boolean logic a bit.
Now that every call site of check_subdomain produces its second
argument in exactly the same way, push that shared bit of logic
into a new wrapper for check_subdomain.
Also give that new function a name that says more specifically what
it's checking -- which I think is easier to articulate for this
interface than for that of check_subdomain.
This should be a pure refactor: the only asymmetry in the behavior
of `check_subdomain` between its two arguments is if one of them
is None, and in this case we have a non-nullable model field on
one side and the return value from `get_subdomain` on the other.
With these swapped, this call site now matches all other
`check_subdomain` call sites in having the second argument come as
the subdomain of some user's realm.
The type of get_subdomain's parameter is non-Optional, and
in fact if passed an argument of None it would promptly
blow up. So this `getattr` can't be serving any purpose.
Adds support to add "Embedded bot" Service objects. This service
handles every embedded bot.
Extracted from "Embedded bots: Add support to add embedded bots from
UI" by Robert Honig.
Tweaked by tabbott to be disabled by default.
This fixes an exception occurring when engaging an embedded
bot in a PM, makes it respond as itself instead of the sender,
and makes it respond to the PM conversation it is engaded in.
Every time we updated a UserProfile object, we were calling
delete_display_recipient_cache(), which churns the cache and
does an extra database hop to find subscriptions. This was
due to saying `updated_fields` instead of `update_fields`.
This made us prone to cache churn for fields like UserProfile.pointer
that are fairly volatile.
Now we use the helper function changed(). To prevent the
opposite problem, we use all the fields that could invalidate
the cache.
This test had a little bug, where we weren't actually
verifying `realm_bots` before, because we weren't using
`field` to look it up.
This commit fixes that bug and adds additional checks,
particularly for the recently added `realm_non_active_users'.
We now add `realm_non_active_users` to the result of
`do_events_register` (and thus `page_params`). It has
the same structure as `realm_users`, but it's for
non-active users. Clients need data on non-active users
when they process old messages that were sent by those
users when they were active. Clients can currently get
most of the data they need in the message events, but it
makes for ugly client code.
Fixes#4322
This is a prepatory commit that adds non-active users to
the realm user cache. It mostly involves name changes and
removing an `is_active` filter from the relevant DB query.
The only consumer of this cache is `get_raw_user_data`, which
now filters on `is_active` in a dictionary comprehension (but
this will get moved around a bit in a subsequent commit).
We make a few things cleaner for populating `realm_users`
in `do_event_register` and `apply_events`:
* We have a `raw_users` intermediate dictionary that
makes event updates O(1) and cleaner to read.
* We extract an `is_me` section for all updates that
apply to the current user.
* For `update` events, we do a more surgical copying
of fields from the event into our dict. This
prevents us from mutating fields in the event,
which was sketchy (at least in test mode). In
particular, this allowed us to remove some ugly
`del` code related to avatars.
* We introduce local vars `was_admin` and `now_admin`.
The cleanup had two test implications:
* We no longer need to normalize `realm_users`, since
`apply_events` now sees `raw_users` instead. Since
`raw_users` is a dict, there is no need to normalize
it, unlike lists with possibly random order.
* We updated the schema for avatar updates to include
the two fields that we used to hackily delete from
an event.
While it's totally fine to put a leading '.' before the cookie domain
for normal hostnames and browsers will just strip them, if you're
using an IP address, it doesn't work, because .127.0.0.1 (for example)
is just invalid, and the cookie won't be set.
This fixes an issue where after installing with an IP address, realm
creation would end with being stuck at a blank page for
/accounts/login/subdomain/.
If an organization doesn't have the EmailAuthBackend (which allows
password auth) enabled, then our password reset form doesn't do
anything, so we should hide it in the UI.
We're going to end up deleting most of this in the next few commits;
the main goal here is to make it easy to code-review whether we're
breaking anything in replacing the built-in Django form's logic.
While our recent changing to hide /register means we don't need a nice
pretty error message here, eventually we'll want to clean up the error
message.
Fixes#7047.
This new function extractions the bit of logic we use after creating a
new user account to log them in and send them to the home page,
without emailing the user about their new login.
This fixes a problem we've seen where LDAP users were not getting this
part of the onboarding process, and a similar problem for human users
created via the API.
Ideally, we would have put these fixes in process_new_human_user, but
that would cause import loop problems.
Now we use 'git ls-files' to get the list of locales that we actually
track. Previously we were using os.listdir to get the contents of the
static/locale directory. This could also return locales which were
present in the directory but are not supported by us, e.g. zh_CN.
Historically, we'd just use the default Django version of this
function. However, since we did the big subdomains migration, it's
now the case that we have to pass in the subdomain to authenticate
(i.e. there's no longer a fallback to just looking up the user by
email).
This fixes a problem with user creation in an LDAP realm, because
previously, the user creation flow would just pass in the username and
password (after validating the subdomain).
This new test solves the problem that when we
made changes to the page-load codepath in the past,
it's been hard to identify what new code caused
more database queries. Now you can see query
counts broken out by event type.
This requires a small, harmless change to extract
an `always_want` function in `lib/events.py`.
Clients fetching messages can now specify that they are able
to compute their avatar, and if they set client_gratavar to
True in the request (w/our normal encoding scheme), then the
backend will not compute it, and the payload will be smaller.
The fix starts with get_messages_backend. The flag gets
passed down through these functions:
* MessageDict.post_process_dicts.
* MessageDict.set_sender_avatar.
We also fix up the callers for post_process_dicts to explicitly
pass in the client_gravatar path, but for now they all just hard
code the value to False.
We have been assigning locale to language code. Mostly code and locale
are same but for languages like zh-Hans, locale is zh_Hans and code is
zh-hans.
After this commit, compilemessages command should be run.
Previously we were using regexes to extract the language from our
locale files. Now we use LANG_INFO data structure provided by Django
to do the same and fallback to PO files only when language code is not
present in the Django data structure.
In the UI we use locale as the code for the language. Django expects
language code. For Simplified Chinese, 'zh_Hans' is the locale which
maps to a directaory under static/locale, and 'zh-hans' is the language
code, which is used in settings.LANGUAGES setting found in Django.
This replaces the former non-functional StateHandler
stub with a dictionary-like state object. Accessing it will
will read and store strings in the BotUserStateData model.
Each bot has a limited state size. To enforce this limit while
keeping data updates efficient, StateHandler caches the expensive
query for getting a bot's total state size. Assignments to a key
then only need to fetch that entry's previous size, if any, and
compare it to the new entry's size.
Some bots have class names that differ from their module name,
e.g. `helloworld.py` vs. `HelloWorld`. Our tests should accept
all of these, as long as a handler class is present.
I think an hour after signup is not the right time to try to get someone to
re-engage with a product.
This also makes the day1 email clearly a transactional email both in
experiencing the product and in the eyes of various anti-spam laws, and
allows us to remove the unsubscribe link.
The rules here are fuzzy, and it's quite possible none of Zulip's emails
need an address at all. Every country has its own rules though, which makes
it hard to tell. In general, transactional emails do not need an address,
and marketing emails do.
This modifies the realm creation form to (1) support a
realm_in_root_domain flag and (2) clearly check whether the root
domain is available inside check_subdomain_available before trying to
create a realm with it; this should avoid IntegrityErrors.
We were doing an unnecessary database query on every user registration
checking the availability of the user's subdomain, when in fact this
is only required for realm creation.
This removes the utterly unnecessary `triggers` dict (which always was
a dict with exactly one value True) in favor of a single field,
'trigger'.
Inspired by Kunal Gupta's work in #6659.
Since a user could use the same installation of the Zulip mobile app
with multiple Zulip servers, correct behavior is to allow reusing the
same token with multiple Zulip servers in the RemotePushDeviceToken
model.
While the missedmessage_hook logic originally did a reasonably good
job of avoiding double-sending notifications, there was a corner case
it didn't handle, namely a user who had been presence-idle when a
message was sent and became also event-queue-idle as well within the
next 10 minutes. For those users, they got a notification at message
send time, and the missedmessage_hook would deliver it a second time.
We fix this by just checking the conveniently available push_notified
and email_notified variables that indicate whether the message already
had a notification triggered.
Fixes#7031.
This should mean that maintaining two Zulip development environments
using the same Git checkout no longer has caching problems keeping
track of the migration status.
Previously, to check whether a logo file existed, we simply took
the static/ URL for the logo and treated it as a file path. This
led to problems when static/* was not the correct parent directory
for our static files (for example, when settings.PRODUCTION = True).
Now, we treat URLs and file paths differently and the logo file
path is constructed by joining settings.STATIC_ROOT and the
relative path to the logo file.
Fixes#7018.
This change makes the cache entries smaller for message
dictionaries. It also ensures we get valid data put into
message dictionaries if, for example, the sender's avatar
changes.
After this change, all of the attributes for a message
sender are only fetched during post-processing with two
exceptions:
* We get sender_id for "free" from the message,
and it's the primary key that we need to figure
out which data to fetch in post-processing.
* We need sender_realm_id to be able to cache topic
links, and a sender's realm id will never change,
so it's not a concern for invalidating cache rows.
All the other attributes are either likely to change (e.g.
sender avatar_version) and/or impact the size of cache
entries more severely than the two small id fields above.
This change should improve our overall system performance
by reducing the amount of memory used by every N message
rows we cache, and typically N will be in the thousands or
so on a large realm.
The other major implication of this change is that when
a user changes their avatar, and then later messages that
the user sent are fetched, all of the fields that go into
computing the avatar url will be pulled from the database,
not from cache.
Message.get_raw_db_rows is moved to MessageDict, since its
implementation details are highly coupled to other methods
in MessageDict.
And then sew_messages_and_reactions comes along for the
ride.
We eventually want to move Reaction.get_raw_db_rows to there
as well.
We now populate the avatar url as part of the post
processing step of building message dictionaries,
so that the avatar url is no longer in cache.
This change makes the cache slimmer, because instead
of caching the avatar url (which often includes a long
hash), we just cache the smaller fields that are used
to compute the url.
Note that this commit still has the problem that we're
essentially computing the avatar url from cached fields
that can be invalid. We will address that a few commits
later.
An immediate benefit of this change is that how we compute
avatar urls (or whether we compute them all) is now decoupled
from caching concerns. We will address this later as
well. (Some clients will be capable of computing their
own gravatar urls, for example.)
We're about to have multiple post-processing stages for building
message dictionaries. Rather than having individual "hydration"
methods remove intermediate values, we just wait until the end.
This decouples the hyrdration steps. The potentional problem
here is that we may have a field like sender_is_mirror_dummy
that isn't part of the final payload, but we need it for
calculating display recipients and avatars. We don't want to
delete it too early from the objects.
This makes tests of queue processors more realistic,
by adding a parameter to `queue_json_publish` that
calls a queue's consumer function if accessed in a test.
Fixes part of #6542.
By default, Django sets up two handlers on this logger, one of them
its AdminEmailHandler. We have our own handler for sending email on
error, and we want to stick to that -- we like the format somewhat
better, and crucially we've given it some rate-limiting through
ZulipLimiter.
Since we cleaned out our logging config in e0a5e6fad, though, we've
been sending error emails through both paths. The config we'd had
before that for `django` was redundant with the config on the root --
but having *a* config there was essential for causing
`logging.config.dictConfig`, when Django passes it our LOGGING dict,
to clear out that logger's previous config. So, give it an empty
config.
Django by default configures two loggers: `django` and
`django.server`. We have our own settings for `django.server`
anyway, so this is the only one we need to add.
The stdlib `logging` and `logging.config` docs aren't 100% clear, and
while the source of `logging` is admirably straightforward the source
of `logging.config` is a little twisty, so it's not easy to become
totally confident that this has the right effect just by reading.
Fortunately we can put some of that source-diving to work in writing
a test for it.
Nobody has used this feature in years, and it causes certain types of
markdown issues in development to completely DoS the development
environment by making it possible for the "Bugdown timeout" exception
handler to timeout in bugdown.
Since we already send an email to the server administrators, there's
no need to replace this feature with anything.
This function is designed to replace avatar_url() and
avatar_url_from_dict() over time.
There are a few things new about it:
* We make the parameters more explicit, rather than
passing in an opaque dictionary or requiring a
UserProfile object. (A lot of our callers want
to use `values()` for efficiency sake, since we
are often doing bulk user operations.)
* We start to support the client_gravatar option.
This works around a bug in Django in handling the error case of a
client sending an inappropriate HTTP `Host:` header. Various
internal Django machinery expects to be able to casually call
`request.get_host()`, which will attempt to parse that header, so an
exception will be raised. The exception-handling machinery attempts
to catch that exception and just turn it into a 400 response... but
in a certain case, that machinery itself ends up trying to call
`request.get_host()`, and we end up with an uncaught exception that
causes a 500 response, a chain of tracebacks in the logs, and an email
to the server admins. See example below.
That `request.get_host` call comes in the midst of some CSRF-related
middleware, which doesn't even serve any function unless you have a
form in your 400 response page that you want CSRF protection for.
We use the default 400 response page, which is a 26-byte static
HTML error message. So, just send that with no further ado.
Example exception from server logs (lightly edited):
2017-10-08 09:51:50.835 ERR [django.security.DisallowedHost] Invalid HTTP_HOST header: 'example.com'. You may need to add 'example.com' to ALLOWED_HOSTS.
2017-10-08 09:51:50.835 ERR [django.request] Internal Server Error: /loginWithSetCookie
Traceback (most recent call last):
File ".../django/core/handlers/exception.py", line 41, in inner
response = get_response(request)
File ".../django/utils/deprecation.py", line 138, in __call__
response = self.process_request(request)
File ".../django/middleware/common.py", line 57, in process_request
host = request.get_host()
File ".../django/http/request.py", line 113, in get_host
raise DisallowedHost(msg)
django.core.exceptions.DisallowedHost: Invalid HTTP_HOST header: 'example.com'. You may need to add 'example.com' to ALLOWED_HOSTS.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File ".../django/core/handlers/exception.py", line 109, in get_exception_response
response = callback(request, **dict(param_dict, exception=exception))
File ".../django/utils/decorators.py", line 145, in _wrapped_view
result = middleware.process_view(request, view_func, args, kwargs)
File ".../django/middleware/csrf.py", line 276, in process_view
good_referer = request.get_host()
File ".../django/http/request.py", line 113, in get_host
raise DisallowedHost(msg)
django.core.exceptions.DisallowedHost: Invalid HTTP_HOST header: 'example.com'. You may need to add 'example.com' to ALLOWED_HOSTS.
This field would get overwritten with an improper value when
we looped over multiple clients, due to not making full copies
of the message dictionary. This failure would be somewhat
random depending on how clients were ordered in the loop.
The only consumers of this field were the mobile app and the
apply-events-to-unread-counts logic. Both of these will now
use `flags` instead.
The `is_mentioned` flag in message events was buggy. We now
look directly at flags.
We will kill off `is_mentioned` in a subsequent commit.
We also remove some debugging code in the test that was failing
before this fix. The test would only fail when `is_mentioned`
was wrong, which never happened when you ran a single test, and
which would happen randomly when you ran multiple tests.
Add this field to the Stream model will prevent us from having
to look at realm data for several types of stream operations, which
can be prone to either doing extra database lookups or making
our cached data bloated.
Going forward, we'll set stream.is_zephyr to True whenever the
realm's string id is "zephyr".
This removes sender names from the message cache, since
they aren't guaranteed to be valid, and they're inexpensive
to add.
This commit will make the message cache entries smaller
by removing sender___full_name and sender__short_name
fields.
Then we add in the sender fields to the message payloads
by doing a query against the unique sender ids of the
messages we are processing.
This change leads to 2 extra database hops for most of
our message-related codepaths. The reason there are 2 hops
instead of 1 is that we basically re-calculate way too
much data to get a no-markdown dictionary.
Introduce MessageDict.post_process_dicts() will allow us
the ability to do the following:
* use less memory in the cache for repeated data
* prevent cache invalidation
* format data according to different client needs
The first use of this function is pretty inconsequential, but
it sets us up for more consequential changes.
In this commit we defer the MessageDict.hydrate_recipient_info
step until after we pull data out of the cache. This impacts
cache size as follows:
* streams - negligibly bigger
* PMs/huddles - slimmer due to not needing to repeat
sender data like email/full_name
Again, the main point of this change is to start setting up
the infrastructure to do post-processing.
This is a first step to eventually slimming the message cache,
but there are still some moving parts there to be worked through.
The more immediate benefit of extracting this function is that
we can put tests on it. Also, it isolates some functionality
that may go away as our clients gets smarter.
This endpoint is about to become an API-style route and have the legacy
decorator removed from its view. The json/fetch_api_key endpoint will be
used in tests instead of it.
We now use a `.values` query to get just the fields we need
in order to fulfill '/json/users' requests.
The main benefit is that we don't do O(N) queries for bot
owners, but we also have less data on UserProfile to process.
On receiving a request for deleting a reaction, just check if such
a reaction exists or not. If it exists then just delete the reaction
otherwise send an error message that such a reaction doesn't exist.
It doesn't make sense to check whether an emoji name is valid or not.
This commit prepares us to introduce a StreamLite class. For
these tests, we don't care about the actual contents of the
Stream, just the right stream is there.
Since subscribed_to_stream is only doing an id lookup
on the Stream model to find out if a user is subscribed to
a stream, there's no reason to require a full Stream object.
It's currently the case that all callers do have full Stream
objects handy to pass in to this function, but it's still a
good practice to have functions only ask for objects that they
need.
The original "quality score" was invented purely for populating
our password-strength progress bar, and isn't expressed in terms
that are particularly meaningful. For configuration and the core
accept/reject logic, it's better to use units that are readily
understood. Switch to those.
I considered using "bits of entropy", defined loosely as the log
of this number, but both the zxcvbn paper and the linked CACM
article (which I recommend!) are written in terms of the number
of guesses. And reading (most of) those two papers made me
less happy about referring to "entropy" in our terminology.
I already knew that notion was a little fuzzy if looked at
too closely, and I gained a better appreciation of how it's
contributed to confusion in discussing password policies and
to adoption of perverse policies that favor "Password1!" over
"derived unusual ravioli raft". So, "guesses" it is.
And although the log is handy for some analysis purposes
(certainly for a graph like those in the zxcvbn paper), it adds
a layer of abstraction, and I think makes it harder to think
clearly about attacks, especially in the online setting. So
just use the actual number, and if someone wants to set a
gigantic value, they will have the pleasure of seeing just
how many digits are involved.
(Thanks to @YJDave for a prototype that the code changes in this
commit are based on.)
We now return user_ids for subscribers to streams in add-stream
events. This allows us to eliminate the UserLite class for
both bulk adds and bulk removes. It also simplifies some JS
code that already wanted to use user_ids, not emails.
Fixes#6898
This test suite works by using the expected_output and new text_output
fields in the bugdown test cases to verify that each syntax is
correctly translated by this new function.
Some of these translations, like strikethrough, are kinda poor; but
this framework should make it easy to iterate on the formatting.
Fixes: #6720.
This function truncates the textual content at correct length.
(It will be updated later to handle corner cases of unicode
combining characters and tags when we start supporting them.)