We now allow users to change email address visibility setting
on the "Terms of service" page during first login. This page is
not shown for users creating account using normal registration
process, but is useful for imported users and users created
through API, LDAP, SCIM and management commands.
We now set tos_version to "-1" for imported users and the ones
created using API or using other methods like LDAP, SCIM and
management commands. This value will help us to allow users to
change email address visibility setting during first login.
Earlier when a user who is not allowed to add subscribers to a
stream because of realm level setting "Who can add users to streams"
is subscribing other users while creating a new stream than new stream
was created but no one is subscribed to stream.
To fix this issue this commit makes changes in the API used
for adding subscriptions. Now stream will be created only when user
has permissions to add other users.
With a rewrite of the test by Tim Abbott.
The immediate application of this will be for SAML SP-initiated logout,
where information about which IdP was used for authenticating the
session needs to be accessed. Aside of that, this seems like generally
valuable session information to keep that other features may benefit
from in the future.
This is nicer that .pop()ing specified keys - e.g. we no longer will
have to update this chunk of code whenever adding a new key to
ExternalAuthDataDict.
We now allow users to invite without specifying any stream to join.
In such cases, the user would join the default streams, if any, during
the process of account creation after accepting the invite.
It is also fine if there are no default streams and user isn't
subscribed to any stream initially.
For scheduled stream messages, we already limited the `to`
parameter to be the stream ID, but here we return a JsonableError
in the case of a ValueError when the passed value is not an integer.
For scheduled direct messages, we limit the list for the `to`
parameter to be user IDs. Previously, we accepted emails like
we do when sending messages.
In commit fc58c35c0, we added a check in various emails for the
settings.CORPORATE_ENABLED value, but that context is only always
included for views/templates with a request.
Here we add that to common_context, which is often used when there
is not a request (like with emails). And we manually add it to the
email context in various cases when there is not a user account to
call with common_context: new user invitations, registration emails,
and realm reactivation emails.
This will help us remove scheduled message and reminder logic
from `/messages` code path.
Removes `deliver_at`/`defer_until` and `tz_guess` parameters. And
adds the `scheduled_delivery_timestamp` instead. Also updates the
scheduled message dicts to return `scheduled_delivery_timestamp`.
Also, revises some text in `/delete-scheduled-message` endpoint
and in the `ScheduledMessage` schema in the API documentation.
Previously, entering an organization via 'accounts/go' with the
web-public stream enabled took the user to the web-public view
even if the user was not logged in.
Now, a user is always redirected to the 'login_page' with
the next parameter, if present.
The 'login_page' view is updated to redirect an authenticated
user based on the 'next' parameter instead of always redirecting
to 'realm.uri'.
Fixes#23344.
The "Resend" link for realm creation was not working correctly
because it is implemented by basically submiting the registration
form again which results in resending the email but all the
required parameters were not passed to the form after recent
changes in the realm creation flow.
This commit fixes it by passing all the required parameters -
email, realm name, realm type and realm subdomain, when submitting
form again by clicking on the "resend" link.
Fixes#25249.
This commit adds ORG_TYPE_IDS constant field to Realm class
such that it can be used when we want to validate the org_type
passed in request. This was previously defined in realm.py, but
we move it inside Realm class such that we can use it at other
places as well.
In #23380 we want to change all occurrences of `uri` with `url`.
This commit changes the occurrences in a context key `api_uri_context`
and a function name `add_api_uri_context`.
In #23380 we want to change all occurrences of `uri` with `url`.
This commit changes the names of two variables `external_uri_scheme`
and `main_site_uri`, who are constructed using `settings` constants.
This swaps out url_format_string from all of our APIs and replaces it
with url_template. Note that the documentation changes in the following
commits will be squashed with this commit.
We change the "url_format" key to "url_template" for the
realm_linkifiers events in event_schema, along with updating
LinkifierDict. "url_template" is the name chosen to normalize
mixed usages of "url_format_string" and "url_format" throughout
the backend.
The markdown processor is updated to stop handling the format string
interpolation and delegate the task template expansion to the uri_template
library instead.
This change affects many test cases. We mostly just replace "%(name)s"
with "{name}", "url_format_string" with "url_template" to make sure that
they still pass. There are some test cases dedicated for testing "%"
escaping, which aren't relevant anymore and are subject to removal.
But for now we keep most of them as-is, and make sure that "%" is always
escaped since we do not use it for variable substitution any more.
Since url_format_string is not populated anymore, a migration is created
to remove this field entirely, and make url_template non-nullable since
we will always populate it. Note that it is possible to have
url_template being null after migration 0422 and before 0424, but
in practice, url_template will not be None after backfilling and the
backend now is always setting url_template.
With the removal of url_format_string, RealmFilter model will now be cleaned
with URL template checks, and the old checks for escapes are removed.
We also modified RealmFilter.clean to skip the validation when the
url_template is invalid. This avoids raising mulitple ValidationError's
when calling full_clean on a linkifier. But we might eventually want to
have a more centric approach to data validation instead of having
the same validation in both the clean method and the validator.
Fixes#23124.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
For endpoints with a `type` parameter to indicate whether the message
is a stream or direct message, `POST /typing` and `POST /messages`,
adds support for passing "direct" as the preferred value for direct
messages, group and 1-on-1.
Maintains support for "private" as a deprecated value to indicate
direct messages.
Fixes#24960.
Refactors instances of `message_type_name` and `message_type`
that are referring to API message type value ("stream" or
"private") to use `recipient_type_name` instead.
Prep commit for adding "direct" as a value for endpoints with a
`type` parameter to indicate whether the message is a stream or
direct message.
So far, we've used the BitField .authentication_methods on Realm
for tracking which backends are enabled for an organization. This
however made it a pain to add new backends (requiring altering the
column and a migration - particularly troublesome if someone wanted to
create their own custom auth backend for their server).
Instead this will be tracked through the existence of the appropriate
rows in the RealmAuthenticationMethods table.
If the ID of the scheduled message is passed by the client, we
edit the existing scheduled message instead of creating a new one.
However, this will soon be moved into its own API endpoint.
After this commit a notification message is sent to users if they are
added to user_groups by someone else or they are removed from user_groups
by someone else.
Fixes#23642.
Previously, we had an architecture where CSS inlining for emails was
done at provision time in inline_email_css.py. This was necessary
because the library we were using for this, Premailer, was extremely
slow, and doing the inlining for every outgoing email would have been
prohibitively expensive.
Now that we've migrated to a more modern library that inlines the
small amount of CSS we have into emails nearly instantly, we are able
to remove the complex architecture built to work around Premailer
being slow and just do the CSS inlining as the final step in sending
each individual email.
This has several significant benefits:
* Removes a fiddly provisioning step that made the edit/refresh cycle
for modifying email templates confusing; there's no longer a CSS
inlining step that, if you forget to do it, results in your testing a
stale variant of the email templates.
* Fixes internationalization problems related to translators working
with pre-CSS-inlined emails, and then Django trying to apply the
translators to the post-CSS-inlined version.
* Makes the send_custom_email pipeline simpler and easier to improve.
Signed-off-by: Daniil Fadeev <fadeevd@zulip.com>
Adds the user ID to the return values for the `/fetch_api_key` and
`/dev_fetch_api_key` endpoints. This saves clients like mobile a
round trip to the server to get the user's unique ID as it is now
returned as part of the log in flow.
Fixes#24980.
This commit adds a new endpoint, 'POST /user_topics' which
is used to update the personal preferences for a topic.
Currently, it is used to update the visibility policy of
a user-topic row.
This is a prep commit that renames lib functions
so that they can be used while implementing view
for the new endpoint 'POST /user_topics'.
We use a more generic name when removing the visibility_policy of
a topic, i.e., 'access_stream_to_remove_visibility_policy_by_id/name'
instead of 'access_stream_for_unmute_topic_by_id/name' which focused
on removing MUTE from a topic.
This commit adds the fields related to realm creation form using
get_realm_create_form_context in the context passed to register.html
template to avoid duplication.
Since we have updated the registration code to use
PreregistrationRealm objects for realm creation in
previous commits, some of the code has become
redundant and this commit removes it.
We remove the following code -
- The modification to PreregistrationUser objects in
process_new_human_user can now be done unconditionally
because prereg_user is passed only during user creation
and not realm creation. And we anyway do not expect
any PreregistrationUser objects inside the realm
during the creation.
- There is no need of "realm_creation" parameter in
create_preregistration_user function, since we now
use create_preregistration_realm during realm creation.
Fixes part of #24307.
In previous commits, we updated the realm creation flow to show
the realm name, type and subdomain fields in the first form
when asking for the email of the user. This commit updates the
user registration form to show the already filled realm details
as non-editable text and there is also a button to edit the
realm details before registration.
We also update the sub-heading for user registration form as
mentioned in the issue.
Fixes part of #24307.
We now use PreregistrationRealm objects in registration_helper
function when creating new realms instead of PreregistrationUser
objects.
Fixes part of #24307.
We now show inputs for realm details like name, type and URL
in the create_realm.html template opened for "/new" url and
these information will be stored in PreregistrationRealm
objects in further commits.
We add a new class RealmDetailsForm in forms.py for this
such that it is used as a base class for RealmCreationForm
and we define RealmDetailsForm such that we can use it as
a subclass for RegistrationForm as well to avoid duplication.
This commit renames prereg_user variable in
check_prereg_key and get_prereg_key_and_redirect
functions in zerver/views/registration.py to
prereg_object as in further commits the
preregistration object could also be
PreregistrationRealm object as part of changes
for #24307.
Some well-intentioned adblockers also block Sentry client-side error
reporting. Provide an endpoint on the Zulip server which forwards to
the Sentry server, so that these requests are not blocked.
Use the built-in HTML escaping of Markup("…{var}…").format(), in order
to allow Semgrep to detect mistakes like Markup("…{var}…".format())
and Markup(f"…{var}…").
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Currently, there is a checkbox setting for whether to
"Include realm name in subject of message notification emails".
This commit replaces the checkbox setting with a dropdown
having values: Automatic [default], Always, Never.
The Automatic option includes the realm name if, and only if,
there are multiple Zulip realms associated with the user's email.
Tests are added and(or) modified.
Fixes: #19905.
While the function which processes the realm registration and
signup remains the same, we use different urls and functions to
call the process so that we can separately track them. This will
help us know the conversion rate of realm registration after
receiving the confirmation link.
b4dd118aa1 changed how the `user_info_str` parsed information out of
the events it received -- but only changed the server errors, not the
browser errors, though both use the same codepath. As a result, all
browser errors since then have been incorrectly marked as being for
anonymous users.
Build and pass in the expected `user` dict into the event.
In order to support different types of topic visibility policies,
this renames 'add_topic_mute' to
'set_user_topic_visibility_policy_in_database'
and refactors it to accept a parameter 'visibility_policy'.
Create a corresponding UserTopic row for any visibility policy,
not just muting topics.
When a UserTopic row for (user_profile, stream, topic, recipient_id)
exists already, it updates the row with the new visibility_policy.
In the event of a duplicate request, raises a JsonableError.
i.e., new_visibility_policy == existing_visibility_policy.
There is an increase in the database query count in the message-edit
code path.
Reason:
Earlier, 'add_topic_mute' used 'bulk_create' which either
creates or raises IntegrityError -- 1 query.
Now, 'set_user_topic_visibility_policy' uses get_or_create
-- 2 queries in the case of creating new row.
We can't use the previous approach, because now we have to
handle the case of updating the visibility_policy too.
Also, using bulk_* for a single row is not the correct way.
Co-authored-by: Kartik Srivastava <kaushiksri0908@gmail.com>
Co-authored-by: Prakhar Pratyush <prakhar841301@gmail.com>
Replaces 'do_unmute_topic' with 'do_set_user_topic_visibility_policy'
and associated minor changes.
This change is made to align with the plan to use a single function
'do_set_user_topic_visibility_policy' to manage
user_topic - visibility_policy changes and corresponding event
generation.
This commit is a step in the direction of having a common
function to handle visibility_policy changes and event
generation instead of separate functions for each
visibility policy.
In order to support different types of topic visibility policies,
this renames 'do_topic_mute' to 'do_set_user_topic_visibility_policy'
and refactors it to accept a parameter 'visibility_policy'.
Creates `MutableJsonResponse` as a subclass of Django's `HttpResponse`
that we can modify for ignored parameters in the response content.
Updates responses to include `ignored_parameters_unsupported` in
the response data through `has_request_variables`. Creates unit
test for this implementation in `test_decorators.py`.
The `method` parameter processed in `rest_dispatch` is not in the
`REQ` framework, so for any tests that pass that parameter, assert
for the ignored parameter with a comment.
Updates OpenAPI documentation for `ignored_parameters_unsupported`
being returned in the JSON success response for all endpoints.
Adds detailed documentation in the error handling article, and
links to that page in relevant locations throughout the API docs.
For the majority of endpoints, the documentation does not include
the array in any examples of return values, and instead links to
the error handling page. The exceptions are the three endpoints
that had previously supported this return value. The changes note
and example for these endpoints is also used in the error
handling page.
This already became useless in 6e11754642,
as detailed in the API changelog entry here. At this point, we should
eliminate this param and the weird code around it.
This commit also deletes the associated tests added in
6e11754642, since with realm_str removed,
they make no sense anymore (and actually fail with an OpenAPI error due
to using params not used in the API). Hypothetically they could be
translated to use the subdomain= kwarg, but that also doesn't make
sense, since at that point they'd be just testing the case of a user
making an API request on a different subdomain than their current one
and that's just redundant and already tested generally in
test_decorators.
This leftover variable, as a result of older changes, was just always
set to None. That was fine, because when realm=None reaches
check_message further down the codepath, it just infers from
sender.realm. We want to stop passing None like that though, so let's
just set this to user_profile.realm.
View that handled `PATCH user_groups/<int:user_group_id>` required
both name and description parameters to be passed. Due to this
clients had to pass values for both these parameters even if
one of them was changed.
To resolve this name description parameters to
`PATCH user_groups/<int:user_group_id>` are made optional.
We now allow user to change email_address_visibility during user
signup and it overrides the realm-level default and also overrides
the setting if user import settings from existing account.
We do not show UI to set email_address_visibility during realm
creation.
Fixes#24310.
This commit adds backend code to set email_address_visibility when
registering a new user. The realm-level default and the value of
source profile gets overridden by the value user selected during
signup.
We add stream_permission_group_settings object which is
similar to property_types framework used for realm settings.
This commit also adds GroupPermissionSetting dataclass for
defining settings inside stream_permission_group_settings.
We add "do_change_stream_group_based_setting" function which
is called in loop to update all the group-based stream settings
and it is now used to update 'can_remove_subscribers_group'
setting instead of "do_change_can_remove_subscribers_group".
We also change the variable name for event_type field of
RealmAuditLog objects to STREAM_GROUP_BASED_SETTING_CHANGED
since this will be used for all group-based stream settings.
'property' field is also added to extra_data field to identify
the setting for which RealmAuditLog object was created.
We will add a migration in further commits which will add the
property field to existing RealmAuditLog objects created for
changing can_remove_subscribers_group setting.
This makes use of the new case insensitive UNIQUE index added in the
earlier commit. With that index present, we can now rely solely on the
database to correctly identify duplicates and throw integrity errors as
required.
In 141b0c4, we added code to handle races caused by duplicate muting
requests. That code can also handle the non-race condition, so we don't
require the first check.
This commits update the code to use user-level email_address_visibility
setting instead of realm-level to set or update the value of UserProfile.email
field and to send the emails to clients.
Major changes are -
- UserProfile.email field is set while creating the user according to
RealmUserDefault.email_address_visbility.
- UserProfile.email field is updated according to change in the setting.
- 'email_address_visibility' is added to person objects in user add event
and in avatar change event.
- client_gravatar can be different for different users when computing
avatar_url for messages and user objects since email available to clients
is dependent on user-level setting.
- For bots, email_address_visibility is set to EVERYONE while creating
them irrespective of realm-default value.
- Test changes are basically setting user-level setting instead of realm
setting and modifying the checks accordingly.
This commit extracts a function to parse message time limit type settings
and to set it if the new setting value is None.
This function is currently used for message_content_edit_limit_seconds and
message_content_delete_limit_seconds settings and will be used for
message_move_limit_seconds setting to be added in further commits.
Similar to the previous commit, Django was responsible for setting the
Content-Disposition based on the filename, whereas the Content-Type
was set by nginx based on the filename. This difference is not
exploitable, as even if they somehow disagreed with Django's expected
Content-Type, nginx will only ever respond with Content-Types found in
`uploads.types` -- none of which are unsafe for user-supplied content.
However, for consistency, have Django provide both Content-Type and
Content-Disposition headers.
The Content-Type of user-provided uploads was provided by the browser
at initial upload time, and stored in S3; however, 04cf68b45e
switched to determining the Content-Disposition merely from the
filename. This makes uploads vulnerable to a stored XSS, wherein a
file uploaded with a content-type of `text/html` and an extension of
`.png` would be served to browsers as `Content-Disposition: inline`,
which is unsafe.
The `Content-Security-Policy` headers in the previous commit mitigate
this, but only for browsers which support them.
Revert parts of 04cf68b45e, specifically by allowing S3 to provide
the Content-Disposition header, and using the
`ResponseContentDisposition` argument when necessary to override it to
`attachment`. Because we expect S3 responses to vary based on this
argument, we include it in the cache key; since the query parameter
has dashes in it, we can't use use the helper `$arg_` variables, and
must parse it from the query parameters manually.
Adding the disposition may decrease the cache hit rate somewhat, but
downloads are infrequent enough that it is unlikely to have a
noticeable effect. We take care to not adjust the cache key for
requests which do not specify the disposition.
In nginx, `location` blocks operate on the _decoded_ URI[^1]:
> The matching is performed against a normalized URI, after decoding
> the text encoded in the “%XX” form
This means that if a user-uploaded file contains characters that are
not URI-safe, the browser encodes them in UTF-8 and then URI-encodes
them -- and nginx decodes them and reassembles the original character
before running the `location ~ ^/...` match. This means that the `$2`
_is not URI-encoded_ and _may contain non-ASCII characters.
When `proxy_pass` is passed a value containing one or more variables,
it does no encoding on that expanded value, assuming that the bytes
are exactly as they should be passed to the upstream. This means that
directly calling `proxy_pass https://$1/$2` would result in sending
high-bit characters to the S3 upstream, which would rightly balk.
However, a longstanding bug in nginx's `set` directive[^2] means that
the following line:
```nginx
set $download_url https://$1/$2;
```
...results in nginx accidentally URI-encoding $1 and $2 when they are
inserted, resulting in a `$download_url` which is suitable to pass to
`proxy_pass`. This bug is only present with numeric capture
variables, not named captures; this is particularly relevant because
numeric captures are easily overridden by additional regexes
elsewhere, as subsequent commits will add.
Fixing this is complicated; nginx does not supply any way to escape
values[^3], besides a third-party module[^4] which is an undue
complication to begin using. The only variable which nginx exposes
which is _not_ un-escaped already is `$request_uri`, which contains
the very original URL sent by the browser -- and thus can't respect
any work done in Django to generate the `X-Accel-Redirect` (e.g., for
`/user_uploads/temporary/` URLs). We also cannot pass these URLs to
nginx via query-parameters, since `$arg_foo` values are not
URI-decoded by nginx, there is no function to do so[^3], and the
values must be URI-encoded because they themselves are URLs with query
parameters.
Extra-URI-encode the path that we pass to the `X-Accel-Redirect`
location, for S3 redirects. We rely on the `location` block
un-escaping that layer, leaving `$s3_hostname` and `$s3_path` as they
were intended in Django.
This works around the nginx bug, with no behaviour change.
[^1]: http://nginx.org/en/docs/http/ngx_http_core_module.html#location
[^2]: https://trac.nginx.org/nginx/ticket/348
[^3]: https://trac.nginx.org/nginx/ticket/52
[^4]: https://github.com/openresty/set-misc-nginx-module#set_escape_uri
Rename 'muting.py' to 'user_mutes.py' because it, now
, contains only user-mute related functions.
Includes minor refactoring needed after renaming the file.
This commit moves topic related stuff i.e. topic muting functions
to a separate file 'views/user_topics.py'.
'views/muting.py' contains functions related to user-mutes only.
This will help us track if users actually clicked on the
email confirmation link while creating a new organization.
Replaced all the `reder` calls in `accounts_register` with
`TemplateResponse` to comply with `add_google_analytics`
decorator.
This adds a new endpoint /jwt/fetch_api_key that accepts a JWT and can
be used to fetch API keys for a certain user. The target realm is
inferred from the request and the user email is part of the JWT.
A JSON containing an user API key, delivery email and (optionally)
raw user profile data is returned in response.
The profile data in the response is optional and can be retrieved by
setting the POST param "include_profile" to "true" (default=false).
Co-authored-by: Mateusz Mandera <mateusz.mandera@zulip.com>
This will be useful for re-use for implementation of an endpoint for
obtaining the API by submitting a JWT in the next commits.
It's not a pure refactor, as it requires some tweaks to remote_user_jwt
behavior:
1. The expected format of the request is changed a bit. It used to
expect "user" and "realm" keys, from which the intended email was
just generated by joining with @. Now it just expects "email"
straight-up. The prior design was a bt strange to begin with, so this
might be an improvement actually.
2. In the case of the codepath of new user signup, this will no longer
pre-populate the Full Name in the registration form with the value
from the "user" key. This should be a very minor lost of
functionality, because the "user" value was not going to be a proper
Full Name anyway. This functionality can be restored in a future
commit if desired.
This is an API change, but this endpoint is nearly unused as far as
we're aware.
- Updates `.prettierignore` for the new directory.
- Updates any reference to the API documentation directory for
markdown files to be `api_docs/` instead of `zerver/api/`.
- Removes a reference link from `docs/documentation/api.md` that
hasn't referenced anything in the text since commit 0542c60.
- Update rendering of API documentation for new directory.
Moves the check for calling the `api-doc-template.md` directly,
so that we don't return a 500 error from the server, to happen
earlier with other checks for returning a 404 / missing page.
Also adds a specific test to `zerver/tests/test_urls` for this
template.
Prep commit for moving API documentation directory to be a top
level directory.
Black 23 enforces some slightly more specific rules about empty line
counts and redundant parenthesis removal, but the result is still
compatible with Black 22.
(This does not actually upgrade our Python environment to Black 23
yet.)
Signed-off-by: Anders Kaseorg <anders@zulip.com>
These files are not Jinja2 templates, so there's no reason that they needed
to be inside `templates/zerver`. Moving them to the top level reflects their
importance and also makes it feel nicer to work on editing the help center content,
without it being unnecessary buried deep in the codebase.
Changes the check for whether the documentation page is a policy
center page to be the `self.policies_view` boolean instead of the
`path_template` value as it reads much more clearly.
Moves a comment in the code to be contextually relevant.
Because of the overlap with the `DocumentationArticle` dataclass
field `article_path`, we rename the `article_path` variable used
in `MarkdownDirectoryView.get_context_data` for the absolute path
to be `article_absolute_path`.
In commit bbecd41, we added "not_index_page" to the context for
some documentation articles, but use of that context key/value was
removed when the help documentation was removed in commit 1cf7ee9.
Changes `not_index_page` to be a boolean value that's used to set
the page title, but is not then passed on as a context key/value.
Also removes an irrelevant comment about disabling "Back to home"
on the homepage.
Since we want to use `accounts/new/send_confirm` to know how many
users actually register after visiting the register page, we
added it to Google Tag Manager, but GTM tracks every user
registration separately due <email> in the URL
making it harder to track.
To solve this, we want to pass <email> as a GET parameter which
can be easily filtered inside GTM using a RegEx and all the
registrations can be tracked as one.
Previously, we got the directory path for all documentation pages
before checking for API method and path information in the OpenAPI
documentation. Instead, we now check the `path_template` is the
API documentation view template before getting the directory path.
Also, changes the confusingly named `article_path` variable, which
overlapped with the DocumentationArticle dataclass `article_path`
field, to now be `api_documentation_path`.
Prep commit for moving the help center documentation to a top level
directory.
When file uploads are stored in S3, this means that Zulip serves as a
302 to S3. Because browsers do not cache redirects, this means that
no image contents can be cached -- and upon every page load or reload,
every recently-posted image must be re-fetched. This incurs extra
load on the Zulip server, as well as potentially excessive bandwidth
usage from S3, and on the client's connection.
Switch to fetching the content from S3 in nginx, and serving the
content from nginx. These have `Cache-control: private, immutable`
headers set on the response, allowing browsers to cache them locally.
Because nginx fetching from S3 can be slow, and requests for uploads
will generally be bunched around when a message containing them are
first posted, we instruct nginx to cache the contents locally. This
is safe because uploaded file contents are immutable; access control
is still mediated by Django. The nginx cache key is the URL without
query parameters, as those parameters include a time-limited signed
authentication parameter which lets nginx fetch the non-public file.
This adds a number of nginx-level configuration parameters to control
the caching which nginx performs, including the amount of in-memory
index for he cache, the maximum storage of the cache on disk, and how
long data is retained in the cache. The currently-chosen figures are
reasonable for small to medium deployments.
The most notable effect of this change is in allowing browsers to
cache uploaded image content; however, while there will be many fewer
requests, it also has an improvement on request latency. The
following tests were done with a non-AWS client in SFO, a server and
S3 storage in us-east-1, and with 100 requests after 10 requests of
warm-up (to fill the nginx cache). The mean and standard deviation
are shown.
| | Redirect to S3 | Caching proxy, hot | Caching proxy, cold |
| ----------------- | ------------------- | ------------------- | ------------------- |
| Time in Django | 263.0 ms ± 28.3 ms | 258.0 ms ± 12.3 ms | 258.0 ms ± 12.3 ms |
| Small file (842b) | 586.1 ms ± 21.1 ms | 266.1 ms ± 67.4 ms | 288.6 ms ± 17.7 ms |
| Large file (660k) | 959.6 ms ± 137.9 ms | 609.5 ms ± 13.0 ms | 648.1 ms ± 43.2 ms |
The hot-cache performance is faster for both large and small files,
since it saves the client the time having to make a second request to
a separate host. This performance improvement remains at least 100ms
even if the client is on the same coast as the server.
Cold nginx caches are only slightly slower than hot caches, because
VPC access to S3 endpoints is extremely fast (assuming it is in the
same region as the host), and nginx can pool connections to S3 and
reuse them.
However, all of the 648ms taken to serve a cold-cache large file is
occupied in nginx, as opposed to the only 263ms which was spent in
nginx when using redirects to S3. This means that to overall spend
less time responding to uploaded-file requests in nginx, clients will
need to find files in their local cache, and skip making an
uploaded-file request, at least 60% of the time. Modeling shows a
reduction in the number of client requests by about 70% - 80%.
The `Content-Disposition` header logic can now also be entirely shared
with the local-file codepath, as can the `url_only` path used by
mobile clients. While we could provide the direct-to-S3 temporary
signed URL to mobile clients, we choose to provide the
served-from-Zulip signed URL, to better control caching headers on it,
and greater consistency. In doing so, we adjust the salt used for the
URL; since these URLs are only valid for 60s, the effect of this salt
change is minimal.
Moving `/user_avatars/` to being served partially through Django
removes the need for the `no_serve_uploads` nginx reconfiguring when
switching between S3 and local backends. This is important because a
subsequent commit will move S3 attachments to being served through
nginx, which would make `no_serve_uploads` entirely nonsensical of a
name.
Serve the files through Django, with an offload for the actual image
response to an internal nginx route. In development, serve the files
directly in Django.
We do _not_ mark the contents as immutable for caching purposes, since
the path for avatar images is hashed only by their user-id and a salt,
and as such are reused when a user's avatar is updated.
The `django-sendfile2` module unfortunately only supports a single
`SENDFILE` root path -- an invariant which subsequent commits need to
break. Especially as Zulip only runs with a single webserver, and
thus sendfile backend, the functionality is simple to inline.
It is worth noting that the following headers from the initial Django
response are _preserved_, if present, and sent unmodified to the
client; all other headers are overridden by those supplied by the
internal redirect[^1]:
- Content-Type
- Content-Disposition
- Accept-Ranges
- Set-Cookie
- Cache-Control
- Expires
As such, we explicitly unset the Content-type header to allow nginx to
set it from the static file, but set Content-Disposition and
Cache-Control as we want them to be.
[^1]: https://www.nginx.com/resources/wiki/start/topics/examples/xsendfile/
sendfile already applied a Content-Disposition header, but the
algorithm may provide both `filename=` and `filename*=` values (which
is potentially confusing to clients) and incorrectly slash-escapes
quotes in Unicode strings.
Django provides a correct implementation, but it is only accessible to
FileResponse objects. Since the entire point is to offload the
filehandle handling, we cannot use a FileResponse.
Django 4.2 will make the function available outside of FileResponse.
Until then, extract our own Content-Disposition handling, based on
Django's.
We remove the very verbose comment added in d4360e2287, describing
Content-Disposition headers, as it does not add much.
If `invite_as` is passed as a number outside the range of a PostgreSQL
`SMALLINT` field, the database throws an exception. Move this exception
to the glass as a validation error to allow better client-side error
handling and reduce database round-trips.
Moves files in `templates/zerver/help/include` that are used
specifically for integrations documentation to be in a new
directory: `templates/zerver/integrations/include`.
Adds a boolean parameter to `render_markdown_path` to be used
for integrations documentation pages.
Track `create_realm` and `new_realm_send_confirm` using
google analytics.
This will help us track number of users who want to
create a new Zulip organization.