When the credentials are provided by dint of being run on an EC2
instance with an assigned Role, we must be able to fetch the instance
metadata from IMDS -- which is precisely the type of internal-IP
request that Smokescreen denies.
While botocore supports a `proxies` argument to the `Config` object,
this is not actually respected when making the IMDS queries; only the
environment variables are read from. See
https://github.com/boto/botocore/issues/2644
As such, implement S3_SKIP_PROXY by monkey-patching the
`botocore.utils.should_bypass_proxies` function, to allow requests to
IMDS to be made without Smokescreen impeding them.
Fixes#20715.
This will make it convenient to add a handful of organizations to the
beta of this feature during its first few weeks to try to catch bugs,
before we open it to everyone in Zulip Cloud.
This replaces the TERMS_OF_SERVICE and PRIVACY_POLICY settings with
just a POLICIES_DIRECTORY setting, in order to support settings (like
Zulip Cloud) where there's more policies than just those two.
With minor changes by Eeshan Garg.
We do s/TOS/TERMS_OF_SERVICE/ on the name, and while we're at it,
remove the assumed zerver/ namespace for the template, which isn't
correct -- Zulip Cloud related content should be in the corporate/
directory.
We had an incident where someone didn't notice for a week that they'd
accidentally enabled a 30-day message retention policy, and thus we
were unable to restore the deleted the content.
After some review of what other products do (E.g. Dropbox preserves
things in a restoreable state for 30 days) we're adjusting this
setting's default value to be substantially longer, to give more time
for users to notice their mistake and correct it before data is
irrevocably deleted.
TOR users are legitimate users of the system; however, that system can
also be used for abuse -- specifically, by evading IP-based
rate-limiting.
For the purposes of IP-based rate-limiting, add a
RATE_LIMIT_TOR_TOGETHER flag, defaulting to false, which lumps all
requests from TOR exit nodes into the same bucket. This may allow a
TOR user to deny other TOR users access to the find-my-account and
new-realm endpoints, but this is a low cost for cutting off a
significant potential abuse vector.
If enabled, the list of TOR exit nodes is fetched from their public
endpoint once per hour, via a cron job, and cached on disk. Django
processes load this data from disk, and cache it in memcached.
Requests are spared from the burden of checking disk on failure via a
circuitbreaker, which trips of there are two failures in a row, and
only begins trying again after 10 minutes.
Previously, our codebase contained links to various versions of the
Django docs, eg https://docs.djangoproject.com/en/1.8/ref/
request-response/#django.http.HttpRequest and https://
docs.djangoproject.com/en/2.2/ref/settings/#std:setting-SERVER_EMAIL
opening a link to a doc with an outdated Django version would show a
warning "This document is for an insecure version of Django that is no
longer supported. Please upgrade to a newer release!".
Most of these links are inside comments.
Following the replacement of these links in our docs, this commit uses
a search with the regex "docs.djangoproject.com/en/([0-9].[0-9]*)/"
and replaces all matches with "docs.djangoproject.com/en/3.2/".
All the new links in this commit have been generated by the above
replace and each link has then been manually checked to ensure that
(1) the page still exists and has not been moved to a new location
(and it has been found that no page has been moved like this), (2)
that the anchor that we're linking to has not been changed (and it has
been found that no anchor has been changed like this).
One comment where we mentioned a Django version in text before linking
to a page for that version has also been changed, the comment
mentioned the specific version when a change happened, and the history
is no longer relevant to us.
This new setting both serves as a guard to allow us to merge API
support for web public streams to main before we're ready for this
feature to be available on Zulip Cloud, and also long term will
protect self-hosted servers from accidentally enabling web-public
streams (which could be a scary possibility for the administrators of
a corporate Zulip server).
SOCIAL_AUTH_SUBDOMAIN was potentially very confusing when opened by a
user, as it had various Login/Signup buttons as if there was a realm on
it. Instead, we want to display a more informative page to the user
telling them they shouldn't even be there. If possible, we just redirect
them to the realm they most likely came from.
To make this possible, we have to exclude the subdomain from
ROOT_SUBDOMAIN_ALIASES - so that we can give it special behavior.
This fixes error found with django-stubs and it is a part of #18777.
Note that there are various remaining errors that need to be fixed in
upstream or elsewhere in our codebase.
Running notify_server_error directly from the logging handler can lead
to database queries running in a random context. Among the many
potential problems that could cause, one actual problem is a
SynchronousOnlyOperation exception when running in an asyncio event
loop.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Fixes#17277.
The main limitation of this implementation is that the sync happens if
the user authing already exists. This means that a new user going
through the sign up flow will not have their custom fields synced upon
finishing it. The fields will get synced on their consecutive log in via
SAML in the future. This can be addressed in the future by moving the
syncing code further down the codepaths to login_or_register_remote_user
and plumbing the data through to the user creation process.
We detail that limitation in the documentation.
The old type in default_settings wasn't right - limit_to_subdomains is a
List[str]. We define a TypeDict for capturing the typing of the settings
dict more correctly and to allow future addition of configurable
attributes of other non-str types.
This will offer users who are self-hosting to adjust
this value. Moreover, this will help to reduce the
overall time taken to test `test_markdown.py` (since
this can be now overridden with `override_settings`
Django decorator).
This is done as a prep commit for #18641.
This will be useful for deployments that want to just use the full name
provided by the IdP and thus skip the registration form. Also in
combination with disabling name changes in the organization, can force
users to just use that name without being able to change it.
This makes it parallel with deliver_scheduled_messages, and clarifies
that it is not used for simply sending outgoing emails (e.g. the
`email_senders` queue).
This also renames the supervisor job to match.
Support for the timeouts, and tests for them, was added in
53a8b2ac87 -- though no code could have set them after 31597cf33e.
Add a 10-second default timeout. Observationally, p99 is just about
5s, with everything else being previously being destined to meet the
30s worker timeout; 10s provides a sizable buffer between them.
Fixes#17742.
Thumbor and tc-aws have been dragging their feet on Python 3 support
for years, and even the alphas and unofficial forks we’ve been running
don’t seem to be maintained anymore. Depending on these projects is
no longer viable for us.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
As discussed in the comment, this is a critical scalability
optimization for organizations with thousands of users.
With substantial comment updates by tabbott.
This allows access to be more configurable than just setting one
attribute. This can be configured by setting the setting
AUTH_LDAP_ADVANCED_REALM_ACCESS_CONTROL.
This was used by the old native Zulip Android app
(zulip/zulip-android). That app has been undeveloped for enough years
that we believe it no longer functions; as a result, there's no reason
to keep a prototype API endpoint for it (that we believe never worked).
We use GIPHY web SDK to create popover containing GIFs in a
grid format. Simply clicking on the GIFs will insert the GIF in the compose
box.
We add GIPHY logo to compose box action icons which opens the GIPHY
picker popover containing GIFs with "Powered by GIPHY"
attribution.
We add a TUTORIAL_ENABLED setting for self-hosters who want to
disable the tutorial entirely on their system. For this, the
default value (True) is placed in default_settings.py, which
can be overwritten by adding an entry in /etc/zulip/settings.py.
Match the order of the variables between `default_settings.py` and
`settings.py`, and move the defaults into `default_settings.py` so
the section does not require any uncommented lines in `settings.py` if
LDAP is not in use.
Boto3 does not allow setting the endpoint url from
the config file. Thus we create a django setting
variable (`S3_ENDPOINT_URL`) which is passed to
service clients and resources of `boto3.Session`.
We also update the uploads-backend documentation
and remove the config environment variable as now
AWS supports the SIGv4 signature format by default.
And the region name is passed as a parameter instead
of creating a config file for just this value.
Fixes#16246.
Having both of these is confusing; TORNADO_SERVER is used only when
there is one TORNADO_PORT. Its primary use is actually to be _unset_,
and signal that in-process handling is to be done.
Rename to USING_TORNADO, to parallel the existing USING_RABBITMQ, and
switch the places that used it for its contents to using
TORNADO_PORTS.
In development and test, we keep the Tornado port at 9993 and 9983,
respectively; this allows tests to run while a dev instance is
running.
In production, moving to port 9800 consistently removes an odd edge
case, when just one worker is on an entirely different port than if
two workers are used.
The apple developer webapp consistently refers this App ID. So,
this clears any confusion that can occur.
Since python social auth only requires us to include App ID in
_AUDIENCE(a list), we do that in computed settings making it easier for
server admin and we make it much clear by having it set to
APP_ID instead of BUNDLE_ID.
Use the default configuration, which catches Error logging and
exceptions. This is placed in `computed_settings.py` to match the
suggested configuration from Sentry[1], which places it in `settings.py`
to ensure it is consistently loaded early enough.
It is placed behind a check for SENTRY_DSN soas to not incur the
additional overhead of importing the `sentry_sdk` modules if Sentry is
not configured.
[1] https://docs.sentry.io/platforms/python/django/
We remove support for the old clients which required an event for
each message to clear notification.
This is justified since it has been around 1.5 years since we started
supporting the bulk operation (and so essentially nobody is using a
mobile app version so old that it doesn't support the batched
approach) and the unbatched approach has a maintenance and reliability
cost.
Due to authentication restrictions, a deployment may need to direct
traffic for mobile applications to an alternate uri to take advantage
alternate authentication mechansism. By default the standard realm URI
will be usedm but if overridden in the settings file, an alternate uri
can be substituted.
Fixes#14960.
The default of 6 thread may not be appropriate in certain
configurations. Taking half of the numer of CPUs available to the
process will be more flexible.
This prevents memcached from automatically appending the hostname to
the username, which was a source of problems on servers where the
hostname was changed.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Fixes#2665.
Regenerated by tabbott with `lint --fix` after a rebase and change in
parameters.
Note from tabbott: In a few cases, this converts technical debt in the
form of unsorted imports into different technical debt in the form of
our largest files having very long, ugly import sequences at the
start. I expect this change will increase pressure for us to split
those files, which isn't a bad thing.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
This implementation overrides some of PSA's internal backend
functions to handle `state` value with redis as the standard
way doesn't work because of apple sending required details
in the form of POST request.
Includes a mixin test class that'll be useful for testing
Native auth flow.
Thanks to Mateusz Mandera for the idea of using redis and
other important work on this.
Documentation rewritten by tabbott.
Co-authored-by: Mateusz Mandera <mateusz.mandera@zulip.com>
Calling jwt.decode without an algorithms list raises a
DeprecationWarning. This is for protecting against
symmetric/asymmetric key confusion attacks.
This is a backwards-incompatible configuration change.
Fixes#15207.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
This reimplements our Zoom video call integration to use an OAuth
application. In addition to providing a cleaner setup experience,
especially on zulipchat.com where the server administrators can have
done the app registration already, it also fixes the limitation of the
previous integration that it could only have one call active at a time
when set up with typical Zoom API keys.
Fixes#11672.
Co-authored-by: Marco Burstein <marco@marco.how>
Co-authored-by: Tim Abbott <tabbott@zulipchat.com>
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
While this functionality to post slow queries to a Zulip stream was
very useful in the early days of Zulip, when there were only a few
hundred accounts, it's long since been useless since (1) the total
request volume on larger Zulip servers run by Zulip developers, and
(2) other server operators don't want real-time notifications of slow
backend queries. The right structure for this is just a log file.
We get rid of the queue and replace it with a "zulip.slow_queries"
logger, which will still log to /var/log/zulip/slow_queries.log for
ease of access to this information and propagate to the other logging
handlers. Reducing the amount of queues is good for lowering zulip's
memory footprint and restart performance, since we run at least one
dedicated queue worker process for each one in most configurations.
If SAML_REQUIRE_LIMIT_TO_SUBDOMAINS is enabled, the configured IdPs will
be validated and cleaned up when the saml backend is initialized.
settings.py would be a tempting and more natural place to do this
perhaps, but in settings.py we don't do logging and we wouldn't be able
to write a test for it.
In development env, we use `get_secret` to get
`SOCIAL_AUTH_GITLAB_KEY` from `dev-secrets.conf`. But in
production env, we don't need this as we ask the user
to set that value in `prod_settings_template.py`.
This restricts the code from looking `zulip-secrets.conf`
for `social_auth_gitlab_key` in production env.
We plan to use these records to check and record the schema of Zulip's
events for the purposes of API documentation.
Based on an original messier commit by tabbott.
In theory, a nicer version of this would be able to work directly off
the mypy type system, but this will be good enough for our use case.
SOCIAL_AUTH_SAML_SECURITY_CONFIG["authnRequestsSigned"] override in
settings.py in a previous commit wouldn't work on servers old enough to
not have the SAML settings in their settings.py - due to
SOCIAL_AUTH_SAML_SECURITY_CONFIG being undefined.
This commit fixes that.
This applies rate limiting (through a decorator) of authenticate()
functions in the Email and LDAP backends - because those are the ones
where we check user's password.
The limiting is based on the username that the authentication is
attempted for - more than X attempts in Y minutes to a username is not
permitted.
If the limit is exceeded, RateLimited exception will be raised - this
can be either handled in a custom way by the code that calls
authenticate(), or it will be handled by RateLimitMiddleware and return
a json_error as the response.
This legacy cross-realm bot hasn't been used in several years, as far
as I know. If we wanted to re-introduce it, I'd want to implement it
as an embedded bot using those common APIs, rather than the totally
custom hacky code used for it that involves unnecessary queue workers
and similar details.
Fixes#13533.
Zulip has had a small use of WebSockets (specifically, for the code
path of sending messages, via the webapp only) since ~2013. We
originally added this use of WebSockets in the hope that the latency
benefits of doing so would allow us to avoid implementing a markdown
local echo; they were not. Further, HTTP/2 may have eliminated the
latency difference we hoped to exploit by using WebSockets in any
case.
While we’d originally imagined using WebSockets for other endpoints,
there was never a good justification for moving more components to the
WebSockets system.
This WebSockets code path had a lot of downsides/complexity,
including:
* The messy hack involving constructing an emulated request object to
hook into doing Django requests.
* The `message_senders` queue processor system, which increases RAM
needs and must be provisioned independently from the rest of the
server).
* A duplicate check_send_receive_time Nagios test specific to
WebSockets.
* The requirement for users to have their firewalls/NATs allow
WebSocket connections, and a setting to disable them for networks
where WebSockets don’t work.
* Dependencies on the SockJS family of libraries, which has at times
been poorly maintained, and periodically throws random JavaScript
exceptions in our production environments without a deep enough
traceback to effectively investigate.
* A total of about 1600 lines of our code related to the feature.
* Increased load on the Tornado system, especially around a Zulip
server restart, and especially for large installations like
zulipchat.com, resulting in extra delay before messages can be sent
again.
As detailed in
https://github.com/zulip/zulip/pull/12862#issuecomment-536152397, it
appears that removing WebSockets moderately increases the time it
takes for the `send_message` API query to return from the server, but
does not significantly change the time between when a message is sent
and when it is received by clients. We don’t understand the reason
for that change (suggesting the possibility of a measurement error),
and even if it is a real change, we consider that potential small
latency regression to be acceptable.
If we later want WebSockets, we’ll likely want to just use Django
Channels.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
These are leftovers from where we had default settings in the
settings.py file. Now that the files are separate those references to
"below" are not correct.