Previously, the GitLab webhook code, namely the `get_objects_assignee`
method first tried to get a single assignee and if that failed then it
looks for multiple assignees and then it would return the first
assignee that it found (there's actually a code smell here - a loop
which would always return on the first iteration).
Instead, this commit will change that behavior to first check for
multiple assignees first then for a single assignee if we can't find
multiple assignees. Ultimately it will return a list of all of the
assignees (however many that might be [0, n]). This method has then
aptly been renamed to `get_assignees`.
Finally, we tweked the code using this method to always use it's
output as an "assignees" parameter to templates (there's also an
assignee parameter which we want to avoid here for consistency).
Signed-off-by: Hemanth V. Alluri <hdrive1999@gmail.com>
For some reasons, some of the fixtures had the +x bit set, while
some didn't. What this commit does is make sure that no fixture
is marked as "executable" (for anyone).
Signed-off-by: Hemanth V. Alluri <hdrive1999@gmail.com>
The previous code only worked by accident and hyperlink 20.0.0 breaks
it.
>>> hyperlink.parse("example.com").replace(scheme="https")
DecodedURL(url=URL.from_text('https:example.com'))
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Django treats path("<name>") like re_path(r"(?P<name>[^/]+)") and
path("<path:name>") like re_path(r"(?P<name>.+)").
This is more readable and consistent than the mix of slightly
different regexes we had before, and fixes various bugs:
• The r'apps/(.*)$' regex was missing a start anchor ^, so it
incorrectly matched all URLs that included apps/ as a substring
anywhere.
• The r'accounts/login/(google)/$' regex was missing a start anchor ^,
so it incorrectly matched all URLs that ended with
accounts/login/google/.
• The type annotation of zerver.views.realm_export.delete_realm_export
takes export_id as an int, but it was previously passed as a string.
• The type annotation of zerver.views.users.avatar takes medium as a
bool, but it was previously passed as a string.
• The [0-9A-Za-z]+ pattern for uidb64 was missing the - and _
characters that can validly be part of a base64url encoded
string (although I think the id is actually a decimal integer here,
in which case only 012345ADEIMNOQTUYcgjkwxyz are present in its
base64url encoding).
Signed-off-by: Anders Kaseorg <anders@zulip.com>
$ref siblings are ignored according to the OpenAPI specification, and
the referenced definitions already have examples.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Replace default root logger with zulip.auth.apple for apple auth
in file zproject/backends.py and update the test cases
accordingly in file zerver/tests/test_auth_backends.py
Replaced mock.patch with assertLogs for testing log outputs
in file test_auth_backends.py.
This change requires adjusting
test_log_into_subdomain_when_email_is_none to use an explicit token
since that appears in the log output.
This clears it out of the data sent to Sentry, where it is duplicative
with the indexed metadata -- and potentially exposes PHI if Sentry's
"make this issue public" feature is used.
The previous link was to "extended callable" types, which are
deprecated in favor of callback protocols. Unfortunately, defining a
protocol class can't express the typing -- we need some sort of
variadic generics[1]. Specifically, we wish to support hitting the
endpoint with additional parameters; thus, this protocol is
insufficient:
```
class WebhookHandler(Protocol):
def __call__(request: HttpRequest, api_key: str) -> HttpResponse: ...
```
...since it prohibits additional parameters. And allowing extra
arguments:
```
class WebhookHandler(Protocol):
def __call__(request: HttpRequest, api_key: str,
*args: object, **kwargs: object) -> HttpResponse: ...
```
...is similarly problematic, since the view handlers do not support
_arbitrary_ keyword arguments.
[1] https://github.com/python/typing/issues/193
`zulip.zerver.lib.webhooks.common` was very opaque previously,
especially since none of the logging was actually done from that
module.
Adjust to a more explicit logger name.
Any exception is an "unexpected event", which means talking about
having an "unexpected event logger" or "unexpected event exception" is
confusing. As the error message in `exceptions.py` already explains,
this is about an _unsupported_ event type.
This also switches the path that these exceptions are written to,
accordingly.
8e10ab282a moved UnexpectedWebhookEventType into
`zerver.lib.exceptions`, but left the import into
`zserver.lib.webhooks.common` so that webhooks could continue to
import the exception from there.
This clutters things and adds complexity; there is no compelling
reason that the exception's source of truth should not move alongside
all other exceptions.
The main race conditions, which actually happened in production was with
concurrent execution of deliver_email and clear_scheduled_emails.
clear_scheduled_emails could delete all email.users in the middle of
deliver_email execution, causing it to pass empty to_user_ids list to
send_email. We mitigate this by getting the list of user ids in a single
query and moving forward with that snapshot, not having to worry about
database data being mutated anymore.
clear_scheduled_emails had potential race conditions with concurrent
execution of itself due to not locking the appropriate rows upon
selecting them for the purpose of potentially deleting them. FOR UPDATE
locks need to be acquired to prevent simultaneous mutation.
Tested manually with some print+sleep debugging to make some races
happen.
fixes #zulip-2k (sentry)
There are three functional side effects:
• Correct an insignificant but mathematically offensive bias toward
repeated characters in generate_api_key introduced in commit
47b4283c4b4c70ecde4d3c8de871c90ee2506d87; its entropy is increased
from 190.52864 bits to 190.53428 bits.
• Use the base32 alphabet in confirmation.models.generate_key; its
entropy is reduced from 124.07820 bits to the documented 120 bits, but
now it uses 1 syscall instead of 24.
• Use the base32 alphabet in get_bigbluebutton_url; its entropy is
reduced from 51.69925 bits to 50 bits, but now it uses 1 syscall
instead of 10.
(The base32 alphabet is A-Z 2-7. We could probably replace all of
these with plain secrets.token_urlsafe, since I expect most callers
can handle the full urlsafe_b64 alphabet A-Z a-z 0-9 - _ without
problems.)
Signed-off-by: Anders Kaseorg <anders@zulip.com>
For web-public streams, clients can access full topic history
without being authenticated. They only need to additionally
send "streams:web-public" narrow with their request like all
the other web-public queries.
This verifies that we actually do enqueue a record when there is an
error on non-staging. With the previous commit, it verifies that that
data serializes correctly.
The return type of `ugettext_lazy('...')` (aliased as `_`) is a
promise, which is only forced into a string when it is dealt with in
string context. This `django.utils.functional.lazy.__proxy__` object
is not entirely transparent, however -- it cannot be serialized by
`orjson`, and `isinstance(x, str) == False`, which can lead to
surprising action-at-a-distance.
In the two places which will serialize the role value (either into
Zulip's own error reporting queue, or Sentry's), force the return
value. Failure to do this results in errors being dropped
mostly-silently, as they cannot be serialized and enqueued by the
error reporter logger, which has no recourse but to just log a
warning; see previous commit.
When we do this forcing, explicitly override the language to be the
realm default. Failure to provide this override would translate the
role into the role in the language of the _request_, yielding varying
results.
AdminNotifyHandler is used to notify admins of errors; it is a
critical piece of logic. Failures in reporting errors will compound,
since its `except Exception` clauses cannot generate logging at the
`error` or `exception` level, as that would be recursive. It must
settle for logging at the `warning` level, and hope that admins are
vigilant to the logging there.
Increase the chances of being notified of failures in this logger, by
bubbling up those exceptions to Sentry, which is an orthogonal
reporting stack.
When user requests for a realm that doesn't exists, we raise
a InvalidSubdomainError.
This reduces our effort at repeatedly ensuring realm is valid
in request in web-public queries.
If there are unsupported keys, we still log an error,
but we now also send a message to the stream. (This
is a good tradeoff for the github webhook, since users
can just turn off notifications if they find it spammy.
Also, we intend to support "repository" soon.)
This is a bit of an experiment to see how this plays
in the field:
* will customers notice the change?
* will Sentry reports look any different?
The main thing fixed here is that we weren't turning
on our keys into a list. And then I refined the message
a bit more, including sorting the keys.
I also avoid the unnecessary "else".
The EVENT_FUNCTION_MAPPER maps a string event name
to a function handler. Before this we circumvented
mypy checks with a call to get_body_function_based_on_type,
which specified Any as the type of our event function.
Now the types are rigorous.
This change was impossible without the recent commit
to introduce the Helper class.
The Helper class will soon grow, but the immediate
problem it solves is the need to jankily inspect
the parameters of our get_*_body function.
Most of the changes were handled by an ad hoc
munge.py script.
The substantive changes were adding the Helper
class and passing it in.
And then the linter discovered a place where
the optional include_title parameter wasn't used
(which is one of the reasons to avoid the janky
inspect-signature technique).
As a side note, none of the include_title parameters
needed a default value of False, as we always passed
in an explicit value.
We test cover both sides of include_title, which
you can verify by hard coding it to either True or
False (and seeing the relevant failures), although I
suspect most individual codepaths
only test one value, based on whether "topic" is in
the fixture or not.
Finally, I know Helper is not a great name, but I
intend to evolve the class a bit before deciding
whether a more descriptive name is helpful here.
(For example, an upcoming commit will add a
log_unexpected helper method.)
We get the header_event one level up the call
stack now, too.
It's somewhat annoying that we have our own
concept of "event" here, instead of just returning
our event handlers directly, or just calling them
directly, but it's a bit non-trivial to fix that
right away.
In passing, I remove the strange OR for "ping",
which is already a key in EVENT_FUNCTION_MAPPER.
See https://github.com/zulip/zulip/issues/16258 for
possible follow up here.
We now ignore the following two new pull_request
actions (as well as the three existing ones
from before):
approved
converted_to_draft
As the issue above indicates, we may want to actually
support "approved" if we can find somebody to work
on the webhook. (And then the issue goes a little
broader than what changed here.)
We consolidate the tests and remove the fixtures, which
just have a lot of noisy fields that we ignore. Also,
pull_request__request_review_removed was named improperly.
Before this the only way we took advantage
of the summary from UnexpectedWebhookEventType
was by looking at exc_info().
Now we just explicitly add it to the log
message, which also sets us up to call
log_exception_to_webhook_logger directly
with some sort of "summary" info
when we don't actually want a real
exception (for example, we might want to
report anomalous webhook data but still
continue the transaction).
A minor change in passing is that I move
the payload parameter lexically.
Our isort configuration was almost Black-compatible, but we were
missing ensure_newline_before_comments.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Some users setup zulip with trailing / at end, like 'https://meet.jit.si/
leading to extra / on clients while generating video chat link.
This commit removes trailing '/' if it exists to make it consistent. Manual
testing was done by generating jitsi url.
Fixes#16225
We eliminate optional parameters and replace `request_body`
with `payload`.
There is much less confusion if we just pass in `payload`,
and then we optionally re-format it if it's json.
For unclear reasons the original code was trying to
do `request_body = str(payload)` when `request_body`
was no longer being used.
The query to finds and marks all unread UserMessages in the stream as read
can be quite expensive, so we'll move that work to the deferred_work
queue and split it into batches.
Fixes#15770.
In 468c5b9a58 we changed the method of
getting the list of management commands. Using app_config.path has a
caveat in that the value depends on the path from which we're executing.
An example of things breaking can be reproduced by calling
/home/vagrant/zulip/tools/test-backend TestCommandsCanStart
This makes the app_config.path values to start with /home/vagrant/zulip,
but DEPLOY_ROOT in the dev environment is set to /srv/zulip.
/home/vagrant/zulip is a soft link to /srv/zulip, so it's a valid path
to call test-backend through, but it causes self.commands to end up
being an empty list. We fix this by converting app_config.path to the
real path.
Rather than catching, checking action type, and possibly re-raising,
instead return None explicitly from `get_subject_and_body`, which
already signals for a blank success result. This collocates the logic
of the action types in one place, and removes the complexity of the
re-raise.
Sentry may get reported multiple exceptions stacks, in the case where
a `raise ...` was caught, and a new exception was `raise`d. In this
case, the `filename` is the most recent exception -- but the
exceptions are stored in the `exception` key in the order in which
they occurred. As such, taking the first value with a `stacktrace`
will result in showing the wrong line, or in no stack trace being
resolved at all.
Look from the last `exception` backwards, for matching stacks.
Since ALL_HOTSPOTS is a global object, it is initialized
at the time the backend server is started. Hence, the
title and description is translated only once. Using
ugettext_lazy makes sure that the strings are translated
in each and every request according to the language
of the user.
Fixes#16224
This commit fixes examples in "400" response for deactivating user
endpoints to have msg as "Cannot deactivate the last organization
owner" instead of "Cannot deactivate the last organization
administrator".
We had already removed the restriction on deactivating last admin
and added it for last owner, while adding owner role.
The `typing: stop` event did not have any tests in test_events
hence its documentation wasn't added. So add tests and relevant
documentation for the typing stop event. Also edit the documentation
of `typing: start` to include the fact that servers should use
their own timeout incase `stop` event event isn't received.
Fixes#16122.
We display the text of the consent message, and then continue with the
export, which will scroll the content off the screen. Allow the
administrator time to examine the contents of the message, and decide
whether to proceed based on that and the fraction of users that have
responded so far.
We raise two types of json_unauthorized when
MissingAuthenticationError is raised. Raising the one
with www_authenticate let's the client know that user needs
to be logged in to access the requested content.
Sending `www_authenticate='session'` header with the response
also stops modern web-browsers from showing a login form to the
user and let's the client handle it completely.
Structurally, this moves the handling of common authentication errors
to a single shared middleware exception handler.
Improve OpenAPI documentation of /zulip-outgoing-webhook by moving
data and making appropriate additions from its couterpart in the
/outgoing-webhook docs. Then remove the redundant documentation
from the doc and add command to render OpenAPI documetation. Also
add a test to outgoing_webhooks_interface.py to ensure that OpenAPI
documentation is correct.
Fixes#16203.
This lets the backend tests pass if zilencer has been (manually)
removed from EXTRA_INSTALLED_APPS, by skipping the tests that require
it. test-backend complains that some URLs are untested in this case:
ERROR: Some URLs are untested! Here's the list of untested URLs:
api/v1/users/me/android_gcm_reg_id
api/v1/users/me/apns_device_token
team/
Signed-off-by: Anders Kaseorg <anders@zulip.com>
It's never safe to access the mock RemoteZulipServer object; this
caused exceptions on every request in production for any server with
ZILENCER_ENABLED=False.
This commit adds automatic detection of extra output (other than
printed by testing library or tools) in stderr and stdout by code under
test test-backend when it is run with flag --ban-console-output.
It also prints the test that produced the extra console output.
Fixes: #1587.
Extracting a section for presence endpoints and using path() rather
than re_path() results in a much cleaner implementation of this
concept.
This eliminates the last case where test_openapi couldn't correctly
match an endpoint documentation with the OpenAPI definitions for it.
This renames 'group_id' to 'user_group_id' in the api docs to remove
the naming mismatch between the url config and the docs and eventually
remove the 'user_groups' endpoints from 'pending_endpoints' in
test_openapi.py.
'user_groups' endpoints are currently under 'pending_endpoints' in
test_openapi.py (even after being documented except one), due to the
'user_group_id' and 'group_id' parameter name mismatch in the
url config and the view functions.
This commit includes 'path_only=True' for 'user_group_id' parameter in
views to avoid the failure of 'test_openapi_arguments', in
test_openapi.py, which excludes the path parameters. This is a prep
commit for renaming 'group_id' to 'user_group_id' in the documentation
and removing the 'user_groups' endpoints from 'pending_endpoints'.
This queue had a race condition with creation of another Timer while
maybe_send_batched_emails is still doing its work, which may cause
two or more threads to be running maybe_send_batched_emails
at the same time, mutating the shared data simultaneously.
Another less likely potential race condition was that
maybe_send_batched_emails after sending out its email, can call
ensure_timer(). If the consume function is run simultaneously
in the main thread, it will call ensure_timer() too, which,
given unfortunate timings, might lead to both calls setting a new Timer.
We add locking to the queue to avoid such race conditions.
Tested manually, by print debugging with the following setup:
1. Making handle_missedmessage_emails sleep 2 seconds for each email,
and changed BATCH_DURATION to 1s to make the queue start working
right after launching.
2. Putting a bunch of events in the queue.
3. ./manage.py process_queue --queue_name missedmessage_emails
4. Once maybe_send_batched_emails is called and while it's processing
the events, I pushed more events to the queue. That triggers the
consume() function and ensure_timer().
Before implementing the locking mechanism, this causes two threads
to run maybe_send_batched_emails at the same time, mutating each other's
shared data, causing a traceback such as
Exception in thread Thread-3:
Traceback (most recent call last):
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 1182, in run
self.function(*self.args, **self.kwargs)
File "/srv/zulip/zerver/worker/queue_processors.py", line 507, in maybe_send_batched_emails
del self.events_by_recipient[user_profile_id]
KeyError: '5'
With the locking mechanism, things get handled as expected, and
ensure_timer() exits if it can't obtain the lock due to
maybe_send_batched_emails still working.
Co-authored-by: Tim Abbott <tabbott@zulip.com>
We raise two types of json_unauthorized when
MissingAuthenticationError is raised. Raising the one
with www_authenticate let's the client know that user needs
to be logged in to access the requested content.
Sending `www_authenticate='session'` header with the response
also stops modern web-browsers from showing a login form to the
user and let's the client handle it completely.
Structurally, this moves the handling of common authentication errors
to a single shared middleware exception handler.
If you look at line number 1121 (new) of commit 14c0a387cf,
I seem to have accidently set the description for a status
200 response to "Bad Request" instead of "Success" which
is what it really is. It's basically an ugly typo (maybe
due to hastily copy-pasting the template).
Signed-off-by: Hemanth V. Alluri <hdrive1999@gmail.com>
I noticed RateLimitTests.test_hit_ratelimits fails when run as an
individual test, but never when run after other tests. That's due to the
first API request in a run of tests taking a long time, as detailed in
the comment on the change to the setUp method.
Django always sets request.user to a UserProfile or AnonymousUser
instance, so it's better to mimic that in the tests where we pass a
dummy request objects for rate limiter testing purposes.
The data is now stored in memory if things are happening inside tornado.
That aside, there is no reason for a comment on a rate_limit_user call
to talk about low level implementation details of that function.
I can find no evidence of it being possible to get an Exception when
accessing request.user or for it to be falsy. Django should always set
request.user to either a UserProfile (if logged in) or AnonymousUser
instance. Thus, this seems to be dead code that's handling cases that
can't happen.
`zproject/settings.py` itself is mostly-empty now. Adjust the
references which should now point to `zproject/computed_settings.py`
or `zproject/default_settings.py`.
`update_message_flags` events used `operation` instead of `op`, the
latter being the standard field used in other events. So add `op`
field to `update_message_flags` and mark `operation` as deprecated,
so that it can be removed later.
It's possible that this is a new name for the "due"
field, but it's not totally clear.
In the exception we saw in the field:
payload['action']['data']['old']['dueComplete'] = False
payload['action']['data']['card']['dueComplete'] = True
We remove the fixture for create_check_item, which
has been bit-rotting for as long as we have ignored
this type of card data.
Our new test is more powerful, in the sense that it
shows we successfully ignore all fixtures of this
type.
If we want to handle this, we'll just need to get
new, representative fixture data from trello.
Commit c4254497b2
curiously had get_body() round tripping its data
through json load and dump.
I have seen this done for pretty-printing reasons,
but it doesn't apply here.
And if you're doing it for validation reasons,
you only need to do half the work, as my commit
here demonstrates.
We arguably don't even need the fail-fast code
here, since our fixtures are linted to be proper
json, I believe, plus downstream code probably
gives reasonably easy-to-diagnose symptoms.
We introduce get_payload for the relatively
exceptional cases where webhooks return payloads
as dicts.
Having a simple "str" type for get_body will
allow us to extract test helpers that use
payloads from get_body() without the ugly
`Union[str, Dict[str, str]]` annotations.
I also tightened up annotations in a few places
where we now call get_payload (using Dict[str, str]
instead of Dict[str, Any]).
In the zendesk test I explicitly stringify
one of the parameters to satisfy mypy.
We tighten up the mypy types here. And then
once we know that expected_message and expected_topic
are never None, we don't have call the do_test_message
and do_test_topic helpers any more, so we eliminate
them, too.
Finally, we don't return a message, since no tests
use the message currently.
If we're not passing in expected_topic or expected_message
to check_webhook, it's better to just call send_webhook_payload,
since we'll want to explicitly check our messages
anyway.
This preps us to always require those fields for
check_webhook, which can prevent insidious testing no-ops.
This forces us to be a bit more explicit about testing
the three key values in any stream message, and it
also de-clutters the code a bit. I eventually want
to phase out do_test_topic and friends, since they
have the pitfall that you can call them and have them
do nothing, because they don't actually require
values to be be passed in.
I also clean up the code a bit for the tests that
have two new messages arriving.
Having an optional stream_name parameter makes
it confusing to read the code if you know your
webhook is sending private messages.
And then the other two callers are already
checking topics, so they might as well check
stream names, too.
We also have the two stream-oriented callers
make their own call to "subscribe". And we
future-proof this by making sure the exception
for no-message-being-sent calls out that gotcha.
Somewhat in passing, we now assert that
self.STREAM_NAME is not None in the main
helper. This is partly to satisfy mypy, but
it's also a good sanity check.
This also sets the stage for the next commit,
where I'll add an assert_stream_message helper.
Not all webhook payloads are json, so send_json_payload was a
bit misleading.
In passing I also remove "bytes" from the Union type for
"payload" parameter.
Almost all webhook tests use this helper, except a few
webhooks that write to private streams.
Being concise is important here, and the name
`self.send_and_test_stream_message` always confused
me, since it sounds you're sending a stream message,
and it leaves out the webhook piece.
We should consider renaming `send_and_test_private_message`
to something like `check_webhook_private`, but I couldn't
decide on a great name, and it's very rarely used. So
for now I just made sure the docstrings of the two
sibling functions reference each other.
The "EXPECTED_" prefix and "_EVENTS" suffix
usually provided more noise than signal.
We also use module constants to avoid the "self."
noise. It also makes it a bit more clear which
constants actually have to be in the class (e.g.
"FIXTURE_DIR_NAME") to do their job.
This function is a bad idea, as it leads to a possible situation
where you aren't actually testing anything:
def do_test_message(self, msg: Message, expected_message: Optional[str]) -> None:
if expected_message is not None:
self.assertEqual(msg.content, expected_message)
Unfortunately, it's called deep in the stack in some places, but
we can safely replace it with assertEqual here.
We had optional parameters for expected_topic and
expected_message, which are trivial to eliminate,
since the integration is really simple.
And we were doing strange things trying to reset
class variables at the end of tests. Now we just
set them explicitly in the tests.
The test helper here was taking an "expected_topic"
parameter that it just ignored, and then the
dialogflow tests were passing in expected messages
in that slot, so the actual "expected_message" var
was "None" and was ignored. So the tests weren't
testing anything.
Now we eliminate the crufty expected_topic parameter
and require an actual value for "expected_message".
I also clean up the mypy type for content_type,
and I remove the `content_type is None` check,
since all callers either pass in a str content
type or default to "application/json".
Some `<img>` tags do not have an SRC, if they are rewritten using JS
to have one later. Attempting to access `first_image['src']` on these
will raise an exception, as they have no such attribute.
Only look for images which have a defined `src` attribute on them. We
could instead check if `first_image.has_attr('src')`, but this seems
only likely to produce fewer valid images.
03ca3afbc2 added more codes that are equivalent to 404's; this adds to
the list of cache-as-None codes a couple which are equivalent to
403's. It does not comprise _all_ possible 403-like codes -- many of
them are "the client is not OK," which is relevant to log as an error
still.
Per [1], the sentry API returns frames sorted from oldest to newest.
As such, matching against the first filename that matches is most
likely not the right frame.
Match against the last frame with the guilty filename.
[1] https://develop.sentry.dev/sdk/event-payloads/stacktrace/
The original commit was broken here:
b553507412
The intention was to run the same loop for all
settings, but instead, we did a funny loop of
just resetting schema_checker, and then we only
actually tested the last value of the loop.
This commit adds "role" field to the Subscription objects passed to
clients. This is important preparation for being able to work on the
frontend for this feature.
While exporting analytics data we were using wrong table name
'zerver_analytics' in analytics config. Renamed it with
correct table name 'zerver_realm'.
Since bug https://bugs.python.org/issue3445 was resolved in Python
3.3, we can avoid the use of assigned=available_attrs(view_func) in
wraps decorator (which we were only using because we'd copied code
that handled that from Django).
Also available_attrs is now depreciated from Django 3.0 onwards.
Django 3.0 removed private Python 2 compatibility APIs
so used lru_cache() directly from functools.
We cast lru_cache to Any to avoid attr-defined error in mypy since we
are adding extra field, 'key_prefix', to this object later.
This comment stopped being true in 5686821150, and very much stopped
being relevant in dd40649e04 when the middleware entirely stopped
publishing to a queue.
This function now matches the copy in zerver/lib/actions.py.
This is the same migration as
b250e42f61c525029bd2b3bbb8f4ea93ece62072; orjson enforces that we
don't use integers as keys in JSON dictionaries.
Apparently, we were incorrectly using constants for title/description
rather than the nice non-constant values from og:title and
og:description in our meta tags.
This commit adds the is_web_public field in the AbstractAttachment
class. This is useful when validating user access to the attachment,
as otherwise we would have to make a query in the db to check if
that attachment was sent in a message in a web-public stream or not.
The new Stream administrator role is allowed to manage a stream they
administer, including:
* Setting properties like name, description, privacy and post-policy.
* Removing subscribers
* Deactivating the stream
The access_stream_for_delete_or_update is modified and is used only
to get objects from database and further checks for administrative
rights is done by check_stream_access_for_delete_or_update.
We have also added a new exception class StreamAdministratorRequired.
This commit adds role field to the Subscription class. Currently,
there are two option of roles - STREAM_ADMINISTRATOR and MEMBER.
We also add a property 'is_stream_admin' for checking whether the
user is stream admin or not.
Via API, users can now access messages which are in web-public
streams without any authentication.
If the user is not authenticated, we assume it is a web-public
query and add `streams:web-public` narrow if not already present
to the narrow. web-public streams are also directly accessible.
Any malformed narrow which is not allowed in a web-public query
results in a 400 or 401. See test_message_fetch for the allowed
queries.
django.security.DisallowedHost is only one of a set of exceptions that
are "SuspiciousOperation" exceptions; all return a 400 to the user
when they bubble up[1]; all of them are uninteresting to Sentry.
While they may, in bulk, show a mis-configuration of some sort of the
application, such a failure should be detected via the increase in
400's, not via these, which are uninteresting individually.
While all of these are subclasses of SuspiciousOperation, we enumerate
them explicitly for a number of reasons:
- There is no one logger we can ignore that captures all of them.
Each of the errors uses its own logger, and django does not supply
a `django.security` logger that all of them feed into.
- Nor can we catch this by examining the exception object. The
SuspiciousOperation exception is raised too early in the stack for
us to catch the exception by way of middleware and check
`isinstance`. But at the Sentry level, in `add_context`, it is no
longer an exception but a log entry, and as such we have no
`isinstance` that can be applied; we only know the logger name.
- Finally, there is the semantic argument that while we have decided
to ignore this set of security warnings, we _may_ wish to log new
ones that may be added at some point in the future. It is better
to opt into those ignores than to blanket ignore all messages from
the security logger.
This moves the DisallowedHost `ignore_logger` to be adjacent to its
kin, and not on the middleware that may trigger it. Consistency is
more important than locality in this case.
Of these, the DisallowedHost logger if left as the only one that is
explicitly ignored in the LOGGING configuration in
`computed_settings.py`; it is by far the most frequent, and the least
likely to be malicious or impactful (unlike, say, RequestDataTooBig).
[1] https://docs.djangoproject.com/en/3.0/ref/exceptions/#suspiciousoperation
These weren’t wrong since orjson.JSONDecodeError subclasses
json.JSONDecodeError which subclasses ValueError, but the more
specific ones express the intention more clearly.
(ujson raised ValueError directly, as did json in Python 2.)
Signed-off-by: Anders Kaseorg <anders@zulip.com>
This adds 'user_id' to the simple success response for 'POST /users'
api endpoint, to make it convenient for API clients to get details
about users they just created. Appropriate changes have been made in
the docs and test_users.py.
Fixes#16072.
These escapes are valid YAML 1.2 (for JSON compatibility) but not
valid YAML 1.1, which means they don’t work with the faster
yaml.CSafeLoader that we’d like to transition to.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
datetime objects are not ordinarily JSON serializable. While both
ujson and orjson have special cases to serialize datetime objects,
they do it in different ways. So we want to fix the post-processing
code to do its job.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
It is more suited for `process_request`, since it should stop
execution of the request if the domain is invalid. This code was
likely added as a process_response (in ea39fb2556) because there was
already a process_response at the time (added 7e786d5426, and no
longer necessary since dce6b4a40f).
It quiets an unnecessary warning when logging in at a non-existent
realm.
This stops performing unnecessary work when we are going to throw it
away and return a 404. The edge case to this is if the request
_creates_ a realm, and is made using the URL of the new realm; this
change would prevent the request before it occurs. While this does
arise in tests, the tests do not reflect reality -- real requests to
/accounts/register/ are made via POST to the same (default) realm,
redirected there from `confirm-preregistrationuser`. The tests are
adjusted to reflect real behavior.
Tweaked by tabbott to add a block comment in HostDomainMiddleware.
This redirect was never effective -- because of the
HostDomainMiddleware, all requests to invalid domains have their
actual results thrown away, and replaced by an "Invalid realm" 404.
These lines are nonetheless _covered_ by coverage, because they do
run; the redirect is simply ineffective. This can be seen by the test
that was added with them, in c8edbae21c, actually testing the contents
for the invalid realm wording, not the "find your accounts" wording.
The exception trace only goes from where the exception was thrown up
to where the `logging.exception` call is; any context as to where
_that_ was called from is lost, unless `stack_info` is passed as well.
Having the stack is particularly useful for Sentry exceptions, which
gain the full stack trace.
Add `stack_info=True` on all `logging.exception` calls with a
non-trivial stack; we omit `wsgi.py`. Adjusts tests to match.
consume_time_seconds wasn't properly defined at the beginning, so when
a BaseException that isn't a subclass of Exception is thrown, the
finally: block could be entered with it still undefined.
By defaults, `requests` has no timeout on requests, which can lead to
waiting indefinitely. Add a half-second timeout on these; this is
applied _inside_ each retry, not overall -- that is, with retries any
of these functions may take a total of 1.5s.
Use the `no_proxy` proxy, which explicitly disables proxy usage for
particular hosts. This is a slightly cleaner solution than ignoring
all of the environment, as removing proxies is specifically what we
are attempting to accomplish.
The change in #2764 provided a better error message on one of the
three calls into Tornado, but left the other two with the old error
message. `raise_for_status` was used on two out of three.
Use a custom HTTPAdapter to apply this pattern to all requests from
Django to Tornado.
In f8bcf39014, we fixed buggy
marshalling of Streams and similar data structures where we were
including the Stream object rather than its ID in dictionaries passed
to ujson, and ujson happily wrote that large object dump into the
RealmAuditLog.extra_data field.
This commit includes a migration to fix those corrupted RealmAuditLog
entries, and because the migration loop is the same, also fixes the
format of similar RealmAuditLog entries to be in a more natural format
that doesn't weirdly nest and duplicate the "property" field.
Fixes#16066.
This requires a few redundant runtime isinstance
checks, but the extra assertions arguably make
the code more readable, and isinstance checks
are extremely negligible.
JSON keys must be strings, and orjson enforces this. Mypy didn’t
catch the mismatched type of profiles_by_user_id because it doesn’t
understand CustomProfileFieldValue.field_id.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
It doesn’t end well. Or sometimes it doesn’t end (OverflowError:
Maximum recursion level reached).
Introduced by commits ccdf52fef6 and
94d2de8b4a (#15601).
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Our intent throughout the codebase is to treat email
case-insensitively.
The only codepath affected by this bug is remote_user_sso, as that's the
only one that currently passes potentially both a user_profile and
ExternalAuthDataDict when creating the ExternalAuthResult. That's why we
add a test specifically for that codepath.
To increase code reusability and reduce code redundancy, we move data
structures which occur multiple times in the OpenAPI documentation to
the `schemas` section. Note that this a pure data movement commit
without any changes to the data beyond removing over-specific
descriptions (E.g. that suggest the user group was just created).
(Future commits will use these)
Previously there was a documented_events set which provided for partial
OpenAPI documentation while documentation was still going on. But since
the documentation is complete now, remove it.
This giant commit completes basic OpenAPI documentation for all events
in Zulip's real-time events API.
Further work will be required in the near future to make
/api/get_events usable.
With many edits by tabbott for wording and correctness (especially
around which clients receive events, and their purpose).
The variant `update_message` events have this extra sender field not
present in normal update_message events; this field has no purpose, so
we remove it.
Apparently, `update_message` events unexpectedly contained what were
intended to be internal data structures about which users were
mentioned in a given message.
The bug has been present and accumulating new data structures for
years.
Fixing this should improve the performance of handling update_message
events as well as cleaning up this API's interface.
This was discovered by our automated API documentation schema checking
tooling detecting these unexpected elements in these event
definitions; that same logic should prevent future bugs like this from
being introduced in the future.
Use `ujson.loads(ujson.dumps())` wrapper on events sent for OpenAPI
testing so that all tuples are converted into arrays as tuples aren't
valid in JSON.
To make it easier to check if there is user information to be used
in the error report emails, we create a user object inside report.
Now, to check if we have the user's full name, email, etc, we just
need to do report['user']['user_full_name'] rather than check
each information one by one, because if the value of one key in
the report is different than None, all the others will be as well.
ES and TypeScript modules are strict by default and don’t need this
directive. ESLint will remind us to add it to new CommonJS files and
remove it from ES and TypeScript modules.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Per the API documentation[1], the following codes all correspond to
HTTP 404:
- `34`: **Sorry, that page does not exist.** The specified resource
was not found.
- `144`: **No status found with that ID.** The requested Tweet ID is
not found (if it existed, it was probably deleted)
- `421`: **This Tweet is no longer available.** The Tweet cannot be
retrieved. This may be for a number of reasons.
- `422`: **This Tweet is no longer available because it violated the
Twitter Rules.** The Tweet is not available in the API.
Treat all of these identically.
[1] https://developer.twitter.com/en/docs/basics/response-codes
Since the title of a merge request can often change, it shouldn't be a
part of the topic that we send the message to. Otherwise things would
get messy and confusing.
But at the same time we don't want to make this mandatory. So we add
a new boolean GET parameter that can toggle whether or not the topic
should include the MR title (`use_merge_request_title`).
Fixes#15951.
Signed-off-by: Hemanth V. Alluri <hdrive1999@gmail.com>
The S3 data export tool's upload code path uses this nice boto
callback feature for showing a progress bar, which is nice for the
management command. It's spammy/broken in production and the backend
tests, so we change percent_callback to be a parameter passed in so
that it can only be used in the contexts where it makes sense.
This reverts commit c3779338c6 (part
of #14638), which incorrectly depended on commits from the future,
with the effect of either halting the flow of entropic time in an
irresolvable temporal paradox, summoning extradimensional beings to
rain destruction on the galaxy, or failing CI.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
This commit modifies the /streams endpoint so that the web-public
streams are included in the default list of streams that users
have access to.
This is part of PR #14638 that aims to allow guest users to
browse and subscribe themselves to web public streams.
Modifies filter_stream_authorization so that web-public streams are
added in the list of authorized streams that a guest user can
subscribe.
This commit is part of PR #14638 that aims to allow guest users
to browse and subscribe to web-public streams.
In this commit, we grant guest users access to stream history,
send message and common stream data of web-public streams.
This is part of PR #14638 that aims to allow guest users to
browse and subscribe to web-public streams.
This modification allows guest users to have access to web-public
streams subscribers, even if they aren't subscribed or never
subscribed to that stream.
This commit is part of PR #14638 that aims to allow guest users to
browser and subscribe to web-public streams.
Now, gather_subscriptions include web-public streams in the 3 sets
of streams that it returns, subscribed, unsubscribed and never
subscribed.
This is part of PR #14638 that aims to allow guest users to browse and
subscribe to web-public streams.
This change makes the flow more coherent by instead of checking,
in the last condition, if the user isn't authorized to access that
stream, check if they are, as it is done in the other checks. Only
if all the conditions are false, which means that the user doesn't
have access to that stream, the stream is added to the
unauthorized_streams list.
Card descriptions aren't dates, and calling prettify_date on them results in removing upper case T characters, replacing uppercase Z characters with " UTC", etc. in descriptions when they appear in Zulip.
This was pretty clearly just a copy/paste mistake (these functions are very closely parallel to the *_due_date_* functions above, which do work on dates and call prettify_date).
Also add a Draft object-to-dictionary conversion method.
The following commits will provide an API around this
model using which our clients can sync drafts across each
other (if they so wish too). As of making this commit, we
haven't finalized exactly how our clients will use this.
See https://chat.zulip.org/#narrow/stream/2-general/topic/drafts
For some of the discussion around this model and in general,
around this feature.
Signed-off-by: Hemanth V. Alluri <hdrive1999@gmail.com>
Unlike the other Python datetime to Unix timestamp conversion
function (`datetime_to_timestamp`), `datetime_to_precise_timestamp`
won't drop the microseconds.
Signed-off-by: Hemanth V. Alluri <hdrive1999@gmail.com>
The apple developer webapp consistently refers this App ID. So,
this clears any confusion that can occur.
Since python social auth only requires us to include App ID in
_AUDIENCE(a list), we do that in computed settings making it easier for
server admin and we make it much clear by having it set to
APP_ID instead of BUNDLE_ID.
Uses git release as this version 3.4.0 is not released to pypi.
This is required for removing some overriden functions of
apple auth backend class AppleAuthBackend.
With the update we also make following changes:
* Fix full name being populated as "None None".
c5c74f27dd that's included in update assigns first_name and last_name
to None when no name is provided by apple. Due to this our
code is filling return_data['full_name'] to 'None None'.
This commit fixes it by making first and last name strings empty.
* Remove decode_id_token override.
Python social auth merged the PR we sent including the changes
we made to decode_id_token function. So, now there is no
necessity for the override.
* Add _AUDIENCE setting in computed_settings.py.
`decode_id_token` is dependent on this setting.
Three reasons:
1. The sliding was disorienting.
2. The collapsing disallowed searching for other pages with Ctrl+F.
3. The collapsing mechanism wasn't accessible (not usable with the
keyboard / no ARIA tags).
Tweaked by tabbott to center the left sidebar on the selected page.
Part of #15948.
Document all events of `type`=stream i.e all `op`s. Also document some other
events.
Tweaked by tabbott to clarify some documentation details (especially
around who receives events).
The idea behind doing this is that we would rather let the code error
out rather than add to the logs. It's webhook code usually never uses
the logging module so this section of legacy code needed to be changed
or removed.
Assists PR #15942.
Signed-off-by: Hemanth V. Alluri <hdrive1999@gmail.com>
Edit the function `validate_against_openapi_schema` and add some
helper functions to allow for validation of documented events.
Also add OpenAPI response validation in `verify_action` as it is
called in a large number of `/events` tests.
This lets us test the recursion bug behavior of this logging handler
without resulting in `logging.error` output being printed to the
console in the event that the test passes.
The parameter Stream.date_created is now sent down to the clients
for both:
- client.get_streams()
- client.list_subscriptions()
API docs updated for stream and subscriptions.
Fixes#15410
Some parameters such as `to` and `topic` have been intentionally
undocumentecd hence fail request validation. So mark tests which
fail due to this accordingly.
Change the condition for allowing failed validation to the condition
that `if the test fails, response status code begins with 4`. Also
add `intentionally_undocumented` argument in `validate_request` for
allowing passing of tests which return `200` responses but fail
validation due to some intentionally undocumented feature in
OpenAPI specification.
Added order_by("id") clause in query for RealmAuditLog
for consistent output.
It was causing zerver.tests.test_audit_log.TestRealmAuditLog
to fail due to order mismatch.
Clock time checks lead to tests that nondeterministically fail when
the CI container is super slow, and there's no good reason this test
in particular needs to do that sort of test in addition to our
standard database query count check (which is already does).
Now when you are reading a single test, you can
explicitly see that the event and service handler
are tied to your bot, which is our test bot
for outgoing webhooks.
Decorating an entire test with a mock makes it
hard to ascertain where the actual mock behavior
is expected to happen, plus it clutters up
the parameter list.
In fact, we remove a dubious re-assertion here that
a mock was called. The assertion that a mock was
called was true, but it was misleading to think
the code right before it had invoked the mock.
The stream schema is used in two locations so move it to the
components section. Also the `is_default` key returned by `/streams`
is not returned by `/events`. So handle it separately.
The `Messages` schema present in `#/components/schemas` was a
combination of all keys possible in any message object used in Zulip.
Edit it so that the original `Messages` contains just the keys present
in the `message` event. Also make another schema `GetMessages` which
adds a few other keys which are received when using `GET /messages`.
The message object in the `/zulip-outgoing-webhook` has also been
modified and corrected.
The `subscriptions` has use in multiple endpoints and hence instead
of redefining it at every point move it to the the components section
for easier reuse.
We also have the caller pass in the property name for an
additional sanity check.
Note that we don't yet handle the possibility of extra_data;
that will be a subsequent commit.
Also, the stream_id fields aren't in Realm.property_types,
so we specify their types in the checker.
This a pretty big commit, but I really wanted it
to be atomic.
All realm_user/update events look the same from
the top:
_check_realm_user_update = check_events_dict(
required_keys=[
("type", equals("realm_user")),
("op", equals("update")),
("person", _check_realm_user_person),
]
)
And then we have a bunch of fields for person that
are optional, and we usually only send user_id plus
one other field, with the exception of avatar-related
events:
_check_realm_user_person = check_dict_only(
required_keys=[
# vertical formatting
("user_id", check_int),
],
optional_keys=[
("avatar_source", check_string),
("avatar_url", check_none_or(check_string)),
("avatar_url_medium", check_none_or(check_string)),
("avatar_version", check_int),
("bot_owner_id", check_int),
("custom_profile_field", _check_custom_profile_field),
("delivery_email", check_string),
("full_name", check_string),
("role", check_int_in(UserProfile.ROLE_TYPES)),
("email", check_string),
("user_id", check_int),
("timezone", check_string),
],
)
I would start the code review by just skimming the changes
to event_schema.py, to get the big picture of the complexity
here. Basically the schema is just the combined superset of
all the individual schemas that we remove from test_events.
Then I would read test_events.py.
The simplest diffs are basically of this form:
- schema_checker = check_events_dict([
- ('type', equals('realm_user')),
- ('op', equals('update')),
- ('person', check_dict_only([
- ('role', check_int_in(UserProfile.ROLE_TYPES)),
- ('user_id', check_int),
- ])),
- ])
# ...
- schema_checker('events[0]', events[0])
+ check_realm_user_update('events[0]', events[0], {'role'})
Instead of a custom schema checker, we use the "superset"
schema checker, but then we pass in the set of fields that we
expect to be there. Note that 'user_id' is always there.
So most of the heavy lifting happens in this new function
in event_schema.py:
def check_realm_user_update(
var_name: str, event: Dict[str, Any], optional_fields: Set[str],
) -> None:
_check_realm_user_update(var_name, event)
keys = set(event["person"].keys()) - {"user_id"}
assert optional_fields == keys
But we still do some more custom checks in test_events.py.
custom profile fields: check keys of custom_profile_field
def test_custom_profile_field_data_events(self) -> None:
+ self.assertEqual(
+ events[0]['person']['custom_profile_field'].keys(),
+ {"id", "value", "rendered_value"}
+ )
+ check_realm_user_update('events[0]', events[0], {"custom_profile_field"})
+ self.assertEqual(
+ events[0]['person']['custom_profile_field'].keys(),
+ {"id", "value"}
+ )
avatar fields: check more specific types, since the superset
schema has check_none_or(check_string)
def test_change_avatar_fields(self) -> None:
+ check_realm_user_update('events[0]', events[0], avatar_fields)
+ assert isinstance(events[0]['person']['avatar_url'], str)
+ assert isinstance(events[0]['person']['avatar_url_medium'], str)
+ check_realm_user_update('events[0]', events[0], avatar_fields)
+ self.assertEqual(events[0]['person']['avatar_url'], None)
+ self.assertEqual(events[0]['person']['avatar_url_medium'], None)
Also note that avatar_fields is a set of four fields that
are set in event_schema.
full name: no extra work!
def test_change_full_name(self) -> None:
- schema_checker('events[0]', events[0])
+ check_realm_user_update('events[0]', events[0], {'full_name'})
test_change_user_delivery_email_email_address_visibilty_admins:
no extra work for delivery_email
check avatar fields more directly
roles (several examples) -- actually check the specific role
def test_change_realm_authentication_methods(self) -> None:
- schema_checker('events[0]', events[0])
+ check_realm_user_update('events[0]', events[0], {'role'})
+ self.assertEqual(events[0]['person']['role'], role)
bot_owner_id: no extra work!
- change_bot_owner_checker_user('events[1]', events[1])
+ check_realm_user_update('events[1]', events[1], {"bot_owner_id"})
- change_bot_owner_checker_user('events[1]', events[1])
+ check_realm_user_update('events[1]', events[1], {"bot_owner_id"})
- change_bot_owner_checker_user('events[1]', events[1])
+ check_realm_user_update('events[1]', events[1], {"bot_owner_id"})
timezone: no extra work!
- timezone_schema_checker('events[1]', events[1])
+ check_realm_user_update('events[1]', events[1], {"email", "timezone"})