We now no longer define any schemas in test_events--all
of them are in event_schema, which helps our tooling
cross-check schemas for openapi and node tests.
It happens that whether you add a reaction or remove
a reaction, we send the exact same fields, just using
a different op code.
This sort of symmetry is actually kind of rare, as
usually "add" events have more fields, and "remove" events
might just send an id of something to remove.
Our openapi schema treats these as two seperate events,
so we are more consistent with it, and it helps our
schema-checking tooling for node fixtures, too.
Note that we now have to exempt the two events from
our openapi checks, due to the is_mirror_dummy field
in the deprecated user block. We can decide how to
handle this later--one possibility is to just add it
as an optional field on the event_schema side.
Note that we use value_type for value instead of
bool, since properties can be non-bool things
like color, which we just don't test now. We
should test them.
We more than compensate for this by checking
the actual value of the value in
check_subscription_update.
There is a legacy format where we send
singular "message_id" instead of plural
"message_ids".
Then there are different fields for "private"
and "stream" message types.
Note that we make the schema for profile_data
slightly more realistic, but it doesn't actually get
exercised by our current tests (apart from
making sure it's a dict), since we don't have
profile data for our test realm.
We also don't have the optional fields for bots,
since our tests don't exercise that, nor
delivery_email.
So we exempt realm_user_add_event from openapi
checks for now.
When we try to match the openapi specs better, we
will probably want to add a few tests to test_events.
Obviously getting good coverage for adding users
would be nice for all these scenarios:
* delivery_email matters
* bots
* realm has profile fields
This is a prep commit for supporting "presence"
events, where the key of the dictionary is some
arbitrary string like "website" but the value
of the dictionary is another dictionary itself
with keys that are more like variable names.
This also forces us to create TupleType.
We exempt this from the openapi check,
since we haven't figured out how to model
tuples in openapi with the same precision
as event_schema (and it may be impossible).
Long term we just want to stop dealing in
tuples, of course.
StringDict is a data type for representing dictionaries where
all keys and values are strings. Add this data type to data_types.py
and edit other files so that this data type is put to use and tested.
(slightly tweaked by @showell to remove a comment and shorten
a var name now that we have a proper data type)
We also make our schema in event_schema reflect this,
which in turn makes us match the already accurate
openapi spec, so we no longer need to exempt four
types of events from our sanity checks.
We might want to rename the tool to something more
general now, since we are really reconciling three
things:
- node fixtures
- event_schema checkers for test_events
- openapi specs
The way we compare python and openapi schemas is
as follows:
- first convert openapi schemas to be build
from DictType, ListType, etc. with from_opeapi
- do a diff on the schemas
Most of the new code is just having the FooType
family of classes serialize themselves with schema().
Defining types with an object hierarchy
of type classes will allow us to build
functionality that was impossible (or
really janky) with the validators.py
approach of composing functions.
Most of the changes to event_schema.py
were automated search/replaces.
This patch doesn't really yet take
advantage of the new FooType classes,
but we will use it soon to audit our
openapi specs.
Even before GDPR changes, it was strange that we displayed
users differently for fork events vs. all other events.
After GDPR, we don't even get the `username` field any
more.
So now we simply use `display_name` if available, and then
we try `nickname`.
See https://developer.atlassian.com/cloud/bitbucket/bitbucket-api-changes-gdpr/
for more context.
We were trying to share the same format string between
the two different versions of bitbucket, but this only
creates confusion, as the two versions are only close
enough to be confusing.
The format string might be the same, but the semantics
are different, as well as the eventual outputs.
For example, the {username} piece here is simple in version
2, but in version 3 we append a url to the user's name.
This commit renames 'test_message_to_self' and
'test_api_message_to_self' tests to
'test_message_to_stream_by_name' and
'test_api_message_to_stream_by_name' to depict
the actual purpose of these tests.
user_profile will be None for web_public_guests here. Hence, for
settings (of which most be inaccessible by web public guest),
which require a user_profile, we either set an empty value for
them or set them to a default value. This will help render
the frontend or extend support to our clients without breaking
a lot of code.
Tweaked by tabbott to add many comments.
These represent known errors in what the user submitted. This is
slightly complicated by UnsupportedWebhookEventType being an instance
of JsonableError.
allow_webhook_access may be true if the request allows webhook
requests, regardless of if it only used for a webhook integration.
Only actually log to the verbose webhook logger if it is explicitly a
webhook endpoint, as judged by `webhook_client_name`. This prevents
requests for `POST /api/v1/messages` from being logged to the webhook
logger if they mistakenly contain a `payload` argument.
This argument does not define if an endpoint "is a webhook"; it is set
for "/api/v1/messages", which is not really a webhook, but allows
access from webhooks.
If multiple filters match the same string, we run into an infinite
loop of converting string into urls. To fix it, we mark the matched
string as atomic after first conversion.
We raise MissingAuthenticationError now, which adds
`www_authenticate=session` header to the error response. This
stops modern web-browsers from displaying a login form everytime
a 401 response it sent to the client.
Having both of these is confusing; TORNADO_SERVER is used only when
there is one TORNADO_PORT. Its primary use is actually to be _unset_,
and signal that in-process handling is to be done.
Rename to USING_TORNADO, to parallel the existing USING_RABBITMQ, and
switch the places that used it for its contents to using
TORNADO_PORTS.
This system can't update stats while the queue is idle, without using
threads for this, but at least we ensure to update the file after
consuming an event if more than MAX_SECONDS_BEFORE_UPDATE_STATS passed
since the last update, regardless of the number of iterations done so
far.
The race condition is described in the comment block removed by this
commit. This leaves room for another, remaining race condition
that should be virtually impossible, but nevertheless it seems
worthwhile to have it documented in the code, so we put a new comment
describing it.
As a final note, this is not a new race condition,
it was hypothetically possible with the old code as well.
This mimics the backend logic for adding the data-attribute -
to know what Pygments language was used to highlight the code
block - in locally echoed messages.
New test added checks our logic for canonicalizing pygments alias
(for both frontend and backend).
Other fixtures and tests amended.
In ae58ed5a7 we decided to echo back the text, when no Pygments lexer
matching that language was found. When we do so, we must take care to
HTML escape the lang before wrapping it in a data-code-language attribute.
Tweaked by tabbott to make clear the escaping is defensive.
In development and test, we keep the Tornado port at 9993 and 9983,
respectively; this allows tests to run while a dev instance is
running.
In production, moving to port 9800 consistently removes an odd edge
case, when just one worker is on an entirely different port than if
two workers are used.
tornado.web.Application does not share any inheritance with Django at
all; it has a similar router interface, but tornado.web.Application is
not an instance of Django anything.
Refold the long lines that follow it.
While urllib3 retries all connection errors, it only retries a subset
of read errors, since not all requests are safe to retry if they are
not idempotent, and the far side may have already processed them once.
By default, the only methods that are urllib3 retries read errors on
are GET, TRACE, DELETE, OPTIONS, HEAD, and PUT. However, all of the
requests into Tornado from Django are POST requests, which limits the
effectiveness of bb754e0902.
POST requests to `/api/v1/events/internal` are safe to retry; at worst,
they will result in another event queue, which is low cost and will be
GC'd in short order.
POST requests to `/notify_tornado` are _not_ safe to retry, but this
codepath is only used if USING_RABBITMQ is False, which only occurs
during testing.
Enable retries for read errors during all POSTs to Tornado, to better
handle Tornado restarts without 500's.
Without an explicit port number, the `stdout_logfile` values for each
port are identical. Supervisor apparently decides that it will
de-conflict this by appending an arbitrary number to the end:
```
/var/log/zulip/tornado.log
/var/log/zulip/tornado.log.1
/var/log/zulip/tornado.log.10
/var/log/zulip/tornado.log.2
/var/log/zulip/tornado.log.3
/var/log/zulip/tornado.log.7
/var/log/zulip/tornado.log.8
/var/log/zulip/tornado.log.9
```
This is quite confusing, since most other files in `/var/log/zulip/`
use `.1` to mean logrotate was used. Also note that these are not all
sequential -- 4, 5, and 6 are mysteriously missing, though they were
used in previous restarts. This can make it extremely hard to debug
logs from a particular Tornado shard.
Give the logfiles a consistent name, and set them up to logrotate.
Calling `render()` in a middleware before LocaleMiddleware has run
will pick up the most-recently-set locale. This may be from the
_previous_ request, since the current language is thread-local. This
results in the "Organization does not exist" page occasionally being
in not-English, depending on the preferences of the request which that
thread just finished serving.
Move HostDomainMiddleware below LocaleMiddleware; none of the earlier
middlewares call `render()`, so are safe. This will also allow the
"Organization does not exist" page to be localized based on the user's
browser preferences.
Unfortunately, it also means that the default LocaleMiddleware catches
the 404 from the HostDomainMiddlware and helpfully tries to check if
the failure is because the URL lacks a language component (e.g.
`/en/`) by turning it into a 304 to that new URL. We must subclass
the default LocaleMiddleware to remove this unwanted functionality.
Doing so exposes a two places in tests that relied (directly or
indirectly) upon the redirection: '/confirmation_key'
was redirected to '/en/confirmation_key', since the non-i18n version
did not exist; and requests to `/stats/realm/not_existing_realm/`
incorrectly were expecting a 302, not a 404.
This regression likely came in during f00ff1ef62, since prior to
that, the HostDomainMiddleware ran _after_ the rest of the request had
completed.
This commit moves docs for users/{user_id}/subscriptions/{stream_id}
enndpoint to be after users/me/subscriptions/muted_topics docs.
We are rearranging the docs because after adding the new patch
endpoint for users/{user_id}/subscriptions/{stream_id}, openapi_core
validator tries to match 'users/me/subscriptions/muted_topics'
with 'users/{user_id}/subscriptions/{stream_id}' path in zulip.yaml
and thus gives error while running tests.
This is a bug in 'openapi_core' as it does not follows OpenAPI specs
to match concrete paths before their templated counterparts. Thus,
this commit rearranges the docs such that openapi_core validator
tries to match muted_topics endpoint with the correct path in
zulip.yaml docs.
When converting fenced code markdown, we add the language (if specified)
in a data-attribute by tweaking the HTML generated. Doing so, allows the
frontend to make use of this attr to display view-in-playground option
for codeblocks.
We use pygments to get the lexer subclass name and use that instead of
directly using the language in the data-attribute. Doing so, helps us
map different language aliases (like `js` and `javascript`) into a common
variable (like `JavaScript`) - and avoids the client from dealing with
multiple tags corresponding to the same language.
The html structure for a message like this:
``` js
..content..
```
would now be:
<div class="codehilite" data-codehilite-language="JavaScript">
<pre>..content..</pre>
</div>
Tests and fixtures amended.
This was a broken abstraction that returned to its caller within
multiple forked processes on exceptions, and encouraged ignoring the
error code (as all of its callers did).
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
Fixes#16284.
Most of the work for this was done when we implemented correct
behavior for guest users, since they treat public streams like private
streams anyway.
The general method involves moving the messages to the new stream with
special care of UserMessage.
We delete UserMessages for subs who are losing access to the message.
For private streams with protected history, we also create UserMessage
elements for users who are not present in the old stream, since that's
important for those users to access the moved messages.
Previously, S3UploadBackend.delete_export_tarball failed to strip the
leading ‘/’ from the export path. This mistake is now caught by Moto
1.3.15. I expect it caused deletion failures in the real S3, although
I haven’t verified this.
We store export_path in the audit log with a leading ‘/’, but the
actual S3 keys do not have a leading ‘/’. Changing either system
would require a migration. So the new convention is that the
variables named ‘export_path’ have a leading ‘/’, while variables
named ‘path_id’ or ‘key’ do not.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Previously, the GitLab webhook code, namely the `get_objects_assignee`
method first tried to get a single assignee and if that failed then it
looks for multiple assignees and then it would return the first
assignee that it found (there's actually a code smell here - a loop
which would always return on the first iteration).
Instead, this commit will change that behavior to first check for
multiple assignees first then for a single assignee if we can't find
multiple assignees. Ultimately it will return a list of all of the
assignees (however many that might be [0, n]). This method has then
aptly been renamed to `get_assignees`.
Finally, we tweked the code using this method to always use it's
output as an "assignees" parameter to templates (there's also an
assignee parameter which we want to avoid here for consistency).
Signed-off-by: Hemanth V. Alluri <hdrive1999@gmail.com>
For some reasons, some of the fixtures had the +x bit set, while
some didn't. What this commit does is make sure that no fixture
is marked as "executable" (for anyone).
Signed-off-by: Hemanth V. Alluri <hdrive1999@gmail.com>
The previous code only worked by accident and hyperlink 20.0.0 breaks
it.
>>> hyperlink.parse("example.com").replace(scheme="https")
DecodedURL(url=URL.from_text('https:example.com'))
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Django treats path("<name>") like re_path(r"(?P<name>[^/]+)") and
path("<path:name>") like re_path(r"(?P<name>.+)").
This is more readable and consistent than the mix of slightly
different regexes we had before, and fixes various bugs:
• The r'apps/(.*)$' regex was missing a start anchor ^, so it
incorrectly matched all URLs that included apps/ as a substring
anywhere.
• The r'accounts/login/(google)/$' regex was missing a start anchor ^,
so it incorrectly matched all URLs that ended with
accounts/login/google/.
• The type annotation of zerver.views.realm_export.delete_realm_export
takes export_id as an int, but it was previously passed as a string.
• The type annotation of zerver.views.users.avatar takes medium as a
bool, but it was previously passed as a string.
• The [0-9A-Za-z]+ pattern for uidb64 was missing the - and _
characters that can validly be part of a base64url encoded
string (although I think the id is actually a decimal integer here,
in which case only 012345ADEIMNOQTUYcgjkwxyz are present in its
base64url encoding).
Signed-off-by: Anders Kaseorg <anders@zulip.com>
$ref siblings are ignored according to the OpenAPI specification, and
the referenced definitions already have examples.
Signed-off-by: Anders Kaseorg <anders@zulip.com>