This low-level interface allows consuming from a queue with timeouts.
This can be used to either consume in batches (with an upper timeout),
or one-at-a-time. This is notably more performant than calling
`.get()` repeatedly (what json_drain_queue does under the hood), which
is "*highly discouraged* as it is *very inefficient*"[1].
Before this change:
```
$ ./manage.py queue_rate --count 10000 --batch
Purging queue...
Enqueue rate: 11158 / sec
Dequeue rate: 3075 / sec
```
After:
```
$ ./manage.py queue_rate --count 10000 --batch
Purging queue...
Enqueue rate: 11511 / sec
Dequeue rate: 19938 / sec
```
[1] https://www.rabbitmq.com/consumers.html#fetching
Despite its name, the `queue_size` method does not return the number
of items in the queue; it returns the number of items that the local
consumer has delivered but unprocessed. These are often, but not
always, the same.
RabbitMQ's queues maintain the queue of unacknowledged messages; when
a consumer connects, it sends to the consumer some number of messages
to handle, known as the "prefetch." This is a performance
optimization, to ensure the consumer code does not need to wait for a
network round-trip before having new data to consume.
The default prefetch is 0, which means that RabbitMQ immediately dumps
all outstanding messages to the consumer, which slowly processes and
acknowledges them. If a second consumer were to connect to the same
queue, they would receive no messages to process, as the first
consumer has already been allocated them. If the first consumer
disconnects or crashes, all prior events sent to it are then made
available for other consumers on the queue.
The consumer does not know the total size of the queue -- merely how
many messages it has been handed.
No change is made to the prefetch here; however, future changes may
wish to limit the prefetch, either for memory-saving, or to allow
multiple consumers to work the same queue.
Rename the method to make clear that it only contains information
about the local queue in the consumer, not the full RabbitMQ queue.
Also include the waiting message count, which is used by the
`consume()` iterator for similar purpose to the pending events list.
We modify access_stream_for_delete_or_update function to return
Subscription object also along with stream. This change will be
helpful in avoiding an extra query to get subscription object in
code for updating subscription role.
For streams in which only full members are allowed to post,
we block guest users from posting there.
Guests users were blocked from posting to admin only streams
already. So now, guest users can only post to
STREAM_POST_POLICY_EVERYONE streams.
This is not a new feature but a bugfix which should have
happened when implementing full member stream policy / guest users.
Replaced ImageOps.fit by ImageOps.pad, in zerver/lib/upload.py, which
returns a sized and padded version of the image, expanded to fill the
requested aspect ratio and size.
Fixes part of #16370.
Currently, drain_queue and json_drain_queue ack every message as it is
pulled off of the queue, until the queue is empty. This means that if
the consumer crashes between pulling a batch of messages off the
queue, and actually processing them, those messages will be
permanently lost. Sending an ACK on every message also results in a
significant amount lot of traffic to rabbitmq, with notable
performance implications.
Send a singular ACK after the processing has completed, by making
`drain_queue` into a contextmanager. Additionally, use the `multiple`
flag to ACK all of the messages at once -- or explicitly NACK the
messages if processing failed. Sending a NACK will re-queue them at
the front of the queue.
Performance of a no-op dequeue before this change:
```
$ ./manage.py queue_rate --count 50000 --batch
Purging queue...
Enqueue rate: 10847 / sec
Dequeue rate: 2479 / sec
```
Performance of a no-op dequeue after this change (a 25% increase):
```
$ ./manage.py queue_rate --count 50000 --batch
Purging queue...
Enqueue rate: 10752 / sec
Dequeue rate: 3079 / sec
```
Part of #16094.
Moved the language selection preference logic from home.py to a new
function in i18n.py to avoid repetition in analytics views and home
views.
This prevents the memcached connection from being shared across
multiple processes, and hopefully addresses unexpected behavior from
cached functions like get_user_profile_by_id invoked inside the worker
processes.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
We call build_message_send_dict from check_message instead of
do_send_messages.
This is a prep commit for adding a new setting for handling
wildcard mentions in large streams.
We extract the loop for building message dict in
do_send_messages in a separate function named
build_message_send_dict.
This is a prep commit for moving the code for building
of message dict in check_message.
There is a bug where we send event for even
those messages which do not have embedded links
as we are using single set 'links_for_embed' to
check whether we have to send event for
embedded links or not.
This commit fixes the bug by adding 'links_for_embed'
in message dict itself and send the event only
if that message has embedded links.
This commit removes the unnecessary comment which was added in
9454683108, when we were using message.get() for keys which
were also passed as args in do_send_messages, but there are no
such keys in the current code.
This commit removes the unnecessary line of code to get
rendered_content from message dict sent by check_message
when it actually does not inlcude 'rendered_content' key.
This line was added in 9454683108, but now we do not send
rendered_content in the message dict as we render the message
in do_send_messages itself.
A later commit alters `authenticate` of EmailAuthBackend to
add a store `needs_to_change_password` variable to session
which is useful to insist users on changing their weak password.
The tests start failing with that change because client.login()
runs `authenticate` without a `request` object. So, this commit
sends a request object with `request.session=self.client.session`
to self.client.login() in tests wherever needed.
We now no longer define any schemas in test_events--all
of them are in event_schema, which helps our tooling
cross-check schemas for openapi and node tests.
It happens that whether you add a reaction or remove
a reaction, we send the exact same fields, just using
a different op code.
This sort of symmetry is actually kind of rare, as
usually "add" events have more fields, and "remove" events
might just send an id of something to remove.
Our openapi schema treats these as two seperate events,
so we are more consistent with it, and it helps our
schema-checking tooling for node fixtures, too.
Note that we now have to exempt the two events from
our openapi checks, due to the is_mirror_dummy field
in the deprecated user block. We can decide how to
handle this later--one possibility is to just add it
as an optional field on the event_schema side.
Note that we use value_type for value instead of
bool, since properties can be non-bool things
like color, which we just don't test now. We
should test them.
We more than compensate for this by checking
the actual value of the value in
check_subscription_update.
There is a legacy format where we send
singular "message_id" instead of plural
"message_ids".
Then there are different fields for "private"
and "stream" message types.
Note that we make the schema for profile_data
slightly more realistic, but it doesn't actually get
exercised by our current tests (apart from
making sure it's a dict), since we don't have
profile data for our test realm.
We also don't have the optional fields for bots,
since our tests don't exercise that, nor
delivery_email.
So we exempt realm_user_add_event from openapi
checks for now.
When we try to match the openapi specs better, we
will probably want to add a few tests to test_events.
Obviously getting good coverage for adding users
would be nice for all these scenarios:
* delivery_email matters
* bots
* realm has profile fields
This is a prep commit for supporting "presence"
events, where the key of the dictionary is some
arbitrary string like "website" but the value
of the dictionary is another dictionary itself
with keys that are more like variable names.
This also forces us to create TupleType.
We exempt this from the openapi check,
since we haven't figured out how to model
tuples in openapi with the same precision
as event_schema (and it may be impossible).
Long term we just want to stop dealing in
tuples, of course.
StringDict is a data type for representing dictionaries where
all keys and values are strings. Add this data type to data_types.py
and edit other files so that this data type is put to use and tested.
(slightly tweaked by @showell to remove a comment and shorten
a var name now that we have a proper data type)
We also make our schema in event_schema reflect this,
which in turn makes us match the already accurate
openapi spec, so we no longer need to exempt four
types of events from our sanity checks.
We might want to rename the tool to something more
general now, since we are really reconciling three
things:
- node fixtures
- event_schema checkers for test_events
- openapi specs
The way we compare python and openapi schemas is
as follows:
- first convert openapi schemas to be build
from DictType, ListType, etc. with from_opeapi
- do a diff on the schemas
Most of the new code is just having the FooType
family of classes serialize themselves with schema().
Defining types with an object hierarchy
of type classes will allow us to build
functionality that was impossible (or
really janky) with the validators.py
approach of composing functions.
Most of the changes to event_schema.py
were automated search/replaces.
This patch doesn't really yet take
advantage of the new FooType classes,
but we will use it soon to audit our
openapi specs.
user_profile will be None for web_public_guests here. Hence, for
settings (of which most be inaccessible by web public guest),
which require a user_profile, we either set an empty value for
them or set them to a default value. This will help render
the frontend or extend support to our clients without breaking
a lot of code.
Tweaked by tabbott to add many comments.
This argument does not define if an endpoint "is a webhook"; it is set
for "/api/v1/messages", which is not really a webhook, but allows
access from webhooks.
If multiple filters match the same string, we run into an infinite
loop of converting string into urls. To fix it, we mark the matched
string as atomic after first conversion.
In ae58ed5a7 we decided to echo back the text, when no Pygments lexer
matching that language was found. When we do so, we must take care to
HTML escape the lang before wrapping it in a data-code-language attribute.
Tweaked by tabbott to make clear the escaping is defensive.
Calling `render()` in a middleware before LocaleMiddleware has run
will pick up the most-recently-set locale. This may be from the
_previous_ request, since the current language is thread-local. This
results in the "Organization does not exist" page occasionally being
in not-English, depending on the preferences of the request which that
thread just finished serving.
Move HostDomainMiddleware below LocaleMiddleware; none of the earlier
middlewares call `render()`, so are safe. This will also allow the
"Organization does not exist" page to be localized based on the user's
browser preferences.
Unfortunately, it also means that the default LocaleMiddleware catches
the 404 from the HostDomainMiddlware and helpfully tries to check if
the failure is because the URL lacks a language component (e.g.
`/en/`) by turning it into a 304 to that new URL. We must subclass
the default LocaleMiddleware to remove this unwanted functionality.
Doing so exposes a two places in tests that relied (directly or
indirectly) upon the redirection: '/confirmation_key'
was redirected to '/en/confirmation_key', since the non-i18n version
did not exist; and requests to `/stats/realm/not_existing_realm/`
incorrectly were expecting a 302, not a 404.
This regression likely came in during f00ff1ef62, since prior to
that, the HostDomainMiddleware ran _after_ the rest of the request had
completed.
When converting fenced code markdown, we add the language (if specified)
in a data-attribute by tweaking the HTML generated. Doing so, allows the
frontend to make use of this attr to display view-in-playground option
for codeblocks.
We use pygments to get the lexer subclass name and use that instead of
directly using the language in the data-attribute. Doing so, helps us
map different language aliases (like `js` and `javascript`) into a common
variable (like `JavaScript`) - and avoids the client from dealing with
multiple tags corresponding to the same language.
The html structure for a message like this:
``` js
..content..
```
would now be:
<div class="codehilite" data-codehilite-language="JavaScript">
<pre>..content..</pre>
</div>
Tests and fixtures amended.
This was a broken abstraction that returned to its caller within
multiple forked processes on exceptions, and encouraged ignoring the
error code (as all of its callers did).
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
Fixes#16284.
Most of the work for this was done when we implemented correct
behavior for guest users, since they treat public streams like private
streams anyway.
The general method involves moving the messages to the new stream with
special care of UserMessage.
We delete UserMessages for subs who are losing access to the message.
For private streams with protected history, we also create UserMessage
elements for users who are not present in the old stream, since that's
important for those users to access the moved messages.
Previously, S3UploadBackend.delete_export_tarball failed to strip the
leading ‘/’ from the export path. This mistake is now caught by Moto
1.3.15. I expect it caused deletion failures in the real S3, although
I haven’t verified this.
We store export_path in the audit log with a leading ‘/’, but the
actual S3 keys do not have a leading ‘/’. Changing either system
would require a migration. So the new convention is that the
variables named ‘export_path’ have a leading ‘/’, while variables
named ‘path_id’ or ‘key’ do not.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
The previous code only worked by accident and hyperlink 20.0.0 breaks
it.
>>> hyperlink.parse("example.com").replace(scheme="https")
DecodedURL(url=URL.from_text('https:example.com'))
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Django treats path("<name>") like re_path(r"(?P<name>[^/]+)") and
path("<path:name>") like re_path(r"(?P<name>.+)").
This is more readable and consistent than the mix of slightly
different regexes we had before, and fixes various bugs:
• The r'apps/(.*)$' regex was missing a start anchor ^, so it
incorrectly matched all URLs that included apps/ as a substring
anywhere.
• The r'accounts/login/(google)/$' regex was missing a start anchor ^,
so it incorrectly matched all URLs that ended with
accounts/login/google/.
• The type annotation of zerver.views.realm_export.delete_realm_export
takes export_id as an int, but it was previously passed as a string.
• The type annotation of zerver.views.users.avatar takes medium as a
bool, but it was previously passed as a string.
• The [0-9A-Za-z]+ pattern for uidb64 was missing the - and _
characters that can validly be part of a base64url encoded
string (although I think the id is actually a decimal integer here,
in which case only 012345ADEIMNOQTUYcgjkwxyz are present in its
base64url encoding).
Signed-off-by: Anders Kaseorg <anders@zulip.com>
This clears it out of the data sent to Sentry, where it is duplicative
with the indexed metadata -- and potentially exposes PHI if Sentry's
"make this issue public" feature is used.
Any exception is an "unexpected event", which means talking about
having an "unexpected event logger" or "unexpected event exception" is
confusing. As the error message in `exceptions.py` already explains,
this is about an _unsupported_ event type.
This also switches the path that these exceptions are written to,
accordingly.
8e10ab282a moved UnexpectedWebhookEventType into
`zerver.lib.exceptions`, but left the import into
`zserver.lib.webhooks.common` so that webhooks could continue to
import the exception from there.
This clutters things and adds complexity; there is no compelling
reason that the exception's source of truth should not move alongside
all other exceptions.
The main race conditions, which actually happened in production was with
concurrent execution of deliver_email and clear_scheduled_emails.
clear_scheduled_emails could delete all email.users in the middle of
deliver_email execution, causing it to pass empty to_user_ids list to
send_email. We mitigate this by getting the list of user ids in a single
query and moving forward with that snapshot, not having to worry about
database data being mutated anymore.
clear_scheduled_emails had potential race conditions with concurrent
execution of itself due to not locking the appropriate rows upon
selecting them for the purpose of potentially deleting them. FOR UPDATE
locks need to be acquired to prevent simultaneous mutation.
Tested manually with some print+sleep debugging to make some races
happen.
fixes #zulip-2k (sentry)
There are three functional side effects:
• Correct an insignificant but mathematically offensive bias toward
repeated characters in generate_api_key introduced in commit
47b4283c4b4c70ecde4d3c8de871c90ee2506d87; its entropy is increased
from 190.52864 bits to 190.53428 bits.
• Use the base32 alphabet in confirmation.models.generate_key; its
entropy is reduced from 124.07820 bits to the documented 120 bits, but
now it uses 1 syscall instead of 24.
• Use the base32 alphabet in get_bigbluebutton_url; its entropy is
reduced from 51.69925 bits to 50 bits, but now it uses 1 syscall
instead of 10.
(The base32 alphabet is A-Z 2-7. We could probably replace all of
these with plain secrets.token_urlsafe, since I expect most callers
can handle the full urlsafe_b64 alphabet A-Z a-z 0-9 - _ without
problems.)
Signed-off-by: Anders Kaseorg <anders@zulip.com>
For web-public streams, clients can access full topic history
without being authenticated. They only need to additionally
send "streams:web-public" narrow with their request like all
the other web-public queries.
When user requests for a realm that doesn't exists, we raise
a InvalidSubdomainError.
This reduces our effort at repeatedly ensuring realm is valid
in request in web-public queries.
Our isort configuration was almost Black-compatible, but we were
missing ensure_newline_before_comments.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Some users setup zulip with trailing / at end, like 'https://meet.jit.si/
leading to extra / on clients while generating video chat link.
This commit removes trailing '/' if it exists to make it consistent. Manual
testing was done by generating jitsi url.
Fixes#16225
The query to finds and marks all unread UserMessages in the stream as read
can be quite expensive, so we'll move that work to the deferred_work
queue and split it into batches.
Fixes#15770.
Since ALL_HOTSPOTS is a global object, it is initialized
at the time the backend server is started. Hence, the
title and description is translated only once. Using
ugettext_lazy makes sure that the strings are translated
in each and every request according to the language
of the user.
Fixes#16224
The `typing: stop` event did not have any tests in test_events
hence its documentation wasn't added. So add tests and relevant
documentation for the typing stop event. Also edit the documentation
of `typing: start` to include the fact that servers should use
their own timeout incase `stop` event event isn't received.
Fixes#16122.
We raise two types of json_unauthorized when
MissingAuthenticationError is raised. Raising the one
with www_authenticate let's the client know that user needs
to be logged in to access the requested content.
Sending `www_authenticate='session'` header with the response
also stops modern web-browsers from showing a login form to the
user and let's the client handle it completely.
Structurally, this moves the handling of common authentication errors
to a single shared middleware exception handler.
This lets the backend tests pass if zilencer has been (manually)
removed from EXTRA_INSTALLED_APPS, by skipping the tests that require
it. test-backend complains that some URLs are untested in this case:
ERROR: Some URLs are untested! Here's the list of untested URLs:
api/v1/users/me/android_gcm_reg_id
api/v1/users/me/apns_device_token
team/
Signed-off-by: Anders Kaseorg <anders@zulip.com>
This commit adds automatic detection of extra output (other than
printed by testing library or tools) in stderr and stdout by code under
test test-backend when it is run with flag --ban-console-output.
It also prints the test that produced the extra console output.
Fixes: #1587.
We raise two types of json_unauthorized when
MissingAuthenticationError is raised. Raising the one
with www_authenticate let's the client know that user needs
to be logged in to access the requested content.
Sending `www_authenticate='session'` header with the response
also stops modern web-browsers from showing a login form to the
user and let's the client handle it completely.
Structurally, this moves the handling of common authentication errors
to a single shared middleware exception handler.
`zproject/settings.py` itself is mostly-empty now. Adjust the
references which should now point to `zproject/computed_settings.py`
or `zproject/default_settings.py`.
`update_message_flags` events used `operation` instead of `op`, the
latter being the standard field used in other events. So add `op`
field to `update_message_flags` and mark `operation` as deprecated,
so that it can be removed later.
Commit c4254497b2
curiously had get_body() round tripping its data
through json load and dump.
I have seen this done for pretty-printing reasons,
but it doesn't apply here.
And if you're doing it for validation reasons,
you only need to do half the work, as my commit
here demonstrates.
We arguably don't even need the fail-fast code
here, since our fixtures are linted to be proper
json, I believe, plus downstream code probably
gives reasonably easy-to-diagnose symptoms.
We introduce get_payload for the relatively
exceptional cases where webhooks return payloads
as dicts.
Having a simple "str" type for get_body will
allow us to extract test helpers that use
payloads from get_body() without the ugly
`Union[str, Dict[str, str]]` annotations.
I also tightened up annotations in a few places
where we now call get_payload (using Dict[str, str]
instead of Dict[str, Any]).
In the zendesk test I explicitly stringify
one of the parameters to satisfy mypy.
We tighten up the mypy types here. And then
once we know that expected_message and expected_topic
are never None, we don't have call the do_test_message
and do_test_topic helpers any more, so we eliminate
them, too.
Finally, we don't return a message, since no tests
use the message currently.
This forces us to be a bit more explicit about testing
the three key values in any stream message, and it
also de-clutters the code a bit. I eventually want
to phase out do_test_topic and friends, since they
have the pitfall that you can call them and have them
do nothing, because they don't actually require
values to be be passed in.
I also clean up the code a bit for the tests that
have two new messages arriving.
Having an optional stream_name parameter makes
it confusing to read the code if you know your
webhook is sending private messages.
And then the other two callers are already
checking topics, so they might as well check
stream names, too.
We also have the two stream-oriented callers
make their own call to "subscribe". And we
future-proof this by making sure the exception
for no-message-being-sent calls out that gotcha.
Somewhat in passing, we now assert that
self.STREAM_NAME is not None in the main
helper. This is partly to satisfy mypy, but
it's also a good sanity check.
This also sets the stage for the next commit,
where I'll add an assert_stream_message helper.
Not all webhook payloads are json, so send_json_payload was a
bit misleading.
In passing I also remove "bytes" from the Union type for
"payload" parameter.
Almost all webhook tests use this helper, except a few
webhooks that write to private streams.
Being concise is important here, and the name
`self.send_and_test_stream_message` always confused
me, since it sounds you're sending a stream message,
and it leaves out the webhook piece.
We should consider renaming `send_and_test_private_message`
to something like `check_webhook_private`, but I couldn't
decide on a great name, and it's very rarely used. So
for now I just made sure the docstrings of the two
sibling functions reference each other.
This function is a bad idea, as it leads to a possible situation
where you aren't actually testing anything:
def do_test_message(self, msg: Message, expected_message: Optional[str]) -> None:
if expected_message is not None:
self.assertEqual(msg.content, expected_message)
Unfortunately, it's called deep in the stack in some places, but
we can safely replace it with assertEqual here.
The test helper here was taking an "expected_topic"
parameter that it just ignored, and then the
dialogflow tests were passing in expected messages
in that slot, so the actual "expected_message" var
was "None" and was ignored. So the tests weren't
testing anything.
Now we eliminate the crufty expected_topic parameter
and require an actual value for "expected_message".
I also clean up the mypy type for content_type,
and I remove the `content_type is None` check,
since all callers either pass in a str content
type or default to "application/json".
Some `<img>` tags do not have an SRC, if they are rewritten using JS
to have one later. Attempting to access `first_image['src']` on these
will raise an exception, as they have no such attribute.
Only look for images which have a defined `src` attribute on them. We
could instead check if `first_image.has_attr('src')`, but this seems
only likely to produce fewer valid images.
03ca3afbc2 added more codes that are equivalent to 404's; this adds to
the list of cache-as-None codes a couple which are equivalent to
403's. It does not comprise _all_ possible 403-like codes -- many of
them are "the client is not OK," which is relevant to log as an error
still.
This commit adds "role" field to the Subscription objects passed to
clients. This is important preparation for being able to work on the
frontend for this feature.
While exporting analytics data we were using wrong table name
'zerver_analytics' in analytics config. Renamed it with
correct table name 'zerver_realm'.
Django 3.0 removed private Python 2 compatibility APIs
so used lru_cache() directly from functools.
We cast lru_cache to Any to avoid attr-defined error in mypy since we
are adding extra field, 'key_prefix', to this object later.
This comment stopped being true in 5686821150, and very much stopped
being relevant in dd40649e04 when the middleware entirely stopped
publishing to a queue.
This commit adds the is_web_public field in the AbstractAttachment
class. This is useful when validating user access to the attachment,
as otherwise we would have to make a query in the db to check if
that attachment was sent in a message in a web-public stream or not.
The new Stream administrator role is allowed to manage a stream they
administer, including:
* Setting properties like name, description, privacy and post-policy.
* Removing subscribers
* Deactivating the stream
The access_stream_for_delete_or_update is modified and is used only
to get objects from database and further checks for administrative
rights is done by check_stream_access_for_delete_or_update.
We have also added a new exception class StreamAdministratorRequired.
Via API, users can now access messages which are in web-public
streams without any authentication.
If the user is not authenticated, we assume it is a web-public
query and add `streams:web-public` narrow if not already present
to the narrow. web-public streams are also directly accessible.
Any malformed narrow which is not allowed in a web-public query
results in a 400 or 401. See test_message_fetch for the allowed
queries.
These weren’t wrong since orjson.JSONDecodeError subclasses
json.JSONDecodeError which subclasses ValueError, but the more
specific ones express the intention more clearly.
(ujson raised ValueError directly, as did json in Python 2.)
Signed-off-by: Anders Kaseorg <anders@zulip.com>
datetime objects are not ordinarily JSON serializable. While both
ujson and orjson have special cases to serialize datetime objects,
they do it in different ways. So we want to fix the post-processing
code to do its job.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
The exception trace only goes from where the exception was thrown up
to where the `logging.exception` call is; any context as to where
_that_ was called from is lost, unless `stack_info` is passed as well.
Having the stack is particularly useful for Sentry exceptions, which
gain the full stack trace.
Add `stack_info=True` on all `logging.exception` calls with a
non-trivial stack; we omit `wsgi.py`. Adjusts tests to match.
In f8bcf39014, we fixed buggy
marshalling of Streams and similar data structures where we were
including the Stream object rather than its ID in dictionaries passed
to ujson, and ujson happily wrote that large object dump into the
RealmAuditLog.extra_data field.
This commit includes a migration to fix those corrupted RealmAuditLog
entries, and because the migration loop is the same, also fixes the
format of similar RealmAuditLog entries to be in a more natural format
that doesn't weirdly nest and duplicate the "property" field.
Fixes#16066.
This requires a few redundant runtime isinstance
checks, but the extra assertions arguably make
the code more readable, and isinstance checks
are extremely negligible.
JSON keys must be strings, and orjson enforces this. Mypy didn’t
catch the mismatched type of profiles_by_user_id because it doesn’t
understand CustomProfileFieldValue.field_id.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
It doesn’t end well. Or sometimes it doesn’t end (OverflowError:
Maximum recursion level reached).
Introduced by commits ccdf52fef6 and
94d2de8b4a (#15601).
Signed-off-by: Anders Kaseorg <anders@zulip.com>
The variant `update_message` events have this extra sender field not
present in normal update_message events; this field has no purpose, so
we remove it.
Apparently, `update_message` events unexpectedly contained what were
intended to be internal data structures about which users were
mentioned in a given message.
The bug has been present and accumulating new data structures for
years.
Fixing this should improve the performance of handling update_message
events as well as cleaning up this API's interface.
This was discovered by our automated API documentation schema checking
tooling detecting these unexpected elements in these event
definitions; that same logic should prevent future bugs like this from
being introduced in the future.
To make it easier to check if there is user information to be used
in the error report emails, we create a user object inside report.
Now, to check if we have the user's full name, email, etc, we just
need to do report['user']['user_full_name'] rather than check
each information one by one, because if the value of one key in
the report is different than None, all the others will be as well.
Per the API documentation[1], the following codes all correspond to
HTTP 404:
- `34`: **Sorry, that page does not exist.** The specified resource
was not found.
- `144`: **No status found with that ID.** The requested Tweet ID is
not found (if it existed, it was probably deleted)
- `421`: **This Tweet is no longer available.** The Tweet cannot be
retrieved. This may be for a number of reasons.
- `422`: **This Tweet is no longer available because it violated the
Twitter Rules.** The Tweet is not available in the API.
Treat all of these identically.
[1] https://developer.twitter.com/en/docs/basics/response-codes
The S3 data export tool's upload code path uses this nice boto
callback feature for showing a progress bar, which is nice for the
management command. It's spammy/broken in production and the backend
tests, so we change percent_callback to be a parameter passed in so
that it can only be used in the contexts where it makes sense.
This reverts commit c3779338c6 (part
of #14638), which incorrectly depended on commits from the future,
with the effect of either halting the flow of entropic time in an
irresolvable temporal paradox, summoning extradimensional beings to
rain destruction on the galaxy, or failing CI.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
This commit modifies the /streams endpoint so that the web-public
streams are included in the default list of streams that users
have access to.
This is part of PR #14638 that aims to allow guest users to
browse and subscribe themselves to web public streams.
Modifies filter_stream_authorization so that web-public streams are
added in the list of authorized streams that a guest user can
subscribe.
This commit is part of PR #14638 that aims to allow guest users
to browse and subscribe to web-public streams.
In this commit, we grant guest users access to stream history,
send message and common stream data of web-public streams.
This is part of PR #14638 that aims to allow guest users to
browse and subscribe to web-public streams.
This modification allows guest users to have access to web-public
streams subscribers, even if they aren't subscribed or never
subscribed to that stream.
This commit is part of PR #14638 that aims to allow guest users to
browser and subscribe to web-public streams.
Now, gather_subscriptions include web-public streams in the 3 sets
of streams that it returns, subscribed, unsubscribed and never
subscribed.
This is part of PR #14638 that aims to allow guest users to browse and
subscribe to web-public streams.
This change makes the flow more coherent by instead of checking,
in the last condition, if the user isn't authorized to access that
stream, check if they are, as it is done in the other checks. Only
if all the conditions are false, which means that the user doesn't
have access to that stream, the stream is added to the
unauthorized_streams list.
Also add a Draft object-to-dictionary conversion method.
The following commits will provide an API around this
model using which our clients can sync drafts across each
other (if they so wish too). As of making this commit, we
haven't finalized exactly how our clients will use this.
See https://chat.zulip.org/#narrow/stream/2-general/topic/drafts
For some of the discussion around this model and in general,
around this feature.
Signed-off-by: Hemanth V. Alluri <hdrive1999@gmail.com>
Unlike the other Python datetime to Unix timestamp conversion
function (`datetime_to_timestamp`), `datetime_to_precise_timestamp`
won't drop the microseconds.
Signed-off-by: Hemanth V. Alluri <hdrive1999@gmail.com>
Edit the function `validate_against_openapi_schema` and add some
helper functions to allow for validation of documented events.
Also add OpenAPI response validation in `verify_action` as it is
called in a large number of `/events` tests.
The parameter Stream.date_created is now sent down to the clients
for both:
- client.get_streams()
- client.list_subscriptions()
API docs updated for stream and subscriptions.
Fixes#15410
We also have the caller pass in the property name for an
additional sanity check.
Note that we don't yet handle the possibility of extra_data;
that will be a subsequent commit.
Also, the stream_id fields aren't in Realm.property_types,
so we specify their types in the checker.
This a pretty big commit, but I really wanted it
to be atomic.
All realm_user/update events look the same from
the top:
_check_realm_user_update = check_events_dict(
required_keys=[
("type", equals("realm_user")),
("op", equals("update")),
("person", _check_realm_user_person),
]
)
And then we have a bunch of fields for person that
are optional, and we usually only send user_id plus
one other field, with the exception of avatar-related
events:
_check_realm_user_person = check_dict_only(
required_keys=[
# vertical formatting
("user_id", check_int),
],
optional_keys=[
("avatar_source", check_string),
("avatar_url", check_none_or(check_string)),
("avatar_url_medium", check_none_or(check_string)),
("avatar_version", check_int),
("bot_owner_id", check_int),
("custom_profile_field", _check_custom_profile_field),
("delivery_email", check_string),
("full_name", check_string),
("role", check_int_in(UserProfile.ROLE_TYPES)),
("email", check_string),
("user_id", check_int),
("timezone", check_string),
],
)
I would start the code review by just skimming the changes
to event_schema.py, to get the big picture of the complexity
here. Basically the schema is just the combined superset of
all the individual schemas that we remove from test_events.
Then I would read test_events.py.
The simplest diffs are basically of this form:
- schema_checker = check_events_dict([
- ('type', equals('realm_user')),
- ('op', equals('update')),
- ('person', check_dict_only([
- ('role', check_int_in(UserProfile.ROLE_TYPES)),
- ('user_id', check_int),
- ])),
- ])
# ...
- schema_checker('events[0]', events[0])
+ check_realm_user_update('events[0]', events[0], {'role'})
Instead of a custom schema checker, we use the "superset"
schema checker, but then we pass in the set of fields that we
expect to be there. Note that 'user_id' is always there.
So most of the heavy lifting happens in this new function
in event_schema.py:
def check_realm_user_update(
var_name: str, event: Dict[str, Any], optional_fields: Set[str],
) -> None:
_check_realm_user_update(var_name, event)
keys = set(event["person"].keys()) - {"user_id"}
assert optional_fields == keys
But we still do some more custom checks in test_events.py.
custom profile fields: check keys of custom_profile_field
def test_custom_profile_field_data_events(self) -> None:
+ self.assertEqual(
+ events[0]['person']['custom_profile_field'].keys(),
+ {"id", "value", "rendered_value"}
+ )
+ check_realm_user_update('events[0]', events[0], {"custom_profile_field"})
+ self.assertEqual(
+ events[0]['person']['custom_profile_field'].keys(),
+ {"id", "value"}
+ )
avatar fields: check more specific types, since the superset
schema has check_none_or(check_string)
def test_change_avatar_fields(self) -> None:
+ check_realm_user_update('events[0]', events[0], avatar_fields)
+ assert isinstance(events[0]['person']['avatar_url'], str)
+ assert isinstance(events[0]['person']['avatar_url_medium'], str)
+ check_realm_user_update('events[0]', events[0], avatar_fields)
+ self.assertEqual(events[0]['person']['avatar_url'], None)
+ self.assertEqual(events[0]['person']['avatar_url_medium'], None)
Also note that avatar_fields is a set of four fields that
are set in event_schema.
full name: no extra work!
def test_change_full_name(self) -> None:
- schema_checker('events[0]', events[0])
+ check_realm_user_update('events[0]', events[0], {'full_name'})
test_change_user_delivery_email_email_address_visibilty_admins:
no extra work for delivery_email
check avatar fields more directly
roles (several examples) -- actually check the specific role
def test_change_realm_authentication_methods(self) -> None:
- schema_checker('events[0]', events[0])
+ check_realm_user_update('events[0]', events[0], {'role'})
+ self.assertEqual(events[0]['person']['role'], role)
bot_owner_id: no extra work!
- change_bot_owner_checker_user('events[1]', events[1])
+ check_realm_user_update('events[1]', events[1], {"bot_owner_id"})
- change_bot_owner_checker_user('events[1]', events[1])
+ check_realm_user_update('events[1]', events[1], {"bot_owner_id"})
- change_bot_owner_checker_user('events[1]', events[1])
+ check_realm_user_update('events[1]', events[1], {"bot_owner_id"})
timezone: no extra work!
- timezone_schema_checker('events[1]', events[1])
+ check_realm_user_update('events[1]', events[1], {"email", "timezone"})
Obviously, this file will soon grow--this
was the easiest way to start without introducing
noise into other commits.
It will soon be structurally similar
to frontend_tests/node_tests/lib/events.js--I
have some ideas there. But this should also
help for things like API docs.
This change makes our handling of youtube-url previews consistent
with how we handle our inline images. This allows the previews to
render next to the paragraph that links to the youtube video.
Follow-up to PR #15773.
This commit rewrites the way addresses are collected. If
the header with the address is not an AddressHeader (for instance,
Delivered-To and Envelope-To), we take its string representation.
Fixes: #15864 ("Error in email_mirror - _UnstructuredHeader has no attribute addresses").
This commit re-adds the integration for canarytokens.org, now separate
from the primary Thinkst integration.
Signed-off-by: David Wood <david@davidtw.co>
This commit fixes the Thinkst Canary integration which - based on the
schema in upstream documentation - incorrectly assumed that some fields
would always be sent, which meant that the integration would fail. In
addition, this commit adjusts support for canarytokens to only support
the canarytoken schema with Thinkst Canaries (not Thinkst's
canarytokens.org).
Signed-off-by: David Wood <david@davidtw.co>
A few major themes here:
- We remove short_name from UserProfile
and add the appropriate migration.
- We remove short_name from various
cache-related lists of fields.
- We allow import tools to continue to
write short_name to their export files,
and then we simply ignore the field
at import time.
- We change functions like do_create_user,
create_user_profile, etc.
- We keep short_name in the /json/bots
API. (It actually gets turned into
an email.)
- We don't modify our LDAP code much
here.
We generally want to avoid having two sibling test
suites depend on each other, unless there's a real
compelling reason to share code. (And if there is
code to share, we can usually promote it to either
test_helpers or ZulipTestCase, as I did here.)
This commit is also prep for the next commit, where
I try to simplify all of the helpers in EmojiReactionBase.
Especially now that we have f-strings, it is usually
better to just call api_post explicitly than to
obscure the mechanism with thin wrappers around
api_post. Our url schemes are pretty stable, so it's
unlikely that the helpers are actually gonna prevent
future busywork.
This issue isn't something a system administrator needs to take action
on -- it's a likely minor logic bug around organization
administrators moving topics between streams.
As a result, it shouldn't send error emails to administrators.
This is a hacky fix to avoid spoiler content leaking in emails. The
general idea here is to tell people to open Zulip to view the actual
message in full.
We create a mini-markdown parser here that strips away the fence content
that has the 'spoiler' tag for the text emails.
Our handling of html emails is much better in comparison where we can
use lxml to parse the spoiler blocks.
We include tests for the new implementation to avoid churning the
codebase too much so this can be easily reverted when we are able to
re-enable the feature.
We now do something sensible for spoilers in notifications. A message
like:
```spoiler Luke's father is
Vader. Don't tell anyone else.
```
would be rendered as:
Luke's father is (...)
If the push_notification for the UserMessage is already active,
we don't send any push notification to the user. This may
happen due to race conditions.
Added and fixed test cases affected by this.
This is similar to our behavior with image previews, and helps
reduce clutter in the final rendered html.
We add the string 'Tweet: ' to our existing tests so those tests
remain the same.
This commit makes our handling of twitter previews consistent with
how we handle our inline images so that tweets render next to the
paragraph that links to the tweet.
We decouple the logic of insertion rules for inline links from
image preview logic. Now, we can use this same logic for other
kinds of link previews as well.
We also remove the post_process_data option.
The sanity check is just overkill at this point,
since the mechanism to find streams is very
direct due to a recent commit.
Before this change we would only export streams
that had actual subscribers, which is usually
harmless, but it was mostly a relic of a one
time migration that we did when we were cleaning
up some dirty data in some of our very early
databases (circa 2016).
Now we work down the table hierarchy in a
more natural way:
- get Streams in Realm
- get Recipients matching above Streams
- get Subscriptions matching above Recipients
Note that for per-user exports, I kept the
same logic (users -> subscriptions -> recipients ->
streams) we had before.
One subtle detail here is that we make our
final Config blocks--which build the final
version of Recipient/Subscription--now hang
off of realm_config.
Fixes#15146.
This particular commit has been a long time coming. For reference,
!avatar(email) was an undocumented syntax that simply rendered an
inline 50px avatar for a user in a message, essentially allowing
you to create a user pill like:
`!avatar(alice@example.com) Alice: hey!`
---
Reimplementation
If we decide to reimplement this or a similar feature in the future,
we could use something like `<avatar:userid>` syntax which is more
in line with creating links in markdown. Even then, it would not be
a good idea to add this instead of supporting inline images directly.
Since any usecases of such a syntax are in automation, we do not need
to make it userfriendly and something like the following is a better
implementation that doesn't need a custom syntax:
`![avatar for Alice](/avatar/1234?s=50) Alice: hey!`
---
History
We initially added this syntax back in 2012 and it was 'deprecated'
from the get go. Here's what the original commit had to say about
the new syntax:
> We'll use this internally for the commit bot. We might eventually
> disable it for external users.
We eventually did start using this for our github integrations in 2013
but since then, those integrations have been neglected in favor of
our GitHub webhooks which do not use this syntax.
When we copied `!gravatar` to add the `!avatar` syntax, we also noted
that we want to deprecate the `!gravatar` syntax entirely - in 2013!
Since then, we haven't advertised either of these syntaxes anywhere
in our docs, and the only two places where this syntax remains is
our game bots that could easily do without these, and the git commit
integration that we have deprecated anyway.
We do not have any evidence of someone asking about this syntax on
chat.zulip.org when developing an integration and rightfully so- only
the people who work on Zulip (and specifically, markdown) are likely
to stumble upon it and try it out.
This is also the only peice of code due to which we had to look up
emails -> userid mapping in our backend markdown. By removing this,
we entirely remove the backend markdown's dependency on user emails
to render messages.
---
Relevant commits:
- Oct 2012, Initial commit c31462c278
- Nov 2013, Update commit bot 968c393826
- Nov 2013, Add avatar syntax 761c0a0266
- Sep 2017, Avoid email use c3032a7fe8
- Apr 2019, Remove from webhook 674fcfcce1
Log RealmAuditLog in do_set_realm_property and do_remove_realm_domain.
Tests for the changes are written in test_events because it will save
duplicate code for test_change_realm_property.
Added new Event Type in AbstractRealmAuditLog STREAM_CREATED.
Since we finally create streams in create_stream_if_needed function
in zerver/lib/streams.py so logged realm_audit there.
Passed acting_user when create_stream_if_needed or ensure_stream
function is called.
Added tests in test_audit_log.
We had been using !time() syntax for timestamps so far. Since its
an unreleased feature, we can make changes without affecting many
people.
Fixes#15442.