If we have multiple users, this reduces the amount
of queries we need to do, because we get all
subscriptions for all users in a single query
to Subscription.
For the single-user case, we are introducing an
extra query hop, but the database is doing
roughly the same work, because we are just breaking
up this complex query into two hops:
messages =
select ... from message
where recipient__type_id in (
select stream_id from subscription
where ...
)
Now it's more like:
stream_ids =
select stream_id from subscription
where ...
messages =
select ... from message
where recipient__type_id in stream_ids
Note that we are not changing anything semantically
or algorithmically yet. The only overhead here
for the single-user case is boxing and unboxing
data into single-item dicts and lists.
The interfaces for callers in the view and the
queue processor remain the same for now.
We didn't need the enough-traffic mock.
We also continue to prep for testing multiple users.
I also finally remove a comment that is about to
be addressed (and which inaccurately refers to huddles).
In 709493cd75 (Feb 2017)
I added code to render_markdown that re-fetched the
sender of the message, to detect whether the message is
a bot.
It's better to just let the ORM fetch this. The
message object should already have sender.
The diff makes it look like we are saving round trips
to the database, which is true in some cases. For
the main message-send codepath, though, we are only
saving a trip to memcached, since the middleware
will have put our sender's user object into the
cache. The test_message_send test calls internally
to check_send_stream_message, so it was actually
hitting the database in render_markdown (prior to
my change).
Before this change we were clearing the cache on
every SQL usage.
The code to do this was added in February 2017
in 6db4879f9c.
Now we clear the cache just one time, but before
the action/request under test.
Tests that want to count queries with a warm
cache now specify keep_cache_warm=True. Those
tests were particularly flawed before this change.
In general, the old code both over-counted and
under-counted queries.
It under-counted SQL usage for requests that were
able to pull some data out of a warm cache before
they did any SQL. Typically this would have bypassed
the initial query to get UserProfile, so you
will see several off-by-one fixes.
The old code over-counted SQL usage to the extent
that it's a rather extreme assumption that during
an action itself, the entries that you put into
the cache will get thrown away. And that's essentially
what the prior code simulated.
Now, it's still bad if an action keeps hitting the
cache for no reason, but it's not as bad as hitting
the database. There doesn't appear to be any evidence
of us doing something silly like fetching the same
data from the cache in a loop, but there are
opportunities to prevent second or third round
trips to the cache for the same object, if we
can re-structure the code so that the same caller
doesn't have two callees get the same data.
Note that for invites, we have some cache hits
that are due to the nature of how we serialize
data to our queue processor--we generally just
serialize ids, and then re-fetch objects when
we pop them off the queue.
Steve asked me to remove this, since the tictactoe game was always
intended as a proof of concept. Now that we have poll and todo
widgets, the sample code for tictactoe has much less value.
We replace the content and type in test_widgets.py to maintain
coverage.
Initally, when writing two or more quotes, having
a blank line in between them, merges those quotes.
This created confusion especially in "quote and reply".
This commit fixes such issues. Now two or more quotes
having a blank line in between them, will not get merged.
This change is correct both for usability and for improving our
compatibility with CommonMark.
Fixes#14379.
This commit removes mock.patch with assertLogs().
* Adds return value to do_rest_call() in outgoing_webhook.py, to
support asserting log output in test_outgoing_webhook_system.py.
* Logs are not asserted in test_realm.py because it would require to users
to be queried using users=User.objects.filter(realm=realm) and the order
of resulting queryset varies for each run.
* In test_decorators.py, replacement of mock.patch is not done because
I'm not sure if it's worth the effort to replace it as it's a return
value of a function.
Tweaked by tabbott to set proper mypy types.
Then because the ID is now part of the draft dict, we can
(and do) change the structure of the "drafts" parameter
returned from `GET /drafts` from an object (mapping ID to
data) to an array.
Signed-off-by: Hemanth V. Alluri <hdrive1999@gmail.com>
Sometimes we don't need to specify the expected_drafts field.
So by removing it, we can reduce the clutter a bit.
Signed-off-by: Hemanth V. Alluri <hdrive1999@gmail.com>
Now the timestamp returned in a draft dict will always be an int.
The endpoints will still accept either an int or a float.
Signed-off-by: Hemanth V. Alluri <hdrive1999@gmail.com>
Our test-backend validation confirms that we don't log anything to
stdout in the tests, so the fact that CI passes with this removes
shows there was nothing being logged.
While working on shifting toward native browser time zone APIs
(#16451), it was found that all but very recent Chrome and Node
versions reject certain legacy timezone aliases like US/Pacific
(https://crbug.com/364374).
For now, we only canonicalize the timezone property returned in user
objects and not the timezone setting itself.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
During the new user creation code path, there can be no existing
active clients for the user being created, so we can skip the code to
send events to that user's clients.
The tests here reflect that we need to send fewer events, and do fewer
queries that would have been spent computing data for these..
Fixes#16503, combined with the long series of recent changes by Steve
Howell to fix super-linear behavior in this code path.
We no bulk up peer_add/peer_remove events by user if the
same user has subscribed to multiple streams (and just
that single user).
This mostly optimizes the new-user codepath, but the
algorithm is a bit more general in nature.
This test was flaky due to some date-related
non-determinism. I make all the Message objects
current to make add_new_user_history reliably
try to bulk-update UserMessage rows to read.
Because of the very large `oneOf` clause of the formats of events
possible in Zulip's `GET /events` system, we had issues with
`test-backend` failures for missing documentation for a new event
format being like 1000 lines of output, which was very much unhelpful.
Fix this by limiting the output use only the oneOf variants that are
broadly similar to the actual payload received.
Fixes#16023.
The comment still pointed to 'vacate' event flow, but
we have removed the vacate event in a9356508ca.
This commit fixes the comment to depict the correct
purpose of below lines, i.e. to test the remove
event flow.
We were including 'realm_user' in event_types along with 'subscription',
but we don't send event of type 'realm_user' when subscribing to a new
stream. This was added in 1c332f5d6a.
This commit removes 'realm_user' from event_types.
The name used to be included in the id_token, but this seems to have
been changed by Apple and now it's sent in the `user` request param.
https://github.com/python-social-auth/social-core/pull/483 is the
upstream PR for this - but upstream is currently unmaintained, so we
have to monkey patch.
We also alter the tests to reflect this situation. Tests no longer put
the name in the id_token, but rather in the `user` request param in the
browser flow, just like it happens in reality.
An adaptation has to be made in the native flow - since the name won't
be included by Apple in the id_token anymore, the app, when POSTing
to the /complete/apple/ endpoint,
can (and should for better user experience)
add the `user` param formatted as json of
{"email": "hamlet@zulip.com", "name": {"firstName": "Full", "lastName": "Name"}}
dict. This is also reflected by the change in the
native flow tests.
We now can send an implied matrix of user/stream tuples
for peer_add and peer_remove events.
The client code basically does this:
for stream_id in event['stream_ids']:
for user_id in event['user_ids']:
update_sub(stream_id, user_id)
We used to send individual events, which gets real
expensive when you are creating new streams. For
the case of copy-to-stream case, we should see
events go from U to 1, where U is the number of users
added.
Note that we don't yet fully optimize the potential
of this schema. For adding a new user with lots
of default streams, we still send S peer_add events.
And if you subscribe a bunch of users to a bunch of
private streams, we only go from U * S to S; we can't
optimize it down to one event easily.
All the fields of a stream's recipient object can
be inferred from the Stream, so we just make a local
object. Django will create a Message object without
checking that the child Recipient object has been
saved. If that behavior changes in some upgrade,
we should see some pretty obvious symptom, including
query counts changing.
Tweaked by tabbott to add a longer explanatory comment, and delete a
useless old comment.
This saves us a query for edge cases like when
you try to unsubscribe from a public stream
that you have already unsubscribed from.
But this is mostly to prep for upcoming
optimizations.
Initially markdown titles were overridden by Youtube and Vimeo preview titles.
But now it will check if any markdown title is present to replace Youtube or
Vimeo preview titles, if preview of linked websites is enabled.
Fixes#16100
Upstream has slightly changed the whitespace around stashes. Take
this opportunity to clean up the extra blank lines we were outputting.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
That class is an artifact of when Stream
didn't have recipient_id. Now it's simpler
to deal with stream subscriptions.
We also save a query during page load (and
other places where we get subscriber
info).
We already trust ids that are put on our queue
for deferred work. For example, see the code for
"mark_stream_messages_as_read_for_everyone"
We now pass stream_recipient_id when we queue
up work for do_mark_stream_messages_as_read.
This generally saves about 3 queries per
user when we unsubscribe them from a stream.
We get two speedups:
* The query to get existing subscribers only
gets the two fields we need. We no longer
need all the overhead of user_profile
and recipient data being returned in the
query.
* We avoid Django making extra hops to the
database to get user info.
We replace get_peer_user_ids_for_stream_change
with two bulk functions to get peers and/or
subscribers.
Note that we have three codepaths that care about
peers:
subscribing existing users:
we need to tell peers about new subscribers
we need to tell subscribed user about old subscribers
unsubscribing existing users:
we only need to tell peers who unsubscribed
subscribing new user:
we only need to tell peers about the new user
(right now we generate send_event
calls to tell the new user about existing
subscribers, but this is a waste
of effort that we will fix soon)
The two bulk functions are this:
bulk_get_subscriber_peer_info
bulk_get_peers
They have some overlap in the implementation,
but there are some nuanced differences that are
described in the comments.
Looking up peers/subscribers in bulk leads to some
nice optimizations.
We will save some memchached traffic if you are
subscribing to multiple public streams.
We will save a query in the remove-subscriber
case if you are only dealing with private streams.
We used to send occupy/vacate events when
either the first person entered a stream
or the last person exited.
It appears that our two main apps have never
looked at these events. Instead, it's
generally the case that clients handle
events related to stream creation/deactivation
and subscribe/unsubscribe.
Note that we removed the apply_events code
related to these events. This doesn't affect
the webapp, because the webapp doesn't care
about the "streams" field in do_events_register.
There is a theoretical situation where a
third party client could be the victim of
a race where the "streams" data includes
a stream where the last subscriber has left.
I suspect in most of those situations it
will be harmless, or possibly even helpful
to the extent that they'll learn about
streams that are in a "quasi" state where
they're activated but not occupied.
We could try to patch apply_event to
detect when subscriptions get added
or removed. Or we could just make the
"streams" piece of do_events_register
not care about occupy/vacate semantics.
I favor the latter, since it might
actually be what users what, and it will
also simplify the code and improve
performance.
The query to get "occupied" streams has been expensive
in the past. I'm not sure how much any recent attempts
to optimize that query have mitigated the issue, but
since we clearly aren't sending this data, there is no
reason to compute it.
Using web_public_guest for anonymous users is confusing since
'guest' is actually a logged-in user compared to
web_public_guest which is not logged-in and has only
read access to messages. So, we rename it to
web_public_visitor.
This is a more thorough test of adding multiple
streams for multiple users, including streams
that users have already subscribed to.
The extra queries here are due to the fact
that we call `principal_to_user_profile` in
a loop in the view. So that's an example
of O(N) overhead. We may be able to bulk-fetch
these users eventually.
This is a pure extraction, except that I remove a
redundant check that `len(principals) > 0`. Whenever
that value is false, then `new_subscriptions` will
only have one possible entry, which is the current
user, and we skip that in the loop.
We no longer do O(N) queries to get existing streams.
This is a somewhat contrived use case--generally, we
are not trying to re-subscribe a user to several
streams. Still, we want to avoid this.
This commit also makes `test_bulk_subscribe_many`
do more work, and the change to the test helped
me discover this bug.
If a user asks to be subscribed to a stream
that they are already subscribed to, then
that stream won't be in new_stream_user_ids,
and we won't need to send an event for it.
This change makes that happen more automatically.
I think it's important that the callers understand
that bulk_add_subscriptions assumes all streams
are being created within a single realm, so I make
it an explicit parameter.
This may be overkill--I would also be happy if we
just included the assertions from this commit.
ssh always runs its command through a shell (after naïvely joining
multiple arguments with spaces), so it needs an extra level of shell
quoting. This should have no effect because we already validated user
with a regex, but it’s better for escaping to be locally correct in
case the context changes.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
This low-level interface allows consuming from a queue with timeouts.
This can be used to either consume in batches (with an upper timeout),
or one-at-a-time. This is notably more performant than calling
`.get()` repeatedly (what json_drain_queue does under the hood), which
is "*highly discouraged* as it is *very inefficient*"[1].
Before this change:
```
$ ./manage.py queue_rate --count 10000 --batch
Purging queue...
Enqueue rate: 11158 / sec
Dequeue rate: 3075 / sec
```
After:
```
$ ./manage.py queue_rate --count 10000 --batch
Purging queue...
Enqueue rate: 11511 / sec
Dequeue rate: 19938 / sec
```
[1] https://www.rabbitmq.com/consumers.html#fetching
`loopworker_sleep_mock` is a file-level variable used to mock out the
sleep() call in LoopQueueProcessingWorker; don't reuse the variable
name for something else.
Despite its name, the `queue_size` method does not return the number
of items in the queue; it returns the number of items that the local
consumer has delivered but unprocessed. These are often, but not
always, the same.
RabbitMQ's queues maintain the queue of unacknowledged messages; when
a consumer connects, it sends to the consumer some number of messages
to handle, known as the "prefetch." This is a performance
optimization, to ensure the consumer code does not need to wait for a
network round-trip before having new data to consume.
The default prefetch is 0, which means that RabbitMQ immediately dumps
all outstanding messages to the consumer, which slowly processes and
acknowledges them. If a second consumer were to connect to the same
queue, they would receive no messages to process, as the first
consumer has already been allocated them. If the first consumer
disconnects or crashes, all prior events sent to it are then made
available for other consumers on the queue.
The consumer does not know the total size of the queue -- merely how
many messages it has been handed.
No change is made to the prefetch here; however, future changes may
wish to limit the prefetch, either for memory-saving, or to allow
multiple consumers to work the same queue.
Rename the method to make clear that it only contains information
about the local queue in the consumer, not the full RabbitMQ queue.
Also include the waiting message count, which is used by the
`consume()` iterator for similar purpose to the pending events list.
For streams in which only full members are allowed to post,
we block guest users from posting there.
Guests users were blocked from posting to admin only streams
already. So now, guest users can only post to
STREAM_POST_POLICY_EVERYONE streams.
This is not a new feature but a bugfix which should have
happened when implementing full member stream policy / guest users.
SIGALRM is the simplest way to set a specific maximum duration that
queue workers can take to handle a specific message. This only works
in non-threaded environments, however, as signal handlers are
per-process, not per-thread.
The MAX_CONSUME_SECONDS is set quite high, at 10s -- the longest
average worker consume time is embed_links, which hovers near 1s.
Since just knowing the recent mean does not give much information[1],
it is difficult to know how much variance is expected. As such, we
set the threshold to be such that only events which are significant
outliers will be timed out. This can be tuned downwards as more
statistics are gathered on the runtime of the workers.
The exception to this is DeferredWorker, which deals with quite-long
requests, and thus has no enforceable SLO.
[1] https://www.autodesk.com/research/publications/same-stats-different-graphs
Currently, drain_queue and json_drain_queue ack every message as it is
pulled off of the queue, until the queue is empty. This means that if
the consumer crashes between pulling a batch of messages off the
queue, and actually processing them, those messages will be
permanently lost. Sending an ACK on every message also results in a
significant amount lot of traffic to rabbitmq, with notable
performance implications.
Send a singular ACK after the processing has completed, by making
`drain_queue` into a contextmanager. Additionally, use the `multiple`
flag to ACK all of the messages at once -- or explicitly NACK the
messages if processing failed. Sending a NACK will re-queue them at
the front of the queue.
Performance of a no-op dequeue before this change:
```
$ ./manage.py queue_rate --count 50000 --batch
Purging queue...
Enqueue rate: 10847 / sec
Dequeue rate: 2479 / sec
```
Performance of a no-op dequeue after this change (a 25% increase):
```
$ ./manage.py queue_rate --count 50000 --batch
Purging queue...
Enqueue rate: 10752 / sec
Dequeue rate: 3079 / sec
```
We add a new wildcard_mention_policy setting to handle wildcard
mentions in large streams, with a wide range of policies available to
organizations.
We set the default to the safe option for preventing accidental spam:
only stream administrators being able to use wildcard mentions in
large streams.
We call build_message_send_dict from check_message instead of
do_send_messages.
This is a prep commit for adding a new setting for handling
wildcard mentions in large streams.
A later commit alters `authenticate` of EmailAuthBackend to
add a store `needs_to_change_password` variable to session
which is useful to insist users on changing their weak password.
The tests start failing with that change because client.login()
runs `authenticate` without a `request` object. So, this commit
sends a request object with `request.session=self.client.session`
to self.client.login() in tests wherever needed.
We previously used to to redirect to config error page with
a different URL. This commit renders config error in the same
URL where configuration error is encountered. This way when
conifguration error is fixed the user can refresh to continue
normally or go back to login page from the link provided to
choose any other backend auth.
Also moved those URLs to dev_urls.py so that they can be easily
accessed to work on styling etc.
In tests, removed some of the asserts checking status code to be 200
as the function `assert_in_success_response` does that check.
We now no longer define any schemas in test_events--all
of them are in event_schema, which helps our tooling
cross-check schemas for openapi and node tests.
It happens that whether you add a reaction or remove
a reaction, we send the exact same fields, just using
a different op code.
This sort of symmetry is actually kind of rare, as
usually "add" events have more fields, and "remove" events
might just send an id of something to remove.
Our openapi schema treats these as two seperate events,
so we are more consistent with it, and it helps our
schema-checking tooling for node fixtures, too.
Note that we now have to exempt the two events from
our openapi checks, due to the is_mirror_dummy field
in the deprecated user block. We can decide how to
handle this later--one possibility is to just add it
as an optional field on the event_schema side.
Note that we use value_type for value instead of
bool, since properties can be non-bool things
like color, which we just don't test now. We
should test them.
We more than compensate for this by checking
the actual value of the value in
check_subscription_update.
There is a legacy format where we send
singular "message_id" instead of plural
"message_ids".
Then there are different fields for "private"
and "stream" message types.
Note that we make the schema for profile_data
slightly more realistic, but it doesn't actually get
exercised by our current tests (apart from
making sure it's a dict), since we don't have
profile data for our test realm.
We also don't have the optional fields for bots,
since our tests don't exercise that, nor
delivery_email.
So we exempt realm_user_add_event from openapi
checks for now.
When we try to match the openapi specs better, we
will probably want to add a few tests to test_events.
Obviously getting good coverage for adding users
would be nice for all these scenarios:
* delivery_email matters
* bots
* realm has profile fields
This is a prep commit for supporting "presence"
events, where the key of the dictionary is some
arbitrary string like "website" but the value
of the dictionary is another dictionary itself
with keys that are more like variable names.
This also forces us to create TupleType.
We exempt this from the openapi check,
since we haven't figured out how to model
tuples in openapi with the same precision
as event_schema (and it may be impossible).
Long term we just want to stop dealing in
tuples, of course.
StringDict is a data type for representing dictionaries where
all keys and values are strings. Add this data type to data_types.py
and edit other files so that this data type is put to use and tested.
(slightly tweaked by @showell to remove a comment and shorten
a var name now that we have a proper data type)
We also make our schema in event_schema reflect this,
which in turn makes us match the already accurate
openapi spec, so we no longer need to exempt four
types of events from our sanity checks.
We might want to rename the tool to something more
general now, since we are really reconciling three
things:
- node fixtures
- event_schema checkers for test_events
- openapi specs
The way we compare python and openapi schemas is
as follows:
- first convert openapi schemas to be build
from DictType, ListType, etc. with from_opeapi
- do a diff on the schemas
Most of the new code is just having the FooType
family of classes serialize themselves with schema().
This commit renames 'test_message_to_self' and
'test_api_message_to_self' tests to
'test_message_to_stream_by_name' and
'test_api_message_to_stream_by_name' to depict
the actual purpose of these tests.
This argument does not define if an endpoint "is a webhook"; it is set
for "/api/v1/messages", which is not really a webhook, but allows
access from webhooks.
If multiple filters match the same string, we run into an infinite
loop of converting string into urls. To fix it, we mark the matched
string as atomic after first conversion.
This mimics the backend logic for adding the data-attribute -
to know what Pygments language was used to highlight the code
block - in locally echoed messages.
New test added checks our logic for canonicalizing pygments alias
(for both frontend and backend).
Other fixtures and tests amended.
In development and test, we keep the Tornado port at 9993 and 9983,
respectively; this allows tests to run while a dev instance is
running.
In production, moving to port 9800 consistently removes an odd edge
case, when just one worker is on an entirely different port than if
two workers are used.
When converting fenced code markdown, we add the language (if specified)
in a data-attribute by tweaking the HTML generated. Doing so, allows the
frontend to make use of this attr to display view-in-playground option
for codeblocks.
We use pygments to get the lexer subclass name and use that instead of
directly using the language in the data-attribute. Doing so, helps us
map different language aliases (like `js` and `javascript`) into a common
variable (like `JavaScript`) - and avoids the client from dealing with
multiple tags corresponding to the same language.
The html structure for a message like this:
``` js
..content..
```
would now be:
<div class="codehilite" data-codehilite-language="JavaScript">
<pre>..content..</pre>
</div>
Tests and fixtures amended.
Fixes#16284.
Most of the work for this was done when we implemented correct
behavior for guest users, since they treat public streams like private
streams anyway.
The general method involves moving the messages to the new stream with
special care of UserMessage.
We delete UserMessages for subs who are losing access to the message.
For private streams with protected history, we also create UserMessage
elements for users who are not present in the old stream, since that's
important for those users to access the moved messages.
Previously, S3UploadBackend.delete_export_tarball failed to strip the
leading ‘/’ from the export path. This mistake is now caught by Moto
1.3.15. I expect it caused deletion failures in the real S3, although
I haven’t verified this.
We store export_path in the audit log with a leading ‘/’, but the
actual S3 keys do not have a leading ‘/’. Changing either system
would require a migration. So the new convention is that the
variables named ‘export_path’ have a leading ‘/’, while variables
named ‘path_id’ or ‘key’ do not.
Signed-off-by: Anders Kaseorg <anders@zulip.com>