This param allows clients to specify how much presence history they want
to fetch. Previously, the server always returned 14 days of history.
With the recent migration of the presence API to the much more efficient
system relying on incremental fetches via the last_update_id param added
in #29999, we can now afford to provide much more history to clients
that request it - as all that historical data will only be fetched once.
There are three endpoints involved:
- `/register` - this is the main useful endpoint for this, used by API
clients to fetch initial data and register an events queue. Clients can
pass the `presence_history_limit_days` param here.
- `/users/me/presence` - this endpoint is currently used by clients to
update their presence status and fetch incremental data, making the new
functionality not particularly useful here. However, we still add the
new `history_limit_days` param here, in case in the future clients
transition to using this also for the initial presence data fetch.
- `/` - used when opening the webapp. Naturally, params aren't passed
here, so the server just assumes a value from
`settings.PRESENCE_HISTORY_LIMIT_DAYS_FOR_WEB_APP` and returns
information about this default value in page_params.
This was a bug, where in the realm.presence_disabled (synonymous to
being a zephyr mirror realm) case we would return None. We have decided
on the convention of using only integers here, and -1 representing lack
of data.
The presence and user status update events are only sent to accessible
users, i.e. guests do not receive presence and user status updates for
users they cannot access.
This also removes the error in one of these functions that was using a
different constant instead of
PRESENCE_LEGACY_EVENT_OFFSET_FOR_ACTIVITY_SECONDS.
This implements the core of the rewrite described in:
For the backend data model for UserPresence to one that supports much
more efficient queries and is more correct around handling of multiple
clients. The main loss of functionality is that we no longer track
which Client sent presence data (so we will no longer be able to say
using UserPresence "the user was last online on their desktop 15
minutes ago, but was online with their phone 3 minutes ago"). If we
consider that information important for the occasional investigation
query, we have can construct that answer data via UserActivity
already. It's not worth making Presence much more expensive/complex
to support it.
For slim_presence clients, this sends the same data format we sent
before, albeit with less complexity involved in constructing it. Note
that we at present will always send both last_active_time and
last_connected_time; we may revisit that in the future.
This commit doesn't include the finalizing migration, which drops the
UserPresenceOld table.
The way to deploy is to start the backfill migration with the server
down and then start the server *without* the user_presence queue worker,
to let the migration finish without having new data interfering with it.
Once the migration is done, the queue worker can be started, leading to
the presence data catching up to the current state as the queue worker
goes over the queued up events and updating the UserPresence table.
Co-authored-by: Mateusz Mandera <mateusz.mandera@zulip.com>
Black 23 enforces some slightly more specific rules about empty line
counts and redundant parenthesis removal, but the result is still
compatible with Black 22.
(This does not actually upgrade our Python environment to Black 23
yet.)
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Rename functions that refer to "status" without a reference to
"presence" to help clarify in the backend between UserPresence
and UserStatus models.
Prep commit for migrating "unavailable" user status feature to
"invisible" user presence feature.
The pattern of using the same variable to apply filters
or alter the `QuerySet` in other ways might produce `QuerySet`s
with incompatible types. This behavior is not allowed by mypy.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
model__id syntax implies needing a JOIN on the model table to fetch the
id. That's usually redundant, because the first table in the query
simply has a 'model_id' column, so the id can be fetched directly.
Django is actually smart enough to not do those redundant joins, but we
should still avoid this misguided syntax.
The exceptions are ManytoMany fields and queries doing a backward
relationship lookup. If "streams" is a many-to-many relationship, then
streams_id is invalid - streams__id syntax is needed. If "y" is a
foreign fields from X to Y:
class X:
y = models.ForeignKey(Y)
then object x of class X has the field x.y_id, but y of class Y doesn't
have y.x_id. Thus Y queries need to be done like
Y.objects.filter(x__id__in=some_list)
Fixes#2665.
Regenerated by tabbott with `lint --fix` after a rebase and change in
parameters.
Note from tabbott: In a few cases, this converts technical debt in the
form of unsorted imports into different technical debt in the form of
our largest files having very long, ugly import sequences at the
start. I expect this change will increase pressure for us to split
those files, which isn't a bad thing.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Automatically generated by the following script, based on the output
of lint with flake8-comma:
import re
import sys
last_filename = None
last_row = None
lines = []
for msg in sys.stdin:
m = re.match(
r"\x1b\[35mflake8 \|\x1b\[0m \x1b\[1;31m(.+):(\d+):(\d+): (\w+)", msg
)
if m:
filename, row_str, col_str, err = m.groups()
row, col = int(row_str), int(col_str)
if filename == last_filename:
assert last_row != row
else:
if last_filename is not None:
with open(last_filename, "w") as f:
f.writelines(lines)
with open(filename) as f:
lines = f.readlines()
last_filename = filename
last_row = row
line = lines[row - 1]
if err in ["C812", "C815"]:
lines[row - 1] = line[: col - 1] + "," + line[col - 1 :]
elif err in ["C819"]:
assert line[col - 2] == ","
lines[row - 1] = line[: col - 2] + line[col - 1 :].lstrip(" ")
if last_filename is not None:
with open(last_filename, "w") as f:
f.writelines(lines)
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
I'm not sure exactly what series of history got us here, but we were
fetching the mobile_user_ids data for all users in the organization,
regardless of whether they were recently active (and thus relevant for
the main presence data set). And doing so in a sloppy fashion
(sending every user ID over the wire, rather than just having the
database join on Realm).
Fixing this saves a factor of 4-5 on the total runtime of a presence
request on organizations with 10Ks of users like chat.zulip.org; more
like 25% in an organization with 150. Since large organizations are
very heavily weighted in the overall cost of presence, this is a huge
win.
Fixes part of #13734.
This changes the payload that is used
to populate `page_params` for the webapp,
as well as responses to the once-every-50-seconds
presence pings.
Now our dictionary of users only has these
two fields in the value:
- activity_timestamp
- idle_timestamp
Example data:
{
6: Object { idle_timestamp: 1585746028 },
7: Object { active_timestamp: 1585745774 },
8: Object { active_timestamp: 1585745578,
idle_timestamp: 1585745400}
}
We only send the slimmer type of payload
to clients that have set `slim_presence`
to True.
Note that this commit does not change the format
of the event data, which still looks like this:
{
website: {
client: 'website',
pushable: false,
status: 'active',
timestamp: 1585745225
}
}
We now use realm_id for querying UserPresence
instead of building a big WHERE clause from the
list of user_ids.
This commit may be a bit hard to measure, since
we still get the list of user_ids for the PushToken
query in the same method.
This code is a bit flatter and just preps the data
for a single user. There is never any interaction
between the data for user A and user B, so we can
mostly avoid complicated nested data structures
and do most of the data-crunching on a per-user basis.
We also do an explicit sort of the data before
running it through groupby. The explicit sort
simplifies how we calculate `most_recent_info`
and also avoids needing to add `dt` to an intermediate
data structure.
Finally, when it comes to the individual client data,
the code has relied on the assumption that there is
only one row per client, which I believe to be true,
but now the code is more explicit about that.