presence: Tweak and document presence tuning values.

We're changing the ping interval from 50s to 60s, because that's what
the mobile apps have hardcoded currently, and backwards-compatibility
is more important there than the web app's previously hardcoded 50s.

For OFFLINE_THRESHOLD_SECS, the previous value hardcoded in both
clients was 140s, selected as "plenty of network/other latency more
than 2 x ACTIVE_PING_INTERVAL_MS". This is a pretty aggressive value;
even a single request being missed or 500ing can result in a user
appearing offline incorrectly. (There's a lag of up to one full ping
interval between when the other client checks in and when you check
in, so the data you hold is almost 2 ping intervals old by the time
you issue your next request, which might get an updated connection
time from that user.)

To increase failure tolerance, we're changing the offline
threshold from 2 x ACTIVE_PING_INTERVAL + 20s to 3 x
ACTIVE_PING_INTERVAL + 20s, aka 140s => 200s, so that temporary
failures are less likely to make us display other users as offline.
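
The arithmetic behind both thresholds can be checked with a short
sketch. The helper names and the staleness model are illustrative
assumptions, not code from this commit: the model takes the lag
described above (a peer's check-in can look almost 2 ping intervals
old just before our next poll) and adds one more interval for each
consecutive ping of theirs that is lost.

```python
# Illustrative model only; these helper names are hypothetical and do not
# exist in the Zulip codebase.

SLACK_SECS = 20  # the "+ 20s" allowance from the formula above


def offline_threshold(ping_interval_secs: int, multiplier: int) -> int:
    """Threshold of the form `multiplier x ping interval + 20s`."""
    return multiplier * ping_interval_secs + SLACK_SECS


def worst_case_age(ping_interval_secs: int, missed_pings: int) -> int:
    """Upper bound on how old a peer's last check-in can look to us.

    Assumes up to one interval of lag between their check-in and ours,
    one more interval until our next poll, and one extra interval for
    each of their consecutive pings that was missed or failed.
    """
    return (2 + missed_pings) * ping_interval_secs


old = offline_threshold(60, 2)  # 140
new = offline_threshold(60, 3)  # 200

for missed in (0, 1, 2):
    age = worst_case_age(60, missed)
    print(f"missed={missed} age_bound={age}s old_ok={age <= old} new_ok={age <= new}")
```

Under this model, the 140s threshold tolerates no lost pings at a 60s
interval, while 200s tolerates exactly one.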

Since the mobile apps currently have 140s and 60s hardcoded, it should
be safe to make this particular change; the mobile apps will just
remain more aggressive than the web app in marking users offline until
they use the new API parameters.
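
A client that wants to honor the server's values while still working
against older servers might do something like the following. This is
a sketch, not any client's actual code: the helper and dataclass are
hypothetical, the field names are taken from the `page_params` keys in
the tests in this commit and are assumed to match the /register
response, and the fallbacks are the 60s and 140s values the mobile
apps currently hardcode.

```python
from dataclasses import dataclass
from typing import Any, Mapping

# Values the mobile apps hardcode today, per the commit message.
FALLBACK_PING_INTERVAL_SECS = 60
FALLBACK_OFFLINE_THRESHOLD_SECS = 140


@dataclass(frozen=True)
class PresenceTuning:
    ping_interval_secs: int
    offline_threshold_secs: int


def presence_tuning(register_response: Mapping[str, Any]) -> PresenceTuning:
    """Prefer server-provided values; fall back when the server omits them."""
    return PresenceTuning(
        ping_interval_secs=register_response.get(
            "server_presence_ping_interval_seconds",
            FALLBACK_PING_INTERVAL_SECS,
        ),
        offline_threshold_secs=register_response.get(
            "server_presence_offline_threshold_seconds",
            FALLBACK_OFFLINE_THRESHOLD_SECS,
        ),
    )
```

With fallbacks like these, an older server that sends neither field
leaves the client on its current behavior, which is why the change
should be safe for existing mobile apps.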

The end result is that Zulip will be slightly less aggressive at
marking other users as offline if they go off the Internet. We will
likely be able to tune ACTIVE_PING_INTERVAL downwards once #16381 and
its follow-ups are completed, because they'll likely make these requests
much cheaper.
Mateusz Mandera 2023-02-21 12:20:41 +01:00 committed by Tim Abbott
parent 8ef889f392
commit 52515a1560
4 changed files with 23 additions and 11 deletions

@@ -577,8 +577,8 @@ test("update_presence_info", ({override}) => {
 override(pm_list, "update_private_messages", () => {});
 page_params.realm_presence_disabled = false;
-page_params.server_presence_ping_interval_seconds = 50;
-page_params.server_presence_offline_threshold_seconds = 140;
+page_params.server_presence_ping_interval_seconds = 60;
+page_params.server_presence_offline_threshold_seconds = 200;
 const server_time = 500;
 const info = {

@@ -422,7 +422,7 @@ test("always show me", () => {
 });
 test("level", () => {
-page_params.server_presence_offline_threshold_seconds = 140;
+page_params.server_presence_offline_threshold_seconds = 200;
 add_canned_users();
 assert.equal(buddy_data.level(me.user_id), 0);

@@ -15,7 +15,7 @@ const people = zrequire("people");
 const watchdog = zrequire("watchdog");
 const presence = zrequire("presence");
-const OFFLINE_THRESHOLD_SECS = 140;
+const OFFLINE_THRESHOLD_SECS = 200;
 const me = {
 email: "me@zulip.com",

@@ -487,14 +487,26 @@ LOG_API_EVENT_TYPES = False
 # TODO: Replace this with a smarter "run on only one server" system.
 STAGING = False
-# How long to wait before presence should treat a user as offline.
-OFFLINE_THRESHOLD_SECS = 140
+# Presence tuning parameters. These values were hardcoded in clients
+# before Zulip 7.0 (feature level 164); modern clients should get them
+# via the /register API response, making it possible to tune these to
+# adjust the trade-off between freshness and presence-induced load.
+#
+# The default for OFFLINE_THRESHOLD_SECS is chosen as
+# `PRESENCE_PING_INTERVAL_SECS * 3 + 20`, which is designed to allow 2
+# round trips, plus an extra in case an update fails. See
+# https://zulip.readthedocs.io/en/latest/subsystems/presence.html for
+# details on the presence architecture.
+#
+# How long to wait before clients should treat a user as offline.
+OFFLINE_THRESHOLD_SECS = 200
 # How often a client should ping by asking for presence data of all users.
-PRESENCE_PING_INTERVAL_SECS = 50
-# Specifies the number of active users in the realm
-# above which sending of presence update events will be disabled.
+PRESENCE_PING_INTERVAL_SECS = 60
+# Zulip sends immediate presence updates via the events system when a
+# user joins or becomes online. In larger organizations, this can
+# become prohibitively expensive, so we limit how many active users an
+# organization can have before these presence update events are
+# disabled.
 USER_LIMIT_FOR_SENDING_PRESENCE_UPDATE_EVENTS = 100
 # How many days deleted messages data should be kept before being