With `realm_active_humans` no longer dependent on the per-user rows,
there is no reason to preserve them -- any measure of "was a user
active" should look directly at the much richer RealmAuditLog. This
removes the bulk of the UserCount table, since the remaining rows all
require user interaction of some sort to produce rows.
This makes it no longer dependent on active_users_audit:is_bot:day,
which subsequent commits will make a RealmCount, not UserCount, query.
This folds the same behaviour of `active_users_audit` directly into
the query; however, only running over active users, using the index
from the earlier commit, and using the new `DISTINCT ON` formulation
make this a fast query compared to `active_users_audit:is_bot:day` +
the old `realm_active_humans::day`.
This produces a query which is more comprehensible, is 2x faster when
limited to a realm, and has equivalent speed when performing the full
table scan.
Fixes#29632.
The issue description explains this well:
We currently recalculate `currently_used_upload_space_bytes` every file
upload, by dint of calling `flush_used_upload_space_cache` on
save/delete, and then immediately calling
`user_profile.realm.currently_used_upload_space_bytes()` in
`notify_attachment_update`. Since this walks the Attachments table,
recalculating this can take seconds in large realms.
Switch this to using a CountStat, so we don't need to walk significant
chunks of the Attachment table when we upload an attachment. This will
also give us a historical daily graph of usage.
The previous implementation using Django's `get_or_create` for
`do_increment_logging_stat` involved two separate database queries,
potentially leading to race conditions.
Use an `ON CONFLICT ... DO UPDATE` (aka "upsert") query, which
eliminates race conditions and improves performance. This is mildly
complicated due to the different unique indexes across the various
tables, and the need for bug-for-bug compatibility with the previous
implementation.
Fixes#28947.
Co-authored-by: Alex Vandiver <alexmv@zulip.com>
LoggingCountStats with a daily duration and that are directly stored
on the RealmCount table (not via aggregation in process_count_stat),
can be in a state, after the hourly cron job to update analytics
counts, where the logged value will be live-updated later, because
the end time for the stat is still in the future.
As these logging counts are designed to be used on the self-hosted
installation for either debugging or rate limiting, sending these
partial/incomplete counts to the bouncer has low value.
This is a CountStat for tracking how many mobile notifications the
server requested.
1. On a self-hosted server, that means requesting from the push bouncer.
2. On a server that's its own push bouncer, that's just the number
directly sent.
This number has room for inaccuracy due to incrementing by the number of
user devices on a self-hosted server, as it doesn't account for errors
that may occur in the GCM/APNs low-level sending codepaths on the bouncer.
Also tests that a server that's its own push bouncer correctly
increments its mobile_pushes_sent::day CountStat, by basing it on the
values returned from the send_apple/android_push_notification functions
which tell us the actual number of successfully sent notifications.
Since the return values of send_..._push_notification are now
used in those codepaths, we need to tweak our mocks in some unrelated
tests to set up some return value to avoid errors.
We are about to add support for having RemoteZulipServer here, which is
a zilencer, not zerver, object. So let's rename this argument to
something more appropriately general.
An assert is appropriate here to ensure that some future additions of
other frequencies don't make this if/else logic wrong without explicitly
failing.
_default_manager is the same as objects on most of our models. But
when a model class is stored in a variable, the type system doesn’t
know which model the variable is referring to, so it can’t know that
objects even exists (Django doesn’t add it if the user added a custom
manager of a different name). django-stubs used to incorrectly assume
it exists unconditionally, but it no longer does.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
The Django convention is for __repr__ to include the type and __str__
to omit it. In fact its default __repr__ implementation for models
automatically adds a type prefix to __str__, which has resulted in the
type being duplicated:
>>> UserProfile.objects.first()
<UserProfile: <UserProfile: emailgateway@zulip.com <Realm: zulipinternal 1>>>
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Black 23 enforces some slightly more specific rules about empty line
counts and redundant parenthesis removal, but the result is still
compatible with Black 22.
(This does not actually upgrade our Python environment to Black 23
yet.)
Signed-off-by: Anders Kaseorg <anders@zulip.com>
For types like `Union[Realm, UserProfile, Stream]` and
`Union[AnonymousUser, AbstractBaseUser]`, we need assertions to
tell mypy which type we would be expecting.
This is a prep commit. Currenty we only pass CountStat.property
to last_successful_fill function. But it needs access to
CountStat.time_increment as well. We can pass the entire CountStat
object to the function as a workaround. But making last_successful_fill
a property of CountStat seems to be much more cleaner.
Whenever we use API queries to mark messages as read we now increment
two new LoggingCount stats, messages_read::hour and
messages_read_interactions::hour.
We add an early return in do_increment_logging_stat function if there
are no changes (increment is 0), as an optimization to avoid
unnecessary database queries.
We also log messages_read_interactions::hour Logging stat
as the number of API queries to mark messages as read.
We don't include tests for the case where do_update_pointer is called
because do_update_pointer will most likely be removed from the
codebase in the near future.