This field wasn't used for anything, and I think it has very limited
use for debugging, since fundamentally, it'll almost always have a
value within the hour of the actual timestamp in FillState, and any
more fine-grained logging we might want would be available in the
analytics job's own logs.
The proximal reason to remove it is that apparently Django's
model_to_dict doesn't support auto_now fields, and that caused some
trouble when working on adding more complete import/export support for
analytics data.
This is a preparatory commit for using isort for sorting all of our
imports, merging changes to files where we can easily review the
changes as something we're happy with.
These are also files with relatively little active development, which
means we don't expect much merge conflict risk from these changes.
Analytics database tables are getting big, and so we're likely moving to a
model where ~all stats are day stats, and we keep hourly stats only for the
last N days.
Also changed the name because:
* messages_sent_* suggests the counts (summed over subgroup) should be the
same as the other messages_sent stats, but they are different (these don't
include PMs).
* messages_sent_by_stream:is_bot:day is longer than 32 characters, the max
allowable length for a BaseCount.property.
Includes a database migration to remove the old stat from the analytics
tables.
Includes a database migration. The interval field was originally there to
facilitate time aggregation (e.g. aggregate_hour_to_day), but we now do such
aggregations in views code or in the frontend.
This is a major change to the analytics schema, and is the first step in a
number of refactorings and performance improvements. For instance, it allows
* Grouping sets of similar CountStats in the *Count tables. For instance,
active{_humans,_bots} will now have the same property, but have different
subgroup values.
* Combining queries that differ only in their value on 1 filter clause, so
that we make fewer passes through the zerver tables. For instance, instead
of running a query for each of messages_sent_to_public_streams and
messages_sent_to_private_streams, we can now run a single query with a
group by on Stream.invite_only, and store the group by value in the
subgroup column.
Adds two simplifying assumptions to how we process analytics stats:
* Sets the atomic unit of work to: a stat processed at an hour boundary.
* For any given stat, only allows these atomic units of work to be processed
in chronological order.
Adds a table FillState that, for each stat, keeps track of the last unit of
work that was processed.
This is primarily implemented through altering the migration file in
order to move the columns, but also we try to make the defaults a
little better for future tables inherited from BaseCount.
This is a first pass at building a framework for collecting various
stats about realms, users, streams, etc. Includes:
* New analytics tables for storing counts data
* Raw SQL queries for pulling data from zerver/models.py tables
* Aggregation functions for aggregating hourly stats into daily stats, and
aggregating user/stream level stats into realm level stats
* A management command for pulling the data
Note that counts.py was added to the linter exclude list due to errors
around %%s.