Commit Graph

11 Commits

Author SHA1 Message Date
Rishi Gupta d95fb33d8d analytics: Add subgroups to unicode representations in models.py. 2016-12-20 12:03:23 -08:00
anirudhjain75 beaa62cafa mypy: Convert several directories to use typing.Text.
Specifically, these directories are converted: [analytics/, scripts/,
tools/, zerver/management/, zilencer/, zproject/]
2016-12-07 20:51:05 -08:00
umkay a94599fca7 analytics/models.py: Add subgroup column to unique_together constraints. 2016-11-01 16:53:56 -07:00
umkay e92604ab78 analytics: Alter field length for property and interval in BaseCount. 2016-10-27 16:33:58 -07:00
umkay 610e92b94e analytics: Add subgroup column to analytics tables.
This is a major change to the analytics schema, and is the first step in a
number of refactorings and performance improvements. For instance, it allows

* Grouping sets of similar CountStats in the *Count tables. For instance,
  active{_humans,_bots} will now have the same property, but have different
  subgroup values.

* Combining queries that differ only in their value on 1 filter clause, so
  that we make fewer passes through the zerver tables. For instance, instead
  of running a query for each of messages_sent_to_public_streams and
  messages_sent_to_private_streams, we can now run a single query with a
  group by on Stream.invite_only, and store the group by value in the
  subgroup column.
2016-10-27 16:33:58 -07:00
Rishi Gupta 82b814a1cd analytics: Simplify frequency and measurement interval options.
Change the CountStat object to take an is_gauge variable instead of a
smallest_interval variable. Previously, (smallest_interval, frequency)
could be any of (hour, hour), (hour, day), (hour, gauge), (day, hour),
(day, day), or (day, gauge).
The current change is equivalent to excluding (hour, day) and (day, hour)
from the list above.

This change, along with other recent changes, allows us to simplify how we
handle time intervals. This commit also removes the TimeInterval object.
2016-10-14 10:18:37 -07:00
Rishi Gupta 655ee51e35 analytics: Add table to keep track of fill state.
Adds two simplifying assumptions to how we process analytics stats:
* Sets the atomic unit of work to: a stat processed at an hour boundary.
* For any given stat, only allows these atomic units of work to be processed
  in chronological order.

Adds a table FillState that, for each stat, keeps track of the last unit of
work that was processed.
2016-10-14 10:18:37 -07:00
umkay 721529b782 analytics: Remove HuddleCount for now.
Planned changes to the underlying analytics model will require potentially
complicated changes to huddle queries.
2016-10-14 10:18:37 -07:00
Rishi Gupta 929b69397b analytics: Change string representation of BaseCount models.
Previously we showed both the value and the id of the BaseCount record,
which is confusing in a typical case where you only care about the value,
and both the value and id are smallish ints.
2016-10-09 16:09:04 -07:00
umkay 78477ea071 Reorder the columns in analytics tables inherited from BaseCount.
This is primarily implemented through altering the migration file in
order to move the columns, but also we try to make the defaults a
little better for future tables inherited from BaseCount.
2016-10-06 17:51:01 -07:00
umkay d260a22637 Add a new statistics/analytics framework.
This is a first pass at building a framework for collecting various
stats about realms, users, streams, etc. Includes:
* New analytics tables for storing counts data
* Raw SQL queries for pulling data from zerver/models.py tables
* Aggregation functions for aggregating hourly stats into daily stats, and
  aggregating user/stream level stats into realm level stats
* A management command for pulling the data

Note that counts.py was added to the linter exclude list due to errors
around %%s.
2016-10-04 17:18:54 -07:00