210 Commits

Author SHA1 Message Date
Rishi Gupta 807520411b analytics: Simplify logic in do_fill_count_stat_at_hour.
Adding FillState, removing do_aggregate_hour_to_day, and disallowing unused
(interval, frequency) pairs remove the need for the nested for loops in
do_fill_count_stat_at_hour. This commit replaces that control flow with a
simpler equivalent.
2016-10-14 10:18:37 -07:00
Rishi Gupta 655ee51e35 analytics: Add table to keep track of fill state.
Adds two simplifying assumptions to how we process analytics stats:
* Sets the atomic unit of work to: a stat processed at an hour boundary.
* For any given stat, only allows these atomic units of work to be processed
  in chronological order.

Adds a table FillState that, for each stat, keeps track of the last unit of
work that was processed.
2016-10-14 10:18:37 -07:00
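The two assumptions in the commit above (hour-boundary units of work, processed in chronological order per stat) can be illustrated with a minimal sketch. This is not the code from the commit: FillTracker is a hypothetical in-memory stand-in for the FillState table, and the stat name used below is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

HOUR = timedelta(hours=1)


class FillTracker:
    """Hypothetical in-memory stand-in for a FillState-like table."""

    def __init__(self):
        # Maps a stat name to the end_time of the last unit of work processed.
        self.last_filled = {}

    def process(self, stat, end_time):
        # Assumption 1: a unit of work is one stat at an hour boundary.
        on_boundary = end_time.replace(minute=0, second=0, microsecond=0)
        if end_time != on_boundary:
            raise ValueError("end_time must fall on an hour boundary")
        # Assumption 2: units of work for a stat run in chronological order.
        last = self.last_filled.get(stat)
        if last is not None and end_time <= last:
            raise ValueError("units of work must be processed in order")
        # (The actual counting work for this stat and hour would happen here.)
        self.last_filled[stat] = end_time


tracker = FillTracker()
t0 = datetime(2016, 10, 14, 10, tzinfo=timezone.utc)
tracker.process("example_stat", t0)
tracker.process("example_stat", t0 + HOUR)
```

Keeping only the last processed end_time per stat is enough here because the ordering rule means everything earlier has already been handled.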
umkay 721529b782 analytics: Remove HuddleCount for now.
Planned changes to the underlying analytics model will require potentially
complicated changes to huddle queries.
2016-10-14 10:18:37 -07:00
umkay 7e2340155d analytics: Fix aggregation to RealmCount for realms with no users.
Previously, if a Realm had no users (or no streams),
do_aggregate_to_summary_table would fail to add a row with value 0. This
commit fixes the issue and also simplifies the do_aggregate_to_summary_table
logic.
2016-10-11 18:20:58 -07:00
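The zero-row behavior described in the commit above can be sketched with a LEFT JOIN and COALESCE. This is one common way to get a row with value 0 for a realm that has no users, and not necessarily how do_aggregate_to_summary_table does it. The realm and user_count tables and the realm_totals helper are hypothetical, simplified stand-ins.

```python
import sqlite3


def realm_totals(conn):
    # LEFT JOIN keeps realms that have no matching user rows;
    # COALESCE turns the resulting NULL sum into 0.
    return conn.execute(
        "SELECT realm.id, COALESCE(SUM(user_count.value), 0) "
        "FROM realm "
        "LEFT JOIN user_count ON user_count.realm_id = realm.id "
        "GROUP BY realm.id "
        "ORDER BY realm.id"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE realm (id INTEGER PRIMARY KEY);
    CREATE TABLE user_count (realm_id INTEGER, value INTEGER);
    INSERT INTO realm VALUES (1), (2);       -- realm 2 has no users
    INSERT INTO user_count VALUES (1, 3), (1, 4);
    """
)
totals = realm_totals(conn)
```

With an inner join instead, realm 2 would be missing from the result altogether, which is the kind of gap the commit message describes.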
Rishi Gupta 52b56cca65 analytics: Reorder arguments to assertCountEquals.
Require a table argument and change argument order around for clarity.
2016-10-11 18:20:58 -07:00
Rishi Gupta c6b611c8b9 analytics: Re-organize tests into higher level TestClasses.
Refactor the current analytics tests into the following classes:
* TestUpdateAnalyticsCounts, which will eventually test the management
  command, backfilling, what happens when new tests are added, etc.
* TestProcessCountStat, which tests the ins and outs of propagating the
  value of a single stat up through the various *Count tables.
* TestAggregates, which tests the do_aggregate_* methods.
* TestXByYQueries, which tests the count_X_by_Y_query SQL snippets.
* TestCountStats, which has tests for individual CountStats.

This commit does not change the name or contents of any individual test.
2016-10-09 16:09:04 -07:00
Rishi Gupta 795d10b9ad analytics: Refactor tests to simplify asserts of count values.
Many tests are structured to run some process, and then check a count in a
BaseCount record using default values for realm, property, interval, and
end_time. This commit adds a new assertCountEquals method to
AnalyticsTestCase, and simplifies other assert calls as appropriate.
2016-10-09 16:09:04 -07:00
Rishi Gupta 35cf9092d5 analytics: Canonicalize creation of model objects in tests.
Add a default_realm object to AnalyticsTestCase, created 2 days before
AnalyticsTestCase.TIME_ZERO.

Add lightweight create_user, create_stream, and create_message methods to
AnalyticsTestCase, with sensible defaults. In particular, all objects are by
default created at AnalyticsTestCase.TIME_LAST_HOUR, so that they are
included when running AnalyticsTestCase.process_last_hour.
2016-10-09 16:09:04 -07:00
Rishi Gupta 6996b21480 analytics: Change tests to use a fixed TIME_ZERO.
Previously, analytics tests used timezone.now or custom datetime objects
when creating new realms, users, and streams.

This commit adds a fixed TIME_ZERO and a process_last_hour helper function
in a new AnalyticsTestCase class, and modifies the existing tests to use
them.
2016-10-09 16:09:04 -07:00
umkay d260a22637 Add a new statistics/analytics framework.
This is a first pass at building a framework for collecting various
stats about realms, users, streams, etc. Includes:
* New analytics tables for storing counts data
* Raw SQL queries for pulling data from zerver/models.py tables
* Aggregation functions for aggregating hourly stats into daily stats, and
  aggregating user/stream level stats into realm level stats
* A management command for pulling the data

Note that counts.py was added to the linter exclude list due to errors
around %%s.
2016-10-04 17:18:54 -07:00