Commit Graph

16 Commits

Author SHA1 Message Date
Umair Khan 7d51efe9a1 Django 1.10: Fix dummy data for count stat.
Django 1.10 checks the foreign key constraints as part of the testing
suite so we need to create test data which passes validation tests.
2016-11-14 16:09:12 -08:00
umkay 5490442580 analytics: Replace all joins in raw SQL with natural joins.
We alter the behavior of our queries to no longer write rows with 0 counts
to the db, and pad with 0s in the related views code. As a result we are
also able to combine the where and join clause conditions in the sql
queries. This new behavior is also updated in our tests.
2016-11-03 16:50:39 -07:00
Rishi Gupta db0e509422 do_create_realm: Replace domain argument with string_id.
Turns string_id into a required argument, and domain into an optional
argument.
2016-11-02 22:46:34 -07:00
Rishi Gupta 9ef8536cc6 models.Realm: Require Realm.string_id to be non-NULL.
Adds a database migration, adds a new string_id argument to the management
realm creation command, and adds a short name field to the web realm
creation form when REALMS_HAVE_SUBDOMAINS is False.
2016-11-02 22:46:34 -07:00
umkay 610e92b94e analytics: Add subgroup column to analytics tables.
This is a major change to the analytics schema, and is the first step in a
number of refactorings and performance improvements. For instance, it allows

* Grouping sets of similar CountStats in the *Count tables. For instance,
  active{_humans,_bots} will now have the same property, but have different
  subgroup values.

* Combining queries that differ only in their value on 1 filter clause, so
  that we make fewer passes through the zerver tables. For instance, instead
  of running a query for each of messages_sent_to_public_streams and
  messages_sent_to_private_streams, we can now run a single query with a
  group by on Stream.invite_only, and store the group by value in the
  subgroup column.
2016-10-27 16:33:58 -07:00
Rishi Gupta 82b814a1cd analytics: Simplify frequency and measurement interval options.
Change the CountStat object to take an is_gauge variable instead of a
smallest_interval variable. Previously, (smallest_interval, frequency)
could be any of (hour, hour), (hour, day), (hour, gauge), (day, hour),
(day, day), or (day, gauge).
The current change is equivalent to excluding (hour, day) and (day, hour)
from the list above.

This change, along with other recent changes, allows us to simplify how we
handle time intervals. This commit also removes the TimeInterval object.
2016-10-14 10:18:37 -07:00
Rishi Gupta 807520411b analytics: Simplify logic in do_fill_count_stat_at_hour.
Adding FillState, removing do_aggregate_hour_to_day, and disallowing unused
(interval, frequency) pairs removes the need for the nested for loops in
do_fill_count_stat_at_hour. This commit replaces that control flow with a
simpler equivalent.
2016-10-14 10:18:37 -07:00
Rishi Gupta 655ee51e35 analytics: Add table to keep track of fill state.
Adds two simplifying assumptions to how we process analytics stats:
* Sets the atomic unit of work to: a stat processed at an hour boundary.
* For any given stat, only allows these atomic units of work to be processed
  in chronological order.

Adds a table FillState that, for each stat, keeps track of the last unit of
work that was processed.
2016-10-14 10:18:37 -07:00
umkay 721529b782 analytics: Remove HuddleCount for now.
Planned changes to the underlying analytics model will require potentially
complicated changes to huddle queries.
2016-10-14 10:18:37 -07:00
umkay 7e2340155d analytics: Fix aggregation to RealmCount for realms with no users.
Previously, if a Realm had no users (or no streams),
do_aggregate_to_summary_table would fail to add a row with value 0. This
commit fixes the issue and also simplifies the do_aggregate_to_summary_table
logic.
2016-10-11 18:20:58 -07:00
Rishi Gupta 52b56cca65 analytics: Reorder arguments to assertCountEquals.
Require a table argument and change argument order around for clarity.
2016-10-11 18:20:58 -07:00
Rishi Gupta c6b611c8b9 analytics: Re-organize tests into higher level TestClasses.
Refactor the current analytics tests into the following classes:
* TestUpdateAnalyticsCounts, which will eventually test the management
  command, backfilling, what happens when new tests are added, etc.
* TestProcessCountStat, which tests the ins and outs of propagating the
  value of a single stat up through the various *Count tables.
* TestAggregates, which tests the do_aggregate_* methods.
* TestXByYQueries, which tests the count_X_by_Y_query SQL snippets.
* TestCountStats, which has tests for individual CountStats.

This commit does not change the name or contents of any individual test.
2016-10-09 16:09:04 -07:00
Rishi Gupta 795d10b9ad analytics: Refactor tests to simplify asserts of count values.
Many tests are structured to run some process, and then check a count in a
BaseCount record using default values for realm, property, interval, and
end_time. This commit adds a new assertCountEquals method to
AnalyticsTestCase, and simplifies other assert calls as appropriate.
2016-10-09 16:09:04 -07:00
Rishi Gupta 35cf9092d5 analytics: Canonicalize creation of model objects in tests.
Add a default_realm object to AnalyticsTestCase, created 2 days before
AnalyticsTestCase.TIME_ZERO.

Add lightweight create_user, create_stream, and create_message methods to
AnalyticsTestCase, with sensible defaults. In particular, all objects are by
default created at AnalyticsTestCase.TIME_LAST_HOUR, so that they are
included when running AnalyticsTestCase.process_last_hour.
2016-10-09 16:09:04 -07:00
Rishi Gupta 6996b21480 analytics: Change tests to use a fixed TIME_ZERO.
Previously, analytics tests used timezone.now or custom datetime objects
when creating new realms, users, and streams.

This commit adds a fixed TIME_ZERO and a process_last_hour helper function
in a new AnalyticsTestCase class, and modifies the existing tests to use
them.
2016-10-09 16:09:04 -07:00
umkay d260a22637 Add a new statistics/analytics framework.
This is a first pass at building a framework for collecting various
stats about realms, users, streams, etc. Includes:
* New analytics tables for storing counts data
* Raw SQL queries for pulling data from zerver/models.py tables
* Aggregation functions for aggregating hourly stats into daily stats, and
  aggregating user/stream level stats into realm level stats
* A management command for pulling the data

Note that counts.py was added to the linter exclude list due to errors
around %%s.
2016-10-04 17:18:54 -07:00