Commit Graph

73 Commits

Author SHA1 Message Date
Vishnu Ks e5c5960faa analytics: Remove unused get_user_profile_by_email import. 2017-07-14 13:35:43 -07:00
umkay d9b23b39d3 mypy: Fix strict-optional in analytics. 2017-05-26 15:39:39 -07:00
Aditya Bansal 3f0b22ce31 pep8: Add compliance with rule E261 to test_counts.py. 2017-05-07 23:21:50 -07:00
Rishi Gupta 61bf445da4 analytics: Restrict fill_to_time to hour boundaries in process_count_stat. 2017-04-28 16:15:07 -07:00
Rishi Gupta 5e49da9285 analytics: Only update daily stats on day boundaries.
Previously we would update FillState for daily stats on hourly boundaries as
well. This would create two extra queries on the FillState table every hour
(for each CountStat), which adds roughly 50ms of extra processing for each
CountStat each day, as well as two extra lines each hour in the analytics
log. This can be a minor annoyance when backfilling stats.
2017-04-18 11:02:51 -07:00
Rishi Gupta b335ad2794 models: Add MIN_INTERVAL_LENGTH to UserActivityInterval.
Was previously a floating magic number appearing in both
zerver/lib/actions.py and analytics/lib/counts.py.
2017-04-18 11:02:51 -07:00
hackerkid b2504084ab Replace timezone.now with timezone_now. 2017-04-16 12:28:56 -07:00
hackerkid 55c3d12078 Replace timezone.utc with timezone_utc. 2017-04-16 12:28:56 -07:00
Rishi Gupta 49bd330304 analytics: Add class DependentCountStat and stat realm_active_humans::day. 2017-04-14 11:41:07 -07:00
Rishi Gupta 62de1cf898 test_counts: Modernize TestProcessCountStat tests. 2017-04-14 11:41:07 -07:00
Rishi Gupta 1e8d2b984d counts.py: Rename DataCollector-level operations to be more generic.
We're about to use these for DependentCountStats that will run SQL queries
on the analytics tables instead of the zerver tables.
2017-04-14 11:41:07 -07:00
Rishi Gupta 6dff22cbaf counts.py: Change check for LoggingCountStat to use isinstance.
I think this is more pythonic?

We could also get rid of LoggingCountStats altogether, since it's now just a
special case of CountStat (is_logging == data_collector.pull_function is None).
But I think it's nice to keep the distinction since they behave so differently.
2017-04-14 11:41:07 -07:00
Rishi Gupta 118b44d4f0 counts.py: Change DataCollector to take a pull_function argument.
This will allow us to appropriately generalize CountStat to include
LoggingCountStat and CustomPullCountStat. It'll also make life easier when
we introduce DependentCountStat.
2017-04-14 11:41:07 -07:00
Rishi Gupta 6369d23633 counts.py: Rename ZerverCountQuery to DataCollector.
Not the final form of DataCollector, but the name change causes a big diff
so separating it out.
2017-04-14 11:41:07 -07:00
Rishi Gupta b3991e2557 counts.py: Move CountStat.group_by into ZerverCountQuery.
Part of a larger refactoring to reduce cyclic dependencies between CountStat
and DataCollector (coming soon).
2017-04-14 11:41:07 -07:00
Rishi Gupta 341e1b54fc counts.py: Remove zerver_table from ZerverCountQuery.
Was only needed for filter_args, which are now gone.
2017-04-14 11:41:07 -07:00
Rishi Gupta 661de6bf25 counts.py: Remove filter_args argument from CountStat definition.
It turned out to not be that useful once we added subgroup. The previous
design of the CountStat object also assumed more reuseability of the *_query
strings than what ended up happening.

The filter_args also had some carrying costs:

* It's hard to be confident that filter_args other than the ones explicitly
  in our tests would have had expected behavior.
* The filter_args/join_args system is the most complex part of the CountStat
  object, and makes understanding the *_query strings unnecessarily
  difficult for a new contributor.
2017-04-14 11:41:07 -07:00
Rishi Gupta 6bb97db136 analytics: Add active_users_audit:is_bot:day. 2017-04-14 11:41:07 -07:00
Rishi Gupta 3d514c3e8d analytics: Add a default for the value column in assertTableState.
A default value of 1 is reasonable in this framework, especially for testing
things like LoggingCountStats.
2017-04-14 11:41:07 -07:00
Rishi Gupta 2f74ccabf9 analytics: Add 15day_actives CountStat. 2017-04-14 11:41:07 -07:00
Rishi Gupta 9b661ca91f analytics: Replace CountStat.is_gauge with interval.
Groundwork for allowing stats like "Monthly Active Users".

CountStat.interval is no longer as clean a value as before, so removed it
from views.get_chart_data. It wasn't being used by the frontend anyway.

Removing interval from logger calls in counts.py is not a big loss since we
now include the frequency (which is typically also the interval) in
CountStat.property.
2017-04-14 11:41:07 -07:00
Rishi Gupta d6c5c672d3 analytics: Add minutes_active CountStat. 2017-04-14 11:41:07 -07:00
Rishi Gupta 30024d0a8f models: Remove Realm.domain. 2017-03-25 19:55:48 -07:00
hollywoodno dd067c761a analytics: Separate private messages from group private messages.
This makes it possible for our graphs to show the group private
message counts as separate from 1:1 private messages.

Fixes #4102.
2017-03-20 11:46:29 -07:00
Rishi Gupta ceac6d9c59 analytics: Remove stray comment from test_counts.py.
The "actual test that would be nice to do" was indeed done!
2017-03-17 21:58:51 -07:00
Rishi Gupta 7c6f0033ed analytics: Add test for do_drop_all_analytics_tables. 2017-03-14 16:59:54 -07:00
Rishi Gupta 35f854a2fd analytics: Add test for do_aggregate_to_summary_table. 2017-03-04 16:46:09 -08:00
Rishi Gupta 8feea6c598 analytics: Add LoggingCountStat for number of users. 2017-03-04 16:46:09 -08:00
Raghav Jajodia a3a03bd6a5 mypy: Added Dict, List and Set imports.
Fixed mypy errors associated with the upgrade.
2017-03-04 14:33:44 -08:00
Rishi Gupta 8bea47d6b5 analytics: Do a stylistic cleanup of TestProcessCountStat. 2017-03-03 16:12:12 -08:00
Rishi Gupta 20255e48a4 analytics: Change messages_sent_to_stream to a daily stat.
Analytics database tables are getting big, and so we're likely moving to a
model where ~all stats are day stats, and we keep hourly stats only for the
last N days.

Also changed the name because:
* messages_sent_* suggests the counts (summed over subgroup) should be the
  same as the other messages_sent stats, but they are different (these don't
  include PMs).
* messages_sent_by_stream:is_bot:day is longer than 32 characters, the max
  allowable length for a BaseCount.property.

Includes a database migration to remove the old stat from the analytics
tables.
2017-03-03 16:11:28 -08:00
Rishi Gupta 37bdc7c010 analytics: Remove COUNT_STATS['messages_sent:hour'].
Having both messages_sent:hour and messages_sent:is_bot:day is confusing,
since a single messages_sent:is_bot:hour would have a superset of the
information and take less total space. This commit and its parent together
replace the two stats with a single messages_sent:is_bot:hour.
2017-01-17 15:54:57 -08:00
Rishi Gupta b593ac9d7c analytics: Change messages_sent:is_bot to hourly frequency.
In preparation for replacing messages_sent.
2017-01-17 15:54:57 -08:00
Rishi Gupta 68fcb4152f analytics: Remove interval field from *Count tables.
Includes a database migration. The interval field was originally there to
facilitate time aggregation (e.g. aggregate_hour_to_day), but we now do such
aggregations in views code or in the frontend.
2017-01-17 15:54:57 -08:00
Rishi Gupta a8f2ebb443 analytics: Include interval in COUNT_STATS property names. 2017-01-17 15:54:57 -08:00
Rishi Gupta c466036c80 analytics: Remove unneeded references to interval from test_counts.py. 2017-01-17 15:54:57 -08:00
Rishi Gupta 12d277d4f4 analytics: Change messages_sent:client stat to daily frequency.
A few reasons:
* Our two other subgroup'd message stats in UserCount are at CountStat.DAY
  frequency (messages_sent:is_bot and messages_sent:message_type).
* Keeping this stat at hourly frequency would likely double the size of our
  analytics table, given the current stats. (Counterpoint: if there are
  roughly as many active streams as active users, and we keep
  messages_sent_to_stream:is_bot at hourly frequency, then maybe this stat
  is only a 30% or 50% increase).
* We're currently only showing this on the frontend as a pie chart anyway.
2017-01-17 15:54:57 -08:00
Rishi Gupta cdb1c96169 analytics tests: Refactor assertCountEquals calls to be more readable. 2017-01-17 15:54:57 -08:00
Rishi Gupta 59d50c3a47 analytics tests: Make it easy to refer to users in test realm. 2017-01-17 15:54:57 -08:00
Rishi Gupta 54e66e6079 analytics: Add remaining backend tests in TestCountStats. 2017-01-17 15:54:57 -08:00
aakash-cr7 b373f2ef0f analytics: Add backend test for messages_sent_to_stream:is_bot. 2017-01-17 15:54:57 -08:00
Amy Liu 10c0c2b16d analytics: Add backend tests for messages_sent:message_type. 2017-01-17 15:54:57 -08:00
Rishi Gupta f30b174199 analytics: Set property and interval defaults in assertCountEquals. 2017-01-17 15:54:57 -08:00
Rishi Gupta a563a15f88 analytics: Make TestCountStats tests more robust.
Adds two things to TestCountStats.setUp():
* A realm with no messages, that generally should not show up in *Count
  tables,
* Users/streams/messages created at 0, 1, 61, and 1441 (just over a day)
  minutes ago (previously was 0, 60), to better test the start_time/end_time
  in the queries, and the frequency/interval setting in the CountStats.
2017-01-17 15:54:57 -08:00
Rishi Gupta e94bc8f142 analytics tests: Autogenerate names for create* functions. 2017-01-17 15:54:57 -08:00
Amy Liu f7ce76fb63 analytics: Add create_stream_with_recipient and create_huddle_with_recipient.
This commit replaces AnalyticsTestCase.create_stream with create_stream_with_recipient and adds the method create_huddle_with_recipient.
2017-01-17 15:54:57 -08:00
Rishi Gupta 552d626ef2 analytics: Fix FillState.last_modified not being updated.
We were updating FillState with FillState.objects.filter(..).update(..),
which does not update the last_modified field (which has auto_now=True).
The correct incantation is the save() method of the actual FillState
object.
2017-01-08 23:36:34 -08:00
Rishi Gupta c7c0e36508 analytics: Add InstallationCount checks to prototype TestCountStat.
Was enabled by commit 41e8ee3 where we moved TIME_ZERO to before the realms
created by populate_db.py.

Also removes the stub for TestAggregates, since the remaining thing to be
tested was the aggregation from RealmCount to InstallationCount, and the end
to end checks provided by the TestCountStat tests should be sufficient.
2016-12-20 12:03:23 -08:00
Rishi Gupta dbc94d0fc0 analytics: Remove test for no longer supported behavior.
In a previous design, there was no FillState table, and one could run any
CountStat at any time. This is no longer supported.

This test was making sure that if one ran a CountStat at a certain hour, and
then ran it at a previous hour, the old rows would still be there.
2016-12-20 12:03:23 -08:00
Rishi Gupta e09aaf1020 analytics: Remove tests that will be subsumed by TestCountStats. 2016-12-20 12:03:23 -08:00