Commit Graph

211 Commits

Author SHA1 Message Date
Rishi Gupta 35f854a2fd analytics: Add test for do_aggregate_to_summary_table. 2017-03-04 16:46:09 -08:00
Rishi Gupta 8feea6c598 analytics: Add LoggingCountStat for number of users. 2017-03-04 16:46:09 -08:00
Rishi Gupta 51b7677db7 Add RealmAuditLog table and record user activation/deactivation events.
The RealmAuditLog will make it easier for server admins to replay history.
2017-03-04 16:45:44 -08:00
Raghav Jajodia a3a03bd6a5 mypy: Added Dict, List and Set imports.
Fixed mypy errors associated with the upgrade.
2017-03-04 14:33:44 -08:00
Rishi Gupta 1453a5bfda Change string_id of test zephyr realm from mit to zephyr.
Also changes Realm.is_zephyr_mirror_realm to use string_id=zephyr instead of
domain=mit.edu.
Part of a larger migration away from Realm.domain.
2017-03-04 12:18:01 -08:00
Rishi Gupta 8bea47d6b5 analytics: Do a stylistic cleanup of TestProcessCountStat. 2017-03-03 16:12:12 -08:00
Rishi Gupta 6c784d6321 analytics: Refactor COUNT_STATS declaration to not repeat itself. 2017-03-03 16:11:28 -08:00
Rishi Gupta 20255e48a4 analytics: Change messages_sent_to_stream to a daily stat.
Analytics database tables are getting big, and so we're likely moving to a
model where ~all stats are day stats, and we keep hourly stats only for the
last N days.

Also changed the name because:
* messages_sent_* suggests the counts (summed over subgroup) should be the
  same as the other messages_sent stats, but they are different (these don't
  include PMs).
* messages_sent_by_stream:is_bot:day is longer than 32 characters, the max
  allowable length for a BaseCount.property.

Includes a database migration to remove the old stat from the analytics
tables.
2017-03-03 16:11:28 -08:00
Rishi Gupta 4dc791f393 Clean up timestamps.py and add a test. 2017-03-01 23:03:56 -08:00
Rishi Gupta 562bc6429c Replace datetime.now() with timezone.now() in Django ORM queries.
When you pass a naive datetime to the Django ORM, it uses settings.TIME_ZONE
for the time zone. In the development environment, both settings.TIME_ZONE
and datetime.now() use 'America/New_York', so there is no change in behavior
there. (fromtimestamp with no tz argument uses the same timezone as
datetime.now)

We are soon going to change settings.TIME_ZONE to UTC, so need to remove
naive datetimes from queries to the ORM.
2017-03-01 22:54:28 -08:00
Rishi Gupta 01a4615f6e Change datetime.now to timezone.now in active_user_stats_by_day.
This actually fixes previously broken behavior, since 'date' here gets
turned into the 'day' argument of seconds_active_during_day(day), where
tzinfo is set to UTC.
2017-03-01 22:54:28 -08:00
Rishi Gupta 2b2be8120f Change datetime.now(tz=X) to timezone.now().
datetime.now with a timezone set is equivalent to timezone.now() if it's
never being printed out, but the latter is cleaner and more idiomatic.
2017-03-01 22:54:28 -08:00
Rishi Gupta eee5cb5197 analytics: Add tests for views code. 2017-02-11 14:51:01 -08:00
Rishi Gupta d6ce017a58 analytics: Minor cleanup of views.py. 2017-02-11 14:51:01 -08:00
Rishi Gupta 480fc0874b analytics: Break ties deterministically in sort_client_labels. 2017-02-11 14:51:01 -08:00
Tim Abbott f944ac8902 mypy: Fix incorrect annotation for by_used_time. 2017-02-10 23:53:44 -08:00
Rishi Gupta 19d1fc6223 stats: Pass user data to the frontend for messages sent over time. 2017-02-10 14:41:18 -08:00
Rishi Gupta 68a7f91022 stats: Add a fixed display order to summary charts.
API: Adds a "display_order" to the response, which is a suggested order of
importance for the clients or recipient types respectively.

frontend: Changes messages_sent_by_{client,recipient_type} to use a fixed
order for any given user.
2017-02-10 14:41:18 -08:00
Rishi Gupta cf3ae2eafe stats: Turn messages_sent_by_client into a bar chart.
Also includes a number of changes to messages_sent_by_recipient_type that
were convenient to do at the same time, since the two charts share a lot of
code.
2017-02-10 14:41:18 -08:00
Rishi Gupta ce89c64f43 stats.js: Move name_map computation to the backend. 2017-02-10 14:41:18 -08:00
Rishi Gupta a1b1ffe1e4 analytics: Base default views end_time on FillState, not current time. 2017-02-10 14:41:07 -08:00
Rishi Gupta 6ab31d1bac analytics: Move time computation to later in get_chart_data. 2017-02-10 14:40:14 -08:00
Tim Abbott ec52322ae1 stats: Include Zulip and realm name in heading. 2017-02-07 11:22:57 -08:00
Tim Abbott 6c4eaf3d14 analytics: Map client names to user-facing versions.
This makes the pie charts on /stats more readable.
2017-02-05 22:19:10 -08:00
Tim Abbott 161522e04c analytics: Add comment explaining server admin routes. 2017-02-02 16:23:10 -08:00
Rishi Gupta 5eb5fa3f31 analytics: Change time_range to not include current day/hour.
Current day/hour will always be 0, since we haven't computed it yet for the
CountStat tables.
2017-02-02 10:59:52 -08:00
Tim Abbott e8b0880320 analytics: Log updates to analytics counts. 2017-02-01 17:02:46 -08:00
umkay 76f3d02590 analytics: Add cron job to run analytics jobs.
This adds a cron job to update the Zulip analytics counts, complete
with locking etc.

Substantially tweaked by tabbott.
2017-02-01 17:02:46 -08:00
Tim Abbott b7df84d5a8 analytics: Add indexes to optimize performance of aggregation.
These indexes fix some slow queries used in updating the analytics
tables, resulting in the analytics system consuming far less total
resources.
2017-02-01 15:47:49 -08:00
Amy Liu 0a39e354dc analytics: Add graphs of usage statistics on /stats.
This adds a frontend for the analytics system we've had for a few
months, showing several graphs of the data in Zulip.

There's a ton more that we can do with this tooling, but this initial
version is enough to provide users with a pretty good experience.

Fixes #2052.
2017-01-31 22:18:54 -08:00
Tim Abbott 4e171ce787 lint: Clean up E126 PEP-8 rule. 2017-01-23 22:06:13 -08:00
Tim Abbott d6e38e2a5c lint: Clean up E123 PEP-8 rule. 2017-01-23 21:34:26 -08:00
Tim Abbott 9cc83f87fc lint: Clean up E241 PEP-8 rule. 2017-01-23 21:21:14 -08:00
Tim Abbott 9640a9e864 lint: Clean up E712 PEP-8 rule. 2017-01-23 21:11:18 -08:00
Rishi Gupta 29799d93c6 analytics/views.py: Always return time series data for stats.
Makes a number of simplications to the analytics views code. The main one is
that we now return the entire data series, even if the data is eventually
going to go into a pie chart. This was prompted by us wanting several
different pie charts for each stat (one for last 30 days, one for all time,
etc), but I think it is also a more natural API. The total amount of data
being sent for the pie charts now is maybe half of what is being sent for
our single 'hourly' stat, or maybe up to 10,000 ints per year the
organization has been around.

The other big change is that the data being sent back is now always explicit
about whether it is data about the realm (stored in data['realm'], or data
about the user (stored in data['user']).
2017-01-19 17:44:17 -08:00
Rishi Gupta 734ca4644c analytics: Add random_seed argument to generate_time_series_data. 2017-01-17 15:54:57 -08:00
Rishi Gupta 37bdc7c010 analytics: Remove COUNT_STATS['messages_sent:hour'].
Having both messages_sent:hour and messages_sent:is_bot:day is confusing,
since a single messages_sent:is_bot:hour would have a superset of the
information and take less total space. This commit and its parent together
replace the two stats with a single messages_sent:is_bot:hour.
2017-01-17 15:54:57 -08:00
Rishi Gupta b593ac9d7c analytics: Change messages_sent:is_bot to hourly frequency.
In preparation for replacing messages_sent.
2017-01-17 15:54:57 -08:00
Rishi Gupta 68fcb4152f analytics: Remove interval field from *Count tables.
Includes a database migration. The interval field was originally there to
facilitate time aggregation (e.g. aggregate_hour_to_day), but we now do such
aggregations in views code or in the frontend.
2017-01-17 15:54:57 -08:00
Rishi Gupta a8f2ebb443 analytics: Include interval in COUNT_STATS property names. 2017-01-17 15:54:57 -08:00
Rishi Gupta c466036c80 analytics: Remove unneeded references to interval from test_counts.py. 2017-01-17 15:54:57 -08:00
Rishi Gupta 12d277d4f4 analytics: Change messages_sent:client stat to daily frequency.
A few reasons:
* Our two other subgroup'd message stats in UserCount are at CountStat.DAY
  frequency (messages_sent:is_bot and messages_sent:message_type).
* Keeping this stat at hourly frequency would likely double the size of our
  analytics table, given the current stats. (Counterpoint: if there are
  roughly as many active streams as active users, and we keep
  messages_sent_to_stream:is_bot at hourly frequency, then maybe this stat
  is only a 30% or 50% increase).
* We're currently only showing this on the frontend as a pie chart anyway.
2017-01-17 15:54:57 -08:00
Rishi Gupta 690002aef8 analytics: Add fixtures for several CountStats. 2017-01-17 15:54:57 -08:00
Rishi Gupta 2710a944e8 analytics: Refactor fixture creation to make it more general.
Also less verbose, in preparation for adding a bunch more fixtures.
2017-01-17 15:54:57 -08:00
Rishi Gupta 1f4a4e5e26 analytics: Force --clear-existing-data option in populate_analytics_db.
Makes more sense for a fixture generating script to just clear the existing
data every time.
2017-01-17 15:54:57 -08:00
Rishi Gupta 680e7f75e1 analytics: Change generate_time_series_data argument from length to days.
Previously, this function seemed ambivalent about whether it was generating
a series of abstract data points or a series of data points that would
correspond to times. Switch firmly to the latter, so e.g. if the frequency
changes, so will the length of the output sequence.
2017-01-17 15:54:57 -08:00
Rishi Gupta 3712fda30d analytics: Ensure fixture data points are non-negative. 2017-01-17 15:54:57 -08:00
Rishi Gupta ecfc336a15 analytics: Add views for remaining /stats graphs. 2017-01-17 15:54:57 -08:00
Rishi Gupta 73c0c4c52e analytics/views.py: Increase efficiency of get_time_series_by_subgroup.
Not sure if this would actually be a performance problem in practice, but
this was originally making a database query for each subgroup (instead of
just a single query getting data for all the subgroups).

Also removed the filter against the interval column, which will soon not be
needed (interval will be uniquely determined by the property).
2017-01-17 15:54:57 -08:00
Rishi Gupta d873902755 analytics/views.py: Refactor get_messages_sent_by_humans_and_bots.
Refactor out the reusable parts, since we're about to add several more
views.
2017-01-17 15:54:57 -08:00