Commit Graph

99 Commits

Author SHA1 Message Date
umkay 5e5a0d4db9 analytics: Add user-level count query for messages sent to {PMs, streams}.
Adds a count_X_by_Y_query to counts.py, similar in spirit to a
count_recipient_by_user query, where we would join on the Message,
Recipient, and UserProfile table. Here, we also join on the Stream table in
order to distinguish private and public streams, and we merge the counts for
PM and Huddle type messages into a single subgroup.
2016-11-01 17:00:43 -07:00
umkay a94599fca7 analytics/models.py: Add subgroup column to unique_together constraints. 2016-11-01 16:53:56 -07:00
umkay e92604ab78 analytics: Alter field length for property and interval in BaseCount. 2016-10-27 16:33:58 -07:00
umkay 610e92b94e analytics: Add subgroup column to analytics tables.
This is a major change to the analytics schema, and is the first step in a
number of refactorings and performance improvements. For instance, it allows

* Grouping sets of similar CountStats in the *Count tables. For instance,
  active{_humans,_bots} will now have the same property, but have different
  subgroup values.

* Combining queries that differ only in their value on 1 filter clause, so
  that we make fewer passes through the zerver tables. For instance, instead
  of running a query for each of messages_sent_to_public_streams and
  messages_sent_to_private_streams, we can now run a single query with a
  group by on Stream.invite_only, and store the group by value in the
  subgroup column.
2016-10-27 16:33:58 -07:00
Rishi Gupta 54016e1096 analytics: Remove outdated comment in counts.py. 2016-10-25 13:42:55 -07:00
umkay 87d22c9e4d analytics: Fix count_stream_by_realm.
Add a join clause on zerver_message in count_stream_by_realm,
otherwise we only output the final total streamcount for a realm
for every time entry.
2016-10-22 19:10:36 -07:00
umkay 906a4e3b26 analytics: Add performance and transaction logging to counts.py.
For each database query made by an analytics function, log time spent and
the number of rows changed to var/logs/analytics.log.
In the spirit of write ahead logging, for each (stat, end_time)
update, log the start and end of the "transaction", as well as time
spent.
2016-10-17 16:10:03 -07:00
Tim Abbott 4a4664d268 mypy: Remove a bunch of now-unnecessary type: ignore annotations.
Since mypy and typeshed have advanced a lot over the last several
months, we no longer need these `type: ignore` annotations.
2016-10-17 11:48:34 -07:00
Rishi Gupta 82b814a1cd analytics: Simplify frequency and measurement interval options.
Change the CountStat object to take an is_gauge variable instead of a
smallest_interval variable. Previously, (smallest_interval, frequency)
could be any of (hour, hour), (hour, day), (hour, gauge), (day, hour),
(day, day), or (day, gauge).
The current change is equivalent to excluding (hour, day) and (day, hour)
from the list above.

This change, along with other recent changes, allows us to simplify how we
handle time intervals. This commit also removes the TimeInterval object.
2016-10-14 10:18:37 -07:00
Rishi Gupta 807520411b analytics: Simplify logic in do_fill_count_stat_at_hour.
Adding FillState, removing do_aggregate_hour_to_day, and disallowing unused
(interval, frequency) pairs removes the need for the nested for loops in
do_fill_count_stat_at_hour. This commit replaces that control flow with a
simpler equivalent.
2016-10-14 10:18:37 -07:00
Rishi Gupta 27d1360e1d analytics: Remove do_aggregate_hour_to_day.
The functionality provided is more naturally done in the views code. It also
allows us to aggregate using day boundaries from the local timezone, rather
than UTC.
2016-10-14 10:18:37 -07:00
Rishi Gupta 655ee51e35 analytics: Add table to keep track of fill state.
Adds two simplifying assumptions to how we process analytics stats:
* Sets the atomic unit of work to: a stat processed at an hour boundary.
* For any given stat, only allows these atomic units of work to be processed
  in chronological order.

Adds a table FillState that, for each stat, keeps track of the last unit of
work that was processed.
2016-10-14 10:18:37 -07:00
umkay 721529b782 analytics: Remove HuddleCount for now.
Planned changes to the underlying analytics model will require potentially
complicated changes to huddle queries.
2016-10-14 10:18:37 -07:00
umkay 7e2340155d analytics: Fix aggregation to RealmCount for realms with no users.
Previously, if a Realm had no users (or no streams),
do_aggregate_to_summary_table would fail to add a row with value 0. This
commit fixes the issue and also simplifies the do_aggregate_to_summary_table
logic.
2016-10-11 18:20:58 -07:00
Rishi Gupta 52b56cca65 analytics: Reorder arguments to assertCountEquals.
Require a table argument and change argument order around for clarity.
2016-10-11 18:20:58 -07:00
Rishi Gupta 929b69397b analytics: Change string representation of BaseCount models.
Previously we showed both the value and the id of the BaseCount record,
which is confusing in a typical case where you only care about the value,
and both the value and id are smallish ints.
2016-10-09 16:09:04 -07:00
Rishi Gupta c6b611c8b9 analytics: Re-organize tests into higher level TestClasses.
Refactor the current analytics tests into the following classes:
* TestUpdateAnalyticsCounts, which will eventually test the management
  command, backfilling, what happens when new tests are added, etc.
* TestProcessCountStat, which tests the ins and outs of propagating the
  value of a single stat up through the various *Count tables.
* TestAggregates, which tests the do_aggregate_* methods.
* TestXByYQueries, which tests the count_X_by_Y_query SQL snippets.
* TestCountStats, which has tests for individual CountStats.

This commit does not change the name or contents of any individual test.
2016-10-09 16:09:04 -07:00
Rishi Gupta 795d10b9ad analytics: Refactor tests to simplify asserts of count values.
Many tests are structured to run some process, and then check a count in a
BaseCount record using default values for realm, property, interval, and
end_time. This commit adds a new assertCountEquals method to
AnalyticsTestCase, and simplifies other assert calls as appropriate.
2016-10-09 16:09:04 -07:00
Rishi Gupta 35cf9092d5 analytics: Canonicalize creation of model objects in tests.
Add a default_realm object to AnalyticsTestCase, created 2 days before
AnalyticsTestCase.TIME_ZERO.

Add lightweight create_user, create_stream, and create_message methods to
AnalyticsTestCase, with sensible defaults. In particular, all objects are by
default created at AnalyticsTestCase.TIME_LAST_HOUR, so that they are
included when running AnalyticsTestCase.process_last_hour.
2016-10-09 16:09:04 -07:00
Rishi Gupta 6996b21480 analytics: Change tests to use a fixed TIME_ZERO.
Previously, analytics tests used timezone.now or custom datetime objects
when creating new realms, users, and streams.

This commit adds a fixed TIME_ZERO and a process_last_hour helper function
in a new AnalyticsTestCase class, and modifies the existing tests to use
them.
2016-10-09 16:09:04 -07:00
umkay 78477ea071 Reorder the columns in analytics tables inherited from BaseCount.
This is primarily implemented through altering the migration file in
order to move the columns, but also we try to make the defaults a
little better for future tables inherited from BaseCount.
2016-10-06 17:51:01 -07:00
umkay 01324f2afe Fix aggregation to analytics summary tables.
There are a number of different stats that need to be propagated from
UserCount and StreamCount to RealmCount, and from RealmCount to
InstallationCount. Stats with hour intervals also need to have their day
values propagated. This commit fixes a bug in the summary table aggregation
logic so that for a given interval on a CountStat object we pull the correct
counts for the interval as well as do the day aggregation if required. We Also
ensure that any aggregation then done from the realmcount
table to the installationcount table follows the same aggregation logic
for intervals.
2016-10-06 08:46:33 -07:00
Tim Abbott 273c17a072 update_analytics_counts: Add missing future imports. 2016-10-05 17:13:46 -07:00
umkay 5d0bed8673 Add script to clear analytics tables. 2016-10-05 17:11:13 -07:00
Tim Abbott 3973ae5dbb update_analytics_counts: Fix buggy argument parsing. 2016-10-04 20:43:19 -07:00
umkay d260a22637 Add a new statistics/analytics framework.
This is a first pass at building a framework for collecting various
stats about realms, users, streams, etc. Includes:
* New analytics tables for storing counts data
* Raw SQL queries for pulling data from zerver/models.py tables
* Aggregation functions for aggregating hourly stats into daily stats, and
  aggregating user/stream level stats into realm level stats
* A management command for pulling the data

Note that counts.py was added to the linter exclude list due to errors
around %%s.
2016-10-04 17:18:54 -07:00
Taranjeet a137bf15ed Wrap some lines with length greater than 120.
With some tweaks by tabbott.
2016-07-06 14:35:16 -07:00
Eklavya Sharma 71e613424b Fix annotations clashing with UserProfile's model fields. 2016-06-13 20:01:01 +05:30
Hyunchel Kim f226456675 Add type annotations for analytics/views.py.
Type of parameter for function `is_recent`(line no.812) is `datetime`.
MyPy errors out, however, when the parameter is defined as `datetime`.
To get around, type `Any` is used.
2016-06-05 15:04:24 -07:00
Tim Abbott a1a27b1789 Annotate most Zulip management commands. 2016-06-04 10:12:06 -07:00
Eklavya Sharma 94e4b39112 Replace python2.7 by python everywhere. 2016-05-29 05:03:08 -07:00
Umair Khan f9bbc5d6ff Enable i18n support in URL configuration.
This supports i18n using all of the following:
- I18N urls
- Session
- Cookie
- HTTP header
2016-05-19 08:33:30 -07:00
Tim Abbott efd24b374e analytics: Fix cnts variable reuse with different type.
Found using mypy.
2016-05-12 14:07:32 -07:00
Tim Abbott b869be9301 style: Use 'not in' consistently rather than `not foo in`. 2016-05-09 17:00:10 -07:00
Umair Khan 5359e6b0d4 Convert Zulip to use Jinja2 templates.
This results in a substantial performance improvement for all of
Zulip's backend templates.

Changes in templates:
- Change `block.super` to `super()`.
- Remove `load` tag because Jinja2 doesn't support it.
- Use `minified_js()|safe` instead of `{% minified_js %}`.
- Use `compressed_css()|safe` instead of `{% compressed_css %}`.
- `forloop.first` -> `loop.first`.
- Use `{{ csrf_input }}` instead of `{% csrf_token %}`.
- Use `{# ... #}` instead of `{% comment %}`.
- Use `url()` instead of `{% url %}`.
- Use `_()` instead of `{% trans %}` because in Jinja `trans` is a block tag.
- Use `{% trans %}` instead of `{% blocktrans %}`.
- Use `{% raw %}` instead of `{% verbatim %}`.

Changes in tools:
- Check for `trans` block in `check-templates` instead of `blocktrans`

Changes in backend:
- Create custom `render_to_response` function which takes `request` objects
  instead of `RequestContext` object. There are two reasons to do this:
    1. `RequestContext` is not compatible with Jinja2
    2. `RequestContext` in `render_to_response` is deprecated.
- Add Jinja2 related support files in zproject/jinja2 directory. It
  includes a custom backend and a template renderer, compressors for js
  and css and Jinja2 environment handler.
- Enable `slugify` and `pluralize` filters in Jinja2 environment.

Fixes #620.
2016-05-09 09:55:18 -07:00
Umair Khan 6a0c7fec72 analytics: Add `at_risk_count` to Totals row in realm summary.
This fixes reading from an unset value in realm_summary_table, which
is fine with the Django template engine but will be problematic with
jinja2.
2016-05-07 17:30:06 -07:00
Tim Abbott 191201bd10 Fix unnecessary whitespace between % and (. 2016-05-04 14:22:52 -07:00
Tim Abbott 54022ac204 Fix unnecessary whitespace between , and ). 2016-05-04 14:16:53 -07:00
Ashish 6356584f84 Replace /json/update_pointer with REST style route. 2016-04-11 21:38:23 -07:00
Ashish 41993ef2f5 Replace /json/update_message_flags with REST style route. 2016-04-11 21:38:22 -07:00
Tim Abbott a1b306f9ce Finish purging 'fromt typing import *' from Zulip codebase. 2016-04-07 14:11:21 -07:00
Tim Abbott b8c82d5b43 Add PEP-484 type annotations to analytics/. 2016-04-03 15:40:23 -07:00
Tim Abbott 2436ad19ba analytics: Cleanup confusingly type-variable all_records. 2016-02-03 19:29:07 -08:00
Tim Abbott df1670ef59 Fix various float initialization to use 0.0 instead of 0.
This is needed to type-check these values.
2016-02-03 19:29:07 -08:00
Tim Abbott 1f44417fc1 Switch to using Python 3 style division everywhere.
Also add testing for this to our Python 3 compatibility test suite.
2016-01-26 21:09:43 -08:00
Tim Abbott a79e89b28f Cleanup remaining usage of % comprehensions without explicit tuples. 2015-12-05 15:29:42 -08:00
Tim Abbott 607eedfc25 Apply Python 3 futurize transform libmodernize.fixes.fix_zip. 2015-11-01 09:35:06 -08:00
Tim Abbott f7878a61e1 Apply Python 3 futurize transform libmodernize.fixes.fix_xrange_six. 2015-11-01 09:35:06 -08:00
Tim Abbott cd6f8e9191 Apply Python 3 futurize transform libmodernize.fixes.fix_map. 2015-11-01 09:35:05 -08:00
Tim Abbott b3ac668779 Apply Python 3 futurize transform libmodernize.fixes.fix_filter. 2015-11-01 09:26:16 -08:00