Commit Graph

361 Commits

Author SHA1 Message Date
arpit551 c4b5d09283 analytics: Add LoggingCount for messages read stats.
Whenever we use API queries to mark messages as read we now increment
two new LoggingCount stats, messages_read::hour and
messages_read_interactions::hour.

We add an early return in do_increment_logging_stat function if there
are no changes (increment is 0), as an optimization to avoid
unnecessary database queries.

We also log messages_read_interactions::hour Logging stat
as the number of API queries to mark messages as read.

We don't include tests for the case where do_update_pointer is called
because do_update_pointer will most likely be removed from the
codebase in the near future.
2020-06-14 21:15:27 -07:00
arpit551 27daf38587 analytics: Derive AnalyticsTestCase class from ZulipTestCase.
The ZulipTestCase class contains various useful utility functions that
we could be using in test_counts.
2020-06-14 21:08:24 -07:00
Anders Kaseorg 69c0959f34 python: Fix misuse of Optional types for optional parameters.
There seems to have been a confusion between two different uses of the
word “optional”:

• An optional parameter may be omitted and replaced with a default
  value.
• An Optional type has None as a possible value.

Sometimes an optional parameter has a default value of None, or None
is otherwise a meaningful value to provide, in which case it makes
sense for the optional parameter to have an Optional type.  But in
other cases, optional parameters should not have Optional type.  Fix
them.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-13 15:31:27 -07:00
Anders Kaseorg 365fe0b3d5 python: Sort imports with isort.
Fixes #2665.

Regenerated by tabbott with `lint --fix` after a rebase and change in
parameters.

Note from tabbott: In a few cases, this converts technical debt in the
form of unsorted imports into different technical debt in the form of
our largest files having very long, ugly import sequences at the
start.  I expect this change will increase pressure for us to split
those files, which isn't a bad thing.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-11 16:45:32 -07:00
Anders Kaseorg 69730a78cc python: Use trailing commas consistently.
Automatically generated by the following script, based on the output
of lint with flake8-comma:

import re
import sys

last_filename = None
last_row = None
lines = []

for msg in sys.stdin:
    m = re.match(
        r"\x1b\[35mflake8    \|\x1b\[0m \x1b\[1;31m(.+):(\d+):(\d+): (\w+)", msg
    )
    if m:
        filename, row_str, col_str, err = m.groups()
        row, col = int(row_str), int(col_str)

        if filename == last_filename:
            assert last_row != row
        else:
            if last_filename is not None:
                with open(last_filename, "w") as f:
                    f.writelines(lines)

            with open(filename) as f:
                lines = f.readlines()
            last_filename = filename
        last_row = row

        line = lines[row - 1]
        if err in ["C812", "C815"]:
            lines[row - 1] = line[: col - 1] + "," + line[col - 1 :]
        elif err in ["C819"]:
            assert line[col - 2] == ","
            lines[row - 1] = line[: col - 2] + line[col - 1 :].lstrip(" ")

if last_filename is not None:
    with open(last_filename, "w") as f:
        f.writelines(lines)

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-06-11 16:04:12 -07:00
Anders Kaseorg 67e7a3631d python: Convert percent formatting to Python 3.6 f-strings.
Generated by pyupgrade --py36-plus.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-10 15:02:09 -07:00
Anders Kaseorg 5839fdf963 analytics: Improve escaping correctness with psycopg2.sql.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-09 21:12:43 -07:00
Anders Kaseorg 8dd83228e7 python: Convert "".format to Python 3.6 f-strings.
Generated by pyupgrade --py36-plus --keep-percent-format, but with the
NamedTuple changes reverted (see commit
ba7906a3c6, #15132).

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-08 15:31:20 -07:00
arpit551 fb2aae1c02 analytics tests: Save recipient in stream object.
At the time of creating streams in test_counts.py we earlier did not saved
recipient in the stream object.

stream.recipient is used in many functions so they would throw error.

The right long-term fix here is probably to just use the standard
stream creation functions rather than having a hacky duplicate
here.
2020-06-08 11:33:24 -07:00
Anders Kaseorg 1f565a9f41 timezone: Use standard library datetime.timezone.utc consistently.
datetime.timezone is available in Python ≥ 3.2.  This also lets us
remove a pytz dependency from the PostgreSQL scripts.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-05 09:34:17 -07:00
Sahil Batra 77d4be56a4
users: Modify do_create_user and create_user to accept role.
We change do_create_user and create_user to accept
role as a parameter instead of 'is_realm_admin' and 'is_guest'.
These changes are done to minimize data conversions between
role and boolean fields.
2020-06-02 16:11:36 -07:00
Anders Kaseorg 840cf4b885 requirements: Drop direct dependency on mock.
mock is just a backport of the standard library’s unittest.mock now.

The SAMLAuthBackendTest change is needed because
MagicMock.call_args.args wasn’t introduced until Python
3.8 (https://bugs.python.org/issue21269).

The PROVISION_VERSION bump is skipped because mock is still an
indirect dev requirement via moto.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-05-26 11:40:42 -07:00
sahil839 9b78a73e36 populate_db: Add new admin user as 'Desdemona'.
This commit adds a second admin user named 'Desdemona' to dev and
test database.
2020-05-19 11:42:27 -07:00
sahil839 36dca5ba5c analytics: Add order_by to query used for fetching admin emails.
This commit adds order_by to the query that fetches admin emails
in views.py. This is added to show emails alphabetically in case
of multiple admins.
2020-05-19 11:42:27 -07:00
Anders Kaseorg fead14951c python: Convert assignment type annotations to Python 3.6 style.
This commit was split by tabbott; this piece covers the vast majority
of files in Zulip, but excludes scripts/, tools/, and puppet/ to help
ensure we at least show the right error messages for Xenial systems.

We can likely further refine the remaining pieces with some testing.

Generated by com2ann, with whitespace fixes and various manual fixes
for runtime issues:

-    invoiced_through: Optional[LicenseLedger] = models.ForeignKey(
+    invoiced_through: Optional["LicenseLedger"] = models.ForeignKey(

-_apns_client: Optional[APNsClient] = None
+_apns_client: Optional["APNsClient"] = None

-    notifications_stream: Optional[Stream] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE)
-    signup_notifications_stream: Optional[Stream] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE)
+    notifications_stream: Optional["Stream"] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE)
+    signup_notifications_stream: Optional["Stream"] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE)

-    author: Optional[UserProfile] = models.ForeignKey('UserProfile', blank=True, null=True, on_delete=CASCADE)
+    author: Optional["UserProfile"] = models.ForeignKey('UserProfile', blank=True, null=True, on_delete=CASCADE)

-    bot_owner: Optional[UserProfile] = models.ForeignKey('self', null=True, on_delete=models.SET_NULL)
+    bot_owner: Optional["UserProfile"] = models.ForeignKey('self', null=True, on_delete=models.SET_NULL)

-    default_sending_stream: Optional[Stream] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE)
-    default_events_register_stream: Optional[Stream] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE)
+    default_sending_stream: Optional["Stream"] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE)
+    default_events_register_stream: Optional["Stream"] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE)

-descriptors_by_handler_id: Dict[int, ClientDescriptor] = {}
+descriptors_by_handler_id: Dict[int, "ClientDescriptor"] = {}

-worker_classes: Dict[str, Type[QueueProcessingWorker]] = {}
-queues: Dict[str, Dict[str, Type[QueueProcessingWorker]]] = {}
+worker_classes: Dict[str, Type["QueueProcessingWorker"]] = {}
+queues: Dict[str, Dict[str, Type["QueueProcessingWorker"]]] = {}

-AUTH_LDAP_REVERSE_EMAIL_SEARCH: Optional[LDAPSearch] = None
+AUTH_LDAP_REVERSE_EMAIL_SEARCH: Optional["LDAPSearch"] = None

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-04-22 11:02:32 -07:00
Anders Kaseorg 1cf63eb5bf python: Whitespace fixes from autopep8.
Generated by autopep8, with the setup.cfg configuration from #14532.
I’m not sure why pycodestyle didn’t already flag these.

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-04-21 17:58:09 -07:00
Vishnu KS 905f3d2d23 stats: Don't show analytics data unavailable error for new realms.
The value of start would be set to realm creation date or installation
date unless a value is explicitly provided by the user. We don't
want to show analytics data missing error in both these cases if
start is less than 24.5 hours from timezone_now() since some of the
stats are populated by a daily cron job that may take a couple of
minutes to run.

The same case applies if the value of the start is passed in the
request.
2020-04-03 11:51:20 -07:00
Steve Howell 1306239c16 tests: Use email/delivery_email more explicitly.
We try to use the correct variation of `email`
or `delivery_email`, even though in some
databases they are the same.

(To find the differences, I temporarily hacked
populate_db to use different values for email
and delivery_email, and reduced email visibility
in the zulip realm to admins only.)

In places where we want the "normal" realm
behavior of showing emails (and having `email`
be the same as `delivery_email`), we use
the new `reset_emails_in_zulip_realm` helper.

A couple random things:

    - I fixed any error messages that were leaking
      the wrong email

    - a test that claimed to rely on the order
      of emails no longer does (we sort user_ids
      instead)

    - we now use user_ids in some place where we used
      to use emails

    - for IRC mirrors I just punted and used
      `reset_emails_in_zulip_realm` in most places

    - for MIT-related tests, I didn't fix email
      vs. delivery_email unless it was obvious

I also explicitly reset the realm to a "normal"
realm for a couple tests that I frankly just didn't
have the energy to debug.  (Also, we do want some
coverage on the normal case, even though it is
"easier" for tests to pass if you mix up `email`
and `delivery_email`.)

In particular, I just reset data for the analytics
and corporate tests.
2020-03-19 16:04:03 -07:00
Mateusz Mandera b4ce167a88 models: Add recipient foreign key to Huddle.
This follows the already tested approach from
8acfa17fe6.
2020-03-17 05:41:11 -07:00
Steve Howell 1b16693526 tests: Limit email-based logins.
We now have this API...

If you really just need to log in
and not do anything with the actual
user:

    self.login('hamlet')

If you're gonna use the user in the
rest of the test:

    hamlet = self.example_user('hamlet')
    self.login_user(hamlet)

If you are specifically testing
email/password logins (used only in 4 places):

    self.login_by_email(email, password)

And for failures uses this (used twice):

    self.assert_login_failure(email)
2020-03-11 17:10:22 -07:00
arpit551 f299f31340 analytics: Fix missing unique constraint when subgroup is null.
Replaced unique_together with UniqueConstraint in models that
covered nullable fields as in unique_together database indexes
don't work where subgroup=None. So added conditional unique
index handling invalid duplicate Count data.

Added 0015_clear_duplicate_counts migration to handle existing
data that violates the constraints.

Also corrected a test case in test_counts.py which didn't clear its
state properly and thus was accidentally taking advantage of this
database schema bug.
2020-03-06 11:10:04 -08:00
arpit551 b23a5431cd analytics: Add realm argument to analytics.
This changeset is prepartory work for doing something reasonable with
analytics data during the zulip -> zulip data import process (and
potentially e.g. slack -> Zulip as well).

To support that, we need to make it possible to do our analytics
calculations for a single realm.

We do this while maintaining backwards compatibility and avoiding
massive duplicated code by adding an optional `realm` argument to the
entrypoints to the analytics system, especially process_count_stat.

More work involving restructuring FillState will be required for this
to be actually usable for its intented purpose, but this commit is a
nice checkpoint along the way.

Tweaked by tabbott to adjust comments and disable InstallationCount
updates when a realm argument is specified.
2020-01-23 17:36:13 -08:00
Vishnu Ks a26b379a14 support: Send confirmation email on realm activation. 2019-12-02 09:51:45 -08:00
Vishnu KS ec955f8f78 support: Show confirmation links in search.
Fixes #13060 #12784
2019-10-21 16:56:50 -07:00
Mateusz Mandera bbf2474bd0 tests: setUp overrides should call super().setUp().
MigrationsTestCase is intentionally omitted from this, since migrations
tests are different in their nature and so whatever setUp()
ZulipTestCase may do in the future, MigrationsTestCase may not
necessarily want to replicate.
2019-10-19 17:27:01 -07:00
Rishi Gupta 4256ee61cf billing: Change RealmAuditLog.event_type from str to int.
This is a more robust long-term model for storing these data.
2019-10-06 15:55:56 -07:00
Mateusz Mandera dbe508bb91 models: Migration of Message.pub_date to date_sent, part 2.
Fixes #1727.

With the server down, apply migrations 0245 and 0246. 0246 will remove
the pub_date column, so it's essential that the previous migrations
ran correctly to copy data before running this.
2019-10-05 19:01:34 -07:00
Tim Abbott a352f2e10d test_counts: Remove custom user creation code.
Like the last commit, this avoids us needing to update this random
test class in the analytics subsystem when we adjust the UserProfile
model.
2019-09-19 14:31:58 -07:00
Anders Kaseorg becef760bf cleanup: Delete leading newlines.
Previous cleanups (mostly the removals of Python __future__ imports)
were done in a way that introduced leading newlines.  Delete leading
newlines from all files, except static/assets/zulip-emoji/NOTICE,
which is a verbatim copy of the Apache 2.0 license.

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2019-08-06 23:29:11 -07:00
Mateusz Mandera b321dcd50b test_views: Prepare for moving system bots to zulipinternal. 2019-07-24 16:26:10 -07:00
Vishnu Ks 14e582fb59 support: Add functionality to copy admin emails.
Also renamed a bunch of functions in test_views for better
readability.
2019-06-14 10:19:50 -07:00
Tim Abbott 1d44fd724b audit log: Log which server admin deactivated a realm too. 2019-05-08 15:09:48 -07:00
Rishi Gupta 0218da64f1 support: Update success message for plan type change. 2019-05-08 15:09:48 -07:00
Rishi Gupta 98da11c558 support: Rename deactive to deactivated. 2019-05-08 15:09:48 -07:00
Vishnu Ks 6c58603eaf support: Add support for scrubbing realm. 2019-05-06 20:12:54 -07:00
Vishnu Ks f6203f068b support: Add support for activating and deactivating realm. 2019-05-06 20:12:48 -07:00
Vishnu Ks 8eeb8280b4 activity: Create interface for doing support operations.
This should grow into a tool that makes it much easier to do common
organization management tasks without using a manage.py shell.
2019-03-11 12:01:11 -07:00
Rishi Gupta 9962377018 analytics: Fix midnight-related bug in test.
Previously, this would flake if the day changed between
user2 = do_create_user('email2', 'password', ...) and
do_deactivate_user(user2).
2019-02-28 14:48:30 -08:00
Anders Kaseorg f5197518a9 analytics/zilencer/zproject: Remove unused imports.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
2019-02-02 17:31:45 -08:00
Rishi Gupta 85f7ac8172 analytics: Remove Anomaly model. 2019-02-01 18:48:18 -08:00
Tim Abbott 7ddcbd0d3a analytics: Set delivery_email in hacky test user creation code. 2018-12-06 15:33:28 -08:00
Steve Howell c0fd8660d2 subject -> topic: Fix analytics test (minor). 2018-11-14 23:24:06 -08:00
Yashashvi Dave 02a5849d4c statistics: Guest user can't access realm statistics.
Don't allow guest user to access realm statistics from
UI or at API level.

Fixes part of #10749.
2018-11-02 11:43:09 -07:00
Vishnu Ks 201b99a6f8 models: Add USER_REACTIVATED event type constant to RealmAuditLog. 2018-07-10 15:42:26 +05:30
Vishnu Ks d0b89cbb44 models: Add USER_DEACTIVATED event type constant to RealmAuditLog. 2018-07-10 15:42:26 +05:30
Vishnu Ks ce3fffdbb2 models: Add USER_ACTIVATED event type constant to RealmAuditLog. 2018-07-10 15:42:26 +05:30
Vishnu Ks 2c8effe9fe models: Add USER_CREATED event type constant to RealmAuditLog. 2018-07-10 15:42:26 +05:30
Nikhil Kumar Mishra fa9d79e203 stats: Add 1 day actives and total users to number of users chart. 2018-05-20 10:56:16 -07:00
Nikhil Kumar Mishra 26decb4c48 stats: Add 1day_actives::day CountStat to analytics tables. 2018-05-20 10:56:16 -07:00
Rishi Gupta 1af7fc7344 stats: Add /stats/installation. 2018-05-18 15:12:36 -07:00
Rishi Gupta 2fe3fba6ce stats: Rename data.realm to data.everyone.
We use "Everyone" for the button labels already.

Soon we'll support "Everyone" meaning either the installation or the realm,
depending on the URL route used to access the stats.
2018-05-18 15:12:36 -07:00
Aditya Bansal 5adf983c3c analytics: Change use of typing.Text to str. 2018-05-10 14:19:49 -07:00
Shubham Dhama b26c38bc47 analytics: Make stats of all realms accessible to server admins.
In this commit:
Two new URLs are added, to make all realms accessible for server
admins. One is for the stats page itself and another for getting
chart data i.e. chart data API requests.
For the above two new URLs corresponding two view functions are
added.
2018-04-18 11:06:50 -07:00
Rishi Gupta fbd8dde1f8 invitations: Add LoggingCountStat to keep track of sent invitations. 2017-12-06 20:35:50 -08:00
rht 01885cdedc analytics: Use Python 3 syntax for typing (final). 2017-11-22 12:16:59 -08:00
Tim Abbott a0cfe45150 analytics: Wrap some longer lines. 2017-11-17 13:19:48 -08:00
rht d1689b5884 analytics: Use python 3 syntax for typing. 2017-11-17 13:16:49 -08:00
Tim Abbott 2b43a0302a python: Sort imports in smaller apps. 2017-11-15 15:55:49 -08:00
rht 17a19993f1 analytics/tests: Remove unused imports (F401). 2017-11-07 16:37:11 -08:00
rht c4fcff7178 refactor: Replace super(.*self) with Python 3-specific super().
We change all the instances except for the `test_helpers.py`
TimeTrackingCursor monkey-patching, which actually needs to specify
the base class.
2017-10-30 14:30:25 -07:00
rht 691598a88b py3: Remove "from six.moves import range".
This is no longer required, since in Python 3, this is what the range
built-in does.
2017-10-17 23:28:14 -07:00
Rishi Gupta c7bdabbda8 analytics: Disallow non-UTC fill times in process_count_stat.
No change in behavior, but we aren't supporting non-UTC times in analytics
as a whole any more, so might as well change this check as well.
2017-10-05 11:22:06 -07:00
Rishi Gupta 0596c4a810 analytics: Enforce various datetime arguments are in UTC.
Sort of a hacky hammer, but
* The original design of the analytics system mistakenly attempted to play
  nicely with non-UTC datetimes.
* Timezone errors are really hard to find and debug, and don't jump out that
  easily when reading code.

I don't know of any outstanding errors, but putting a few "assert this
timezone is in UTC" around will hopefully reduce the chance that there are
any current or future timezone errors.

Note that none of these functions are called outside of the analytics code
(and tests). This commit also doesn't change any current behavior, assuming
a database where all datetimes have been being stored in UTC.
2017-10-05 11:22:06 -07:00
Rishi Gupta 0f31cddf49 analytics: Add management command to clear single stat. 2017-10-05 11:22:06 -07:00
Tim Abbott 575bd0e255 analytics: Update name for old Android app. 2017-10-03 11:59:41 -07:00
rht 4494112862 analytics: Remove absolute_import. 2017-09-27 20:20:07 -07:00
Umair Khan 6d4ba50ceb result.json: Upgrade test_views. 2017-08-17 09:03:35 -07:00
Tim Abbott e3e2d9093c analytics: Remove references to UserProfile.objects.get. 2017-08-15 12:48:33 -07:00
neiljp (Neil Pilgrim) b782db48e1 mypy: Remove superfluous older 'type: ignore' annotations. 2017-08-08 11:27:51 -07:00
Vishnu Ks a99c60ce07 analytics: Add translation tags to stats.html. 2017-07-16 16:16:43 -07:00
Vishnu Ks e5c5960faa analytics: Remove unused get_user_profile_by_email import. 2017-07-14 13:35:43 -07:00
Vishnu Ks 0a67e00702 analytics: Use example_user in test_views.py. 2017-07-14 13:35:43 -07:00
Tim Abbott 0b60f11cf8 analytics: Display Zulip Electron app as the main desktop app.
This should make the /stats data significantly clearer.
2017-07-07 18:31:47 -07:00
Tim Abbott 31caab4229 analytics: Rename 'new iOS app' to 'Mobile app'. 2017-07-07 18:31:13 -07:00
Aditya Bansal 061cb9ae44 pep8: Add compliance with rule E261 to analytics/tests/test_views.py. 2017-05-31 17:07:15 -07:00
umkay d9b23b39d3 mypy: Fix strict-optional in analytics. 2017-05-26 15:39:39 -07:00
Aditya Bansal 3f0b22ce31 pep8: Add compliance with rule E261 to test_counts.py. 2017-05-07 23:21:50 -07:00
Rishi Gupta 61bf445da4 analytics: Restrict fill_to_time to hour boundaries in process_count_stat. 2017-04-28 16:15:07 -07:00
Rishi Gupta f595f4f7f2 analytics: Change Number of Users chart to use realm_active_humans::day.
Previously we showed the total number of users with an active account. This
changes it to show only the number of users that have logged in in the past
two weeks.
2017-04-25 18:35:13 -07:00
Rishi Gupta 5e49da9285 analytics: Only update daily stats on day boundaries.
Previously we would update FillState for daily stats on hourly boundaries as
well. This would create two extra queries on the FillState table every hour
(for each CountStat), which adds roughly 50ms of extra processing for each
CountStat each day, as well as two extra lines each hour in the analytics
log. This can be a minor annoyance when backfilling stats.
2017-04-18 11:02:51 -07:00
Rishi Gupta b335ad2794 models: Add MIN_INTERVAL_LENGTH to UserActivityInterval.
Was previously a floating magic number appearing in both
zerver/lib/actions.py and analytics/lib/counts.py.
2017-04-18 11:02:51 -07:00
hackerkid b2504084ab Replace timezone.now with timezone_now. 2017-04-16 12:28:56 -07:00
hackerkid 55c3d12078 Replace timezone.utc with timezone_utc. 2017-04-16 12:28:56 -07:00
Rishi Gupta 49bd330304 analytics: Add class DependentCountStat and stat realm_active_humans::day. 2017-04-14 11:41:07 -07:00
Rishi Gupta 62de1cf898 test_counts: Modernize TestProcessCountStat tests. 2017-04-14 11:41:07 -07:00
Rishi Gupta 1e8d2b984d counts.py: Rename DataCollector-level operations to be more generic.
We're about to use these for DependentCountStats that will run SQL queries
on the analytics tables instead of the zerver tables.
2017-04-14 11:41:07 -07:00
Rishi Gupta 6dff22cbaf counts.py: Change check for LoggingCountStat to use isinstance.
I think this is more pythonic?

We could also get rid of LoggingCountStats altogether, since it's now just a
special case of CountStat (is_logging == data_collector.pull_function is None).
But I think it's nice to keep the distinction since they behave so differently.
2017-04-14 11:41:07 -07:00
Rishi Gupta 118b44d4f0 counts.py: Change DataCollector to take a pull_function argument.
This will allow us to appropriately generalize CountStat to include
LoggingCountStat and CustomPullCountStat. It'll also make life easier when
we introduce DependentCountStat.
2017-04-14 11:41:07 -07:00
Rishi Gupta 6369d23633 counts.py: Rename ZerverCountQuery to DataCollector.
Not the final form of DataCollector, but the name change causes a big diff
so separating it out.
2017-04-14 11:41:07 -07:00
Rishi Gupta b3991e2557 counts.py: Move CountStat.group_by into ZerverCountQuery.
Part of a larger refactoring to reduce cyclic dependencies between CountStat
and DataCollector (coming soon).
2017-04-14 11:41:07 -07:00
Rishi Gupta 341e1b54fc counts.py: Remove zerver_table from ZerverCountQuery.
Was only needed for filter_args, which are now gone.
2017-04-14 11:41:07 -07:00
Rishi Gupta 661de6bf25 counts.py: Remove filter_args argument from CountStat definition.
It turned out to not be that useful once we added subgroup. The previous
design of the CountStat object also assumed more reuseability of the *_query
strings than what ended up happening.

The filter_args also had some carrying costs:

* It's hard to be confident that filter_args other than the ones explicitly
  in our tests would have had expected behavior.
* The filter_args/join_args system is the most complex part of the CountStat
  object, and makes understanding the *_query strings unnecessarily
  difficult for a new contributor.
2017-04-14 11:41:07 -07:00
Rishi Gupta 6bb97db136 analytics: Add active_users_audit:is_bot:day. 2017-04-14 11:41:07 -07:00
Rishi Gupta 3d514c3e8d analytics: Add a default for the value column in assertTableState.
A default value of 1 is reasonable in this framework, especially for testing
things like LoggingCountStats.
2017-04-14 11:41:07 -07:00
Rishi Gupta 2f74ccabf9 analytics: Add 15day_actives CountStat. 2017-04-14 11:41:07 -07:00
Rishi Gupta 9b661ca91f analytics: Replace CountStat.is_gauge with interval.
Groundwork for allowing stats like "Monthly Active Users".

CountStat.interval is no longer as clean a value as before, so removed it
from views.get_chart_data. It wasn't being used by the frontend anyway.

Removing interval from logger calls in counts.py is not a big loss since we
now include the frequency (which is typically also the interval) in
CountStat.property.
2017-04-14 11:41:07 -07:00
Rishi Gupta d6c5c672d3 analytics: Add minutes_active CountStat. 2017-04-14 11:41:07 -07:00
Rishi Gupta 6e425814bf analytics: Add a few tests for fixtures.py.
The code in fixtures.py is only called from populate_analytics_db, and is
only used for generating pretty fixture data for manual testing. This commit
adds tests for a few things that were easy to add tests for, and provides
some minimal coverage of the file, but is not meant to be comprehensive.
2017-04-13 12:37:47 -07:00
Rishi Gupta f3fc9721f4 analytics: Match client names in populate_analytics_db to populate_db.
Originally, all the client names in populate_analytics_db started with
underscores to make it easy to selectively delete and regenerate them when
re-running populate_analytics_db.

We eventually want to merge populate_analytics_db into populate_db though,
in which case it makes more sense for them to share client names, and not
worry about the case where we run (or re-run) populate_analytics_db
independently of populate_db.
2017-04-12 11:45:15 -07:00
Rishi Gupta 30024d0a8f models: Remove Realm.domain. 2017-03-25 19:55:48 -07:00
Rishi Gupta 9f60dd8387 analytics: Send zeros for data.user.bot in Messages Sent Over Time.
It will simplify the logic needed to process the "Sent by Me" view in
Messages Sent Over Time in stats.js.

Also, we gzip the data sent from our server, so there is little additional
network usage by doing this.
2017-03-25 14:18:23 -07:00
Tim Abbott a474f4359d tests: Set maxDiff to None unconditionally. 2017-03-21 07:34:16 -07:00
Tim Abbott 8041ebf579 mypy: Annotate maxDiff variable. 2017-03-21 07:31:37 -07:00
Tim Abbott 20a7609018 analytics: Rename message count types to use standard Zulip casing. 2017-03-21 00:09:54 -07:00
hollywoodno dd067c761a analytics: Separate private messages from group private messages.
This makes it possible for our graphs to show the group private
message counts as separate from 1:1 private messages.

Fixes #4102.
2017-03-20 11:46:29 -07:00
Rishi Gupta ceac6d9c59 analytics: Remove stray comment from test_counts.py.
The "actual test that would be nice to do" was indeed done!
2017-03-17 21:58:51 -07:00
Rishi Gupta 7c6f0033ed analytics: Add test for do_drop_all_analytics_tables. 2017-03-14 16:59:54 -07:00
Rishi Gupta 35f854a2fd analytics: Add test for do_aggregate_to_summary_table. 2017-03-04 16:46:09 -08:00
Rishi Gupta 8feea6c598 analytics: Add LoggingCountStat for number of users. 2017-03-04 16:46:09 -08:00
Raghav Jajodia a3a03bd6a5 mypy: Added Dict, List and Set imports.
Fixed mypy errors associated with the upgrade.
2017-03-04 14:33:44 -08:00
Rishi Gupta 8bea47d6b5 analytics: Do a stylistic cleanup of TestProcessCountStat. 2017-03-03 16:12:12 -08:00
Rishi Gupta 20255e48a4 analytics: Change messages_sent_to_stream to a daily stat.
Analytics database tables are getting big, and so we're likely moving to a
model where ~all stats are day stats, and we keep hourly stats only for the
last N days.

Also changed the name because:
* messages_sent_* suggests the counts (summed over subgroup) should be the
  same as the other messages_sent stats, but they are different (these don't
  include PMs).
* messages_sent_by_stream:is_bot:day is longer than 32 characters, the max
  allowable length for a BaseCount.property.

Includes a database migration to remove the old stat from the analytics
tables.
2017-03-03 16:11:28 -08:00
Rishi Gupta eee5cb5197 analytics: Add tests for views code. 2017-02-11 14:51:01 -08:00
Rishi Gupta a1b1ffe1e4 analytics: Base default views end_time on FillState, not current time. 2017-02-10 14:41:07 -08:00
Tim Abbott 6c4eaf3d14 analytics: Map client names to user-facing versions.
This makes the pie charts on /stats more readable.
2017-02-05 22:19:10 -08:00
Rishi Gupta 5eb5fa3f31 analytics: Change time_range to not include current day/hour.
Current day/hour will always be 0, since we haven't computed it yet for the
CountStat tables.
2017-02-02 10:59:52 -08:00
Rishi Gupta 37bdc7c010 analytics: Remove COUNT_STATS['messages_sent:hour'].
Having both messages_sent:hour and messages_sent:is_bot:day is confusing,
since a single messages_sent:is_bot:hour would have a superset of the
information and take less total space. This commit and its parent together
replace the two stats with a single messages_sent:is_bot:hour.
2017-01-17 15:54:57 -08:00
Rishi Gupta b593ac9d7c analytics: Change messages_sent:is_bot to hourly frequency.
In preparation for replacing messages_sent.
2017-01-17 15:54:57 -08:00
Rishi Gupta 68fcb4152f analytics: Remove interval field from *Count tables.
Includes a database migration. The interval field was originally there to
facilitate time aggregation (e.g. aggregate_hour_to_day), but we now do such
aggregations in views code or in the frontend.
2017-01-17 15:54:57 -08:00
Rishi Gupta a8f2ebb443 analytics: Include interval in COUNT_STATS property names. 2017-01-17 15:54:57 -08:00
Rishi Gupta c466036c80 analytics: Remove unneeded references to interval from test_counts.py. 2017-01-17 15:54:57 -08:00
Rishi Gupta 12d277d4f4 analytics: Change messages_sent:client stat to daily frequency.
A few reasons:
* Our two other subgroup'd message stats in UserCount are at CountStat.DAY
  frequency (messages_sent:is_bot and messages_sent:message_type).
* Keeping this stat at hourly frequency would likely double the size of our
  analytics table, given the current stats. (Counterpoint: if there are
  roughly as many active streams as active users, and we keep
  messages_sent_to_stream:is_bot at hourly frequency, then maybe this stat
  is only a 30% or 50% increase).
* We're currently only showing this on the frontend as a pie chart anyway.
2017-01-17 15:54:57 -08:00
Rishi Gupta cdb1c96169 analytics tests: Refactor assertCountEquals calls to be more readable. 2017-01-17 15:54:57 -08:00
Rishi Gupta 59d50c3a47 analytics tests: Make it easy to refer to users in test realm. 2017-01-17 15:54:57 -08:00
Rishi Gupta 54e66e6079 analytics: Add remaining backend tests in TestCountStats. 2017-01-17 15:54:57 -08:00
aakash-cr7 b373f2ef0f analytics: Add backend test for messages_sent_to_stream:is_bot. 2017-01-17 15:54:57 -08:00
Amy Liu 10c0c2b16d analytics: Add backend tests for messages_sent:message_type. 2017-01-17 15:54:57 -08:00
Rishi Gupta f30b174199 analytics: Set property and interval defaults in assertCountEquals. 2017-01-17 15:54:57 -08:00
Rishi Gupta a563a15f88 analytics: Make TestCountStats tests more robust.
Adds two things to TestCountStats.setUp():
* A realm with no messages, that generally should not show up in *Count
  tables,
* Users/streams/messages created at 0, 1, 61, and 1441 (just over a day)
  minutes ago (previously was 0, 60), to better test the start_time/end_time
  in the queries, and the frequency/interval setting in the CountStats.
2017-01-17 15:54:57 -08:00
Rishi Gupta e94bc8f142 analytics tests: Autogenerate names for create* functions. 2017-01-17 15:54:57 -08:00
Amy Liu f7ce76fb63 analytics: Add create_stream_with_recipient and create_huddle_with_recipient.
This commit replaces AnalyticsTestCase.create_stream with create_stream_with_recipient and adds the method create_huddle_with_recipient.
2017-01-17 15:54:57 -08:00
Rishi Gupta 552d626ef2 analytics: Fix FillState.last_modified not being updated.
We were updating FillState with FillState.objects.filter(..).update(..),
which does not update the last_modified field (which has auto_now=True).
The correct incantation is the save() method of the actual FillState
object.
2017-01-08 23:36:34 -08:00
Rishi Gupta 73dc904e9c analytics: Move time_range from views.py to lib/time_utils.py 2017-01-08 17:24:51 -08:00
Rishi Gupta 9e5325a164 Add /stats page with basic stats graph.
Adds a new url route and a new json endpoint.
2016-12-29 14:20:13 -08:00
Rishi Gupta c7c0e36508 analytics: Add InstallationCount checks to prototype TestCountStat.
Was enabled by commit 41e8ee3 where we moved TIME_ZERO to before the realms
created by populate_db.py.

Also removes the stub for TestAggregates, since the remaining thing to be
tested was the aggregation from RealmCount to InstallationCount, and the end
to end checks provided by the TestCountStat tests should be sufficient.
2016-12-20 12:03:23 -08:00
Rishi Gupta dbc94d0fc0 analytics: Remove test for no longer supported behavior.
In a previous design, there was no FillState table, and one could run any
CountStat at any time. This is no longer supported.

This test was making sure that if one ran a CountStat at a certain hour, and
then ran it at a previous hour, the old rows would still be there.
2016-12-20 12:03:23 -08:00
Rishi Gupta e09aaf1020 analytics: Remove tests that will be subsumed by TestCountStats. 2016-12-20 12:03:23 -08:00
Rishi Gupta 6748b72ccc analytics: Remove tests now covered by test_active_users_by_is_bot. 2016-12-20 12:03:23 -08:00
Rishi Gupta 2211b8b102 analytics: Change count_message_by_stream to join on UserProfile.
It seems unlikely we will need count_message_by_stream without the
UserProfile table in the future, so write count_message_by_stream_and_is_bot
in the usual query form and replace count_message_by_stream with it.
This also has the benefit of shortening our list of "special case" queries
from two to one.

The pathways of the removed test will be covered more thoroughly in the new
TestCountStats tests.
2016-12-20 12:03:23 -08:00
Rishi Gupta 6992f9784c analytics: Update TestCountStat prototype. 2016-12-20 12:03:23 -08:00
Rishi Gupta c6a6c871ee analytics: Change TIME_ZERO in tests to be in the past. 2016-12-20 12:03:23 -08:00
Rishi Gupta f34af0896d analytics: Add subgroup argument to assertCountEquals. 2016-12-20 12:03:23 -08:00
Rishi Gupta 31cf8db28c analytics: Allow assertCountEquals to work on InstallationCount. 2016-12-20 12:03:23 -08:00
anirudhjain75 beaa62cafa mypy: Convert several directories to use typing.Text.
Specifically, these directories are converted: [analytics/, scripts/,
tools/, zerver/management/, zilencer/, zproject/]
2016-12-07 20:51:05 -08:00
bulat22101 adebc75740 pep8: Fix E502 violations 2016-12-03 10:56:36 -08:00
Umair Khan 7d51efe9a1 Django 1.10: Fix dummy data for count stat.
Django 1.10 checks the foreign key constraints as part of the testing
suite so we need to create test data which passes validation tests.
2016-11-14 16:09:12 -08:00
umkay 5490442580 analytics: Replace all joins in raw SQL with natural joins.
We alter the behavior of our queries to no longer write rows with 0 counts
to the db, and pad with 0s in the related views code. As a result we are
also able to combine the where and join clause conditions in the sql
queries. This new behavior is also updated in our tests.
2016-11-03 16:50:39 -07:00
Rishi Gupta db0e509422 do_create_realm: Replace domain argument with string_id.
Turns string_id into a required argument, and domain into an optional
argument.
2016-11-02 22:46:34 -07:00
Rishi Gupta 9ef8536cc6 models.Realm: Require Realm.string_id to be non-NULL.
Adds a database migration, adds a new string_id argument to the management
realm creation command, and adds a short name field to the web realm
creation form when REALMS_HAVE_SUBDOMAINS is False.
2016-11-02 22:46:34 -07:00
umkay 610e92b94e analytics: Add subgroup column to analytics tables.
This is a major change to the analytics schema, and is the first step in a
number of refactorings and performance improvements. For instance, it allows

* Grouping sets of similar CountStats in the *Count tables. For instance,
  active{_humans,_bots} will now have the same property, but have different
  subgroup values.

* Combining queries that differ only in their value on 1 filter clause, so
  that we make fewer passes through the zerver tables. For instance, instead
  of running a query for each of messages_sent_to_public_streams and
  messages_sent_to_private_streams, we can now run a single query with a
  group by on Stream.invite_only, and store the group by value in the
  subgroup column.
2016-10-27 16:33:58 -07:00
Rishi Gupta 82b814a1cd analytics: Simplify frequency and measurement interval options.
Change the CountStat object to take an is_gauge variable instead of a
smallest_interval variable. Previously, (smallest_interval, frequency)
could be any of (hour, hour), (hour, day), (hour, gauge), (day, hour),
(day, day), or (day, gauge).
The current change is equivalent to excluding (hour, day) and (day, hour)
from the list above.

This change, along with other recent changes, allows us to simplify how we
handle time intervals. This commit also removes the TimeInterval object.
2016-10-14 10:18:37 -07:00
Rishi Gupta 807520411b analytics: Simplify logic in do_fill_count_stat_at_hour.
Adding FillState, removing do_aggregate_hour_to_day, and disallowing unused
(interval, frequency) pairs removes the need for the nested for loops in
do_fill_count_stat_at_hour. This commit replaces that control flow with a
simpler equivalent.
2016-10-14 10:18:37 -07:00
Rishi Gupta 655ee51e35 analytics: Add table to keep track of fill state.
Adds two simplifying assumptions to how we process analytics stats:
* Sets the atomic unit of work to: a stat processed at an hour boundary.
* For any given stat, only allows these atomic units of work to be processed
  in chronological order.

Adds a table FillState that, for each stat, keeps track of the last unit of
work that was processed.
2016-10-14 10:18:37 -07:00
umkay 721529b782 analytics: Remove HuddleCount for now.
Planned changes to the underlying analytics model will require potentially
complicated changes to huddle queries.
2016-10-14 10:18:37 -07:00
umkay 7e2340155d analytics: Fix aggregation to RealmCount for realms with no users.
Previously, if a Realm had no users (or no streams),
do_aggregate_to_summary_table would fail to add a row with value 0. This
commit fixes the issue and also simplifies the do_aggregate_to_summary_table
logic.
2016-10-11 18:20:58 -07:00
Rishi Gupta 52b56cca65 analytics: Reorder arguments to assertCountEquals.
Require a table argument and change argument order around for clarity.
2016-10-11 18:20:58 -07:00
Rishi Gupta c6b611c8b9 analytics: Re-organize tests into higher level TestClasses.
Refactor the current analytics tests into the following classes:
* TestUpdateAnalyticsCounts, which will eventually test the management
  command, backfilling, what happens when new tests are added, etc.
* TestProcessCountStat, which tests the ins and outs of propagating the
  value of a single stat up through the various *Count tables.
* TestAggregates, which tests the do_aggregate_* methods.
* TestXByYQueries, which tests the count_X_by_Y_query SQL snippets.
* TestCountStats, which has tests for individual CountStats.

This commit does not change the name or contents of any individual test.
2016-10-09 16:09:04 -07:00
Rishi Gupta 795d10b9ad analytics: Refactor tests to simplify asserts of count values.
Many tests are structured to run some process, and then check a count in a
BaseCount record using default values for realm, property, interval, and
end_time. This commit adds a new assertCountEquals method to
AnalyticsTestCase, and simplifies other assert calls as appropriate.
2016-10-09 16:09:04 -07:00
Rishi Gupta 35cf9092d5 analytics: Canonicalize creation of model objects in tests.
Add a default_realm object to AnalyticsTestCase, created 2 days before
AnalyticsTestCase.TIME_ZERO.

Add lightweight create_user, create_stream, and create_message methods to
AnalyticsTestCase, with sensible defaults. In particular, all objects are by
default created at AnalyticsTestCase.TIME_LAST_HOUR, so that they are
included when running AnalyticsTestCase.process_last_hour.
2016-10-09 16:09:04 -07:00
Rishi Gupta 6996b21480 analytics: Change tests to use a fixed TIME_ZERO.
Previously, analytics tests used timezone.now or custom datetime objects
when creating new realms, users, and streams.

This commit adds a fixed TIME_ZERO and a process_last_hour helper function
in a new AnalyticsTestCase class, and modifies the existing tests to use
them.
2016-10-09 16:09:04 -07:00
umkay d260a22637 Add a new statistics/analytics framework.
This is a first pass at building a framework for collecting various
stats about realms, users, streams, etc. Includes:
* New analytics tables for storing counts data
* Raw SQL queries for pulling data from zerver/models.py tables
* Aggregation functions for aggregating hourly stats into daily stats, and
  aggregating user/stream level stats into realm level stats
* A management command for pulling the data

Note that counts.py was added to the linter exclude list due to errors
around %%s.
2016-10-04 17:18:54 -07:00