Commit Graph

106 Commits

Author SHA1 Message Date
arpit551 c4b5d09283 analytics: Add LoggingCount for messages read stats.
Whenever we use API queries to mark messages as read we now increment
two new LoggingCount stats, messages_read::hour and
messages_read_interactions::hour.

We add an early return in do_increment_logging_stat function if there
are no changes (increment is 0), as an optimization to avoid
unnecessary database queries.

We also log messages_read_interactions::hour Logging stat
as the number of API queries to mark messages as read.

We don't include tests for the case where do_update_pointer is called
because do_update_pointer will most likely be removed from the
codebase in the near future.
2020-06-14 21:15:27 -07:00
Anders Kaseorg 0d6c771baf python: Guard against default value mutation with read-only types.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-13 15:31:27 -07:00
Anders Kaseorg 365fe0b3d5 python: Sort imports with isort.
Fixes #2665.

Regenerated by tabbott with `lint --fix` after a rebase and change in
parameters.

Note from tabbott: In a few cases, this converts technical debt in the
form of unsorted imports into different technical debt in the form of
our largest files having very long, ugly import sequences at the
start.  I expect this change will increase pressure for us to split
those files, which isn't a bad thing.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-11 16:45:32 -07:00
Anders Kaseorg 69730a78cc python: Use trailing commas consistently.
Automatically generated by the following script, based on the output
of lint with flake8-comma:

import re
import sys

last_filename = None
last_row = None
lines = []

for msg in sys.stdin:
    m = re.match(
        r"\x1b\[35mflake8    \|\x1b\[0m \x1b\[1;31m(.+):(\d+):(\d+): (\w+)", msg
    )
    if m:
        filename, row_str, col_str, err = m.groups()
        row, col = int(row_str), int(col_str)

        if filename == last_filename:
            assert last_row != row
        else:
            if last_filename is not None:
                with open(last_filename, "w") as f:
                    f.writelines(lines)

            with open(filename) as f:
                lines = f.readlines()
            last_filename = filename
        last_row = row

        line = lines[row - 1]
        if err in ["C812", "C815"]:
            lines[row - 1] = line[: col - 1] + "," + line[col - 1 :]
        elif err in ["C819"]:
            assert line[col - 2] == ","
            lines[row - 1] = line[: col - 2] + line[col - 1 :].lstrip(" ")

if last_filename is not None:
    with open(last_filename, "w") as f:
        f.writelines(lines)

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-06-11 16:04:12 -07:00
Anders Kaseorg 67e7a3631d python: Convert percent formatting to Python 3.6 f-strings.
Generated by pyupgrade --py36-plus.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-10 15:02:09 -07:00
Anders Kaseorg 5839fdf963 analytics: Improve escaping correctness with psycopg2.sql.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-09 21:12:43 -07:00
Anders Kaseorg bdc365d0fe logging: Pass format arguments to logging.
https://docs.python.org/3/howto/logging.html#optimization

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-05-02 10:18:02 -07:00
Anders Kaseorg fead14951c python: Convert assignment type annotations to Python 3.6 style.
This commit was split by tabbott; this piece covers the vast majority
of files in Zulip, but excludes scripts/, tools/, and puppet/ to help
ensure we at least show the right error messages for Xenial systems.

We can likely further refine the remaining pieces with some testing.

Generated by com2ann, with whitespace fixes and various manual fixes
for runtime issues:

-    invoiced_through: Optional[LicenseLedger] = models.ForeignKey(
+    invoiced_through: Optional["LicenseLedger"] = models.ForeignKey(

-_apns_client: Optional[APNsClient] = None
+_apns_client: Optional["APNsClient"] = None

-    notifications_stream: Optional[Stream] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE)
-    signup_notifications_stream: Optional[Stream] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE)
+    notifications_stream: Optional["Stream"] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE)
+    signup_notifications_stream: Optional["Stream"] = models.ForeignKey('Stream', related_name='+', null=True, blank=True, on_delete=CASCADE)

-    author: Optional[UserProfile] = models.ForeignKey('UserProfile', blank=True, null=True, on_delete=CASCADE)
+    author: Optional["UserProfile"] = models.ForeignKey('UserProfile', blank=True, null=True, on_delete=CASCADE)

-    bot_owner: Optional[UserProfile] = models.ForeignKey('self', null=True, on_delete=models.SET_NULL)
+    bot_owner: Optional["UserProfile"] = models.ForeignKey('self', null=True, on_delete=models.SET_NULL)

-    default_sending_stream: Optional[Stream] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE)
-    default_events_register_stream: Optional[Stream] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE)
+    default_sending_stream: Optional["Stream"] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE)
+    default_events_register_stream: Optional["Stream"] = models.ForeignKey('zerver.Stream', null=True, related_name='+', on_delete=CASCADE)

-descriptors_by_handler_id: Dict[int, ClientDescriptor] = {}
+descriptors_by_handler_id: Dict[int, "ClientDescriptor"] = {}

-worker_classes: Dict[str, Type[QueueProcessingWorker]] = {}
-queues: Dict[str, Dict[str, Type[QueueProcessingWorker]]] = {}
+worker_classes: Dict[str, Type["QueueProcessingWorker"]] = {}
+queues: Dict[str, Dict[str, Type["QueueProcessingWorker"]]] = {}

-AUTH_LDAP_REVERSE_EMAIL_SEARCH: Optional[LDAPSearch] = None
+AUTH_LDAP_REVERSE_EMAIL_SEARCH: Optional["LDAPSearch"] = None

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-04-22 11:02:32 -07:00
arpit551 2899e44218 analytics: Added comments.
A few comments are added to explain more clearly the changes made in
b23a5431cd namely about not using realm
arguments in LoggingCount Stats and the need to pass realm argument in
pull function.

The comments were tweaked by tabbott for readability.
2020-01-28 14:57:32 -08:00
arpit551 b23a5431cd analytics: Add realm argument to analytics.
This changeset is prepartory work for doing something reasonable with
analytics data during the zulip -> zulip data import process (and
potentially e.g. slack -> Zulip as well).

To support that, we need to make it possible to do our analytics
calculations for a single realm.

We do this while maintaining backwards compatibility and avoiding
massive duplicated code by adding an optional `realm` argument to the
entrypoints to the analytics system, especially process_count_stat.

More work involving restructuring FillState will be required for this
to be actually usable for its intented purpose, but this commit is a
nice checkpoint along the way.

Tweaked by tabbott to adjust comments and disable InstallationCount
updates when a realm argument is specified.
2020-01-23 17:36:13 -08:00
Rishi Gupta 4256ee61cf billing: Change RealmAuditLog.event_type from str to int.
This is a more robust long-term model for storing these data.
2019-10-06 15:55:56 -07:00
Mateusz Mandera dbe508bb91 models: Migration of Message.pub_date to date_sent, part 2.
Fixes #1727.

With the server down, apply migrations 0245 and 0246. 0246 will remove
the pub_date column, so it's essential that the previous migrations
ran correctly to copy data before running this.
2019-10-05 19:01:34 -07:00
Anders Kaseorg f5197518a9 analytics/zilencer/zproject: Remove unused imports.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
2019-02-02 17:31:45 -08:00
Rishi Gupta 85f7ac8172 analytics: Remove Anomaly model. 2019-02-01 18:48:18 -08:00
Vishnu Ks 4d1a68430a analytics: Remove unused RealmAuditLog import. 2018-07-10 15:42:26 +05:30
Nikhil Kumar Mishra 26decb4c48 stats: Add 1day_actives::day CountStat to analytics tables. 2018-05-20 10:56:16 -07:00
Aditya Bansal 5adf983c3c analytics: Change use of typing.Text to str. 2018-05-10 14:19:49 -07:00
Greg Price b830b446f1 logging: Reduce `create_logger` to new `log_to_file`.
The name `create_logger` suggests something much bigger than what this
function actually does -- the logger doesn't any more or less exist
after the function is called than before.  Its one real function is to
send logs to a specific file.

So, pull out that logic to an appropriately-named function just for
it.  We already use `logging.getLogger` in a number of places to
simply get a logger by name, and the old `create_logger` callsites can
do the same.
2017-12-12 17:17:08 -08:00
Greg Price ebcf0b4876 logging: Stop having `create_logger` force loglevels to INFO.
This is already the loglevel we set on the root logger, so this has no
effect -- except in tests, where `test_settings.py` attempts to set
some of these same loggers to higher loglevels.  Because the
`create_logger` call generally runs after we've configured settings,
it clobbers that effect.

The code in `test_settings.py` that tries to suppress logs only works
because it also sets `propagate=False`, which has nothing to do with
loglevels but does cause logs at this logger (and descendants) to be
dropped completely unless we've configured handlers for this logger
(or one of its relevant descendants.)
2017-12-12 17:17:07 -08:00
Rishi Gupta fbd8dde1f8 invitations: Add LoggingCountStat to keep track of sent invitations. 2017-12-06 20:35:50 -08:00
rht 6c286b5eb6 analytics: Use Python 3 syntax for typing (part 2). 2017-11-22 12:16:58 -08:00
Tim Abbott 2b43a0302a python: Sort imports in smaller apps. 2017-11-15 15:55:49 -08:00
rht 51c1a6dfc9 analytics: Text-wrap long lines exceeding 110.
License: Apache-2.0
Signed-off-by: rht <rhtbot@protonmail.com>
2017-11-10 16:22:00 -08:00
rht b557b02f2f analytics/lib: Remove unused imports (F401). 2017-11-07 16:37:07 -08:00
rht 5cfffb0e51 analytics: Remove inheritance from object. 2017-11-06 08:53:48 -08:00
rht dcc831f767 refactor: Replace all __unicode__ method with __str__.
Close #6627.
2017-11-02 11:01:47 -07:00
Rishi Gupta c7bdabbda8 analytics: Disallow non-UTC fill times in process_count_stat.
No change in behavior, but we aren't supporting non-UTC times in analytics
as a whole any more, so might as well change this check as well.
2017-10-05 11:22:06 -07:00
Rishi Gupta 0596c4a810 analytics: Enforce various datetime arguments are in UTC.
Sort of a hacky hammer, but
* The original design of the analytics system mistakenly attempted to play
  nicely with non-UTC datetimes.
* Timezone errors are really hard to find and debug, and don't jump out that
  easily when reading code.

I don't know of any outstanding errors, but putting a few "assert this
timezone is in UTC" around will hopefully reduce the chance that there are
any current or future timezone errors.

Note that none of these functions are called outside of the analytics code
(and tests). This commit also doesn't change any current behavior, assuming
a database where all datetimes have been being stored in UTC.
2017-10-05 11:22:06 -07:00
Rishi Gupta 0f31cddf49 analytics: Add management command to clear single stat. 2017-10-05 11:22:06 -07:00
Aditya Bansal d9c9bfe7f6 logger: Add new create_logger abstraction to simplify logging.
This deduplicates a ton of Python logger-creation code to use a single
standard implementation, so we can avoid copy-paste problems.
2017-08-27 18:31:53 -07:00
umkay d9b23b39d3 mypy: Fix strict-optional in analytics. 2017-05-26 15:39:39 -07:00
Aditya Bansal 27b87943af pep8: Add compliance with rule E261 to counts.py. 2017-05-07 23:21:50 -07:00
Rishi Gupta 61bf445da4 analytics: Restrict fill_to_time to hour boundaries in process_count_stat. 2017-04-28 16:15:07 -07:00
Rishi Gupta 5e49da9285 analytics: Only update daily stats on day boundaries.
Previously we would update FillState for daily stats on hourly boundaries as
well. This would create two extra queries on the FillState table every hour
(for each CountStat), which adds roughly 50ms of extra processing for each
CountStat each day, as well as two extra lines each hour in the analytics
log. This can be a minor annoyance when backfilling stats.
2017-04-18 11:02:51 -07:00
Rishi Gupta c5f1398052 analytics: Add section comments in counts.count_stats_.
Also reorders the stats a bit.
2017-04-18 11:02:51 -07:00
Rishi Gupta b335ad2794 models: Add MIN_INTERVAL_LENGTH to UserActivityInterval.
Was previously a floating magic number appearing in both
zerver/lib/actions.py and analytics/lib/counts.py.
2017-04-18 11:02:51 -07:00
hackerkid 5c8f011d66 Remove unused timezone import. 2017-04-16 12:28:56 -07:00
Rishi Gupta 49bd330304 analytics: Add class DependentCountStat and stat realm_active_humans::day. 2017-04-14 11:41:07 -07:00
Rishi Gupta 1e8d2b984d counts.py: Rename DataCollector-level operations to be more generic.
We're about to use these for DependentCountStats that will run SQL queries
on the analytics tables instead of the zerver tables.
2017-04-14 11:41:07 -07:00
Rishi Gupta 47cf1d15ba counts.py: Move performance logging call out of pull_functions.
Makes it less likely someone will write a pull function in the future and
forget.
2017-04-14 11:41:07 -07:00
Rishi Gupta 6dff22cbaf counts.py: Change check for LoggingCountStat to use isinstance.
I think this is more pythonic?

We could also get rid of LoggingCountStats altogether, since it's now just a
special case of CountStat (is_logging == data_collector.pull_function is None).
But I think it's nice to keep the distinction since they behave so differently.
2017-04-14 11:41:07 -07:00
Rishi Gupta b45185562a counts.py: Fix out of date comments. 2017-04-14 11:41:07 -07:00
Rishi Gupta ac2cc9e2da counts.py: Reorganize file into logical sections.
No changes to code or behavior.
2017-04-14 11:41:07 -07:00
Rishi Gupta 50868b98a9 counts.py: Change pull_function to take a property instead of a full stat.
Removes the circular dependency of CountStat containing a DataCollector, and
DataCollector containing a function that takes a CountStat as an argument.
2017-04-14 11:41:07 -07:00
Rishi Gupta eadfc743c8 counts.py: Remove CustomPullCountStat. 2017-04-14 11:41:07 -07:00
Rishi Gupta 118b44d4f0 counts.py: Change DataCollector to take a pull_function argument.
This will allow us to appropriately generalize CountStat to include
LoggingCountStat and CustomPullCountStat. It'll also make life easier when
we introduce DependentCountStat.
2017-04-14 11:41:07 -07:00
Rishi Gupta f9e56ad25d counts.py: Move DataCollector declarations into CountStat declarations.
The previous zerver_* names were unwieldy and not very readable. This also
puts more of the useful information in one place; in particular, makes it
easier to skim a CountStat declaration and see if we're collecting it at a
user/stream granularity or a realm granularity.
2017-04-14 11:41:07 -07:00
Rishi Gupta c20e79ab1f counts.py: Rename DataCollector.analytics_table to output_table. 2017-04-14 11:41:07 -07:00
Rishi Gupta 6369d23633 counts.py: Rename ZerverCountQuery to DataCollector.
Not the final form of DataCollector, but the name change causes a big diff
so separating it out.
2017-04-14 11:41:07 -07:00
Rishi Gupta b3991e2557 counts.py: Move CountStat.group_by into ZerverCountQuery.
Part of a larger refactoring to reduce cyclic dependencies between CountStat
and DataCollector (coming soon).
2017-04-14 11:41:07 -07:00