Note that we are much more efficient about finding
active users here:
- we do one query per realm (instead of per-user)
- we pass the cutoff date to the database
- we get back just a list of distinct ids
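Roughly, the shape of that query (a sketch; field names are
illustrative, not necessarily the exact code):

    from zerver.models import UserActivity

    # One query per realm; the cutoff comparison happens in the
    # database, and we pull back only distinct user ids.
    active_user_ids = set(
        UserActivity.objects.filter(
            user_profile__realm_id=realm_id,
            last_visit__gt=cutoff,
        )
        .values_list("user_profile_id", flat=True)
        .distinct()
    )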
This function is going away completely soon. It is
querying everybody's entire UserActivity history instead
of passing the cutoff date to the database!
The query counts increase here for somewhat
contrived reasons. The tests before this
commit reflected a successful trip to the
UserProfile cache, but that's not actually
realistic in practice.
The code we deleted here was no longer
doing anything.
Maybe the code was always dead, or maybe it
was written during a time when topics_by_diversity
and topics_by_length actually had different keys.
But now it's clearly cruft.
If we have 4 or more topics, then the code above
it would already have populated the list with 4
elements, and the `if num_convos < 4` condition
would evaluate to False.
And if we had 3 or fewer topics, then we would
have already put all possible topics into our
result, and the `topics_by_diversity[num_convos:4]`
slice would be empty.
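Concretely, the dead pattern was something like this (a hypothetical
reconstruction; only the names mentioned above come from the real
code):

    # The code above this had already filled the list, e.g.:
    hot_topics = topics_by_length[:4]
    num_convos = len(hot_topics)

    # The deleted code: with >= 4 topics we never enter the branch,
    # and with <= 3 topics the slice is empty, so nothing is added.
    if num_convos < 4:
        hot_topics += topics_by_diversity[num_convos:4]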
It's possible that we should just have a simple
heuristic for topic hotness like `10*num_senders
+ messages`, so we don't have to maintain this
fiddly function, and we can just do something like
`topics_by_score[:4]`.
I now use sets for stream_ids in more of the digest
code.
As part of this I replaced exclude_subscription_modified_streams
with streams_recently_modified_for_user.
It's easier for the caller to ask its callee for the ids
to delete than it is to pass in a set/list for the callee
to mutate.
The simpler boundary between the functions makes
the tests easier to write--you can see the
`filtered_streams` logic goes away in this diff.
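The new shape is roughly (a sketch; signature and body are
illustrative):

    from typing import Set

    from zerver.models import UserProfile

    def streams_recently_modified_for_user(
        user_profile: UserProfile, stream_ids: Set[int]
    ) -> Set[int]:
        # Return the subset of stream_ids with recent subscription
        # churn for this user; there is no caller-owned set to mutate.
        ...

    # The caller just subtracts the result:
    stream_ids -= streams_recently_modified_for_user(user_profile, stream_ids)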
I also make the tests a bit more thorough by using
combinations of Cordelia/Othello and Verona/Denmark
to try to find multiple possible flaws.
And I make the time intervals longer than 1s to
avoid false negatives from slow CI boxes.
If we have multiple users, this reduces the number
of queries we need to do, because we get all
subscriptions for all users in a single query
to Subscription.
For the single-user case, we are introducing an
extra query hop, but the database is doing
roughly the same work, because we are just breaking
up this complex query into two hops:
messages =
    select ... from message
    where recipient__type_id in (
        select stream_id from subscription
        where ...
    )

Now it's more like:

stream_ids =
    select stream_id from subscription
    where ...

messages =
    select ... from message
    where recipient__type_id in stream_ids
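In Django ORM terms, the two hops look roughly like this (field names
follow the models, but treat this as a sketch eliding most of the
"where ..." conditions):

    from zerver.models import Message, Subscription

    stream_ids = set(
        Subscription.objects.filter(
            user_profile_id__in=user_ids,
            active=True,
        ).values_list("recipient__type_id", flat=True)
    )
    messages = Message.objects.filter(recipient__type_id__in=stream_ids)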
Note that we are not changing anything semantically
or algorithmically yet. The only overhead here
for the single-user case is boxing and unboxing
data into single-item dicts and lists.
The interfaces for callers in the view and the
queue processor remain the same for now.
This extraction will make a bit more sense when
we start doing bulk operations on a realm to
get digests, but even now, it encapsulates the
slightly complex way we cherry-pick the top 4
topics for a user.
This prep step is mostly for diff hygiene; the next
commit will make the code a bit nicer.
The original code here had the nice property that
most (but not all) of the DB work happened up
front in `handle_digest_email`, and none of the
DB work was delegated to the callers. But I
prefer the tradeoff of making the helpers a bit
more cohesive--let them get the data they need.
And we have query-count coverage in our tests,
so there's no real danger of having helpers
down in the stack insidiously doing a bunch of
extra DB hops.
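That coverage looks something like this (the count is made up for
illustration; Zulip's test suite has its own query-capturing helper,
but Django's assertNumQueries shows the idea):

    with self.assertNumQueries(12):
        handle_digest_email(user_profile.id, cutoff)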
This reverts commit c3779338c6 (part
of #14638), which incorrectly depended on commits from the future,
with the effect of either halting the flow of entropic time in an
irresolvable temporal paradox, summoning extradimensional beings to
rain destruction on the galaxy, or failing CI.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Fixes #2665.
Regenerated by tabbott with `lint --fix` after a rebase and change in
parameters.
Note from tabbott: In a few cases, this converts technical debt in the
form of unsorted imports into different technical debt in the form of
our largest files having very long, ugly import sequences at the
start. I expect this change will increase pressure for us to split
those files, which isn't a bad thing.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Automatically generated by the following script, based on the output
of lint with flake8-comma:
import re
import sys

last_filename = None
last_row = None
lines = []
for msg in sys.stdin:
    m = re.match(
        r"\x1b\[35mflake8 \|\x1b\[0m \x1b\[1;31m(.+):(\d+):(\d+): (\w+)", msg
    )
    if m:
        filename, row_str, col_str, err = m.groups()
        row, col = int(row_str), int(col_str)
        if filename == last_filename:
            assert last_row != row
        else:
            if last_filename is not None:
                with open(last_filename, "w") as f:
                    f.writelines(lines)
            with open(filename) as f:
                lines = f.readlines()
            last_filename = filename
        last_row = row
        line = lines[row - 1]
        if err in ["C812", "C815"]:
            lines[row - 1] = line[: col - 1] + "," + line[col - 1 :]
        elif err in ["C819"]:
            assert line[col - 2] == ","
            lines[row - 1] = line[: col - 2] + line[col - 1 :].lstrip(" ")

if last_filename is not None:
    with open(last_filename, "w") as f:
        f.writelines(lines)
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
datetime.timezone is available in Python ≥ 3.2. This also lets us
remove a pytz dependency from the PostgreSQL scripts.
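The replacement is a one-liner, e.g.:

    from datetime import datetime, timezone

    # An aware UTC "now", with no pytz import needed:
    now = datetime.now(timezone.utc)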
Signed-off-by: Anders Kaseorg <anders@zulip.com>
We've had a bug for a while that if any ScheduledEmail objects get
created with the wrong email sender address, even after the sysadmin
corrects the problem, they'll still get errors because of the objects
stored with the wrong format.
We solve this by storing FromAddress placeholder strings in the
send_future_email function, so that ScheduledEmail objects only
resolve the final `from_address` value when the mail is actually
sent, using the setting in effect at that time.
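A sketch of the placeholder idea (constant and helper names here are
illustrative):

    from django.conf import settings

    class FromAddress:
        # A stable token stored on ScheduledEmail rows:
        NOREPLY_PLACEHOLDER = "__noreply_placeholder__"

    def resolve_from_address(stored: str) -> str:
        # Runs when the mail is actually sent, so a corrected setting
        # takes effect even for already-queued emails.
        if stored == FromAddress.NOREPLY_PLACEHOLDER:
            return settings.NOREPLY_EMAIL_ADDRESS
        return stored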
Fixes #11008.
Fixes #1727.
With the server down, apply migrations 0245 and 0246. 0246 will remove
the pub_date column, so it's essential that the previous migration,
which copies the data, ran correctly before running 0246.
This renames the Subscription.in_home_view field to is_muted, for
greater clarity as to what it does just from seeing the setting name,
without having to look it up.
Also disabled an obsolete test_migrations test.
Fixes #10042.
Digest emails were disabled for soft-deactivated users, since
UserMessage objects are created for such users lazily, when they
return.
We now compute the message list for gathering hot conversations by
looking at all the messages sent to the streams the user is
subscribed to, restricted to the intervals while they were subscribed.
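A sketch of the gathering step (models are Zulip's; the query is
illustrative, assumes subscribed_stream_ids and cutoff were computed
earlier, and elides the per-interval subscription windows):

    from zerver.models import Message, Recipient

    # Note: pub_date was later renamed to date_sent (see the
    # 0245/0246 migration notes above).
    messages = Message.objects.filter(
        recipient__type=Recipient.STREAM,
        recipient__type_id__in=subscribed_stream_ids,
        pub_date__gt=cutoff,
    )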
Fixes #6297.
Allow realms to specify the day of the week when the digest should be
sent out.
When enqueueing digests, pick only the realms that chose the current
weekday as the day to send out digests.
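On the enqueue side the filter is then roughly (a sketch; the field
name is illustrative, and weekday() uses Python's Monday-is-0
convention):

    from django.utils import timezone

    from zerver.models import Realm

    todays_weekday = timezone.now().weekday()
    realms_to_process = Realm.objects.filter(
        digest_weekday=todays_weekday,
    )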
This adds a language parameter to send_future_email. As a result,
invitation reminder emails are properly internationalized, by passing
the correct language into send_future_email.
Fixes #11240.
This adds a function that sends provided email to all administrators
of a realm, but in a single email. As a result, send_email now takes
arguments to_user_ids and to_emails instead of to_user_id and
to_email.
We adjust other APIs to match, but note that send_future_email does
not yet support the multiple recipients model for good reasons.
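For example (the template prefix and admin lookup are illustrative):

    from zerver.lib.send_email import send_email

    # Before: exactly one recipient per call.
    send_email("zerver/emails/digest", to_user_id=user_profile.id)

    # After: lists, so e.g. all administrators of a realm can be
    # reached in a single email.
    send_email(
        "zerver/emails/digest",
        to_user_ids=[user.id for user in realm.get_admin_users()],
    )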
Tweaked by tabbott to modify `manage.py deliver_email` to handle
backwards-compatibility for any ScheduledEmail objects already in the
database.
Fixes #10896.
We use the messages fetched by the query modified here a lot, so I
think it's worth taking the up-front hit of fetching bulkier objects
to avoid O(N) hops back to the database.
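A sketch of the tradeoff (the related-field names are illustrative):

    from zerver.models import Message

    # One bulkier query up front...
    messages = Message.objects.filter(id__in=message_ids).select_related(
        "sender", "recipient"
    )
    # ...so that message.sender and friends don't each cost another
    # round trip per message.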
This renames the Realm.show_digest_email field to
digest_emails_enabled, for greater clarity as to what it does
just from seeing the setting name, without having to look it up.
Fixes part of #10042.
Extracting this helper library will help us avoid an import loop
between notifications.py and message.py (with bugdown in between).
But in addition to that, it's a more natural model, since some of the
uses for these functions weren't part of the notifications code
anyway.
The new can_access_all_realm_members function is meant to act as a
base function for guest users and Zephyr realm users regarding the
accessibility of the information of other users in the realm.
This is a simple computed field. It's intended to more clearly
capture the meaning of this restriction for the users in zephyr mirror
realms, and eventually support guest user accounts in normal Zulip
realms.
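A sketch of the computed field (the condition is inferred from the
description above):

    # On UserProfile:
    def can_access_all_realm_members(self) -> bool:
        # Zephyr mirror realm users (and eventually guests) see only
        # a restricted subset of the realm's users.
        return not (self.realm.is_zephyr_mirror_realm or self.is_guest)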
This feature isn't really ready yet -- the relevance isn't good, so
the emails aren't a great experience. More work needed; pending that,
just don't send them.
There's already a per-realm setting, which doesn't have a control in
the org settings UI but does suppress the corresponding control in the
per-user settings UI. Piggyback on that to suppress that UI control
when the feature is disabled at the server level too.
Also cut a comment that hasn't really made sense since the logic was
changed months ago -- the comment originally explained why we sent
digests on Tuesday, Wednesday, and Thursday, and doesn't correspond to
why we dialled back to weekly on Tuesdays.
This commit prefixes stream names in urls with stream ids,
so that the urls don't break when we rename streams.
stream name: foo bar.com%
before: #narrow/stream/foo.20bar.2Ecom.25
after: #narrow/stream/20-foo-bar.2Ecom.25
For new realms, everything is simple under the new scheme, since
we just parse out the stream id every time to figure out where
to narrow.
For old realms, any old URLs will still work under the new scheme,
assuming the stream hasn't been renamed (and of course old urls
wouldn't have survived stream renaming in the first place). The one
exception is the hopefully rare case of a stream name starting with
something like "99-" and colliding with another stream whose id is 99.
The way that we encode the stream name portion of the URL is kind
of unimportant now, since we really only look at the stream id, but
we still want a safe encoding of the name that is mostly human
readable, so we now convert spaces to dashes in the stream name. Also,
we try to ensure more code on both sides (frontend and backend) calls
common functions to do the encoding.
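A sketch of the common encoding (the helper names mirror the real
ones, but treat the bodies as illustrative); it reproduces the
example above:

    from urllib.parse import quote

    def hash_util_encode(string: str) -> str:
        # Percent-encode, but swap "%" for "." so the result is safe
        # in the hash fragment ("." itself becomes ".2E").
        return quote(string, safe=b"").replace(".", "%2E").replace("%", ".")

    def encode_stream(stream_id: int, stream_name: str) -> str:
        # Spaces become dashes for readability; decoding only looks
        # at the numeric id prefix.
        return str(stream_id) + "-" + hash_util_encode(
            stream_name.replace(" ", "-")
        )

    # encode_stream(20, "foo bar.com%") == "20-foo-bar.2Ecom.25"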
Fixes #4713.
The name `create_logger` suggests something much bigger than what this
function actually does -- the logger doesn't any more or less exist
after the function is called than before. Its one real function is to
send logs to a specific file.
So, pull out that logic to an appropriately-named function just for
it. We already use `logging.getLogger` in a number of places to
simply get a logger by name, and the old `create_logger` callsites can
do the same.
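The extracted helper is tiny (a sketch; logger name and path are
illustrative, and the real helper may also set a formatter):

    import logging

    def log_to_file(logger: logging.Logger, filename: str) -> None:
        # The one real thing create_logger did: attach a file handler.
        logger.addHandler(logging.FileHandler(filename))

    # Callsites now get the logger the normal way:
    logger = logging.getLogger("zulip.digest")
    log_to_file(logger, "/var/log/zulip/digest.log")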
Because calls to `create_logger` generally run after settings are
configured, these would override what we have in `settings.LOGGING` --
which in particular defeated any attempt to set log levels in
`test_settings.py`. Move all of these settings to the same place in
`settings.py`, so they can be overridden in a uniform way.
ScheduledJob was written for much more generality than it ended up being
used for. Currently it is used by send_future_email, and nothing
else. Tailoring the model to emails in particular will make it easier to do
things like selectively clear emails when people unsubscribe from particular
email types, or seamlessly handle using the same email on multiple realms.
This commit is a step towards the goal of replacing most of the
send_future_email pathway with a call to send_email.
Note that this commit changes the default value of sender from "Zulip
<NOREPLY_EMAIL_ADDRESS>" to "NOREPLY_EMAIL_ADDRESS". NOREPLY_EMAIL_ADDRESS
will soon be changed to have the Zulip in front.
datetime.utcnow() is a timezone-naive datetime. The Django ORM interprets it
in the settings.TIME_ZONE timezone (e.g. 'America/New_York' in the
development server). We perhaps haven't noticed errors yet since with
'America/New_York' all it means is that emails are sent 5 hours early,
or a slightly different set of messages is included in the digest.
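The fix is to schedule with an aware datetime, e.g.:

    from datetime import timedelta

    from django.utils import timezone

    # timezone.now() is aware (UTC), so the ORM stores exactly this
    # instant regardless of settings.TIME_ZONE.
    send_at = timezone.now() + timedelta(days=1)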