This improves the approach of creating multiple parallel processes by
using subprocess.Popen() instead of run_parallel() and
subprocess.call() while exporting an organization's message
history. This prevents forking twice for individual subprocess.
While this has some performance benefit, the main reason to fix this
is that it fixes an issue with the data export web UI introduced in
run_parallel forks exited).
Fixes#12904.
process_missed_message did nothing other than calling
send_to_missed_message_address with the same arguments, so there's no
reason to have these as separate functions.
Addresses point 1 of #13533.
MissedMessageEmailAddress objects get tied to the specific that was
missed by the user. A useful benefit of that is that email message sent
to that address will handle topic changes - if the message that was
missed gets its topic changed, the email response will get posted under
the new topic, while in the old model it would get posted under the
old topic, which could potentially be confusing.
Migrating redis data to this new model is a bit tricky, so the migration
code has comments explaining some of the compromises made there, and
test_migrations.py tests handling of the various possible cases that
could arise.
Preparatory commit for making the email mirror use the database instead
of redis for missed message addresses.
This model will represent missed message email addresses, which
currently have their data stored in redis.
The redis data will be converted and migrated into these models and
the email mirror will start using them in the main commit.
For cross realm bots, explicitly set bot_owner_id
to None. This makes it clear that the cross realm
bots have no owner, whereas before it could be
misdiagnosed as the server forgetting to set the
field.
The recip.id || recip.user_id idiom has only been
needed for some old unit tests.
It was previously required as a bad workaround for the
local echo issue fixed in dd1a6a97bd
where we would get `display_recipient` values added in an invalid format.
I want to be able to easily test this without
having to simulate all the jQuery side effects.
This simply preserves the old logic, which seems
to handle one edge case without handling every
possible edge case. The edge cases aren't super
important here, though, since the only thing it affects
is bolding "Private Messages", and when to do that
is somewhat up to personal tastes.
Having said that, we could definitely improve
this code and possibly should move some of this
logic to either narrow_state.js or filter.js.
Instead of doing various ad-hoc calculations of
which PM is "active" and plumbing it through various
functions and then updating it via jQuery instead of
just the template, we now just calculate `is_active`
in `_build_private_messages_list` with a little
helper function.
This test mostly tests logic that I'm about
to remove in subsequent commits, and it's a bit
messy.
This commit removes 100% line coverage, but I
will restore that a few commits later.
In 3cfc3ca24b I removed
the feature that limited PM conversations to five or
less (including the active conversation), but I
didn't clean up this parameter. I think lint was
confused by the fact that we did mutate it.
I am wondering if this started out as an experiment
and was never fully polished before the push? Or
maybe I was just careless. Anyway, I don't
think were any symptoms here--it was just dead code
that we didn't need.
This fixes a rebase issue between the int_dict introduction and use
for people.js with the introduce of filter_values on dict.js and use
inside people.js.
Note that we haven't fully swept this for Dict,
since some dicts are keyed by strings. For
example PM counts can have a huddle like
"101,102,103" as a key.
This should be slightly more performant, and we
often call this function N times, such as when
rendering the buddy list.
There's a minor change to pm_list to avoid
an unnecessary computation on huddles that would
otherwise trigger a blueslip warning for the
huddles case.
Once we get past the special check for fake
person objects already having `pm_recipient_count`,
we can rely on the object being a `person`
object with `user_id` set.
When we are pulling data from message.display_recipient
for private messages, the user_id field is always
called 'id', not 'user_id', so we can simplify
some defensive code.
This required lots of manual testing:
- search/navigate user presence
- send PM and mention user
- pay attention to compose fade
- send stream msg and mention user
- open Private Messages in top-left and click
- test unread counts
- invite user who already has account
- search for users in search bar
- check user settings
- User Groups
- Users
- Deactivated Users
- Bots
- create a bot
- mention user groups
- send group PM then click on lower right
- view/edit/create streams
If there are still pieces of code that don't convert
ids to ints, the code should still work but report
blueslip errors.
I try to mostly convert user_ids to ints in the callers,
since often the callers are dealing with small amounts
of data, like user ids from huddles.
We only ever show 3 or 4 people in search suggestions
(possibly w/a couple variations, like pm-with/sender/etc.),
so we can try to search a smaller subset of people
before going through the entire realm.
We use message_store.user_ids() for this, since you
typically want to search messages for people that
have sent messages recently, and we already sort
based on PM conversations.
This should avoid some memory allocations.
We also use build_person_matcher to avoid
repeating the same logic over and over
again to process the query into termlets.
We also remove people.get_all_persons() and
people.person_matches_query().
This may actually be a slowdown for the worst case
scenario, but it sets us up to be able to easily
short circuit the removal of diacritic characters
for users that have pure ascii names.
For example, czo has lots of names like this:
- Tim Abbott
- Steve Howell
Since they're pure ascii, we can do a one-time
check. A subsequent commit will show how we use
this.
This looks like simple code cleanup, but it's more
than that.
The code cleanup here is that we don't have three
callbacks to get a list of typeaheads for bootstrap.
Instead, we just have one function that does all the
main work.
And then the speedup comes from the fact we no longer
need to remove diacritics from the query for every
time through our loop of seeing if a person matches
the query.
It's a bit subtle to see in the diff, but these are
the relevant lines:
const matcher = exports.get_person_or_user_group_matcher(query);
const filtered_results = _.filter(people_and_groups, matcher);
Before this, bootstrap was doing $.grep, and we'd have
to reinitialize the matcher for every person.
If you profile this before and after, you'll see that
remove_diacritics gets called fewer times.
To profile this, you want to loads lots of users into
your DB and try to autocomplete "Extra", as in "Extra1 User".
If you try to autocomplete something else, then my patch
won't really help, and `remove_diacritics` will still
show up as expensive. Because it is that expensive a function.
These had to be done in tandem, since they were
both kinda coupled to the function that is now
called query_matches_name_description.
(This commit slightly negatively impacts PM
lookups, but this is addressed in the subsequent
commit, which makes PMs much faster. The impact
is super minimal--it's just an extra function
dispatch.)
This may seem silly now, since we are returning a function
that still dispatches over all flavors of search for
every item, but subsequent commits will make it obvious
why I'm doing this.
We want to do our own matching of items, rather than
just giving a callback to bootstrap, which does $.grep
on all the items.
Doing our own matching gives us flexibility for future
improvements like custom data structures for searching
through big amounts of data. Even in the short term
we can speed up searches by pulling expensive operations
outside the grep/filter call.
This architecture has been in place for our search
bar since ~2014.
The benchmark is commented out. It takes only a few
milliseconds to run, so there may be no reason not
to always run it. It doesn't test correctness, so
it would arguably inflate line coverage, but set/get
are obviously covered elsewhere.