We add a test for the case "if not all(val is not None for val in result):"
on result returned by redis_client.hmget in send_to_missed_message_address.
If the text part of an email message didn't specify the charset in the
Content-Type header, the text content wouldn't be found. We fix this, by
assuming us-ascii charset in those cases, as specified by RFC6657:
https://tools.ietf.org/html/rfc6657
With the previous commit, fixes#1836.
As specified in the issue above, we make
get_email_gateway_message_string_from_address raise an exception if
it doesn't recognise the email gateway address pattern. Then, we make
appropriate adjustments in the codepaths which call this function.
These functions don't really belong in actions.py, so we move them out,
into email_mirror_helpers.py. They can't go directly into
email_mirror.py or we'd get circular imports resulting in ImportError.
Fixes#9840.
Old addresses caused bugs in some cases with non-latin characters in
stream names (see issue number above). We switch to using django's
slugify helper function to convert stream names to full ascii, while
also getting rid of problematic non-alphanumeric characters, in a
reasonable way. See Django's documentation for slugify to see more about
how this function works.
Tests extended by tabbott to cover cases where we do end up with ascii.
To prepare for changing how the stream name gets encoded into mirror
email addresses while making sure old addresses keep working, we ignore
the stream_name part when receiving emails into the mirror and we only
look at the email_token to identify into which stream to mirror the
email.
Addresses point 2 of #10612. We use a regex to detect if a form
of FWD indicator is present at the beginning of the subject, which
means the message has been forwarded.
remove_quotations argument is added to a couple of functions where
it's necessary.
In filter_footer, the criteria for a line to be a possible beginning
of a footer is changed to line.strip() == "--", instead of
line.strip().startswith("--"), because the former would remove
quotations from plaintext emails. This change makes sense, because
RFC 3676 specifies ""-- " as the separator line between the body
and the signature of a message":
https://tools.ietf.org/html/rfc3676
We remove the 'subject' argument of process_stream_message and make
subject processing happen inside the function, as it's a more
appropriate place than the general process_message function and is
needed to have a good way of disabling removing quotations in forwarded
emails sent into the mirror.
This used to have a single function test_email_subject_stripping which
would run through a sizeable list of example subjects from subjects.json
fixture, form an email with each subject, send it to the email mirror
and check if the resulting stream message has a correctly stripped
topic. That took too much time, because we run through the entire
process_message and most_recent_message codepaths a lot of times.
We change the way of testing to:
1. Ensure process_message applies subject stripping (only need to run
process_message twice here)
2. Test the strip_from_subject function separately, on all the example
from the subjects.json fixtures. This is very fast.
Fixes part 3 of #10612. When sending an email to the email mirror to a
stream address, if "+show-sender" is added in the address, the stream
message will now include "From: <sender>" at the top.
Fixes part 1 of #10612. We use a regex to remove RE:, FWD: (and similar
variations) from email subjects. Unit test is included, we add
subjects.json in fixtures containing various subjects to try the
stripping on.
When trying to find the email gateway address, use the
`email.util.getaddresses` function to deal with cases
where multiple recipients are included in the email header
or the stream address appears as an angle-addr with a
name given (e.g. if someone added it to their address book).
Added some other headers where the required address may
appear: "Resent" headers are sometimes used for forwarding,
and streams may also be found in CC. There is no way to find
the address if the email was recieved as a BCC.
This verifies an important case. We still have an open bug for why in
some production environments, the email_gateway_bot seems to not be
tagged as an API super user (resulting in this code path not working).
The digest emails have little in common with the email mirror, beyond
that they both involve email. Give their tests their own file, with a
corresponding name, so it's easy to find this code's tests.
This commit prefixes stream names in urls with stream ids,
so that the urls don't break when we rename streams.
strean name: foo bar.com%
before: #narrow/stream/foo.20bar.2Ecom.25
after: #narrow/stream/20-foo-bar.2Ecom.25
For new realms, everything is simple under the new scheme, since
we just parse out the stream id every time to figure out where
to narrow.
For old realms, any old URLs will still work under the new scheme,
assuming the stream hasn't been renamed (and of course old urls
wouldn't have survived stream renaming in the first place). The one
exception is the hopefully rare case of a stream name starting with
something like "99-" and colliding with another stream whose id is 99.
The way that we enocde the stream name portion of the URL is kind
of unimportant now, since we really only look at the stream id, but
we still want a safe encoding of the name that is mostly human
readable, so we now convert spaces to dashes in the stream name. Also,
we try to ensure more code on both sides (frontend and backend) calls
common functions to do the encoding.
Fixes#4713
This code appears to exist to cover a few extra lines in
zerver/lib/digest.py. But it's rather brittle, tucked as it is into
the middle of a different test's loop, and with the upcoming
introduction of the `lear` realm in testing, this test code itself
loses coverage.
For now, rather than fix this test code up just delete it; we don't
have 100% coverage on `zerver/lib/digest.py`, while we do on this test
file, so that avoids breaking coverage in CI. As a followup, we
should add back some logic like this but in a more robust way,
probably as its own separate test method.
We don't have our linter checking test files due to ultra-long strings
that are often present in test output that we verify. But it's worth
at least cleaning out all the ultra-long def lines.
This enforces our use of a consistent style in how we access Python
modules; "from os.path import dirname" is a particularly popular
abbreviation inconsistent with our style, and so it deserves a lint
rule.
Commit message and error text tweaked by tabbott.
Fixes#6543.
The example_user() function is specifically designed for
AARON, hamlet, cordelia, and friends, and it allows a concise
way of using their built-in user profiles. Eventually, the
widespread use of example_user() should help us with refactorings
such as moving the tests users out of the "zulip.com" realm
and deprecating get_user_profile_by_email.
This fixes a performance problem where we were previously starting up
a full Django process (~0.7s even on a fast machine) every time a new
email came in, potentially allowing users to accidentally DoS a Zulip
server. Now, we just post over HTTPS, allowing the existing thread
pool support to do its job.
- Add script wrapper to communicate postfix pipe with django web server
over HTTP(S). It uses shared_secret authentication mode.
- Add django view to process messages from email mirror server.
- Clean management command `email-mirror`. Left just functional
for cron email processing.
- Add routes for new tornado view.
- Change pipe script in master process postfix config template
based on updated script.
- Add tests.
Tweaked by tabbott to adjust the directory and set better defaults.
Fixes#2421.
Finishes the refactoring started in c1bbd8d. The goal of the refactoring is
to change the argument to get_realm from a Realm.domain to a
Realm.string_id. The steps were
* Add a new function, get_realm_by_string_id.
* Change all calls to get_realm to use get_realm_by_string_id instead.
* Remove get_realm.
* (This commit) Rename get_realm_by_string_id to get_realm.
Part of a larger migration to remove the Realm.domain field entirely.
If you supplied an unrecognizable address to our email system,
or you had EMAIL_GATEWAY_PATTERN configured wrong,
the get_missed_message_token_from_address() used to crash
hard and cryptically with a traceback saying that you can't
call startswith() on a None object.
Now we throw a ZulipEmailForwardError exception. This will
still lead to a traceback, but it should be easier to diagnose
the problem.
In our email mirror, we have a special format for missed
message emails that uses a 32-bit randomly generated token
that we put into redis that is then prefixed with "mm" for
a total of 34 characters.
We had a bug where we would mis-classify emails like
mmcfoo@example.com as being these system-generated emails
that were part of the redis setup.
It's actually a little unclear how the bug in the library
function would have manifested from the user's point of view,
but it was definitely buggy code, and it's possibly related in
a subtle way to an error report we got from a customer where
only one of their users, who happened to have a name like
mmcfoo, was having problems with the mirror.
This makes us more consistent, since we have other wrappers
like client_patch, client_put, and client_delete.
Wrapping also will facilitate instrumentation of our posting code.
This reverts commit f1f48f305e.
The use of sklearn unfortunately caused a substantial slowdown to the
Zulip provisioning process, which didn't seem worth it for a
relatively minor feature.