This commit introduces the ability to do custom fetches
and to essentially use temp tables for intermediate results.
(The temp table stuff deals with recipients/subscriptions
having three different flavors--user, stream, and huddle.)
Most directly useful for the migration to zulipchat.com.
Creates a new field in UserProfile to store the tos_version, as well as two
new settings TOS_VERSION and FIRST_TIME_TOS_TEMPLATE. We check for a version
mismatch between what the user has signed and the current
settings.TOS_VERSION whenever the user hits the home page, and redirect them
if needed.
Note that accounts_accept_terms.html and
zerver.views.accounts_accept_terms were unused before this commit
(they date from c327446537)
The function to create the message partial files has been
renamed to export_partial_message_files(). It now gets its own
list of user profile ids and recipient ids from the response,
so that we can de-clutter do_export_realm().
The name avatar_bucket was confusing for a boolean, and
in some places it was used for non-S3 paths.
I considered the more concise 'is_avatar', but that
was still confusing when you are processing multiple
files, because you think it's a calculated property
on one file instead of an overall codepath switch.
I also considered splitting up some functions, but
there is a lot of common logic between handling
file uploads and avatars that's not trivial to extract
into helpers, especially on the S3 side.
I did some minor moving around of code that made us have
one fewer function without any additional conditional
logic. The names are more explicit about saying
"from_local" and "from_s3". Also, there is less clutter
now in do_export_realm(), which is evolving into more of
a dispatcher and less of a worker.
This is pretty minor cleanup, but it makes it a little more
explicit what we're writing to the shard file, and it allows
us to use a more specific mypy type when calling
floatify_datetime_fields.
We no longer have an in-process code path to export
UserMessage rows. We want to only maintain the
subprocess code, which we'll always use in production,
and which will work fine in dev.
Adds a new field default language in the zerver_realm model.
This realm level default language will be used as default language
for newly created users. Realm level default language can be
changed from the administration page.
Fixes#1372.
I also fixed some small things like removing unnecessary return
statements, and adding a TODO.
In some cases I explicitly cast stuff at run-time to set() or
str() to appease mypy, as well as make it clear to somebody
reading the code that the callee might not respect ordering
or tolerate unicode.
The previous export tool would only work properly for small realms,
and was missing a number of important features:
* Export of avatars and uploads from S3
* Export of presence data, activity data, etc.
* Faithful export/import of timestamps
* Parallel export of messages
* Not OOM killing for large realms
The new tool runs as a pair of documented management commands, and
solves all of those problems.
Also we add a new management command for exporting the data of an
individual user.
Define Integration and WebhookIntegration classes.
Change webhook part of integration's guide.
Replace hardcoded webhook urls to generating
based on WEBHOOKS list.
The old behavior was to raise an exception, but Django was catching
the exception and doing unexpected things. For instance, in the
manage.py shell, printing out a ModelReprMixin object (with
__unicode__ not implemented) would result in nothing being printed,
rather than it raising a error or otherwise alerting the programmer as
to what was going on.
This fixes a regression where missed message emails would not be sent
at all in the event that EMAIL_GATEWAY_PATTERN was unset.
The overall experience still isn't great, but it's better than crashing.
Fixes: #1411
[commit message expanded by tabbott]
This will lead to minor differences in the warnings that
people see when they run tests that are slow. We call out
the slowness a little more clearly from a visual standpoint,
and we simplify the calculation of the slowness threshold.
We still allow more time for tests with the `@slow` decorator
to run, but we don't use their expected_run_time.
Our flush functions update user profile cache entries which can cause
confusing race conditions (see e.g. #1257). To resolve this, we move
all the user_profile flush functions to delete the entry instead of
updating it -- it will then be fetched as part of the next request
that needs to access the user object.
There are still races here, and there is perhaps an argument that a
better fix for this would be to re-fetch the object and then put it
into the cache, but this resolves the main cache correctness problem
we had with the previous implementation.
Fixes: #1322.
This makes us more consistent, since we have other wrappers
like client_patch, client_put, and client_delete.
Wrapping also will facilitate instrumentation of our posting code.
This function is only called in cases where user_profile isn't None,
and the code reads better if we just check that first rather than
checking it on every line that accesses user_profile.
This allows the frontend to fetch data on the subscribers list (etc.)
for streams where the user has never been subscribed, making it
possible to implement UI showing details like subscribe counts on the
subscriptions page.
This is likely a performance regression for very large teams with
large numbers of streams; we'll want to do some testing to determine
the impact (and thus whether we should make this feature only fully
enabled for larger realms).
There were a bunch of authorization and well-formedness checks in
zerver.lib.actions.do_update_message that I moved to
zerver.views.messages.update_message_backend.
Reason: by convention, functions in actions.py complete their actions;
error checking should be done outside the file when possible.
Fixes: #1150.
This is controlled through the admin tab and a new field in the Realms table.
Notes:
* The admin tab setting takes a value in minutes, whereas the backend stores it
in seconds.
* This setting is unused when allow_message_editing is false.
* There is some generosity in how the limit is enforced. For instance, if the
user sees the hovering edit button, we ensure they have at least 5 seconds to
click it, and if the user gets to the message edit form, we ensure they have
at least 10 seconds to make the edit, by relaxing the limit.
* This commit also includes a countdown timer in the message edit form.
Resolves#903.
Correctly encode and decode strings in convert_html_to_markdown.
It wasn't possible to use universal_newlines=True since
Popen.communicate doesn't encode/decode strings correctly on
python 2.
Add methods assert_equals_response and assert_in_response to
AuthedTestCase. These methods make it convenient to check if
a string equals the contents of an HttpResponse's body or if a
string is a substring of the contents of an HttpResponse's body.
response.content is binary data, but code usually assumes it to
be text. Fix this by decoding response.content where required.
Don't do this in tests yet.
This is controlled through the admin tab and a new field in the Realms
table. This mirrors the behavior of the old hardcoded setting
feature_flags.disable_message_editing. Partially resolves#903.
This reverts commit f1f48f305e.
The use of sklearn unfortunately caused a substantial slowdown to the
Zulip provisioning process, which didn't seem worth it for a
relatively minor feature.
Originally this cache was used to transmit data from Django to Tornado
(and also for general message caching purposes), but now nothing
actually reads from this cache, so we can eliminate it.
Apparently, the message cache we were filling was completely useless
and unused, and furthermore, the cache we were filling as part of
restarting the server was also totally useless, since it didn't have
the messages users would be requesting.
The functions truncate_content, truncate_body and truncate_topic
are only meant to be used on text strings. So change its
parameter types from AnyStr to text_type.
Many stubs in xml.etree.ElementTree use Union[str, bytes] as
return type. Mypy wants us to correctly handle each case. This
is correct, but not useful for us since we know that we'll always
get str. So force the return value to text_type, to supress mypy
errors.
Also encode/decode strings appropriately when using api_keys to generate
basic auth header.
Also fix clashing annotations in zerver/tests/test_external.py.
* Add Optional where required.
* Set type of req_redis_key as `(text_type) -> text_type` for consistency.
Almost all our cache keys and redis keys use this signature.
For a long time, rest_dispatch has had this hack where we have to
create a copy of it in each views file using it, in order to directly
access the globals list in that file. This removes that hack, instead
making rest_dispatch just use Django's import_string to access the
target method to use.
[tweaked and reorganized from acrefoot's original branch in various
ways by tabbott]
If a user's session cookie expired, the next REST API request their
browser did would go into the json_unauthorized code path. This
returned a response with a WWW-Authenticate tag for HTTP Basic Auth
(since that's what the REST API uses), even for /json requests which
should only be authenticated using session auth.
We fix this by explicitly passing the desired WWW-Authenticate state.
Fixes: #800.
After annotating rate_limiter.rules, mypy complained that rules does
not support cmp. So use key to customize sort instead of cmp.
Python docs also recommend using key over cmp.
zerver.lib.initial_password.initial_password is supposed to return an
Optional[text_type], but it returns an Optional[binary_type] instead.
Encode the return value to make sure it returns an Optional[text_type].
Add a class 'BaseHandler' and make it a base class of OuterHandler,
QuoteHandler and CodeHandler. This will help annotate some functions
and improve type checking.
get_display_recipient's annotation clashes with other wrong annotations.
Fix those wrong annotations.
Since get_display_recipient returns a Union, use isinstance checks and
casts to make mypy checks succeed.
generate_random_token used to return a value of type six.binary_type
and its return type was annotated as `str`. This commit fixes that
by making it return a value of type `six.text_type` and updating
the annotation accordingly.
Also fix clashing annnotations.
Change choices of UserProfile.avatar_sources and UserProfile.tutorial_status
from str literals to unicode literals. This is done because these fields
are CharFields, which are of type `six.text_type`. So the set of values
which they can take should also be of the type `six.text_type`.
Also fix clashing annotations.
Change `str` to `text_type` in annotations in zerver/models.py
related to realm emoji and realm filters.
Also fix clashing annotations in zerver/lib/bugdown/__init__.py.
Due to a cyclic dependency issue, functions having models as parameters
were annotated as Any.
That issue is fixed by importing models inside an `if False:` block,
so that mypy sees them but they are not imported at runtime.
In update_user_profile_caches, the return type in annotation was
marked as Any. Change that to None because, nothing is being returned
in that function.
This changes the type annotations for the cache keys in Zulip to be
consistently text_type, and updates the annotations for values that
are used as cache keys across the codebase.
str_utils.py has functions for converting strings from one type to
another. It also has a TypeVar called NonBinaryStr, which is like AnyStr
except that it doesn't allow bytes.
Previously, uploaded files were served:
* With S3UploadBackend, via get_uploaded_file (redirects to S3)
* With LocalUploadBackend in production, via nginx directly
* With LocalUploadBackend in development, via Django's static file server
This changes that last case to use get_uploaded_file in development,
which is a key step towards being able to do proper access control
authorization.
Does not affect production.
This has no functional changes; we just replace the old hacky
assignment of functions with assignment of the upload backend to a
variable.
I'm not totally happy with this, because we end up having to copy the
type annotations of the three methods 4 times each, but this should
make it a lot easier to test the (non-default-in-tests) S3 backend
using end-to-end tests, which would have caught
13bac1cc2a.
I expect we'll iterate on the interface over time; ideally, I'd like
all the code that checks LOCAL_UPLOADS_DIR to be inside upload.py, and
primarily in these classes.
Calling open() with mode 'w' or 'a' will create a file if it doesn't exist,
while mode 'r' will cause an exception. This can be easily tested with:
python -c 'open("test.tmp", "w")'
ls test.tmp
Also, fixed a a small type annotation in users.py because email must
be a string because emails don't support UTF-8 at this time (according
a comment in gravatar_hash in avatar.py).
Currently this uses a Union type for connection_id; we need to figure
out what actually sets that and what its type is and fix that later
(see https://github.com/zulip/zulip/issues/896).
Also, fixed up the annotations for tornadoviews to better align with
how narrows was defined as `Iterable[Sequence[str]]` rather than
`List[Tuple[str, str]]`.
Had to add some "type: ignore" because the pattern used in match
doesn't affect the type returned. A fix for this issue has been pushed
to typeshed - https://github.com/python/typeshed/pull/244
These ones don't fix any bugs, because the mutable arg is never passed
outside of the callable or mutated. But it's good practice to not use
them in case those invariants are changed in the future.
[Substantially revised by tabbott]
This probably still has some bugs in it, but having mostly complete
annotations for models.py will help a lot for the annotations folks
are adding to other files.
Add two options to the `test-backend` script:
1. verbose
If given the `test-backend` script will give detailed output.
2. no-shallow
Default value is False. If given the `test-backend` script will
fail if it finds a template which is shallow tested.
This stub file allows us to annotate view functions using the actual
types present in the bodies of the functions, rather than everything
having the type REQ.
In function bulk_add_subscriptions, some variables were named
`stream_name` but their type is Stream, not a string. Rename
those variables to `stream`.
Long ago, there was work on an experimental integration model where
every user in a realm would have administrative control over all bots,
with the goal of simplifying the process of setting up communally
administered bots for smaller teams. While that new model was never
fully implemented (and thus never setup as an option), an error in
that original implementation meant that the data on all bots in a
realm, including their API keys, was sent to the browsers of users via
the `realm_bots` variable in `page_params`. The data wasn't displayed
in the UI for non-admin users, but was available via e.g. the
javascript console.
This commit updates this behavior to only send sensitive bot data like
API keys to the owner of the bot (and realm admins).
We may in the future implement a model simplifying communally
administered integrations, but if we do that, those bots should be
limited in their capabilities (e.g. only able to send webhook
messages).
This bug has been present since Zulip was released as open source.
The old code for this lookup was unnecessarily complicated because we
were working around Guardian, where the `is_realm_admin` check was
extremely expensive.
Previously we relied on having two matching list of fields for the
get_active_user_dicts_in_realm, one in the actual code and the other
in the caching system. By unifying these lists to have a single
source, we eliminate a class of caching bugs we might otherwise
regularly introduce.
This results in a substantial performance improvement for all of
Zulip's backend templates.
Changes in templates:
- Change `block.super` to `super()`.
- Remove `load` tag because Jinja2 doesn't support it.
- Use `minified_js()|safe` instead of `{% minified_js %}`.
- Use `compressed_css()|safe` instead of `{% compressed_css %}`.
- `forloop.first` -> `loop.first`.
- Use `{{ csrf_input }}` instead of `{% csrf_token %}`.
- Use `{# ... #}` instead of `{% comment %}`.
- Use `url()` instead of `{% url %}`.
- Use `_()` instead of `{% trans %}` because in Jinja `trans` is a block tag.
- Use `{% trans %}` instead of `{% blocktrans %}`.
- Use `{% raw %}` instead of `{% verbatim %}`.
Changes in tools:
- Check for `trans` block in `check-templates` instead of `blocktrans`
Changes in backend:
- Create custom `render_to_response` function which takes `request` objects
instead of `RequestContext` object. There are two reasons to do this:
1. `RequestContext` is not compatible with Jinja2
2. `RequestContext` in `render_to_response` is deprecated.
- Add Jinja2 related support files in zproject/jinja2 directory. It
includes a custom backend and a template renderer, compressors for js
and css and Jinja2 environment handler.
- Enable `slugify` and `pluralize` filters in Jinja2 environment.
Fixes#620.
This fixes an exception where client_id was never set in an error code
path. It shouldn't be needed, but I think this makes the code clearer
and this will help in debugging the actual problem.
Related to #753.
The error message for a test file that doesn't import properly was
previously pretty difficult to understand and it wasn't clear how to
debug the issue.
This commit adds the capability to keep track and remove uploaded
files. Unclaimed attachments are files that have been uploaded to the
server but are not referred in any messages. A management command to
remove old unclaimed files after a week is also included.
Tests for getting the file referred in messages are also included.
Since we don't have a stable way to get the Dropbox preview failure
image (and it was sorta a weird setup anyway), it seems best to just
remove the condition.
Previously we needed to use a specified password when activating a
formerly mirror dummy user, in order for that user to be able to
(re)set their password and login. Now that we have our own password
reset form, this is no longer required.
As documented in https://github.com/zulip/zulip/issues/441, Guardian
has quite poor performance, and in fact almost 50% of the time spent
running the Zulip backend test suite on my laptop was inside Guardian.
As part of this migration, we also clean up the old API_SUPER_USERS
variable used to mark EMAIL_GATEWAY_BOT as an API super user; now that
permission is managed entirely via the database.
When rebasing past this commit, developers will need to do a
`manage.py migrate` in order to apply the migration changes before the
server will run again.
We can't yet remove Guardian from INSTALLED_APPS, requirements.txt,
etc. in this release, because otherwise the reverse migration won't
work.
Fixes#441.
When uploaded avatar image is not a valid image file, PIL raises
IOError. Catch the IOError raised by PIL and raise JsonableError.
This will return a response with status code 400.
Previously, the UserProfile objects were created in the order
generated by a Set, which meant tests would randomly start failing if
the code that runs before this part of populate_db changed (and thus
caused the Set object used to pass users into bulk_create_users to
have a different order when enumerated).
This fixes the issue in two ways -- one by sorting the users inside
bulk_create_users, and second by attaching subscriptions to users
based on a deterministic ordering.
The restarted Tornado processes seemed to escape the process group and
thus continue running after run-dev.py finished.
While we're at it, we don't need to dump/reload event queues in the
test suite either.
The previous version of sanitize_name dropped all unicode characters
and mangled filenames with multiple `.`s in the extension, leading to
confusing URLs for files uploaded to Zulip.
Fixes#321.
[tweaked significantly by tabbott]
It's always been the case that in production, Tornado dumps all the
event queues when shut down so that they can be reloaded by the
replacement Tornado process. This never worked in development because
the codepath for auto-reload didn't go through either a signal or
sys.exit (it re-execs the process instead).
This meant that we didn't have a mechanism for testing the event queue
dump/load functionality in the development environment. We fix this
by adding such dumping/loading. However, this breaks the automatic
reloading of open browser windows on a server restart, so we add that
back in by adjusting the special `restart` events to pass a special
`immediate` flag when used in development.
This also has the benefit of removing the "Bad event queue" errors one
would get on every file save induced restart on the Python console.
This is a no-op right now, but we'll want the new structure for the
next commit, and splitting this out makes it a lot easier to read what
is actually changed in the next commit.
This change drops the memory used for Python processes run by Zulip in
development from about 1GB to 300MB on my laptop.
On the front of safety, http://pika.readthedocs.org/en/latest/faq.html
explains "Pika does not have any notion of threading in the code. If
you want to use Pika with threading, make sure you have a Pika
connection per thread, created in that thread. It is not safe to share
one Pika connection across threads.". Since this code only connects
to rabbitmq inside the individual threads, I believe this should be
safe.
Progress towards #32.
Apparently, our event queue garbage collection logic never actually
disconnected any existing handler objects.
We fix this by disconnecting the handler inside cleanup(), adding a
special check to avoid creating a pointless timeout object.
This line appears to have been lost in rebasing from the original
implementation of 1396eb7022faec4c2d91553800a35781a96dd5bd; so the
previous fix actually only addressed the issue in a rare exception
case.
Replaced calls to ifilterfalse by list comprehensions because
ifilterfalse is not part of python 3. Also changed some lists to sets
for faster lookup.
Refer to #256.
In 2ea0daab19, handlers were moved to
being tracked via the handlers_by_id dict, but nothing cleared this
dict, resulting in every handler object being leaked. Since a Tornado
process uses a different handler object for every request, this
resulted in a significant memory leak. We fix this by clearing the
handlers_by_id dict in the two code paths that would result in a
Tornado handler being de-allocated: the exception codepath and the
handler disconnect codepath.
Fixes#463.
Add a function email_allowed_for_realm that checks whether a user with
given email is allowed to join a given realm (either because the email
has the right domain, or because the realm is open), and use it
whenever deciding whether to allow adding a user to a realm.
This commit is not intended to change any behavior, except in one case
where the Zulip realm's domain was not being converted to lowercase.
While I believe this actually produced correct output since users are
always subscribed to streams within their realm, this code was
definitely wrong.
Discovered using the mypy type-checking tool.
At present, we only do a few simple checks on the client type inside
the event system, and this saves database/memcached queries.
Note that this preserves the structure of the marshalled name in
to_dict/from_dict as client_type to avoid an unnecessary migration.
Previously, client descriptors were referenced directly from the
handler object. Once we split the Tornado process into separate queue
and connection servers, these will no longer be in the same process,
so we need to reference them by ID instead.
This commit is somewhat ugly, but its purpose is to be early
preparation for splitting Tornado into a queue server and a frontend
server, and this code belongs, by and large, in the queue server
component.
Previously these were hardcoded in zproject/settings.py to be accessed
on localhost.
[Modified by Tim Abbott to adjust comments and fix configure-rabbitmq]
Previously:
* It wouldn't raise an exception if the stream didn't exist
* It didn't correctly handle being passed a stream name
that differed in case from the stream name in the database.
This removes from our cache a moderate amount of totally useless alert
word data corresponding to users who don't have any alert words.
Thanks to @dbiollo for the suggestion!
Just doing the database query is more readable, and has about the same
performance as before in the case where active user dicts for the
realm are in cache (and is substantially better in the rare case that
this isn't in the cache).
Thanks to @dbiollo for the perf investigation and suggestion!