This is a very early version of a tool to convert Hipchat
tar files into data files that can be used by the Zulip
import process.
We include the most fundamental entities--users and
streams. Customers who don't care about past messages
or customizations could start an instance off of this
and start communicating.
Of course, there are a lot of things missing in the
initial version:
* messages!
* file assets -- avatars, emojis, attachments
* probably lots of other minor things
We currently ignore any incoming dates from Hipchat data
and just use the current time. This is consistent with
other imports.
We also don't have any docs yet, although the process
will be extremely similar to the "Slack" process:
https://zulipchat.com/help/import-from-slack
Also, there's a comment at the top of convert_hipchat_data.py
that describes how to test this in dev mode.
I tested this by following the steps in the comment above.
The users just "show up" in /devlogin, so that's nice, and
you can send messages to other users. To verify the stream
data you have to go into the gear menu and click on "All
Streams", then you can subscribe and send a message.
Production users will need to get new passwords and
re-subscribe to streams. We will probably auto-subscribe
all users to public streams.
This fixes an issue where passing a path like `~/exports/foo` would
result in a `~` directory being created and the export/import not
working correctly.
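
Roughly, the fix amounts to expanding the tilde before any filesystem
calls; a minimal sketch:

    import os

    def canonicalize_path(path: str) -> str:
        # Expand `~` before any os.makedirs/tarfile calls; otherwise a
        # literal directory named `~` gets created in the CWD.
        return os.path.realpath(os.path.expanduser(path))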
This flag is used to track which user/message pairs correspond to an
active mobile push notification, that should potentially be cleared
when the user reads the message.
This flag should never appear on a message that is also marked as
read; eventually we may want a cron job to check for that condition.
We include a partial index on UserMessage for this flag.
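
For reference, a sketch of what such a partial index can look like as a
Django migration; the index name and the bit value for the flag (4096
here) are illustrative assumptions, not the exact ones used:

    from django.db import migrations

    class Migration(migrations.Migration):
        dependencies = [
            ('zerver', '0001_initial'),  # placeholder dependency
        ]

        operations = [
            migrations.RunSQL(
                sql="""
                    CREATE INDEX IF NOT EXISTS zerver_usermessage_active_mobile_push_id
                    ON zerver_usermessage (user_profile_id, message_id)
                    WHERE (flags & 4096) != 0
                """,
                reverse_sql="DROP INDEX IF EXISTS zerver_usermessage_active_mobile_push_id",
            ),
        ]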
The is_private flag is intended to be set if the recipient type is
'private' (1) or 'huddle' (3); otherwise, i.e. if it is 'stream' (2), it
should be unset.
This commit adds a database index for the is_private flag (which we'll
need in order to use it). That index is used to reset the flag if it was
already set. The already-set flags were due to a previous removal of the
is_me_message flag, for which the values were not cleared out.
For now, the is_private flag is always 0, since the really hard part of
this migration is clearing the unspecified previous state; future
commits will make it actually do something.
History: Migration rewritten significantly by tabbott to ensure it
runs in only 3 minutes on chat.zulip.org. A key detail in making that
work was to ensure that we use the new index for the queries to find
rows to update (which currently requires the `order_by` and `limit`
clauses).
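
A sketch of that batching pattern (not the literal migration); the bit
value 2048 for is_private is an assumption, and the `.extra` clause
stands in for however the real migration phrases the WHERE:

    from django.db.models import F

    IS_PRIVATE = 2048  # assumed bit value for the is_private flag

    def clear_stale_is_private_flags(UserMessage, batch_size=10000):
        while True:
            # order_by('id') plus the LIMIT from slicing is what lets
            # postgres walk the new index instead of scanning the table.
            ids = list(
                UserMessage.objects
                .extra(where=["(flags & %s) != 0"], params=[IS_PRIVATE])
                .order_by('id')
                .values_list('id', flat=True)[:batch_size]
            )
            if not ids:
                break
            UserMessage.objects.filter(id__in=ids).update(
                flags=F('flags').bitand(~IS_PRIVATE)
            )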
The only changes visible at the AST level, checked using
https://github.com/asottile/astpretty, are
zerver/lib/test_fixtures.py:
'\x1b\\[(1|0)m' ↦ '\\x1b\\[(1|0)m'
'\\[[X| ]\\] (\\d+_.+)\n' ↦ '\\[[X| ]\\] (\\d+_.+)\\n'
which is fine because re treats '\\x1b' and '\\n' the same way as
'\x1b' and '\n'.
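
A quick illustration of why the two spellings are equivalent:

    import re

    # '\x1b' is the ESC character itself; '\\x1b' is a literal backslash
    # sequence that re itself then interprets as ESC.  Both compile to
    # the same pattern.
    assert re.match('\x1b\\[1m', '\x1b[1m')
    assert re.match('\\x1b\\[1m', '\x1b[1m')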
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
Significantly tweaked by tabbott because:
* Argparse was already handling the early checks
* Splitting the bottom loop into two loops means we validate all the
input before trying to run actual import code on anything.
* The argparse documentation was confusing about whether the paths
should be files or directories.
We extract the entire operation of the management command into a
function, create_if_missing_realm_internal_bots, in
zerver/lib/onboarding.py. The logic for determining whether any realm
internal bots have not yet been created is extracted into a function,
missing_any_realm_internal_bots, in actions.py.
We add a conditional infinite sleep to this delivery job as a means of
handling the case where multiple servers in service to a realm are
running this job. In such a scenario, race conditions might arise,
leading to multiple deliveries of the same message. This way we match
the behaviour of other jobs in such a case.
Note: We should eventually do something to make such jobs work
correctly while running on multiple servers.
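
A sketch of such a guard, assuming a boolean setting that marks this
server as not the designated deliverer (the setting name is an
assumption):

    import time

    from django.conf import settings

    def run_delivery_loop(deliver_scheduled_messages):
        # If another server is responsible for this job, sleep forever
        # so supervisor sees a happily running (but inert) process.
        if settings.EMAIL_DELIVERER_DISABLED:  # assumed setting name
            while True:
                time.sleep(24 * 60 * 60)
        while True:
            deliver_scheduled_messages()
            time.sleep(10)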
This should help avoid confusing error messages for anyone
accidentally running this twice.
In particular, this also makes it easier to run Zulip inside
Kubernetes, since one doesn't need to worry about duplicate calls.
This fixes exceptions when sending PMs in development (where we were
trying to connect to the localhost push bouncer, which we weren't
authorized for, but even if we were, it wouldn't work, since there's
no APNS/GCM certs).
At the same time, we also set an order of operations that ensures one
has the opportunity to adjust the server URL before submitting
anything to us.
When you're importing with --destroy-rebuild-database, we need to
check subdomain availability after we've cleared out the database;
otherwise, trying to reuse the same subdomain doesn't work.
This logging was apparently broken when sorting imports; it's a fairly
unique thing in our codebase that this would be a problem. Prevent
future regressions by adding this exception explicitly to the isort
configuration.
Issue #2088 asked for a wrapper to be created for
`create_stream_if_needed` (called `ensure_stream`) for the 25 times that
`create_stream_if_needed` is called and ignores whether the stream was
created. This commit replaces relevant occurrences of
`create_stream_if_needed` with `ensure_stream`, including imports.
The changes weren't significant enough to add any tests or do any
additional manual testing.
The refactoring is intended to make the API easier to use in most cases.
The majority of uses of `create_stream_if_needed` ignored the second
parameter.
Fixes: #2088.
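
The wrapper's shape, sketched under the assumption that
create_stream_if_needed returns a (stream, created) tuple:

    def ensure_stream(realm, stream_name, invite_only=False):
        # Discard the `created` flag that most callers were ignoring.
        stream, _created = create_stream_if_needed(
            realm, stream_name, invite_only=invite_only)
        return stream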
We also delete a couple helper functions that were only used there.
This management command was primarily used before we had a UI for
creating outgoing webhook bots.
We no longer accept URLs while creating emoji; so this management
command was probably overlooked while migrating the realm emoji
infrastructure to the upload backend.
We could fix this to work properly today, but the command was
originally written in a context when Zulip didn't have a UI for
managing realm emoji at all. Now that we do have such a UI, it
doesn't have a compelling use case, and work on migrating the realm
emoji schema demonstrates that this does have a maintenance cost.
So, we simply remove this command.
The freshly imported data shows that users' emails are not included
in the data. However, the data received via Slack's older method
(which uses legacy tokens) does contain the users' email data.
This code duplicated the code in setup_realm_internal_bots, with some
added logic to avoid trying to create the same bot twice. That logic
was buggy so that it would never work at all -- it subtracted a set of
UserProfile objects from a set of email strings -- so it looked like
the command might blow up when run after the users already existed.
In fact, the buggy logic wasn't necessary, because the work the
command does after it is idempotent -- in particular `create_users`,
within its subroutine `bulk_create_users`, already filters out users
that already exist. So just cut the buggy stuff out, deduplicate the
rest with `setup_realm_internal_bots`, and document that invariant on
the latter.
While we're here, in the common case bail early without doing any
per-realm work in Python, since we're running this on every upgrade.
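
An illustration of the buggy logic (not the exact original code):

    from zerver.models import UserProfile

    bot_emails = {'welcome-bot@zulip.com', 'reminder-bot@zulip.com'}
    existing_profiles = set(UserProfile.objects.filter(email__in=bot_emails))

    # Bug: a UserProfile object never equals an email string, so this
    # subtraction removes nothing and every bot always looks "missing".
    bots_to_create = bot_emails - existing_profiles  # always == bot_emails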
The name `create_logger` suggests something much bigger than what this
function actually does -- the logger doesn't exist any more or less
after the function is called than it did before. Its one real function
is to send logs to a specific file.
So, pull out that logic to an appropriately-named function just for
it. We already use `logging.getLogger` in a number of places to
simply get a logger by name, and the old `create_logger` callsites can
do the same.
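
A sketch of the replacement helper's shape (the exact name and
signature may differ):

    import logging

    def log_to_file(logger: logging.Logger, filename: str,
                    log_format: str = '%(asctime)s %(levelname)-8s %(message)s') -> None:
        """Send logs from `logger` to `filename`, without touching loglevels."""
        handler = logging.FileHandler(filename)
        handler.setFormatter(logging.Formatter(log_format))
        logger.addHandler(handler)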
Because calls to `create_logger` generally run after settings are
configured, these would override what we have in `settings.LOGGING` --
which in particular defeated any attempt to set log levels in
`test_settings.py`. Move all of these settings to the same place in
`settings.py`, so they can be overridden in a uniform way.
This is already the loglevel we set on the root logger, so this has no
effect -- except in tests, where `test_settings.py` attempts to set
some of these same loggers to higher loglevels. Because the
`create_logger` call generally runs after we've configured settings,
it clobbers that effect.
The code in `test_settings.py` that tries to suppress logs only works
because it also sets `propagate=False`, which has nothing to do with
loglevels but does cause logs at this logger (and descendants) to be
dropped completely unless we've configured handlers for this logger
(or one of its relevant descendants).
These are the exceptions to the rule that our queues correspond to
queue-processor workers.
Purging `notify_tornado` in particular is a useful workaround right
now for some error spew in the dev environment.
Do you call get_recipient(Recipient.STREAM, stream_id) or
get_recipient(stream_id, Recipient.STREAM)? I could never
remember, and it was not very type safe, since both parameters
are integers.
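
A sketch of type-safe wrappers that sidestep the argument-order
question entirely:

    from zerver.models import Recipient

    def get_stream_recipient(stream_id: int) -> Recipient:
        return Recipient.objects.get(type=Recipient.STREAM, type_id=stream_id)

    def get_personal_recipient(user_profile_id: int) -> Recipient:
        return Recipient.objects.get(type=Recipient.PERSONAL, type_id=user_profile_id)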
Before this change, we populated two cache entries for each
message that we sent. The entries were largely redundant,
with the only difference being whether we sent the content
as raw markdown or as the rendered HTML.
This commit makes it so we only have one cache entry per
message, and it includes both content and rendered_content.
One legacy source of confusion here is that `content`
changes meaning when you're on the front end. Here is the
situation going forward:
database:
    content = raw
    rendered_content = rendered
cache entry:
    content = raw
    rendered_content = rendered
payload for the frontend:
    content = raw (for apply_markdown=False)
    content = rendered (for apply_markdown=True)
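
A sketch of that mapping, assuming the cache entry is a dict holding
both fields:

    def payload_content(cache_entry: dict, apply_markdown: bool) -> str:
        if apply_markdown:
            return cache_entry['rendered_content']
        return cache_entry['content']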
Now we use 'git ls-files' to get the list of locales that we actually
track. Previously we were using os.listdir to get the contents of the
static/locale directory. This could also return locales which were
present in the directory but are not supported by us, e.g. zh_CN.
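
A sketch of the new discovery step, assuming paths of the form
static/locale/<locale>/...:

    import subprocess

    output = subprocess.check_output(['git', 'ls-files', 'static/locale'],
                                     universal_newlines=True)
    tracked_locales = sorted({
        path.split('/')[2]
        for path in output.splitlines()
        if path.count('/') >= 3
    })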
We have been assigning the locale to the language code. Mostly the code
and locale are the same, but for languages like zh-Hans, the locale is
zh_Hans while the code is zh-hans.
After this commit, the compilemessages command should be run.
Previously we were using regexes to extract the language from our
locale files. Now we use the LANG_INFO data structure provided by
Django to do the same, and fall back to PO files only when the language
code is not present in the Django data structure.
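
A sketch of that lookup order, using Django's real LANG_INFO with a
hypothetical PO-file fallback helper:

    from django.conf.locale import LANG_INFO

    def get_language_name(code: str) -> str:
        info = LANG_INFO.get(code)
        if info and 'name' in info:
            return info['name']
        return name_from_po_file(code)  # hypothetical fallback helper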
This should mean that maintaining two Zulip development environments
using the same Git checkout no longer has caching problems keeping
track of the migration status.
Previously we used to mark a key as untranslated if its value in
translations.json was equal to the key itself. This had an issue: it
didn't allow otherwise valid cases where the key was legitimately equal
to its value.
This commit solves the problem by disallowing the empty string as a
valid translation and then using the empty string as the value for all
untranslated keys.
Fixes #5261
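
The resulting check is unambiguous; a sketch:

    def is_untranslated(key: str, translations: dict) -> bool:
        # An empty string now means "untranslated"; a key whose
        # translation legitimately equals the key itself is no longer
        # miscounted.
        return translations.get(key, "") == ""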
The compilemessages command now does all the heavy lifting by creating a
language_name_map.json file under the locale directory. This file is
used by get_language_list to retrieve the required information.
Fixes: #6486
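
A sketch of the consumer side, assuming a simple JSON layout (the real
file's schema may differ):

    import json
    import os

    def get_language_list(locale_path: str):
        with open(os.path.join(locale_path, 'language_name_map.json')) as f:
            # Assumed layout: {"name_map": [{"code": ..., "name": ...}, ...]}
            return json.load(f)['name_map']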
The commit() call in fix() breaks migrations and tests (unless you
mock) due to outer transactions.
We now explicitly call commit() from the management command.
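
A sketch of that ordering, with `fix` as the library function and a
hypothetical user lookup helper:

    from django.db import connection

    def handle(self, *args, **options):  # management command entry point
        user_profile = get_user_profile(options)  # hypothetical lookup helper
        fix(user_profile)
        # Commit at the command layer, so fix() itself stays usable
        # inside migrations and tests (which manage transactions).
        connection.commit()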
This commit completely switches us over to using a
dedicated model called MutedTopic to track which topics
a user has muted.
This includes the necessary migrations to create the
table and populate it from legacy data in UserProfile.
A subsequent commit will actually remove the old field
in UserProfile.
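
A sketch of the model's rough shape (the real fields may differ):

    from django.db import models

    class MutedTopic(models.Model):
        user_profile = models.ForeignKey('UserProfile', on_delete=models.CASCADE)
        stream = models.ForeignKey('Stream', on_delete=models.CASCADE)
        topic_name = models.CharField(max_length=60)

        class Meta:
            unique_together = ('user_profile', 'stream', 'topic_name')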
The double forward slash (//) after the protocol in URLs was being
mistakenly treated as the beginning of an inline JS comment, causing
internationalization strings to be cut off unexpectedly.
Now the check for inline JS comments is only run in .js files.
This code empirically doesn't work. It's not entirely clear why, even
having done quite a bit of debugging; partly because the code is quite
convoluted, and because it shows the symptoms of people making changes
over time without really understanding how it was supposed to work.
Moreover, this code targets an old version of the APNs provider API.
Apple deprecated that in 2015, in favor of a shiny new one which uses
HTTP/2 to meet the same needs for concurrency and scale that the old
one had to do a bunch of ad-hoc protocol design for.
So, rip this code out. We'll build a pathway to the new API from
scratch; it's not that complicated.
Most of the code in show_unreads is for diagnosing unread
count issues, and we may not use that often.
We're creating a dedicated fix_unreads management command with
less clutter.
This causes `upgrade-zulip-from-git`, as well as a no-option run of
`tools/build-release-tarball`, to produce a Zulip install running
Python 3, rather than Python 2. In particular this means that the
virtualenv we create, in which all application code runs, is Python 3.
One shebang line, on `zulip-ec2-configure-interfaces`, explicitly
keeps Python 2, and at least one external ops script, `wal-e`, also
still runs on Python 2. See discussion on the respective previous
commits that made those explicit. There may also be some other
third-party scripts we use, outside of this source tree and running
outside our virtualenv, that still run on Python 2.
This management command creates the same indexes as migrations
82, 83, and 95, which are all indexes on the huge UserMessage
table. (*)
This command quickly no-ops with clear messaging when the
indexes already exist, so it's idempotent in that regard. (If
somebody somehow creates an index by the same name incorrectly,
they can always drop it in dbshell and re-run this command.)
If any of the migrations have not been run, which we detect simply
by checking whether the indexes exist, then we create them using a
`CREATE INDEX CONCURRENTLY` command. This functionality in
postgres allows you to create indexes against large tables
without disrupting queries against those tables. The tradeoff
here is that creating indexes concurrently takes significantly
longer than doing them non-concurrently.
Since most tables are small, we typically just use regular
Django migrations and run them during a brief interval while
the app is down.
For indexes on big tables, we will want to run this command
as part of the upgrade process, and we will want to run
it while the app is still up, otherwise it's pointless.
All the code in create_indexes() is literally copy/pasted
from the relevant migrations, and that scheme should work
going forward. (It uses a different implementation of
create_index_if_not_exist than the migrations use, but the
code is identical lexically in the function.)
If we ever do major restructuring of our large tables, such
as UserMessage, and we end up dropping some of these indexes,
then we will need to make this command migrations-aware. For
now it's safe to assume that indexes are generally additive in
nature, and the sooner we create them during the upgrade process,
the better.
(*) UserMessage is huge for large installations, of course.
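
The idempotent helper described above might look roughly like this
(names illustrative):

    from django.db import connection

    def create_index_if_not_exist(index_name: str, table_name: str,
                                  column_string: str) -> None:
        with connection.cursor() as cursor:
            # Quickly no-op, with clear messaging, if the index exists.
            cursor.execute("SELECT 1 FROM pg_class WHERE relname = %s",
                           [index_name])
            if cursor.fetchone():
                print('Index %s already exists.' % (index_name,))
                return
            print('Creating index %s.' % (index_name,))
            # CONCURRENTLY avoids locking the table against writes, at
            # the cost of a slower build -- the tradeoff described above.
            cursor.execute('CREATE INDEX CONCURRENTLY %s ON %s (%s);'
                           % (index_name, table_name, column_string))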
This no longer does the correct thing (in terms of onboarding emails,
default streams, etc), and is tempting for new server admins to use.
Once we remove it we'll also have the invariant that we can't have a realm
without a user, which will simplify accounts_register a bit.
This now breaks the cleanup of unread counts for non-active streams
into a three-step process.
This allows us to use our unread message flags index, at least
in testing on dev. Here is the relevant excerpt from explain
analyze:
Bitmap Index Scan on zerver_usermessage_unread_message_id
This makes supervisor see the service as cheerfully running
and leave it alone, rather than constantly retrying to start it.
Because the crash/restart loop means repeatedly spending a
couple of seconds loading Django and the app, separated by
brief periods while supervisor notices the crash and acts
on it, it was actually consuming about 30-50% CPU on the
zulipchat.com staging server.
This is to be used in the case of container orchestration, instead of a
shell argument, to prevent snooping by any user account on the server
via `ps -ef`, or by any superuser with read access to the user's bash
history.
ScheduledJob was written for much more generality than it ended up being
used for. Currently it is used by send_future_email, and nothing
else. Tailoring the model to emails in particular will make it easier to do
things like selectively clear emails when people unsubscribe from particular
email types, or seamlessly handle using the same email on multiple realms.
Both the queue processor and ScheduledJob emails need to sometimes pass a
to_user_id and sometimes pass a to_email, and it's more convenient to just
have one function that they can call that can handle either.
Also removes the now redundant send_email_to_user.
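
A sketch of the unified interface; parameter and template names are
assumptions based on the description:

    from django.core.mail import EmailMultiAlternatives
    from django.template.loader import render_to_string

    from zerver.models import UserProfile

    def send_email(template_prefix, to_user_id=None, to_email=None, context=None):
        # Exactly one of to_user_id / to_email should be provided.
        assert (to_user_id is None) != (to_email is None)
        if to_user_id is not None:
            to_email = UserProfile.objects.get(id=to_user_id).email
        context = context or {}
        subject = render_to_string(template_prefix + '.subject', context).strip()
        body = render_to_string(template_prefix + '.txt', context)
        EmailMultiAlternatives(subject, body, to=[to_email]).send()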
The MitUser table was removed in df525ad.
confirm_mituser.html could have been accessed through the last few lines of
confirmation/views.py:
templates.insert(0, 'confirmation/confirm_%s.html'
% (obj._meta.model_name,))
The commit message on df525ad suggests there was another way
confirm_mituser.html could have been called, but I don't currently see
evidence for it in the code.
Usually we write translation expressions as `{{t ... }}`, but `{{ t ... }}`
is equally valid as far as Handlebars is concerned, and it matches how we
usually write simple variable substitutions, as `{{ ... }}`. So occasionally
someone writes `{{ t ... }}`; currently there are two examples of this
in the codebase, in `settings/bot-{settings,list-admin}.handlebars`.
Probably it'd be good to pick a style and enforce it uniformly, but
until we do, the other style shouldn't break translation.
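
A sketch of an extraction pattern tolerant of both spellings (the real
extractor's regex may differ):

    import re

    TRANSLATION_RE = re.compile(r'\{\{\s*t\s+"(.*?)"')

    assert TRANSLATION_RE.search('{{t "Save changes" }}')
    assert TRANSLATION_RE.search('{{ t "Save changes" }}')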
- For threaded workers:
Django's autoreloader catches SIGQUIT(3) to reload the program. If
a process being watched by the autoreloader exits with status code 3,
the reloader will restart the process. To reload, we send a SIGUSR1(10)
signal from the consumers to a handler in process_queue, which then
exits with status code 3.
- For single worker per process:
Catch the SIGUSR1 and quit; supervisorctl will restart the worker
automatically.
Fixes #5512
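
A sketch of the handler for the single-worker case (in the threaded
case, the same exit-code-3 trick drives Django's autoreloader):

    import signal
    import sys

    def sigusr1_handler(signum, frame):
        # Exit with status 3: Django's autoreloader treats that as
        # "restart me", and under supervisor any exit gets the worker
        # restarted automatically.
        sys.exit(3)

    signal.signal(signal.SIGUSR1, sigusr1_handler)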
This system hasn't been in active use for several years, and had some
problems with its design. So it makes sense to just remove it to declutter
the codebase.
Fixes #5655.
This new library is intended to make it easy for management commands
to access a realm or a user in a realm without having to duplicate any
of the annoying parsing/extraction code.
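
A sketch of the kind of helpers such a library provides; names and
messages are illustrative, not the exact API:

    from django.core.management.base import BaseCommand, CommandError

    from zerver.models import Realm

    class ZulipBaseCommand(BaseCommand):
        def add_realm_args(self, parser):
            parser.add_argument('-r', '--realm',
                                help='The string_id of the realm.')

        def get_realm(self, options):
            if options.get('realm') is None:
                return None
            try:
                return Realm.objects.get(string_id=options['realm'])
            except Realm.DoesNotExist:
                raise CommandError("There is no realm with id '%s'."
                                   % (options['realm'],))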
Make it less likely that further development will break compatibility with
ZULIP_ADMINISTRATORs of the form "name <email>".
Note that the suggested value for this setting has been
'zulip-admin@example.com' for a while, so hopefully this commit causes no
change for most installations.
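
For reference, the standard library's parseaddr accepts both forms:

    from email.utils import parseaddr

    parseaddr('Zulip Admin <zulip-admin@example.com>')
    # -> ('Zulip Admin', 'zulip-admin@example.com')
    parseaddr('zulip-admin@example.com')
    # -> ('', 'zulip-admin@example.com')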
Once we implement org_type-specific features, it'll be easy to change a
corporate realm to a community realm, but hard to go the other way. The main
difference (the main thing that makes migrating from a community realm to a
corporate realm hard) is that you'd have to make everyone sign another terms
of service.
Now the REs have moved to the template module. This commit adds a
condition to use the trans_real module if the Django version is less
than 1.11, and the template module otherwise.
While running queue processors multithreaded will limit the
performance available to very small systems, it's easy to fix that by
adding more RAM, and previously, Zulip didn't work on such systems at
all, so this is unambiguously an improvement there.
Fixes #32.
Fixes #34.
(Commit message expanded significantly by tabbott.)
Previously, every notification preference setting had a dedicated test
and setter. Now, all are handled through a modular function using the
property_types framework.
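
A sketch of the property_types pattern (the table contents are
illustrative): one table of (name, type) pairs drives a single generic
setter and a single generic test.

    notification_setting_types = {
        'enable_desktop_notifications': bool,
        'enable_sounds': bool,
        'enable_offline_email_notifications': bool,
    }

    def do_change_notification_setting(user_profile, name, value):
        property_type = notification_setting_types[name]
        assert isinstance(value, property_type)
        setattr(user_profile, name, value)
        user_profile.save(update_fields=[name])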
This commit is a step towards the goal of replacing most of the
send_future_email pathway with a call to send_email.
Note that this commit changes the default value of sender from "Zulip
<NOREPLY_EMAIL_ADDRESS>" to "NOREPLY_EMAIL_ADDRESS". NOREPLY_EMAIL_ADDRESS
will soon be changed to have "Zulip" in front.
This commit replaces all uses of django.core.mail.send_mail with send_email,
other than in the password reset flow, since that code looks like it is just
a patch to Django's password reset code.
The send_email function is in a new file, since putting it in
zerver.lib.notifications would create an import loop with confirmation.models.
send_future_email will soon be moved into email.py as well.
This fixes a performance problem where we were previously starting up
a full Django process (~0.7s even on a fast machine) every time a new
email came in, potentially allowing users to accidentally DoS a Zulip
server. Now, we just post over HTTPS, allowing the existing thread
pool support to do its job.
- Add a wrapper script to connect the postfix pipe to the Django web
server over HTTP(S). It uses the shared_secret authentication mode.
- Add a Django view to process messages from the email mirror server.
- Clean up the `email-mirror` management command, leaving just the
functionality needed for cron email processing.
- Add routes for the new Tornado view.
- Update the pipe script in the postfix master process config template
to match the new script.
- Add tests.
Tweaked by tabbott to adjust the directory and set better defaults.
Fixes #2421.
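
A sketch of the wrapper script's core; the endpoint path and parameter
names are assumptions:

    import sys

    import requests

    def forward_email(host: str, shared_secret: str, rcpt_to: str) -> None:
        message = sys.stdin.read()  # raw message from the postfix pipe
        response = requests.post(
            host + '/email_mirror_message',  # hypothetical endpoint
            data={'secret': shared_secret,
                  'rcpt_to': rcpt_to,
                  'msg_text': message},
        )
        response.raise_for_status()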
Some Handlebars strings contained whitespace characters at their ends.
With this, such characters are removed, as well as multiple spaces
(like the ones produced by code indentation).
This also includes a couple of fixes that remove spaces that were
intentionally placed before/after the string to translate.
This better sets expectations for the fact that in Zulip, the
Organization settings UI is available read-only to non-administrator
users.
Tweaked by tabbott to update some additional references.
This system was quite complicated, and never had great semantics.
Eventually, we'll want some other system, controlled via the database,
for gating which server should generate digest emails for which realm.
In this commit we just change the upload_avatar_image function to accept
two user_profiles: acting_user_profile and target_user_profile.
Basically, the email param is dropped in favor of a target_user_profile,
so that avatars can later on be moved to user-id-based storage.
datetime.utcnow() is a timezone-naive datetime. The Django ORM interprets it
in the settings.TIME_ZONE timezone (e.g. 'America/New_York' in the
development server). We perhaps haven't noticed errors yet since with
'America/New_York' all it means is that emails are sent 5 hours early,
or a slightly different set of messages is included in the digest.
When you pass a naive datetime to the Django ORM, it uses settings.TIME_ZONE
for the time zone. In the development environment, both settings.TIME_ZONE
and datetime.now() use 'America/New_York', so there is no change in behavior
there. (fromtimestamp with no tz argument uses the same timezone as
datetime.now)
We are soon going to change settings.TIME_ZONE to UTC, so need to remove
naive datetimes from queries to the ORM.
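
A sketch of the replacement pattern:

    from datetime import datetime

    from django.utils import timezone

    naive = datetime.utcnow()  # naive; the ORM reinterprets this in settings.TIME_ZONE
    aware = timezone.now()     # aware (with USE_TZ enabled); unambiguous in ORM queries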