This is a very early version of a tool to convert Hipchat
tar files into data files that can be used by the Zulip
import process.
We include the most fundamental entities--users and
streams. Customers who don't care about past messages
or customizations could start an instance off of this
and start communicating.
Of course, there are a lot of things missing in the
initial version:
* messages!
* file assets -- avatars, emojis, attachments
* probably lots of other minor things
We currently ignore any incoming dates from Hipchat data
and just use the current time. This is consistent with
other imports.
We also don't have any docs yet, although the process
will be extremely similar to the "Slack" process:
https://zulipchat.com/help/import-from-slack
Also, there's a comment at the top of convert_hipchat_data.py
that describes how to test this in dev mode.
I tested this by following the steps in the comment above.
The users just "show up" in /devlogin, so that's nice, and
you can send messages to other users. To verify the stream
data you have to go into the gear menu and click on "All
Streams", then you can subscribe and send a message.
Production users will need to get new passwords and
re-subscribe to streams. We will probably auto-subscribe
all users to public streams.
This fixes an issue where passing a path like `~/exports/foo` would
result in a `~` directory being created and the export/import not
working correctly.
This flag is used to track which user/message pairs correspond to an
active mobile push notification, that should potentially be cleared
when the user reads the message.
This flag should never appear on a message that is also marked as
read; eventually we may want a cron job to check for that condition.
We include a partial index on UserMessage for this flag.
The is_private flag is intended to be set if recipient type is
'private'(1) or 'huddle'(3), otherwise i.e if it is 'stream'(2), it
should be unset.
This commit adds a database index for the is_private flag (which we'll
need to use it). That index is used to reset the flag if it was
already set. The already set flags were due to a previous removal of
is_me_message flag for which the values were not cleared out.
For now, the is_private flag is always 0 since the really hard part of
this migration is clearing the unspecified previous state; future
commits will fully implement it actually doing something.
History: Migration rewritten significantly by tabbott to ensure it
runs in only 3 minutes on chat.zulip.org. A key detail in making that
work was to ensure that we use the new index for the queries to find
rows to update (which currently requires the `order_by` and `limit`
clauses).
The only changes visible at the AST level, checked using
https://github.com/asottile/astpretty, are
zerver/lib/test_fixtures.py:
'\x1b\\[(1|0)m' ↦ '\\x1b\\[(1|0)m'
'\\[[X| ]\\] (\\d+_.+)\n' ↦ '\\[[X| ]\\] (\\d+_.+)\\n'
which is fine because re treats '\\x1b' and '\\n' the same way as
'\x1b' and '\n'.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
Significantly tweaked by tabbott because:
* Argparse was already handling the early checks
* Splitting the bottom loop into two loops means we validate all the
input before trying to run actual import code on anything.
* The argparse documentation was confusing about whether the paths
should be files or directories.
We extract the entire operations of the management command to a
function create_if_missing_realm_internal_bots in the
zerver/lib/onboarding.py. The logic for determining if there are any realm
internal bots which have not been created is extracted to a function
missing_any_realm_internal_bots in actions.py.
We add conditional infinite sleep to this delivery job as a means to
handle case of multiple servers in service to a realm running this
job. In such a scenerio race conditions might arise leading to
multiple deliveries for same message. This way we try to match the
behaviour of what other jobs do in such a case.
Note: We should eventually do something to make such jobs work
while being running on multiple servers.
This should help avoid confusing error messages for anyone
accidentally running this twice.
In particular, this also makes it easier to run Zulip inside
Kubernetes, since one doesn't need to worry about duplicate calls.
This fixes exceptions when sending PMs in development (where we were
trying to connect to the localhost push bouncer, which we weren't
authorized for, but even if we were, it wouldn't work, since there's
no APNS/GCM certs).
At the same time, we also set and order of operations that ensures one
has the opportunity to adjust the server URL before submitting
anything to us.
When you're importing with --destroy-rebuild-database, we need to
check subdomain availability after we've cleared out the database;
otherwise, trying to reuse the same subdomain doesn't work.
This logging was apparently broken when sorting imports; it's a fairly
unique thing in our codebase that this would be a problem. Prevent
future regressions by adding this exception explicitly to the isort
configuration.
Issue #2088 asked for a wrapper to be created for
`create_stream_if_needed` (called `ensure_stream`) for the 25 times that
`create_stream_if_needed` is called and ignores whether the stream was
created. This commit replaces relevant occurences of
`create_stream_if_needed` with `ensure_stream`, including imports.
The changes weren't significant enough to add any tests or do any
additional manual testing.
The refactoring intended to make the API easier to use in most cases.
The majority of uses of `create_stream_if_needed` ignored the second
parameter.
Fixes: #2088.
We also delete a couple helper functions that were only used there.
This management command was primarily used before we had a UI for
creating outgoing webhook bots.
We no longer accept URLs while creating emoji; so this management
command was probably left out while migrating realm emoji
infrastructure to upload backend.
We could fix this to work properly today, but the command was
originally written in a context when Zulip didn't have a UI for
managing realm emoji at all. Now that we do have such a UI, it
doesn't have a compelling use case, and work on migrating the realm
emoji schema demonstrates that this does have a maintenance cost.
So, we simply remove this command.
The fresh imported data shows that the users emails are not included
in the data. However, the data received from the older method of slack
(which is using legacy tokens) contains the email data of the users.
This code duplicated the code in setup_realm_internal_bots, with some
added logic to avoid trying to create the same bot twice. That logic
was buggy so that it would never work at all -- it subtracted a set of
UserProfile objects from a set of email strings -- so it looked like
the command might blow up when run after the users already existed.
In fact, the buggy logic wasn't necessary, because the work the
command does after it is idempotent -- in particular `create_users`,
within its subroutine `bulk_create_users`, already filters out users
that already exist. So just cut the buggy stuff out, deduplicate the
rest with `setup_realm_internal_bots`, and document that invariant on
the latter.
While we're here, in the common case bail early without doing any
per-realm work in Python, since we're running this on every upgrade.
The name `create_logger` suggests something much bigger than what this
function actually does -- the logger doesn't any more or less exist
after the function is called than before. Its one real function is to
send logs to a specific file.
So, pull out that logic to an appropriately-named function just for
it. We already use `logging.getLogger` in a number of places to
simply get a logger by name, and the old `create_logger` callsites can
do the same.
Because calls to `create_logger` generally run after settings are
configured, these would override what we have in `settings.LOGGING` --
which in particular defeated any attempt to set log levels in
`test_settings.py`. Move all of these settings to the same place in
`settings.py`, so they can be overridden in a uniform way.
This is already the loglevel we set on the root logger, so this has no
effect -- except in tests, where `test_settings.py` attempts to set
some of these same loggers to higher loglevels. Because the
`create_logger` call generally runs after we've configured settings,
it clobbers that effect.
The code in `test_settings.py` that tries to suppress logs only works
because it also sets `propagate=False`, which has nothing to do with
loglevels but does cause logs at this logger (and descendants) to be
dropped completely unless we've configured handlers for this logger
(or one of its relevant descendants.)
These are the exceptions to the rule that our queues correspond to
queue-processor workers.
Purging `notify_tornado` in particular is a useful workaround right
now for some error spew in the dev environment.
Do you call get_recipient(Recipient.STREAM, stream_id) or
get_recipient(stream_id, Recipient.STREAM)? I could never
remember, and it was not very type safe, since both parameters
are integers.
Before this change, we populated two cache entries for each
message that we sent. The entries were largely redundant,
with the only difference being whether we sent the content
as raw markdown or as the rendered HTML.
This commit makes it so we only have one cache entry per
message, and it includes both content and rendered_content.
One legacy source on confusion here is that `content`
changes meaning when you're on the front end. Here is the
situation going forward:
database:
content = raw
rendered_contented = rendered
cache entry:
content = raw
rendered_contented = rendered
payload for the frontend:
content = raw (for apply_markdown=False)
content = rendered (for apply_markdown=True)
Now we use 'git ls-files' to get the list of locales that we actually
track. Previously we were using os.listdir to get the contents of the
static/locale directory. This could also return locales which were
present in the directory but are not supported by us, e.g. zh_CN.
We have been assigning locale to language code. Mostly code and locale
are same but for languages like zh-Hans, locale is zh_Hans and code is
zh-hans.
After this commit, compilemessages command should be run.
Previously we were using regexes to extract the language from our
locale files. Now we use LANG_INFO data structure provided by Django
to do the same and fallback to PO files only when language code is not
present in the Django data structure.
This should mean that maintaining two Zulip development environments
using the same Git checkout no longer has caching problems keeping
track of the migration status.
Previously we used to mark a key as unstranlated if its value was equal
to it in translations.json. This had an issue because it didn't allow
otherwise valid cases where key was equal to the value.
This commit solves the problem by disallowing an empty string as a valid
translation and then using the empty string as the value for all the
unstranslated keys.
Fixes#5261
compilemessages command now does all the heavy lifting by creating a
language_name_map.json file under locale directory. This file is used
by get_language_list to retrieve the require information.
Fixes: #6486
The commit() call in fix() breaks migrations and tests (unless you
mock) due to outer transactions.
We now explicitly call commit() from the management command.
This commit completely switches us over to using a
dedicated model called MutedTopic to track which topics
a user has muted.
This includes the necessary migrations to create the
table and populate it from legacy data in UserProfile.
A subsequent commit will actually remove the old field
in UserProfile.
The double forward slash (//) after the protocol in URLs was being
mistakenly considered the beginning of an inline JS comment, causing
internationalization strings being cut unexpectedly.
Now the check for inline JS comments is only run in .js files.
This code empirically doesn't work. It's not entirely clear why, even
having done quite a bit of debugging; partly because the code is quite
convoluted, and because it shows the symptoms of people making changes
over time without really understanding how it was supposed to work.
Moreover, this code targets an old version of the APNs provider API.
Apple deprecated that in 2015, in favor of a shiny new one which uses
HTTP/2 to meet the same needs for concurrency and scale that the old
one had to do a bunch of ad-hoc protocol design for.
So, rip this code out. We'll build a pathway to the new API from
scratch; it's not that complicated.
Most of the code in show_unreads is for diagnosising unread
counts issues, and we may not use that often.
We're creating a dedicated fix_unreads management command with
less clutter.
This causes `upgrade-zulip-from-git`, as well as a no-option run of
`tools/build-release-tarball`, to produce a Zulip install running
Python 3, rather than Python 2. In particular this means that the
virtualenv we create, in which all application code runs, is Python 3.
One shebang line, on `zulip-ec2-configure-interfaces`, explicitly
keeps Python 2, and at least one external ops script, `wal-e`, also
still runs on Python 2. See discussion on the respective previous
commits that made those explicit. There may also be some other
third-party scripts we use, outside of this source tree and running
outside our virtualenv, that still run on Python 2.
This management command creates the same indexes as migrations
82, 83, and 95, which are all indexes on the huge UserMessage
table. (*)
This command quickly no-ops with clear messaging when the
indexes already exist, so it's idempotent in that regard. (If
somebody somehow creates an index by the same name incorrectly,
they can always drop it in dbshell and re-run this command.)
If any of the migrations have not been run, which we detect simply
by the existence of the indexes, then we create them using a
`CREATE INDEX CONCURRENTLY` command. This functionality in
postgres allows you to create indexes against large tables
without disrupting queries against those tables. The tradeoff
here is that creating indexes concurrently takes significantly
longer than doing them non-concurrently.
Since most tables are small, we typically just use regular
Django migrations and run them during a brief interval while
the app is down.
For indexes on big tables, we will want to run this command
as part of the upgrade process, and we will want to run
it while the app is still up, otherwise it's pointless.
All the code in create_indexes() is literally copy/pasted
from the relevant migrations, and that scheme should work
going forward. (It uses a different implementation of
create_index_if_not_exist than the migrations use, but the
code is identical lexically in the function.)
If we ever do major restructuring of our large tables, such
as UserMessage, and we end up droppping some of these indexes,
then we will need to make this command migrations-aware. For
now it's safe to assume that indexes are generally additive in
nature, and the sooner we create them during the upgrade process,
the better.
(*) UserMessage is huge for large installations, of course.
This no longer does the correct thing (in terms of onboarding emails,
default streams, etc), and is tempting for new server admins to use.
Once we remove it we'll also have the invariant that we can't have a realm
without a user, which will simplify accounts_register a bit.