We do not need direct_members and direct_subgroups field of
UserGroup objects in the export data since we already have
UserGroupMembership and GroupGroupMembership object data.
While importing we keep these fields empty when creating
UserGroup objects and direct_members and direct_subgroups
fields will get set when UserGroupMembership and
GroupGroupMembership objects are created.
This change will also help us in further changes when we
will change the order of importing to import UserGroup
objects just after Realm objects.
Technically recipient_id cannot be None when recipient exists. We
actually just want to check if the recipient exists.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
This code is actually a noop (and would be a bug if it wasn't a noop),
because when this runs the server is already initialized, meaning the
internal realm exists and the system bots have been created, so
UserProfile.objects.filter(email=email) is always truthy. Also, system
bots are supposed to live in the internal realm, not in the realm being
imported so this code doesn't make sense currently.
We now use FULL_MEMBERS_GROUP_NAME instead of
writing the actual full members system group
name at multiple places, so that we can have
all the group names coded at one place only.
Previously, we had a function named create_add_users_to_system_user_groups
for creating system user groups and adding users to them in case when
exports do not contain these groups when importing from other services.
This commit just separates out the call to create_system_user_groups_for_realm
outside the function and the function is thus renamed to
add_users_to_system_user_group. This change is done because in further
commits we would need to update the import order and user groups will
be created before creating user profile objects.
`_cache` is not an attribute defined on `BaseCache`, but an
implementation detail of django_bmemcache.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
This matches the metadata that we store in the database, and means
that the S3 metadatata invariant of always having a `user_profile_id`
in the metadata.
This does not fix existing imports, which may still have missing
`user_profile_id`s.
There can be cases when system groups data is not present while
importing, like when importing from other products, so this
commit adds code to create system user groups and add users to
it according to their role.
Sometimes we may get data to import, due to export bugs, malformed data
etc., which doesn't have the invariant of RealmEmoji.author always being
set. The import code should fix that, by choosing a reasonable default
and setting it.
It is confusing to have the plan type constants not be namespaced
by the thing they represent. We already have a namespacing
convention in place for constants, so we should use it for
Realm.plan_type as well.
This commit renames members field of UserGroup to direct_members
for better readability because in the new permissions model, a
user group can be a sub-group of another group and thus technically
members of sub-group will also be members of that group.
This is a prep commit for new permissions model.
Extracted this commit from #19866.
Co-authored-by: Anders Kaseorg <anders@zulip.com>
Because we create all realms with do_create_user (including in the
test suite), we just need to change that function, add a migration for
existing realms, and ensure the data import code path correctly
creates these objects.
Note that the import code path will create a RealmUserDefault row with
default values if it is not present in the import data, which is
important for importing data from other tools like Slack.
This way, we no longer have to manually keep the upload path code in
sync with the upload path code in zerver/lib/upload.py.
This was originally suggested in
https://github.com/zulip/zulip/pull/19478#issuecomment-911479530.
This change fixes a bug when importing into a server using the local
file uploads backend, where the `import_realm.py` copy wasn't using
our standard 256-directory approach to avoid putting too many files in
a single directory.
This adds a new class called MessageRenderingResult to contain the
additional properties we added to the Message object (like alert_words)
as well as the rendered content to ensure typesafe reference. No
behavioral change is made except changes in typing.
This is a preparatory change for adding django-stubs to the backend.
Related: #18777
While importing a realm, the stream dictionaries in data['zerver_stream']
already contains the field named `rendered_description`, which is set to
`""`. This lead the code to assume that the stream rendered_description
was already set, due to which, it was not setting the rendered_description
field for any stream.
The command:
codespell --skip='./locale,*.svg,./docs/translating,postgresql.conf.template.erb,.*fixtures,./yarn.lock,./docs/THIRDPARTY,./tools/setup/emoji/emoji_names.py,./tools/setup/emoji/emoji_map.json,./zerver/management/data/unified_reactions.json' --ignore-words=codespell_ignore_words.txt .
The content of codespell_ignore_words:
```
te
ans
pullrequest
ist
cros
wit
nwe
circularly
ned
ba
ressemble
ser
sur
hel
fpr
alls
nd
ot
```
`ensure_basic_avatar_image` and `ensure_medium_avatar_image` are
essentially the same thing, except a size parameter.
So, refactor them into a single function.
This doesn't introduce any functional changes.
Tweaked exports.py to add the config object there so that our export
tool can include the table when exporting. Also includes all the
changes required to import the new table from the exported data.
Helper function `get_realm_playgrounds` added to fetch all
playgrounds in a realm.
Tests amended.
Adds backend code for the mute users feature.
This is just infrastructure work (database
interactions, helpers, tests, events, API docs
etc) and does not involve any behavioral/semantic
aspects of muted users.
Adds POST and DELETE endpoints, to keep the
URL scheme mostly consistent in terms of `users/me`.
TODOs:
1. Add tests for exporting `zulip_muteduser` database table.
2. Add dedicated methods to python-zulip-api to be used
in place of the current `client.call_endpoint` implementation.
This adds the is_user_active with the appropriate code for setting the
value correctly in the future. In the following commit a migration to
backfill the value for existing Subscriptions will be added.
To ensure correct user_profile.is_active handling also in tests, we
replace all direct .is_active mutation with calls to appropriate
functions.
This prevents the memcached connection from being shared across
multiple processes, and hopefully addresses unexpected behavior from
cached functions like get_user_profile_by_id invoked inside the worker
processes.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
There are three functional side effects:
• Correct an insignificant but mathematically offensive bias toward
repeated characters in generate_api_key introduced in commit
47b4283c4b4c70ecde4d3c8de871c90ee2506d87; its entropy is increased
from 190.52864 bits to 190.53428 bits.
• Use the base32 alphabet in confirmation.models.generate_key; its
entropy is reduced from 124.07820 bits to the documented 120 bits, but
now it uses 1 syscall instead of 24.
• Use the base32 alphabet in get_bigbluebutton_url; its entropy is
reduced from 51.69925 bits to 50 bits, but now it uses 1 syscall
instead of 10.
(The base32 alphabet is A-Z 2-7. We could probably replace all of
these with plain secrets.token_urlsafe, since I expect most callers
can handle the full urlsafe_b64 alphabet A-Z a-z 0-9 - _ without
problems.)
Signed-off-by: Anders Kaseorg <anders@zulip.com>
A few major themes here:
- We remove short_name from UserProfile
and add the appropriate migration.
- We remove short_name from various
cache-related lists of fields.
- We allow import tools to continue to
write short_name to their export files,
and then we simply ignore the field
at import time.
- We change functions like do_create_user,
create_user_profile, etc.
- We keep short_name in the /json/bots
API. (It actually gets turned into
an email.)
- We don't modify our LDAP code much
here.
Prior to this commit whenever convert was imported from zerver.lib.markdown
it was aliased as markdown_convert for readability.
This commit rename convert function to markdown_convert so that it can be
directly import it without aliasing and without compromising readability.
This commit is first of few commita which aim to change all the
bugdown references to markdown. This commits rename the files,
file path mentions and change the imports.
Variables and other references to bugdown will be renamed in susequent
commits.
Generated by pyupgrade --py36-plus --keep-percent-format.
Now including %d, %i, %u, and multi-line strings.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Fixes#2665.
Regenerated by tabbott with `lint --fix` after a rebase and change in
parameters.
Note from tabbott: In a few cases, this converts technical debt in the
form of unsorted imports into different technical debt in the form of
our largest files having very long, ugly import sequences at the
start. I expect this change will increase pressure for us to split
those files, which isn't a bad thing.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Automatically generated by the following script, based on the output
of lint with flake8-comma:
import re
import sys
last_filename = None
last_row = None
lines = []
for msg in sys.stdin:
m = re.match(
r"\x1b\[35mflake8 \|\x1b\[0m \x1b\[1;31m(.+):(\d+):(\d+): (\w+)", msg
)
if m:
filename, row_str, col_str, err = m.groups()
row, col = int(row_str), int(col_str)
if filename == last_filename:
assert last_row != row
else:
if last_filename is not None:
with open(last_filename, "w") as f:
f.writelines(lines)
with open(filename) as f:
lines = f.readlines()
last_filename = filename
last_row = row
line = lines[row - 1]
if err in ["C812", "C815"]:
lines[row - 1] = line[: col - 1] + "," + line[col - 1 :]
elif err in ["C819"]:
assert line[col - 2] == ","
lines[row - 1] = line[: col - 2] + line[col - 1 :].lstrip(" ")
if last_filename is not None:
with open(last_filename, "w") as f:
f.writelines(lines)
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
Generated by pyupgrade --py36-plus --keep-percent-format, but with the
NamedTuple changes reverted (see commit
ba7906a3c6, #15132).
Signed-off-by: Anders Kaseorg <anders@zulip.com>
datetime.timezone is available in Python ≥ 3.2. This also lets us
remove a pytz dependency from the PostgreSQL scripts.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Refactored code in actions.py and streams.py to move stream related
functions into streams.py and remove the dependency on actions.py.
validate_sender_can_write_to_stream function in actions.py was renamed
to access_stream_for_send_message in streams.py.
This is critical for importing the very first realm into an empty
server, since in 27b15a9722, we changed
the model to create the internal realm when the first real realm would
be created, but neglected the data import code path.
"Zulip Voyager" was a name invented during the Hack Week to open
source Zulip for what a single-system Zulip server might be called, as
a Star Trek pun on the code it was based on, "Zulip Enterprise".
At the time, we just needed a name quickly, but it was never a good
name, just a placeholder. This removes that placeholder name from
much of the codebase. A bit more work will be required to transition
the `zulip::voyager` Puppet class, as that has some migration work
involved.
This legacy cross-realm bot hasn't been used in several years, as far
as I know. If we wanted to re-introduce it, I'd want to implement it
as an embedded bot using those common APIs, rather than the totally
custom hacky code used for it that involves unnecessary queue workers
and similar details.
Fixes#13533.
This is adds foreign keys to the corresponding Recipient object in the
UserProfile on Stream tables, a denormalization intended to improve
performance as this is a common query.
In the migration for setting the field correctly for existing users,
we do a direct SQL query (because Django 1.11 doesn't provide any good
method for doing it properly in bulk using the ORM.).
A consequence of this change to the model is that a bit of code needs
to be added to the functions responsible for creating new users (to
set the field after the Recipient object gets created). Fortunately,
there's only a few code paths for doing that.
Also an adjustment is needed in the import system - this introduces a
circular relation between Recipient and UserProfile. The field cannot be
set until the Recipient objects have been created, but UserProfiles need
to be created before their corresponding Recipients. We deal with this
by first importing UserProfiles same way as before, but we leave the
personal_recipient field uninitialized. After creating the Recipient
objects, we call a function to set the field for all the imported users
in bulk.
A similar change is made for managing Stream objects.
Fixes#1727.
With the server down, apply migrations 0245 and 0246. 0246 will remove
the pub_date column, so it's essential that the previous migrations
ran correctly to copy data before running this.
Apparently, a subtle mismatch between the filename/URL formats for our
upload codebases meant that importing Slack avatars into systems using
S3_UPLOAD_BACKEND would end up with the avatars having the wrong URLs.
Our recently-added code for rewriting user IDs on data import didn't
correctly handle wildcard mentions and mentions generated by very old
versions of Zulip (pre data-user-id).
The previous query ended up doing an awkward join that did not
guarantee use of the Recipient index on zerver_message, turning a very
fast query into something that could take much longer for a single
stream than the rest of the import combined.
lxml parser appends html and body tags to the soup object which
are not reqired. There are no other major parsing diffrences between
the two parsers as long the HTML input is perfectly formated.
lxml parser is much faster than html.parser but it hardly matters
in our case.
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#differences-
between-parsers
Previously, if you exported a Zulip organization and then re-imported
it, we'd end up renumbering the user IDs and all direct foreign key
references to them in the database, but not the data-user-id
references in mentions. Fix this by parsing the message content and
doing that renumbering.
(Because we import raw markdown, not HTML, from third-party tools,
these changes won't affect data import from slack etc.)
Fixes the high-priority part of #11293.
This field is primarily intended to support avoiding displaying the
"more topics" feature in new organizations and streams, where we might
know that all messages in the stream are already available in the
browser.
Based on original work by Roman Godov, and significantly modified by
tabbott.
The second migration involved here could be expensive on Zulip Cloud,
but is unlikely to be an issue on other servers.
In commit de65a04 we can see that if the need ever arises to modify
how stream descriptions are rendered, we would need to make changes
at 5 different call points which can be quite cumbersome. So this
functionality has been extracted to a new method called
'render_stream_descriptions'.
This commit leverages the ahocorasick algorithm to build a set of user_ids
that have their alert_words present in the message. It runs in linear time
of the order of length of the input message as opposed to number of
alert_words. This is after building a ahocorasick Automaton which runs
in O(number of alert_words in entire realm) which is usually cached.
We want to use the baseline features of bugdown, but not fancy things
like inline URL previews, since the whole structure of stream
descriptions is to have a single-line thing supporting some
formatting.
The migration part of this change fixes a bug encountered by some
organizations upgrading from older versions of Zulip.
We've for a while had logic to set plan_type to LIMITED when importing
into Zulip Cloud; we need corresponding logic to set it to SELF_HOSTED
when importing into a self-hosted server.
Fixes#11541.
This helps keep the realm.json small and easy to process; previously,
almost the entire size of that file was the analytics data.
We implement this by refactoring the analytics Config objects into a
separate subroutine that writes to a separate file, plus the
corresponding import code.
Manual testing was performed by exporting the 'analytics' realm, and
importing back to a newly created 'test' realm. The 'test' realm was
then exported and the json files were inspected. The data appeared
consistent with no abnormalities.
Fixes: #11220.
This commit does the following three things:
1. Update stream model to accomodate rendered description.
2. Render and save the stream rendered description on update.
3. Render and save stream descriptions on creation.
Further, the stream's rendered description is also sent whenever the
stream's description is being sent.
This is preparatory work for eliminating the use of the
non-authoritative marked.js markdown parser for stream descriptions.
This should eliminate the need to do manual analytics work when
importing organizations imported/exported using the zulip -> zulip
import/export tools.
The octet-stream content type is potentially under-specified, but it's
better than potentially submitting None and increases consistency of
this part of the codebase.
The boto library's s3 interface allows setting only string-format
metadata keys. So we need to cast the last_modified floating-point
timestamp into a string before storing on the S3 object.
This bug mostly broke uploading avatars when using the S3 storage backend.
Our HipChat conversion tool didn't properly handle basic avatar
images, resulting in only the medium-size avatar images being imported
properly. This fixes that bug by asking the import tool to do the
thumbnailing for the basic avatar image (from the .original file) as
well as the medium avatar image.
Fixes a bug in import_realm where secondary attributes like message
visibility weren't being set, and also makes bugs like this less likely in
the future.
Also, putting the plan_type change at the end of import_realm, so that
future restrictions to LIMITED realms don't affect the import process.
We've had a long stream of bugs existed because only one of these two
code paths was tested (usually the local uploads backend). By
deduplicating these functions, we ensure that this category of bugs no
longer happens.
Following my recent refactor, this is just a straightforward merge,
with code for one or the other backend ending up inside an if
statement.
Previously, we were incorrectly importing avatar PNGs to a filename
without the .png extension, resulting in them effectively not being
imported.
This was mitigated by the fact that we imported the originals and ran
the appropriate `ensure_` functions, but still a bug.
This commit speeds up the import by avoiding
sender lookups and instead using the data
for users that we already have in memory.
This avoids a few DB hops, many hops to memcached,
plus some object construction.
We now call do_render_markdown() directly. This
also makes it more explicit that the import has
never rendered alert words.
This function requires a message object, whereas
we want to work with JSON data to avoid necessary
queries when we import data. Inlining the function
sets us up for a subsequent refactoring.
We change the way we deal with theoretical return
values of `None` to use an assertion; otherwise,
we would have to loosen up a bunch of mypy types
from `str` to `Optional[str]`. It's not clear `None`
is even possible--we've moved toward throwing exceptions
there instead of silently failing.
The previous logic was incorrect, in that if `content_type` was set to
None (which happens with Slack/HipChat export, among other things),
then we wouldn't run the `guess_type` logic to auto-detect the
Content-Type to send to S3.
The UserMessage table can be huge, so creating a
bunch of entries in `ID_MAP` can overflow memory.
We don't have any tables that depend on `UserMessage`,
and we don't send the 'id' fields from `zerver_usermessage`
to the database, so re-mapping them was just busy-work.
When we create new ids for message rows, we
now sort the new ids by their corresponding
pub_date values in the rows.
This takes a sizable chunk of memory.
This feature only gets turned on if you
set sort_by_date to True in realm.json.
We use UserMessageLite to avoid Django overhead, and we
do updates in chunks of 10000. (The export may be broken
into several files already, but a reasonable chunking at
import time is good defense against running out of memory.)
The code was needlessly querying the DB to get full
objects for entities where we only needed user_id,
realm_id, and stream_id.
With my test data of ~1000 records this sped up the
function from ~8s to ~0.5s. The speedup would probably
be even more for larger data sets.
If any user had sent the reply to the welcome bot recommended by our
tutorial, then the Zulip export/import process didn't work properly,
because we weren't including (and then remapping) the recipient ID for
sending PMs to the cross-realm bots. This commit fixes that gap, by
recording the necessary data on the export side, and doing the
appropriate remapping on the import side.
Previously, our realm import logic only did the special remapping
logic for the original notifications_stream_id; when we added the new
signup_notifications_stream_id field, we neglected to handle it in the
same way.
The 'last_modified' value in emoji records is
needed for uploading the file to the S3 backend.
We set the same in the function 'import_uploads_s3'.
We also have to remove the keyword 'last_modified'
while building the RealmEmoji dict, as it is not
a field which exists in RealmEmoji objects.
The s3 import code path made a hard assumption about `user_profile_id`
being set (we'd already fixed this in the local uploads code path).
Ideally, it should be, and I've opened #10268 for fixing that, but for
now this is how it needs to work.
After the messages have been imported, set the rendered_content of the
messages instead of leaving its value to be 'None'.
This is important to ensure that:
(1) Performance for users is good after completing the import.
(2) The database's full-text indexes have all of the imported messages
(which only happens properly when Message rows have their
rendered_content field edited).
Fixes#9168.