Make the import of `Realm`, `Stream` and `UserGroup` objects be
done in single transaction, to make the import process in general
more atomic.
This also removes the need to temporarily unset the Stream references
on the Realm object. Since Django creates foreign key constraints
with `DEFERRABLE INITIALLY DEFERRED`, an insertion of a Realm row can
reference not-yet-existing Stream rows as long as the row is created
before the transaction commits.
Discussion - https://chat.zulip.org/#narrow/stream/101-design/topic/New.20permissions.20model/near/1585274.
We now set tos_version to "-1" for imported users and the ones
created using API or using other methods like LDAP, SCIM and
management commands. This value will help us to allow users to
change email address visibility setting during first login.
The comment was outdated, currently we import UserProfiles before
realm_tables - because some models in realm_tables have a dependency on
UserProfile.
Also makes sense to elaborate a bit more in the comment that it's just
an outline of the ordering, not an exhaustive list.
Fixes#25414.
We add Attachment.scheduled_messages relation to track ScheduledMessages
which reference the attachment.
The import bits can be done after merging this, by updating #25345.
This implements the core of the rewrite described in:
For the backend data model for UserPresence to one that supports much
more efficient queries and is more correct around handling of multiple
clients. The main loss of functionality is that we no longer track
which Client sent presence data (so we will no longer be able to say
using UserPresence "the user was last online on their desktop 15
minutes ago, but was online with their phone 3 minutes ago"). If we
consider that information important for the occasional investigation
query, we have can construct that answer data via UserActivity
already. It's not worth making Presence much more expensive/complex
to support it.
For slim_presence clients, this sends the same data format we sent
before, albeit with less complexity involved in constructing it. Note
that we at present will always send both last_active_time and
last_connected_time; we may revisit that in the future.
This commit doesn't include the finalizing migration, which drops the
UserPresenceOld table.
The way to deploy is to start the backfill migration with the server
down and then start the server *without* the user_presence queue worker,
to let the migration finish without having new data interfering with it.
Once the migration is done, the queue worker can be started, leading to
the presence data catching up to the current state as the queue worker
goes over the queued up events and updating the UserPresence table.
Co-authored-by: Mateusz Mandera <mateusz.mandera@zulip.com>
So far, we've used the BitField .authentication_methods on Realm
for tracking which backends are enabled for an organization. This
however made it a pain to add new backends (requiring altering the
column and a migration - particularly troublesome if someone wanted to
create their own custom auth backend for their server).
Instead this will be tracked through the existence of the appropriate
rows in the RealmAuthenticationMethods table.
Black 23 enforces some slightly more specific rules about empty line
counts and redundant parenthesis removal, but the result is still
compatible with Black 22.
(This does not actually upgrade our Python environment to Black 23
yet.)
Signed-off-by: Anders Kaseorg <anders@zulip.com>
On multi-realm systems this results in traversal of all messages in
all realms and returns a massive payload of 1 row per stream on
the server, not the intended one row per realm.
This guarantees that the Realm is always non-None when we hit the
codepath is_static_or_current_realm_url via
do_change_stream_description, so that we can properly skip rewritting
some images.
Fixes#19405
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
On my data (about 10 million messages in 1600 streams) this used to take
about 40 hours, while the improved statement completes in roughly 30
seconds.
The old solution had postgres go through the entire table until the
first match for each stream. Thus, the time spent scanning the table
got longer and longer for each stream because postgres always started at
the beginning (and somehow it did not use any indices) and had to skip
over all rows until it found the first message from the stream that is
was looking for each time.
This new statement just performans a bulk operation, scanning the table
only once and then inserts the results directly into the destination
table.
Slightly more verbose inforation about this change can be found in:
https://chat.zulip.org/#narrow/stream/31-production-help/topic/Import.20Rocketchat.20data/near/1408867
Signed-off-by: Florian Pritz <bluewind@xinu.at>
These were useful as a transitional workaround to ignore type errors
that only show up with django-stubs, while avoiding errors about
unused type: ignore comments without django-stubs. Now that the
django-stubs transition is complete, switch to type: ignore comments
so that mypy will tell us if they become unnecessary. Many already
have.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
This commit adds the OPTIONAL .realm attribute to Message
(and ArchivedMessage), with the server changes for making new Messages
have this set. Old Messages still have to be migrated to backfill this,
before it can be non-nullable.
Appropriate test changes to correctly set .realm for Messages the tests
manually create are included here as well.
This commit sets can_remove_subscribers_group to admins system
group while creating streams as it will be the default value
of this setting. In further we would provide an option to set
value of this setting to any user group while creating streams
using API or UI.
We change the import order to import UserGroup objects before
Stream such that we can set can_remove_subscribers_group correctly.
We do not import UserGroupMembership objects here along with
UserGroup since UserProfile objects are not imported and
GroupGroupMembership are also imported later as these are not
required before.
We do not need direct_members and direct_subgroups field of
UserGroup objects in the export data since we already have
UserGroupMembership and GroupGroupMembership object data.
While importing we keep these fields empty when creating
UserGroup objects and direct_members and direct_subgroups
fields will get set when UserGroupMembership and
GroupGroupMembership objects are created.
This change will also help us in further changes when we
will change the order of importing to import UserGroup
objects just after Realm objects.
Technically recipient_id cannot be None when recipient exists. We
actually just want to check if the recipient exists.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
This code is actually a noop (and would be a bug if it wasn't a noop),
because when this runs the server is already initialized, meaning the
internal realm exists and the system bots have been created, so
UserProfile.objects.filter(email=email) is always truthy. Also, system
bots are supposed to live in the internal realm, not in the realm being
imported so this code doesn't make sense currently.
We now use FULL_MEMBERS_GROUP_NAME instead of
writing the actual full members system group
name at multiple places, so that we can have
all the group names coded at one place only.
Previously, we had a function named create_add_users_to_system_user_groups
for creating system user groups and adding users to them in case when
exports do not contain these groups when importing from other services.
This commit just separates out the call to create_system_user_groups_for_realm
outside the function and the function is thus renamed to
add_users_to_system_user_group. This change is done because in further
commits we would need to update the import order and user groups will
be created before creating user profile objects.
`_cache` is not an attribute defined on `BaseCache`, but an
implementation detail of django_bmemcache.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
This matches the metadata that we store in the database, and means
that the S3 metadatata invariant of always having a `user_profile_id`
in the metadata.
This does not fix existing imports, which may still have missing
`user_profile_id`s.
There can be cases when system groups data is not present while
importing, like when importing from other products, so this
commit adds code to create system user groups and add users to
it according to their role.
Sometimes we may get data to import, due to export bugs, malformed data
etc., which doesn't have the invariant of RealmEmoji.author always being
set. The import code should fix that, by choosing a reasonable default
and setting it.
It is confusing to have the plan type constants not be namespaced
by the thing they represent. We already have a namespacing
convention in place for constants, so we should use it for
Realm.plan_type as well.
This commit renames members field of UserGroup to direct_members
for better readability because in the new permissions model, a
user group can be a sub-group of another group and thus technically
members of sub-group will also be members of that group.
This is a prep commit for new permissions model.
Extracted this commit from #19866.
Co-authored-by: Anders Kaseorg <anders@zulip.com>
Because we create all realms with do_create_user (including in the
test suite), we just need to change that function, add a migration for
existing realms, and ensure the data import code path correctly
creates these objects.
Note that the import code path will create a RealmUserDefault row with
default values if it is not present in the import data, which is
important for importing data from other tools like Slack.
This way, we no longer have to manually keep the upload path code in
sync with the upload path code in zerver/lib/upload.py.
This was originally suggested in
https://github.com/zulip/zulip/pull/19478#issuecomment-911479530.
This change fixes a bug when importing into a server using the local
file uploads backend, where the `import_realm.py` copy wasn't using
our standard 256-directory approach to avoid putting too many files in
a single directory.