Cinder throws error creating incremental backup from parent in another project

Bug #1869746 reported by Rodrigo Barbieri on 2020-03-30
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Cinder
Undecided
Rodrigo Barbieri
Ubuntu Cloud Archive
Status tracked in Victoria
Stein
Undecided
Unassigned
Train
Undecided
Unassigned
Ussuri
Undecided
Unassigned
Victoria
Undecided
Unassigned
cinder (Ubuntu)
Undecided
Unassigned

Bug Description

Environment: Ubuntu Bionic, OpenStack Train, Volume and Backup Driver Ceph

This bug was also reproduced on Queens.

Steps to reproduce:

1) Create a volume v1 as user demo
2) Create a full backup of v1 as user admin
3) Create an incremental backup of v1 as user demo
4) Result: Backup is created successfully, but error in logs. Logs do not show INFO message and parent_backup.num_dependent_backups is not incremented [0].

Expectation: Either no error successfully finishing method create_backup, or not creating backup at all, with backup entity set as Error (see below).

Using POSIX driver, the backup is not created successfully and backup entity remains in error state. The POSIX driver tries to read backup.parent_id before creating the backup therefore raises the error earlier.

Trace:

2020-03-30 14:53:21.341 25018 ERROR oslo_messaging.rpc.server [req-19fccda2-b765-4b68-9663-aeae5cf3002e 3bee8c02ecce4744af287b3b8dd40ca5 9370e1e898214d88bd9eda4f88eccba9 - f883c6315a004471b2ab1b04d8a111d9 f883c6315a004471b2ab1b04d8a111d9] Exception during message handling: cinder.exception.BackupNotFound: Backup 7a15b9ec-7648-483f-ad32-6074567eca2a could not be found.
2020-03-30 14:53:21.341 25018 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2020-03-30 14:53:21.341 25018 ERROR oslo_messaging.rpc.server File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
2020-03-30 14:53:21.341 25018 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2020-03-30 14:53:21.341 25018 ERROR oslo_messaging.rpc.server File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/dispatcher.py", line 274, in dispatch
2020-03-30 14:53:21.341 25018 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2020-03-30 14:53:21.341 25018 ERROR oslo_messaging.rpc.server File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/dispatcher.py", line 194, in _do_dispatch
2020-03-30 14:53:21.341 25018 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
2020-03-30 14:53:21.341 25018 ERROR oslo_messaging.rpc.server File "/usr/lib/python3/dist-packages/cinder/backup/manager.py", line 447, in create_backup
2020-03-30 14:53:21.341 25018 ERROR oslo_messaging.rpc.server backup.parent_id)
2020-03-30 14:53:21.341 25018 ERROR oslo_messaging.rpc.server File "/usr/lib/python3/dist-packages/cinder/objects/base.py", line 352, in get_by_id
2020-03-30 14:53:21.341 25018 ERROR oslo_messaging.rpc.server orm_obj = db.get_by_id(context, cls.model, id, *args, **kwargs)
2020-03-30 14:53:21.341 25018 ERROR oslo_messaging.rpc.server File "/usr/lib/python3/dist-packages/cinder/db/api.py", line 1801, in get_by_id
2020-03-30 14:53:21.341 25018 ERROR oslo_messaging.rpc.server return IMPL.get_by_id(context, model, id, *args, **kwargs)
2020-03-30 14:53:21.341 25018 ERROR oslo_messaging.rpc.server File "/usr/lib/python3/dist-packages/cinder/db/sqlalchemy/api.py", line 189, in wrapper
2020-03-30 14:53:21.341 25018 ERROR oslo_messaging.rpc.server return f(*args, **kwargs)
2020-03-30 14:53:21.341 25018 ERROR oslo_messaging.rpc.server File "/usr/lib/python3/dist-packages/cinder/db/sqlalchemy/api.py", line 7125, in get_by_id
2020-03-30 14:53:21.341 25018 ERROR oslo_messaging.rpc.server return _GET_METHODS[model](context, id, *args, **kwargs)
2020-03-30 14:53:21.341 25018 ERROR oslo_messaging.rpc.server File "/usr/lib/python3/dist-packages/cinder/db/sqlalchemy/api.py", line 189, in wrapper
2020-03-30 14:53:21.341 25018 ERROR oslo_messaging.rpc.server return f(*args, **kwargs)
2020-03-30 14:53:21.341 25018 ERROR oslo_messaging.rpc.server File "/usr/lib/python3/dist-packages/cinder/db/sqlalchemy/api.py", line 5154, in backup_get
2020-03-30 14:53:21.341 25018 ERROR oslo_messaging.rpc.server project_only=project_only)
2020-03-30 14:53:21.341 25018 ERROR oslo_messaging.rpc.server File "/usr/lib/python3/dist-packages/cinder/db/sqlalchemy/api.py", line 5165, in _backup_get
2020-03-30 14:53:21.341 25018 ERROR oslo_messaging.rpc.server raise exception.BackupNotFound(backup_id=backup_id)
2020-03-30 14:53:21.341 25018 ERROR oslo_messaging.rpc.server cinder.exception.BackupNotFound: Backup 7a15b9ec-7648-483f-ad32-6074567eca2a could not be found.
2020-03-30 14:53:21.341 25018 ERROR oslo_messaging.rpc.server

[0] https://github.com/openstack/cinder/blob/e79b98367bd72f0323258dc0506de28b31c4aa48/cinder/backup/manager.py#L448

====================================================================

[Impact]

Whenever an admin creates a backup for a tenant volume, all subsequent incremental backups will have a silent error during creation when linking the dependent backup to its parent because the code detects it has a parent (by listing backups from all projects) but is unable to find the DB entity while scoped to only its project. Upon deleting the child backup, it will hit another silent error (because of lack of link) and it will skip cleaning up the quota for the backup. The quota will then misrepresent the number and size of backups, and the discrepancy will increase as this error is hit for several backups. Ultimately, the user will hit a quota error with a lower number/size of backups than the actual quota limit.

The fix addressed the problem by changing the code that searches for the parent backup. The code will now only search for backups under the same project, so it will prevent the error from happening.

[Test case]

1) Reproducing the bug:

1a) Create a volume v1 as user demo
1b) Create a full backup b1 of v1 as user admin
1c) Create an incremental backup b2 of v1 as user demo
1d) Confirm backup b2 is created successfully
1e) Check log file "/var/log/cinder/cinder-backup.log" for error message below with b2's ID.

cinder.exception.BackupNotFound: Backup <b2_id> could not be found

2) Cleanup not necessary

3) Install package that contains fixed code

4) Confirm bug is fixed

4a) Repeat steps 1a to 1d, make sure to create a new volume and new backups from it.
4b) Check log file "/var/log/cinder/cinder-backup.log" for error messages related to new b2 ID. There shouldn't be any.

[Regression Potential]

The fix has already been validated in upstream CI with new functional tests that perform that specific workflow (however, checks API backup field instead of logs), as far back as Rocky release. The code changes affect the creation step of backups, and both the new and previously existing functional tests thoroughly test the creation step in different ways, therefore regressions with this patch are not expected.

[Other Info]

For releases older than Train, the fix for bug 1809323 needs to be included as a dependency.

description: updated

We discussed this bug today in Cinder's weekly meeting and got a lot of good feedback. See notes in [0]. The discussion also continued in the #openstack-cinder IRC channel.

There are many possible approaches to this problem, each with its own impacts and drawbacks.

Among the suggested solutions are:
a) have all backups created having the same project as the volume
b) have all incremental backups created having the same project as its parent (in that case it disappears from the user's view?)

Among the impact of such solutions are:
a) what happens to the backups after a volume transfer is performed
b) backups should be semantically separated from volumes. Coupling them together makes it similar to snapshots
c) quota consumption when someone else is taking backups on user's behalf
d) source volume or parent backup is deleted, but it is in another project, how it impacts the incremental backup and the data
e) what is the impact for backup drivers

This is very complex and needs more analysis, we will further discuss this.

[0] http://eavesdrop.openstack.org/meetings/cinder/2020/cinder.2020-04-01-14.00.log.html#l-111

also, alternative solution c) keep completely separate trees of backups per project, which means we would deny incremental backups if the parent is not the same project as the user invoking.

This is most similar to what we have today, just would need to add an extra validation.

(wish I could edit posts)

Also forgot to mention two items of concern for any of the solutions mentioned: backwards compatibility and API changes (for the possibility of backporting the fix).

Changed in cinder:
status: New → Confirmed

After debugging this a bit more and thanks to Rajat's investigation. The problem is the difference between the API using elevated (admin role) and manager using the user role.

Looks like, there is not reason reason why we are using elevated context on the API code (checked into the DB) but removing the elevated may lead to some other broken functionality that worked previously (in general this could affect other cases).

tags: added: sts
Changed in cinder:
status: Confirmed → In Progress
assignee: nobody → Rodrigo Barbieri (rodrigo-barbieri2010)
Changed in cinder:
milestone: none → victoria-1

Reviewed: https://review.opendev.org/720833
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=8ebeafcbbafb700052f3abfc1f7ba004269a92e8
Submitter: Zuul
Branch: master

commit 8ebeafcbbafb700052f3abfc1f7ba004269a92e8
Author: Rodrigo Barbieri <email address hidden>
Date: Fri Apr 17 17:54:51 2020 -0300

    Fix cross-project incremental backups

    This patch addresses the scenario where an
    incremental backup can be created having a
    parent backup that was taken in a different
    project. This scenario ultimately leads to
    a silent error when creating/deleting the
    incremental backup and quotas going out of
    sync.

    The proposed fix is to narrow the backup search
    down to the same project. To achieve this, a
    method's signature had to be updated to achieve
    the desired optimized behavior of passing the
    volume's project_id parameter.

    Closes-bug: #1869746
    Closes-bug: #1873518

    Change-Id: Icb621ff5966133f59d9d43ca2dd9f8e1919b1149

Changed in cinder:
status: In Progress → Fix Released

Reviewed: https://review.opendev.org/738943
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=5358c996b40bbb0fd749ec182ff3ef67f5d71c16
Submitter: Zuul
Branch: stable/ussuri

commit 5358c996b40bbb0fd749ec182ff3ef67f5d71c16
Author: Rodrigo Barbieri <email address hidden>
Date: Wed Jul 1 18:20:15 2020 -0300

    Fix cross-project incremental backups

    This patch addresses the scenario where an
    incremental backup can be created having a
    parent backup that was taken in a different
    project. This scenario ultimately leads to
    a silent error when creating/deleting the
    incremental backup and quotas going out of
    sync.

    The proposed fix is to narrow the backup search
    down to the same project. To achieve this, a
    method's signature had to be updated to achieve
    the desired optimized behavior of passing the
    volume's project_id parameter.

    Closes-bug: #1869746
    Closes-bug: #1873518

    Change-Id: Icb621ff5966133f59d9d43ca2dd9f8e1919b1149
    (cherry picked from commit 8ebeafcbbafb700052f3abfc1f7ba004269a92e8)

tags: added: in-stable-ussuri

Reviewed: https://review.opendev.org/739154
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=f3cdc27563766b2f49a1d88500f32c8d86838c96
Submitter: Zuul
Branch: stable/train

commit f3cdc27563766b2f49a1d88500f32c8d86838c96
Author: Rodrigo Barbieri <email address hidden>
Date: Wed Jul 1 18:20:15 2020 -0300

    Fix cross-project incremental backups

    This patch addresses the scenario where an
    incremental backup can be created having a
    parent backup that was taken in a different
    project. This scenario ultimately leads to
    a silent error when creating/deleting the
    incremental backup and quotas going out of
    sync.

    The proposed fix is to narrow the backup search
    down to the same project. To achieve this, a
    method's signature had to be updated to achieve
    the desired optimized behavior of passing the
    volume's project_id parameter.

    Closes-bug: #1869746
    Closes-bug: #1873518

    conflicts:
        cinder/tests/unit/backup/test_backup.py

    Test test_create_backup_failed_with_empty_backup_objects
    required a mock for
    test_create_backup_failed_with_empty_backup_objects
    which had been removed in ussuri, as part of commit
    f0211b53b82cde96eb39e9914c3e5fae232db07c. That
    commit removed code and leftover the mock, which I
    had removed in the change this patch was
    cherry-picked from, as it was unnecessary.

    The removed code section of cinder/backup/api.py
    from commit f0211b53b82cde96eb39e9914c3e5fae232db07c
    does not impact the functionality of this change at all.

    Change-Id: Icb621ff5966133f59d9d43ca2dd9f8e1919b1149
    (cherry picked from commit 8ebeafcbbafb700052f3abfc1f7ba004269a92e8)
    (cherry picked from commit 5358c996b40bbb0fd749ec182ff3ef67f5d71c16)

tags: added: in-stable-train

Reviewed: https://review.opendev.org/739607
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=5983eee1c35ababa1b6e2bad347cf98bd79b226f
Submitter: Zuul
Branch: stable/stein

commit 5983eee1c35ababa1b6e2bad347cf98bd79b226f
Author: Rodrigo Barbieri <email address hidden>
Date: Wed Jul 1 18:20:15 2020 -0300

    Fix cross-project incremental backups

    This patch addresses the scenario where an
    incremental backup can be created having a
    parent backup that was taken in a different
    project. This scenario ultimately leads to
    a silent error when creating/deleting the
    incremental backup and quotas going out of
    sync.

    The proposed fix is to narrow the backup search
    down to the same project. To achieve this, a
    method's signature had to be updated to achieve
    the desired optimized behavior of passing the
    volume's project_id parameter.

    Closes-bug: #1869746
    Closes-bug: #1873518

    Change-Id: Icb621ff5966133f59d9d43ca2dd9f8e1919b1149
    (cherry picked from commit 8ebeafcbbafb700052f3abfc1f7ba004269a92e8)
    (cherry picked from commit 5358c996b40bbb0fd749ec182ff3ef67f5d71c16)
    (cherry picked from commit f3cdc27563766b2f49a1d88500f32c8d86838c96)

tags: added: in-stable-stein

Reviewed: https://review.opendev.org/742221
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=c4de3a8c838c4053ab60964cee53c6fa5e062d73
Submitter: Zuul
Branch: stable/rocky

commit c4de3a8c838c4053ab60964cee53c6fa5e062d73
Author: Rodrigo Barbieri <email address hidden>
Date: Wed Jul 1 18:20:15 2020 -0300

    Fix cross-project incremental backups

    This patch addresses the scenario where an
    incremental backup can be created having a
    parent backup that was taken in a different
    project. This scenario ultimately leads to
    a silent error when creating/deleting the
    incremental backup and quotas going out of
    sync.

    The proposed fix is to narrow the backup search
    down to the same project. To achieve this, a
    method's signature had to be updated to achieve
    the desired optimized behavior of passing the
    volume's project_id parameter.

    Closes-bug: #1869746
    Closes-bug: #1873518

    Change-Id: Icb621ff5966133f59d9d43ca2dd9f8e1919b1149
    (cherry picked from commit 8ebeafcbbafb700052f3abfc1f7ba004269a92e8)
    (cherry picked from commit 5358c996b40bbb0fd749ec182ff3ef67f5d71c16)
    (cherry picked from commit f3cdc27563766b2f49a1d88500f32c8d86838c96)
    (cherry picked from commit 5983eee1c35ababa1b6e2bad347cf98bd79b226f)

tags: added: in-stable-rocky
description: updated
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers