Instance uses base image file when it is rebooted after snapshot creation if cinder nfs backend is used

Bug #1860913 reported by Takashi Kajinami
This bug affects 2 people
Affects: Cinder
Status: Fix Released
Importance: Medium
Assigned to: Gorka Eguileor

Bug Description

Description
===========
When we use the NFS backend in cinder and attach a cinder volume to an instance, the instance accesses a file in the NFS share named volume-<volume id>.

When the volume is attached to an instance and we take a snapshot with "openstack volume snapshot create <volume> --force", the NFS share ends up containing the following 3 files.

 (1) volume-<volume id>
   base image, frozen when the snapshot is taken

 (2) volume-<volume id>-<snapshot id>
   diff image that the instance should write to after the snapshot is taken

 (3) volume-<volume id>.info
   JSON file that tracks the active snapshot

As described above, after the snapshot is taken the instance should write to (2) volume-<volume id>-<snapshot id>.
This works right after the snapshot is taken, but if we stop and start the instance, it starts writing to (1) volume-<volume id>, which it should not modify.
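The relationship between the three files can be sketched as follows. This is only an illustration: it assumes the .info file is a JSON object with an "active" key naming the file currently being written, and the helper names and sample ids are hypothetical.

```python
import json


def active_file(info_json: str, base_name: str) -> str:
    """Return the file the instance should be writing to.

    Assumption: the .info JSON has an "active" key; without it (or
    without an .info file at all) the base image is the active file.
    """
    info = json.loads(info_json) if info_json else {}
    return info.get("active", base_name)


def is_rolled_back(disk_source: str, info_json: str, base_name: str) -> bool:
    """True when the instance uses a file other than the active one."""
    return disk_source != active_file(info_json, base_name)


# Hypothetical content of volume-1234.info after snapshot "abcd" was taken.
info = json.dumps({"active": "volume-1234-abcd", "abcd": "volume-1234-abcd"})

print(active_file(info, "volume-1234"))
# After stop/start the instance points at the base image again:
print(is_rolled_back("volume-1234", info, "volume-1234"))
```

The second call models the state reported here: the .info file still names the diff image as active while the instance has gone back to the base image.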

Steps to reproduce
==================
1. Create a volume in the cinder nfs backend
2. Create a boot-from-volume (bfv) instance with the volume
3. Take a snapshot of the volume
4. Stop and start the instance

Expected result
===============
The instance keeps writing into volume-<volume id>-<snapshot id>

Actual result
=============
The instance writes into volume-<volume id>

Environment
===========
I reproduced the issue on the Queens release with
 nova: libvirt driver
 cinder: nfs backend, with nfs_snapshot_support=True

As far as I can see from the file path handling code, the way we handle the disk file path for the nfs backend has not changed, so the problem should also be reproducible on master.

Logs & Configs
==============
N/A

tags: added: volumes
Sylvain Bauza (sylvain-bauza) wrote :

By definition, a snapshot is an immutable object that shouldn't be modified after being created.
I'm surprised you say you can see some writes to the snapshot just after creating it. How do you know this? By looking at the QEMU command line, or by running lsof/strace on it?

To be clear, I think this is an invalid bug if you want to write to the snapshot, but if you found some way to write to the snapshot, then it could be a bug.

Punting the status to Incomplete, but please set the bug status back to New once you reply.

Changed in nova:
status: New → Incomplete
Takashi Kajinami (kajinamit) wrote :

We can see from the instance XML that the instance uses the snapshot image after you stop and start the instance. (Of course, you can also identify it with lsof.)
Please refer to the downstream bug [1], which includes more detailed observations.
 [1] https://bugzilla.redhat.com/show_bug.cgi?id=1757691#c1

The problem is that the bdm has not been updated since the initial attachment, so it still points to the base image after the snapshot is taken. When nova regenerates the XML while launching the instance, it uses the information in the bdm, which results in this incorrect rollback.
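One way to check which file a domain points at is to read the disk source out of the instance XML. The snippet below is a rough sketch using only the Python standard library; the sample XML and the mount path in it are made up for illustration, and it only looks at file-backed disks.

```python
import xml.etree.ElementTree as ET
from typing import List


def disk_source_files(domain_xml: str) -> List[str]:
    """Return the 'file' attribute of every <disk><source> in the XML."""
    root = ET.fromstring(domain_xml)
    return [
        src.get("file")
        for src in root.findall("./devices/disk/source")
        if src.get("file")
    ]


# Hypothetical, heavily trimmed domain XML as seen after stop/start.
SAMPLE_XML = """
<domain type="kvm">
  <devices>
    <disk type="file" device="disk">
      <source file="/mnt/nfs-share/volume-1234"/>
      <target dev="vda" bus="virtio"/>
    </disk>
  </devices>
</domain>
"""

print(disk_source_files(SAMPLE_XML))
```

If the reported path is the base image rather than the diff image created by the snapshot, the instance has rolled back as described in this bug.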

Changed in nova:
status: Incomplete → New
Sylvain Bauza (sylvain-bauza) wrote :

This looks like a valid bug then. We may already have other upstream bugs that refer to the same problem, in which case this one would be closed as a duplicate, but I'd love the person handling the bug to confirm and close/dup this LP bug against the right one.

Changed in nova:
status: New → Confirmed
importance: Undecided → Medium
tags: added: cinder nfs
tags: added: snapshot
Lee Yarwood (lyarwood) wrote :

This came up a while ago in bug #1304695 and was resolved at the time by the following change:

libvirt: Refresh volume connection_info after volume snapshot
https://review.opendev.org/#/c/87432/

I assume with cinderv3 this isn't causing a refresh of the actual connection_info within the attachment in the nfs c-vol backend.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/720769

Changed in nova:
assignee: nobody → Lee Yarwood (lyarwood)
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Lee Yarwood (<email address hidden>) on branch: master
Review: https://review.opendev.org/720769
Reason: We can't fix this in n-cpu without an idempotent connection_info refresh API, the NFS c-vol driver can however fix this by updating the stored connection_info allowing the current attachment_show method of refreshing connection_info to work.

Lee Yarwood (lyarwood) wrote :

I've just closed out the openstack/nova change as this isn't fixable on the n-cpu side at the moment without an idempotent connection_info refresh API.

We can however fix this in openstack/cinder by forcing the NFS c-vol driver to update the saved connection_info during the snapshot, allowing n-cpu's call to attachment_show to update the stored connection_info within the BDM.
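The interaction described above can be modelled with a small toy. This is only a sketch under stated assumptions: the classes are stand-ins rather than nova or cinder objects, attachment_show here is a plain function imitating the call named above, and the "name" key holding the file name is assumed for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Attachment:
    """Toy stand-in for the connection_info cinder stores per attachment."""
    connection_info: dict


@dataclass
class BlockDeviceMapping:
    """Toy stand-in for the copy of connection_info kept in nova's BDM."""
    connection_info: dict = field(default_factory=dict)


def attachment_show(attachment: Attachment) -> dict:
    """Imitates n-cpu asking cinder for the stored connection_info."""
    return dict(attachment.connection_info)


def create_online_snapshot(attachment, new_active, driver_updates_attachment):
    """The diff file becomes active; optionally persist that in cinder."""
    if driver_updates_attachment:
        attachment.connection_info["name"] = new_active


def start_instance(bdm, attachment):
    """On start, refresh the BDM from cinder and return the file used."""
    bdm.connection_info = attachment_show(attachment)
    return bdm.connection_info["name"]


def file_used_after_restart(driver_updates_attachment: bool) -> str:
    attachment = Attachment({"name": "volume-1234"})
    bdm = BlockDeviceMapping(attachment_show(attachment))
    create_online_snapshot(attachment, "volume-1234-abcd",
                           driver_updates_attachment)
    return start_instance(bdm, attachment)


print(file_used_after_restart(False))  # stale info: base image
print(file_used_after_restart(True))   # updated info: diff image
```

In this model the refresh on the nova side only helps once the stored connection_info has been updated, which is the reason given for moving the fix to cinder.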

affects: nova → cinder
Changed in cinder:
status: In Progress → New
assignee: Lee Yarwood (lyarwood) → nobody
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to cinder (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/722081

pavera (pavera-b) wrote :

So, would the proposed fixes on this bug actually fix it? Shouldn't the NFS driver just update the DB since it operates differently than most other drivers? Seems like it would be better to have the state in the DB (which should be the source of truth) be accurate? Re-introducing this bug seems inevitable as long as that DB connection_info is incorrect but available for use.

Lee Yarwood (lyarwood)
Changed in nova:
importance: Undecided → Medium
assignee: nobody → Lee Yarwood (lyarwood)
Changed in cinder:
assignee: nobody → Lee Yarwood (lyarwood)
Lee Yarwood (lyarwood)
Changed in cinder:
status: New → In Progress
Changed in nova:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on cinder (master)

Change abandoned by "Lee Yarwood <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/cinder/+/722081

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by "Lee Yarwood <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/nova/+/720769

Lee Yarwood (lyarwood)
no longer affects: nova
Changed in cinder:
assignee: Lee Yarwood (lyarwood) → nobody
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/cinder/+/799843

Gorka Eguileor (gorka)
Changed in cinder:
assignee: nobody → Gorka Eguileor (gorka)
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.opendev.org/c/openstack/cinder/+/799843
Committed: https://opendev.org/openstack/cinder/commit/25eb0a7d76922e3a1a289d26c36b96a91c4059db
Submitter: "Zuul (22348)"
Branch: master

commit 25eb0a7d76922e3a1a289d26c36b96a91c4059db
Author: Gorka Eguileor <email address hidden>
Date: Tue Jul 6 13:35:44 2021 +0200

    NFS: Update connection info on online snap create

    The NFS snapshot creation for an attached volume requires interaction
    between Nova and Cinder, and a new qcow2 file is used after the
    attachment completes.

    This means that the connection properties stored in the Attachment are no
    longer valid, as they are pointing to the old qcow2 file, so if Nova tries
    to use that attachment it will start writing on the old qcow2 file.

    A flow showing this issue is:

    - Attach NFS volume
    - Create snapshot
    - Hard reboot

    After that the VM will start using the base image, breaking the qcow2
    chain.

    If we delete the snapshot in the meantime, then the VM will fail to
    reboot.

    This patch fixes this inconsistency by updating the connection info
    field inside the remotefs driver.

    We usually prefer that drivers don't touch the DB, directly or
    indirectly (using OVOs), but in this case we are using OVOs methods
    instead of the usual model update on the volume manager because there
    are cases in the driver where a snapshot is created (cloning via
    snapshot) and we have to update the attachment without the manager, as
    it is unaware that a temporary snapshot is being created.

    Besides that main reason there are other less critical reasons to do the
    attachment update in the driver:

    - Smaller code change
    - Easier to backport
    - Limit change impact on other areas (better for backport)
    - The snapshot_create model update code in the manager does not support
      updating attachments.
    - There are cases in the cinder volume manager where the model update
      values returned by snapshot_create are not being applied.

    Deletion of snapshots belonging to in-use volumes is not affected by this
    issue because we only do block commit when the snapshot file we are
    deleting is not the active file. In _delete_snapshot_online:

            if utils.paths_normcase_equal(info['active_file'],
                                          info['snapshot_file']):

    Closes-Bug: #1860913
    Change-Id: I62fcef3169dcb9f4363a5344af4b2711edfef632
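The active-file check quoted in the commit message can be illustrated with a short sketch. The paths_normcase_equal function below is a local stand-in, assumed to compare paths after os.path.normcase; needs_block_commit is a hypothetical helper that only mirrors the sentence in the commit message and is not the cinder implementation.

```python
import os


def paths_normcase_equal(path_a: str, path_b: str) -> bool:
    """Stand-in helper: compare two paths after case normalization."""
    return os.path.normcase(path_a) == os.path.normcase(path_b)


def needs_block_commit(info: dict) -> bool:
    """Block commit applies only when the snapshot file being deleted
    is not the active file, as stated in the commit message."""
    return not paths_normcase_equal(info["active_file"],
                                    info["snapshot_file"])


# Deleting the snapshot whose file is currently active: no block commit.
print(needs_block_commit({"active_file": "volume-1234-abcd",
                          "snapshot_file": "volume-1234-abcd"}))

# Deleting an older snapshot while a newer file is active: block commit.
print(needs_block_commit({"active_file": "volume-1234-efgh",
                          "snapshot_file": "volume-1234-abcd"}))
```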

Changed in cinder:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/cinder/+/802036

OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/cinder/+/802039

OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/cinder/+/802045

OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/c/openstack/cinder/+/802048

OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/cinder/+/802036
Committed: https://opendev.org/openstack/cinder/commit/d4960fd597aad8502cad46b38cdb50c6b88f7c63
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit d4960fd597aad8502cad46b38cdb50c6b88f7c63
Author: Gorka Eguileor <email address hidden>
Date: Tue Jul 6 13:35:44 2021 +0200

    NFS: Update connection info on online snap create

    Closes-Bug: #1860913
    Change-Id: I62fcef3169dcb9f4363a5344af4b2711edfef632
    (cherry picked from commit 25eb0a7d76922e3a1a289d26c36b96a91c4059db)

tags: added: in-stable-wallaby
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/cinder/+/802039
Committed: https://opendev.org/openstack/cinder/commit/e7c5e7a7f6f35b2db043cc5ed4a0ef1f77e1f830
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit e7c5e7a7f6f35b2db043cc5ed4a0ef1f77e1f830
Author: Gorka Eguileor <email address hidden>
Date: Tue Jul 6 13:35:44 2021 +0200

    NFS: Update connection info on online snap create

    Closes-Bug: #1860913
    Change-Id: I62fcef3169dcb9f4363a5344af4b2711edfef632
    (cherry picked from commit 25eb0a7d76922e3a1a289d26c36b96a91c4059db)
    (cherry picked from commit d4960fd597aad8502cad46b38cdb50c6b88f7c63)

tags: added: in-stable-victoria
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/cinder/+/802045
Committed: https://opendev.org/openstack/cinder/commit/a5e44127ffceec55c3e98a46ebe3686e34b0c971
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit a5e44127ffceec55c3e98a46ebe3686e34b0c971
Author: Gorka Eguileor <email address hidden>
Date: Tue Jul 6 13:35:44 2021 +0200

    NFS: Update connection info on online snap create

    Conflicts:
        cinder/tests/unit/volume/drivers/test_remotefs.py
    Resolved conflict introduced mainly by the encryption support fixes
    (see I896f70d204ad103e968ab242ba9045ca984827c4)

    Closes-Bug: #1860913
    Change-Id: I62fcef3169dcb9f4363a5344af4b2711edfef632
    (cherry picked from commit 25eb0a7d76922e3a1a289d26c36b96a91c4059db)
    (cherry picked from commit d4960fd597aad8502cad46b38cdb50c6b88f7c63)
    (cherry picked from commit e7c5e7a7f6f35b2db043cc5ed4a0ef1f77e1f830)

tags: added: in-stable-ussuri
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/train)

Reviewed: https://review.opendev.org/c/openstack/cinder/+/802048
Committed: https://opendev.org/openstack/cinder/commit/3df6646364ce0aaee514b9cddd79b68ba47634e1
Submitter: "Zuul (22348)"
Branch: stable/train

commit 3df6646364ce0aaee514b9cddd79b68ba47634e1
Author: Gorka Eguileor <email address hidden>
Date: Tue Jul 6 13:35:44 2021 +0200

    NFS: Update connection info on online snap create

    Conflicts:
        cinder/tests/unit/volume/drivers/test_remotefs.py
    Resolved conflict introduced mainly by the encryption support fixes
    (see I896f70d204ad103e968ab242ba9045ca984827c4)

    Closes-Bug: #1860913
    Change-Id: I62fcef3169dcb9f4363a5344af4b2711edfef632
    (cherry picked from commit 25eb0a7d76922e3a1a289d26c36b96a91c4059db)
    (cherry picked from commit d4960fd597aad8502cad46b38cdb50c6b88f7c63)
    (cherry picked from commit e7c5e7a7f6f35b2db043cc5ed4a0ef1f77e1f830)
    (cherry picked from commit a5e44127ffceec55c3e98a46ebe3686e34b0c971)

tags: added: in-stable-train
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 18.1.0

This issue was fixed in the openstack/cinder 18.1.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 19.0.0.0rc1

This issue was fixed in the openstack/cinder 19.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 16.4.1

This issue was fixed in the openstack/cinder 16.4.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 17.2.0

This issue was fixed in the openstack/cinder 17.2.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder train-eol

This issue was fixed in the openstack/cinder train-eol release.
