Bug #1860913 “Instance uses base image file when it is rebooted ...”

Sylvain Bauza (sylvain-bauza) on 2020-04-16

tags:

added: volumes

Sylvain Bauza (sylvain-bauza) wrote on 2020-04-16:

#1

By definition, a snapshot is an immutable object that shouldn't be modified after being created.
I'm surprised you say you can see writes to the snapshot just after creating it. How do you know this? By looking at the QEMU command line, or by running lsof or strace on it?

To be clear, I think this is an invalid bug if you want to write to the snapshot, but if you found some way to write to the snapshot, then it could be a bug.

Punting the status to Incomplete, but please set the bug status back to New once you reply.

Changed in nova:
status:	New → Incomplete

Takashi Kajinami (kajinamit) wrote on 2020-04-16:

#2

We can see from the instance XML that the instance uses the snapshot image after you stop and start the instance. (Of course, you can also confirm this with lsof.)
Please refer to the downstream bug [1], which includes more detailed observations.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1757691#c1

The problem is that the BDM has not been updated since the initial attachment, so it still points to the base image after the snapshot is taken. When nova regenerates the XML while launching the instance, it uses the information in the BDM, which results in this incorrect rollback.
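
The stale pointer described above can be illustrated with a toy model. This is only a sketch and not nova or cinder code: `ToyVolume`, `attach`, and `disk_source_on_hard_reboot` are hypothetical names, and the file names are made up.

```python
class ToyVolume:
    """Hypothetical stand-in for an NFS-backed volume and its qcow2 chain."""

    def __init__(self, base_file):
        # The chain starts with the base image; the last entry is the active file.
        self.chain = [base_file]

    @property
    def active_file(self):
        return self.chain[-1]

    def online_snapshot(self, snap_id):
        # An online snapshot adds a new qcow2 file that becomes the active file.
        new_file = "%s.%s" % (self.chain[0], snap_id)
        self.chain.append(new_file)
        return new_file


def attach(volume):
    # Assumption: the BDM keeps a copy of connection_info taken at attach time.
    return {"data": {"name": volume.active_file}}


def disk_source_on_hard_reboot(bdm_connection_info):
    # Assumption: the XML is regenerated from whatever the BDM currently holds.
    return bdm_connection_info["data"]["name"]


vol = ToyVolume("volume-1234")
bdm = attach(vol)                  # BDM now points at the base image
vol.online_snapshot("snap-1")      # active file changes, BDM copy does not
stale_source = disk_source_on_hard_reboot(bdm)
```

In this model `stale_source` is still the base file while `vol.active_file` is the new overlay, which is the mismatch reported here.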

Changed in nova:
status:	Incomplete → New

Sylvain Bauza (sylvain-bauza) wrote on 2020-04-17:

#3

This looks like a valid bug then. We could already have other upstream bugs that refer to the same problem, in which case this one would be closed as a duplicate, but I'd love the person handling the bug to confirm and close/dup this LP bug against the right one.

Changed in nova:
status:	New → Confirmed
importance:	Undecided → Medium
tags:	added: cinder nfs
tags:	added: snapshot

Lee Yarwood (lyarwood) wrote on 2020-04-17:

#4

This came up a while ago in bug #1304695 and was resolved at the time by the following change:

libvirt: Refresh volume connection_info after volume snapshot
https://review.opendev.org/#/c/87432/

I assume with cinderv3 this isn't causing a refresh of the actual connection_info within the attachment in the nfs c-vol backend.

OpenStack Infra (hudson-openstack) wrote on 2020-04-17: Fix proposed to nova (master)

#5

Fix proposed to branch: master
Review: https://review.opendev.org/720769

Changed in nova:
assignee:	nobody → Lee Yarwood (lyarwood)
status:	Confirmed → In Progress

OpenStack Infra (hudson-openstack) wrote on 2020-04-21: Change abandoned on nova (master)

#6

Change abandoned by Lee Yarwood (<email address hidden>) on branch: master
Review: https://review.opendev.org/720769
Reason: We can't fix this in n-cpu without an idempotent connection_info refresh API. The NFS c-vol driver can however fix this by updating the stored connection_info, allowing the current attachment_show method of refreshing connection_info to work.

Lee Yarwood (lyarwood) wrote on 2020-04-21:

#7

I've just closed out the openstack/nova change as this isn't fixable on the n-cpu side at the moment without an idempotent connection_info refresh API.

We can however fix this in openstack/cinder by forcing the NFS c-vol driver to update the saved connection_info during the snapshot, allowing n-cpu's call to attachment_show to update the stored connection_info within the BDM.
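
A minimal sketch of why updating the saved connection_info on the cinder side would be enough, assuming the compute side refreshes its copy by reading the attachment back. The dictionary and functions below are hypothetical stand-ins, not the real attachment API or driver code.

```python
import copy

# Hypothetical stand-in for the attachment records stored by cinder.
attachments = {}


def attachment_create(attachment_id, active_file):
    attachments[attachment_id] = {"connection_info": {"name": active_file}}


def attachment_show(attachment_id):
    # Returns a copy of whatever is currently stored for the attachment.
    return copy.deepcopy(attachments[attachment_id])


def driver_online_snapshot(attachment_id, new_active_file, update_stored_info):
    # With update_stored_info=False the stored record keeps the old file name,
    # which models the behaviour reported in this bug.
    if update_stored_info:
        attachments[attachment_id]["connection_info"]["name"] = new_active_file


def refresh_bdm(attachment_id):
    # Assumption: the compute side refreshes by reading the attachment back.
    return attachment_show(attachment_id)["connection_info"]


attachment_create("att-1", "volume-1234")

driver_online_snapshot("att-1", "volume-1234.snap-1", update_stored_info=False)
without_fix = refresh_bdm("att-1")["name"]

driver_online_snapshot("att-1", "volume-1234.snap-1", update_stored_info=True)
with_fix = refresh_bdm("att-1")["name"]
```

Here `without_fix` is the old file name and `with_fix` is the new one, so the refresh only helps once the stored record itself has been updated.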

affects:	nova → cinder
Changed in cinder:
status:	In Progress → New
assignee:	Lee Yarwood (lyarwood) → nobody

OpenStack Infra (hudson-openstack) wrote on 2020-04-22: Related fix proposed to cinder (master)

#8

Related fix proposed to branch: master
Review: https://review.opendev.org/722081

pavera (pavera-b) wrote on 2020-08-24:

#9

So, would the proposed fixes on this bug actually fix it? Shouldn't the NFS driver just update the DB, since it operates differently than most other drivers? It seems like it would be better for the state in the DB (which should be the source of truth) to be accurate. Re-introducing this bug seems inevitable as long as that DB connection_info is incorrect but available for use.

Lee Yarwood (lyarwood) on 2020-09-21

Changed in nova:
importance:	Undecided → Medium
assignee:	nobody → Lee Yarwood (lyarwood)
Changed in cinder:
assignee:	nobody → Lee Yarwood (lyarwood)

Lee Yarwood (lyarwood) on 2020-09-22

Changed in cinder:
status:	New → In Progress
Changed in nova:
status:	New → In Progress

OpenStack Infra (hudson-openstack) wrote on 2021-07-05: Change abandoned on cinder (master)

#10

Change abandoned by "Lee Yarwood <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/cinder/+/722081

OpenStack Infra (hudson-openstack) wrote on 2021-07-05: Change abandoned on nova (master)

#11

Change abandoned by "Lee Yarwood <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/nova/+/720769

Lee Yarwood (lyarwood) on 2021-07-06

no longer affects:	nova
Changed in cinder:
assignee:	Lee Yarwood (lyarwood) → nobody

OpenStack Infra (hudson-openstack) wrote on 2021-07-07: Fix proposed to cinder (master)

#12

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/cinder/+/799843

Gorka Eguileor (gorka) on 2021-07-07

Changed in cinder:
assignee:	nobody → Gorka Eguileor (gorka)

OpenStack Infra (hudson-openstack) wrote on 2021-07-22: Fix merged to cinder (master)

#13

Reviewed:  https://review.opendev.org/c/openstack/cinder/+/799843
Committed: https://opendev.org/openstack/cinder/commit/25eb0a7d76922e3a1a289d26c36b96a91c4059db
Submitter: "Zuul (22348)"
Branch:    master

commit 25eb0a7d76922e3a1a289d26c36b96a91c4059db
Author: Gorka Eguileor <geguileo@redhat.com>
Date:   Tue Jul 6 13:35:44 2021 +0200

NFS: Update connection info on online snap create
    
    The NFS snapshot creation for an attached volume requires interaction
    between Nova and Cinder, and a new qcow2 file is used after the
    attachment completes.
    
    This means that the connection properties stored in the Attachment are no
    longer valid, as they point to the old qcow2 file, so if Nova tries
    to use that attachment it will start writing on the old qcow2 file.
    
    A flow showing this issue is:
    
    - Attach NFS volume
    - Create snapshot
    - Hard reboot
    
    After that the VM will start using the base image, breaking the qcow2
    chain.
    
    If we delete the snapshot in the meantime, then the VM will fail to
    reboot.
    
    This patch fixes this inconsistency by updating the connection info
    field inside the remotefs driver.
    
    We usually prefer that drivers don't touch the DB, directly or
    indirectly (using OVOs), but in this case we are using OVOs methods
    instead of the usual model update on the volume manager because there
    are cases in the driver where a snapshot is created (cloning via
    snapshot) and we have to update the attachment without the manager, as
    it is unaware that a temporary snapshot is being created.
    
    Besides that main reason there are other less critical reasons to do the
    attachment update in the driver:
    
    - Smaller code change
    - Easier to backport
    - Limit change impact on other areas (better for backport)
    - The snapshot_create model update code in the manager does not support
      updating attachments.
    - There are cases in the cinder volume manager where the model update
      values returned by snapshot_create are not being applied.
    
    Deletion of snapshots belonging to in-use volumes is not affected by this
    issue because we only do block commit when the snapshot file we are
    deleting is not the active file.  In _delete_snapshot_online:
    
            if utils.paths_normcase_equal(info['active_file'],
                                          info['snapshot_file']):
    
    Closes-Bug: #1860913
    Change-Id: I62fcef3169dcb9f4363a5344af4b2711edfef632
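
The condition quoted from _delete_snapshot_online can be read as a path comparison that decides whether a block commit is used. The following is a sketch under assumptions, not the cinder implementation: this `paths_normcase_equal` is a local helper assumed to compare paths with `os.path.normcase`, and `uses_block_commit` is a hypothetical name.

```python
import os


def paths_normcase_equal(path_a, path_b):
    # Local helper, assumed to mirror the utility named in the commit message.
    return os.path.normcase(path_a) == os.path.normcase(path_b)


def uses_block_commit(info):
    # Per the commit message, block commit is only done when the snapshot
    # file being deleted is not the active file.
    return not paths_normcase_equal(info['active_file'],
                                    info['snapshot_file'])


# Made-up paths for illustration only.
deleting_older_snapshot = {
    'active_file': '/mnt/nfs/volume-1234.snap-2',
    'snapshot_file': '/mnt/nfs/volume-1234.snap-1',
}
deleting_active_snapshot = {
    'active_file': '/mnt/nfs/volume-1234.snap-2',
    'snapshot_file': '/mnt/nfs/volume-1234.snap-2',
}
```

With these inputs, `uses_block_commit` is true only for the first dictionary, where the snapshot file differs from the active file.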

Changed in cinder:
status:	In Progress → Fix Released

OpenStack Infra (hudson-openstack) wrote on 2021-07-23: Fix proposed to cinder (stable/wallaby)

#14

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/cinder/+/802036

OpenStack Infra (hudson-openstack) wrote on 2021-07-23: Fix proposed to cinder (stable/victoria)

#15

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/cinder/+/802039

OpenStack Infra (hudson-openstack) wrote on 2021-07-23: Fix proposed to cinder (stable/ussuri)

#16

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/cinder/+/802045

OpenStack Infra (hudson-openstack) wrote on 2021-07-23: Fix proposed to cinder (stable/train)

#17

Fix proposed to branch: stable/train
Review: https://review.opendev.org/c/openstack/cinder/+/802048

OpenStack Infra (hudson-openstack) wrote on 2021-07-30: Fix merged to cinder (stable/wallaby)

#18

Reviewed: https://review.opendev.org/c/openstack/cinder/+/802036
Committed: https://opendev.org/openstack/cinder/commit/d4960fd597aad8502cad46b38cdb50c6b88f7c63
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit d4960fd597aad8502cad46b38cdb50c6b88f7c63
Author: Gorka Eguileor <email address hidden>
Date: Tue Jul 6 13:35:44 2021 +0200