failed compute node didn't delete instance's path directory in init_host
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| OpenStack Compute (nova) | | Low | Lee Yarwood | |
| Queens | | Low | Lee Yarwood | |
| Rocky | | Low | Lee Yarwood | |
Bug Description
Nova cleans up evacuated instances in the init_host method in nova/compute/
When using volume-backed shared storage (such as Ceph), nova leaves the instance path directory behind on the failed compute node.
The root cause is the "destroy_disks" argument passed to driver.destroy: when using Ceph its value is False, so the instance path is not deleted.
This can later cause live migration of the instance back to that node to fail with a DestinationDiskExists exception.
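A minimal sketch of the flawed cleanup flow described above (hypothetical code, not actual nova internals; the function and directory names are illustrative). With a shared-storage backend such as Ceph RBD, destroy_disks is computed as False, so the per-instance directory under instances_path is never removed on the failed host:

```python
import os
import shutil
import tempfile


def cleanup_evacuated_instance(instance_dir: str, destroy_disks: bool) -> None:
    """Stand-in for the driver cleanup call made from init_host."""
    if destroy_disks:
        # Only this branch removes the instance path directory.
        shutil.rmtree(instance_dir)
    # With destroy_disks=False (the Ceph case) the directory is leaked,
    # which can later make live migration back to this host fail.


# Demonstrate the leak: shared storage => destroy_disks=False.
base = tempfile.mkdtemp()
instance_dir = os.path.join(base, "instance-00000001")
os.mkdir(instance_dir)
cleanup_evacuated_instance(instance_dir, destroy_disks=False)
print(os.path.isdir(instance_dir))  # prints True: directory left behind
```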
Sean Dague (sdague) wrote : | #1 |
Changed in nova: | |
status: | New → Confirmed |
importance: | Undecided → Low |
tags: | added: ceph |
Changed in nova: | |
assignee: | nobody → Bartosz Fic (bartosz-fic) |
tags: | added: live-migrate |
Changed in nova: | |
assignee: | Bartosz Fic (bartosz-fic) → nobody |
tags: |
added: live-migration removed: live-migrate |
Changed in nova: | |
assignee: | nobody → lvmxh (shaohef) |
Changed in nova: | |
assignee: | lvmxh (shaohef) → nobody |
Fix proposed to branch: master
Review: https:/
Changed in nova: | |
assignee: | nobody → Lee Yarwood (lyarwood) |
status: | Confirmed → In Progress |
Timofey Durakov (tdurakov) wrote : | #3 |
This problem doesn't affect live migration because there are two similar methods for checking shared storage; the one described here is used during revert resize and when cleaning up a compute node after a failure. So removing the live-migrate tag.
P.S. this check duplication should be refactored.
tags: | removed: live-migration |
Timofey Durakov (tdurakov) wrote : | #4 |
oh, side effect for live-migration process.
tags: | added: live-migration |
Change abandoned by Lee Yarwood (<email address hidden>) on branch: master
Review: https:/
@Lee Yarwood: Are you still working on this?
Lee Yarwood (lyarwood) wrote : | #7 |
Yes but I see that @tdurakov also has a review up re-factoring much of this code [1]. I'll review this and reassign if required.
Augustina Ragwitz (auggy) wrote : | #8 |
I left a comment on that review to include this bug in that "closes bug" list. If that's inaccurate then please feel free to pipe up and clarify.
Sean Dague (sdague) wrote : | #9 |
There are no currently open reviews on this bug, changing
the status back to the previous state and unassigning. If
there are active reviews related to this bug, please include
links in comments.
Changed in nova: | |
status: | In Progress → Confirmed |
assignee: | Lee Yarwood (lyarwood) → nobody |
Fix proposed to branch: master
Review: https:/
Changed in nova: | |
assignee: | nobody → Jeffrey Zhang (jeffrey4l) |
status: | Confirmed → In Progress |
Fix proposed to branch: master
Review: https:/
Changed in nova: | |
assignee: | Jeffrey Zhang (jeffrey4l) → Lee Yarwood (lyarwood) |
Fix proposed to branch: stable/rocky
Review: https:/
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: master
commit d6c1f6a1032ed2e
Author: Lee Yarwood <email address hidden>
Date: Mon Dec 3 09:03:26 2018 +0000
libvirt: Add workaround to cleanup instance dir when using rbd
At present all virt drivers provide a cleanup method that takes a single
destroy_disks boolean to indicate when the underlying storage of an
instance should be destroyed.
When cleaning up after an evacuation or revert resize the value of
destroy_disks is determined by the compute layer calling down both into
the check_instance_
and remote check_instance_
the host now running the instance.
For the Libvirt driver the initial local call will return None when
using the shared block RBD imagebackend as it is assumed all instance
storage is shared resulting in destroy_disks always being False when
cleaning up. This behaviour is wrong as the instance disks are stored
separately to the instance directory that still needs to be cleaned up
on the host. Additionally this directory could also be shared
independently of the disks on a NFS share for example and would need to
also be checked before removal.
This change introduces a backportable workaround configurable for the
Libvirt driver with which operators can ensure that the instance
directory is always removed during cleanup when using the RBD
imagebackend. When enabling this workaround operators will need to
ensure that the instance directories are not shared between computes.
Future work will allow for the removal of this workaround by separating
the shared storage checks from the compute to virt layers between the
actual instance disks and any additional storage required by the
specific virt backend.
Related-Bug: #1761062
Partial-Bug: #1414895
Change-Id: I8fd6b9f857a1c4
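For reference, enabling the workaround looks like the following nova.conf fragment on the affected compute host. The option name below is my reading of what this change introduces; double-check it against the configuration reference for your deployed release:

```ini
[workarounds]
# Force removal of the per-instance directory under instances_path during
# cleanup when the RBD imagebackend is in use. Only safe when instance
# directories are NOT shared between compute hosts (e.g. not on NFS).
ensure_libvirt_rbd_instance_dir_cleanup = True
```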
Fix proposed to branch: stable/queens
Review: https:/
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/rocky
commit 8c678ae57299076
Author: Lee Yarwood <email address hidden>
Date: Mon Dec 3 09:03:26 2018 +0000
libvirt: Add workaround to cleanup instance dir when using rbd
(Commit message identical to the master commit above, including the Related-Bug, Partial-Bug, and Change-Id footers.)
(cherry picked from commit d6c1f6a1032ed2e
tags: | added: in-stable-rocky |
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/queens
commit b7bf1fbe4917c28
Author: Lee Yarwood <email address hidden>
Date: Mon Dec 3 09:03:26 2018 +0000
libvirt: Add workaround to cleanup instance dir when using rbd
(Commit message identical to the master commit above, including the Related-Bug, Partial-Bug, and Change-Id footers.)
NOTE(lyarwood): Conflicts as If1b6e5f20d2ea8 only merged in Rocky, and the backports of Id3c74c019d and I217fba9138 to stable/queens came from stable/rocky.
Conflicts:
Related-Bug: #1761062
Partial-Bug: #1414895
Change-Id: I8fd6b9f857a1c4
(cherry picked from commit d6c1f6a1032ed2e
(cherry picked from commit 8c678ae57299076
tags: | added: in-stable-queens |
Change abandoned by Jeffrey Zhang (<email address hidden>) on branch: master
Review: https:/
Changed in nova: | |
status: | In Progress → Fix Released |
This seems to be a Ceph-specific issue; marking as Low.