Instance migration/Instance resize with lvm volumes

Bug #1831657 reported by Dr. Clemens Hardewig
This bug affects 5 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Triaged
Medium
Dr. Clemens Hardewig

Bug Description

This bug was observed when investigating the issue described in Bug: #1755266 (see comments in fix proposal https://review.opendev.org/#/c/618621/)

When resizing or migrating an instance whose storage is provided via LVM, the LVM volumes on the source node are not cleaned up; they remain on the node and fill up the volume group. Currently they must be removed manually via lvremove. This affects at least Pike, Queens, Rocky and Stein.

Steps to reproduce

1.) Create an instance with LVM volumes (e.g. root/swap/...)
2.) Perform an instance resize
3.) After a successful resize, the instance is moved to a different node
4.) Check the volumes in the VG on the source node

The lvm volumes from the old instance are still there
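The leftovers can be spotted by listing the source node's volume group after the resize. As a rough illustration (the `<instance_uuid>_disk*` naming scheme is assumed from nova's libvirt LVM image backend; the helper and instance ids below are hypothetical, not code from the nova tree):

```python
# Hypothetical helper to spot leftover LVs after a resize/migration.
# Assumes the libvirt LVM backend's "<instance_uuid>_disk*" naming scheme;
# this is an illustration, not code from the nova tree.

def leftover_volumes(lv_names, instance_uuid):
    """Return the LV names (e.g. from `lvs --noheadings -o lv_name <vg>`)
    that still belong to the given instance."""
    prefix = instance_uuid + "_disk"
    return [name for name in lv_names if name.startswith(prefix)]

# Example with made-up instance ids:
vg_contents = ["inst-a_disk", "inst-a_disk.swap", "inst-b_disk"]
print(leftover_volumes(vg_contents, "inst-a"))
# -> ['inst-a_disk', 'inst-a_disk.swap']
```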

The cleanup path in nova/virt/libvirt/driver.py (_cleanup_resize) does not handle this.
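A sketch of the missing step: once the resize is confirmed, the source node should drop the instance's logical volumes. Real code would go through nova's own LVM helpers; the function below just builds the equivalent lvremove command lines, with the VG name and LV naming scheme as illustrative assumptions:

```python
# Sketch of the cleanup missing from _cleanup_resize: after a confirmed
# resize/migration, remove the instance's LVs from the source node's VG.
# The VG name and "<instance_uuid>_disk*" naming scheme are assumptions
# for illustration; nova's real fix would use its internal LVM helpers.

def cleanup_commands(vg, lv_names, instance_uuid):
    """Build lvremove command lines for an instance's leftover LVs."""
    prefix = instance_uuid + "_disk"
    return [
        "lvremove -f /dev/{}/{}".format(vg, name)
        for name in lv_names
        if name.startswith(prefix)
    ]

print(cleanup_commands("nova-vg", ["inst-a_disk", "inst-b_disk"], "inst-a"))
# -> ['lvremove -f /dev/nova-vg/inst-a_disk']
```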

Revision history for this message
Matt Riedemann (mriedem) wrote :

Yup this is a known issue. Chris Friesen from WindRiver had an old patch (from Titanium Cloud and what is now StarlingX) for lvm resize support: https://review.opendev.org/#/c/337334/

tags: added: libvirt lvm resize
Revision history for this message
Dr. Clemens Hardewig (bringha1) wrote :

Hmmm - but this one has been abandoned

Changed in nova:
assignee: nobody → Dr. Clemens Hardewig (bringha1)
Revision history for this message
sean mooney (sean-k-mooney) wrote :

This is a known issue with a previously submitted (and since abandoned) patch to fix it, so triaging as medium priority

Changed in nova:
importance: Undecided → Medium
status: New → Triaged
Revision history for this message
Dr. Clemens Hardewig (bringha1) wrote :

Thanks for your comment - So is the recommendation to resume (the code of) the abandoned patch and adapt it to the newer versions accordingly? As far as I can see, it was abandoned back in 2016 already ...

Before I spend some work on it, I would appreciate your recommendation ....

Thanks

Revision history for this message
sean mooney (sean-k-mooney) wrote :

I think Chris just ran out of time to upstream this, and since Wind River had already applied the fix downstream in their product, he and his team moved on to other work.

So if we can update it and validate that it resolves the issue, I don't think there is any objection to fixing this, provided people are happy to do the work.

Revision history for this message
Matt Riedemann (mriedem) wrote :

Yeah I don't think there is really any procedural block on this. As Sean said, WindRiver had that patch and kept it downstream and ran out of time/resources to chase it upstream. I think last I asked Chris he said WindRiver was no longer using local lvm so they didn't need to pursue this anymore anyway.

We do have an lvm CI job (nova-lvm) but obviously the resize/migrate tests have to be disabled:

https://github.com/openstack/nova/blob/89712fe834942a85be3441ed175a79ff696bfe01/devstack/tempest-dsvm-lvm-rc#L41

If we can get a working patch and re-enable those tests for the nova-lvm job then I think it would be a good step forward.

Revision history for this message
Chris Friesen (cbf123) wrote :

Matt is correct...we moved away from using local LVM so it wasn't a priority anymore. The general concept should still be valid, but the patch will definitely need to be updated.

(Incidentally, local-resize with thin-provisioned local LVM allowed for really fast resizes since no data actually got copied.)

Revision history for this message
Robert Varjasi (robert.varjasi) wrote :

It would be great if you could implement it. It comes in handy when we have local NVMe SSDs and want to use LVM-backed ephemeral disks instead of file-based raw or qcow2 images.
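For context, LVM-backed ephemeral storage is selected in nova.conf along these lines (the volume group name below is an example; check the libvirt driver options for your release):

```ini
[libvirt]
# Store instance ephemeral disks as LVM logical volumes instead of files
images_type = lvm
# Volume group to carve instance LVs from (e.g. on local NVMe); name is an example
images_volume_group = nova-vg
```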
