nova backup fails to backup an instance with attached volume (libvirt, LVM backed)

Bug #1313573 reported by Yogev Rabl on 2014-04-28
Affects: OpenStack Compute (nova)
  Importance: Medium
  Assigned to: Feilong Wang
Affects: nova (Juno series)
  Importance: Undecided
  Assigned to: Unassigned

Bug Description

Description of problem:
An instance has an attached volume, after running the command:
# nova backup <instance id> <backup name> snapshot <rotation (an integer)>
An image has been created (type backup) and the status is stuck in 'queued'.

Version-Release number of selected component (if applicable):
openstack-nova-compute-2013.2.3-6.el6ost.noarch
openstack-nova-conductor-2013.2.3-6.el6ost.noarch
openstack-nova-novncproxy-2013.2.3-6.el6ost.noarch
openstack-nova-scheduler-2013.2.3-6.el6ost.noarch
openstack-nova-api-2013.2.3-6.el6ost.noarch
openstack-nova-cert-2013.2.3-6.el6ost.noarch

python-glance-2013.2.3-2.el6ost.noarch
python-glanceclient-0.12.0-2.el6ost.noarch
openstack-glance-2013.2.3-2.el6ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. launch an instance from a volume.
2. backup the instance.

Actual results:
The backup is stuck in queued state.

Expected results:
The backup should be available as an image in Glance.

Additional info:
The nova-compute error & the glance logs are attached.

tags: added: volumes
Nikola Đipanov (ndipanov) wrote :

An important thing to note here, which is not mentioned in the bug report, is that this was reported with CONF.libvirt_images_type set to 'lvm'.

Looking at the snapshotting code in the libvirt driver, and at the line that causes the compute stack trace: when running an LVM-backed instance with an attached volume, the libvirt_utils.find_disk method seems to return the wrong disk.

It would be good to see the libvirt.xml generated for the offending instance, but given how poorly tested the find_disk method is, I would not be surprised if it is in fact the cause of the bug.
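To illustrate the failure mode described above, here is a minimal sketch (not nova's actual find_disk code, and the domain XML is a hypothetical simplification, not the real libvirt.xml from this instance): a lookup that grabs an arbitrary <disk> element can hand back the attached volume's by-path device instead of the instance's root LVM device.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified libvirt domain XML for an LVM-backed instance
# with one attached iSCSI volume.
DOMAIN_XML = """
<domain>
  <devices>
    <disk type='block' device='disk'>
      <source dev='/dev/nova-vg/instance-0000002a_disk'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <disk type='block' device='disk'>
      <source dev='/dev/disk/by-path/ip-10.0.0.15:3260-iscsi-example-lun-1'/>
      <target dev='vdb' bus='virtio'/>
    </disk>
  </devices>
</domain>
"""

def naive_find_disk(xml):
    # Takes the last <disk> it sees; with a volume attached this returns
    # the iSCSI volume's path, not the root LVM device.
    disks = ET.fromstring(xml).findall('./devices/disk')
    return disks[-1].find('source').get('dev')

def find_root_disk(xml):
    # A safer lookup keys on the root target device (vda) instead.
    for disk in ET.fromstring(xml).findall('./devices/disk'):
        if disk.find('target').get('dev') == 'vda':
            return disk.find('source').get('dev')

print(naive_find_disk(DOMAIN_XML))  # the attached volume's by-path device
print(find_root_disk(DOMAIN_XML))   # /dev/nova-vg/instance-0000002a_disk
```

If the driver then feeds the volume path into LVM-specific snapshot code, the lvs failure shown later in this thread follows naturally.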

tags: added: libvirt
Changed in nova:
importance: Undecided → Medium
status: New → Triaged
summary: - nova backup fails to backup an instance with attached volume
+ nova backup fails to backup an instance with attached volume (libvirt,
+ LVM backed)
Changed in nova:
assignee: nobody → Nikola Đipanov (ndipanov)
nick dallamora (ndallamora) wrote :

Is there a workaround for this? Or has there been any progress?

Nikola Đipanov (ndipanov) wrote :

Hey, I haven't had a chance to look at this, sadly. Feel free to take it over if you think you have a fix.

John Pierce (john-pierce) wrote :

I also get this error when I try to back up an LVM volume-based instance (which happens to be how I set up nearly all my instances - doh!)

Here's the relevant compute.log error when I try to run nova backup on my LVM volume-based instance:

2014-07-07 21:31:31.198 18720 ERROR oslo.messaging.rpc.dispatcher [-] Exception during message handling: Unexpected error while running command.
Command: sudo nova-rootwrap /etc/nova/rootwrap.conf lvs -o vg_all,lv_all --separator | /dev/disk/by-path/ip-10.0.0.15:3260-iscsi-iqn.2010-10.org.openstack:volume-3daa863e-fffe-4800-9918-877038248c0a-lun-1
Exit code: 5

Honestly, it just looks like libvirt_utils.logical_volume_info(path) is broken, or doesn't work correctly on my CentOS 6.5 host. Here are the specific TRACE messages related to this error in my compute.log:

2014-07-07 21:31:31.198 18720 TRACE oslo.messaging.rpc.dispatcher info = libvirt_utils.logical_volume_info(path)
2014-07-07 21:31:31.198 18720 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.6/site-packages/nova/virt/libvirt/utils.py", line 335, in logical_volume_info
2014-07-07 21:31:31.198 18720 TRACE oslo.messaging.rpc.dispatcher '--separator', '|', path, run_as_root=True)
2014-07-07 21:31:31.198 18720 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.6/site-packages/nova/virt/libvirt/utils.py", line 53, in execute

So here is what happens if I run, at the command line, what I think utils.py is trying to run (as shown in the first compute.log error at the top of this comment):

[root@wildcat ~]$ sudo nova-rootwrap /etc/nova/rootwrap.conf lvs -o vg_all,lv_all --separator \| /dev/disk/by-path/ip-10.0.0.15:3260-iscsi-iqn.2010-10.org.openstack:volume-3daa863e-fffe-4800-9918-877038248c0a-lun-1
  "disk/by-path/ip-10.0.0.15:3260-iscsi-iqn.2010-10.org.openstack:volume-3daa863e-fffe-4800-9918-877038248c0a-lun-1": Invalid path for Logical Volume

The lvs command simply doesn't consider this iSCSI path valid.

Playing around with lvs, I don't see how this command could ever work on machines that are compute nodes (i.e. not running the Cinder iSCSI daemons), since lvscan never reports the iSCSI volumes that are mounted for the instances running on the host.

Anyway, I am not sure how to fix this, but the whole implementation of libvirt_utils.logical_volume_info(path) seems suspect to me: it cannot possibly work here, because lvs simply cannot see iSCSI-connected volumes.

Nikola, if you are picking this up, please re-assign it to yourself. thanks.
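For reference, the helper under discussion shells out to lvs with a '|' separator and parses the two-line output. A rough sketch of that parsing (an assumed shape for illustration, not the exact nova code) shows why a path that lvs rejects with exit code 5, as above, can never produce usable volume info:

```python
def parse_lvs_output(out):
    # logical_volume_info-style parsing (sketch): lvs is invoked with
    # -o vg_all,lv_all --separator '|'; the first line holds the column
    # names and the second line the values for the one LV queried.
    lines = out.strip().splitlines()
    if len(lines) != 2:
        raise RuntimeError('unexpected lvs output; path is likely not an LV')
    headers = [h.strip() for h in lines[0].split('|')]
    values = [v.strip() for v in lines[1].split('|')]
    return dict(zip(headers, values))

# Hypothetical lvs output for a local LVM-backed instance disk.
sample = "LV|VG|LSize\ninstance-0000002a_disk|nova-vg|20.00g\n"
info = parse_lvs_output(sample)
print(info['VG'])  # nova-vg
```

For an iSCSI by-path device, lvs exits non-zero before any output like this is produced, so the wrapper raises instead of returning, which matches the stack trace above.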

Changed in nova:
assignee: Nikola Đipanov (ndipanov) → nobody
Feilong Wang (flwang) on 2015-03-12
Changed in nova:
assignee: nobody → Fei Long Wang (flwang)
Bruno Lago (teolupus) wrote :

Also affects people using the RBD driver.

Fix proposed to branch: master
Review: https://review.openstack.org/164494

Changed in nova:
status: Triaged → In Progress
Feilong Wang (flwang) on 2015-03-16
Changed in nova:
milestone: none → kilo-3
Thierry Carrez (ttx) on 2015-03-20
Changed in nova:
milestone: kilo-3 → kilo-rc1

Fix proposed to branch: master
Review: https://review.openstack.org/167418

Reviewed: https://review.openstack.org/167418
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=2b94135865af710dc9c7210d23e1df5f54afed62
Submitter: Jenkins
Branch: master

commit 2b94135865af710dc9c7210d23e1df5f54afed62
Author: Fei Long Wang <email address hidden>
Date: Wed Mar 25 11:30:07 2015 +1300

    Raise exception when backup volume-backed instance

    This patch will be backported to Juno and Icehouse so that
    Nova can fail immediately to let user know that it's not
    supported in that release.

    Partial-Bug: #1313573

    Change-Id: Ic84fa9e0b9c2d7b6cf49955aa4f0d44ade2b5397
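In effect, the committed change guards the backup path so a volume-backed instance fails fast instead of leaving an image stuck in 'queued'. A minimal sketch of that behavior (the names, exception type, and flag are illustrative, not the exact nova code):

```python
class InvalidRequest(Exception):
    """Stand-in for the error the API returns to the caller (illustrative)."""

def create_backup(instance_uuid, name, backup_type, rotation,
                  is_volume_backed):
    # In nova the volume-backed check consults the instance's block
    # device mappings; here it is a plain flag for illustration.
    if is_volume_backed:
        raise InvalidRequest(
            'Backup is not supported for volume-backed instances.')
    return {'instance': instance_uuid, 'name': name,
            'backup_type': backup_type, 'rotation': rotation}

try:
    create_backup('3daa863e', 'snap1', 'snapshot', 1, is_volume_backed=True)
except InvalidRequest as e:
    print(e)  # Backup is not supported for volume-backed instances.
```

The user sees an immediate, explicit error rather than a backup image that never leaves the 'queued' state.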

John Garbutt (johngarbutt) wrote :

Removing this as RC1 blocking, adding the tag so we can track if this was the right thing to do.

tags: added: kilo-rc-potential
Changed in nova:
milestone: kilo-rc1 → none
John Garbutt (johngarbutt) wrote :

To explain: this is no longer RC1 blocking because we now report the error properly, rather than failing silently.

John Garbutt (johngarbutt) wrote :

This already made it into RC1.

Changed in nova:
status: In Progress → Fix Committed
tags: removed: kilo-rc-potential

Reviewed: https://review.openstack.org/168759
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b0d8d69dbee4737cdecfde7e96b6f3bf321f477d
Submitter: Jenkins
Branch: stable/juno

commit b0d8d69dbee4737cdecfde7e96b6f3bf321f477d
Author: Fei Long Wang <email address hidden>
Date: Wed Mar 25 11:30:07 2015 +1300

    Raise exception when backup volume-backed instance

    This patch will be backported to Juno and Icehouse so that
    Nova can fail immediately to let user know that it's not
    supported in that release.

    Partial-Bug: #1313573

    NOTE: This conflict is because there is a new parameter
    named 'id' for method:
    common.raise_http_conflict_for_instance_invalid_state.

    Conflicts:
            nova/api/openstack/compute/contrib/admin_actions.py
            nova/api/openstack/compute/plugins/v3/create_backup.py

    Change-Id: Ic84fa9e0b9c2d7b6cf49955aa4f0d44ade2b5397
    (cherry picked from commit 2b94135865af710dc9c7210d23e1df5f54afed62)

tags: added: in-stable-juno
Thierry Carrez (ttx) on 2015-06-24
Changed in nova:
milestone: none → liberty-1
status: Fix Committed → Fix Released
Thierry Carrez (ttx) on 2015-10-15
Changed in nova:
milestone: liberty-1 → 12.0.0

Change abandoned by Michael Still (<email address hidden>) on branch: master
Review: https://review.openstack.org/164494
Reason: This patch is very old and appears to not be active any more. I am therefore abandoning it to keep the nova review queue sane. Feel free to restore the change when you're actively working on it again.
