Migrating an in-use LVM volume doesn't seem to work

Bug #1255622 reported by John Griffith
This bug affects 3 people
Affects: Cinder
Status: Invalid
Importance: High
Assigned to: Unassigned

Bug Description

While testing the retype code under conditions that force a migration, there appear to be several problems with the migration code.

Set up multi-backend LVM with a type for each backend, then create a volume of type lvm-1 and attach it to an instance. Run retype to lvm-2; the new LV is created, and eventually the volume's type is updated.
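For reference, a multi-backend LVM setup of the kind described here is typically configured in cinder.conf roughly as below. The section names and volume groups are illustrative assumptions, not taken from the reporter's environment (the backend names match the ones used in the reproduction steps later in this thread):

```ini
# Hypothetical cinder.conf fragment for two LVM backends.
[DEFAULT]
enabled_backends = lvm-1,lvm-2

[lvm-1]
volume_driver = cinder.volume.drivers.lvm.LVMISCSIDriver
volume_group = stack-volumes
volume_backend_name = LVM_iSCSI

[lvm-2]
volume_driver = cinder.volume.drivers.lvm.LVMISCSIDriver
volume_group = stack-volumes2
volume_backend_name = LVM_iSCSI2
```

A volume type per backend then pins volumes to a backend via the `capabilities:volume_backend_name` type key.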

The result, however, is that there are now two LVs, even though Cinder only knows about one of them. In addition, attempts to do a "nova volume-detach" now fail with a resource-not-found error. Subsequent retype attempts also fail because of an internal status of migrating.

One of the bigger problems here is that it's very difficult to detect what's going on and which volumes map to what. Even worse, when I'm using, for example, "volume-xyz", which LV is it actually pointing to? This all depends on where we are in the migration process, which makes things very confusing and difficult to debug.

Tags: lvm migration
Revision history for this message
John Griffith (john-griffith) wrote :

The resource-not-found problem seems to be something else and may be completely unrelated. When the detach is issued, we get a traceback in nova-api that crashes the n-api service, and we are unable to restart it.

2013-11-27 11:01:10.694 CRITICAL nova [-] [Errno 98] Address already in use
2013-11-27 11:01:10.694 TRACE nova Traceback (most recent call last):
2013-11-27 11:01:10.694 TRACE nova File "/usr/local/bin/nova-api", line 10, in <module>
2013-11-27 11:01:10.694 TRACE nova sys.exit(main())
2013-11-27 11:01:10.694 TRACE nova File "/opt/stack/nova/nova/cmd/api.py", line 49, in main
2013-11-27 11:01:10.694 TRACE nova max_url_len=16384)
2013-11-27 11:01:10.694 TRACE nova File "/opt/stack/nova/nova/service.py", line 318, in __init__
2013-11-27 11:01:10.694 TRACE nova max_url_len=max_url_len)
2013-11-27 11:01:10.694 TRACE nova File "/opt/stack/nova/nova/wsgi.py", line 123, in __init__
2013-11-27 11:01:10.694 TRACE nova self._socket = eventlet.listen(bind_addr, family, backlog=backlog)
2013-11-27 11:01:10.694 TRACE nova File "/usr/local/lib/python2.7/dist-packages/eventlet/convenience.py", line 38, in listen
2013-11-27 11:01:10.694 TRACE nova sock.bind(addr)
2013-11-27 11:01:10.694 TRACE nova File "/usr/lib/python2.7/socket.py", line 224, in meth
2013-11-27 11:01:10.694 TRACE nova return getattr(self._sock,name)(*args)
2013-11-27 11:01:10.694 TRACE nova error: [Errno 98] Address already in use
2013-11-27 11:01:10.694 TRACE nova
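The Errno 98 in the trace is the generic bind failure for a port that is already held; a minimal stdlib sketch (no eventlet or nova involved) reproduces the same failure mode the restarted nova-api hits when the old process still owns its port:

```python
import errno
import socket

# First socket takes a port (OS-assigned so the sketch is self-contained).
s1 = socket.socket()
s1.bind(("127.0.0.1", 0))
s1.listen(1)
port = s1.getsockname()[1]

# Second bind to the same port fails with EADDRINUSE (Errno 98),
# exactly like the eventlet.listen() call in the trace above.
s2 = socket.socket()
caught = None
try:
    s2.bind(("127.0.0.1", port))
except OSError as e:
    caught = e.errno
finally:
    s2.close()
    s1.close()

print(caught == errno.EADDRINUSE)
```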

Changed in cinder:
status: New → Triaged
importance: Undecided → High
Revision history for this message
Avishay Traeger (avishay-il) wrote :

Trying to reproduce the bug:

1. Havana-based devstack with multi-backend (two LVMs)
2. cinder type-create lvm1
3. cinder type-key lvm1 set capabilities:volume_backend_name=LVM_iSCSI
4. cinder type-create lvm2
5. cinder type-key lvm2 set capabilities:volume_backend_name=LVM_iSCSI2
6. cinder create --volume-type lvm1 1
7. verified that the volume is on lvm1 with the proper volume type
8. nova boot --image a9a4d780-0ba4-44a3-be25-31926e4ec226 --flavor m1.tiny foo
9. (boot VM and attach volume, ensure status is in-use)
10. cinder retype cfaeb596-7828-4c11-a2bd-144b1a702e0e lvm2
11. after a few seconds, host is lvm2, volume type is lvm2, status is in-use
12. sudo lvs
  LV VG Attr LSize Pool Origin Data% Move Log Copy% Convert
  volume-1f652358-062f-4b02-8045-c4073befb0ae stack-volumes2 -wi-ao-- 1.00g

Only one volume, in second VG.

Revision history for this message
Avishay Traeger (avishay-il) wrote :

Can you check your logs to see if n-cpu is failing? One gotcha is that libvirt and qemu need to be new enough to support the feature.

Of course, no matter what the problem is, proper cleanup is still an issue.

Changed in cinder:
assignee: nobody → Swapnil Kulkarni (coolsvap)
Revision history for this message
John Griffith (john-griffith) wrote :

We end up with an orphaned volume (the dest volume we created is never actually deleted, although the DB is updated to say it is)

For added confusion, the /dev/disk/by-path entry attached to the compute node does in fact have the IQN of the destination volume that was created. Also, there's ZERO info or feedback describing that a volume is being migrated, has been migrated, or, even worse, has failed to migrate.

Here's some output info to demonstrate the confusion in all of this:

jgriffith@trusty ~/devstack $ cinder show e5cd3e59-3f92-4f51-a28b-8cd0763182eb
+---------------------+-------------------------------------------------------+
| Property            | Value                                                 |
+---------------------+-------------------------------------------------------+
| attachments         | [{u'device': u'/dev/vdb', u'server_id': u'f89dfa19-5f57-4eea-bd2f-ceebae48539f', u'id': u'e5cd3e59-3f92-4f51-a28b-8cd0763182eb', u'host_name': None, u'volume_id': u'e5cd3e59-3f92-4f51-a28b-8cd0763182eb'}] |
| availability_zone   | nova                                                  |
| bootable            | false                                                 |
| consistencygroup_id | None                                                  |
| created_at          | 2014-10-23T19:46:24.000000                            |
| description         | None                                                  |
| encrypted           | False                                                 |
| id                  | e5cd3e59-3f92-4...

Mike Perez (thingee)
Changed in cinder:
assignee: Swapnil Kulkarni (coolsvap) → nobody
status: Triaged → Confirmed
Jay Bryant (jsbryant)
tags: added: lvm migration
Vincent Hou (houshengbo)
Changed in cinder:
assignee: nobody → Vincent Hou (houshengbo)
Revision history for this message
Jay Bryant (jsbryant) wrote :

Thanks for taking a look at this Vincent!

Revision history for this message
Vincent Hou (houshengbo) wrote :

@John, I totally understand your confusion. However, when a volume migrates to another host or back-end storage, its volume id remains unchanged, BUT the provider_location will change to the destination target. Please use "select id, status, migration_status, provider_location from volumes" to check the volume after the migration. The provider_location will point to a new target, which was created for the dest volume under a different volume id.

After the migration is finished, the dest volume itself will be cleaned up in LVM and the DB, but the target still exists and is connected to the migrated volume. I think this explains why you see the "orphaned volume". The volume in LVM and its info in the DB are gone, but the target has to remain as the back-end storage for the migrated volume. In the DB, the provider_location column will indicate the link between them.
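The suggested DB check can be illustrated with a tiny self-contained sketch; the table below is a reduced, hypothetical stand-in for Cinder's volumes table (only the columns the query names, with made-up IDs):

```python
import sqlite3

# Reduced stand-in for Cinder's volumes table: just the columns from the
# suggested query, with invented IDs.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE volumes (
                   id TEXT,
                   status TEXT,
                   migration_status TEXT,
                   provider_location TEXT)""")

# After migration the volume keeps its original id, but provider_location
# points at the iSCSI target that was created for the dest volume.
con.execute(
    "INSERT INTO volumes VALUES (?, ?, ?, ?)",
    ("aaaa-original-id", "in-use", None,
     "192.168.0.2:3260,1 iqn.2010-10.org.openstack:volume-bbbb-dest-id 1"))

row = con.execute("SELECT id, status, migration_status, provider_location "
                  "FROM volumes").fetchone()
print(row)  # note: the volume id and the target's volume id differ
```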

Revision history for this message
Vincent Hou (houshengbo) wrote :

Correction: the dest volume is not deleted; only its information in the DB is deleted. The target of this dest volume will be linked to the original migrated volume. The provider_location column will indicate the link.

Revision history for this message
Vincent Hou (houshengbo) wrote :

The main steps of the volume migration mechanism:
1. Create a volume as the destination volume, which logs its information in the database.
2. Attach both the migrated volume and the destination volume to one host, and copy the volume content to the destination volume.
3. After the content is copied, detach both of them.
4. Delete the original migrated volume, and delete the destination volume's information from the DB; the REAL LV still exists.
5. Change the DB information for the migrated volume, especially provider_location, to link to the destination volume (LV). (This step is done by copying the dest volume's whole information row to a new row, changing the volume id to the migrated volume's id, and adjusting some status fields to show this volume is alive in Cinder.)
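Step 5 can be sketched in plain Python; the dicts below stand in for rows of the volumes table, and the field names are simplified assumptions rather than Cinder's actual schema:

```python
# Simplified stand-ins for rows in the volumes table.
src_row = {"id": "src-id", "status": "migrating",
           "provider_location": "iqn...:volume-src-id"}
dest_row = {"id": "dest-id", "status": "available",
            "provider_location": "iqn...:volume-dest-id"}

def finish_migration(src, dest):
    """Copy the dest volume's row, but keep the migrated volume's id and
    mark it alive again (step 5 above, greatly simplified)."""
    new_row = dict(dest)             # take all dest info, incl. provider_location
    new_row["id"] = src["id"]        # ...under the original volume id
    new_row["status"] = "available"  # reset from 'migrating'
    return new_row

merged = finish_migration(src_row, dest_row)
print(merged["id"], "->", merged["provider_location"])
```

The net effect is what Vincent describes: the surviving row keeps the original volume id while its provider_location points at the LV that was created for the dest volume.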

Revision history for this message
Sean McGinnis (sean-mcginnis) wrote :

Automatically unassigning due to inactivity.

Changed in cinder:
assignee: Vincent Hou (houshengbo) → nobody
Mannu (mannu-ray)
Changed in cinder:
assignee: nobody → Mannu (mannu-ray)
status: Confirmed → In Progress
Changed in cinder:
assignee: Mannu (mannu-ray) → Sachin Yede (yede-sachin45)
Revision history for this message
Sachin Yede (yede-sachin45) wrote :

Performed the following steps to reproduce the issue:
1) Installed devstack (Liberty)

2) Created a multi-backend setup (with two LVM backends)

3) Migrated from lvm1 to lvm2 through retype, which was successful.
     cinder retype 45cd3d68-6864-4d9a-a4cc-a9ab452603e0 lvm2

4) The volume owned by the new backend was the same volume on which the retype was performed.

Hence, marking it as invalid.

Changed in cinder:
status: In Progress → Invalid
Changed in cinder:
assignee: Sachin Yede (yede-sachin45) → nobody