2014-08-15 13:20:21 |
Jeegn Chen |
bug |
|
|
added bug |
2014-08-15 13:20:34 |
Jeegn Chen |
nova: assignee |
|
Jeegn Chen (jeegn-chen) |
|
2014-08-15 13:30:00 |
Jeegn Chen |
description |
When a volume is attached to a VM in the source compute node through multipath, the related files in /dev/disk/by-path/ are like this
stack@ubuntu-server12:~/devstack$ ls /dev/disk/by-path/*24
/dev/disk/by-path/ip-192.168.3.50:3260-iscsi-iqn.1992-04.com.emc:cx.fnm00124500890.a5-lun-24
/dev/disk/by-path/ip-192.168.4.51:3260-iscsi-iqn.1992-04.com.emc:cx.fnm00124500890.b4-lun-24
The information on its corresponding multipath device is like this
stack@ubuntu-server12:~/devstack$ sudo multipath -l 3600601602ba03400921130967724e411
3600601602ba03400921130967724e411 dm-3 DGC,VRAID
size=1.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=-1 status=active
| `- 19:0:0:24 sdl 8:176 active undef running
`-+- policy='round-robin 0' prio=-1 status=enabled
`- 18:0:0:24 sdj 8:144 active undef running
After the VM is migrated to the destination, however, the same information looks like the following example, because we CANNOT guarantee that every node can access the same iSCSI portals or sees the same target LUN number. This destination-side information overwrites connection_info in the DB before the post-live-migration logic runs.
stack@ubuntu-server13:~/devstack$ ls /dev/disk/by-path/*24
/dev/disk/by-path/ip-192.168.3.51:3260-iscsi-iqn.1992-04.com.emc:cx.fnm00124500890.b5-lun-100
/dev/disk/by-path/ip-192.168.4.51:3260-iscsi-iqn.1992-04.com.emc:cx.fnm00124500890.b4-lun-100
stack@ubuntu-server12:~/devstack$ sudo multipath -l 3600601602ba03400921130967724e411
3600601602ba03400921130967724e411 dm-3 DGC,VRAID
size=1.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=-1 status=active
| `- 19:0:0:100 sdf 8:176 active undef running
`-+- policy='round-robin 0' prio=-1 status=enabled
`- 18:0:0:100 sdg 8:144 active undef running
As a result, if post live migration on the source side uses <IP>, <IQN> and <TARGET LUN Number> to find the devices to clean up, it may use 192.168.3.51, iqn.1992-04.com.emc:cx.fnm00124500890.b5 and 100.
However, the correct values are 192.168.3.50, iqn.1992-04.com.emc:cx.fnm00124500890.a5 and 24.
An approach similar to the one in https://bugs.launchpad.net/nova/+bug/1327497 can be used to fix it: use the unchanged multipath_id to find the correct devices to delete. |
When a volume is attached to a VM in the source compute node through multipath, the related files in /dev/disk/by-path/ are like this
stack@ubuntu-server12:~/devstack$ ls /dev/disk/by-path/*24
/dev/disk/by-path/ip-192.168.3.50:3260-iscsi-iqn.1992-04.com.emc:cx.fnm00124500890.a5-lun-24
/dev/disk/by-path/ip-192.168.4.51:3260-iscsi-iqn.1992-04.com.emc:cx.fnm00124500890.b4-lun-24
The information on its corresponding multipath device is like this
stack@ubuntu-server12:~/devstack$ sudo multipath -l 3600601602ba03400921130967724e411
3600601602ba03400921130967724e411 dm-3 DGC,VRAID
size=1.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=-1 status=active
| `- 19:0:0:24 sdl 8:176 active undef running
`-+- policy='round-robin 0' prio=-1 status=enabled
`- 18:0:0:24 sdj 8:144 active undef running
After the VM is migrated to the destination, however, the same information looks like the following example, because we CANNOT guarantee that every node can access the same iSCSI portals or sees the same target LUN number. This destination-side information overwrites connection_info in the DB before the post-live-migration logic runs.
stack@ubuntu-server13:~/devstack$ ls /dev/disk/by-path/*24
/dev/disk/by-path/ip-192.168.3.51:3260-iscsi-iqn.1992-04.com.emc:cx.fnm00124500890.b5-lun-100
/dev/disk/by-path/ip-192.168.4.51:3260-iscsi-iqn.1992-04.com.emc:cx.fnm00124500890.b4-lun-100
stack@ubuntu-server13:~/devstack$ sudo multipath -l 3600601602ba03400921130967724e411
3600601602ba03400921130967724e411 dm-3 DGC,VRAID
size=1.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=-1 status=active
| `- 19:0:0:100 sdf 8:176 active undef running
`-+- policy='round-robin 0' prio=-1 status=enabled
`- 18:0:0:100 sdg 8:144 active undef running
As a result, if post live migration on the source side uses <IP>, <IQN> and <TARGET LUN Number> to find the devices to clean up, it may use 192.168.3.51, iqn.1992-04.com.emc:cx.fnm00124500890.b5 and 100.
However, the correct values are 192.168.3.50, iqn.1992-04.com.emc:cx.fnm00124500890.a5 and 24.
An approach similar to the one in https://bugs.launchpad.net/nova/+bug/1327497 can be used to fix it: use the unchanged multipath_id to find the correct devices to delete. |
|
2014-08-15 14:07:36 |
OpenStack Infra |
nova: status |
New |
In Progress |
|
2014-10-30 08:54:50 |
OpenStack Infra |
nova: status |
In Progress |
Fix Committed |
|
2014-11-12 12:46:31 |
Shintaro Mizuno |
bug |
|
|
added subscriber Shintaro Mizuno |
2014-11-13 22:20:41 |
OpenStack Infra |
tags |
|
in-stable-juno |
|
2014-12-04 23:32:43 |
Alan Pevec |
nominated for series |
|
nova/juno |
|
2014-12-04 23:32:44 |
Alan Pevec |
bug task added |
|
nova/juno |
|
2014-12-04 23:34:33 |
Alan Pevec |
nova/juno: status |
New |
Fix Committed |
|
2014-12-04 23:34:33 |
Alan Pevec |
nova/juno: milestone |
|
2014.2.1 |
|
2014-12-05 08:16:43 |
Alan Pevec |
nova/juno: status |
Fix Committed |
Fix Released |
|
2014-12-18 20:10:42 |
Thierry Carrez |
nova: status |
Fix Committed |
Fix Released |
|
2014-12-18 20:10:42 |
Thierry Carrez |
nova: milestone |
|
kilo-1 |
|
2015-04-30 09:16:29 |
Thierry Carrez |
nova: milestone |
kilo-1 |
2015.1.0 |
|
2016-05-17 15:10:36 |
Launchpad Janitor |
branch linked |
|
lp:~ubuntu-server-dev/nova/icehouse |
|
2016-05-18 09:43:41 |
Martin Pitt |
bug task added |
|
nova (Ubuntu) |
|
2016-05-18 09:43:49 |
Martin Pitt |
nominated for series |
|
Ubuntu Trusty |
|
2016-05-18 09:43:49 |
Martin Pitt |
bug task added |
|
nova (Ubuntu Trusty) |
|
2016-05-24 21:40:31 |
Martin Pitt |
nova (Ubuntu): status |
New |
Fix Released |
|
2016-05-24 21:41:27 |
Martin Pitt |
nova (Ubuntu Trusty): status |
New |
Fix Committed |
|
2016-05-24 21:41:30 |
Martin Pitt |
bug |
|
|
added subscriber Ubuntu Stable Release Updates Team |
2016-05-24 21:41:32 |
Martin Pitt |
bug |
|
|
added subscriber SRU Verification |
2016-05-24 21:41:36 |
Martin Pitt |
tags |
in-stable-juno |
in-stable-juno verification-needed |
|
2016-07-01 23:29:13 |
Billy Olsen |
description |
When a volume is attached to a VM in the source compute node through multipath, the related files in /dev/disk/by-path/ are like this
stack@ubuntu-server12:~/devstack$ ls /dev/disk/by-path/*24
/dev/disk/by-path/ip-192.168.3.50:3260-iscsi-iqn.1992-04.com.emc:cx.fnm00124500890.a5-lun-24
/dev/disk/by-path/ip-192.168.4.51:3260-iscsi-iqn.1992-04.com.emc:cx.fnm00124500890.b4-lun-24
The information on its corresponding multipath device is like this
stack@ubuntu-server12:~/devstack$ sudo multipath -l 3600601602ba03400921130967724e411
3600601602ba03400921130967724e411 dm-3 DGC,VRAID
size=1.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=-1 status=active
| `- 19:0:0:24 sdl 8:176 active undef running
`-+- policy='round-robin 0' prio=-1 status=enabled
`- 18:0:0:24 sdj 8:144 active undef running
After the VM is migrated to the destination, however, the same information looks like the following example, because we CANNOT guarantee that every node can access the same iSCSI portals or sees the same target LUN number. This destination-side information overwrites connection_info in the DB before the post-live-migration logic runs.
stack@ubuntu-server13:~/devstack$ ls /dev/disk/by-path/*24
/dev/disk/by-path/ip-192.168.3.51:3260-iscsi-iqn.1992-04.com.emc:cx.fnm00124500890.b5-lun-100
/dev/disk/by-path/ip-192.168.4.51:3260-iscsi-iqn.1992-04.com.emc:cx.fnm00124500890.b4-lun-100
stack@ubuntu-server13:~/devstack$ sudo multipath -l 3600601602ba03400921130967724e411
3600601602ba03400921130967724e411 dm-3 DGC,VRAID
size=1.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=-1 status=active
| `- 19:0:0:100 sdf 8:176 active undef running
`-+- policy='round-robin 0' prio=-1 status=enabled
`- 18:0:0:100 sdg 8:144 active undef running
As a result, if post live migration on the source side uses <IP>, <IQN> and <TARGET LUN Number> to find the devices to clean up, it may use 192.168.3.51, iqn.1992-04.com.emc:cx.fnm00124500890.b5 and 100.
However, the correct values are 192.168.3.50, iqn.1992-04.com.emc:cx.fnm00124500890.a5 and 24.
An approach similar to the one in https://bugs.launchpad.net/nova/+bug/1327497 can be used to fix it: use the unchanged multipath_id to find the correct devices to delete. |
[Impact]
When a volume is attached to a VM in the source compute node through multipath, the related files in /dev/disk/by-path/ are like this
stack@ubuntu-server12:~/devstack$ ls /dev/disk/by-path/*24
/dev/disk/by-path/ip-192.168.3.50:3260-iscsi-iqn.1992-04.com.emc:cx.fnm00124500890.a5-lun-24
/dev/disk/by-path/ip-192.168.4.51:3260-iscsi-iqn.1992-04.com.emc:cx.fnm00124500890.b4-lun-24
The information on its corresponding multipath device is like this
stack@ubuntu-server12:~/devstack$ sudo multipath -l 3600601602ba03400921130967724e411
3600601602ba03400921130967724e411 dm-3 DGC,VRAID
size=1.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=-1 status=active
| `- 19:0:0:24 sdl 8:176 active undef running
`-+- policy='round-robin 0' prio=-1 status=enabled
`- 18:0:0:24 sdj 8:144 active undef running
After the VM is migrated to the destination, however, the same information looks like the following example, because we CANNOT guarantee that every node can access the same iSCSI portals or sees the same target LUN number. This destination-side information overwrites connection_info in the DB before the post-live-migration logic runs.
stack@ubuntu-server13:~/devstack$ ls /dev/disk/by-path/*24
/dev/disk/by-path/ip-192.168.3.51:3260-iscsi-iqn.1992-04.com.emc:cx.fnm00124500890.b5-lun-100
/dev/disk/by-path/ip-192.168.4.51:3260-iscsi-iqn.1992-04.com.emc:cx.fnm00124500890.b4-lun-100
stack@ubuntu-server13:~/devstack$ sudo multipath -l 3600601602ba03400921130967724e411
3600601602ba03400921130967724e411 dm-3 DGC,VRAID
size=1.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=-1 status=active
| `- 19:0:0:100 sdf 8:176 active undef running
`-+- policy='round-robin 0' prio=-1 status=enabled
`- 18:0:0:100 sdg 8:144 active undef running
As a result, if post live migration on the source side uses <IP>, <IQN> and <TARGET LUN Number> to find the devices to clean up, it may use 192.168.3.51, iqn.1992-04.com.emc:cx.fnm00124500890.b5 and 100.
However, the correct values are 192.168.3.50, iqn.1992-04.com.emc:cx.fnm00124500890.a5 and 24.
An approach similar to the one in https://bugs.launchpad.net/nova/+bug/1327497 can be used to fix it: use the unchanged multipath_id to find the correct devices to delete.
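The mismatch described above can be illustrated with a minimal Python sketch. This is not Nova's code: the names IscsiPath, parse_by_path, find_by_triple, find_by_multipath_id and SOURCE_DEVICES are hypothetical, and the by-path to multipath ID mapping is hard-coded from the listings in this report, whereas a real host would get it from the multipath tooling. It only shows why a lookup by <IP>, <IQN> and <TARGET LUN Number> taken from the destination finds nothing on the source, while a lookup by the unchanged multipath ID still finds both paths.

```python
import re
from typing import Dict, List, NamedTuple, Optional


class IscsiPath(NamedTuple):
    """Fields encoded in an iSCSI /dev/disk/by-path/ entry (hypothetical helper type)."""
    portal_ip: str
    port: int
    iqn: str
    lun: int


# Matches names such as ip-192.168.3.50:3260-iscsi-<iqn>-lun-24
BY_PATH_RE = re.compile(
    r"ip-(?P<ip>[0-9.]+):(?P<port>\d+)-iscsi-(?P<iqn>.+)-lun-(?P<lun>\d+)$"
)


def parse_by_path(path: str) -> Optional[IscsiPath]:
    """Split a by-path name into portal IP, port, IQN and LUN; None if it does not match."""
    match = BY_PATH_RE.search(path)
    if match is None:
        return None
    return IscsiPath(
        match.group("ip"),
        int(match.group("port")),
        match.group("iqn"),
        int(match.group("lun")),
    )


# Source-node view copied from the report: by-path entry -> multipath ID.
# Assumption: the mapping is hard-coded here for illustration only.
MULTIPATH_ID = "3600601602ba03400921130967724e411"
SOURCE_DEVICES: Dict[str, str] = {
    "/dev/disk/by-path/ip-192.168.3.50:3260-iscsi-"
    "iqn.1992-04.com.emc:cx.fnm00124500890.a5-lun-24": MULTIPATH_ID,
    "/dev/disk/by-path/ip-192.168.4.51:3260-iscsi-"
    "iqn.1992-04.com.emc:cx.fnm00124500890.b4-lun-24": MULTIPATH_ID,
}


def find_by_triple(devices: Dict[str, str], ip: str, iqn: str, lun: int) -> List[str]:
    """Return the paths whose parsed (IP, IQN, LUN) equals the given triple."""
    found = []
    for path in devices:
        parsed = parse_by_path(path)
        if parsed is not None and (parsed.portal_ip, parsed.iqn, parsed.lun) == (ip, iqn, lun):
            found.append(path)
    return sorted(found)


def find_by_multipath_id(devices: Dict[str, str], multipath_id: str) -> List[str]:
    """Return the paths that belong to the given multipath device."""
    return sorted(path for path, mid in devices.items() if mid == multipath_id)


# Triple taken from the destination's connection_info: no match on the source.
stale = find_by_triple(
    SOURCE_DEVICES, "192.168.3.51", "iqn.1992-04.com.emc:cx.fnm00124500890.b5", 100
)
# The multipath ID is identical on both nodes, so both source paths are found.
by_id = find_by_multipath_id(SOURCE_DEVICES, MULTIPATH_ID)
```

In this sketch stale is empty and by_id contains both lun-24 entries, which matches the failure mode reported here: cleanup keyed on the overwritten connection_info leaves the source paths behind.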
[Test Case]
Live migrate an instance which uses iSCSI multipath. Verify the correct target is removed on source hypervisor.
[Regression Potential]
Low; the fix is included in the next release (Juno). The change adds a check on a field that Fibre Channel multipath connections already use but that the iSCSI multipath code path did not use on cleanup. If the check fails, the existing behavior remains: iSCSI sessions/paths are not cleaned up. |
|
2016-07-01 23:29:20 |
Billy Olsen |
tags |
in-stable-juno verification-needed |
verification-needed |
|
2016-07-01 23:29:31 |
Billy Olsen |
tags |
verification-needed |
in-stable-juno verification-done |
|
2016-07-04 09:57:51 |
Martin Pitt |
removed subscriber Ubuntu Stable Release Updates Team |
|
|
|
2016-07-04 09:59:50 |
Launchpad Janitor |
nova (Ubuntu Trusty): status |
Fix Committed |
Fix Released |
|