Concurrent deletion of instances leads to residual multipath
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | In Progress | Undecided | Unassigned |
Bug Description
Description
===========
A 100G **iSCSI** **shared** volume was attached to 3 instances scheduled on the same node (node-2). I then deleted the 3 instances concurrently; the instances were deleted, but the output of `multipath -ll` showed residual failed paths, as follows.
[root@node-2 ~]# multipath -ll
Jan 10 10:25:42 | sdj: prio = const (setting: emergency fallback - alua failed)
Jan 10 10:25:42 | sdl: prio = const (setting: emergency fallback - alua failed)
Jan 10 10:25:42 | sdk: prio = const (setting: emergency fallback - alua failed)
Jan 10 10:25:42 | sdn: prio = const (setting: emergency fallback - alua failed)
Jan 10 10:25:42 | sdi: prio = const (setting: emergency fallback - alua failed)
Jan 10 10:25:42 | sdo: prio = const (setting: emergency fallback - alua failed)
Jan 10 10:25:42 | sdm: prio = const (setting: emergency fallback - alua failed)
Jan 10 10:25:42 | sdp: prio = const (setting: emergency fallback - alua failed)
mpathaj (36001405acb21c
size=100G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=0 status=enabled
| |- 24:0:0:39 sdj 8:144 failed faulty running
| |- 17:0:0:39 sdl 8:176 failed faulty running
| |- 22:0:0:39 sdk 8:160 failed faulty running
| `- 19:0:0:39 sdn 8:208 failed faulty running
`-+- policy='round-robin 0' prio=0 status=enabled
|- 23:0:0:39 sdi 8:128 failed faulty running
|- 18:0:0:39 sdo 8:224 failed faulty running
|- 21:0:0:39 sdm 8:192 failed faulty running
`- 20:0:0:39 sdp 8:240 failed faulty running
Steps to reproduce
==================
1. Boot 3 instances using RBD as the root disk; the protocol type of the root disk does not matter for this step.
2. Create an iSCSI shared (multiattach) volume to use as the data disk; commercial storage or any other storage system that uses the iSCSI protocol will do.
3. Attach the shared volume to each of the 3 instances.
4. Make sure the volume is attached to all the instances successfully, then delete the instances concurrently.
Expected result
===============
All 3 instances are deleted completely, and `multipath -ll` shows no residual multipath devices.
Actual result
=============
The 3 instances were deleted, but residual multipath devices remained on the node, as shown in the `multipath -ll` output in the description above.
Environment
===========
1. Exact version of OpenStack you are running:
Wallaby Nova & Cinder, connected to a commercial storage backend via iSCSI.
2. Which hypervisor did you use?
Libvirt 8.0.0 + qemu-kvm 6.2.0
3. Which storage type did you use?
RBD as the root disk; 1 shared iSCSI volume attached as a data disk to 3 instances scheduled on the same node.
4. Which networking type did you use?
omitted...
Logs & Configs
==============
According to the volume-disconnect code path, nova does not disconnect a shared volume from the host while the volume is still attached to other instances on the same node; instead it logs 'Detected multiple connections on this host for volume'. When the attached instances are deleted concurrently, each deletion can still observe the others' attachments, so none of them ever performs the disconnect. node-2 nova-compute output:
2024-01-10 11:05:29.904 +0800 ¦ node-2 ¦ nova-compute-d94f6 ¦ nova-compute ¦ 2024-01-
2024-01-10 11:05:30.143 +0800 ¦ node-2 ¦ nova-compute-d94f6 ¦ nova-compute ¦ 2024-01-
2024-01-10 11:05:30.334 +0800 ¦ node-2 ¦ nova-compute-d94f6 ¦ nova-compute ¦ 2024-01-
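The race can be illustrated with a minimal, self-contained sketch (all names here are hypothetical, not nova's actual code): each concurrent deletion first checks whether other attachments of the volume exist on the host, and only disconnects if none remain. When every deletion runs its check before any of them has removed its own attachment, each one sees the others and skips the disconnect:

```python
import threading

# volume_id -> set of instance ids attached on this host (shared state)
attachments = {"vol-1": {"inst-a", "inst-b", "inst-c"}}
disconnect_calls = []
barrier = threading.Barrier(3)

def delete_instance(instance_id):
    # 1. every deletion first observes the current attachments (check)
    others = attachments["vol-1"] - {instance_id}
    barrier.wait()  # force all three checks to happen before anyone acts
    # 2. then removes its own attachment and decides (act)
    attachments["vol-1"].discard(instance_id)
    if others:
        # "Detected multiple connections on this host for volume"
        return
    disconnect_calls.append(instance_id)

threads = [threading.Thread(target=delete_instance, args=(i,))
           for i in ("inst-a", "inst-b", "inst-c")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(disconnect_calls)  # [] -> nobody disconnected; multipath device left behind
```

Every deletion saw two other attachments at check time, so the volume ends up with zero attachments yet no disconnect was ever issued, which is exactly the residual-multipath state above.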
The resulting `multipath -ll` output is the same as shown in the description above.
Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/907958
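The direction of such a fix can be sketched as serializing the check-and-disconnect per volume (an illustrative sketch under assumed names, not the actual patch in the review above): with a lock around removing the attachment and checking the remainder, exactly one deletion, the last to run, sees no remaining attachments and performs the disconnect:

```python
import threading

attachments = {"vol-1": {"inst-a", "inst-b", "inst-c"}}
disconnect_calls = []
lock = threading.Lock()  # hypothetical per-volume lock serializing check+act

def delete_instance(instance_id):
    with lock:
        # remove our own attachment and check the remainder atomically
        attachments["vol-1"].discard(instance_id)
        if not attachments["vol-1"]:
            # last attachment gone: safe to tear down the multipath device
            disconnect_calls.append(instance_id)

threads = [threading.Thread(target=delete_instance, args=(i,))
           for i in ("inst-a", "inst-b", "inst-c")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(disconnect_calls))  # 1 -> exactly one deletion disconnects
```

With the check and the removal made atomic, the ordering of the concurrent deletions no longer matters: whichever deletion removes the final attachment is guaranteed to issue the disconnect.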