Instance deletion is stuck forever if any deletion hangs in 'multipath -r'

Bug #1447490 reported by Peter Wang
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Confirmed
Importance: Low
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

I created about 25 VMs from bootable volumes. After they were created,
I ran a script to delete all of them within a very short time.

What I saw was that all of the VMs stayed in 'deleting' status and were still not deleted after waiting for hours.

Output of ps:
stack@ubuntu-server13:/var/log/libvirt$ ps aux | grep multipath
root 8205 0.0 0.0 504988 5560 ? SLl Apr22 0:01 /sbin/multipathd
root 115515 0.0 0.0 64968 2144 pts/3 S+ Apr22 0:00 sudo nova-rootwrap /etc/nova/rootwrap.conf multipath -r
root 115516 0.0 0.0 42240 9488 pts/3 S+ Apr22 0:00 /usr/bin/python /usr/local/bin/nova-rootwrap /etc/nova/rootwrap.conf multipath -r
root 115525 0.0 0.0 41792 2592 pts/3 S+ Apr22 0:00 /sbin/multipath -r
stack 151825 0.0 0.0 11744 936 pts/0 S+ 02:10 0:00 grep --color=auto multipath

Then I killed the 'multipath -r' commands, and all of the VMs went into ERROR status.

After digging into the nova code, I found that nova always tries to acquire a global file lock:
    @utils.synchronized('connect_volume')
    def disconnect_volume(self, connection_info, disk_dev):
        """Detach the volume from instance_name."""
        iscsi_properties = connection_info['data']

        ......
        if self.use_multipath and multipath_device:
            return self._disconnect_volume_multipath_iscsi(iscsi_properties,
                                                           multipath_device)

While holding that lock, it rescans iSCSI and then runs 'multipath -r':

    def _disconnect_volume_multipath_iscsi(self, iscsi_properties,
                                           multipath_device):
        self._rescan_iscsi()
        # _rescan_multipath() calls:
        #   self._run_multipath('-r', check_exit_code=[0, 1, 21])
        self._rescan_multipath()

In my case, 'multipath -r' hung and did not exit for several hours.
In addition, because the lock was never released, this blocked the deletion of every other VM instance on the same Nova node.
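
To illustrate why one hung command stalls every other deletion, here is a minimal sketch that uses a plain threading.Lock as a stand-in for the shared 'connect_volume' lock. It is not nova code: the function names, the sleep that stands in for 'multipath -r', and the lock_timeout parameter are all assumptions made for the demonstration.

```python
import threading
import time

# Stand-in for the shared 'connect_volume' lock (assumption: the real lock
# is taken by @utils.synchronized, not by a threading.Lock).
connect_volume_lock = threading.Lock()


def disconnect_volume(name, command_seconds, results, lock_timeout=None):
    """Pretend to detach a volume while holding the shared lock.

    lock_timeout=None waits for the lock forever, which mirrors the
    behaviour described in this report.
    """
    wait = -1 if lock_timeout is None else lock_timeout
    if not connect_volume_lock.acquire(timeout=wait):
        results[name] = "gave up waiting for lock"
        return
    try:
        time.sleep(command_seconds)  # stand-in for a slow 'multipath -r'
        results[name] = "done"
    finally:
        connect_volume_lock.release()


def demo():
    """Run one slow detach and one detach that waits on the same lock."""
    results = {}
    slow = threading.Thread(target=disconnect_volume,
                            args=("vm-1", 1.0, results))
    slow.start()
    time.sleep(0.1)  # make sure vm-1 takes the lock first
    blocked = threading.Thread(target=disconnect_volume,
                               args=("vm-2", 0.0, results, 0.2))
    blocked.start()
    blocked.join()
    slow.join()
    return results


if __name__ == "__main__":
    print(demo())
```

In the sketch, vm-2 has nothing slow to do itself, yet it cannot proceed until vm-1 releases the lock. If vm-1 never returned, a waiter without a lock timeout would wait indefinitely, which matches the 'deleting' state seen above.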

IMO, Nova should not wait forever on a blocking command. At the very least, a timeout is needed for commands such as 'multipath -r' and 'multipath -ll'.
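
As a rough sketch of the kind of timeout I mean, the snippet below uses only the Python standard library (subprocess.run with its timeout argument). It is not how nova runs commands; the helper name run_with_timeout and the 30-second value are assumptions for illustration.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_seconds):
    """Run cmd and return (exit_code, stdout_text).

    Returns None if the command did not finish within timeout_seconds.
    subprocess.run kills the direct child process on timeout.
    """
    try:
        completed = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode, completed.stdout.decode()


if __name__ == "__main__":
    # Hypothetical call; on a real node the command would go through
    # rootwrap, and the timeout value would need tuning.
    outcome = run_with_timeout(["multipath", "-r"], 30)
    if outcome is None:
        print("multipath -r timed out", file=sys.stderr)
    else:
        print("exit code:", outcome[0])
```

One caveat with this approach: only the direct child is killed on timeout, so when the command is wrapped by sudo or nova-rootwrap (as in the ps output above), the underlying multipath process may need separate cleanup.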

Or is there any other solution for my case?

MY ENVIRONMENT:
Ubuntu Server 14:
multipath-tools
multipath enabled in Nova node

Thanks
Peter

Peter Wang (peter.wang)
information type: Public → Public Security
Peter Wang (peter.wang)
information type: Public Security → Public
Moshe Levi (moshele) wrote :

I am pushing a commit to oslo.concurrency to support a timeout ( https://review.openstack.org/#/c/177030/ ).
Once it is merged and released, we can use it to put a timeout on the multipath command.

Peter Wang (peter.wang) wrote :

OK, thanks Moshe for the info.

tags: added: volumes
Changed in nova:
status: New → Confirmed
importance: Undecided → Low
Matt Riedemann (mriedem)
tags: added: multipath
Matt Riedemann (mriedem) wrote :

Can this be retried against Liberty or Mitaka nova, where we use the os-brick library? os-brick had fixes for multipath issues that nova's own code did not.
