multipath-tools create duplicate device

Bug #1030040 reported by vincent
This bug affects 1 person
Affects                   Status   Importance  Assigned to     Milestone
multipath-tools (Ubuntu)  Invalid  Undecided   Peter Petrakis

Bug Description

I am testing Ubuntu 12.04 LTS and found that multipath-tools can create duplicate multipath devices on top of the same underlying device.

root@Linux51:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 12.04 LTS
Release: 12.04
Codename: precise

root@Linux51:~# apt-cache policy multipath-tools
multipath-tools:
  Installed: 0.4.9-3ubuntu5
  Candidate: 0.4.9-3ubuntu5
  Version table:
 *** 0.4.9-3ubuntu5 0

root@Linux51:~# multipath -ll mpath35
mpath35 (3600601601c102900cce9d9e2ae3fe011) dm-19 DGC,VRAID
size=12G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| |- 5:0:3:10 sdau 66:224 active ready running
| `- 6:0:3:10 sddg 70:224 active ready running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 5:0:2:10 sdr 65:16 active ready running
  `- 6:0:2:10 sdcd 69:16 active ready running
root@Linux51:~# multipath -ll mpath36
mpath36 (3600601601c102900d4e9d9e2ae3fe011) dm-20 DGC,VRAID
size=12G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| |- 5:0:3:10 sdau 66:224 active ready running
| `- 6:0:3:10 sddg 70:224 active ready running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 5:0:2:10 sdr 65:16 active ready running
  `- 6:0:2:10 sdcd 69:16 active ready running

After running multipath -f 3600601601c102900cce9d9e2ae3fe011, mpath35 disappears and stays gone, even after rebooting the host.
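The symptom in the two listings above is that mpath35 and mpath36 carry different WWIDs but list the same four path devices. A minimal Python sketch for spotting this in a saved `multipath -ll` listing follows. The function name `shared_paths` and the regular expressions are illustrative assumptions based only on the output format shown in this report; they are not part of multipath-tools.

```python
import re
from collections import defaultdict

# Map header, e.g. "mpath35 (3600601601c102900cce9d9e2ae3fe011) dm-19 DGC,VRAID"
MAP_RE = re.compile(r"^(\S+) \(([0-9a-fA-F]+)\) (dm-\d+)")
# Path line, e.g. "| |- 5:0:3:10 sdau 66:224 active ready running"
PATH_RE = re.compile(r"(\d+:\d+:\d+:\d+) (sd[a-z]+) (\d+:\d+)")


def shared_paths(listing):
    """Return {path device: sorted map names} for paths listed under 2+ maps."""
    owners = defaultdict(set)
    current_map = None
    for line in listing.splitlines():
        header = MAP_RE.match(line)
        if header:
            current_map = header.group(1)
            continue
        path = PATH_RE.search(line)
        if path and current_map is not None:
            owners[path.group(2)].add(current_map)
    return {dev: sorted(maps) for dev, maps in owners.items() if len(maps) > 1}


# Listing copied from the bug description (shell prompts removed).
SAMPLE = """\
mpath35 (3600601601c102900cce9d9e2ae3fe011) dm-19 DGC,VRAID
size=12G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| |- 5:0:3:10 sdau 66:224 active ready running
| `- 6:0:3:10 sddg 70:224 active ready running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 5:0:2:10 sdr 65:16 active ready running
  `- 6:0:2:10 sdcd 69:16 active ready running
mpath36 (3600601601c102900d4e9d9e2ae3fe011) dm-20 DGC,VRAID
size=12G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| |- 5:0:3:10 sdau 66:224 active ready running
| `- 6:0:3:10 sddg 70:224 active ready running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 5:0:2:10 sdr 65:16 active ready running
  `- 6:0:2:10 sdcd 69:16 active ready running
"""

duplicates = shared_paths(SAMPLE)
```

On a healthy system the returned dictionary should be empty, since each path device normally belongs to a single map. For the listing in this report, all four path devices are reported as shared between mpath35 and mpath36.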

Revision history for this message
Peter Petrakis (peter-petrakis) wrote :

Hi Vincent,

Could you please provide the following?

1) any lvm config file or other "stacked" block device configurations
2) if lvm: vgscan -vvv 2>&1 | tee vgscan.log
3) multipath.conf
4) echo 'show config' | multipathd -k > multipathd-runtime-config.log
5) dmsetup table -v > dmsetup.log
6) multipath -v4 -ll > multipath.log
7) system logs

Put that together into a tarball with a datetime stamp and attach it to the bug.
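One way to produce the datetime-stamped tarball is sketched below in Python. The helper name `bundle` and the `lp1030040` prefix are hypothetical choices for illustration; the log file names are the ones from the list above, and files that do not exist (for example vgscan.log on a host without LVM) are skipped.

```python
import tarfile
from datetime import datetime
from pathlib import Path


def bundle(files, prefix, when=None, dest="."):
    """Pack the existing files into <prefix>-YYYYmmdd-HHMMSS.tar.gz under dest."""
    when = when or datetime.now()
    out = Path(dest) / f"{prefix}-{when.strftime('%Y%m%d-%H%M%S')}.tar.gz"
    with tarfile.open(out, "w:gz") as tar:
        for name in files:
            path = Path(name)
            if path.is_file():  # skip logs that were never generated
                tar.add(path, arcname=path.name)
    return out


# File names taken from the numbered list in this comment.
REQUESTED_LOGS = [
    "vgscan.log",
    "multipathd-runtime-config.log",
    "dmsetup.log",
    "multipath.log",
]
```

Calling `bundle(REQUESTED_LOGS + ["/etc/multipath.conf"], "lp1030040")` from the directory holding the logs would write the archive there and return its path.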

To assist in debugging, please stop multipathd, flush all paths, and then run
only the multipath client so we can observe how the paths are created.

0) service multipath-tools stop
1) multipath -F # check that paths have been deleted
2) multipath -v4 > multipath-create-all-no-daemon.log
3) dmsetup table -v > dmsetup-no-daemon.log
4) multipath -ll > multipath-normal-listing.log

Put all of this in another appropriately labelled tarball and attach it.

BTW, is your hardware handler actually installed? lsmod | grep scsi_dh

Help me understand why your SD names have "lapped" to use an additional character.
Are you performing a lot of hotplug, or are there really that many disks? An lsscsi -l
output would be nice.
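For context on the "lapped" names: the sd driver hands out sda through sdz, then sdaa, sdab, and so on, which is a bijective base-26 numbering. The Python sketch below converts between a name and its zero-based index. The function names `sd_index` and `sd_name` are illustrative and not part of any tool.

```python
def sd_index(name):
    """Zero-based index of an sd name: sda -> 0, sdz -> 25, sdaa -> 26."""
    suffix = name[2:]
    if not name.startswith("sd") or not suffix:
        raise ValueError(f"not an sd device name: {name!r}")
    if not all("a" <= ch <= "z" for ch in suffix):
        raise ValueError(f"not an sd device name: {name!r}")
    n = 0
    for ch in suffix:
        # Bijective base 26: 'a' counts as 1, 'z' as 26, there is no zero digit.
        n = n * 26 + (ord(ch) - ord("a") + 1)
    return n - 1


def sd_name(index):
    """Inverse of sd_index: 0 -> sda, 26 -> sdaa."""
    if index < 0:
        raise ValueError("index must be non-negative")
    letters = ""
    n = index + 1
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("a") + rem) + letters
    return "sd" + letters


# Path devices from the multipath -ll listing in the bug description.
indices = {dev: sd_index(dev) for dev in ("sdr", "sdau", "sdcd", "sddg")}
```

For the devices in the listing this gives sdr = 17, sdau = 46, sdcd = 81 and sddg = 110, so at least 111 sd names had been handed out by the time the listing was taken, whether through a large number of LUN paths or through repeated hotplug.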

Thanks.

Changed in multipath-tools (Ubuntu):
status: New → Incomplete
assignee: nobody → Peter Petrakis (peter-petrakis)
vincent (vincent-y-chen) wrote :

Hi,
I can't reproduce this issue now. It only happened once. As in my original description, I used "multipath -f" to remove the duplicate device, and then everything was OK.

Here are the logs and configuration file as requested. Since no duplicate devices were produced this time, these logs may not help with debugging.

Peter Petrakis (peter-petrakis) wrote :

Hmm, a reproduction case is going to be necessary. There are some
weird things going on in syslog.2 which I'm unfamiliar with. Unfortunately,
I don't have access to a Symmetrix either. I really need to have multipathd running
in the foreground, with full logging to a file, to catch what went down.

From 7/27/12 00:38:48 -- 00:38:58 I just see lines and lines of this:
Jul 27 00:38:58 Linux51 multipathd: mpath12: sdm - emc_clariion_checker: Logical Unit is unbound or LUNZ

This shows up for just about every block device. The message means the path is down. It is a
safeguard against issuing I/O to a LUNZ, which would block forever.

libmultipath/checkers/emc_clariion.c

        if ( /* LUN should at least be bound somewhere and not be LUNZ */
                sense_buffer[4] == 0x00) {
                MSG(c, "emc_clariion_checker: Logical Unit is unbound "
                    "or LUNZ");
                return PATH_DOWN;
        }

After a few filters...
Jul 27 05:45:36 Linux51 kernel: [188724.191711] scsi 6:0:0:0: Direct-Access DGC LUNZ 0532 PQ: 0 ANSI: 4
Jul 27 05:45:36 Linux51 kernel: [188724.193584] scsi 6:0:1:0: Direct-Access DGC LUNZ 0532 PQ: 0 ANSI: 4
Jul 27 05:45:43 Linux51 kernel: [188730.940803] scsi 5:0:0:0: Direct-Access DGC LUNZ 0532 PQ: 0 ANSI: 4
Jul 27 05:45:43 Linux51 kernel: [188730.942590] scsi 5:0:1:0: Direct-Access DGC LUNZ 0532 PQ: 0 ANSI: 4

OK, that's 4, but there were dozens in the logs, which means the rest were unbound.

Then there's this,
Jul 27 00:39:49 Linux51 udevd[26799]: timeout '/sbin/blkid -o udev -p /dev/dm-53'
Jul 27 00:39:49 Linux51 udevd[26794]: timeout '/sbin/blkid -o udev -p /dev/dm-52'

Jul 27 00:39:50 Linux51 udevd[26848]: timeout: killing '/sbin/blkid -o udev -p /dev/dm-66' [26890]
Jul 27 00:39:50 Linux51 udevd[26857]: timeout: killing '/sbin/blkid -o udev -p /dev/dm-60' [26899]
...

Something is off. If multipath did create two maps to the same device, I suspect it was a victim
of these path availability issues.

<11:37:45>multipath_conf$ zcat syslog.2.gz | grep unbound | sort -u | wc -l
43055

That can't be right.
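The `sort -u | wc -l` count above is over whole log lines, timestamps included, so it does not show how many distinct paths were affected. A minimal Python sketch that tallies the checker message per map and path is given below. The regular expression is an assumption based on the single syslog line quoted earlier in this comment, and `tally_unbound` is an illustrative name.

```python
import re
from collections import Counter

# Matches lines such as:
# "Jul 27 00:38:58 Linux51 multipathd: mpath12: sdm - emc_clariion_checker:
#  Logical Unit is unbound or LUNZ"
CHECKER_RE = re.compile(
    r"multipathd: (?P<map>\S+): (?P<path>sd[a-z]+) - emc_clariion_checker: "
    r"Logical Unit is unbound or LUNZ"
)


def tally_unbound(lines):
    """Count 'unbound or LUNZ' checker messages per (map, path) pair."""
    counts = Counter()
    for line in lines:
        match = CHECKER_RE.search(line)
        if match:
            counts[(match.group("map"), match.group("path"))] += 1
    return counts


# The first line is quoted from the log; the second is unrelated noise
# included only to show that non-matching lines are ignored.
example = tally_unbound([
    "Jul 27 00:38:58 Linux51 multipathd: mpath12: sdm - emc_clariion_checker: "
    "Logical Unit is unbound or LUNZ",
    "Jul 27 00:39:49 Linux51 udevd[26799]: timeout '/sbin/blkid -o udev -p /dev/dm-53'",
])
```

To run it over the rotated log, the lines could be read with `gzip.open("syslog.2.gz", "rt", errors="replace")` and passed straight to `tally_unbound`; `len(result)` then gives the number of distinct map and path pairs that reported the condition.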

vincent (vincent-y-chen) wrote :

Thanks, Peter. Please close this case.

Changed in multipath-tools (Ubuntu):
status: Incomplete → Invalid
vincent (vincent-y-chen) wrote :

Any update, Peter?

Peter Petrakis (peter-petrakis) wrote :

@Vincent: I thought you asked for this case to be closed?
Did you mean to update another bug?

vincent (vincent-y-chen) wrote :

Sorry, my mistake. I should have requested an update on another case.
