Ubuntu
multipath-tools package

Cannot read superblock after FC multipath failover

Bug #1020436 reported by Steve Fisher on 2012-07-03

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	multipath-tools (Ubuntu)	Invalid	Undecided	Peter Petrakis

Bug Description

This might not be multipath-tools, but it's the likeliest candidate.

If fibre channel connectivity is disrupted on an active path, I can see the multipath daemon rejuggling active paths as expected. However instead of utilising a new path, the server continues trying active i/o down the same path until it fails, and the superblock on the lun is marked unreadable.

The luns simply cannot be remounted even after the paths are recovered. However, after a simple reboot, the luns are again readable _regardless_ of the visible path. (Eg I shut the active FC port then rebooted, and the storage was mountable per normal over a different path).

I've replicated this behaviour using both xfs and ext4 filesystems, on multiple different luns presented to the server.

The luns are visible over four FC paths, and the /dev/mapper/mpathX identity is used as the physical volume for an LVM2 volume group, into which the logical volume(s) and filesystem are created.

root@<host>:~# lsb_release -rd
Description: Ubuntu 10.04.3 LTS
Release: 10.04

As you can see here, the /dev/test and /dev/test2 volume groups are both active, yet showing i/o errors after a pathing fail.

root@<host>:~# lvscan
  /dev/test2/lvol0: read failed after 0 of 4096 at 0: Input/output error
  /dev/test/lvol0: read failed after 0 of 4096 at 0: Input/output error
  ACTIVE '/dev/test2/lvol0' [4.00 GiB] inherit
  ACTIVE '/dev/test/lvol0' [4.00 GiB] inherit
  ACTIVE '/dev/mvsan/san' [500.00 GiB] inherit

root@<host>:~# pvscan
  /dev/test2/lvol0: read failed after 0 of 4096 at 4290707456: Input/output error
  /dev/test2/lvol0: read failed after 0 of 4096 at 4290764800: Input/output error
  /dev/test2/lvol0: read failed after 0 of 4096 at 0: Input/output error
  /dev/test2/lvol0: read failed after 0 of 4096 at 4096: Input/output error
  /dev/test2/lvol0: read failed after 0 of 4096 at 0: Input/output error
  /dev/test/lvol0: read failed after 0 of 4096 at 4290707456: Input/output error
  /dev/test/lvol0: read failed after 0 of 4096 at 4290764800: Input/output error
  /dev/test/lvol0: read failed after 0 of 4096 at 0: Input/output error
  /dev/test/lvol0: read failed after 0 of 4096 at 4096: Input/output error
  /dev/test/lvol0: read failed after 0 of 4096 at 0: Input/output error
  PV /dev/mapper/mpath2 VG test2 lvm2 [4.00 GiB / 0 free]
  PV /dev/mapper/mpath1 VG test lvm2 [4.00 GiB / 0 free]
  PV /dev/mapper/mpath0 VG mvsan lvm2 [500.00 GiB / 0 free]

root@<host>:~# multipath -ll
mpath2 (360060e80058a070000008a0700000023) dm-7 HITACHI ,OPEN-V
[size=4.0G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=4][active]
\_ 6:0:0:2 sdd 8:48 [active][ready]
\_ 6:0:1:2 sdg 8:96 [active][ready]
\_ 7:0:0:2 sdj 8:144 [active][ready]
\_ 7:0:1:2 sdp 8:240 [active][ready]
mpath1 (360060e80058a070000008a070000000e) dm-6 HITACHI ,OPEN-V
[size=4.0G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=4][active]
\_ 6:0:0:1 sdc 8:32 [active][ready]
\_ 6:0:1:1 sdf 8:80 [active][ready]
\_ 7:0:0:1 sdi 8:128 [active][ready]
\_ 7:0:1:1 sdo 8:224 [active][ready]
mpath0 (360060e80058a070000008a0700002417) dm-5 HITACHI ,OPEN-V
[size=500G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=4][active]
\_ 6:0:0:0 sdb 8:16 [active][ready]
\_ 6:0:1:0 sde 8:64 [active][ready]
\_ 7:0:0:0 sdh 8:112 [active][ready]
\_ 7:0:1:0 sdk 8:160 [active][ready]

The dmesg output is very instructive, but still doesn't help with the root cause per se. I set the scsi-timeouts nice and high (90 seconds) to accommodate any potential delays in path failover. The path checks (test unit ready/TUR) occur about every 20 seconds.

See below:

[ 2192.891962] qla2xxx 0000:24:00.0: LOOP DOWN detected (2 3 0).
[ 2222.820850] rport-7:0-0: blocked FC remote port time out: removing target and saving binding
[ 2222.821297] rport-7:0-1: blocked FC remote port time out: removing target and saving binding
[ 2222.827286] device-mapper: multipath: Failing path 8:112.
[ 2222.827369] device-mapper: multipath: Failing path 8:160.

So this is all as expected.

Then:

[ 2222.924665] sd 7:0:0:0: [sdh] Synchronizing SCSI cache
[ 2222.924696] sd 7:0:0:0: [sdh] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2222.926071] device-mapper: multipath: Failing path 8:160.
[ 2222.976444] sd 7:0:1:2: [sdm] Unhandled error code
[ 2222.976446] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2222.976449] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 7f e1 00 00 00 01 00
[ 2222.976455] end_request: I/O error, dev sdm, sector 8380672
[ 2222.980258] sd 7:0:1:2: [sdm] Unhandled error code
[ 2222.980260] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2222.980262] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 7f e1 70 00 00 01 00
[ 2222.980267] end_request: I/O error, dev sdm, sector 8380784
[ 2222.984139] sd 7:0:1:2: [sdm] Unhandled error code
[ 2222.984141] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2222.984142] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 00 01 80 00 00 01 00
[ 2222.984147] end_request: I/O error, dev sdm, sector 384
[ 2222.988137] sd 7:0:1:2: [sdm] Unhandled error code
[ 2222.988139] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2222.988141] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 00 01 88 00 00 01 00
[ 2222.988145] end_request: I/O error, dev sdm, sector 392
[ 2222.992274] sd 7:0:1:2: [sdm] Unhandled error code
[ 2222.992276] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2222.992278] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 00 01 80 00 00 04 00
[ 2222.992282] end_request: I/O error, dev sdm, sector 384
[ 2222.996583] sd 7:0:1:1: [sdl] Unhandled error code
[ 2222.996585] sd 7:0:1:1: [sdl] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2222.996587] sd 7:0:1:1: [sdl] CDB: Read(10): 28 00 00 7f e1 00 00 00 08 00
[ 2222.996592] end_request: I/O error, dev sdl, sector 8380672
[ 2223.000980] sd 7:0:1:1: [sdl] Unhandled error code
[ 2223.000983] sd 7:0:1:1: [sdl] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2223.000985] sd 7:0:1:1: [sdl] CDB: Read(10): 28 00 00 7f e1 70 00 00 08 00
[ 2223.000992] end_request: I/O error, dev sdl, sector 8380784
[ 2223.005571] sd 7:0:1:1: [sdl] Unhandled error code
[ 2223.005573] sd 7:0:1:1: [sdl] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2223.005576] sd 7:0:1:1: [sdl] CDB: Read(10): 28 00 00 00 01 80 00 00 08 00
[ 2223.005583] end_request: I/O error, dev sdl, sector 384
[ 2223.010258] sd 7:0:1:1: [sdl] Unhandled error code
[ 2223.010260] sd 7:0:1:1: [sdl] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2223.010263] sd 7:0:1:1: [sdl] CDB: Read(10): 28 00 00 00 01 88 00 00 08 00
[ 2223.010269] end_request: I/O error, dev sdl, sector 392
[ 2223.015062] sd 7:0:1:1: [sdl] Unhandled error code
[ 2223.015065] sd 7:0:1:1: [sdl] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2223.015068] sd 7:0:1:1: [sdl] CDB: Read(10): 28 00 00 00 01 80 00 00 08 00
[ 2223.015072] end_request: I/O error, dev sdl, sector 384
[ 2223.046324] sd 7:0:0:2: [sdj] Unhandled error code
[ 2223.046325] sd 7:0:0:2: [sdj] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2223.046328] sd 7:0:0:2: [sdj] CDB: Read(10): 28 00 00 7f ff 80 00 00 08 00
[ 2223.046333] end_request: I/O error, dev sdj, sector 8388480
[ 2223.051339] sd 7:0:0:2: [sdj] Unhandled error code
[ 2223.051341] sd 7:0:0:2: [sdj] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2223.051343] sd 7:0:0:2: [sdj] CDB: Read(10): 28 00 00 7f ff f0 00 00 08 00
[ 2223.051347] end_request: I/O error, dev sdj, sector 8388592
[ 2223.051841] sd 7:0:0:1: [sdi] Synchronizing SCSI cache
[ 2223.051868] sd 7:0:0:1: [sdi] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2223.056574] sd 7:0:1:0: [sdk] Unhandled error code
[ 2223.056576] sd 7:0:1:0: [sdk] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2223.056578] sd 7:0:1:0: [sdk] CDB: Read(10): 28 00 3e 7f ff 80 00 00 08 00
[ 2223.056583] end_request: I/O error, dev sdk, sector 1048575872
[ 2223.061961] sd 7:0:1:0: [sdk] Unhandled error code
[ 2223.061962] sd 7:0:1:0: [sdk] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2223.061964] sd 7:0:1:0: [sdk] CDB: Read(10): 28 00 3e 7f ff f0 00 00 08 00
[ 2223.061969] end_request: I/O error, dev sdk, sector 1048575984
[ 2223.067450] sd 7:0:1:0: [sdk] Unhandled error code
[ 2223.067453] sd 7:0:1:0: [sdk] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2223.067456] sd 7:0:1:0: [sdk] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
[ 2223.067462] end_request: I/O error, dev sdk, sector 0
[ 2223.073081] sd 7:0:1:0: [sdk] Unhandled error code
[ 2223.073083] sd 7:0:1:0: [sdk] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2223.073085] sd 7:0:1:0: [sdk] CDB: Read(10): 28 00 00 00 00 08 00 00 08 00
[ 2223.073089] end_request: I/O error, dev sdk, sector 8
[ 2223.078824] sd 7:0:1:0: [sdk] Unhandled error code
[ 2223.078825] sd 7:0:1:0: [sdk] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2223.078827] sd 7:0:1:0: [sdk] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
[ 2223.078832] end_request: I/O error, dev sdk, sector 0
[ 2223.084675] sd 7:0:1:1: [sdl] Unhandled error code
[ 2223.084677] sd 7:0:1:1: [sdl] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2223.084679] sd 7:0:1:1: [sdl] CDB: Read(10): 28 00 00 7f ff 80 00 00 08 00
[ 2223.084683] end_request: I/O error, dev sdl, sector 8388480
[ 2223.090671] sd 7:0:1:1: [sdl] Unhandled error code
[ 2223.090672] sd 7:0:1:1: [sdl] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2223.090674] sd 7:0:1:1: [sdl] CDB: Read(10): 28 00 00 7f ff f0 00 00 08 00
[ 2223.090679] end_request: I/O error, dev sdl, sector 8388592
[ 2223.096830] sd 7:0:1:1: [sdl] Unhandled error code
[ 2223.096832] sd 7:0:1:1: [sdl] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2223.096834] sd 7:0:1:1: [sdl] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
[ 2223.096838] end_request: I/O error, dev sdl, sector 0
[ 2223.103075] sd 7:0:1:1: [sdl] Unhandled error code
[ 2223.103077] sd 7:0:1:1: [sdl] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2223.103079] sd 7:0:1:1: [sdl] CDB: Read(10): 28 00 00 00 00 08 00 00 08 00
[ 2223.103083] end_request: I/O error, dev sdl, sector 8
[ 2223.109441] sd 7:0:1:1: [sdl] Unhandled error code
[ 2223.109444] sd 7:0:1:1: [sdl] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2223.109446] sd 7:0:1:1: [sdl] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
[ 2223.109453] end_request: I/O error, dev sdl, sector 0
[ 2223.115922] sd 7:0:1:2: [sdm] Unhandled error code
[ 2223.115924] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2223.115926] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 7f ff 80 00 00 08 00
[ 2223.115930] end_request: I/O error, dev sdm, sector 8388480
[ 2223.122543] sd 7:0:1:2: [sdm] Unhandled error code
[ 2223.122545] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2223.122547] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 7f ff f0 00 00 08 00
[ 2223.122551] end_request: I/O error, dev sdm, sector 8388592
[ 2223.128957] sd 7:0:1:2: [sdm] Unhandled error code
[ 2223.128959] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2223.128961] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
[ 2223.128965] end_request: I/O error, dev sdm, sector 0
[ 2223.135176] sd 7:0:1:2: [sdm] Unhandled error code
[ 2223.135178] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2223.135180] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 00 00 08 00 00 08 00
[ 2223.135185] end_request: I/O error, dev sdm, sector 8
[ 2223.140962] sd 7:0:1:2: [sdm] Unhandled error code
[ 2223.140964] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2223.140965] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
[ 2223.140970] end_request: I/O error, dev sdm, sector 0
[ 2223.164023] sd 7:0:0:2: [sdj] Synchronizing SCSI cache
[ 2223.164050] sd 7:0:0:2: [sdj] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2223.178664] sd 7:0:1:2: [sdm] Unhandled error code
[ 2223.178666] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2223.178668] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 00 01 80 00 00 04 00
[ 2223.178673] end_request: I/O error, dev sdm, sector 384
[ 2223.183803] sd 7:0:1:1: [sdl] Unhandled error code
[ 2223.183805] sd 7:0:1:1: [sdl] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2223.183807] sd 7:0:1:1: [sdl] CDB: Read(10): 28 00 00 00 01 80 00 00 08 00
[ 2223.183812] end_request: I/O error, dev sdl, sector 384
[ 2223.191271] sd 7:0:1:1: [sdl] Unhandled error code
[ 2223.191273] sd 7:0:1:1: [sdl] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2223.191275] sd 7:0:1:1: [sdl] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
[ 2223.191280] end_request: I/O error, dev sdl, sector 0
[ 2223.195433] sd 7:0:1:2: [sdm] Unhandled error code
[ 2223.195435] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2223.195438] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
[ 2223.195444] end_request: I/O error, dev sdm, sector 0
[ 2223.236558] sd 7:0:1:2: [sdm] Unhandled error code
[ 2223.236560] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2223.236563] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 7f e1 00 00 00 01 00
[ 2223.236568] end_request: I/O error, dev sdm, sector 8380672
[ 2223.240754] sd 7:0:1:2: [sdm] Unhandled error code
[ 2223.240756] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2223.240758] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 7f e1 70 00 00 01 00
[ 2223.240763] end_request: I/O error, dev sdm, sector 8380784
[ 2223.241440] sd 7:0:1:0: [sdk] Synchronizing SCSI cache
[ 2223.241531] sd 7:0:1:0: [sdk] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2223.244982] sd 7:0:1:2: [sdm] Unhandled error code
[ 2223.244984] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2223.244986] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 00 01 80 00 00 01 00
[ 2223.244991] end_request: I/O error, dev sdm, sector 384
[ 2223.249170] sd 7:0:1:2: [sdm] Unhandled error code
[ 2223.249173] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2223.249176] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 00 01 88 00 00 01 00
[ 2223.249183] end_request: I/O error, dev sdm, sector 392
[ 2223.253313] sd 7:0:1:2: [sdm] Unhandled error code
[ 2223.253315] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2223.253318] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 00 01 80 00 00 04 00
[ 2223.253325] end_request: I/O error, dev sdm, sector 384
[ 2223.270352] sd 7:0:1:2: [sdm] Unhandled error code
[ 2223.270355] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2223.270359] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 7f ff 80 00 00 08 00
[ 2223.270366] end_request: I/O error, dev sdm, sector 8388480
[ 2223.274548] sd 7:0:1:2: [sdm] Unhandled error code
[ 2223.274550] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2223.274552] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 7f ff f0 00 00 08 00
[ 2223.274557] end_request: I/O error, dev sdm, sector 8388592
[ 2223.278653] sd 7:0:1:2: [sdm] Unhandled error code
[ 2223.278654] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2223.278656] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
[ 2223.278661] end_request: I/O error, dev sdm, sector 0
[ 2223.282759] sd 7:0:1:2: [sdm] Unhandled error code
[ 2223.282760] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2223.282762] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 00 00 08 00 00 08 00
[ 2223.282767] end_request: I/O error, dev sdm, sector 8
[ 2223.286863] sd 7:0:1:2: [sdm] Unhandled error code
[ 2223.286865] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2223.286867] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
[ 2223.286872] end_request: I/O error, dev sdm, sector 0
[ 2223.321219] sd 7:0:1:2: [sdm] Unhandled error code
[ 2223.321221] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2223.321223] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 00 01 80 00 00 04 00
[ 2223.321229] end_request: I/O error, dev sdm, sector 384
[ 2223.321231] sd 7:0:1:1: [sdl] Synchronizing SCSI cache
[ 2223.321261] sd 7:0:1:1: [sdl] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2223.443319] sd 7:0:1:2: [sdm] Synchronizing SCSI cache
[ 2223.443351] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK

And then:

[ 2241.094148] Aborting journal on device dm-3-8.
[ 2241.098327] __ratelimit: 24 callbacks suppressed
[ 2241.098330] Buffer I/O error on device dm-3, logical block 458752
[ 2241.102730] lost page write due to I/O error on dm-3
[ 2241.102743] JBD2: I/O error detected when updating journal superblock for dm-3-8.
[ 2265.843946] EXT4-fs error (device dm-3): ext4_journal_start_sb: Detected aborted journal
[ 2265.855469] EXT4-fs (dm-3): Remounting filesystem read-only
[ 2280.676139] I/O error in filesystem ("dm-2") meta-data dev dm-2 block 0x18 ("xfs_trans_read_buf") error 5 buf count 4096

This is very disturbing as it means that the multipathing is not in fact working at all - You can see that the I/O errors are pointing at 'dm-3' rather than the mpath associated device, but thats normal as the logical volume appears in that place.

The filesystems are healthy, when you reboot the server and check them there are zero errors. There's something wrong with the way that the pathing failover is being handled, but I'm not sure where I'd find the ultimate root cause of this.

This segfault occurs after the above issues have already manifested:

[ 4057.146508] multipathd[3261]: segfault at a ip 00007f89bdce798a sp 00007fffd25c2780 error 4 in libc-2.11.1.so[7f89bdc70000+17a000].

root@<host>:~# apt-cache policy multipath-tools
multipath-tools:
  Installed: 0.4.8-14ubuntu4.10.04.2
  Candidate: 0.4.8-14ubuntu4.10.04.2
  Version table:
*** 0.4.8-14ubuntu4.10.04.2 0
        500 http://au.archive.ubuntu.com/ubuntu/ lucid-updates/main Packages
        100 /var/lib/dpkg/status
     0.4.8-14ubuntu4 0
        500 http://au.archive.ubuntu.com/ubuntu/ lucid/main Packages

Let me know if you need further details.

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: multipath-tools 0.4.8-14ubuntu4.10.04.2
ProcVersionSignature: Ubuntu 2.6.32-41.91-server 2.6.32.59+drm33.24
Uname: Linux 2.6.32-41-server x86_64
Architecture: amd64
Date: Tue Jul 3 15:13:23 2012
InstallationMedia: Ubuntu-Server 10.04.3 LTS "Lucid Lynx" - Release amd64 (20110719.2)
ProcEnviron:
LANG=en_AU.UTF8
SHELL=/bin/bash
SourcePackage: multipath-tools

See original description

Tags:

Revision history for this message

Steve Fisher (stefan-nicolaivic) wrote on 2012-07-03:

Dependencies.txt Edit (2.0 KiB, text/plain; charset="utf-8")

description:

updated

Steve Fisher (stefan-nicolaivic) on 2012-07-09

summary:	- Multiple filesystems cannot read superblock after fibre path failover + Cannot read superblock after fibre path failover
summary:	- Cannot read superblock after fibre path failover + Cannot read superblock after FC multipath failover

Revision history for this message

Peter Petrakis (peter-petrakis) wrote on 2012-07-09:

To assist in further debug, stop the multipath daemon, then run it in the
foreground with max verbosity, while running it under strace. Log strace
to file and the foreground stdout and stderr to another file. Post the results
here.

Also include your multipath.conf, lvm.conf, output of dmsetup table, and
'echo show config | multipathd -k'.

While the system is in the failure mode, in running ps aux do you see
a series of kpartx processes in uninterruptable state?

I presume you've configured your lvm.conf blacklist correctly? Thanks.

https://help.ubuntu.com/12.04/serverguide/multipath-devices.html#multipath-devices-in-logical-volumes

Changed in multipath-tools (Ubuntu):
assignee:	nobody → Peter Petrakis (peter-petrakis)
status:	New → Incomplete

Revision history for this message

Steve Fisher (stefan-nicolaivic) wrote on 2012-07-10:

Hi Peter,

Thanks for picking this up.

Note: I ran an 'update-initramfs -u -k all' and rebooted just for good measure before proceeding. There's some output regarding a missing firmware file, I'm not sure it's relevant:

root@rgrprod-pmdh-proc-03:/etc# update-initramfs -u -k all
update-initramfs: Generating /boot/initrd.img-2.6.32-41-server
W: Possible missing firmware /lib/firmware/ql8100_fw.bin for module qla2xxx

Oddly, multipathd didn't log a single thing to stderr.

Revision history for this message

Steve Fisher (stefan-nicolaivic) wrote on 2012-07-10:

multipathd_stdout.log Edit (420.3 KiB, text/plain)

note: multipath_stderr.log exists, but is empty.

Revision history for this message

Steve Fisher (stefan-nicolaivic) wrote on 2012-07-10:

Download full text (6.0 KiB)

<user>@<hostname>:~$ sudo dmsetup table
mpath2: 0 8388608 multipath 1 queue_if_no_path 0 1 1 round-robin 0 4 1 8:48 1000 8:96 1000 8:144 1000 8:192 1000
mpath1: 0 8388608 multipath 1 queue_if_no_path 0 1 1 round-robin 0 4 1 8:128 1000 8:32 1000 8:80 1000 8:176 1000
mpath0: 0 1048576000 multipath 1 queue_if_no_path 0 1 1 round-robin 0 4 1 8:160 1000 8:16 1000 8:64 1000 8:112 1000
test2-lvol0: 0 8380416 linear 8:192 384
test-lvol0: 0 8380416 linear 8:176 384
<vgname2>-san: 0 1048567808 linear 8:160 384
<vgname>-slash: 0 48824320 linear 8:1 384
<vgname>-swap: 0 9764864 linear 8:1 48824704

----

<user>@<hostname>:~$ sudo dmsetup table
mpath2: 0 8388608 multipath 1 queue_if_no_path 0 1 1 round-robin 0 4 1 8:48 1000 8:96 1000 8:144 1000 8:192 1000 
mpath1: 0 8388608 multipath 1 queue_if_no_path 0 1 1 round-robin 0 4 1 8:128 1000 8:32 1000 8:80 1000 8:176 1000 
mpath0: 0 1048576000 multipath 1 queue_if_no_path 0 1 1 round-robin 0 4 1 8:160 1000 8:16 1000 8:64 1000 8:112 1000 
test2-lvol0: 0 8380416 linear 8:192 384
test-lvol0: 0 8380416 linear 8:176 384
<vgname2>-san: 0 1048567808 linear 8:160 384
<vgname>-slash: 0 48824320 linear 8:1 384
<vgname>-swap: 0 9764864 linear 8:1 48824704

----

<user>@<hostname>:~$ sudo echo "show config" | sudo multipathd -k
multipathd> show config
defaults {
	user_friendly_names yes
}
blacklist {
	devnode ^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*
	devnode ^hd[a-z]
	devnode ^dcssblk[0-9]*
	devnode ^cciss!c[0-9]d[0-9]*
	wwid 3600605b0039afe20ff000054052e7d38
	device {
		vendor DGC
		product LUNZ
	}
	device {
		vendor IBM
		product S/390.*
	}
}
blacklist_exceptions {
}
devices {
	device {
		vendor APPLE*
		product Xserve RAID 
		path_grouping_policy multibus
	}
	device {
		vendor 3PARdata
		product VV
		path_grouping_policy multibus
	}
	device {
		vendor DEC
		product HSG80
		path_grouping_policy group_by_prio
		path_checker hp_sw
		features 1 queue_if_no_path
		hardware_handler 1 hp_sw
		prio_callout /sbin/mpath_prio_hp_sw /dev/%n
	}
	device {
		vendor HP
		product A6189A
		path_grouping_policy multibus
		path_checker readsector0
	}
	device {
		vendor (COMPAQ|HP)
		product (MSA|HSV)1.0.*
		path_grouping_policy group_by_prio
		path_checker hp_sw
		features 1 queue_if_no_path
		hardware_handler 1 hp_sw
		prio_callout /sbin/mpath_prio_hp_sw /dev/%n
	}
	device {
		vendor HP
		product MSA VOLUME
		path_grouping_policy group_by_prio
		path_checker tur
		prio_callout /sbin/mpath_prio_alua /dev/%n
		failback immediate
	}
	device {
		vendor (COMPAQ|HP)
		product (MSA|HSV)1.1.*
		path_grouping_policy group_by_prio
		path_checker tur
		prio_callout /sbin/mpath_prio_alua /dev/%n
		failback immediate
	}
	device {
		vendor HP
		product HSV2.*
		path_grouping_policy group_by_prio
		path_checker tur
		prio_callout /sbin/mpath_prio_alua /dev/%n
		failback immediate
	}
	device {
		vendor HP
		product LOGICAL VOLUME.*
		path_grouping_policy multibus
		getuid_callout /lib/udev/scsi_id -n -g -u -d /dev/%n
		path_checker tur
	}
	device {
		vendor DDN
		product SAN DataDirector
		path_grouping_policy multibus
	}
	device {
		vendor EMC
		product SYMMETRIX
		path_grouping_policy multibus
		getuid_callout /lib/udev/scsi_id -g -u -ppre-spc3-83 -d /dev/%n
		path_checker readsector0
	}
	device {
		vendor DGC
		product .*
		product_blacklist LUNZ
		path_grouping_policy group_by_prio
		path_checker emc_clariion
		features 1 queue_if_no_path
		hardware_handler 1 emc
		prio_callout /sbin/mpath_prio_emc /dev/%n
		failback immediate
		no_path_retry 60
	}
	device {
		vendor FSC
		product CentricStor
		path_grouping_policy group_by_serial
		path_checker readsector0
	}
	device {
		vendor (HITACHI|HP)
		product OPEN-.*
		path_grouping_policy multibus
		path_checker tur
		features 1 queue_if_no_path
	}
	device {
		vendor HITACHI
		product DF.*
		path_grouping_policy group_by_prio
		path_checker tur
		features 1 queue_if_no_path
		prio_callout /sbin/mpath_prio_hds_modular /dev/%n
		failback immediate
	}
	device {
		vendor IBM
		product ProFibre 4000R
		path_grouping_policy multibus
		path_checker readsector0
	}
	device {
		vendor IBM
		product 1742
		path_grouping_policy group_by_prio
		path_checker tur
		prio_callout /sbin/mpath_prio_rdac /dev/%n
		failback immediate
	}
	device {
		vendor IBM
		product 3526
		path_grouping_policy group_by_prio
		path_checker tur
		prio_callout /sbin/mpath_prio_rdac /dev/%n
		failback immediate
	}
	device {
		vendor IBM
		product 3542
		path_grouping_policy group_by_serial
		path_checker tur
	}
	device {
		vendor IBM
		product 2105(800|F20)
		path_grouping_policy group_by_serial
		path_checker tur
		features 1 queue_if_no_path
	}
	device {
		vendor IBM
		product 1750500
		path_grouping_policy group_by_prio
		path_checker tur
		features 1 queue_if_no_path
		prio_callout /sbin/mpath_prio_alua /dev/%n
		failback immediate
	}
	device {
		vendor IBM
		product 2107900
		path_grouping_policy multibus
		path_checker tur
		features 1 queue_if_no_path
	}
	device {
		vendor IBM
		product 2145
		path_grouping_policy group_by_prio
		path_checker tur
		features 1 queue_if_no_path
		prio_callout /sbin/mpath_prio_alua /dev/%n
		failback immediate
	}
	device {
		vendor IBM
		product S/390 DASD ECKD
		product_blacklist S/390.*
		path_grouping_policy multibus
		getuid_callout /sbin/dasdinfo -u -b %n
		features 1 queue_if_no_path
	}
	device {
		vendor NETAPP
		product LUN.*
		path_grouping_policy group_by_prio
		path_checker readsector0
		features 1 queue_if_no_path
		prio_callout /sbin/mpath_prio_netapp /dev/%n
		failback immediate
		rr_min_io 128
	}
	device {
		vendor IBM
		product Nseries.*
		path_grouping_policy group_by_prio
		path_checker readsector0
		features 1 queue_if_no_path
		prio_callout /sbin/mpath_prio_netapp /dev/%n
		failback immediate
		rr_min_io 128
	}
	device {
		vendor Pillar
		product Axiom.*
		path_grouping_policy group_by_prio
		path_checker tur
		prio_callout /sbin/mpath_prio_alua %n
	}
	device {
		vendor SGI
		product TP9[13]00
		path_grouping_policy multibus
		path_checker readsector0
	}
	device {
		vendor SGI
		product TP9[45]00
		path_grouping_policy group_by_prio
		path_checker rdac
		prio_callout /sbin/mpath_prio_rdac /dev/%n
		failback immediate
		no_path_retry queue
	}
	device {
		vendor SGI
		product IS.*
		path_grouping_policy group_by_prio
		path_checker rdac
		prio_callout /sbin/mpath_prio_rdac /dev/%n
		failback immediate
		no_path_retry queue
	}
	device {
		vendor STK
		product OPENstorage D280
		path_grouping_policy group_by_prio
		path_checker tur
		prio_callout /sbin/mpath_prio_rdac /dev/%n
		failback immediate
	}
	device {
		vendor SUN
		product (StorEdge 3510|T4)
		path_grouping_policy multibus
		path_checker readsector0
	}
}
multipaths {
}
multipathd>

Revision history for this message

Steve Fisher (stefan-nicolaivic) wrote on 2012-07-10:

Download full text (41.4 KiB)

-- lvm.conf blacklist should be fine, searches /dev but filters all /dev/sd.*

I'll attach those files momentarily.

I didn't see any kpartx processes running at all; there wasn't anything in an uninterruptable sleep state ('D') either.

Listing all kernel and some additional processes here (removed stuff like apache / php processes and certain publicly identifiable stuff that's not relevant)

USER PID %CPU %MEM root 1 root 2 0.0 0.0 root 3 0.0 0.0 root 4 0.0 0.0 root 5 0.0 0.0 root 6 0.0 0.0 root 7 0.0 0.0 root 8 0.0 0.0 root 9 0.0 0.0 root 10 0.0 0.0 root 11 0.0 0.0 root 12 0.0 0.0 root 13 0.0 0.0 root 14 0.0 0.0 root 15 0.0 0.0 root 16 0.0 0.0 root 17 0.0 0.0 root 18 0.0 0.0 root 19 0.0 0.0 root 20 0.0 0.0 root 21 0.0 0.0 root 22 0.0 0.0 root 23 0.0 0.0 root 24 0.0 0.0 root 25 0.1 0.0 root 26 0.0 0.0 root 27 0.0 0.0 root 28 0.0 0.0 root 29 0.0 0.0 root 30 0.0 0.0 root 31 0.0 0.0 root 32 0.0 0.0 root 33 0.0 0.0 root 34 0.0 0.0 root 35 0.0 0.0 VSZ RSS TTY STAT START TIME COMMAND
0.6 0.0 23828 2064 ? Ss 14:59 0:04 /sbin/init
0 0 ? S 14:59 0:00 [kthreadd]
0 0 ? S 14:59 0:00 [migration/0]
0 0 ? S 14:59 0:00 [ksoftirqd/0]
0 0 ? S 14:59 0:00 [watchdog/0]
0 0 ? S 14:59 0:00 [migration/1]
0 0 ? S 14:59 0:00 [ksoftirqd/1]
0 0 ? S 14:59 0:00 [watchdog/1]
0 0 ? S 14:59 0:00 [migration/2]
0 0 ? S 14:59 0:00 [ksoftirqd/2]
0 0 ? S 14:59 0:00 [watchdog/2]
0 0 ? S 14:59 0:00 [migration/3]
0 0 ? S 14:59 0:00 [ksoftirqd/3]
0 0 ? S 14:59 0:00 [watchdog/3]
0 0 ? S 14:59 0:00 [migration/4]
0 0 ? S 14:59 0:00 [ksoftirqd/4]
0 0 ? S 14:59 0:00 [watchdog/4]
0 0 ? S 14:59 0:00 [migration/5]
0 0 ? S 14:59 0:00 [ksoftirqd/5]
0 0 ? S 14:59 0:00 [watchdog/5]
0 0 ? S 14:59 0:00 [migration/6]
0 0 ? S 14:59 0:00 [ksoftirqd/6]
0 0 ? S 14:59 0:00 [watchdog/6]
0 0 ? S 14:59 0:00 [migration/7]
0 0 ? S 14:59 0:01 [ksoftirqd/7]
0 0 ? S 14:59 0:00 [watchdog/7]
0 0 ? S 14:59 0:00 [migration/8]
0 0 ? S 14:59 0:00 [ksoftirqd/8]
0 0 ? S 14:59 0:00 [watchdog/8]
0 0 ? S 14:59 0:00 [migration/9]
0 0 ? S 14:59 0:00 [ksoftirqd/9]
0 0 ? S 14:59 0:00 [watchdog/9]
0 0 ? S 14:59 0:00 [migration/10]
0 0 ? S 14:59 0:00 [ksoftirqd/10]
0 0 ? S 14:59 ...

-- lvm.conf blacklist should be fine, searches /dev but filters all /dev/sd.*

I'll attach those files momentarily.

I didn't see any kpartx processes running at all; there wasn't anything in an uninterruptable sleep state ('D') either.

Listing all kernel and some additional processes here (removed stuff like apache / php processes and certain publicly identifiable stuff that's not relevant)

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.6  0.0  23828  2064 ?        Ss   14:59   0:04 /sbin/init
root         2  0.0  0.0      0     0 ?        S    14:59   0:00 [kthreadd]
root         3  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/0]
root         4  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/0]
root         5  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/0]
root         6  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/1]
root         7  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/1]
root         8  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/1]
root         9  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/2]
root        10  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/2]
root        11  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/2]
root        12  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/3]
root        13  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/3]
root        14  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/3]
root        15  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/4]
root        16  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/4]
root        17  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/4]
root        18  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/5]
root        19  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/5]
root        20  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/5]
root        21  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/6]
root        22  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/6]
root        23  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/6]
root        24  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/7]
root        25  0.1  0.0      0     0 ?        S    14:59   0:01 [ksoftirqd/7]
root        26  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/7]
root        27  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/8]
root        28  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/8]
root        29  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/8]
root        30  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/9]
root        31  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/9]
root        32  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/9]
root        33  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/10]
root        34  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/10]
root        35  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/10]
root        36  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/11]
root        37  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/11]
root        38  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/11]
root        39  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/12]
root        40  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/12]
root        41  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/12]
root        42  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/13]
root        43  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/13]
root        44  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/13]
root        45  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/14]
root        46  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/14]
root        47  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/14]
root        48  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/15]
root        49  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/15]
root        50  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/15]
root        51  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/16]
root        52  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/16]
root        53  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/16]
root        54  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/17]
root        55  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/17]
root        56  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/17]
root        57  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/18]
root        58  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/18]
root        59  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/18]
root        60  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/19]
root        61  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/19]
root        62  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/19]
root        63  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/20]
root        64  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/20]
root        65  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/20]
root        66  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/21]
root        67  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/21]
root        68  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/21]
root        69  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/22]
root        70  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/22]
root        71  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/22]
root        72  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/23]
root        73  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/23]
root        74  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/23]
root        75  0.0  0.0      0     0 ?        S    14:59   0:00 [events/0]
root        76  0.0  0.0      0     0 ?        S    14:59   0:00 [events/1]
root        77  0.0  0.0      0     0 ?        S    14:59   0:00 [events/2]
root        78  0.0  0.0      0     0 ?        S    14:59   0:00 [events/3]
root        79  0.0  0.0      0     0 ?        S    14:59   0:00 [events/4]
root        80  0.0  0.0      0     0 ?        S    14:59   0:00 [events/5]
root        81  0.0  0.0      0     0 ?        S    14:59   0:00 [events/6]
root        82  0.0  0.0      0     0 ?        S    14:59   0:00 [events/7]
root        83  0.0  0.0      0     0 ?        S    14:59   0:00 [events/8]
root        84  0.0  0.0      0     0 ?        S    14:59   0:00 [events/9]
root        85  0.0  0.0      0     0 ?        S    14:59   0:00 [events/10]
root        86  0.0  0.0      0     0 ?        S    14:59   0:00 [events/11]
root        87  0.0  0.0      0     0 ?        S    14:59   0:00 [events/12]
root        88  0.0  0.0      0     0 ?        S    14:59   0:00 [events/13]
root        89  0.0  0.0      0     0 ?        S    14:59   0:00 [events/14]
root        90  0.0  0.0      0     0 ?        S    14:59   0:00 [events/15]
root        91  0.0  0.0      0     0 ?        S    14:59   0:00 [events/16]
root        92  0.0  0.0      0     0 ?        S    14:59   0:00 [events/17]
root        93  0.0  0.0      0     0 ?        S    14:59   0:00 [events/18]
root        94  0.0  0.0      0     0 ?        S    14:59   0:00 [events/19]
root        95  0.0  0.0      0     0 ?        S    14:59   0:00 [events/20]
root        96  0.0  0.0      0     0 ?        S    14:59   0:00 [events/21]
root        97  0.0  0.0      0     0 ?        S    14:59   0:00 [events/22]
root        98  0.0  0.0      0     0 ?        S    14:59   0:00 [events/23]
root        99  0.0  0.0      0     0 ?        S    14:59   0:00 [cpuset]
root       100  0.0  0.0      0     0 ?        S    14:59   0:00 [khelper]
root       101  0.0  0.0      0     0 ?        S    14:59   0:00 [netns]
root       102  0.0  0.0      0     0 ?        S    14:59   0:00 [async/mgr]
root       103  0.0  0.0      0     0 ?        S    14:59   0:00 [pm]
root       105  0.0  0.0      0     0 ?        S    14:59   0:00 [sync_supers]
root       106  0.0  0.0      0     0 ?        S    14:59   0:00 [bdi-default]
root       107  0.0  0.0      0     0 ?        S    14:59   0:00 [kintegrityd/0]
root       108  0.0  0.0      0     0 ?        S    14:59   0:00 [kintegrityd/1]
root       109  0.0  0.0      0     0 ?        S    14:59   0:00 [kintegrityd/2]
root       110  0.0  0.0      0     0 ?        S    14:59   0:00 [kintegrityd/3]
root       111  0.0  0.0      0     0 ?        S    14:59   0:00 [kintegrityd/4]
root       112  0.0  0.0      0     0 ?        S    14:59   0:00 [kintegrityd/5]
root       113  0.0  0.0      0     0 ?        S    14:59   0:00 [kintegrityd/6]
root       114  0.0  0.0      0     0 ?        S    14:59   0:00 [kintegrityd/7]
root       115  0.0  0.0      0     0 ?        S    14:59   0:00 [kintegrityd/8]
root       116  0.0  0.0      0     0 ?        S    14:59   0:00 [kintegrityd/9]
root       117  0.0  0.0      0     0 ?        S    14:59   0:00 [kintegrityd/10]
root       118  0.0  0.0      0     0 ?        S    14:59   0:00 [kintegrityd/11]
root       119  0.0  0.0      0     0 ?        S    14:59   0:00 [kintegrityd/12]
root       120  0.0  0.0      0     0 ?        S    14:59   0:00 [kintegrityd/13]
root       121  0.0  0.0      0     0 ?        S    14:59   0:00 [kintegrityd/14]
root       122  0.0  0.0      0     0 ?        S    14:59   0:00 [kintegrityd/15]
root       123  0.0  0.0      0     0 ?        S    14:59   0:00 [kintegrityd/16]
root       124  0.0  0.0      0     0 ?        S    14:59   0:00 [kintegrityd/17]
root       125  0.0  0.0      0     0 ?        S    14:59   0:00 [kintegrityd/18]
root       126  0.0  0.0      0     0 ?        S    14:59   0:00 [kintegrityd/19]
root       127  0.0  0.0      0     0 ?        S    14:59   0:00 [kintegrityd/20]
root       128  0.0  0.0      0     0 ?        S    14:59   0:00 [kintegrityd/21]
root       129  0.0  0.0      0     0 ?        S    14:59   0:00 [kintegrityd/22]
root       130  0.0  0.0      0     0 ?        S    14:59   0:00 [kintegrityd/23]
root       131  0.0  0.0      0     0 ?        S    14:59   0:00 [kblockd/0]
root       132  0.0  0.0      0     0 ?        S    14:59   0:00 [kblockd/1]
root       133  0.0  0.0      0     0 ?        S    14:59   0:00 [kblockd/2]
root       134  0.0  0.0      0     0 ?        S    14:59   0:00 [kblockd/3]
root       135  0.0  0.0      0     0 ?        S    14:59   0:00 [kblockd/4]
root       136  0.0  0.0      0     0 ?        S    14:59   0:00 [kblockd/5]
root       137  0.0  0.0      0     0 ?        S    14:59   0:00 [kblockd/6]
root       138  0.0  0.0      0     0 ?        S    14:59   0:00 [kblockd/7]
root       139  0.0  0.0      0     0 ?        S    14:59   0:00 [kblockd/8]
root       140  0.0  0.0      0     0 ?        S    14:59   0:00 [kblockd/9]
root       141  0.0  0.0      0     0 ?        S    14:59   0:00 [kblockd/10]
root       142  0.0  0.0      0     0 ?        S    14:59   0:00 [kblockd/11]
root       143  0.0  0.0      0     0 ?        S    14:59   0:00 [kblockd/12]
root       144  0.0  0.0      0     0 ?        S    14:59   0:00 [kblockd/13]
root       145  0.0  0.0      0     0 ?        S    14:59   0:00 [kblockd/14]
root       146  0.0  0.0      0     0 ?        S    14:59   0:00 [kblockd/15]
root       147  0.0  0.0      0     0 ?        S    14:59   0:00 [kblockd/16]
root       148  0.0  0.0      0     0 ?        S    14:59   0:00 [kblockd/17]
root       149  0.0  0.0      0     0 ?        S    14:59   0:00 [kblockd/18]
root       150  0.0  0.0      0     0 ?        S    14:59   0:00 [kblockd/19]
root       151  0.0  0.0      0     0 ?        S    14:59   0:00 [kblockd/20]
root       152  0.0  0.0      0     0 ?        S    14:59   0:00 [kblockd/21]
root       153  0.0  0.0      0     0 ?        S    14:59   0:00 [kblockd/22]
root       154  0.0  0.0      0     0 ?        S    14:59   0:00 [kblockd/23]
root       155  0.0  0.0      0     0 ?        S    14:59   0:00 [kacpid]
root       156  0.0  0.0      0     0 ?        S    14:59   0:00 [kacpi_notify]
root       157  0.0  0.0      0     0 ?        S    14:59   0:00 [kacpi_hotplug]
root       158  0.0  0.0      0     0 ?        S    14:59   0:00 [ata/0]
root       159  0.0  0.0      0     0 ?        S    14:59   0:00 [ata/1]
root       160  0.0  0.0      0     0 ?        S    14:59   0:00 [ata/2]
root       161  0.0  0.0      0     0 ?        S    14:59   0:00 [ata/3]
root       162  0.0  0.0      0     0 ?        S    14:59   0:00 [ata/4]
root       163  0.0  0.0      0     0 ?        S    14:59   0:00 [ata/5]
root       164  0.0  0.0      0     0 ?        S    14:59   0:00 [ata/6]
root       165  0.0  0.0      0     0 ?        S    14:59   0:00 [ata/7]
root       166  0.0  0.0      0     0 ?        S    14:59   0:00 [ata/8]
root       167  0.0  0.0      0     0 ?        S    14:59   0:00 [ata/9]
root       168  0.0  0.0      0     0 ?        S    14:59   0:00 [ata/10]
root       169  0.0  0.0      0     0 ?        S    14:59   0:00 [ata/11]
root       170  0.0  0.0      0     0 ?        S    14:59   0:00 [ata/12]
root       171  0.0  0.0      0     0 ?        S    14:59   0:00 [ata/13]
root       172  0.0  0.0      0     0 ?        S    14:59   0:00 [ata/14]
root       173  0.0  0.0      0     0 ?        S    14:59   0:00 [ata/15]
root       174  0.0  0.0      0     0 ?        S    14:59   0:00 [ata/16]
root       175  0.0  0.0      0     0 ?        S    14:59   0:00 [ata/17]
root       176  0.0  0.0      0     0 ?        S    14:59   0:00 [ata/18]
root       177  0.0  0.0      0     0 ?        S    14:59   0:00 [ata/19]
root       178  0.0  0.0      0     0 ?        S    14:59   0:00 [ata/20]
root       179  0.0  0.0      0     0 ?        S    14:59   0:00 [ata/21]
root       180  0.0  0.0      0     0 ?        S    14:59   0:00 [ata/22]
root       181  0.0  0.0      0     0 ?        S    14:59   0:00 [ata/23]
root       182  0.0  0.0      0     0 ?        S    14:59   0:00 [ata_aux]
root       183  0.0  0.0      0     0 ?        S    14:59   0:00 [ksuspend_usbd]
root       184  0.0  0.0      0     0 ?        S    14:59   0:00 [khubd]
root       185  0.0  0.0      0     0 ?        S    14:59   0:00 [kseriod]
root       186  0.0  0.0      0     0 ?        S    14:59   0:00 [kmmcd]
root       211  0.0  0.0      0     0 ?        S    14:59   0:00 [khungtaskd]
root       212  0.0  0.0      0     0 ?        S    14:59   0:00 [kswapd0]
root       213  0.0  0.0      0     0 ?        S    14:59   0:00 [kswapd1]
root       214  0.0  0.0      0     0 ?        SN   14:59   0:00 [ksmd]
root       215  0.0  0.0      0     0 ?        S    14:59   0:00 [aio/0]
root       216  0.0  0.0      0     0 ?        S    14:59   0:00 [aio/1]
root       217  0.0  0.0      0     0 ?        S    14:59   0:00 [aio/2]
root       218  0.0  0.0      0     0 ?        S    14:59   0:00 [aio/3]
root       219  0.0  0.0      0     0 ?        S    14:59   0:00 [aio/4]
root       220  0.0  0.0      0     0 ?        S    14:59   0:00 [aio/5]
root       221  0.0  0.0      0     0 ?        S    14:59   0:00 [aio/6]
root       222  0.0  0.0      0     0 ?        S    14:59   0:00 [aio/7]
root       223  0.0  0.0      0     0 ?        S    14:59   0:00 [aio/8]
root       224  0.0  0.0      0     0 ?        S    14:59   0:00 [aio/9]
root       225  0.0  0.0      0     0 ?        S    14:59   0:00 [aio/10]
root       226  0.0  0.0      0     0 ?        S    14:59   0:00 [aio/11]
root       227  0.0  0.0      0     0 ?        S    14:59   0:00 [aio/12]
root       228  0.0  0.0      0     0 ?        S    14:59   0:00 [aio/13]
root       229  0.0  0.0      0     0 ?        S    14:59   0:00 [aio/14]
root       230  0.0  0.0      0     0 ?        S    14:59   0:00 [aio/15]
root       231  0.0  0.0      0     0 ?        S    14:59   0:00 [aio/16]
root       232  0.0  0.0      0     0 ?        S    14:59   0:00 [aio/17]
root       233  0.0  0.0      0     0 ?        S    14:59   0:00 [aio/18]
root       234  0.0  0.0      0     0 ?        S    14:59   0:00 [aio/19]
root       235  0.0  0.0      0     0 ?        S    14:59   0:00 [aio/20]
root       236  0.0  0.0      0     0 ?        S    14:59   0:00 [aio/21]
root       237  0.0  0.0      0     0 ?        S    14:59   0:00 [aio/22]
root       238  0.0  0.0      0     0 ?        S    14:59   0:00 [aio/23]
root       239  0.0  0.0      0     0 ?        S    14:59   0:00 [ecryptfs-kthrea]
root       240  0.0  0.0      0     0 ?        S    14:59   0:00 [crypto/0]
root       241  0.0  0.0      0     0 ?        S    14:59   0:00 [crypto/1]
root       242  0.0  0.0      0     0 ?        S    14:59   0:00 [crypto/2]
root       243  0.0  0.0      0     0 ?        S    14:59   0:00 [crypto/3]
root       244  0.0  0.0      0     0 ?        S    14:59   0:00 [crypto/4]
root       245  0.0  0.0      0     0 ?        S    14:59   0:00 [crypto/5]
root       246  0.0  0.0      0     0 ?        S    14:59   0:00 [crypto/6]
root       247  0.0  0.0      0     0 ?        S    14:59   0:00 [crypto/7]
root       248  0.0  0.0      0     0 ?        S    14:59   0:00 [crypto/8]
root       249  0.0  0.0      0     0 ?        S    14:59   0:00 [crypto/9]
root       250  0.0  0.0      0     0 ?        S    14:59   0:00 [crypto/10]
root       251  0.0  0.0      0     0 ?        S    14:59   0:00 [crypto/11]
root       252  0.0  0.0      0     0 ?        S    14:59   0:00 [crypto/12]
root       253  0.0  0.0      0     0 ?        S    14:59   0:00 [crypto/13]
root       254  0.0  0.0      0     0 ?        S    14:59   0:00 [crypto/14]
root       255  0.0  0.0      0     0 ?        S    14:59   0:00 [crypto/15]
root       256  0.0  0.0      0     0 ?        S    14:59   0:00 [crypto/16]
root       257  0.0  0.0      0     0 ?        S    14:59   0:00 [crypto/17]
root       258  0.0  0.0      0     0 ?        S    14:59   0:00 [crypto/18]
root       259  0.0  0.0      0     0 ?        S    14:59   0:00 [crypto/19]
root       260  0.0  0.0      0     0 ?        S    14:59   0:00 [crypto/20]
root       261  0.0  0.0      0     0 ?        S    14:59   0:00 [crypto/21]
root       262  0.0  0.0      0     0 ?        S    14:59   0:00 [crypto/22]
root       263  0.0  0.0      0     0 ?        S    14:59   0:00 [crypto/23]
root       266  0.0  0.0      0     0 ?        S    14:59   0:00 [scsi_eh_0]
root       267  0.0  0.0      0     0 ?        S    14:59   0:00 [scsi_eh_1]
root       268  0.0  0.0      0     0 ?        S    14:59   0:00 [scsi_eh_2]
root       271  0.0  0.0      0     0 ?        S    14:59   0:00 [scsi_eh_3]
root       274  0.0  0.0      0     0 ?        S    14:59   0:00 [kstriped]
root       275  0.0  0.0      0     0 ?        S    14:59   0:00 [kmpathd/0]
root       276  0.0  0.0      0     0 ?        S    14:59   0:00 [kmpathd/1]
root       277  0.0  0.0      0     0 ?        S    14:59   0:00 [kmpathd/2]
root       278  0.0  0.0      0     0 ?        S    14:59   0:00 [kmpathd/3]
root       279  0.0  0.0      0     0 ?        S    14:59   0:00 [kmpathd/4]
root       280  0.0  0.0      0     0 ?        S    14:59   0:00 [kmpathd/5]
root       281  0.0  0.0      0     0 ?        S    14:59   0:00 [kmpathd/6]
root       282  0.0  0.0      0     0 ?        S    14:59   0:00 [kmpathd/7]
root       283  0.0  0.0      0     0 ?        S    14:59   0:00 [kmpathd/8]
root       284  0.0  0.0      0     0 ?        S    14:59   0:00 [kmpathd/9]
root       285  0.0  0.0      0     0 ?        S    14:59   0:00 [kmpathd/10]
root       286  0.0  0.0      0     0 ?        S    14:59   0:00 [kmpathd/11]
root       287  0.0  0.0      0     0 ?        S    14:59   0:00 [kmpathd/12]
root       288  0.0  0.0      0     0 ?        S    14:59   0:00 [kmpathd/13]
root       289  0.0  0.0      0     0 ?        S    14:59   0:00 [kmpathd/14]
root       290  0.0  0.0      0     0 ?        S    14:59   0:00 [kmpathd/15]
root       291  0.0  0.0      0     0 ?        S    14:59   0:00 [kmpathd/16]
root       292  0.0  0.0      0     0 ?        S    14:59   0:00 [kmpathd/17]
root       293  0.0  0.0      0     0 ?        S    14:59   0:00 [kmpathd/18]
root       294  0.0  0.0      0     0 ?        S    14:59   0:00 [kmpathd/19]
root       295  0.0  0.0      0     0 ?        S    14:59   0:00 [kmpathd/20]
root       296  0.0  0.0      0     0 ?        S    14:59   0:00 [kmpathd/21]
root       297  0.0  0.0      0     0 ?        S    14:59   0:00 [kmpathd/22]
root       298  0.0  0.0      0     0 ?        S    14:59   0:00 [kmpathd/23]
root       299  0.0  0.0      0     0 ?        S    14:59   0:00 [kmpath_handlerd]
root       300  0.0  0.0      0     0 ?        S    14:59   0:00 [ksnapd]
root       301  0.0  0.0      0     0 ?        S    14:59   0:00 [kondemand/0]
root       302  0.0  0.0      0     0 ?        S    14:59   0:00 [kondemand/1]
root       303  0.0  0.0      0     0 ?        S    14:59   0:00 [kondemand/2]
root       304  0.0  0.0      0     0 ?        S    14:59   0:00 [kondemand/3]
root       305  0.0  0.0      0     0 ?        S    14:59   0:00 [kondemand/4]
root       306  0.0  0.0      0     0 ?        S    14:59   0:00 [kondemand/5]
root       307  0.0  0.0      0     0 ?        S    14:59   0:00 [kondemand/6]
root       308  0.0  0.0      0     0 ?        S    14:59   0:00 [kondemand/7]
root       309  0.0  0.0      0     0 ?        S    14:59   0:00 [kondemand/8]
root       310  0.0  0.0      0     0 ?        S    14:59   0:00 [kondemand/9]
root       311  0.0  0.0      0     0 ?        S    14:59   0:00 [kondemand/10]
root       312  0.0  0.0      0     0 ?        S    14:59   0:00 [kondemand/11]
root       313  0.0  0.0      0     0 ?        S    14:59   0:00 [kondemand/12]
root       314  0.0  0.0      0     0 ?        S    14:59   0:00 [kondemand/13]
root       315  0.0  0.0      0     0 ?        S    14:59   0:00 [kondemand/14]
root       316  0.0  0.0      0     0 ?        S    14:59   0:00 [kondemand/15]
root       317  0.0  0.0      0     0 ?        S    14:59   0:00 [kondemand/16]
root       318  0.0  0.0      0     0 ?        S    14:59   0:00 [kondemand/17]
root       319  0.0  0.0      0     0 ?        S    14:59   0:00 [kondemand/18]
root       320  0.0  0.0      0     0 ?        S    14:59   0:00 [kondemand/19]
root       321  0.0  0.0      0     0 ?        S    14:59   0:00 [kondemand/20]
root       322  0.0  0.0      0     0 ?        S    14:59   0:00 [kondemand/21]
root       323  0.0  0.0      0     0 ?        S    14:59   0:00 [kondemand/22]
root       324  0.0  0.0      0     0 ?        S    14:59   0:00 [kondemand/23]
root       325  0.0  0.0      0     0 ?        S    14:59   0:00 [kconservative/0]
root       326  0.0  0.0      0     0 ?        S    14:59   0:00 [kconservative/1]
root       327  0.0  0.0      0     0 ?        S    14:59   0:00 [kconservative/2]
root       328  0.0  0.0      0     0 ?        S    14:59   0:00 [kconservative/3]
root       329  0.0  0.0      0     0 ?        S    14:59   0:00 [kconservative/4]
root       330  0.0  0.0      0     0 ?        S    14:59   0:00 [kconservative/5]
root       331  0.0  0.0      0     0 ?        S    14:59   0:00 [kconservative/6]
root       332  0.0  0.0      0     0 ?        S    14:59   0:00 [kconservative/7]
root       333  0.0  0.0      0     0 ?        S    14:59   0:00 [kconservative/8]
root       334  0.0  0.0      0     0 ?        S    14:59   0:00 [kconservative/9]
root       335  0.0  0.0      0     0 ?        S    14:59   0:00 [kconservative/1]
root       336  0.0  0.0      0     0 ?        S    14:59   0:00 [kconservative/1]
root       337  0.0  0.0      0     0 ?        S    14:59   0:00 [kconservative/1]
root       338  0.0  0.0      0     0 ?        S    14:59   0:00 [kconservative/1]
root       339  0.0  0.0      0     0 ?        S    14:59   0:00 [kconservative/1]
root       340  0.0  0.0      0     0 ?        S    14:59   0:00 [kconservative/1]
root       341  0.0  0.0      0     0 ?        S    14:59   0:00 [kconservative/1]
root       342  0.0  0.0      0     0 ?        S    14:59   0:00 [kconservative/1]
root       343  0.0  0.0      0     0 ?        S    14:59   0:00 [kconservative/1]
root       344  0.0  0.0      0     0 ?        S    14:59   0:00 [kconservative/1]
root       345  0.0  0.0      0     0 ?        S    14:59   0:00 [kconservative/2]
root       346  0.0  0.0      0     0 ?        S    14:59   0:00 [kconservative/2]
root       347  0.0  0.0      0     0 ?        S    14:59   0:00 [kconservative/2]
root       348  0.0  0.0      0     0 ?        S    14:59   0:00 [kconservative/2]
root       524  0.0  0.0      0     0 ?        S    14:59   0:00 [scsi_tgtd/0]
root       525  0.0  0.0      0     0 ?        S    14:59   0:00 [scsi_tgtd/1]
root       526  0.0  0.0      0     0 ?        S    14:59   0:00 [scsi_tgtd/2]
root       527  0.0  0.0      0     0 ?        S    14:59   0:00 [scsi_tgtd/3]
root       528  0.0  0.0      0     0 ?        S    14:59   0:00 [scsi_tgtd/4]
root       529  0.0  0.0      0     0 ?        S    14:59   0:00 [scsi_tgtd/5]
root       530  0.0  0.0      0     0 ?        S    14:59   0:00 [scsi_tgtd/6]
root       531  0.0  0.0      0     0 ?        S    14:59   0:00 [scsi_tgtd/7]
root       532  0.0  0.0      0     0 ?        S    14:59   0:00 [scsi_tgtd/8]
root       533  0.0  0.0      0     0 ?        S    14:59   0:00 [scsi_tgtd/9]
root       534  0.0  0.0      0     0 ?        S    14:59   0:00 [scsi_tgtd/10]
root       535  0.0  0.0      0     0 ?        S    14:59   0:00 [scsi_tgtd/11]
root       536  0.0  0.0      0     0 ?        S    14:59   0:00 [scsi_tgtd/12]
root       537  0.0  0.0      0     0 ?        S    14:59   0:00 [scsi_tgtd/13]
root       538  0.0  0.0      0     0 ?        S    14:59   0:00 [scsi_tgtd/14]
root       539  0.0  0.0      0     0 ?        S    14:59   0:00 [scsi_tgtd/15]
root       540  0.0  0.0      0     0 ?        S    14:59   0:00 [scsi_tgtd/16]
root       541  0.0  0.0      0     0 ?        S    14:59   0:00 [scsi_tgtd/17]
root       543  0.0  0.0      0     0 ?        S    14:59   0:00 [scsi_tgtd/18]
root       544  0.0  0.0      0     0 ?        S    14:59   0:00 [scsi_tgtd/19]
root       545  0.0  0.0      0     0 ?        S    14:59   0:00 [scsi_tgtd/20]
root       546  0.0  0.0      0     0 ?        S    14:59   0:00 [scsi_tgtd/21]
root       547  0.0  0.0      0     0 ?        S    14:59   0:00 [scsi_tgtd/22]
root       548  0.0  0.0      0     0 ?        S    14:59   0:00 [scsi_tgtd/23]
root       552  0.0  0.0      0     0 ?        S    14:59   0:00 [scsi_eh_4]
root       553  0.0  0.0      0     0 ?        S    14:59   0:00 [scsi_eh_5]
root       558  0.0  0.0      0     0 ?        S<   14:59   0:00 [qla2xxx_5_dpc]
root       559  0.0  0.0      0     0 ?        S    14:59   0:00 [scsi_wq_5]
root       560  0.0  0.0      0     0 ?        S    14:59   0:00 [fc_wq_5]
root       561  0.0  0.0      0     0 ?        S    14:59   0:00 [fc_dl_5]
root       563  0.0  0.0      0     0 ?        S    14:59   0:00 [scsi_eh_6]
root       565  0.0  0.0      0     0 ?        S<   14:59   0:00 [qla2xxx_6_dpc]
root       566  0.0  0.0      0     0 ?        S    14:59   0:00 [scsi_wq_6]
root       567  0.0  0.0      0     0 ?        S    14:59   0:00 [fc_wq_6]
root       568  0.0  0.0      0     0 ?        S    14:59   0:00 [fc_dl_6]
root       583  0.0  0.0      0     0 ?        S    14:59   0:00 [kdmflush]
root       587  0.0  0.0      0     0 ?        S    14:59   0:00 [kdmflush]
root       601  0.0  0.0      0     0 ?        S    14:59   0:00 [kjournald]
root       670  0.0  0.0  16904   916 ?        S    14:59   0:00 upstart-udev-bridge --daemon
root       675  0.0  0.0  17076   916 ?        S<s  14:59   0:00 udevd --daemon
daemon    1041  0.0  0.0   8256   564 ?        Ss   14:59   0:00 portmap
statd     1265  0.0  0.0  10392   832 ?        Ss   14:59   0:00 rpc.statd -L --port 4000 --outgoing-port 4001
root      1309  0.0  0.0      0     0 ?        S    14:59   0:00 [kdmflush]
root      1312  0.0  0.0      0     0 ?        S    14:59   0:00 [kdmflush]
root      1315  0.0  0.0      0     0 ?        S    14:59   0:00 [kdmflush]
root      1335  0.0  0.0      0     0 ?        S    14:59   0:00 [kdmflush]
root      1403  0.0  0.0      0     0 ?        S    14:59   0:00 [jbd2/dm-3-8]
root      1404  0.0  0.0      0     0 ?        S    14:59   0:00 [ext4-dio-unwrit]
root      1405  0.0  0.0      0     0 ?        S    14:59   0:00 [ext4-dio-unwrit]
root      1406  0.0  0.0      0     0 ?        S    14:59   0:00 [ext4-dio-unwrit]
root      1407  0.0  0.0      0     0 ?        S    14:59   0:00 [ext4-dio-unwrit]
root      1408  0.0  0.0      0     0 ?        S    14:59   0:00 [ext4-dio-unwrit]
root      1409  0.0  0.0      0     0 ?        S    14:59   0:00 [ext4-dio-unwrit]
root      1410  0.0  0.0      0     0 ?        S    14:59   0:00 [ext4-dio-unwrit]
root      1411  0.0  0.0      0     0 ?        S    14:59   0:00 [ext4-dio-unwrit]
root      1412  0.0  0.0      0     0 ?        S    14:59   0:00 [ext4-dio-unwrit]
root      1413  0.0  0.0      0     0 ?        S    14:59   0:00 [ext4-dio-unwrit]
root      1414  0.0  0.0      0     0 ?        S    14:59   0:00 [ext4-dio-unwrit]
root      1415  0.0  0.0      0     0 ?        S    14:59   0:00 [ext4-dio-unwrit]
root      1416  0.0  0.0      0     0 ?        S    14:59   0:00 [ext4-dio-unwrit]
root      1417  0.0  0.0      0     0 ?        S    14:59   0:00 [ext4-dio-unwrit]
root      1418  0.0  0.0      0     0 ?        S    14:59   0:00 [ext4-dio-unwrit]
root      1419  0.0  0.0      0     0 ?        S    14:59   0:00 [ext4-dio-unwrit]
root      1420  0.0  0.0      0     0 ?        S    14:59   0:00 [ext4-dio-unwrit]
root      1421  0.0  0.0      0     0 ?        S    14:59   0:00 [ext4-dio-unwrit]
root      1422  0.0  0.0      0     0 ?        S    14:59   0:00 [ext4-dio-unwrit]
root      1423  0.0  0.0      0     0 ?        S    14:59   0:00 [ext4-dio-unwrit]
root      1424  0.0  0.0      0     0 ?        S    14:59   0:00 [ext4-dio-unwrit]
root      1425  0.0  0.0      0     0 ?        S    14:59   0:00 [ext4-dio-unwrit]
root      1426  0.0  0.0      0     0 ?        S    14:59   0:00 [ext4-dio-unwrit]
root      1427  0.0  0.0      0     0 ?        S    14:59   0:00 [ext4-dio-unwrit]
root      1449  0.0  0.0      0     0 ?        S    14:59   0:00 [xfs_mru_cache]
root      1450  0.0  0.0      0     0 ?        S    14:59   0:00 [xfslogd/0]
root      1451  0.0  0.0      0     0 ?        S    14:59   0:00 [xfslogd/1]
root      1452  0.0  0.0      0     0 ?        S    14:59   0:00 [xfslogd/2]
root      1453  0.0  0.0      0     0 ?        S    14:59   0:00 [xfslogd/3]
root      1454  0.0  0.0      0     0 ?        S    14:59   0:00 [xfslogd/4]
root      1455  0.0  0.0      0     0 ?        S    14:59   0:00 [xfslogd/5]
root      1456  0.0  0.0      0     0 ?        S    14:59   0:00 [xfslogd/6]
root      1457  0.0  0.0      0     0 ?        S    14:59   0:00 [xfslogd/7]
root      1458  0.0  0.0      0     0 ?        S    14:59   0:00 [xfslogd/8]
root      1459  0.0  0.0      0     0 ?        S    14:59   0:00 [xfslogd/9]
root      1460  0.0  0.0      0     0 ?        S    14:59   0:00 [xfslogd/10]
root      1461  0.0  0.0      0     0 ?        S    14:59   0:00 [xfslogd/11]
root      1462  0.0  0.0      0     0 ?        S    14:59   0:00 [xfslogd/12]
root      1463  0.0  0.0      0     0 ?        S    14:59   0:00 [xfslogd/13]
root      1464  0.0  0.0      0     0 ?        S    14:59   0:00 [xfslogd/14]
root      1465  0.0  0.0      0     0 ?        S    14:59   0:00 [xfslogd/15]
root      1466  0.0  0.0      0     0 ?        S    14:59   0:00 [xfslogd/16]
root      1468  0.0  0.0      0     0 ?        S    14:59   0:00 [xfslogd/17]
root      1469  0.0  0.0      0     0 ?        S    14:59   0:00 [xfslogd/18]
root      1470  0.0  0.0      0     0 ?        S    14:59   0:00 [xfslogd/19]
root      1471  0.0  0.0      0     0 ?        S    14:59   0:00 [xfslogd/20]
root      1472  0.0  0.0      0     0 ?        S    14:59   0:00 [xfslogd/21]
root      1473  0.0  0.0      0     0 ?        S    14:59   0:00 [xfslogd/22]
root      1474  0.0  0.0      0     0 ?        S    14:59   0:00 [xfslogd/23]
root      1475  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsdatad/0]
root      1476  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsdatad/1]
root      1477  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsdatad/2]
root      1478  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsdatad/3]
root      1479  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsdatad/4]
root      1480  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsdatad/5]
root      1481  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsdatad/6]
root      1482  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsdatad/7]
root      1483  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsdatad/8]
root      1484  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsdatad/9]
root      1485  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsdatad/10]
root      1486  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsdatad/11]
root      1487  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsdatad/12]
root      1488  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsdatad/13]
root      1490  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsdatad/14]
root      1491  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsdatad/15]
root      1492  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsdatad/16]
root      1493  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsdatad/17]
root      1494  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsdatad/18]
root      1495  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsdatad/19]
root      1496  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsdatad/20]
root      1497  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsdatad/21]
root      1498  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsdatad/22]
root      1499  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsdatad/23]
root      1500  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsconvertd/0]
root      1501  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsconvertd/1]
root      1502  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsconvertd/2]
root      1503  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsconvertd/3]
root      1504  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsconvertd/4]
root      1505  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsconvertd/5]
root      1506  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsconvertd/6]
root      1507  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsconvertd/7]
root      1508  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsconvertd/8]
root      1509  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsconvertd/9]
root      1510  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsconvertd/10]
root      1511  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsconvertd/11]
root      1512  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsconvertd/12]
root      1513  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsconvertd/13]
root      1514  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsconvertd/14]
root      1515  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsconvertd/15]
root      1516  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsconvertd/16]
root      1517  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsconvertd/17]
root      1518  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsconvertd/18]
root      1519  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsconvertd/19]
root      1520  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsconvertd/20]
root      1521  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsconvertd/21]
root      1522  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsconvertd/22]
root      1523  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsconvertd/23]
root      1525  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsbufd]
root      1528  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsaild]
root      1558  0.0  0.0      0     0 ?        S    14:59   0:00 [kdmflush]
root      1642  0.0  0.0      0     0 ?        S    14:59   0:00 [kdmflush]
root      1760  0.0  0.0      0     0 ?        S    14:59   0:00 [xfssyncd]
root      1768  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsbufd]
root      1769  0.0  0.0      0     0 ?        S    14:59   0:00 [xfsaild]
root      1770  0.0  0.0      0     0 ?        S    14:59   0:00 [xfssyncd]
syslog    1808  0.0  0.0 211600  3400 ?        Sl   14:59   0:00 rsyslogd -c4
root      1818  0.0  0.0      0     0 ?        S    14:59   0:00 [bond0]
105       1828  0.0  0.0  23424   900 ?        Ss   14:59   0:00 dbus-daemon --system --fork
root      1892  0.0  0.0   6080   660 tty4     Ss+  14:59   0:00 /sbin/getty -8 38400 tty4
root      1896  0.0  0.0   6080   660 tty5     Ss+  14:59   0:00 /sbin/getty -8 38400 tty5
root      1900  0.0  0.0   6080   656 tty2     Ss+  14:59   0:00 /sbin/getty -8 38400 tty2
root      1901  0.0  0.0   6080   660 tty3     Ss+  14:59   0:00 /sbin/getty -8 38400 tty3
root      1905  0.0  0.0   6080   656 tty6     Ss+  14:59   0:00 /sbin/getty -8 38400 tty6
root      1915  0.0  0.0  11284   688 ?        Ss   14:59   0:00 /usr/sbin/irqbalance
root      1917  0.0  0.0  21076  1076 ?        Ss   14:59   0:00 cron
daemon    1918  0.0  0.0  18884   468 ?        Ss   14:59   0:00 atd
root      2064  0.0  0.0      0     0 ?        S    14:59   0:00 [flush-251:0]
root      2927  0.0  0.0  24536  1072 ?        Ss   14:59   0:00 /usr/sbin/ntpd
ntpd      2928  0.0  0.0  20316  1068 ?        S    14:59   0:00 /usr/sbin/ntpd
root      2936  0.0  0.0  49264  2588 ?        Ss   14:59   0:00 /usr/sbin/sshd -D
clamav    3195  0.0  0.0  42804  1860 ?        Ss   15:00   0:00 /usr/bin/freshclam -d --quiet
nobody    3200  0.0  0.0 148744 11780 ?        Sl   15:00   0:00 /usr/bin/memcached -m 4096 -p 11211 -u nobody -l 127.0.0.1
ossec     3244  0.1  0.0  12804   968 ?        S    15:00   0:00 /var/ossec/bin/ossec-agentd
root      3251  0.0  0.0   4244   576 ?        S    15:00   0:00 /var/ossec/bin/ossec-logcollector
root      3267  0.5  0.0   5100  1608 ?        S    15:00   0:03 /var/ossec/bin/ossec-syscheckd
root      3340  0.0  0.0      0     0 ?        S    15:00   0:00 [rpciod/0]
root      3341  0.0  0.0      0     0 ?        S    15:00   0:00 [rpciod/1]
root      3342  0.0  0.0      0     0 ?        S    15:00   0:00 [rpciod/2]
root      3343  0.0  0.0      0     0 ?        S    15:00   0:00 [rpciod/3]
root      3344  0.0  0.0      0     0 ?        S    15:00   0:00 [rpciod/4]
root      3345  0.0  0.0      0     0 ?        S    15:00   0:00 [rpciod/5]
root      3346  0.0  0.0      0     0 ?        S    15:00   0:00 [rpciod/6]
root      3347  0.0  0.0      0     0 ?        S    15:00   0:00 [rpciod/7]
root      3348  0.0  0.0      0     0 ?        S    15:00   0:00 [rpciod/8]
root      3349  0.0  0.0      0     0 ?        S    15:00   0:00 [rpciod/9]
root      3350  0.0  0.0      0     0 ?        S    15:00   0:00 [rpciod/10]
root      3351  0.0  0.0      0     0 ?        S    15:00   0:00 [rpciod/11]
root      3352  0.0  0.0      0     0 ?        S    15:00   0:00 [rpciod/12]
root      3353  0.0  0.0      0     0 ?        S    15:00   0:00 [rpciod/13]
root      3354  0.0  0.0      0     0 ?        S    15:00   0:00 [rpciod/14]
root      3355  0.0  0.0      0     0 ?        S    15:00   0:00 [rpciod/15]
root      3356  0.0  0.0      0     0 ?        S    15:00   0:00 [rpciod/16]
root      3357  0.0  0.0      0     0 ?        S    15:00   0:00 [rpciod/17]
root      3358  0.0  0.0      0     0 ?        S    15:00   0:00 [rpciod/18]
root      3359  0.0  0.0      0     0 ?        S    15:00   0:00 [rpciod/19]
root      3360  0.0  0.0      0     0 ?        S    15:00   0:00 [rpciod/20]
root      3361  0.0  0.0      0     0 ?        S    15:00   0:00 [rpciod/21]
root      3362  0.0  0.0      0     0 ?        S    15:00   0:00 [rpciod/22]
root      3363  0.0  0.0      0     0 ?        S    15:00   0:00 [rpciod/23]
root      3364  0.0  0.0      0     0 ?        S    15:00   0:00 [nfsiod]
root      3365  0.0  0.0      0     0 ?        S    15:00   0:00 [lockd]
---
root      6334  0.0  0.0      0     0 ?        S    15:07   0:00 [flush-251:4]
root      6349  0.0  0.0      0     0 ?        S    15:07   0:00 [flush-0:21]
root      6350  0.0  0.0  85528  3680 pts/1    SLl+ 15:07   0:00 /sbin/multipathd -d -v 255
root      6352  0.0  0.0   4340   632 pts/1    S+   15:07   0:00 strace -o /home/user/strace_mpathd.txt -D /sbin/multipathd -d -v 255
postfix   8511  0.0  0.0  39376  2388 ?        S    15:10   0:00 cleanup -z -t unix -u -c
postfix   8512  0.0  0.0  39288  2260 ?        S    15:10   0:00 trivial-rewrite -n rewrite -t unix -u -c
postfix   8513  0.0  0.0  43568  2632 ?        S    15:10   0:00 smtp -t unix -u -c
postfix   8516  0.0  0.0  39548  2964 ?        S    15:10   0:00 local -t unix
root      8613  0.0  0.0  16868   760 ?        S<   15:10   0:00 udevd --daemon
root      8695  0.0  0.0  16868   760 ?        S<   15:10   0:00 udevd --daemon
root      8874  0.0  0.0      0     0 ?        S    15:10   0:00 [flush-8:128]
root      8875  0.0  0.0      0     0 ?        S    15:10   0:00 [flush-8:144]
root      8876  0.0  0.0      0     0 ?        S    15:10   0:00 [flush-8:224]
root      8877  0.0  0.0      0     0 ?        S    15:10   0:00 [flush-8:240]
root      8878  0.0  0.0      0     0 ?        S    15:10   0:00 [flush-8:112]
root      8879  0.0  0.0      0     0 ?        S    15:10   0:00 [flush-8:208]

Revision history for this message

Steve Fisher (stefan-nicolaivic) wrote on 2012-07-10:

multipath.conf Edit (158 bytes, text/plain)

Revision history for this message

Steve Fisher (stefan-nicolaivic) wrote on 2012-07-10:

lvm.conf Edit (18.7 KiB, text/plain)

Revision history for this message

Steve Fisher (stefan-nicolaivic) wrote on 2012-07-10:

note: /dev/sda is a raid volume used for the root vg; the filter is actually "r|/dev/sd[b-z]|"

Revision history for this message

Steve Fisher (stefan-nicolaivic) wrote on 2012-07-10:

#10

strace_mpathd.log Edit (138.3 KiB, text/plain)

Revision history for this message

Peter Petrakis (peter-petrakis) wrote on 2012-07-10:

#11

Your blacklist definition is wrong. You provided.

"""
blacklist {
        wwid "3600605b0039afe20ff000054052e7d38"
        vendor "SMART"
        product "SMART"
}

defaults {
user_friendly_names yes
}
"""

wwid cites an entire device, vendor & product cite a different device and must be
included in device { } block. If you look at the "show config" output, you'll note
that your blacklisted device by vendor/product isn't there.

Change to:

"""
blacklist {
  wwid "3600605b0039afe20ff000054052e7d38"
  device {
    vendor "SMART"
    product "SMART"
  }
}

defaults {
user_friendly_names yes
}
"""

update your initrd (multipath.conf is copied there) and reboot.

Also, it would be nice to see a list of all your storage devices. I prefer
lsscsi, e.g.
root@nashira:~# lsscsi
[0:0:0:0] cd/dvd hp DVD A DS8A5LH 1HE3 /dev/sr0
[2:0:0:0] disk NETAPP LUN 8020 /dev/sdb
[2:0:1:0] disk NETAPP LUN 8020 /dev/sdd
[3:0:0:0] storage HP P410i 3.66 -
[3:0:0:1] disk HP LOGICAL VOLUME 3.66 /dev/sda
[5:0:0:0] disk NETAPP LUN 8020 /dev/sdc
[5:0:1:0] disk NETAPP LUN 8020 /dev/sde

Finally, since this is FC, how is it zoned? Detailed port level topology please.
Thanks.

Revision history for this message

Steve Fisher (stefan-nicolaivic) wrote on 2012-07-11:

#12

Storage devices:

$ sudo lsscsi
[0:0:0:0] cd/dvd TSSTcorp CDDVDW TS-L633F IT03 /dev/sr0
[4:2:0:0] disk IBM ServeRAID M5015 2.12 /dev/sda
[5:0:0:0] disk HITACHI OPEN-V 6008 /dev/sdb
[5:0:0:1] disk HITACHI OPEN-V 6008 /dev/sdc
[5:0:0:2] disk HITACHI OPEN-V 6008 /dev/sdd
[5:0:1:0] disk HITACHI OPEN-V 6008 /dev/sde
[5:0:1:1] disk HITACHI OPEN-V 6008 /dev/sdf
[5:0:1:2] disk HITACHI OPEN-V 6008 /dev/sdg
[6:0:0:0] disk HITACHI OPEN-V 6008 /dev/sdh
[6:0:0:1] disk HITACHI OPEN-V 6008 /dev/sdi
[6:0:0:2] disk HITACHI OPEN-V 6008 /dev/sdj
[6:0:1:0] disk HITACHI OPEN-V 6008 /dev/sdk
[6:0:1:1] disk HITACHI OPEN-V 6008 /dev/sdl
[6:0:1:2] disk HITACHI OPEN-V 6008 /dev/sdm

Zoning:

- There are two HBAs. Each HBA is plugged into a separate Brocade DCX switch, and each switch is connected to a different fabric (Fabric A and Fabric B respectively)
- The storage array presenting the luns is also directly connected to the same switches, in each Fabric via 4 ports (Array ports 1A,3A,6A,8A -> Fabric A; Ports 2A,4A,5A,8A -> Fabric B).
- Each HBA is zoned so as to see two storage array ports (2 in each fabric). Basically this means that when I disable the FC port an HBA is connected to, it loses two paths to the lun.

HBA1 -> DCX 1 (Fabric A) -> 3A,6A
HBA2 -> DCX 2 (Fabric B) -> 4A,5A

Blacklisting:

Thanks for pointing this out - (note that I only added in the 'SMART' vendor/product after the problem initially exhibited - in an effort to remove the warning regarding a USB flash storage device that appeared in the multipath -ll output. It _shouldn't_ be a factor).

Fixed now, initrd updated. Reboot/testing later today (likely in about 3 hours' time).

$ echo "show config" | sudo multipathd -k
multipathd> show config
defaults {
user_friendly_names yes
}
blacklist {
devnode ^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*
devnode ^hd[a-z]
devnode ^dcssblk[0-9]*
devnode ^cciss!c[0-9]d[0-9]*
wwid 3600605b0039afe20ff000054052e7d38
device {
  vendor SMART
  product SMART
}
device {
  vendor DGC
  product LUNZ
}
device {
  vendor IBM
  product S/390.*
}
}
blacklist_exceptions {
}
--snip

Storage devices:

$ sudo lsscsi
[0:0:0:0]    cd/dvd  TSSTcorp CDDVDW TS-L633F  IT03  /dev/sr0
[4:2:0:0]    disk    IBM      ServeRAID M5015  2.12  /dev/sda
[5:0:0:0]    disk    HITACHI  OPEN-V           6008  /dev/sdb
[5:0:0:1]    disk    HITACHI  OPEN-V           6008  /dev/sdc
[5:0:0:2]    disk    HITACHI  OPEN-V           6008  /dev/sdd
[5:0:1:0]    disk    HITACHI  OPEN-V           6008  /dev/sde
[5:0:1:1]    disk    HITACHI  OPEN-V           6008  /dev/sdf
[5:0:1:2]    disk    HITACHI  OPEN-V           6008  /dev/sdg
[6:0:0:0]    disk    HITACHI  OPEN-V           6008  /dev/sdh
[6:0:0:1]    disk    HITACHI  OPEN-V           6008  /dev/sdi
[6:0:0:2]    disk    HITACHI  OPEN-V           6008  /dev/sdj
[6:0:1:0]    disk    HITACHI  OPEN-V           6008  /dev/sdk
[6:0:1:1]    disk    HITACHI  OPEN-V           6008  /dev/sdl
[6:0:1:2]    disk    HITACHI  OPEN-V           6008  /dev/sdm

Zoning:

-  There are two HBAs.  Each HBA is plugged into a separate Brocade DCX switch, and each switch is connected to a different fabric (Fabric A and Fabric B respectively) 
-  The storage array presenting the luns is also directly connected to the same switches, in each Fabric via 4 ports (Array ports 1A,3A,6A,8A -> Fabric A; Ports 2A,4A,5A,8A -> Fabric B).
- Each HBA is zoned so as to see two storage array ports (2 in each fabric).  Basically this means that when I disable the FC port an HBA is connected to, it loses two paths to the lun.

HBA1 -> DCX 1 (Fabric A) -> 3A,6A
HBA2 -> DCX 2 (Fabric B) -> 4A,5A

Blacklisting:

Fixed now, initrd updated.  Reboot/testing later today (likely in about 3 hours' time).

$ echo "show config" | sudo multipathd -k
multipathd> show config
defaults {
	user_friendly_names yes
}
blacklist {
	devnode ^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*
	devnode ^hd[a-z]
	devnode ^dcssblk[0-9]*
	devnode ^cciss!c[0-9]d[0-9]*
	wwid 3600605b0039afe20ff000054052e7d38
	device {
		vendor SMART
		product SMART
	}
	device {
		vendor DGC
		product LUNZ
	}
	device {
		vendor IBM
		product S/390.*
	}
}
blacklist_exceptions {
}
--snip

Revision history for this message

Steve Fisher (stefan-nicolaivic) wrote on 2012-07-11:

#13

Download full text (5.2 KiB)

I note this segfault has appeared in dmesg on the last couple of boots.

[ 304.214560] multipathd[3083]: segfault at a ip 00007f5eb838798a sp 00007fffb225ebb0 error 4 in libc-2.11.1.so[7f5eb8310000+17a000]

After adjusting the blacklist, updating initrd and rebooting, the same behaviour is exhibited when I shut the active FC port:

I note this segfault has appeared in dmesg on the last couple of boots.

[  304.214560] multipathd[3083]: segfault at a ip 00007f5eb838798a sp 00007fffb225ebb0 error 4 in libc-2.11.1.so[7f5eb8310000+17a000]

After adjusting the blacklist, updating initrd and rebooting, the same behaviour is exhibited when I shut the active FC port:

[  429.234838] qla2xxx 0000:24:00.0: LOOP DOWN detected (2 3 0).
[  459.165399]  rport-6:0-1: blocked FC remote port time out: removing target and saving binding
[  459.165875]  rport-6:0-0: blocked FC remote port time out: removing target and saving binding
[  459.166496] device-mapper: multipath: Failing path 8:112.
[  459.166573] device-mapper: multipath: Failing path 8:128.
[  459.166765] device-mapper: multipath: Failing path 8:144.
[  459.166831] device-mapper: multipath: Failing path 8:160.
[  459.166914] device-mapper: multipath: Failing path 8:176.
[  459.166965] device-mapper: multipath: Failing path 8:192.
[  459.219341] sd 6:0:1:0: [sdk] Synchronizing SCSI cache
[  459.219374] sd 6:0:1:0: [sdk] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  459.220728] device-mapper: multipath: Failing path 8:112.
[  459.299111] sd 6:0:1:1: [sdl] Synchronizing SCSI cache
[  459.299138] sd 6:0:1:1: [sdl] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  459.300305] device-mapper: multipath: Failing path 8:128.
[  459.378913] sd 6:0:1:2: [sdm] Synchronizing SCSI cache
[  459.378943] sd 6:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  459.381678] device-mapper: multipath: Failing path 8:144.
[  459.430852] sd 6:0:0:1: [sdi] Unhandled error code
[  459.430854] sd 6:0:0:1: [sdi] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  459.430857] sd 6:0:0:1: [sdi] CDB: Read(10): 28 00 00 7f ff 80 00 00 08 00
[  459.430862] end_request: I/O error, dev sdi, sector 8388480
[  459.431165] sd 6:0:0:1: [sdi] Unhandled error code
[  459.431166] sd 6:0:0:1: [sdi] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  459.431168] sd 6:0:0:1: [sdi] CDB: Read(10): 28 00 00 7f ff f0 00 00 08 00
[  459.431173] end_request: I/O error, dev sdi, sector 8388592
[  459.431473] sd 6:0:0:1: [sdi] Unhandled error code
[  459.431475] sd 6:0:0:1: [sdi] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  459.431477] sd 6:0:0:1: [sdi] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
[  459.431482] end_request: I/O error, dev sdi, sector 0
[  459.431767] sd 6:0:0:1: [sdi] Unhandled error code
[  459.431770] sd 6:0:0:1: [sdi] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  459.431772] sd 6:0:0:1: [sdi] CDB: Read(10): 28 00 00 00 00 08 00 00 08 00
[  459.431776] end_request: I/O error, dev sdi, sector 8
[  459.432072] sd 6:0:0:1: [sdi] Unhandled error code
[  459.432073] sd 6:0:0:1: [sdi] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  459.432075] sd 6:0:0:1: [sdi] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
[  459.432081] end_request: I/O error, dev sdi, sector 0
[  459.432397] sd 6:0:0:2: [sdj] Unhandled error code
[  459.432399] sd 6:0:0:2: [sdj] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  459.432401] sd 6:0:0:2: [sdj] CDB: Read(10): 28 00 00 7f ff 80 00 00 08 00
[  459.432406] end_request: I/O error, dev sdj, sector 8388480
[  459.432712] sd 6:0:0:2: [sdj] Unhandled error code
[  459.432714] sd 6:0:0:2: [sdj] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  459.432717] sd 6:0:0:2: [sdj] CDB: Read(10): 28 00 00 7f ff f0 00 00 08 00
[  459.432723] end_request: I/O error, dev sdj, sector 8388592
[  459.433034] sd 6:0:0:2: [sdj] Unhandled error code
[  459.433036] sd 6:0:0:2: [sdj] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  459.433039] sd 6:0:0:2: [sdj] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
[  459.433045] end_request: I/O error, dev sdj, sector 0
[  459.433345] sd 6:0:0:2: [sdj] Unhandled error code
[  459.433347] sd 6:0:0:2: [sdj] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  459.433350] sd 6:0:0:2: [sdj] CDB: Read(10): 28 00 00 00 00 08 00 00 08 00
[  459.433356] end_request: I/O error, dev sdj, sector 8
[  459.433672] sd 6:0:0:2: [sdj] Unhandled error code
[  459.433674] sd 6:0:0:2: [sdj] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  459.433677] sd 6:0:0:2: [sdj] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
[  459.433684] end_request: I/O error, dev sdj, sector 0
[  459.456208] sd 6:0:0:0: [sdh] Synchronizing SCSI cache
[  459.456236] sd 6:0:0:0: [sdh] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  459.465583] sd 6:0:0:2: [sdj] Unhandled error code
[  459.465585] sd 6:0:0:2: [sdj] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  459.465588] sd 6:0:0:2: [sdj] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
[  459.465593] end_request: I/O error, dev sdj, sector 0
[  459.548504] sd 6:0:0:1: [sdi] Synchronizing SCSI cache
[  459.548540] sd 6:0:0:1: [sdi] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  459.625817] sd 6:0:0:2: [sdj] Synchronizing SCSI cache
[  459.625846] sd 6:0:0:2: [sdj] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  464.012300] I/O error in filesystem ("dm-4") meta-data dev dm-4 block 0x4c498e0       ("xfs_trans_read_buf") error 5 buf count 4096
[  464.012801] I/O error in filesystem ("dm-4") meta-data dev dm-4 block 0x4c498e0       ("xfs_trans_read_buf") error 5 buf count 4096

Revision history for this message

Steve Fisher (stefan-nicolaivic) wrote on 2012-07-11:

#14

strace_mpathd_110712.log Edit (138.5 KiB, text/plain)

Revision history for this message

Steve Fisher (stefan-nicolaivic) wrote on 2012-07-11:

#15

multipathd_stdout_110712.log Edit (223.7 KiB, text/plain)

Peter Petrakis (peter-petrakis) on 2012-07-13

Changed in multipath-tools (Ubuntu):
status:	Incomplete → Confirmed

Revision history for this message

Peter Petrakis (peter-petrakis) wrote on 2012-07-13:

#16

Thanks for all the feedback, there's quite a bit of information here to sort through.
In the meanwhile, it would be an interesting data point to see how well this array
functions under multipath 0.4.9, which is what Precise, the new LTS uses.

This can be accomplished by using apt pinning.

http://askubuntu.com/questions/103320/install-packages-from-newer-release-without-building-apt-pinning/103338#103338

In your case:

$ sudo tee /etc/apt/sources.list.d/precise.list << EOF
deb http://us.archive.ubuntu.com/ubuntu/ precise main restricted
deb-src http://us.archive.ubuntu.com/ubuntu/ precise main restricted

deb http://us.archive.ubuntu.com/ubuntu/ precise-updates main restricted
deb-src http://us.archive.ubuntu.com/ubuntu/ precise-updates main restricted
EOF

$ sudo tee /etc/apt/preferences << EOF
Package: multipath-tools*
Pin: release n=precise
Pin-Priority: 990

Package: *
Pin: release n=lucid
Pin-Priority: 900

Package: *
Pin: release o=Ubuntu
Pin-Priority: -10
EOF

$ sudo apt-get update
$ sudo apt-get upgrade

Should update multipath-tools and boot to 0.4.9, if not then
$ sudo apt-get install -t precise multipath-tools multipath-tools-boot

This should unjam you while we triage the data and will provide
a valuable data point to aid in backporting if necessary.

Note that the config syntax has changed slightly.

https://help.ubuntu.com/12.04/serverguide/device-mapper-multipathing.html#multipath-new-and-changed-features

It's ok to leave both directives in the file multpath simply globs keywords and uses
the ones that "fit". Remember to update your initrd after changing the config file.

I assume you have multiple SAN clients, please keep one at 0.4.8 so we can continue
to debug this. Thanks.

Revision history for this message

Steve Fisher (stefan-nicolaivic) wrote on 2012-07-16:

#17

To date, the issue hasn't been observed on the two physical hosts we have running 10.04.1 LTS with the same multipath-tools version, which certainly raises a flag.

They are being used as mission critical / production database servers so I'm scheduling a window in which we will be able to confirm whether or not it is manifesting on those hosts too.

Revision history for this message

Peter Petrakis (peter-petrakis) wrote on 2012-07-18:

#18

Download full text (3.9 KiB)

Alrighty.

[OK] lvm.conf device blacklist
[OK] multipath.conf, sans fix I already pointed out
You can ignore that FW blurb, it's just a script that shouldn't be so noisy.

failover logs *look good*

bug-1020436$ grep fail_path multipathd_stdout.log | sort -u
Jul 10 15:08:25 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath0 NF fail_path 8:112
Jul 10 15:08:25 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath0 NF fail_path 8:160
Jul 10 15:08:25 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath1 NF fail_path 8:128
Jul 10 15:08:25 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath1 NF fail_path 8:176
Jul 10 15:08:25 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath2 NF fail_path 8:144
Jul 10 15:08:25 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath2 NF fail_path 8:192

# 3 mins go by... oh and several devices are failed "twice"

# Jul 10 15:10:19 devices start to return

Jul 10 15:11:50 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath0 NF fail_path 8:16
Jul 10 15:11:50 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath0 NF fail_path 8:64
Jul 10 15:11:50 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath1 NF fail_path 8:32
Jul 10 15:11:50 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath1 NF fail_path 8:80
Jul 10 15:11:50 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath2 NF fail_path 8:48
Jul 10 15:11:50 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath2 NF fail_path 8:96

and according to udev, note that I re-sequenced the grep and grouped by mpathX.

bug-1020436$ grep -A4 PATH_FAILED multipathd_stdout.log

This all looks OK, you can see the paths count down and stop at the right count.
The way multibus works is a lo...

Alrighty.

[OK] lvm.conf device blacklist
[OK] multipath.conf, sans fix I already pointed out
You can ignore that FW blurb, it's just a script that shouldn't be so noisy.

failover logs *look good*

bug-1020436$ grep fail_path multipathd_stdout.log | sort -u
Jul 10 15:08:25 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath0  NF  fail_path 8:112
Jul 10 15:08:25 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath0  NF  fail_path 8:160
Jul 10 15:08:25 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath1  NF  fail_path 8:128
Jul 10 15:08:25 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath1  NF  fail_path 8:176
Jul 10 15:08:25 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath2  NF  fail_path 8:144
Jul 10 15:08:25 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath2  NF  fail_path 8:192

# 3 mins go by... oh and several devices are failed "twice"

# Jul 10 15:10:19 devices start to return

Jul 10 15:11:50 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath0  NF  fail_path 8:16
Jul 10 15:11:50 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath0  NF  fail_path 8:64
Jul 10 15:11:50 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath1  NF  fail_path 8:32
Jul 10 15:11:50 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath1  NF  fail_path 8:80
Jul 10 15:11:50 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath2  NF  fail_path 8:48
Jul 10 15:11:50 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath2  NF  fail_path 8:96

and according to udev, note that I re-sequenced the grep and grouped by mpathX.

bug-1020436$ grep -A4 PATH_FAILED multipathd_stdout.log

This all looks OK, you can see the paths count down and stop at the right count.
The way multibus works is a lot like network bonding link aggregation, io is
simply steered down the remaining paths. What might have happened is the
LVs backed by this mpath device somehow got deactivated and that's why the
filesystem can't write back.

Try this for me please. Shutdown those paths and keep them down, then
after about a minute, run.
# lvs -o lv_attr
 
We should see "w" and "a" for every volume, if not, that's the problem.

Also, while in the "failed state", see if you can do a dd read from one
of the mpaths? like dd if=/dev/mapper/mpath0 of=/dev/null count=10.
If that works, try the same thing with a random sd device.

I don't see anything amiss in the strace. The SG ioctls look OK and
the ioctls made to DM make sense.

We want to determine at which level that sending block ios is hung up.

Changed in multipath-tools (Ubuntu):
status:	Confirmed → Incomplete

Revision history for this message

Steve Fisher (stefan-nicolaivic) wrote on 2012-07-23:

#19

It looks like all the /dev/mapper/mpathX targets, and the remaining 'sd' unique paths (with one fabric disabled) were all readable after the path shutdown, but the underlying LVs were somehow still deactivated when the paths disappeared.

root@<hostname>-03:/srv/mysql# dmesg
<snip>
[1039812.298161] I/O error in filesystem ("dm-4") meta-data dev dm-4 block 0x1f414af0 ("xlog_iodone") error 5 buf count 3072
[1039812.311419] xfs_force_shutdown(dm-4,0x2) called from line 1043 of file /build/buildd/linux-2.6.32/fs/xfs/xfs_log.c. Return address = 0xffffffffa0162516
[1039812.311433] Filesystem "dm-4": Log I/O Error Detected. Shutting down filesystem: dm-4
[1039812.326038] Please umount the filesystem, and rectify the problem(s)

root@<hostname>:/srv/mysql# ls -l
ls: cannot open directory .: Input/output error

root@<hostname>:/srv/mysql# lvs -o lv_attr
  /dev/test2/lvol0: read failed after 0 of 2048 at 0: Input/output error
  /dev/test/lvol0: read failed after 0 of 4096 at 0: Input/output error
  /dev/mvsan/san: read failed after 0 of 2048 at 0: Input/output error
  Attr
  -wi-ao
  -wi-ao
  -wi-ao
  -wi-ao
  -wi-ao
root@<hostname>:/srv/mysql# dd if=/dev/mapper/mpath
mpath0 mpath1 mpath2
root@<hostname>:/srv/mysql# dd if=/dev/mapper/mpath0 of=/dev/null count=10
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 0.0191405 s, 267 kB/s
root@<hostname>:/srv/mysql# dd if=/dev/mapper/mpath1 of=/dev/null count=10
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 0.00693051 s, 739 kB/s
root@<hostname>:/srv/mysql# dd if=/dev/mapper/mpath2 of=/dev/null count=10
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 0.00559024 s, 916 kB/s

-----

root@<hostname>:/srv/mysql# ls -l /dev/sd*
brw-rw---- 1 root disk 8, 0 2012-07-11 14:24 /dev/sda
brw-rw---- 1 root disk 8, 1 2012-07-11 14:24 /dev/sda1
brw-rw---- 1 root disk 8, 16 2012-07-11 14:24 /dev/sdb
brw-rw---- 1 root disk 8, 32 2012-07-11 14:24 /dev/sdc
brw-rw---- 1 root disk 8, 48 2012-07-11 14:24 /dev/sdd
brw-rw---- 1 root disk 8, 64 2012-07-11 14:24 /dev/sde
brw-rw---- 1 root disk 8, 80 2012-07-11 14:24 /dev/sdf
brw-rw---- 1 root disk 8, 96 2012-07-11 14:24 /dev/sdg
root@<hostname>:/srv/mysql# dd if=/dev/sdb of=/dev/null count=10
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 5.0139e-05 s, 102 MB/s
root@<hostname>:/srv/mysql# dd if=/dev/sdc of=/dev/null count=10
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 4.0726e-05 s, 126 MB/s
root@<hostname>:/srv/mysql# dd if=/dev/sdd of=/dev/null count=10
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 4.0935e-05 s, 125 MB/s
root@<hostname>:/srv/mysql# dd if=/dev/sde of=/dev/null count=10
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 4.2647e-05 s, 120 MB/s
root@<hostname>:/srv/mysql# dd if=/dev/sdf of=/dev/null count=10
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 4.1718e-05 s, 123 MB/s
root@<hostname>:/srv/mysql# dd if=/dev/sdg of=/dev/null count=10
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 4.3425e-05 s, 118 MB/s

root@<hostname>-03:/srv/mysql# dmesg
<snip>
[1039812.298161] I/O error in filesystem ("dm-4") meta-data dev dm-4 block 0x1f414af0       ("xlog_iodone") error 5 buf count 3072
[1039812.311419] xfs_force_shutdown(dm-4,0x2) called from line 1043 of file /build/buildd/linux-2.6.32/fs/xfs/xfs_log.c.  Return address = 0xffffffffa0162516
[1039812.311433] Filesystem "dm-4": Log I/O Error Detected.  Shutting down filesystem: dm-4
[1039812.326038] Please umount the filesystem, and rectify the problem(s)

root@<hostname>:/srv/mysql# ls -l
ls: cannot open directory .: Input/output error

root@<hostname>:/srv/mysql#  lvs -o lv_attr
  /dev/test2/lvol0: read failed after 0 of 2048 at 0: Input/output error
  /dev/test/lvol0: read failed after 0 of 4096 at 0: Input/output error
  /dev/mvsan/san: read failed after 0 of 2048 at 0: Input/output error
  Attr  
  -wi-ao
  -wi-ao
  -wi-ao
  -wi-ao
  -wi-ao
root@<hostname>:/srv/mysql# dd if=/dev/mapper/mpath
mpath0  mpath1  mpath2  
root@<hostname>:/srv/mysql# dd if=/dev/mapper/mpath0 of=/dev/null count=10
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 0.0191405 s, 267 kB/s
root@<hostname>:/srv/mysql# dd if=/dev/mapper/mpath1 of=/dev/null count=10
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 0.00693051 s, 739 kB/s
root@<hostname>:/srv/mysql# dd if=/dev/mapper/mpath2 of=/dev/null count=10
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 0.00559024 s, 916 kB/s

-----

root@<hostname>:/srv/mysql# ls -l /dev/sd*
brw-rw---- 1 root disk 8,  0 2012-07-11 14:24 /dev/sda
brw-rw---- 1 root disk 8,  1 2012-07-11 14:24 /dev/sda1
brw-rw---- 1 root disk 8, 16 2012-07-11 14:24 /dev/sdb
brw-rw---- 1 root disk 8, 32 2012-07-11 14:24 /dev/sdc
brw-rw---- 1 root disk 8, 48 2012-07-11 14:24 /dev/sdd
brw-rw---- 1 root disk 8, 64 2012-07-11 14:24 /dev/sde
brw-rw---- 1 root disk 8, 80 2012-07-11 14:24 /dev/sdf
brw-rw---- 1 root disk 8, 96 2012-07-11 14:24 /dev/sdg
root@<hostname>:/srv/mysql# dd if=/dev/sdb of=/dev/null count=10
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 5.0139e-05 s, 102 MB/s
root@<hostname>:/srv/mysql# dd if=/dev/sdc of=/dev/null count=10
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 4.0726e-05 s, 126 MB/s
root@<hostname>:/srv/mysql# dd if=/dev/sdd of=/dev/null count=10
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 4.0935e-05 s, 125 MB/s
root@<hostname>:/srv/mysql# dd if=/dev/sde of=/dev/null count=10
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 4.2647e-05 s, 120 MB/s
root@<hostname>:/srv/mysql# dd if=/dev/sdf of=/dev/null count=10
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 4.1718e-05 s, 123 MB/s
root@<hostname>:/srv/mysql# dd if=/dev/sdg of=/dev/null count=10
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 4.3425e-05 s, 118 MB/s

Revision history for this message

Peter Petrakis (peter-petrakis) wrote on 2012-07-23:

#20

Download full text (3.4 KiB)

Hmm, we've got some counter indicators here.

lvs claims that the volumes are active. But the probe itself
is showing problems reading the volumes.

XFS is telling us that it cannot write it's journal to disk.
[1039812.311433] Filesystem "dm-4": Log I/O Error Detected. Shutting down filesystem: dm-4

"fs/xfs/xfs_rw.c"

90 /*
91 * Force a shutdown of the filesystem instantly while keeping
92 * the filesystem consistent. We don't do an unmount here; just shutdown
93 * the shop, make sure that absolutely nothing persistent happens to
94 * this filesystem after this point.
95 */
96 void
97 xfs_do_force_shutdown(
...
127 if (flags & SHUTDOWN_CORRUPT_INCORE) {
128 xfs_cmn_err(XFS_PTAG_SHUTDOWN_CORRUPT, CE_ALERT, mp,
129 "Corruption of in-memory data detected. Shutting down filesystem: %s",
130 mp->m_fsname);
131 if (XFS_ERRLEVEL_HIGH <= xfs_error_level) {
132 xfs_stack_trace();
133 }
134 } else if (!(flags & SHUTDOWN_FORCE_UMOUNT)) {
135 if (logerror) {
136 xfs_cmn_err(XFS_PTAG_SHUTDOWN_LOGERROR, CE_ALERT, mp,
137 "Log I/O Error Detected. Shutting down filesystem: %s",
138 mp->m_fsname);

The logs conveniently tell us where it was called from too.
1009 void
1010 xlog_iodone(xfs_buf_t *bp)
1011 {

...

1036 /*
1037 * Race to shutdown the filesystem if we see an error.
1038 */
1039 if (XFS_TEST_ERROR((XFS_BUF_GETERROR(bp)), l->l_mp,
1040 XFS_ERRTAG_IODONE_IOERR, XFS_RANDOM_IODONE_IOERR)) {
1041 xfs_ioerror_alert("xlog_iodone", l->l_mp, bp, XFS_BUF_ADDR(bp));
1042 XFS_BUF_STALE(bp);
1043 xfs_force_shutdown(l->l_mp, SHUTDOWN_LOG_IO_ERROR);
1044 /*
1045 * This flag will be propagated to the trans-committed
1046 * callback routines to let them know that the log-commit
1047 * didn't succeed.
1048 */
1049 aborted = XFS_LI_ABORTED;

I assume dm-4 is the LV that XFS is mounted on, did you run the dd test on that?

I'm starting to wonder if the LVM device filter is lying to us, after failover, something
changes which misrepresents the LV and then XFS bails out.

If you can perform that DD for every PV that backs dm-4 successfully then there's
something wrong with the DM map for those LVs after failover occurs.

OK, what I need from you now is a before and after (same fault injection method) of:
0) ls -lR /dev/ > dev_major_minor.log
1) lvs -o lv_attr
2) pvdisplay -vvv
3) lvdisplay -vvv
4) dmsetup table -v
5) "dd test" on all block devices: lv, mp, sd
6) dmesg output

Please attach this as a single tarball, that has a timestamp in the filename
and has a directory structure of:

foo.tgz
before/
after/

If this all checks out, then what's probably happening is the when
multipath begins the failover process, there's enough of a delay that
XFS simply bails out early before IO is ready to be sent down the
remaining paths. group by priority may ...

Hmm, we've got some counter indicators here.

lvs claims that the volumes are active. But the probe itself
is showing problems reading the volumes.

XFS is telling us that it cannot write it's journal to disk.
[1039812.311433] Filesystem "dm-4": Log I/O Error Detected. Shutting down filesystem: dm-4

"fs/xfs/xfs_rw.c"

90 /*
 91  * Force a shutdown of the filesystem instantly while keeping
 92  * the filesystem consistent. We don't do an unmount here; just shutdown
 93  * the shop, make sure that absolutely nothing persistent happens to
 94  * this filesystem after this point.
 95  */
 96 void
 97 xfs_do_force_shutdown(
...
127         if (flags & SHUTDOWN_CORRUPT_INCORE) {
128                 xfs_cmn_err(XFS_PTAG_SHUTDOWN_CORRUPT, CE_ALERT, mp,
129     "Corruption of in-memory data detected.  Shutting down filesystem: %s",
130                         mp->m_fsname);
131                 if (XFS_ERRLEVEL_HIGH <= xfs_error_level) {
132                         xfs_stack_trace();
133                 }
134         } else if (!(flags & SHUTDOWN_FORCE_UMOUNT)) {
135                 if (logerror) {
136                         xfs_cmn_err(XFS_PTAG_SHUTDOWN_LOGERROR, CE_ALERT, mp,
137                 "Log I/O Error Detected.  Shutting down filesystem: %s",
138                                 mp->m_fsname);

The logs conveniently tell us where it was called from too.
1009 void
1010 xlog_iodone(xfs_buf_t *bp)
1011 {

...

1036         /*
1037          * Race to shutdown the filesystem if we see an error.
1038          */
1039         if (XFS_TEST_ERROR((XFS_BUF_GETERROR(bp)), l->l_mp,
1040                         XFS_ERRTAG_IODONE_IOERR, XFS_RANDOM_IODONE_IOERR)) {
1041                 xfs_ioerror_alert("xlog_iodone", l->l_mp, bp, XFS_BUF_ADDR(bp));
1042                 XFS_BUF_STALE(bp);
1043                 xfs_force_shutdown(l->l_mp, SHUTDOWN_LOG_IO_ERROR);
1044                 /*
1045                  * This flag will be propagated to the trans-committed
1046                  * callback routines to let them know that the log-commit
1047                  * didn't succeed.
1048                  */
1049                 aborted = XFS_LI_ABORTED;

I assume dm-4 is the LV that XFS is mounted on, did you run the dd test on that?

I'm starting to wonder if the LVM device filter is lying to us, after failover, something
changes which misrepresents the LV and then XFS bails out.

If you can perform that DD for every PV that backs dm-4 successfully then there's
something wrong with the DM map for those LVs after failover occurs.

OK, what I need from you now is a before and after  (same fault injection method) of:
0) ls -lR /dev/ > dev_major_minor.log
1) lvs -o lv_attr
2) pvdisplay -vvv
3) lvdisplay -vvv
4) dmsetup table -v
5) "dd test" on all block devices: lv, mp, sd
6) dmesg output

Please attach this as a single tarball, that has a timestamp in the filename
and has a directory structure of:

foo.tgz
  before/
  after/

I looked at the XFS mount arguments and didn't find anything that would
make it more lenient in these situations.

If you can manage it, a LUN formatted with ext3 under these circumstances
would help in ruling out whether the filesystem is part of the problem.

Thanks.

Revision history for this message

Peter Petrakis (peter-petrakis) wrote on 2012-07-23:

#21

scratch the ext3 test, I just re-read the description.

"I've replicated this behaviour using both xfs and ext4 filesystems, on multiple
different luns presented to the server."

Revision history for this message

Steve Fisher (stefan-nicolaivic) wrote on 2012-07-24:

#22

before and after tests Edit (58.5 KiB, application/x-tar)

Download full text (4.7 KiB)

Output from before and after commands is attached.

I'm pretty sure you're right about the LVM device filter; I figured that setting the scsi-timeout to 90 seconds (being way longer than the TUR path checker, which is scheduled every 20 seconds) should be enough to handle the device failover.

It's as if dm-4 is using sdk (for example), but when this changes, either the journal is somehow stuck trying to update via the failed path, or dm-4 gets removed from the virtual device list because sdk is no longer there, instead of being reinstated with a new path.

(Yes, dm-4 is where the xfs filesystem is (mpath0). The multipathed "dm-" entries are currently dm-5, dm-6 and dm-7.)

root@<hostname>:/etc# multipath -ll | grep dm-
mpath2 (360060e80058a070000008a0700000023) dm-7 HITACHI ,OPEN-V
mpath1 (360060e80058a070000008a070000000e) dm-5 HITACHI ,OPEN-V
mpath0 (360060e80058a070000008a0700002417) dm-6 HITACHI ,OPEN-V

root@<hostname>:/etc# pvscan
  PV /dev/mapper/mpath2 VG test2 lvm2 [4.00 GiB / 0 free]
  PV /dev/mapper/mpath0 VG mvsan lvm2 [500.00 GiB / 0 free]
  PV /dev/mapper/mpath1 VG test lvm2 [4.00 GiB / 0 free]
  PV /dev/sda1 VG rgrprod-proc03 lvm2 [135.97 GiB / 108.03 GiB free]
  Total: 4 [643.96 GiB] / in use: 4 [643.96 GiB] / in no VG: 0 [0 ]

/dev/mapper/test2-lvol0 on /tmp/test2 type xfs (rw)
/dev/mapper/test-lvol0 on /tmp/test type ext4 (rw)
/dev/mapper/mvsan-san on /srv/mysql type xfs (rw)

---

Interestingly, this is before failure:

root@<hostname>:/etc# find /sys/devices -name *dm-* -print | grep -v virtual | sort -t '/' --key=13
/sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/host4/target4:2:0/4:2:0:0/block/sda/sda1/holders/dm-0
/sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/host4/target4:2:0/4:2:0:0/block/sda/sda1/holders/dm-1
/sys/devices/pci0000:00/0000:00:09.0/0000:24:00.0/host6/rport-6:0-1/target6:0:1/6:0:1:2/block/sdm/holders/dm-2
/sys/devices/pci0000:00/0000:00:09.0/0000:24:00.0/host6/rport-6:0-1/target6:0:1/6:0:1:1/block/sdl/holders/dm-3
/sys/devices/pci0000:00/0000:00:09.0/0000:24:00.0/host6/rport-6:0-1/target6:0:1/6:0:1:0/block/sdk/holders/dm-4
/sys/devices/pci0000:00/0000:00:07.0/0000:1f:00.0/host5/rport-5:0-0/target5:0:0/5:0:0:1/block/sdc/holders/dm-5
/sys/devices/pci0000:00/0000:00:07.0/0000:1f:00.0/host5/rport-5:0-1/target5:0:1/5:0:1:1/block/sdf/holders/dm-5
/sys/devices/pci0000:00/0000:00:09.0/0000:24:00.0/host6/rport-6:0-0/target6:0:0/6:0:0:1/block/sdi/holders/dm-5
/sys/devices/pci0000:00/0000:00:09.0/0000:24:00.0/host6/rport-6:0-1/target6:0:1/6:0:1:1/block/sdl/holders/dm-5
/sys/devices/pci0000:00/0000:00:07.0/0000:1f:00.0/host5/rport-5:0-0/target5:0:0/5:0:0:0/block/sdb/holders/dm-6
/sys/devices/pci0000:00/0000:00:07.0/0000:1f:00.0/host5/rport-5:0-1/target5:0:1/5:0:1:0/block/sde/holders/dm-6
/sys/devices/pci0000:00/0000:00:09.0/0000:24:00.0/host6/rport-6:0-0/target6:0:0/6:0:0:0/block/sdh/holders/dm-6
/sys/devices/pci0000:00/0000:00:09.0/0000:24:00.0/host6/rport-6:0-1/target6:0:1/6:0:1:0/block/sdk/holders/dm-6
/sys/devices/pci0000:00/0000:00:07.0/0000:1f:00.0/host5/rport-5:0-0/target5:0:0/5:0:0:2/block/sdd/holders/dm-7
/...

Output from before and after commands is attached.

(Yes, dm-4 is where the xfs filesystem is (mpath0).  The multipathed "dm-" entries are currently dm-5, dm-6 and dm-7.)

root@<hostname>:/etc# multipath -ll | grep dm-
mpath2 (360060e80058a070000008a0700000023) dm-7 HITACHI ,OPEN-V        
mpath1 (360060e80058a070000008a070000000e) dm-5 HITACHI ,OPEN-V        
mpath0 (360060e80058a070000008a0700002417) dm-6 HITACHI ,OPEN-V

root@<hostname>:/etc# pvscan
  PV /dev/mapper/mpath2   VG test2            lvm2 [4.00 GiB / 0    free]
  PV /dev/mapper/mpath0   VG mvsan            lvm2 [500.00 GiB / 0    free]
  PV /dev/mapper/mpath1   VG test             lvm2 [4.00 GiB / 0    free]
  PV /dev/sda1            VG rgrprod-proc03   lvm2 [135.97 GiB / 108.03 GiB free]
  Total: 4 [643.96 GiB] / in use: 4 [643.96 GiB] / in no VG: 0 [0   ]

/dev/mapper/test2-lvol0 on /tmp/test2 type xfs (rw)
/dev/mapper/test-lvol0 on /tmp/test type ext4 (rw)
/dev/mapper/mvsan-san on /srv/mysql type xfs (rw)

---

Interestingly, this is before failure:

After failure, that same listing looks like:

Which I suppose is to be expected - except I'm guessing that the dm-(2,3,4) devices shouldn't disappear - they should map to the "new" sdX paths via the physical devices /dev/mapper/mpath(0,1,2) ... or however that pathing failover is meant to work in LVM.

Revision history for this message

Peter Petrakis (peter-petrakis) wrote on 2012-07-24:

#23

Download full text (3.2 KiB)

It doesn't look like your lvm.conf filter is working.

(pvdisplay)

        /dev/sdb: block size is 4096 bytes
      /dev/sdb: lvm2 label detected
      Ignoring duplicate PV RLOhdDURbD7uK2La3MDK2olkP0BF2Tu7 on /dev/sdb - using dm /dev/mapper/mpath0
        Closed /dev/sdb
        Opened /dev/sdc RO O_DIRECT

the sd names shouldn't even be seen.

from the output of vgscan -vvv you should see something like this:

      /dev/ram15: No label detected
        Closed /dev/ram15
        /dev/sdc: Skipping (regex)
        /dev/sdc1: Skipping (regex)

Change your filter to what's documented in the multipath guide.

https://help.ubuntu.com/12.04/serverguide/multipath-devices.html#multipath-devices-in-logical-volumes

filter = [ "r/block/", "r/disk/", "r/sd.*/", "a/.*/" ]

Run vgscan -vvv after to verify, those pvdisplay errors should also go away.

If you want it to be extra quiet, you can filter out ram devices too, here's mine on an hp server
with the following filter.

filter = [ "r/block/", "r/ram.*/", "r/disk/", "r/cciss/", "r/sd.*/", "a/.*/" ]

The thing to know about sd names is that they are not deterministic, ever, so to keep sda as
part of your vg set you'll need to determine it's unique udev name and filter that in. Looking
back at your original filter, I can see that it's wrong now.

# By default we accept every block device:
filter = [ "a/.*/", "r|/dev/sd[b-z]|" ]

It should be

# By default we accept every block device:
filter = [ "a/.*/", "r/sd.[b-z]/" ]

Again, if you hotplug sda, or reverse scan your busses, this ceases filter accurately, use the
unique udev names.

Beyond that, everything looks good, all block devices are responding, that are left, after failover.

I tested failover using lvm + xfs using multipath 0.4.8 with both multibus and priority grouping, I
could not reproduce the issue. Load test was iozone -a in a loop. I just realized, you didn't describe
what load these were under during the failure, or how many failovers are required to reach the fault.

You're dmtables look sane.
-0 1048576000 multipath 1 queue_if_no_path 0 1 1 round-robin 0 4 1 8:64 1000 8:16 1000 8:96 1000 8:160 1000
+0 1048576000 multipath 1 queue_if_no_path 0 1 1 round-robin 0 2 1 8:64 1000 8:16 1000

You can tell that it's all in one group via the single round-robin directive, which marks the beginning
of all paths in that group, clearly you went from 4 -> 2. The fact that the map is updated
at all means multipathd did it's job. multipathd's main job is the patch checking, and managing the
policy we define for those paths by updating the dm tables as needed. DM then simply operates
on what it has for a table definition and keeps chugging along until that table changes.

back to the original multipathd output you provide DM_STATE=active and DM_SUSPENDED=0, that
map was never deactivated. I tried pushing my queue up to 1000 from 128 for rr_min_io, no change.

If it's not the lvm filter then it's something else very subtle. Unless you're comfortable with upgrading
to multipath 0.4.9 via apt pinning (done this before with lucid, very stable, and better) as I described in
comment #16, the only way we'll get to the bottom of this ...

It doesn't look like your lvm.conf filter is working.

(pvdisplay)

the sd names shouldn't even be seen.

from the output of vgscan -vvv you should see something like this:

/dev/ram15: No label detected
        Closed /dev/ram15
        /dev/sdc: Skipping (regex)
        /dev/sdc1: Skipping (regex)

Change your filter to what's documented in the multipath guide.

https://help.ubuntu.com/12.04/serverguide/multipath-devices.html#multipath-devices-in-logical-volumes

filter = [ "r/block/", "r/disk/", "r/sd.*/", "a/.*/" ]

Run vgscan -vvv after to verify, those pvdisplay errors should also go away.

If you want it to be extra quiet, you can filter out ram devices too, here's mine on an hp server
with the following filter.

filter = [ "r/block/", "r/ram.*/", "r/disk/", "r/cciss/", "r/sd.*/", "a/.*/" ]

# By default we accept every block device:
    filter = [ "a/.*/", "r|/dev/sd[b-z]|" ]

It should be

# By default we accept every block device:
    filter = [ "a/.*/", "r/sd.[b-z]/" ]

Again, if you hotplug sda, or reverse scan your busses, this ceases filter accurately, use the 
unique udev names.

Beyond that, everything looks good, all block devices are responding, that are left, after failover.

You're dmtables look sane.
-0 1048576000 multipath 1 queue_if_no_path 0 1 1 round-robin 0 4 1 8:64 1000 8:16 1000 8:96 1000 8:160 1000 
+0 1048576000 multipath 1 queue_if_no_path 0 1 1 round-robin 0 2 1 8:64 1000 8:16 1000

You can tell that it's all in one group via the single round-robin directive, which marks the beginning
of all paths in that group, clearly you went from 4 -> 2. The fact that the map is updated
at all means multipathd did it's job.  multipathd's main job is the patch checking, and managing the
policy we define for those paths by updating the dm tables as needed. DM then simply operates
on what it has for a table definition and keeps chugging along until that table changes.

back to the original multipathd output you provide DM_STATE=active and DM_SUSPENDED=0, that
map was never deactivated. I tried pushing my queue up to 1000 from 128 for rr_min_io, no change.

Revision history for this message

Steve Fisher (stefan-nicolaivic) wrote on 2012-07-27:

#24

The root cause was in fact the lvm.conf filter, but explicitly not for the reason you'd think.

The issue is that if I added "a|.*|" into regex array, it was ignoring my 'sd[b-z]', loop and ram exclusions, both singly and in combination.

It seems to be an obscure issue with the use of square brackets inside the regex in certain ways.

Eg if I use the line:

filter = [ "a|.*|", "r|.*|" ]
or
filter = [ "a|.*|", "r|loop|" ]

Then this filters _everything_ in the first example, or just loop devices respectively... just as you would expect.

However as soon as I use something like:

filter = [ "a|.*|", "r|loop[0-9]|" ]

Then I don't get any filtering at all... except that defining _only_ the removal filters, per:

filter = [ "r|loop[0-9]+|" ]

.. and other filters of that ilk DO work as expected -- so long as I remove the "accept-all" regex.

I'm not sure if this parsing behaviour is intended -- it could be to do with the way the array is encapsulated.

Even this line works as expected:

filter = [ "a|.*|", "r|sd.[b-z]|" ]

-- Possibly since placing the the match-any-single-character (".") in there changes the behaviour.

The filter currently in use works as expected and looks like:

filter = [ "r|sd[b-z]|", "r|ram[0-9]+|", "r|loop[0-9]+|", ]

After update-initramfs and reboot, I can successfully disable paths, and the failover works perfectly.

Your insight is much appreciated, I would never have expected /dev/mapper/mpath0 to become aliased to a single disk target.

I suspect that what has actually happened is that things were working perfectly for a time, but then broke upon a reboot of the server.

Ie, due to the lack of filtering, LVM jumbled the detected volumes at boot time (due to the way /dev is traversed) such that mpath0 was actually pointing at a fixed location eg /dev/sdb rather than being multipathed as expected.

Peter Petrakis (peter-petrakis) on 2012-07-27

Changed in multipath-tools (Ubuntu):
status:	Incomplete → Invalid

Revision history for this message

Peter Petrakis (peter-petrakis) wrote on 2012-07-27:

#25

Could you please file a new bug against LVM/lucid (just add a tag 'lucid') detailing
this behavior? Thanks.

I've never been a fan of LVM's filtering mechanism, then again the source is
there, I could do something about it :)

What you described here:
"""
However as soon as I use something like:

filter = [ "a|.*|", "r|loop[0-9]|" ]

Then I don't get any filtering at all... except that defining _only_ the removal filters, per:

filter = [ "r|loop[0-9]+|" ]

.. and other filters of that ilk DO work as expected -- so long as I remove the "accept-all" regex.
"""

Sounds like a bug to me. The whole point of this filter-in, filter-out mechanism is
to perform operations that exclude everything by default, and then allow only certain
devices in. Just like you would configure a firewall.

As for multipath aliasing, it happens more often than I would like, which is why I
made such a point to include it into the official documentation. Once you start
stacking block devices it can get complicated very quickly. We should probably
expand the docs to include udev alias examples, as configurations like yours are the
second most common e.g. disk 1 not part of VG but everything else is.

Glad we could get you sorted. I really appreciate the level of detail and responsiveness
you were able to provide.

Report a bug

This report contains Public information

Everyone can see this information.

You are

Subscribing...

Edit bug mail

Other bug subscribers

Bug attachments

Add attachment

Remote bug watches

Bug watches keep track of this bug in other bug trackers.

Ubuntumultipath-tools package

Cannot read superblock after FC multipath failover

Bug Description

Other bug subscribers

Bug attachments

Remote bug watches

Ubuntu
multipath-tools package