Output from before and after commands is attached.
I'm pretty sure you're right about the LVM device filter; I figured that setting the SCSI timeout to 90 seconds (well beyond the TUR path checker's 20-second polling interval) would be enough to cover the device failover.
It's as if dm-4 is using sdk (for example), but when that path fails, either the journal gets stuck trying to update via the failed path, or dm-4 is removed from the virtual device list because sdk is no longer there, rather than being reinstated with a new path.
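For reference, the two timers I mean look like this (a sketch, not our exact config; the option names are from multipath-tools, and the sysfs path just uses sdk as an example):

```
# /etc/multipath.conf -- sketch of the relevant defaults
defaults {
        path_checker     tur    # TUR path checker
        polling_interval 20     # checker runs every 20 seconds
}

# per-path SCSI command timeout (90 s), set via sysfs:
echo 90 > /sys/block/sdk/device/timeout
```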
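To verify that theory during a failover, something like the following should show whether dm-4 is still wired to the dead path (standard device-mapper/multipath-tools commands; dm-4 and sdk are just the examples from above):

```
# block devices dm-4 is built on (as major:minor pairs)
dmsetup deps /dev/dm-4

# target tables for all dm devices
dmsetup table

# path group and path states as multipathd sees them
multipath -ll
```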
--
(Yes, dm-4 is where the xfs filesystem lives, on mpath0. The multipathed "dm-" entries are currently dm-5, dm-6 and dm-7.)
root@<hostname>:/etc# multipath -ll | grep dm-
mpath2 (360060e80058a070000008a0700000023) dm-7 HITACHI,OPEN-V
mpath1 (360060e80058a070000008a070000000e) dm-5 HITACHI,OPEN-V
mpath0 (360060e80058a070000008a0700002417) dm-6 HITACHI,OPEN-V
root@<hostname>:/etc# pvscan
PV /dev/mapper/mpath2 VG test2 lvm2 [4.00 GiB / 0 free]
PV /dev/mapper/mpath0 VG mvsan lvm2 [500.00 GiB / 0 free]
PV /dev/mapper/mpath1 VG test lvm2 [4.00 GiB / 0 free]
PV /dev/sda1 VG rgrprod-proc03 lvm2 [135.97 GiB / 108.03 GiB free]
Total: 4 [643.96 GiB] / in use: 4 [643.96 GiB] / in no VG: 0 [0 ]
/dev/mapper/test2-lvol0 on /tmp/test2 type xfs (rw)
/dev/mapper/test-lvol0 on /tmp/test type ext4 (rw)
/dev/mapper/mvsan-san on /srv/mysql type xfs (rw)
---
Interestingly, this is before failure:
root@<hostname>:/etc# find /sys/devices -name *dm-* -print | grep -v virtual | sort -t '/' --key=13
/sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/host4/target4:2:0/4:2:0:0/block/sda/sda1/holders/dm-0
/sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/host4/target4:2:0/4:2:0:0/block/sda/sda1/holders/dm-1
/sys/devices/pci0000:00/0000:00:09.0/0000:24:00.0/host6/rport-6:0-1/target6:0:1/6:0:1:2/block/sdm/holders/dm-2
/sys/devices/pci0000:00/0000:00:09.0/0000:24:00.0/host6/rport-6:0-1/target6:0:1/6:0:1:1/block/sdl/holders/dm-3
/sys/devices/pci0000:00/0000:00:09.0/0000:24:00.0/host6/rport-6:0-1/target6:0:1/6:0:1:0/block/sdk/holders/dm-4
/sys/devices/pci0000:00/0000:00:07.0/0000:1f:00.0/host5/rport-5:0-0/target5:0:0/5:0:0:1/block/sdc/holders/dm-5
/sys/devices/pci0000:00/0000:00:07.0/0000:1f:00.0/host5/rport-5:0-1/target5:0:1/5:0:1:1/block/sdf/holders/dm-5
/sys/devices/pci0000:00/0000:00:09.0/0000:24:00.0/host6/rport-6:0-0/target6:0:0/6:0:0:1/block/sdi/holders/dm-5
/sys/devices/pci0000:00/0000:00:09.0/0000:24:00.0/host6/rport-6:0-1/target6:0:1/6:0:1:1/block/sdl/holders/dm-5
/sys/devices/pci0000:00/0000:00:07.0/0000:1f:00.0/host5/rport-5:0-0/target5:0:0/5:0:0:0/block/sdb/holders/dm-6
/sys/devices/pci0000:00/0000:00:07.0/0000:1f:00.0/host5/rport-5:0-1/target5:0:1/5:0:1:0/block/sde/holders/dm-6
/sys/devices/pci0000:00/0000:00:09.0/0000:24:00.0/host6/rport-6:0-0/target6:0:0/6:0:0:0/block/sdh/holders/dm-6
/sys/devices/pci0000:00/0000:00:09.0/0000:24:00.0/host6/rport-6:0-1/target6:0:1/6:0:1:0/block/sdk/holders/dm-6
/sys/devices/pci0000:00/0000:00:07.0/0000:1f:00.0/host5/rport-5:0-0/target5:0:0/5:0:0:2/block/sdd/holders/dm-7
/sys/devices/pci0000:00/0000:00:07.0/0000:1f:00.0/host5/rport-5:0-1/target5:0:1/5:0:1:2/block/sdg/holders/dm-7
/sys/devices/pci0000:00/0000:00:09.0/0000:24:00.0/host6/rport-6:0-0/target6:0:0/6:0:0:2/block/sdj/holders/dm-7
/sys/devices/pci0000:00/0000:00:09.0/0000:24:00.0/host6/rport-6:0-1/target6:0:1/6:0:1:2/block/sdm/holders/dm-7
After failure, that same listing looks like:
root@<hostname>:/etc# find /sys/devices -name *dm-* -print | grep -v virtual | sort -t '/' --key=13
/sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/host4/target4:2:0/4:2:0:0/block/sda/sda1/holders/dm-0
/sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/host4/target4:2:0/4:2:0:0/block/sda/sda1/holders/dm-1
/sys/devices/pci0000:00/0000:00:07.0/0000:1f:00.0/host5/rport-5:0-0/target5:0:0/5:0:0:1/block/sdc/holders/dm-5
/sys/devices/pci0000:00/0000:00:07.0/0000:1f:00.0/host5/rport-5:0-1/target5:0:1/5:0:1:1/block/sdf/holders/dm-5
/sys/devices/pci0000:00/0000:00:07.0/0000:1f:00.0/host5/rport-5:0-0/target5:0:0/5:0:0:0/block/sdb/holders/dm-6
/sys/devices/pci0000:00/0000:00:07.0/0000:1f:00.0/host5/rport-5:0-1/target5:0:1/5:0:1:0/block/sde/holders/dm-6
/sys/devices/pci0000:00/0000:00:07.0/0000:1f:00.0/host5/rport-5:0-0/target5:0:0/5:0:0:2/block/sdd/holders/dm-7
/sys/devices/pci0000:00/0000:00:07.0/0000:1f:00.0/host5/rport-5:0-1/target5:0:1/5:0:1:2/block/sdg/holders/dm-7
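As a side note, the dm-to-path grouping in those listings can be pulled out mechanically; here is a quick throwaway parser (a sketch that only assumes the .../block/sdX[/partition]/holders/dm-N shape of the find output above; the sample lines are taken from it):

```python
# Sketch: group the sysfs "holders" listing by dm- device, to see at a
# glance which sdX paths back each one.
import re
from collections import defaultdict

def paths_by_dm(listing: str) -> dict:
    """Map each dm-N name to the set of sdX devices that hold it."""
    groups = defaultdict(set)
    for line in listing.splitlines():
        m = re.search(r'/block/(sd\w+)(?:/\w+)?/holders/(dm-\d+)$', line)
        if m:
            groups[m.group(2)].add(m.group(1))
    return dict(groups)

# Sample lines copied from the before-failure listing above.
sample = """\
/sys/devices/pci0000:00/0000:00:09.0/0000:24:00.0/host6/rport-6:0-1/target6:0:1/6:0:1:0/block/sdk/holders/dm-4
/sys/devices/pci0000:00/0000:00:07.0/0000:1f:00.0/host5/rport-5:0-0/target5:0:0/5:0:0:1/block/sdc/holders/dm-5
/sys/devices/pci0000:00/0000:00:09.0/0000:24:00.0/host6/rport-6:0-0/target6:0:0/6:0:0:1/block/sdi/holders/dm-5"""

for dm, paths in sorted(paths_by_dm(sample).items()):
    print(dm, sorted(paths))   # dm-4 ['sdk'] / dm-5 ['sdc', 'sdi']
```

Running the same thing over the before and after listings makes the disappearance of dm-(2,3,4) obvious at a glance.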
Which I suppose is to be expected - except I'm guessing the dm-(2,3,4) devices shouldn't disappear - they should be remapped to the "new" sdX paths via the physical volumes /dev/mapper/mpath(0,1,2) ... or however that path failover is meant to work in LVM.
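If the device filter really is the culprit, I'd expect something along these lines in lvm.conf (a sketch based on the pvscan output above, i.e. the three mpath PVs plus /dev/sda1; regex syntax per lvm.conf's devices section):

```
# /etc/lvm/lvm.conf -- devices section (sketch)
# Accept the multipath maps and the local sda1 PV, reject everything else,
# so LVM never binds to the underlying sdX paths directly.
filter = [ "a|^/dev/mapper/mpath|", "a|^/dev/sda1$|", "r|.*|" ]
```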