Ok, so I did some more testing. It appears that the problem isn't specific to the dev_loss_tmo and fast_io_fail_tmo settings, as the terminal log below shows. In multipath.conf (which we know for certain is being read, as the created multipath map gets the correct alias), I instruct it to use the ALUA hardware handler for all devices. However, for some reason, this is ignored, and the EMC hardware handler is used instead:
=====
root@ucstest-osl2:~# cat /etc/multipath.conf
devices {
    device {
        vendor ".*"
        product ".*"
        hardware_handler "1 alua"
    }
}
multipaths {
    multipath {
        wwid 3600601603a71320022967e0a1f38e411
        alias bootvolume
    }
}
root@ucstest-osl2:~# multipath -v 2
create: bootvolume (3600601603a71320022967e0a1f38e411) undef DGC,VRAID
size=50G features='1 queue_if_no_path' hwhandler='1 emc' wp=undef
|-+- policy='round-robin 0' prio=1 status=undef
| |- 0:0:0:0 sda 8:0 undef ready running
| `- 1:0:1:0 sdd 8:48 undef ready running
`-+- policy='round-robin 0' prio=0 status=undef
  |- 0:0:1:0 sdb 8:16 undef ready running
  `- 1:0:0:0 sdc 8:32 undef ready running
=====
This does *NOT* happen on RHEL-based distros - on those, changing the hardware_handler in multipath.conf in this way works as expected.
So why does it use the EMC hardware_handler? Well, there's a built-in default device section that matches the array in question. So this appears to override my user-specified config from multipath.conf:
=====
root@ucstest-osl2:~# multipathd -k'show config' | grep -B10 -A4 '1 emc'
device {
    vendor "DGC"
    product ".*"
    product_blacklist "LUNZ"
    path_grouping_policy group_by_prio
    getuid_callout "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
    path_selector round-robin 0
    path_checker emc_clariion
    checker emc_clariion
    features "1 queue_if_no_path"
    hardware_handler "1 emc"
    prio emc
    failback immediate
    no_path_retry 60
}
=====
If I copy the entire default device config into /etc/multipath.conf and only change the hardware_handler setting, then it starts working:
=====
root@ucstest-osl2:~# cat /etc/multipath.conf
devices {
    device {
        vendor "DGC"
        product ".*"
        product_blacklist "LUNZ"
        path_grouping_policy group_by_prio
        getuid_callout "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
        path_selector "round-robin 0"
        path_checker emc_clariion
        checker emc_clariion
        features "1 queue_if_no_path"
        hardware_handler "1 alua"
        prio emc
        failback immediate
        no_path_retry 60
    }
}
multipaths {
    multipath {
        wwid 3600601603a71320022967e0a1f38e411
        alias bootvolume
    }
}
root@ucstest-osl2:~# multipath -v 2
create: bootvolume (3600601603a71320022967e0a1f38e411) undef DGC,VRAID
size=50G features='1 queue_if_no_path' hwhandler='1 alua' wp=undef
|-+- policy='round-robin 0' prio=1 status=undef
| |- 0:0:0:0 sda 8:0 undef ready running
| `- 1:0:1:0 sdd 8:48 undef ready running
`-+- policy='round-robin 0' prio=0 status=undef
  |- 0:0:1:0 sdb 8:16 undef ready running
  `- 1:0:0:0 sdc 8:32 undef ready running
=====
It would appear that, for some reason, overriding the default device settings in Ubuntu requires an *exact* string match between the user-supplied «vendor» and «product» settings and those of the built-in device entry. If I change e.g. «product» in multipath.conf to ".*.*" (a regex that matches exactly the same strings as ".*"), then it starts using the built-in defaults again, ignoring multipath.conf. I consider this behaviour very dangerous. Suppose the admin has a working config (due to exactly matching vendor/product settings), and then the package gets updated and extends the built-in defaults to incorporate some new model matching the same profile/settings. At this point the built-in vendor/product strings no longer equal the admin's, so the admin's working config will stop being used, possibly causing disruptive problems. I therefore strongly suggest you figure out why it behaves differently in Ubuntu and RHEL, and adopt the RHEL behaviour, which really is the only sensible one.
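To make the observed behaviour concrete, here is a small Python model of it. This is only a sketch of what the logs above suggest, not the actual multipath-tools code: the function names `merge_exact` and `lookup`, the reduced device table, the use of `re.search` for matching, and the `NEWMODEL` product pattern are all assumptions made for illustration.

```python
import re

# Heavily simplified stand-in for the built-in device table (assumption:
# only the fields needed for this illustration are modelled).
BUILTIN = [{"vendor": "DGC", "product": ".*", "hardware_handler": "1 emc"}]


def merge_exact(builtin, user):
    """Model of the behaviour observed on Ubuntu: a user entry is merged into
    a built-in entry only if the vendor/product *strings* are identical;
    otherwise it is appended after the built-in entries."""
    table = [dict(entry) for entry in builtin]
    for u in user:
        for entry in table:
            if entry["vendor"] == u["vendor"] and entry["product"] == u["product"]:
                entry.update(u)
                break
        else:
            table.append(dict(u))
    return table


def lookup(table, vendor, product):
    """Return the first entry whose vendor/product regexes match the device."""
    for entry in table:
        if re.search(entry["vendor"], vendor) and re.search(entry["product"], product):
            return entry
    return None


if __name__ == "__main__":
    for product_pattern in (".*", ".*.*"):
        user = [{"vendor": "DGC", "product": product_pattern,
                 "hardware_handler": "1 alua"}]
        chosen = lookup(merge_exact(BUILTIN, user), "DGC", "VRAID")
        print(product_pattern, "->", chosen["hardware_handler"])
```

In this model, a user entry with product ".*" is merged and wins, while ".*.*" is appended behind the built-in entry and never reached, which is the pattern seen in the logs. The same happens if the built-in product string is changed to something like "(VRAID|NEWMODEL)" (a made-up pattern): the previously working user entry silently stops being used.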
In any case, now that I know how to ensure my multipath.conf settings are being used, I re-tried adding dev_loss_tmo and fast_io_fail_tmo, but it still doesn't work:
=====
root@ucstest-osl2:~# cat /etc/multipath.conf
devices {
    device {
        vendor "DGC"
        product ".*"
        product_blacklist "LUNZ"
        path_grouping_policy group_by_prio
        getuid_callout "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
        path_selector "round-robin 0"
        path_checker emc_clariion
        checker emc_clariion
        features "1 queue_if_no_path"
        hardware_handler "1 alua"
        prio emc
        failback immediate
        no_path_retry 60
        fast_io_fail_tmo 3
        dev_loss_tmo 2147483647
    }
}
multipaths {
    multipath {
        wwid 3600601603a71320022967e0a1f38e411
        alias bootvolume
    }
}
root@ucstest-osl2:~# multipath -v 2
Aug 29 10:39:57 | bootvolume failed to set /class/fc_remote_ports/rport-0:0-1/dev_loss_tmo
create: bootvolume (3600601603a71320022967e0a1f38e411) undef DGC,VRAID
size=50G features='1 queue_if_no_path' hwhandler='1 alua' wp=undef
|-+- policy='round-robin 0' prio=1 status=undef
| |- 0:0:0:0 sda 8:0 undef ready running
| `- 1:0:1:0 sdd 8:48 undef ready running
`-+- policy='round-robin 0' prio=0 status=undef
  |- 0:0:1:0 sdb 8:16 undef ready running
  `- 1:0:0:0 sdc 8:32 undef ready running
root@ucstest-osl2:~# grep . /sys/class/fc_remote_ports/rport-*/*tmo
/sys/class/fc_remote_ports/rport-0:0-0/dev_loss_tmo:30
/sys/class/fc_remote_ports/rport-0:0-0/fast_io_fail_tmo:off
/sys/class/fc_remote_ports/rport-0:0-1/dev_loss_tmo:30
/sys/class/fc_remote_ports/rport-0:0-1/fast_io_fail_tmo:off
/sys/class/fc_remote_ports/rport-0:0-2/dev_loss_tmo:30
/sys/class/fc_remote_ports/rport-0:0-2/fast_io_fail_tmo:off
/sys/class/fc_remote_ports/rport-1:0-0/dev_loss_tmo:30
/sys/class/fc_remote_ports/rport-1:0-0/fast_io_fail_tmo:off
/sys/class/fc_remote_ports/rport-1:0-1/dev_loss_tmo:30
/sys/class/fc_remote_ports/rport-1:0-1/fast_io_fail_tmo:off
/sys/class/fc_remote_ports/rport-1:0-2/dev_loss_tmo:30
/sys/class/fc_remote_ports/rport-1:0-2/fast_io_fail_tmo:off
=====
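For anyone who wants to repeat the sysfs check without eyeballing grep output, it could be scripted roughly as follows. This is a minimal sketch: the helper names `read_rport_timeouts` and `mismatches` are made up, the `root` parameter exists only so the code can be pointed at a test directory, and the comparison is a plain string comparison, so it assumes the kernel reports a value in the same form as it was written in multipath.conf (which may not hold for every value).

```python
from pathlib import Path

TMO_ATTRS = ("dev_loss_tmo", "fast_io_fail_tmo")


def read_rport_timeouts(root="/sys/class/fc_remote_ports"):
    """Return {rport name: {attribute: value}} for the two *_tmo attributes
    of every rport-* directory found under root."""
    result = {}
    for rport in sorted(Path(root).glob("rport-*")):
        attrs = {}
        for name in TMO_ATTRS:
            attr_file = rport / name
            if attr_file.is_file():
                attrs[name] = attr_file.read_text().strip()
        result[rport.name] = attrs
    return result


def mismatches(actual, expected):
    """List (rport, attribute, found, wanted) for every attribute whose sysfs
    value differs, as a string, from the expected value."""
    found = []
    for rport, attrs in actual.items():
        for name, want in expected.items():
            got = attrs.get(name)
            if got != str(want):
                found.append((rport, name, got, str(want)))
    return found


if __name__ == "__main__":
    wanted = {"fast_io_fail_tmo": 3, "dev_loss_tmo": 2147483647}
    for rport, name, got, want in mismatches(read_rport_timeouts(), wanted):
        print(f"{rport}: {name} is {got!r}, expected {want!r}")
```

With the values shown in the log above (30 and off on every rport), this would report a mismatch for both attributes on each rport.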
The *_tmo settings were read and understood by the config file parser, as I can see them occur in the output from «multipathd -k'show config'». It is also clear that they are recognised as supported options, because if I add another «foo» option with the value of «bar» right below them, that one does *not* show up in «multipathd -k'show config'» - so it's clear the config parser doesn't just blindly read in any settings it encounters.
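The same "does the option show up in the parsed config" check can be automated. The sketch below assumes the text has the shape shown in the «show config» excerpt earlier (a `device {` line, one `option value` pair per line, and a closing `}`); the function name `parse_device_blocks` is hypothetical, and fetching the text from multipathd is left to the caller.

```python
def parse_device_blocks(text):
    """Split 'show config'-style text into one {option: value} dict per
    'device { ... }' block. Surrounding quotes are stripped from values."""
    blocks = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("device {"):
            current = {}
        elif line == "}" and current is not None:
            blocks.append(current)
            current = None
        elif current is not None and line:
            key, _, value = line.partition(" ")
            current[key] = value.strip().strip('"')
    return blocks


if __name__ == "__main__":
    sample = """
    devices {
        device {
            vendor "DGC"
            product ".*"
            hardware_handler "1 alua"
            fast_io_fail_tmo 3
            dev_loss_tmo 2147483647
        }
    }
    """
    for block in parse_device_blocks(sample):
        for option in ("fast_io_fail_tmo", "dev_loss_tmo", "foo"):
            print(option, "->", block.get(option, "<not present>"))
```

An option the parser does not recognise, such as «foo», would simply be absent from the parsed block, matching what I saw by hand.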
So the *_tmo settings are parsed, but they are clearly not being applied to the remote ports. In any case, if you need it, I'd be happy to give you access to this test machine so you can see for yourself, Mathieu. Find me on the NetworkManager IRC channel if you're interested in that.
Tore