DevLossTO, FastIoFailTO settings do not match multipath.conf expected values

Bug #1435706 reported by bugproxy on 2015-03-24
This bug affects 2 people
Affects                   Importance  Assigned to
multipath-tools (Ubuntu)  Medium      Mathieu Trudel-Lapierre
Trusty                    Medium      Mathieu Trudel-Lapierre
Vivid                     Medium      Mathieu Trudel-Lapierre

Bug Description

[Impact]
This bug impacts multipath users who need to tweak timeout values for DevLoss and FastIoFail for performance reasons.

[Test Case]
On a multipath system, attempt to modify DevLossTO or FastIoFailTO, then verify that the values got applied with 'multipath -l'. See below.
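
As a rough sketch of the verification step, the shell function below compares the dev_loss_tmo and fast_io_fail_tmo files of every FC remote port against expected values. The function name check_fc_timeouts and its optional directory argument are illustrative assumptions, not part of multipath-tools; the sysfs path is the one used later in this report. It also assumes the same expected values for all rports, which will not hold when several device stanzas set different timeouts.

```shell
# Hypothetical helper (not part of multipath-tools): compare the timeouts
# of every FC remote port against the values expected from multipath.conf.
# Usage: check_fc_timeouts DEV_LOSS FAST_IO_FAIL [RPORT_DIR]
check_fc_timeouts() {
    want_dev_loss=$1
    want_fast_io_fail=$2
    # Default to the sysfs class directory; overridable for dry runs.
    rport_dir=${3:-/sys/class/fc_remote_ports}
    rc=0
    for rport in "$rport_dir"/rport-*; do
        [ -d "$rport" ] || continue
        got_dev_loss=$(cat "$rport/dev_loss_tmo")
        got_fast_io_fail=$(cat "$rport/fast_io_fail_tmo")
        if [ "$got_dev_loss" != "$want_dev_loss" ]; then
            echo "$rport: dev_loss_tmo is $got_dev_loss, expected $want_dev_loss"
            rc=1
        fi
        if [ "$got_fast_io_fail" != "$want_fast_io_fail" ]; then
            echo "$rport: fast_io_fail_tmo is $got_fast_io_fail, expected $want_fast_io_fail"
            rc=1
        fi
    done
    return $rc
}
```

Called as `check_fc_timeouts 2048 16` after restarting multipathd, it would return non-zero and list each rport whose values were not applied.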

[Regression Potential]
Users who have already modified these values but have not noticed they did not properly apply may notice a change in behavior on device failure.

---

Problem Description
=========================================
DevLossTO, FastIoFailTO settings do not match multipath.conf expected values

---uname output---
Linux ilp1fc85apA4.tuc.stglabs.ibm.com 3.13.0-24-generic #46-Ubuntu SMP Thu Apr 10 19:09:21 UTC 2014 ppc64le ppc64le ppc64le GNU/Linux

Machine Type = p7 8247

Steps to Reproduce
===================================
 Verify that the DevLossTO and FastIoFailTO settings match the values expected from multipath.conf

== Comment: #31 - Thadeu Lima De Souza Cascardo <email address hidden> - 2015-03-20 10:57:20 ==
OK.

From the point of view of multipathd, everything seems correct, by looking at the logs.

I even parsed syslog and the output of getHBAInfo in order to find inconsistencies, and the inconsistency is between what multipathd logged as configured for a given target, and what its rport reports at getHBAInfo.

So, either multipathd is not configuring the timeouts even though it has the right configuration, or something else is changing those timeouts.

The other problem is that multipathd does not include the dev_loss_tmo configuration for 2145 as can be seen from list config. So, it could be not parsing the configuration correctly, or there could be a problem with the configuration.

At this point, to move forward, I would like to take a look at your system, try a reconfigure, and look at some strace output of multipathd to check for writes into sysfs.

== Comment: #34 - Thadeu Lima De Souza Cascardo <email address hidden> - 2015-03-20 15:56:46 ==
OK, so I investigated in the system and read some of the code and checked changelog.

It looks like Ubuntu is shipping a fairly old version of multipath-tools, which is understandable, since multipath-tools is not very good at making frequent releases, so one needs to either ship a version closer to upstream git or carry one's own large set of patches.

One of the missing patches is the one attached next. Without it, any device included in the built-in hardware table will have some of its attributes from the config file ignored. That is the case with 2145. So, we lose the dev_loss_tmo setting for that device.
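
One way to check whether a stanza in the effective configuration carries a given attribute is to filter the output of multipathd -k'show config' (the command used elsewhere in this report). The sketch below is a hypothetical helper, device_attr, and not part of multipath-tools. It assumes product strings without spaces and unquoted single-word values such as the timeout numbers.

```shell
# Hypothetical helper: read "show config" output on stdin and print the
# value of ATTR from every device stanza whose product equals PRODUCT.
# Assumes the product string contains no spaces.
# Usage: multipathd -k'show config' | device_attr PRODUCT ATTR
device_attr() {
    awk -v product="$1" -v attr="$2" '
        $1 == "device" && $2 == "{" { in_dev = 1; hit = 0; value = ""; next }
        in_dev && $1 == "product"   { p = $2; gsub(/"/, "", p); if (p == product) hit = 1 }
        in_dev && $1 == attr        { value = $2 }
        in_dev && $1 == "}"         { if (hit && value != "") print value; in_dev = 0 }
    '
}
```

An empty result for a product whose stanza in /etc/multipath.conf does set the attribute would match the symptom described here.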

Cascardo.

== Comment: #38 - Thadeu Lima De Souza Cascardo <email address hidden> - 2015-03-20 16:25:39 ==
The bug this patch fixes would explain why fast_io_fail_tmo is not correctly set in some cases, but not dev_loss_tmo. So, probably, there is another missing patch here. I would like to experiment with the two patches I mentioned, however. Let's try to do this on Monday?

Cascardo.


tags: added: architecture-ppc64 bugnameltc-122015 severity-medium targetmilestone-inin---


Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1435706/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
Luciano Chavez (lnx1138) on 2015-03-31
affects: ubuntu → multipath-tools (Ubuntu)

------- Comment From <email address hidden> 2015-04-13 15:04 EDT-------
In order to apply "fix setting timeouts", one needs to apply first:

6888db0777e46ff057de5a48e522a5ac573f6115
Remove sysfs_attr cache

I would not apply the following commit in order to fix the application of "do not ignore some attributes from config file". I would simply remove the reference to minio_rq when resolving the conflict. In any case, here is the patch which introduces minio_rq.

2b68b839565e38d8b73f1ec79cc6c84f7f3bade4
Support different 'minio' values for rq and bio based dm-multipath

Regards.
Cascardo.


bugproxy (bugproxy) on 2015-06-08
tags: added: architecture-ppc64le
removed: architecture-ppc64
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in multipath-tools (Ubuntu):
status: New → Confirmed

------- Comment on attachment From <email address hidden> 2015-06-23 13:14 EDT-------

Hi Canonical,

This patch incorporates the upstream commits in order to apply the fibrechannel dev_loss and fast_io_fail timeout attributes from multipath.conf into sysfs.

It targets the 14.04.x LTS series, but it should apply fine to 14.10 and 15.04 (except for debian/changelog context lines, obviously). No need for 15.10 which should get a multipath-tools upgrade that includes the commits.

Thanks!

------- Comment on attachment From <email address hidden> 2015-06-23 13:18 EDT-------

> This patch incorporates the upstream commits in order to apply the
> fibrechannel dev_loss and fast_io_fail timeout attributes from
> multipath.conf into sysfs.

With the patch applied, the fc timeout attributes correctly propagate through multipath verbose logs, and the sysfs attributes -- verification procedure attached.

Changed in multipath-tools (Ubuntu):
status: Confirmed → In Progress
Changed in multipath-tools (Ubuntu Vivid):
status: New → Confirmed
importance: Undecided → Medium
Changed in multipath-tools (Ubuntu):
importance: Undecided → Medium
assignee: nobody → Mathieu Trudel-Lapierre (mathieu-tl)
Changed in multipath-tools (Ubuntu Vivid):
assignee: nobody → Mathieu Trudel-Lapierre (mathieu-tl)
Changed in multipath-tools (Ubuntu Trusty):
status: New → Confirmed
importance: Undecided → Medium
assignee: nobody → Mathieu Trudel-Lapierre (mathieu-tl)

Digging deeper, looks like the DevLossTO and FastIoFailTO settings fixes are already in 0.5.0; so the changes are in wily.

Moving on to SRU this to vivid, trusty.

Changed in multipath-tools (Ubuntu):
status: In Progress → Fix Released
Changed in multipath-tools (Ubuntu Vivid):
status: Confirmed → In Progress
description: updated

Hello bugproxy, or anyone else affected,

Accepted multipath-tools into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/multipath-tools/0.4.9-3ubuntu7.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in multipath-tools (Ubuntu Trusty):
status: Confirmed → Fix Committed
tags: added: verification-needed
Changed in multipath-tools (Ubuntu Vivid):
status: In Progress → Fix Committed
Chris J Arges (arges) wrote :

Hello bugproxy, or anyone else affected,

Accepted multipath-tools into vivid-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/multipath-tools/0.4.9-3ubuntu12.15.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

bugproxy (bugproxy) on 2015-07-17
tags: added: targetmilestone-inin14043
removed: targetmilestone-inin---
Adam Conrad (adconrad) wrote :

Hello bugproxy, or anyone else affected,

Accepted multipath-tools into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/multipath-tools/0.4.9-3ubuntu7.4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Hi,
I don't have the hardware handy to verify this quickly right now, but I checked the interdiff between what's in trusty-proposed and the debdiff I submitted in comment #8 and verified in comment #9 for the 0020-0023 patches; the only changes are in diff headers and the hunks' @@ function strings.
Looks good. If a report on the real thing is really required, just let me know.

tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package multipath-tools - 0.4.9-3ubuntu7.4

---------------
multipath-tools (0.4.9-3ubuntu7.4) trusty; urgency=medium

  * Remove 0024-ignore-usb.patch: Ignore USB devices. Verification fails
    for this fix; it needs more work.

multipath-tools (0.4.9-3ubuntu7.3) trusty; urgency=medium

  * Added debian/patches/0015-shared-lock-for-udev.patch (LP: #1431650)
  * Support disks with non 512-byte sectors (LP: #1441930)
  * Correctly write FC timeout attributes to sysfs. (LP: #1435706)
  * Ignore USB devices. (LP: #1468897)

 -- Mathieu Trudel-Lapierre <email address hidden> Mon, 27 Jul 2015 13:48:39 -0400

Changed in multipath-tools (Ubuntu Trusty):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for multipath-tools has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Tore Anderson (toreanderson) wrote :

To me, the fix doesn't actually appear to work. After upgrading to multipath-tools 0.4.9-3ubuntu7.4 on an amd64 trusty system and rebooting, the fast_io_fail_tmo and dev_loss_tmo values do not get written to sysfs:

$ grep . /sys/class/fc_remote_ports/*/*_tmo
/sys/class/fc_remote_ports/rport-2:0-0/dev_loss_tmo:30
/sys/class/fc_remote_ports/rport-2:0-0/fast_io_fail_tmo:off
/sys/class/fc_remote_ports/rport-2:0-1/dev_loss_tmo:30
/sys/class/fc_remote_ports/rport-2:0-1/fast_io_fail_tmo:off
/sys/class/fc_remote_ports/rport-3:0-0/dev_loss_tmo:30
/sys/class/fc_remote_ports/rport-3:0-0/fast_io_fail_tmo:off
/sys/class/fc_remote_ports/rport-3:0-1/dev_loss_tmo:30
/sys/class/fc_remote_ports/rport-3:0-1/fast_io_fail_tmo:off

The device stanza from multipath.conf contains the following:

 device {
  vendor "DGC|EMC"
  product "RAID [0-9]*|VRAID|SYMMETRIX.*"
  path_grouping_policy group_by_prio
  getuid_callout "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
  path_selector "round-robin 0"
  path_checker emc_clariion
  features "0"
  hardware_handler "1 emc"
  prio emc
  failback immediate
  rr_weight uniform
  no_path_retry queue
  rr_min_io 100
  fast_io_fail_tmo 3
  dev_loss_tmo 2147483647
 }

FWIW, I can manually set the sysfs settings to the desired values:

$ echo 3 | sudo tee /sys/class/fc_remote_ports/rport-*/fast_io_fail_tmo
3
$ echo 2147483647 | sudo tee /sys/class/fc_remote_ports/rport-*/dev_loss_tmo
2147483647
$ grep . /sys/class/fc_remote_ports/*/*_tmo
/sys/class/fc_remote_ports/rport-2:0-0/dev_loss_tmo:2147483647
/sys/class/fc_remote_ports/rport-2:0-0/fast_io_fail_tmo:3
/sys/class/fc_remote_ports/rport-2:0-1/dev_loss_tmo:2147483647
/sys/class/fc_remote_ports/rport-2:0-1/fast_io_fail_tmo:3
/sys/class/fc_remote_ports/rport-3:0-0/dev_loss_tmo:2147483647
/sys/class/fc_remote_ports/rport-3:0-0/fast_io_fail_tmo:3
/sys/class/fc_remote_ports/rport-3:0-1/dev_loss_tmo:2147483647
/sys/class/fc_remote_ports/rport-3:0-1/fast_io_fail_tmo:3

Tore

Changed in multipath-tools (Ubuntu Trusty):
milestone: none → trusty-updates
Changed in multipath-tools (Ubuntu Vivid):
milestone: none → vivid-updates
Mathew Hodson (mathew-hodson) wrote :

Reopening the Trusty task based on https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1468897/comments/15 and comment #17 from this bug, which suggests this fix isn't complete.

Changed in multipath-tools (Ubuntu Trusty):
status: Fix Released → Triaged
tags: added: verification-needed
removed: verification-done
Tore Anderson (toreanderson) wrote :

I verified that this bug is *NOT* fixed by trying the exact identical configuration (which is as minimal as possible) both with Ubuntu Trusty and with Scientific Linux 6 (RHEL6 clone). The test machine is a Cisco B200M2 blade server, using the Cisco VIC FCoE HBA (fnic.ko driver). The storage array is an EMC VNX5300, which is reached via FCoE (inside the Cisco UCS infrastructure) and then traditional FC fabric.

The following console output is taken with Trusty installed. Note that it was fully upgraded. After creating /etc/multipath.conf with the indicated contents, update-initramfs was run and the system rebooted, just to make sure the settings had taken effect. As you can see from the output, the dev_loss_tmo and fast_io_fail_tmo settings are *NOT* applied:

=-=-=-=-=-=-=-=
tore@ucstest-osl2:~$ cat /etc/multipath.conf
devices {
        device {
                vendor ".*"
                product ".*"
                fast_io_fail_tmo 3
                dev_loss_tmo 2147483647
        }
}

multipaths {
        multipath {
                wwid 3600601603a71320022967e0a1f38e411
                alias bootvolume
        }
}
tore@ucstest-osl2:~$ sudo multipath -ll
bootvolume (3600601603a71320022967e0a1f38e411) dm-0 DGC,VRAID
size=50G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| |- 0:0:1:0 sdb 8:16 active ready running
| `- 1:0:1:0 sdd 8:48 active ready running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 1:0:0:0 sdc 8:32 active ready running
  `- 0:0:0:0 sda 8:0 active ready running
tore@ucstest-osl2:~$ grep . /sys/class/fc_remote_ports/rport-*/*tmo
/sys/class/fc_remote_ports/rport-0:0-0/dev_loss_tmo:30
/sys/class/fc_remote_ports/rport-0:0-0/fast_io_fail_tmo:off
/sys/class/fc_remote_ports/rport-0:0-1/dev_loss_tmo:30
/sys/class/fc_remote_ports/rport-0:0-1/fast_io_fail_tmo:off
/sys/class/fc_remote_ports/rport-0:0-2/dev_loss_tmo:30
/sys/class/fc_remote_ports/rport-0:0-2/fast_io_fail_tmo:off
/sys/class/fc_remote_ports/rport-1:0-0/dev_loss_tmo:30
/sys/class/fc_remote_ports/rport-1:0-0/fast_io_fail_tmo:off
/sys/class/fc_remote_ports/rport-1:0-1/dev_loss_tmo:30
/sys/class/fc_remote_ports/rport-1:0-1/fast_io_fail_tmo:off
/sys/class/fc_remote_ports/rport-1:0-2/dev_loss_tmo:30
/sys/class/fc_remote_ports/rport-1:0-2/fast_io_fail_tmo:off
tore@ucstest-osl2:~$ uname -r
3.13.0-62-generic
tore@ucstest-osl2:~$ md5sum /etc/multipath.conf
27a62898e80a0bcd7e62b5f2e8d675ff /etc/multipath.conf
tore@ucstest-osl2:~$ echo 3 | sudo tee /sys/class/fc_remote_ports/rport-*/fast_io_fail_tmo
3
tore@ucstest-osl2:~$ echo 2147483647 | sudo tee /sys/class/fc_remote_ports/rport-*/dev_loss_tmo
2147483647
tore@ucstest-osl2:~$ grep . /sys/class/fc_remote_ports/rport-*/*tmo
/sys/class/fc_remote_ports/rport-0:0-0/dev_loss_tmo:2147483647
/sys/class/fc_remote_ports/rport-0:0-0/fast_io_fail_tmo:3
/sys/class/fc_remote_ports/rport-0:0-1/dev_loss_tmo:2147483647
/sys/class/fc_remote_ports/rport-0:0-1/fast_io_fail_tmo:3
/sys/class/fc_remote_ports/rport-0:0-2/dev_loss_tmo:2147483647
/sys/class/fc_remote_ports/rport-0:0-2/fast_io_fail_tmo:3
/sys/class/fc_remote_ports/rport-1:0-...


Is it possible that the issue might be something else that is directly related to the devices themselves rather than the behavior in multipath-tools?

I haven't looked very far yet, but maybe we're missing some other commit, too.

tags: removed: verification-needed
Changed in multipath-tools (Ubuntu Vivid):
status: Fix Committed → Triaged
Tore Anderson (toreanderson) wrote :

Ok, so I did some more testing. It appears that the problem isn't specific to the dev_loss_tmo and fast_io_fail_tmo settings. This is evidenced by the terminal log below. In multipath.conf (which we know for certain is being read, as the created multipath map gets the correct alias), I instruct it to use the ALUA hardware handler for all devices. However, for some reason, this is ignored, and the EMC hardware handler is used instead:

=====
root@ucstest-osl2:~# cat /etc/multipath.conf
devices {
        device {
                vendor ".*"
                product ".*"
                hardware_handler "1 alua"
        }
}

multipaths {
        multipath {
                wwid 3600601603a71320022967e0a1f38e411
                alias bootvolume
        }
}
root@ucstest-osl2:~# multipath -v 2
create: bootvolume (3600601603a71320022967e0a1f38e411) undef DGC,VRAID
size=50G features='1 queue_if_no_path' hwhandler='1 emc' wp=undef
|-+- policy='round-robin 0' prio=1 status=undef
| |- 0:0:0:0 sda 8:0 undef ready running
| `- 1:0:1:0 sdd 8:48 undef ready running
`-+- policy='round-robin 0' prio=0 status=undef
  |- 0:0:1:0 sdb 8:16 undef ready running
  `- 1:0:0:0 sdc 8:32 undef ready running
=====

This does *NOT* happen on RHEL-based distros - on those, changing the hardware_handler in multipath.conf in this way works as expected.

So why does it use the EMC hardware_handler? Well, there's a built-in default device section that matches the array in question. So this appears to override my user-specified config from multipath.conf:

=====
root@ucstest-osl2:~# multipathd -k'show config' | grep -B10 -A4 '1 emc'
 device {
  vendor "DGC"
  product ".*"
  product_blacklist "LUNZ"
  path_grouping_policy group_by_prio
  getuid_callout "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
  path_selector round-robin 0
  path_checker emc_clariion
  checker emc_clariion
  features "1 queue_if_no_path"
  hardware_handler "1 emc"
  prio emc
  failback immediate
  no_path_retry 60
 }
=====

If I copy the entire default device config into /etc/multipath.conf and only change the hardware_handler setting, then it starts working:

=====
root@ucstest-osl2:~# cat /etc/multipath.conf
devices {
        device {
                vendor "DGC"
                product ".*"
                product_blacklist "LUNZ"
                path_grouping_policy group_by_prio
                getuid_callout "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
                path_selector "round-robin 0"
                path_checker emc_clariion
                checker emc_clariion
                features "1 queue_if_no_path"
                hardware_handler "1 alua"
                prio emc
                failback immediate
                no_path_retry 60
        }
}

multipaths {
        multipath {
                wwid 3600601603a71320022967e0a1f38e411
                alias bootvolume
        }
}
root@ucstest-osl2:~# multipath -v 2
create: bootvolume (3600601603a71320022967e0a1f38e411) undef DGC,VRAID
size=50G features='1 queue_if_no_path' hwhandler='1 alua' wp=undef
|-+- policy='round-robin 0' prio=1 status=undef
| |- 0:0:0:0 sda 8:0 undef ready running
| `- 1:0:1:0 sdd 8:48 undef ready ru...


Hi,

The re-verification of this shows it's indeed fixed with multipath-tools 0.4.9-3ubuntu7.4.
Details provided.

Software version check:

 # lsb_release -d
 Description: Ubuntu 14.04.3 LTS

 # dpkg -s multipath-tools | grep ^Version:
 Version: 0.4.9-3ubuntu7.4

Set known values to the fast_io_fail_tmo (21) and dev_loss_tmo (42) files,
and activate multipathd to re-set the values according to its configuration.

 # /etc/init.d/multipath-tools stop
  * Stopping multipath daemon multipathd
    ...done.

 # for fastiofail in /sys/class/fc_remote_ports/rport-*/fast_io_fail_tmo; do echo 21 > $fastiofail; done
 # grep -h . /sys/class/fc_remote_ports/rport-*/fast_io_fail_tmo | sort -u
 21

 # for devloss in /sys/class/fc_remote_ports/rport-*/dev_loss_tmo; do echo 42 > $devloss; done
 # grep -h . /sys/class/fc_remote_ports/rport-*/dev_loss_tmo | sort -u
 42

 # /etc/init.d/multipath-tools start
  * Starting multipath daemon multipathd
    ...done.

The storage products used:

 # multipath -l | grep ^mpath | cut -d, -f2 | sort -u
 2107900
 2145
 2810XIV
 FlashSystem-9840

The active configuration for them:

 # multipathd -k'show config' | grep '{\|}\|product\|tmo'
 defaults {
  fast_io_fail_tmo 10
 }
 ...
 devices {
 ...
  device {
          product "2107900"
  }
  device {
          product "2145"
          dev_loss_tmo 120
  }
 ...
  device {
          product "2810XIV"
  }
  device {
          product "FlashSystem-9840"
          fast_io_fail_tmo 25
          dev_loss_tmo 300
  }
 }
 ...

The active configuration in sysfs:

 The sysfs settings (see below) are all set correctly,
 according to the multipath configuration (above).

 On rports to 2107900 or 2810XIV, devloss = 42 (not specified; system) and fastiofail = 10 (defaults).
 On rports to 2145, devloss = 120 (device), fastiofail = 10 (defaults).
 On rports to 9840, devloss = 300 (product) and fastiofail = 25 (product).

 On rports not connected to any of those, the system/known values are left unchanged.
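
 The precedence seen in this run can be written down as a small model. The function below is a hypothetical illustration (effective_tmo is not a multipath-tools command): a value from the matching device stanza wins, then the defaults section, and otherwise the value already in sysfs stays.

```shell
# Hypothetical model of the precedence observed above: a value from the
# matching device stanza wins, then the defaults section; if neither is
# set, multipathd leaves the existing sysfs value alone.
# Usage: effective_tmo DEVICE_VALUE DEFAULTS_VALUE CURRENT_SYSFS_VALUE
effective_tmo() {
    if [ -n "$1" ]; then
        echo "$1"
    elif [ -n "$2" ]; then
        echo "$2"
    else
        echo "$3"
    fi
}

# Values from this verification run (preset sysfs values: 42 and 21).
effective_tmo 120 "" 42    # 2145 dev_loss_tmo
effective_tmo "" 10 21     # 2145 fast_io_fail_tmo
effective_tmo "" "" 42     # 2107900 dev_loss_tmo
effective_tmo 25 10 21     # FlashSystem-9840 fast_io_fail_tmo
```

 The four calls print 120, 10, 42 and 25, which are the values reported for those rports.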

 # grep . /sys/class/fc_remote_ports/rport-*/{dev_loss_tmo,fast_io_fail_tmo,device/target*/*/model} | sort -V
 /sys/class/fc_remote_ports/rport-1:0-0/dev_loss_tmo:42
 /sys/class/fc_remote_ports/rport-1:0-0/fast_io_fail_tmo:21
 /sys/class/fc_remote_ports/rport-1:0-1/dev_loss_tmo:42
 /sys/class/fc_remote_ports/rport-1:0-1/fast_io_fail_tmo:21
 /sys/class/fc_remote_ports/rport-1:0-2/device/target1:0:0/1:0:0:0/model:2810XIV-LUN-0
 /sys/class/fc_remote_ports/rport-1:0-2/device/target1:0:0/1:0:0:1/model:2810XIV
 /sys/class/fc_remote_ports/rport-1:0-2/dev_loss_tmo:42
 /sys/class/fc_remote_ports/rport-1:0-2/fast_io_fail_tmo:10
 /sys/class/fc_remote_ports/rport-1:0-3/device/target1:0:1/1:0:1:0/model:2145
 /sys/class/fc_remote_ports/rport-1:0-3/dev_loss_tmo:120
 /sys/class/fc_remote_ports/rport-1:0-3/fast_io_fail_tmo:10
 /sys/class/fc_remote_ports/rport-1:0-4/device/target1:0:2/1:0:2:0/model:FlashSystem-9840
 /sys/class/fc_remote_ports/rport-1:0-4/dev_loss_tmo:300
 /sys/class/fc_remote_ports/rport-1:0-4/fast_io_fail_tmo:25
 /sys/class/fc_remote_ports/rport-1:0-5/device/target1:0:3/1:0:3:0/model:2107900
 /sys/class/fc_remote_ports/rport-1:0-5/dev_loss_tmo:42
 /sys/class/fc_remote_ports/rport-1:0-5/fas...


------- Comment on attachment From <email address hidden> 2015-06-23 13:14 EDT-------

Hi Canonical,

This patch incorporates the upstream commits in order to apply the fibrechannel dev_loss and fast_io_fail timeout attributes from multipath.conf into sysfs.

It targets the 14.04.x LTS series, but it should apply fine to 14.10 and 15.04 (except for debian/changelog context lines, obviously). No need for 15.10 which should get a multipath-tools upgrade that includes the commits.

Thanks!

------- Comment on attachment From <email address hidden> 2015-06-23 13:18 EDT-------

> This patch incorporates the upstream commits in order to apply the
> fibrechannel dev_loss and fast_io_fail timeout attributes from
> multipath.conf into sysfs.

With the patch applied, the fc timeout attributes correctly propagate through multipath verbose logs, and the sysfs attributes -- verification procedure attached.

Sigh; ignore bugproxy again.
There are no new patches here.

The fix for this bug has been awaiting testing feedback in the -proposed repository for vivid for more than 90 days. Please test this fix and update the bug appropriately with the results. In the event that the fix for this bug is still not verified 15 days from now, the package will be removed from the -proposed repository.

tags: added: removal-candidate

The verification for vivid is no longer relevant as it's EOL by Feb, 2016. Thanks.

Tore Anderson (toreanderson) wrote :

I tested it on Vivid, and it does not work. The dev_loss_tmo and fast_io_fail_tmo sysfs settings do *not* get set. More information on my test environment below:

root@ucstest:~# cat /etc/multipath.conf
defaults {
  fast_io_fail_tmo 8
  dev_loss_tmo 1024
}
devices {
  device {
    vendor "HP.*"
    product "P2000G3.*"
    path_grouping_policy "multibus"
    fast_io_fail_tmo 16
    dev_loss_tmo 2048
  }
}
root@ucstest:~# multipath -ll
3600c0ff0001204a9d12b755101000000 dm-0 HP ,P2000G3 FC/iSCSI
size=30G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 1:0:0:1 sdb 8:16 active ready running
  |- 1:0:1:1 sdc 8:32 active ready running
  |- 2:0:0:1 sdd 8:48 active ready running
  `- 2:0:1:1 sde 8:64 active ready running

I know for a fact that the device{} section is being applied, because if I remove the path_grouping_policy keyword and restart multipathd, the topology changes to one path per group:

root@ucstest:~# sed -i 's/path_grouping/#path_grouping/' /etc/multipath.conf
root@ucstest:~# systemctl restart multipath-tools.service
root@ucstest:~# multipath -ll
3600c0ff0001204a9d12b755101000000 dm-0 HP ,P2000G3 FC/iSCSI
size=30G features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| `- 1:0:0:1 sdb 8:16 active ready running
|-+- policy='round-robin 0' prio=1 status=enabled
| `- 1:0:1:1 sdc 8:32 active ready running
|-+- policy='round-robin 0' prio=1 status=enabled
| `- 2:0:0:1 sdd 8:48 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 2:0:1:1 sde 8:64 active ready running

After reverting that change and restarting again, I can confirm that my config file timeout settings are being read by multipathd:

root@ucstest:~# multipathd -k'show config' | grep -B5 -A1 dev_loss_tmo
defaults {
 verbosity 2
 wwids_file /etc/multipath/wwids
 fast_io_fail_tmo 8
 dev_loss_tmo 1024
}
--
 device {
  vendor "HP.*"
  product "P2000G3.*"
  path_grouping_policy multibus
  fast_io_fail_tmo 16
  dev_loss_tmo 2048
 }

However, they are *not* being applied to sysfs:

root@ucstest:~# grep . /sys/class/fc_remote_ports/rport-*/*tmo
/sys/class/fc_remote_ports/rport-1:0-0/dev_loss_tmo:30
/sys/class/fc_remote_ports/rport-1:0-0/fast_io_fail_tmo:off
/sys/class/fc_remote_ports/rport-1:0-1/dev_loss_tmo:30
/sys/class/fc_remote_ports/rport-1:0-1/fast_io_fail_tmo:off
/sys/class/fc_remote_ports/rport-1:0-2/dev_loss_tmo:30
/sys/class/fc_remote_ports/rport-1:0-2/fast_io_fail_tmo:off
/sys/class/fc_remote_ports/rport-2:0-0/dev_loss_tmo:30
/sys/class/fc_remote_ports/rport-2:0-0/fast_io_fail_tmo:off
/sys/class/fc_remote_ports/rport-2:0-1/dev_loss_tmo:30
/sys/class/fc_remote_ports/rport-2:0-1/fast_io_fail_tmo:off
/sys/class/fc_remote_ports/rport-2:0-2/dev_loss_tmo:30
/sys/class/fc_remote_ports/rport-2:0-2/fast_io_fail_tmo:off

Versions:

root@ucstest:~# dpkg -l kpartx multipath-tools
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
++...


Tore Anderson (toreanderson) wrote :

Okay, sorry about the irrelevant verification on Vivid then. But I'd like to point out that Trusty behaves exactly the same, i.e., the bug is *not* fixed. Using the exact same multipath.conf as I mentioned in comment #31 with multipath-tools 0.4.9-3ubuntu7.9, I get the exact same behaviour. That is, it is apparent that multipathd does read the settings from the config file (as they're visible in the output from "multipathd -k'show config'"), but they're not being applied/written to sysfs.

If I use the command line utility in verbose mode to create the map, it does claim that it opens the sysfs files in question, but strace shows no sign of that actually happening:

root@ucstest:~# /etc/init.d/multipath-tools stop
 * Stopping multipath daemon multipathd [ OK ]
root@ucstest:~# multipath -F
root@ucstest:~# strace -ff -eopen multipath -v4 |& egrep 'create:|_tmo'
Mar 08 07:48:38 | 3600c0ff0001204a9d12b755101000000: fast_io_fail_tmo = 16 (controller default)
Mar 08 07:48:38 | 3600c0ff0001204a9d12b755101000000: dev_loss_tmo = 2048 (controller default)
Mar 08 07:48:38 | open '/sys/devices/pci0000:00/0000:00:02.0/0000:02:00.0/0000:03:00.0/0000:04:00.0/0000:05:01.0/0000:07:00.0/host1/rport-1:0-1/fc_remote_ports/rport-1:0-1'/'dev_loss_tmo'
ort-1:0-1'/'fast_io_fail_tmo'
Mar 08 07:48:38 | open '/sys/devices/pci0000:00/0000:00:02.0/0000:02:00.0/0000:03:00.0/0000:04:00.0/0000:05:01.0/0000:07:00.0/host1/rport-1:0-2/fc_remote_ports/rport-1:0-2'/'dev_loss_tmo'
Mar 08 07:48:38 | open '/sys/devices/pci0000:00/0000:00:02.0/0000:02:00.0/0000:03:00.0/0000:04:00.0/0000:05:01.0/0000:07:00.0/host1/rport-1:0-2/fc_remote_ports/rport-1:0-2'/'fast_io_fail_tmo'
Mar 08 07:48:38 | open '/sys/devices/pci0000:00/0000:00:02.0/0000:02:00.0/0000:03:00.0/0000:04:00.0/0000:05:02.0/0000:08:00.0/host2/rport-2:0-1/fc_remote_ports/rport-2:0-1'/'dev_loss_tmo'
Mar 08 07:48:38 | open '/sys/devices/pci0000:00/0000:00:02.0/0000:02:00.0/0000:03:00.0/0000:04:00.0/0000:05:02.0/0000:08:00.0/host2/rport-2:0-1/fc_remote_ports/rport-2:0-1'/'fast_io_fail_tmo'
Mar 08 07:48:38 | open '/sys/devices/pci0000:00/0000:00:02.0/0000:02:00.0/0000:03:00.0/0000:04:00.0/0000:05:02.0/0000:08:00.0/host2/rport-2:0-2/fc_remote_ports/rport-2:0-2'/'dev_loss_tmo'
Mar 08 07:48:38 | open '/sys/devices/pci0000:00/0000:00:02.0/0000:02:00.0/0000:03:00.0/0000:04:00.0/0000:05:02.0/0000:08:00.0/host2/rport-2:0-2/fc_remote_ports/rport-2:0-2'/'fast_io_fail_tmo'
create: 3600c0ff0001204a9d12b755101000000 undef HP ,P2000G3 FC/iSCSI
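
The lines above that start with "open" are multipath's own verbose messages, not system calls. To keep the two apart, one option is to have strace write to its own file with -o and then count the real open()/openat() calls on *_tmo files. The helper name count_tmo_opens and the trace file path are illustrative assumptions.

```shell
# Hypothetical helper: count open()/openat() system calls on *_tmo files
# in an strace output file. Lines may carry a leading PID (strace -f -o).
count_tmo_opens() {
    grep -Ec '^([0-9]+ +)?open(at)?\(.*_tmo' "$1" || true
}

# Intended use (needs root and a multipath setup, so left commented out):
#   strace -f -e trace=open,openat -o /tmp/multipath.trace multipath -v4
#   count_tmo_opens /tmp/multipath.trace
```

A count of 0 while the verbose log claims to open the attributes would be consistent with the behaviour reported here.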

Note that this is a lab system, so if you'd like, you can have a look yourself, Mathieu. Just send me a SSH pubkey on IRC and I'll set up a user account for you.

Tore

Tore Anderson (toreanderson) wrote :

Ok, so I found the bug. The problematic code is in sysfs_attr_set_value() in libmultipath/sysfs.c:

        devpath = udev_device_get_syspath(dev);
        condlog(4, "open '%s'/'%s'", devpath, attr_name);
        if (stat(devpath, &statbuf) != 0) {
                condlog(4, "stat '%s' failed: %s", devpath, strerror(errno));
                return 0;
        }

        /* skip directories */
        if (S_ISDIR(statbuf.st_mode))
                return 0;

The problem here is that stat() gets called on the containing directory in devpath (as opposed to devpath+attr_name). The code then checks whether that is a directory (which it obviously always is) and returns without having done anything. The rest of the function also seems to assume that "devpath" contains the full path to the sysfs attribute as opposed to the containing directory.

How the verification in comment #22 could have found this code to be working is beyond me, as the only place where the attr_name variable is actively being used for anything in the function is in the condlog() call.
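
The effect of the misplaced check can be modelled outside of libmultipath. The shell sketch below is only an analogy, not the upstream patch: write_attr_buggy tests the containing directory and therefore always bails out, while write_attr_checked tests the attribute file itself. Both function names are hypothetical.

```shell
# Model of the logic described above: the check runs on the directory,
# which is always a directory, so the function returns before writing.
write_attr_buggy() {
    devpath=$1 attr_name=$2 value=$3
    [ -e "$devpath" ] || return 0
    if [ -d "$devpath" ]; then   # always true for a sysfs device directory
        return 0
    fi
    printf '%s' "$value" > "$devpath"
}

# Same steps, but on the full path of the attribute (devpath + attr_name).
write_attr_checked() {
    devpath=$1 attr_name=$2 value=$3
    path=$devpath/$attr_name
    [ -e "$path" ] || return 0
    if [ -d "$path" ]; then      # skip directories
        return 0
    fi
    printf '%s' "$value" > "$path"
}
```

Run against a scratch directory containing a dev_loss_tmo file, the first function leaves the file untouched and the second one updates it.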

It appears this got fixed upstream by http://git.opensvc.com/gitweb.cgi?p=multipath-tools/.git;a=commit;h=050b24b33d3c60e29f7820d2fb75e84a9edde528 . This patch applies fine to the multipath-tools 0.4.9-3ubuntu7.9 sources from trusty (with --fuzz=3), and I can confirm that it fixes the problem for me: the sysfs timeout attributes get set correctly when the map is being created (both when using multipathd and the multipath tool).

Tore

Hi @toreanderson,

The "no longer relevant" statement was not aimed at you :) I was just explaining why I didn't verify on vivid.

I've seen some reports that this functionality is no longer working on Trusty too.

The version of multipath-tools in the proposed pocket of Vivid that was purported to fix this bug report has been removed because the bugs that were to be fixed by the upload were not verified in a timely (105 days) fashion.

Changed in multipath-tools (Ubuntu Vivid):
status: Triaged → Won't Fix