multipath-tools from Precise should have been fixed together with Trusty fixes

Bug #1520192 reported by Rafael David Tinoco
This bug affects 1 person
Affects: multipath-tools (Ubuntu), Precise
Status: Won't Fix
Importance: High
Assigned to: Unassigned
Milestone: none listed

Bug Description

[Impact]

 * Multipath in Precise has problems with path failover.
 * This can lead to bad I/O if, for example, OpenStack relies on VNX storage.
 * A "failed/online" path status is NEVER good, and I/O corruption can happen.
 * At minimum, a guest can freeze if it relies on the path in question.
 * Other applications might suffer from the same problem.

[Test Case]

 * Described fully in comment #3 (comments #1 and #2 are related)

[Regression Potential]

 * These patches are the exact changes made between Precise and Trusty
   development. No major changes or version bumps were made.
 * A PPA with a hotfix was created to test those changes, and it
   proved to fix the initial problem.

[Other Info]

Original BUG Description:

Precise multipath-tools MIGHT need fixes from Trusty. This has already been shown in one iSCSI multipath installation where Precise multipath, connected to VNX storage, intermittently shows paths as active/failed when it should show faulty/failed, even after the path checker timeout.

* Improve description showing output *

Using trusty multipath in Precise, the same environment does NOT suffer
from this issue.

Differences between both versions:

#### LP: #1468897 - https://bugs.launchpad.net/bugs/1468897
#### LP: #1386637 - https://bugs.launchpad.net/bugs/1386637

- 0001-multipath-add-checker_timeout-default-config-option.patch
- 0002-Make-params-variable-local.patch
- 0003-libmultipath-Fix-possible-string-overflow.patch
- 0004-Update-hwtable-factorization.patch
- 0005-Fixup-strip-trailing-whitespaces-for-getuid-return-v.patch
- 0006-Remove-sysfs_attr-cache.patch
- 0007-Move-setup_thread_attr-to-uevent.c.patch
- 0008-Use-lists-for-uevent-processing.patch
- 0009-Start-uevent-service-handler-from-main-thread.patch
- 0010-libmultipath-rework-sysfs-handling.patch
- 0011-Rework-sysfs-device-handling-in-multipathd.patch
- 0012-Only-check-offline-status-for-SCSI-devices.patch
- 0013-Check-for-offline-path-in-get_prio.patch
- 0014-libmultipath-Remove-duplicate-calls-to-path_offline.patch
- 0015-Update-dev_loss_tmo-for-no_path_retry.patch
- 0016-Reload-map-for-device-read-only-setting-changes.patch
- 0017-multipath-get-right-sysfs-value-for-checker_timeout.patch
- 0018-multipath-handle-offlined-paths.patch
- 0019-multipath-fix-scsi-timeout-code.patch
- 0020-multipath-make-tgt_node_name-work-for-iscsi-devices.patch
- 0021-multipath-cleanup-dev_loss_tmo-issues.patch
- 0022-Fix-for-setting-0-to-fast_io_fail.patch
- 0023-Fix-fast_io_fail-capping.patch
- 0024-multipath-enable-getting-uevents-through-libudev.patch
- 0025-Use-devpath-as-argument-for-sysfs-functions.patch
- 0026-multipathd-remove-references-to-sysfs_device.patch
- 0027-multipathd-use-struct-path-as-argument-for-event-pro.patch
- 0028-Add-global-udev-reference-pointer-to-config.patch
- 0029-Use-udev-enumeration-during-discovery.patch
- 0030-use-struct-udev_device-during-discovery.patch
- 0031-More-debugging-output-when-synchronizing-path-states.patch
- 0032-Use-struct-udev_device-instead-of-sysdev.patch
- 0033-discovery-Fixup-cciss-discovery.patch
- 0035-Use-udev-devices-during-discovery.patch
- 0036-Remove-all-references-to-hand-craftes-sysfs-code.patch
- 0037-multipath-libudev-cleanup-and-bugfixes.patch
- 0038-multipath-check-if-a-device-belongs-to-multipath.patch
- 0039-multipath-and-wwids_file-multipath.conf-option.patch
- 0040-multipath-Check-blacklists-as-soon-as-possible.patch
- 0041-add-wwids-file-cleanup-options.patch
- 0042-add-find_multipaths-option.patch

#### LP: #1431650 - https://bugs.launchpad.net/bugs/1431650

- Added debian/patches/0015-shared-lock-for-udev.patch

#### LP: #1441930 - https://bugs.launchpad.net/bugs/1441930

- Support disks with non 512-byte sectors

#### LP: #1435706 - https://bugs.launchpad.net/bugs/1435706 ( GOOD CANDIDATE )

- Correctly write FC timeout attributes to sysfs.

Tags: sts
Changed in multipath-tools (Ubuntu):
status: New → In Progress
assignee: nobody → Rafael David Tinoco (inaddy)
description: updated
summary: - Precise multipath-tools from precise should have been fixed together
- with Trusty fixes
+ multipath-tools from precise should have been fixed together with Trusty
+ fixes
summary: - multipath-tools from precise should have been fixed together with Trusty
+ multipath-tools from Precise should have been fixed together with Trusty
fixes
description: updated
Gábor Mészáros (gabor.meszaros) wrote :
Gábor Mészáros (gabor.meszaros) wrote :

With the backported package the problem cannot be reproduced.

Gábor Mészáros (gabor.meszaros) wrote :

Steps taken to provoke the problem:
1.) Deploy any VM instance using nova commands, booted from an EMC VNX series SAN that is connected to the compute node using iSCSI and dm-multipath.
2.) Unplug/disable network connectivity on any of the active paths.
3.) Check the multipath -ll output for a faulty/failed path/device status.
4.) Re-plug/enable network connectivity on the disabled paths.
5.) Check the multipath -ll output for recovery to an active/enabled path/device status.

Sometimes the problem occurs when disabling the interface, other times only when re-enabling it.

Expected output for multipath -ll:
36006016047813400dd029f614896e511 dm-3 DGC ,VRAID
size=50G features='0' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=70 status=active
  |- 6:0:0:15 sdi 8:128 active ready running
  |- 7:0:0:15 sdk 8:160 active ready running
  |- 8:0:0:15 sdm 8:192 active ready running
  `- 9:0:0:15 sdo 8:224 active ready running

Actual output (even after several hours of waiting, with active traffic on storage):
36006016047813400dd029f614896e511 dm-3 DGC,VRAID
size=50G features='0' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=70 status=active
  |- 6:0:0:159 sdi 8:128 active ready running
  |- 7:0:0:159 sdk 8:160 failed ready running
  |- 8:0:0:159 sdm 8:192 active ready running
  `- 9:0:0:159 sdo 8:224 failed ready running

A previously discovered workaround is to reload (or restart) the multipath-tools service; multipath -r does not always fix the problem.
The package with the backported changes is confirmed to fix the issue without having to reload the service.
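
The inconsistent state in the actual output above (dm state "failed" while the path checker still reports "ready") can be spotted automatically. The following is only a minimal sketch in Python, assuming path lines always have the column layout shown in this comment; the function name find_inconsistent_paths and the regular expression are illustrative and are not part of multipath-tools.

```python
import re

# Assumed path line layout, taken from the output pasted in this comment:
#   "  |- 7:0:0:159 sdk 8:160 failed ready running"
# i.e. H:C:T:L, device name, major:minor, dm state, checker state, online state.
PATH_LINE = re.compile(
    r"(?P<hctl>\d+:\d+:\d+:\d+)\s+(?P<dev>\S+)\s+(?P<majmin>\d+:\d+)\s+"
    r"(?P<dm_state>\S+)\s+(?P<checker>\S+)\s+(?P<online>\S+)\s*$"
)


def find_inconsistent_paths(multipath_ll_output):
    """Return (device, hctl) pairs whose dm state is 'failed' although
    the path checker reports 'ready'."""
    bad = []
    for line in multipath_ll_output.splitlines():
        match = PATH_LINE.search(line)
        if not match:
            continue  # map header, size/features line or path group line
        if match.group("dm_state") == "failed" and match.group("checker") == "ready":
            bad.append((match.group("dev"), match.group("hctl")))
    return bad


# The "actual output" from this comment, used as sample input.
SAMPLE = """36006016047813400dd029f614896e511 dm-3 DGC,VRAID
size=50G features='0' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=70 status=active
  |- 6:0:0:159 sdi 8:128 active ready running
  |- 7:0:0:159 sdk 8:160 failed ready running
  |- 8:0:0:159 sdm 8:192 active ready running
  `- 9:0:0:159 sdo 8:224 failed ready running
"""

if __name__ == "__main__":
    for dev, hctl in find_inconsistent_paths(SAMPLE):
        print("inconsistent path: %s (%s)" % (dev, hctl))
```

To check a live system, the text printed by multipath -ll (run as root) can be passed to the function instead of SAMPLE.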

Dave Chiluk (chiluk)
Changed in multipath-tools (Ubuntu):
importance: Undecided → High
Rafael David Tinoco (rafaeldtinoco) wrote :

This is the debdiff backporting the changes from Trusty to Precise. These changes basically fix the behaviour described in the comments above. No major (or even minor) version was changed, so this is, IMHO, suitable for an SRU. It has also been tested, using the PPA with the hotfix provided, and proved to work.

Thank you in advance

Rafael Tinoco

description: updated
Louis Bouchard (louis)
Changed in multipath-tools (Ubuntu Precise):
status: New → In Progress
importance: Undecided → High
assignee: nobody → Rafael David Tinoco (inaddy)
Changed in multipath-tools (Ubuntu):
status: In Progress → Invalid
assignee: Rafael David Tinoco (inaddy) → nobody
Rafael David Tinoco (rafaeldtinoco) wrote :

Please hold this fix since I just discovered the following BUG:

https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1532789

I'll have to add the fix for the bug above together with this one.

I will attach files soon.

Rafael David Tinoco (rafaeldtinoco) wrote :

Adding the fix for LP: #1532789 together with this SRU (so this is all done just once).

Rafael David Tinoco (rafaeldtinoco) wrote :

Okay, this is ready to be sponsored now. I have added the fix for LP: #1532789 as well.

Mathieu,

Could you do these two fixes at once?

Thank you

Rafael

tags: added: sts
Mathieu Trudel-Lapierre (cyphermox) wrote :

Aye. Removing sponsors, since I'll take care of it.

Rafael David Tinoco (rafaeldtinoco) wrote :

Mathieu,

Please take a look at the following bug:

https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1535898

It looks like the Trusty debian/patches are suffering from a regression. This might have to be fixed together with this bug.

Mathew Hodson (mhodson)
no longer affects: multipath-tools (Ubuntu)
Changed in multipath-tools (Ubuntu Precise):
assignee: Rafael David Tinoco (inaddy) → nobody
Louis Bouchard (louis)
Changed in multipath-tools (Ubuntu Precise):
assignee: nobody → Louis Bouchard (louis-bouchard)
Louis Bouchard (louis)
Changed in multipath-tools (Ubuntu Precise):
assignee: Louis Bouchard (louis) → nobody
Steve Langasek (vorlon) wrote :

The Precise Pangolin has reached end of life, so this bug will not be fixed for that release.

Changed in multipath-tools (Ubuntu Precise):
status: In Progress → Won't Fix