Comment 0 for bug 1535898

Rafael David Tinoco (rafaeldtinoco) wrote :

We have a problem in multipath-tools. To investigate it, I created 2 hosts:

iscsi-server
iscsi-client

The two hosts are connected through 4 NICs and use a simple multibus multipath setup.

With this setup I was able to confirm that there is a regression in multipath-tools.

It looks like it comes from the patches brought in from upstream:

0017-multipath-get-right-sysfs-value-for-checker_timeout.patch
0018-multipath-handle-offlined-paths.patch
#
# from here
#
0019-multipath-fix-scsi-timeout-code.patch
0020-multipath-make-tgt_node_name-work-for-iscsi-devices.patch
0021-multipath-cleanup-dev_loss_tmo-issues.patch
0022-Fix-for-setting-0-to-fast_io_fail.patch
0023-Fix-fast_io_fail-capping.patch
0024-multipath-enable-getting-uevents-through-libudev.patch
0025-Use-devpath-as-argument-for-sysfs-functions.patch
0026-multipathd-remove-references-to-sysfs_device.patch
0027-multipathd-use-struct-path-as-argument-for-event-pro.patch
0028-Add-global-udev-reference-pointer-to-config.patch
0029-Use-udev-enumeration-during-discovery.patch
0030-use-struct-udev_device-during-discovery.patch
0031-More-debugging-output-when-synchronizing-path-states.patch
0032-Use-struct-udev_device-instead-of-sysdev.patch
0033-discovery-Fixup-cciss-discovery.patch
0035-Use-udev-devices-during-discovery.patch
0036-Remove-all-references-to-hand-craftes-sysfs-code.patch
#
# to here
#
# 0037-multipath-libudev-cleanup-and-bugfixes.patch
# 0038-multipath-check-if-a-device-belongs-to-multipath.patch
# 0039-multipath-and-wwids_file-multipath.conf-option.patch
# 0040-multipath-Check-blacklists-as-soon-as-possible.patch
# 0041-add-wwids-file-cleanup-options.patch
# 0042-add-find_multipaths-option.patch
# 0043-alloc-keywords.patch
# lp1503305_libmultipath_info_on_1st_path_down_dbd131e.patch

The patches in the range 19-36 (between the "from here" and "to here" markers) caused the regression.
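One way to narrow the suspect range down to a single patch is to bisect the series. The sketch below is only an illustration: `first_bad_patch` and the `crashes_with` callback are hypothetical names, and the callback stands in for "build the package with exactly these patches applied and run the test case". It assumes the build without any of the suspect patches is good, the build with all of them is bad, and that once the crash appears it stays present as more patches are added.

```python
from typing import Callable, Sequence

# Suspect patches, in series order, copied from the list above.
SUSPECTS = [
    "0019-multipath-fix-scsi-timeout-code.patch",
    "0020-multipath-make-tgt_node_name-work-for-iscsi-devices.patch",
    "0021-multipath-cleanup-dev_loss_tmo-issues.patch",
    "0022-Fix-for-setting-0-to-fast_io_fail.patch",
    "0023-Fix-fast_io_fail-capping.patch",
    "0024-multipath-enable-getting-uevents-through-libudev.patch",
    "0025-Use-devpath-as-argument-for-sysfs-functions.patch",
    "0026-multipathd-remove-references-to-sysfs_device.patch",
    "0027-multipathd-use-struct-path-as-argument-for-event-pro.patch",
    "0028-Add-global-udev-reference-pointer-to-config.patch",
    "0029-Use-udev-enumeration-during-discovery.patch",
    "0030-use-struct-udev_device-during-discovery.patch",
    "0031-More-debugging-output-when-synchronizing-path-states.patch",
    "0032-Use-struct-udev_device-instead-of-sysdev.patch",
    "0033-discovery-Fixup-cciss-discovery.patch",
    "0035-Use-udev-devices-during-discovery.patch",
    "0036-Remove-all-references-to-hand-craftes-sysfs-code.patch",
]


def first_bad_patch(
    series: Sequence[str],
    crashes_with: Callable[[Sequence[str]], bool],
) -> str:
    """Return the first patch whose inclusion makes the build crash.

    `crashes_with(applied)` is a hypothetical callback: it should build the
    package with the patches in `applied` (a prefix of the series, so patch
    dependencies are respected) and report whether the test case crashes.
    """
    if not series:
        raise ValueError("empty patch series")
    # Invariant: the first `good` patches are fine, the first `bad` crash.
    good, bad = 0, len(series)
    while bad - good > 1:
        mid = (good + bad) // 2
        if crashes_with(series[:mid]):
            bad = mid
        else:
            good = mid
    return series[bad - 1]
```

With 17 suspect patches this takes at most 5 builds instead of 17. Only prefixes of the series are tested because later patches in the series may not apply cleanly without the earlier ones.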

Whenever I build the package (for trusty) with those patches included, I can
produce a core dump that points to a possible double free or NULL dereference
related to path removal (which is why I can reproduce it with the test case).
Unfortunately, the crash usually happens inside malloc() or elsewhere in glibc.

Using valgrind I was able to confirm some invalid free() calls:

==30415== Invalid free() / delete / delete[] / realloc()
==30415==    at 0x4C2BDEC: free (vg_replace_malloc.c:473)
==30415==    by 0x54E243C: vector_del_slot (vector.c:95)
==30415==    by 0x550A516: _remove_map (structs_vec.c:139)
==30415==    by 0x550A5C3: _remove_maps (structs_vec.c:170)
==30415==    by 0x550A64B: remove_maps (structs_vec.c:181)
==30415==    by 0x40713F: configure (main.c:1153)
==30415==    by 0x407A74: child (main.c:1419)
==30415==    by 0x40837D: main (main.c:1618)

These errors match exactly a multipathd core dump I got from another user,
where the invalid free was also coming from _remove_map.