multipath random crashes on use-after-free

Bug #1695789 reported by Rafael David Tinoco
14
This bug affects 1 person
Affects Status Importance Assigned to Milestone
multipath-tools (Ubuntu)
Fix Released
High
Unassigned
Trusty
Fix Released
Medium
Rafael David Tinoco

Bug Description

[Impact]

 * multipath crashes when device-mapper is modified. DM_NAME was being freed twice.
 * expect multipath daemon to crash and not run any checkers on path groups.
 * not checking path groups, in an event of failure, the mpath won't change path prios.
 * openstack relies on flushing device maps frequently when using iscsi.

[Test Case]

 * having a multipathed environment (4 paths, 2 and 2, to a lun):
   - while true; do multipath -F ; multipath -r ; multipath -ll; done
 * run multipath with valgrind and see:

==31831== Invalid read of size 1
==31831== at 0x4C2E902: strncmp (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==31831== by 0x56FC26E: find_mp_by_alias (structs.c:296)
==31831== by 0x404B2F: ev_add_map (main.c:264)
==31831== by 0x404A8B: uev_add_map (main.c:244)
...
==31831== Address 0x728d8d1 is 1 bytes inside a block of size 6 free'd
==31831== at 0x4C2BDEC: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==31831== by 0x404A9A: uev_add_map (main.c:245)
==31831== by 0x40623C: uev_trigger (main.c:756)

[Regression Potential]

 * using strdup for this char *, if there was no double free - like i discovered, would cause a slight memory leak of the size of DM_NAME every time a device mapper disappears and is re-created. it wouldn't be an important regression.

* based on upstream commit and tested by the reported. fixes initial issue.

* What releases are affected ?

 The following releases already got the fix
 - Xenial/Yakkety/Zesty/Artful

 Note that Debian also has the fix.
 Meaning that ONLY Trusty is affected by this bug.

* This SRU contained fixes for 2 LP bugs:
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1695789
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1687004

[Other Info]

It has brought to my attention that multipath in trusty has been crashing randomly. Some dumps were given to me and I was able to generate some others. I have also generated valgrind output to help me with these random crashes.

Crashes:

#0 malloc_consolidate (av=av@entry=0x7f5b58000020) at malloc.c:4149
#1 0x00007f5b62df3cf8 in _int_malloc (av=0x7f5b58000020, bytes=16384) at malloc.c:3423
#2 0x00007f5b62df66d0 in __GI___libc_malloc (bytes=16384) at malloc.c:2891
#3 0x00007f5b638134d7 in dm_task_run () from /lib/x86_64-linux-gnu/libdevmapper.so.1.02.1
#4 0x00007f5b6314be5c in dm_map_present (str=0x7f5b58000990 "lun02") at devmapper.c:304
#5 0x0000000000404ac7 in ev_add_map (dev=, alias=, vecs=) at main.c:257
#6 0x0000000000000000 in ?? ()

And:

#0 0x00007f13a5933c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f13a5937028 in __GI_abort () at abort.c:89
#2 0x00007f13a59702a4 in __libc_message (do_abort=do_abort@entry=1, fmt=fmt@entry=0x7f13a5a81ef0 "") at ../sysdeps/posix/libc_fatal.c:175
#3 0x00007f13a597c56e in malloc_printerr (ptr=<optimized out>, str=0x7f13a5a82020 "double free or corruption (out)", action=1) at malloc.c:4996
#4 _int_free (av=<optimized out>, p=<optimized out>, have_lock=0) at malloc.c:3840
#5 0x00007f13a5cdbe86 in free_multipath (mpp=0x7f138c033d60, free_paths=0) at structs.c:174
#6 0x00007f13a5cfe117 in _remove_map (mpp=0x7f138c033d60, vecs=0x8adaa0, stop_waiter=1, purge_vec=1) at structs_vec.c:143
#7 0x00007f13a5cfe175 in remove_map_and_stop_waiter (mpp=0x7f138c033d60, vecs=0x8adaa0, purge_vec=1) at structs_vec.c:156
#8 0x0000000000406b4d in mpvec_garbage_collector (vecs=<error reading variable: can't compute CFA for this frame>) at main.c:950
...
#14 0x00000000004076b7 in checkerloop (ap=<error reading variable: can't compute CFA for this frame>) at main.c:1163

Please follow my analysis in the subsequent comments.

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :
Download full text (7.4 KiB)

## Valgrind

==31831== Invalid read of size 1
==31831== at 0x4C2E0F4: strlen (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==31831== by 0x56FC243: find_mp_by_alias (structs.c:295)
==31831== by 0x404B2F: ev_add_map (main.c:264)
==31831== by 0x404A8B: uev_add_map (main.c:244)
==31831== by 0x40623C: uev_trigger (main.c:756)
==31831== by 0x5713958: service_uevq (uevent.c:118)
==31831== by 0x5713B67: uevent_dispatch (uevent.c:167)
==31831== by 0x406485: uevqloop (main.c:815)
==31831== by 0x4E3F183: start_thread (in /lib/x86_64-linux-gnu/libpthread-2.19.so)
==31831== by 0x5A2EBEC: clone (clone.S:111)
==31831== Address 0x728d8d1 is 1 bytes inside a block of size 6 free'd
==31831== at 0x4C2BDEC: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==31831== by 0x404A9A: uev_add_map (main.c:245)
==31831== by 0x40623C: uev_trigger (main.c:756)
==31831== by 0x5713958: service_uevq (uevent.c:118)
==31831== by 0x5713B67: uevent_dispatch (uevent.c:167)
==31831== by 0x406485: uevqloop (main.c:815)
==31831== by 0x4E3F183: start_thread (in /lib/x86_64-linux-gnu/libpthread-2.19.so)
==31831== by 0x5A2EBEC: clone (clone.S:111)

All the errors above exist because of a use-after-free caused by the code:

static int
uev_add_map (struct uevent * uev, struct vectors * vecs)
{
 char *alias;
 int major = -1, minor = -1, rc;

 condlog(2, "%s: add map (uevent)", uev->kernel);
 alias = uevent_get_dm_name(uev);
        ...
 rc = ev_add_map(uev->kernel, alias, vecs);
 FREE(alias); <--- freed here
 return rc
}

Mainly because ev_add_map will call:

 /*
  * now we can register the map
  */
 if (map_present && (mpp = add_map_without_path(vecs, alias))) {
  sync_map_state(mpp);
  condlog(2, "%s: devmap %s registered", alias, dev);
  return 0;
 }

which will set:

extern struct multipath *
add_map_without_path (struct vectors * vecs, char * alias)
{
 struct multipath * mpp = alloc_multipath();

 if (!mpp)
  return NULL;

 mpp->alias = alias;

So when this logic returns, mpp-> alias will be already freed by uev_add_map.
Any other code traversing mpvecs using "mpp" as "struct multipath" will have
pointer "alias" pointing into random memory causing all kinds of memory
corruption.

This logic is triggered every time the "ueventloop" gets a new uevent from
kernel saying that there was a device map change. If device is "dm-" and
its action is "change" (meaning a device mapper was changed), this logic
will try to create a new "mpp" structure in global vecs->mpvec vectors
with this discovered multipath.

This is not used in daemon startup, since all paths are initially created
by the configure() -> {path_discovery, map_discovery, coalesce_paths, coalesce_
maps}. Multipath devices are more susceptible to be created from the beginning,
or by a discovered path, that would trigger another logic to for mpp structures
(thus creating multipath devices): That might explain why this bug has not
been fixed before.

Example of crashes because of this:

#0 malloc_consolidate (av=av@entry=0x7f5b58000020) at malloc.c:4149
#1 0x00007f5b62df3cf8 in _int_malloc (av=0x7f5b58000020, bytes=16384) at malloc.c:3423
#2 0x000...

Read more...

Changed in multipath-tools (Ubuntu):
status: New → In Progress
importance: Undecided → High
assignee: nobody → Rafael David Tinoco (inaddy)
Changed in multipath-tools (Ubuntu):
milestone: none → ubuntu-14.04.5
Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

I can confirm that applying upstream commit solves valgrind complains (and likely the crashes I showed).

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :
Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

I have just attached the fix for this BUG together with fix for LP: #1687004. I'll provide now a PPA containing both fixes, as a temp hotfix for those who need, and will wait a bit for feedback - from whoever use it - before subscribing sponsors. When I subscribe sponsors the SRU process will begin.

Thank you
Rafael

tags: added: sts
tags: added: sts-sponsor
description: updated
Eric Desrochers (slashd)
description: updated
Changed in multipath-tools (Ubuntu Trusty):
assignee: nobody → Rafael David Tinoco (inaddy)
importance: Undecided → Medium
status: New → In Progress
Changed in multipath-tools (Ubuntu):
status: In Progress → Fix Released
assignee: Rafael David Tinoco (inaddy) → nobody
Revision history for this message
Eric Desrochers (slashd) wrote :

Uploaded in Trusty upload queue, now waiting for SRU approval in order to start building in trusty-proposed.

- Eric

tags: added: sts-sru-needed
tags: removed: sts-sponsor
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Please test proposed package

Hello Rafael, or anyone else affected,

Accepted multipath-tools into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/multipath-tools/0.4.9-3ubuntu7.16 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed.Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in multipath-tools (Ubuntu Trusty):
status: In Progress → Fix Committed
tags: added: verification-needed
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Change of SRU verification policy

As part of a recent change in the Stable Release Update verification policy we would like to inform that for a bug to be considered verified for a given release a verification-done-$RELEASE tag needs to be added to the bug where $RELEASE is the name of the series the package that was tested (e.g. verification-done-xenial). Please note that the global 'verification-done' tag can no longer be used for this purpose.

Thank you!

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

I'll be verifying this case by the end of this week so it can be published, hopefully, next week. Thank you!

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

## VERIFICATION (old version and old behavior)

$ dpkg -l | grep multipath | sed -E 's: +: :g'

ii multipath-tools 0.4.9-3ubuntu7.15 amd64 maintain multipath block device access
ii multipath-tools-boot 0.4.9-3ubuntu7.15 all Support booting from multipath devices
ii multipath-tools-dbg 0.4.9-3ubuntu7.15 amd64 maintain multipath block device access
ii multipath-tools-dbgsym 0.4.9-3ubuntu7.15 amd64 debug symbols for package multipath-tools

Attaching: verification_old_valgrind.txt
Attaching: verification_old_multipathd.txt

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

Valgrind showing wrong memory behavior due to a double free() (explained in this bug)

==10019== Thread 3:
==10019== Invalid read of size 1
==10019== at 0x4C2E0E2: strlen (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==10019== by 0x56FC243: find_mp_by_alias (structs.c:295)
==10019== by 0x571F066: update_multipath (structs_vec.c:495)
==10019== by 0x5720986: waiteventloop (waiter.c:130)
==10019== by 0x5720AE3: waitevent (waiter.c:162)
==10019== by 0x4E3F183: start_thread (pthread_create.c:312)
==10019== by 0x5A2EFFC: clone (clone.S:111)
==10019== Address 0x731ada0 is 0 bytes inside a block of size 6 free'd
==10019== at 0x4C2BDEC: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==10019== by 0x404A9A: uev_add_map (main.c:245)
==10019== by 0x40623C: uev_trigger (main.c:756)
==10019== by 0x5713958: service_uevq (uevent.c:118)
==10019== by 0x5713B67: uevent_dispatch (uevent.c:167)
==10019== by 0x406485: uevqloop (main.c:815)
==10019== by 0x4E3F183: start_thread (pthread_create.c:312)
==10019== by 0x5A2EFFC: clone (clone.S:111)

==10019== Invalid read of size 2
==10019== at 0x4C2FDC0: __GI_memcpy (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==10019== by 0x505DB81: ??? (in /lib/x86_64-linux-gnu/libdevmapper.so.1.02.1)
==10019== by 0x56F50E9: dm_get_status (devmapper.c:402)
==10019== by 0x571E6FF: update_multipath_status (structs_vec.c:262)
==10019== by 0x571E7D9: update_multipath_strings (structs_vec.c:283)
==10019== by 0x571EA57: setup_multipath (structs_vec.c:338)
==10019== by 0x571F0DD: update_multipath (structs_vec.c:505)
==10019== by 0x5720986: waiteventloop (waiter.c:130)
==10019== by 0x5720AE3: waitevent (waiter.c:162)
==10019== by 0x4E3F183: start_thread (pthread_create.c:312)
==10019== by 0x5A2EFFC: clone (clone.S:111)
==10019== Address 0x7e200d0 is 0 bytes inside a block of size 6 free'd
==10019== at 0x4C2BDEC: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==10019== by 0x404A9A: uev_add_map (main.c:245)
==10019== by 0x40623C: uev_trigger (main.c:756)
==10019== by 0x5713958: service_uevq (uevent.c:118)
==10019== by 0x5713B67: uevent_dispatch (uevent.c:167)
==10019== by 0x406485: uevqloop (main.c:815)
==10019== by 0x4E3F183: start_thread (pthread_create.c:312)
==10019== by 0x5A2EFFC: clone (clone.S:111)

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

## VERIFICATION (proposed version and new behavior)

$ dpkg -l | grep multipath | sed -E 's: +: :g'
ii multipath-tools 0.4.9-3ubuntu7.16 amd64 maintain multipath block device access
ii multipath-tools-boot 0.4.9-3ubuntu7.16 all Support booting from multipath devices
ii multipath-tools-dbg 0.4.9-3ubuntu7.16 amd64 maintain multipath block device access
ii multipath-tools-dbgsym 0.4.9-3ubuntu7.16 amd64 debug symbols for package multipath-tools

Attaching: verification_new_valgrind.txt
Attaching: verification_new_multipathd.txt

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

Nothing too different from previous (since I'm not reproducing the original dumps, just the valgrind behavior)

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

Aside from a false positive warning (about an infinite loop for service_uevq) there is nothing besides valgrind finding leaks in pthread_create() from glibc (likely false positives also).

All the old errors:

==10019== Thread 3:
==10019== Invalid read of size 1
==10019== at 0x4C2E0E2: strlen (in /usr/lib/valgrind/...)
==10019== by 0x56FC243: find_mp_by_alias (structs.c:295)
...
==10019== Address 0x731ada0 is 0 bytes inside a block of size 6 free'd
==10019== at 0x4C2BDEC: free (in /usr/lib/valgrind/...)
==10019== by 0x404A9A: uev_add_map (main.c:245)
...

Seem to be gone. This shows the patch was effective and is solving the problems of initial core dump, that suffered a seg fault when accessing mpp->alias (just like comment https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1695789/comments/1 showed).

I'll consider this as verification-done.

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

Forgot to attach the new valgrind log.

tags: added: verification-done-trusty
removed: verification-needed
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for multipath-tools has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package multipath-tools - 0.4.9-3ubuntu7.16

---------------
multipath-tools (0.4.9-3ubuntu7.16) trusty; urgency=medium

  * Fixes multipathd crash on usa after free (mpp->alias) (LP: #1695789)
     + d/p/strdup_multipath_alias.patch
  * Fixes multipathd crash on log thread initialization (LP: #1687004)
     + d/p/add-missing-log-functions-from-hannes-tree.patch
     + d/p/make-log-pthread-more-robust.patch

 -- Rafael David Tinoco <email address hidden> Mon, 01 May 2017 17:09:08 +0000

Changed in multipath-tools (Ubuntu Trusty):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.