Surelock GA2: Kernel panic with GA candidate driver, warning at kernel/rcu/tree.c:2694

Bug #1526946 reported by bugproxy on 2015-12-16
This bug affects 1 person
Affects          Status  Importance  Assigned to            Milestone
linux (Ubuntu)           High        Canonical Kernel Team
Wily                     Undecided   Tim Gardner
Xenial                   High        Canonical Kernel Team

Bug Description

-- Problem Description --
System was loaded with an Ubuntu 15.10 base (Linux z1391 4.2.0-16-generic #19-Ubuntu SMP Thu Oct 8 14:49:47 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux).

Kernel panic while running the Hardware Test Exerciser test suite:

[ 8841.280827] Unable to handle kernel paging request for data at address 0x00100108
[ 8841.280873] Faulting instruction address: 0xc000000000981994
[ 8841.280902] Oops: Kernel access of bad area, sig: 11 [#1]
[ 8841.280932] SMP NR_CPUS=2048 NUMA PowerNV
[ 8841.281034] Modules linked in: iptable_filter ip_tables x_tables uio_pdrv_genirq uio powernv_rng sunrpc autofs4 ses enclosure cxlflash bnx2x ipr cxl mdio libcrc32c
[ 8841.281055] CPU: 71 PID: 63157 Comm: hxecpu Not tainted 4.2.0-16-generic #19-Ubuntu
[ 8841.281065] task: c000001e252fa440 ti: c00000305658c000 task.ti: c00000305658c000
[ 8841.281077] NIP: c000000000981994 LR: c000000000981984 CTR: c000000000981940
[ 8841.281086] REGS: c00000305658f920 TRAP: 0300 Not tainted (4.2.0-16-generic)
[ 8841.281194] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 39139953 XER: a0000000
[ 8841.281473] CFAR: c000000000008468 DAR: 0000000000100108 DSISR: 42000000 SOFTE: 1
[ 8841.281473] GPR00: c000000000981984 c00000305658fba0 c00000000151ae00 c000001ff69de300
[ 8841.281473] GPR04: 0000000000000101 fffffffffffffec0 c00000000093f168 000000000000000a
[ 8841.281473] GPR08: 0000000000000100 0000000000200200 0000000000100100 0000000000000005
[ 8841.281473] GPR12: c000000000981940 c00000000fb6a280 0000000000000000 0000000000000001
[ 8841.281473] GPR16: 0000000000000000 c000000001431280 c000000000ad3988 7fffffffffffffff
[ 8841.281473] GPR20: 0000000000000000 c000001fcd2bb100 c00000305658c000 c000000001429b80
[ 8841.281473] GPR24: 000000000000000a 0000000000000000 c000001ff59ddb30 0000000000000001
[ 8841.281473] GPR28: c0000035fb079f00 c00000305658c000 c000001ff69de300 c0000035fb070f00
[ 8841.281514] NIP [c000000000981994] ipv4_dst_destroy+0x54/0xa0
[ 8841.281530] LR [c000000000981984] ipv4_dst_destroy+0x44/0xa0
[ 8841.281537] Call Trace:
[ 8841.281561] [c00000305658fba0] [c000000000981984] ipv4_dst_destroy+0x44/0xa0 (unreliable)
[ 8841.281585] [c00000305658fbd0] [c00000000093f120] dst_destroy+0xf0/0x1a0
[ 8841.281631] [c00000305658fc10] [c00000000093f4a8] dst_destroy_rcu+0x28/0x50
[ 8841.281668] [c00000305658fc40] [c00000000013a020] rcu_process_callbacks+0x340/0x6f0
[ 8841.281692] [c00000305658fcf0] [c0000000000baef8] __do_softirq+0x188/0x3a0
[ 8841.281709] [c00000305658fde0] [c0000000000bb388] irq_exit+0xc8/0x100
[ 8841.281727] [c00000305658fe00] [c00000000001f734] timer_interrupt+0xa4/0xe0
[ 8841.281751] [c00000305658fe30] [c000000000002714] decrementer_common+0x114/0x180
[ 8841.281762] Instruction dump:
[ 8841.281812] 60000000 e93f00b0 395f00b0 7fa95000 419e0048 ebdf00c0 7fc3f378 48116d99
[ 8841.281840] 60000000 e93f00b8 e95f00b0 7fc3f378 <f92a0008> f9490000 3d200010 61290100
[ 8841.281877] ---[ end trace 38ca99d7d89c7bef ]---
[ 8841.303699]
[ 8841.307282] WARNING: at /build/linux-sBmKia/linux-4.2.0/kernel/rcu/tree.c:2694
[ 8841.307377] Modules linked in: iptable_filter ip_tables x_tables uio_pdrv_genirq uio powernv_rng sunrpc autofs4 ses enclosure cxlflash bnx2x ipr cxl mdio libcrc32c
[ 8841.307701] CPU: 135 PID: 687 Comm: ksoftirqd/135 Tainted: G D 4.2.0-16-generic #19-Ubuntu
[ 8841.307834] task: c000003ca16b64b0 ti: c000003ca18e4000 task.ti: c000003ca18e4000
[ 8841.307939] NIP: c00000000013a29c LR: c00000000013a020 CTR: c000000000981940
[ 8841.308042] REGS: c000003ca18e78e0 TRAP: 0700 Tainted: G D (4.2.0-16-generic)
[ 8841.308155] MSR: 9000000100029033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 28022428 XER: 20000000
[ 8841.308427] CFAR: c00000000013a0e0 SOFTE: 0
GPR00: c00000000013a020 c000003ca18e7b60 c00000000151ae00 0000000000000001
GPR04: f000000007f34ac0 c000001fcd2bb100 0000000000000000 0000000000000003
GPR08: 0000000000000000 0000000000000001 c00000000149ae00 d00000001dc54ef0
GPR12: c000000000981940 c00000000fb90280 c0000000000e20b8 0000000000000001
GPR16: 0000000000000000 c000000001431280 c000000000ad3988 7fffffffffffffff
GPR20: 0000000000000000 c0000035fb07cb00 c000003ca18e4000 c000000001429b80
GPR24: 000000000000000a 0000000000000000 c000001ff69ddb30 0000000000000001
GPR28: 0000000000000000 0000000000000000 c000000001421280 c000001ff69ddb00
[ 8841.310226] NIP [c00000000013a29c] rcu_process_callbacks+0x5bc/0x6f0
[ 8841.310303] LR [c00000000013a020] rcu_process_callbacks+0x340/0x6f0
[ 8841.310551] Call Trace:
[ 8841.310680] [c000003ca18e7b60] [c00000000013a020] rcu_process_callbacks+0x340/0x6f0 (unreliable)
[ 8841.311167] [c000003ca18e7c10] [c0000000000baef8] __do_softirq+0x188/0x3a0
[ 8841.311515] [c000003ca18e7d00] [c0000000000bb154] run_ksoftirqd+0x44/0xb0
[ 8841.311838] [c000003ca18e7d20] [c0000000000e7a10] smpboot_thread_fn+0x280/0x290
[ 8841.312229] [c000003ca18e7d80] [c0000000000e21c0] kthread+0x145/0x130
[ 8841.312557] [c000003ca18e7e30] [c000000000009538] ret_from_ker6el_threa9 1/0xa4 5
[ 8841.312789] Instruction dump:
[ 8841.312880] 409dfe5c e95f0030 f93f0088 7d290074 7d4a0074 7929d682 794ad182 7fa95000
[ 8841.313246] 419efe58 3d42fff8 892a494a 69290001 <0b090000> 2fa60000 41f7 199ea494a 5
[ 8841.313552] ---[ end trace 38ca99d7d89c7bf0 ]---
[ 8843.304038] Kernel panic - not syncing: Fatal exception in interrupt

> Just some random notes:
>
> 0x00100108 is 1M + 256 + 8. Possibly a pointer reference and use-after-free?
>

I think this is poisoning:

#define LIST_POISON1 ((void *) 0x00100100 + POISON_POINTER_DELTA)

> [ 8841.281514] NIP [c000000000981994] ipv4_dst_destroy+0x54/0xa0
>
> [ 8841.281530] LR [c000000000981984] ipv4_dst_destroy+0x44/0xa0
>
> anyone know what line that corresponds to?
>

greg@prato:~$ addr2line -e /usr/lib/debug/boot/vmlinux-4.2.0-16-generic c000000000981994
/build/linux-sBmKia/linux-4.2.0/include/linux/list.h:89

static inline void __list_del(struct list_head * prev, struct list_head * next)
{
==> next->prev = prev;
        prev->next = next;
}
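
The poisoning explanation can be checked with a little arithmetic. The sketch below is a userspace model, not kernel code: `struct list_head64` is a hypothetical stand-in for `struct list_head` with 8-byte pointers, and `POISON_POINTER_DELTA` is assumed to be 0 in this build, which is what the faulting address suggests. After a first list_del() poisons the entry, a second list_del() on the same entry reaches `next->prev = prev` with next == LIST_POISON1, so the store lands at LIST_POISON1 plus the offset of `prev`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of struct list_head on a 64-bit kernel (8-byte pointers). */
struct list_head64 {
    uint64_t next;  /* offset 0 */
    uint64_t prev;  /* offset 8 */
};

/* Assumption: POISON_POINTER_DELTA is 0 here, as the fault address suggests. */
#define POISON_POINTER_DELTA 0ULL
#define LIST_POISON1 (0x00100100ULL + POISON_POINTER_DELTA)
#define LIST_POISON2 (0x00200200ULL + POISON_POINTER_DELTA)

/*
 * Address written by "next->prev = prev" when the entry was already
 * deleted once, i.e. when next == LIST_POISON1.
 */
static uint64_t second_list_del_store_addr(void)
{
    return LIST_POISON1 + offsetof(struct list_head64, prev);
}
```

Under these assumptions the function returns 0x00100108, the address in the oops. The register dump is consistent with this reading: GPR10 holds 0x100100 (LIST_POISON1) and GPR09 holds 0x200200 (LIST_POISON2).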

> upstream, at least;
>
> static void ipv4_dst_destroy(struct dst_entry *dst)
> {
>         struct rtable *rt = (struct rtable *) dst;
>
>         if (!list_empty(&rt->rt_uncached)) {
>                 struct uncached_list *ul = rt->rt_uncached_list;
>
>                 spin_lock_bh(&ul->lock);
>                 list_del(&rt->rt_uncached);
>                 spin_unlock_bh(&ul->lock);
>         }
> }

so it looks like we're trying to remove the same object twice...

Below are the stack traces of the two threads that have both called dst_release() against the same struct dst_entry, resulting in a double free and eventual crash. Both threads are in tcp_v4_do_rcv() processing skb=c000001e28ca5a00 on sock c000001d2bda0000.

-----------------------------
[172134.583029] tcp_v4_do_rcv: sk=c000001d2bda0000 skb=c000001e28ca5a00 dst=c000001e28caf000 sk->sk_rx_dst=c000001e28caf000
[172134.583075] dst_release: dst=c000001e28caf000
[172134.583154] CPU: 51 PID: 65452 Comm: hxecpu Tainted: G W 4.2.3 #2
[172134.583158] Call Trace:
[172134.583190] [c000001fffe075a0] [c000000000a9dcd4] dump_stack+0x90/0xbc (unreliable)
[172134.583227] [c000001fffe075d0] [c00000000093bf80] dst_release+0x110/0x120
[172134.583260] [c000001fffe07640] [c0000000009b4024] tcp_v4_do_rcv+0x4d4/0x4f0
[172134.583277] [c000001fffe076e0] [c0000000009b7834] tcp_v4_rcv+0xb74/0xb90
[172134.583377] [c000001fffe077c0] [c000000000984bb8] ip_local_deliver_finish+0x178/0x350
[172134.583447] [c000001fffe07810] [c0000000009853bc] ip_local_deliver+0x4c/0x120
[172134.583469] [c000001fffe07880] [c000000000984eb4] ip_rcv_finish+0x124/0x420
[172134.583517] [c000001fffe07900] [c0000000009857a4] ip_rcv+0x314/0x440
[172134.583551] [c000001fffe07990] [c00000000092b094] __netif_receive_skb_core+0xa14/0xd60
[172134.583646] [c000001fffe07a70] [c00000000092e924] netif_receive_skb_internal+0x34/0xd0
[172134.583715] [c000001fffe07ab0] [c00000000092fa5c] napi_gro_receive+0xec/0x1b0
[172134.583796] [c000001fffe07af0] [d0000000166d59f0] bnx2x_rx_int+0x1450/0x1700 [bnx2x]
[172134.583928] [c000001fffe07c90] [d0000000166d6580] bnx2x_poll+0x310/0x440 [bnx2x]
[172134.583974] [c000001fffe07d40] [c00000000092f0dc] net_rx_action+0x2dc/0x470
[172134.583990] [c000001fffe07e50] [c0000000000ba7d8] __do_softirq+0x188/0x3a0
[172134.583997] [c000001fffe07f40] [c0000000000bac68] irq_exit+0xc8/0x100
[172134.584003] [c000001fffe07f60] [c0000000000111bc] __do_irq+0x8c/0x190
[172134.584008] [c000001fffe07f90] [c000000000024290] call_do_irq+0x14/0x24
[172134.584011] [c000001e2d57fde0] [c000000000011358] do_IRQ+0x98/0x140
[172134.584017] [c000001e2d57fe30] [c000000000002594] hardware_interrupt_common+0x114/0x180

-------------------------------------------------------------------------------
[172134.584065] tcp_v4_do_rcv: sk=c000001d2bda0000 skb=c000001e28ca5a00 dst=c000001e28caf000 sk->sk_rx_dst=c000001e28caf000
[172134.584081] dst_release: dst=c000001e28caf000
[172134.584100] CPU: 50 PID: 22055 Comm: hxecom Tainted: G W 4.2.3 #2
[172134.584101] Call Trace:
[172134.584124] [c000001e0befb920] [c000000000a9dcd4] dump_stack+0x90/0xbc (unreliable)
[172134.584139] [c000001e0befb950] [c00000000093bf80] dst_release+0x110/0x120
[172134.584147] [c000001e0befb9c0] [c0000000009b4024] tcp_v4_do_rcv+0x4d4/0x4f0
[172134.584160] [c000001e0befba60] [c00000000090c72c] release_sock+0xec/0x1e0
[172134.584174] [c000001e0befbab0] [c000000000998b98] tcp_recvmsg+0x3f8/0xce0
[172134.584183] [c000001e0befbbd0] [c0000000009d31dc] inet_recvmsg+0x9c/0x110
[172134.584188] [c000001e0befbc30] [c000000000906d24] sock_recvmsg+0x84/0xb0
[172134.584192] [c000001e0befbc70] [c0000000009088ac] SyS_recvfrom+0xdc/0x1a0
[172134.584198] [c000001e0befbdc0] [c000000000909a38] SyS_socketcall+0x2d8/0x430
[172134.584216] [c000001e0befbe30] [c000000000009204] system_call+0x38/0xb4
--------------------------------------------------------------------------------------------------------------------------------

The first thread runs tcp_v4_do_rcv(), which calls dst_release(sk->sk_rx_dst) and sets sk->sk_rx_dst=NULL. The last reference on the dst has been dropped, therefore dst_destroy() is scheduled to run as an RCU callback. The following printk confirms that dst_destroy() has been scheduled to run:

[172134.583075] dst_release: dst=c000001e28caf000

tcp_v4_do_rcv() then calls tcp_rcv_established(). tcp_rcv_established() finding sk->sk_rx_dst==NULL sets it to the dst_entry pointed to by skb->_skb_refdst and calls dst_hold() to increment the entry's reference count.

My theory is that the same dst_entry that was released in tcp_v4_do_rcv() has now been referenced again in tcp_rcv_established(). This is bad because the RCU callback is already scheduled to run on the dst_entry that has just been referenced. The skb's dst (skb->_skb_refdst) must have been set without taking a reference; this can happen via skb_dst_set_noref(), which assumes rcu_read_lock_held() or rcu_read_lock_bh_held(). The conditions for the double free are now established.

The second thread, following the same path, calls dst_release(), scheduling a second callback to destroy the dst. Some time later, when the second callback runs, the crash occurs.

I am testing my theory by adding the following check in tcp_v4_do_rcv() to prevent the release of the dst when this condition exists.

+if ( dst != skb_dst(skb)) {
 dst_release(dst);
 sk->sk_rx_dst = NULL;
+}

This has run for 24 hours without failure. I will let it run a little longer to confirm.
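
The sequence described in this comment, and the effect of the test check, can be replayed in a small userspace model. This is only a sketch under stated assumptions: `struct dst_model`, `model_hold()`, `model_release()`, `model_do_rcv()` and `replay()` are hypothetical names, the two threads are replayed one after the other rather than concurrently, and the cached dst is assumed to be judged stale on every pass.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dst_entry: only what the replay needs. */
struct dst_model {
    int refcnt;             /* models the dst reference count */
    int destroy_scheduled;  /* how many times the destroy callback was queued */
};

static void model_hold(struct dst_model *d)
{
    d->refcnt++;            /* blind increment, even if the count is 0 */
}

static void model_release(struct dst_model *d)
{
    if (--d->refcnt == 0)
        d->destroy_scheduled++;   /* models queueing the RCU callback */
}

/* One pass for an skb whose dst is attached without taking a reference. */
static void model_do_rcv(struct dst_model **sk_rx_dst,
                         struct dst_model *skb_dst, bool guarded)
{
    struct dst_model *dst = *sk_rx_dst;

    /* Assumption: the cached dst is considered stale on every pass. */
    if (dst && (!guarded || dst != skb_dst)) {
        model_release(dst);
        *sk_rx_dst = NULL;
    }

    /* Models tcp_rcv_established() re-caching the skb's dst. */
    if (*sk_rx_dst == NULL) {
        *sk_rx_dst = skb_dst;
        model_hold(skb_dst);
    }
}

/* Returns how many destroy callbacks were queued for the single dst. */
static int replay(bool guarded)
{
    struct dst_model dst = { .refcnt = 1, .destroy_scheduled = 0 };
    struct dst_model *sk_rx_dst = &dst;

    model_do_rcv(&sk_rx_dst, &dst, guarded);  /* first thread */
    model_do_rcv(&sk_rx_dst, &dst, guarded);  /* second thread */
    return dst.destroy_scheduled;
}
```

In this model `replay(false)` queues the destroy callback twice for the same object, which is the double free, while `replay(true)` never queues it because the release is skipped when the cached dst is the one attached to the skb.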

> Latest test patch from the community
>
> This test patch is based on discussions with the community (net-dev). I
> have high confidence that the changes to inet_sk_rx_dst_set() will address
> the crashes we are seeing.
>
> We will make a test run with this patch over the weekend.

With the patch applied, the test ran over the weekend without a failure.
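
The patch itself is not reproduced in this report. As a generic illustration of why a blind reference increment is unsafe once a count may already have dropped to zero, here is a sketch of the "take a reference only if the object is still live" pattern using C11 atomics. `struct obj` and `hold_if_live()` are hypothetical names, and this is not claimed to be what the community patch does.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted object. */
struct obj {
    atomic_int refcnt;
};

/*
 * Take a reference only if the count is still non-zero.
 * Returns false when the last reference is already gone, in which
 * case the caller must not cache or use the pointer.
 */
static bool hold_if_live(struct obj *o)
{
    int old = atomic_load(&o->refcnt);

    while (old != 0) {
        /* On failure, 'old' is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
            return true;
    }
    return false;
}
```

A caller would cache the pointer only when `hold_if_live()` returns true, so an object whose destruction has already been queued is never resurrected.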

> This problem has been resolved and tested with the following patch. The
> patch has been submitted to <email address hidden>
>
> http://lists.openwall.net/netdev/2015/12/14/177

Canonical, please pick up the following kernel patch:

http://lists.openwall.net/netdev/2015/12/14/177

The patch was accepted on net-dev:
http://lists.openwall.net/netdev/2015/12/15/9

bugproxy (bugproxy) on 2015-12-16
tags: added: architecture-ppc64 bugnameltc-132545 severity-critical targetmilestone-inin1510
Changed in ubuntu:
assignee: nobody → Taco Screen team (taco-screen-team)

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1526946/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
Luciano Chavez (lnx1138) on 2015-12-17
affects: ubuntu → linux (Ubuntu)
Changed in linux (Ubuntu):
assignee: Taco Screen team (taco-screen-team) → Canonical Kernel Team (canonical-kernel-team)
importance: Undecided → High
status: New → Triaged
Tim Gardner (timg-tpi) on 2015-12-18
Changed in linux (Ubuntu Wily):
assignee: nobody → Tim Gardner (timg-tpi)
status: New → In Progress
Andy Whitcroft (apw) on 2016-01-05
Changed in linux (Ubuntu Wily):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Xenial):
status: Triaged → Fix Committed
Luis Henriques (henrix) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-wily' to 'verification-done-wily'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-wily

------- Comment From <email address hidden> 2016-01-13 18:56 EDT-------
cde00 (<email address hidden>) added native attachment /tmp/AIXOS05490306/please-test-the-following-patch.patch on 2016-01-13 17:52:44
cde00 (<email address hidden>) added native attachment /tmp/AIXOS05490306/0001-fix_double_free_in_ipv4_dst_destroy.patch on 2016-01-13 17:52:44

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-01-15 12:39 EDT-------
I was finally able to install and boot the new kernel with:
sudo apt-get install linux-image-generic
$ uname -a
Linux z1381 4.2.0-24-generic #29-Ubuntu SMP Mon Jan 11 17:59:07 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux

However, I don't see my patch in this kernel. After upgrading the kernel I did this to get the source:
$apt-get source linux-image-
The resultant source is missing my patch.

In comment 71 of the bug it implies that the patch was added on 1/13 (maybe).
The kernel I installed was built on 1/11, so maybe I am still not running the correct kernel.
Any ideas?

Breno Leitão (breno-leitao) wrote :

Comment #2 says that the kernel is in -proposed, but the bug has not been marked as Fix Released; it is marked as Fix Committed. It seems that the patches were not integrated into the -proposed kernel version.

I am wondering if comment#2 was a mistake, somehow.

Tim Gardner (timg-tpi) wrote :

eef8727a36e38df4c2c3d3ceac0c15c07432acfb ('UBUNTU: SAUCE: (noup) net: fix IP early demux races')

git describe --contains eef8727a36e38df4c2c3d3ceac0c15c07432acfb
Ubuntu-4.2.0-24.29~310

The source for this kernel is (git://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/wily master-next).

Tim Gardner (timg-tpi) wrote :

Breno - bugs are not marked 'Fix Released' until the package is promoted to updates. I use 'Fix committed' to indicate that a patch has been committed to the git repository after k-team list review, etc.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-01-18 13:02 EDT-------
We ran our test over the weekend with no failures. We tested the following kernel:

ubuntu@z1381:~$ uname -a
Linux z1381 4.2.0-24-generic #29-Ubuntu SMP Mon Jan 11 17:59:07 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux

The fix is verified.

tags: added: verification-done-wily
removed: verification-needed-wily
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.3.0-6.17

---------------
linux (4.3.0-6.17) xenial; urgency=low

  [ Tim Gardner ]

  * Release Tracking Bug
    - LP: #1532958

  [ Eric Dumazet ]

  * SAUCE: (noup) net: fix IP early demux races
    - LP: #1526946

  [ Guilherme G. Piccoli ]

  * SAUCE: powerpc/eeh: Validate arch in eeh_add_device_early()
    - LP: #1486180

  [ Hui Wang ]

  * [Config] CONFIG_I2C_DESIGNWARE_BAYTRAIL=y, CONFIG_IOSF_MBI=y
    - LP: #1527096

  [ Jann Horn ]

  * ptrace: being capable wrt a process requires mapped uids/gids
    - LP: #1527374

  [ Serge Hallyn ]

  * SAUCE: add a sysctl to disable unprivileged user namespace unsharing

  [ Tim Gardner ]

  * [Config] CONFIG_ZONE_DEVICE=y for amd64
  * [Config] CONFIG_VIRTIO_BLK=y, CONFIG_VIRTIO_NET=y for s390
    - LP: #1532886

  [ Upstream Kernel Changes ]

  * rhashtable: Fix walker list corruption
    - LP: #1526811
  * rhashtable: Kill harmless RCU warning in rhashtable_walk_init
    - LP: #1526811
  * ovl: fix permission checking for setattr
    - LP: #1528904
    - CVE-2015-8660

 -- Tim Gardner <email address hidden> Thu, 17 Dec 2015 05:34:47 -0700

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Andy Whitcroft (apw) wrote :

Fix released in 4.2.0-27.32

Changed in linux (Ubuntu Wily):
status: Fix Committed → Fix Released