Surelock GA2: Kernel panic with GA candidate driver, warning at kernel/rcu/tree.c:2694
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Fix Released | High | Canonical Kernel Team |
Wily | Fix Released | Undecided | Tim Gardner |
Xenial | Fix Released | High | Canonical Kernel Team |
Bug Description
-- Problem Description --
System was loaded (Ubuntu 15.10 base, Linux z1391 4.2.0-16-generic #19-Ubuntu SMP Thu Oct 8 14:49:47 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux).
Kernel panic while running the Hardware Test Exerciser test suite:
[ 8841.280827] Unable to handle kernel paging request for data at address 0x00100108
[ 8841.280873] Faulting instruction address: 0xc000000000981994
[ 8841.280902] Oops: Kernel access of bad area, sig: 11 [#1]
[ 8841.280932] SMP NR_CPUS=2048 NUMA PowerNV
[ 8841.281034] Modules linked in: iptable_filter ip_tables x_tables uio_pdrv_genirq uio powernv_rng sunrpc autofs4 ses enclosure cxlflash bnx2x ipr cxl mdio libcrc32c
[ 8841.281055] CPU: 71 PID: 63157 Comm: hxecpu Not tainted 4.2.0-16-generic #19-Ubuntu
[ 8841.281065] task: c000001e252fa440 ti: c00000305658c000 task.ti: c00000305658c000
[ 8841.281077] NIP: c000000000981994 LR: c000000000981984 CTR: c000000000981940
[ 8841.281086] REGS: c00000305658f920 TRAP: 0300 Not tainted (4.2.0-16-generic)
[ 8841.281194] MSR: 9000000000009033 <SF,HV,
[ 8841.281473] CFAR: c000000000008468 DAR: 0000000000100108 DSISR: 42000000 SOFTE: 1
[ 8841.281473] GPR00: c000000000981984 c00000305658fba0 c00000000151ae00 c000001ff69de300
[ 8841.281473] GPR04: 0000000000000101 fffffffffffffec0 c00000000093f168 000000000000000a
[ 8841.281473] GPR08: 0000000000000100 0000000000200200 0000000000100100 0000000000000005
[ 8841.281473] GPR12: c000000000981940 c00000000fb6a280 0000000000000000 0000000000000001
[ 8841.281473] GPR16: 0000000000000000 c000000001431280 c000000000ad3988 7fffffffffffffff
[ 8841.281473] GPR20: 0000000000000000 c000001fcd2bb100 c00000305658c000 c000000001429b80
[ 8841.281473] GPR24: 000000000000000a 0000000000000000 c000001ff59ddb30 0000000000000001
[ 8841.281473] GPR28: c0000035fb079f00 c00000305658c000 c000001ff69de300 c0000035fb070f00
[ 8841.281514] NIP [c000000000981994] ipv4_dst_
[ 8841.281530] LR [c000000000981984] ipv4_dst_
[ 8841.281537] Call Trace:
[ 8841.281561] [c00000305658fba0] [c000000000981984] ipv4_dst_
[ 8841.281585] [c00000305658fbd0] [c00000000093f120] dst_destroy+
[ 8841.281631] [c00000305658fc10] [c00000000093f4a8] dst_destroy_
[ 8841.281668] [c00000305658fc40] [c00000000013a020] rcu_process_
[ 8841.281692] [c00000305658fcf0] [c0000000000baef8] __do_softirq+
[ 8841.281709] [c00000305658fde0] [c0000000000bb388] irq_exit+0xc8/0x100
[ 8841.281727] [c00000305658fe00] [c00000000001f734] timer_interrupt
[ 8841.281751] [c00000305658fe30] [c000000000002714] decrementer_
[ 8841.281762] Instruction dump:
[ 8841.281812] 60000000 e93f00b0 395f00b0 7fa95000 419e0048 ebdf00c0 7fc3f378 48116d99
[ 8841.281840] 60000000 e93f00b8 e95f00b0 7fc3f378 <f92a0008> f9490000 3d200010 61290100
[ 8841.281877] ---[ end trace 38ca99d7d89c7bef ]---
[ 8841.303699]
[ 8841.307282] WARNING: at /build/
[ 8841.307377] Modules linked in: iptable_filter ip_tables x_tables uio_pdrv_genirq uio powernv_rng sunrpc autofs4 ses enclosure cxlflash bnx2x ipr cxl mdio libcrc32c
[ 8841.307701] CPU: 135 PID: 687 Comm: ksoftirqd/135 Tainted: G D 4.2.0-16-generic #19-Ubuntu
[ 8841.307834] task: c000003ca16b64b0 ti: c000003ca18e4000 task.ti: c000003ca18e4000
[ 8841.307939] NIP: c00000000013a29c LR: c00000000013a020 CTR: c000000000981940
[ 8841.308042] REGS: c000003ca18e78e0 TRAP: 0700 Tainted: G D (4.2.0-16-generic)
[ 8841.308155] MSR: 9000000100029033 <SF,HV,
[ 8841.308427] CFAR: c00000000013a0e0 SOFTE: 0
GPR00: c00000000013a020 c000003ca18e7b60 c00000000151ae00 0000000000000001
GPR04: f000000007f34ac0 c000001fcd2bb100 0000000000000000 0000000000000003
GPR08: 0000000000000000 0000000000000001 c00000000149ae00 d00000001dc54ef0
GPR12: c000000000981940 c00000000fb90280 c0000000000e20b8 0000000000000001
GPR16: 0000000000000000 c000000001431280 c000000000ad3988 7fffffffffffffff
GPR20: 0000000000000000 c0000035fb07cb00 c000003ca18e4000 c000000001429b80
GPR24: 000000000000000a 0000000000000000 c000001ff69ddb30 0000000000000001
GPR28: 0000000000000000 0000000000000000 c000000001421280 c000001ff69ddb00
[ 8841.310226] NIP [c00000000013a29c] rcu_process_
[ 8841.310303] LR [c00000000013a020] rcu_process_
[ 8841.310551] Call Trace:
[ 8841.310680] [c000003ca18e7b60] [c00000000013a020] rcu_process_
[ 8841.311167] [c000003ca18e7c10] [c0000000000baef8] __do_softirq+
[ 8841.311515] [c000003ca18e7d00] [c0000000000bb154] run_ksoftirqd+
[ 8841.311838] [c000003ca18e7d20] [c0000000000e7a10] smpboot_
[ 8841.312229] [c000003ca18e7d80] [c0000000000e21c0] kthread+0x108/0x130
[ 8841.312557] [c000003ca18e7e30] [c000000000009538] ret_from_
[ 8841.312789] Instruction dump:
[ 8841.312880] 409dfe5c e95f0030 f93f0088 7d290074 7d4a0074 7929d682 794ad182 7fa95000
[ 8841.313246] 419efe58 3d42fff8 892a494a 69290001 <0b090000> 2fa60000 41f7 199ea494a 5
[ 8841.313552] ---[ end trace 38ca99d7d89c7bf0 ]---
[ 8843.304038] Kernel panic - not syncing: Fatal exception in interrupt
> Just some random notes:
>
> 0x00100108 is 1M + 256 + 8. Possibly a pointer reference and use-after-free?
>
I think this is poisoning:
#define LIST_POISON1 ((void *) 0x00100100 + POISON_
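The poisoning reading can be cross-checked against the register dump. A minimal sketch, assuming the word marked <f92a0008> in the instruction dump is the faulting instruction and that it is a DS-form `std` (the names `decode_ds` and `faulting_address` are made up for illustration): it decodes to `std r9,8(r10)`, and with GPR10 = 0x100100 from the oops the store lands on the reported 0x00100108.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a PowerPC DS-form instruction (64-bit stores such as std). */
struct ds_form {
    unsigned opcode;  /* primary opcode, top 6 bits */
    unsigned rs;      /* register being stored */
    unsigned ra;      /* base address register */
    int32_t  disp;    /* signed displacement, low 2 bits cleared */
    unsigned xo;      /* extended opcode, low 2 bits */
};

static struct ds_form decode_ds(uint32_t insn)
{
    struct ds_form f;

    f.opcode = insn >> 26;
    f.rs     = (insn >> 21) & 0x1f;
    f.ra     = (insn >> 16) & 0x1f;
    f.disp   = (int16_t)(insn & 0xfffc);
    f.xo     = insn & 0x3;
    return f;
}

/* Effective address of a std, given the GPR values from the oops. */
static uint64_t faulting_address(uint32_t insn, const uint64_t gpr[32])
{
    struct ds_form f = decode_ds(insn);
    uint64_t base;

    assert(f.opcode == 62 && f.xo == 0);  /* std only */
    base = f.ra ? gpr[f.ra] : 0;          /* RA = 0 means a zero base */
    return base + (uint64_t)(int64_t)f.disp;
}
```

Feeding in 0xf92a0008 with gpr[9] = 0x200200 and gpr[10] = 0x100100 (GPR09 and GPR10 in the first register dump) gives rs = 9, ra = 10, a displacement of 8, and an effective address equal to the DAR value. GPR10 holding 0x00100100 is what points at a list entry whose `next` was already poisoned.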
> [ 8841.281514] NIP [c000000000981994] ipv4_dst_
>
> [ 8841.281530] LR [c000000000981984] ipv4_dst_
>
> anyone know what line that corresponds to?
>
greg@prato:~$ addr2line -e /usr/lib/
/build/
static inline void __list_del(struct list_head * prev, struct list_head * next)
{
==> next->prev = prev;
prev->next = next;
}
> upstream, at least;
>
> static void ipv4_dst_
> {
> struct rtable *rt = (struct rtable *) dst;
>
> if (!list_
> struct uncached_list *ul = rt->rt_
>
> spin_lock_
> list_del(
> spin_unlock_
> }
> }
so it looks like we're trying to remove the same object twice...
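To make the "removed twice" reading concrete, here is a small userspace sketch of a poisoning list delete. It is not the kernel code: the `toy_` names are made up, and the poison values assume the pointer delta term in the `LIST_POISON1` definition is zero, which is what the register values in the oops suggest. After the first delete the entry's `next` is the poison value, so a second delete would begin with a store to `next->prev`, that is poison plus the offset of `prev`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Assumed poison values (pointer delta taken to be zero). */
#define TOY_POISON1 ((struct list_head *)(uintptr_t)0x00100100)
#define TOY_POISON2 ((struct list_head *)(uintptr_t)0x00200200)

static void toy_list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Unlink the entry, then poison its pointers as the kernel's list_del does. */
static void toy_list_del(struct list_head *entry)
{
    assert(entry->next != TOY_POISON1);  /* refuse a second delete here */
    entry->next->prev = entry->prev;
    entry->prev->next = entry->next;
    entry->next = TOY_POISON1;
    entry->prev = TOY_POISON2;
}

/* Address the first store of another delete would hit: &entry->next->prev. */
static uintptr_t second_del_store_address(const struct list_head *entry)
{
    return (uintptr_t)entry->next + offsetof(struct list_head, prev);
}
```

On a 64-bit build `prev` sits at offset 8, so after one `toy_list_del()` the computed address is 0x00100108, the same value as the faulting data address in the oops. The sketch only computes the address; it never dereferences the poison pointer.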
Below are the stack traces of the two threads that have both called dst_release() against the same struct dst_entry, resulting in a double free and eventual crash. Both threads are in tcp_v4_do_rcv() processing skb=c000001e28c
-------
[172134.583029] tcp_v4_do_rcv: sk=c000001d2bda0000 skb=c000001e28c
[172134.583075] dst_release: dst=c000001e28c
[172134.583154] CPU: 51 PID: 65452 Comm: hxecpu Tainted: G W 4.2.3 #2
[172134.583158] Call Trace:
[172134.583190] [c000001fffe075a0] [c000000000a9dcd4] dump_stack+
[172134.583227] [c000001fffe075d0] [c00000000093bf80] dst_release+
[172134.583260] [c000001fffe07640] [c0000000009b4024] tcp_v4_
[172134.583277] [c000001fffe076e0] [c0000000009b7834] tcp_v4_
[172134.583377] [c000001fffe077c0] [c000000000984bb8] ip_local_
[172134.583447] [c000001fffe07810] [c0000000009853bc] ip_local_
[172134.583469] [c000001fffe07880] [c000000000984eb4] ip_rcv_
[172134.583517] [c000001fffe07900] [c0000000009857a4] ip_rcv+0x314/0x440
[172134.583551] [c000001fffe07990] [c00000000092b094] __netif_
[172134.583646] [c000001fffe07a70] [c00000000092e924] netif_receive_
[172134.583715] [c000001fffe07ab0] [c00000000092fa5c] napi_gro_
[172134.583796] [c000001fffe07af0] [d0000000166d59f0] bnx2x_rx_
[172134.583928] [c000001fffe07c90] [d0000000166d6580] bnx2x_poll+
[172134.583974] [c000001fffe07d40] [c00000000092f0dc] net_rx_
[172134.583990] [c000001fffe07e50] [c0000000000ba7d8] __do_softirq+
[172134.583997] [c000001fffe07f40] [c0000000000bac68] irq_exit+0xc8/0x100
[172134.584003] [c000001fffe07f60] [c0000000000111bc] __do_irq+0x8c/0x190
[172134.584008] [c000001fffe07f90] [c000000000024290] call_do_
[172134.584011] [c000001e2d57fde0] [c000000000011358] do_IRQ+0x98/0x140
[172134.584017] [c000001e2d57fe30] [c000000000002594] hardware_
-------
[172134.584065] tcp_v4_do_rcv: sk=c000001d2bda0000 skb=c000001e28c
[172134.584081] dst_release: dst=c000001e28c
[172134.584100] CPU: 50 PID: 22055 Comm: hxecom Tainted: G W 4.2.3 #2
[172134.584101] Call Trace:
[172134.584124] [c000001e0befb920] [c000000000a9dcd4] dump_stack+
[172134.584139] [c000001e0befb950] [c00000000093bf80] dst_release+
[172134.584147] [c000001e0befb9c0] [c0000000009b4024] tcp_v4_
[172134.584160] [c000001e0befba60] [c00000000090c72c] release_
[172134.584174] [c000001e0befbab0] [c000000000998b98] tcp_recvmsg+
[172134.584183] [c000001e0befbbd0] [c0000000009d31dc] inet_recvmsg+
[172134.584188] [c000001e0befbc30] [c000000000906d24] sock_recvmsg+
[172134.584192] [c000001e0befbc70] [c0000000009088ac] SyS_recvfrom+
[172134.584198] [c000001e0befbdc0] [c000000000909a38] SyS_socketcall+
[172134.584216] [c000001e0befbe30] [c000000000009204] system_
-------
The first thread runs tcp_v4_do_rcv() which calls dst_release(
[172134.583075] dst_release: dst=c000001e28c
tcp_v4_do_rcv() then calls tcp_rcv_
My theory is that the same dst_entry that was released in tcp_v4_do_rcv() has now been referenced again in tcp_rcv_
The second thread, following the same path, calls dst_release(), scheduling a callback to destroy the dst. Sometime later, when the second callback runs, the crash occurs.
I am testing my theory by adding the following check in tcp_v4_do_rcv() to prevent the release of the dst when this condition exists.
+if ( dst != skb_dst(skb)) {
dst_release(dst);
sk->sk_rx_dst = NULL;
+}
This has run for 24 hours without failure. I will let it run a little longer to confirm.
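The theory and the test check can be walked through with a toy, single-threaded model. This is only a sketch under stated assumptions, not the kernel code: every `toy_` name is hypothetical, the cached dst is treated as always failing validation, a destroy is "scheduled" whenever the count drops to zero, and the re-caching step takes a reference even when a destroy is already pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_dst  { int refcnt; int destroys_scheduled; };
struct toy_skb  { struct toy_dst *dst; };
struct toy_sock { struct toy_dst *rx_dst; };

/* Drop one reference; schedule a destroy when the count reaches zero. */
static void toy_dst_release(struct toy_dst *dst)
{
    assert(dst->refcnt > 0);
    if (--dst->refcnt == 0)
        dst->destroys_scheduled++;
}

/*
 * Receive path of the model: release the cached dst (assumed invalid),
 * then re-cache the skb's dst if the socket has none.
 * With guarded == true the release is skipped when the cached dst is
 * the one the skb carries, mirroring the test check above.
 */
static void toy_do_rcv(struct toy_sock *sk, struct toy_skb *skb, bool guarded)
{
    struct toy_dst *dst = sk->rx_dst;

    if (dst && (!guarded || dst != skb->dst)) {
        toy_dst_release(dst);
        sk->rx_dst = NULL;
    }
    if (!sk->rx_dst) {
        skb->dst->refcnt++;  /* assumption: succeeds even with a destroy pending */
        sk->rx_dst = skb->dst;
    }
}

/* Run the receive path twice on the same socket and dst. */
static int toy_run(bool guarded)
{
    struct toy_dst dst = { 1, 0 };
    struct toy_sock sk = { &dst };
    struct toy_skb skb = { &dst };

    toy_do_rcv(&sk, &skb, guarded);  /* first thread */
    toy_do_rcv(&sk, &skb, guarded);  /* second thread */
    return dst.destroys_scheduled;
}
```

In this model `toy_run(false)` schedules two destroys for the same dst, which matches the second callback hitting an already-unlinked entry, while `toy_run(true)` schedules none. Real concurrency and RCU timing are left out.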
> Latest test patch from the community
>
> This test patch is based on discussions with the community (net-dev). I
> have high confidence that the changes to net_sk_rx_dst_set() will address
> the crashes we are seeing.
>
> We will make a test run with this patch over the weekend.
With the patch applied, the test ran over the weekend without a failure.
> This problem has been resolved and tested with the following patch. The
> patch has been submitted to <email address hidden>
>
> http://
Canonical, please pick up the following kernel patch:
http://
The patch was accepted on net-dev:
http://
tags: | added: architecture-ppc64 bugnameltc-132545 severity-critical targetmilestone-inin1510 |
Changed in ubuntu: | |
assignee: | nobody → Taco Screen team (taco-screen-team) |
affects: | ubuntu → linux (Ubuntu) |
Changed in linux (Ubuntu): | |
assignee: | Taco Screen team (taco-screen-team) → Canonical Kernel Team (canonical-kernel-team) |
importance: | Undecided → High |
status: | New → Triaged |
Changed in linux (Ubuntu Wily): | |
assignee: | nobody → Tim Gardner (timg-tpi) |
status: | New → In Progress |
Changed in linux (Ubuntu Wily): | |
status: | In Progress → Fix Committed |
Changed in linux (Ubuntu Xenial): | |
status: | Triaged → Fix Committed |
Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.
To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1526946/+editstatus and add the package name in the text box next to the word Package.
[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]