BUG: unable to handle kernel paging request at ffff820504022108

Bug #1615681 reported by Neil
This bug affects 1 person
Affects          Status    Importance   Assigned to   Milestone
linux (Ubuntu)   Invalid   Medium       Unassigned

Bug Description

We hit this kernel panic while running Trusty with kernel 3.19.0-26-generic.
It looks like a bug in the nf_nat module. I have a kdump and can upload it if needed.

[702076.560806] BUG: unable to handle kernel paging request at ffff820504022108
[702076.564428] IP: [<ffffffffc049e1c6>] nf_nat_setup_info+0x236/0x360 [nf_nat]
[702076.567971] PGD 0
[702076.571349] Oops: 0002 [#1] SMP
[702076.574656] Modules linked in: xt_nat iptable_nat nf_nat_ipv4 sch_sfq sch_htb xt_comment xt_iprange xt_physdev act_police cls_u32 sch_ingress ebt_dnat ebt_ip ebtable_nat ebt_arp veth xt_conntrack nbd nf_conntrack_netlink xt_TCPMSS xt_CT xt_set ip_set_hash_net ip_set nfnetlink iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter netconsole configfs xt_CHECKSUM xt_tcpudp iptable_mangle ip_tables x_tables bonding intel_rapl(E) iosf_mbi x86_pkg_temp_thermal(E) nf_nat intel_powerclamp(E) nf_conntrack_pptp nf_conntrack_proto_gre coretemp(E) br_netfilter crct10dif_pclmul ipmi_ssif(E) crc32_pclmul ghash_clmulni_intel aesni_intel bridge stp aes_x86_64 llc lrw ipmi_devintf(E) nf_conntrack_ipv4 gf128mul nf_defrag_ipv4 glue_helper ablk_helper cryptd mxm_wmi(E) dcdbas(E) sb_edac(E) mei_me(E)
[702076.613908] nf_conntrack mei(E) lpc_ich(E) edac_core(E) drbd(OE) ipmi_si(E) ipmi_msghandler 8250_fintek(E) shpchp(E) vhost_net vhost acpi_power_meter(E) macvtap wmi(E) macvlan mac_hid(E) kvm_intel kvm lp xfs parport libcrc32c ahci megaraid_sas(E) libahci tg3(E) ixgbe(OE) dca(E) vxlan ip6_udp_tunnel udp_tunnel ptp pps_core
[702076.638776] CPU: 19 PID: 0 Comm: swapper/19 Tainted: G OE 3.19.0-26-generic #28~14.04.1
[702076.649826] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.1.5 04/11/2016
[702076.661111] task: ffff883011c49d70 ti: ffff881812224000 task.ti: ffff881812224000
[702076.672654] RIP: 0010:[<ffffffffc049e1c6>] [<ffffffffc049e1c6>] nf_nat_setup_info+0x236/0x360 [nf_nat]
[702076.684662] RSP: 0018:ffff88301eb235b8 EFLAGS: 00010286
[702076.690647] RAX: ffff882fbcd21700 RBX: ffff882f17b1bd40 RCX: ffff820504022100
[702076.702499] RDX: ffff88180fb7f038 RSI: 000000003637ced5 RDI: ffffffffc04a04a0
[702076.714455] RBP: ffff88301eb23668 R08: 0000000000000000 R09: ffff88301eb235b8
[702076.726823] R10: ffff88301eb23560 R11: ffff88300eeb8000 R12: 000000000006fe07
[702076.739167] R13: 0000000000000000 R14: ffffffff81cda040 R15: ffff88301eb23678
[702076.751481] FS: 0000000000000000(0000) GS:ffff88301eb20000(0000) knlGS:0000000000000000
[702076.763823] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[702076.770007] CR2: ffff820504022108 CR3: 0000000001c16000 CR4: 00000000001427e0
[702076.782162] Stack:
[702076.788035] 000000002755e673 0000000000000000 a402c68b0002259b 0000000000000000
[702076.799835] 0006500000000000 000000002755e673 0000000000000000 a402c68b0002259b
[702076.811642] 0000000000000000 0006500000000000 ffff88300db05800 ffff8830001b0e70
[702076.823573] Call Trace:
[702076.829377] <IRQ>
[702076.829447]
[702076.835069] [<ffffffffc049e342>] __nf_nat_alloc_null_binding+0x52/0x80 [nf_nat]
[702076.846350] [<ffffffffc049e391>] nf_nat_alloc_null_binding+0x21/0x30 [nf_nat]
[702076.857721] [<ffffffffc05d283d>] nf_nat_ipv4_fn+0x1dd/0x230 [nf_nat_ipv4]
[702076.863548] [<ffffffffc05d7020>] ? iptable_nat_ipv4_fn+0x20/0x20 [iptable_nat]
[702076.874933] [<ffffffffc0518580>] ? br_parse_ip_options+0x1b0/0x1b0 [br_netfilter]
[702076.886328] [<ffffffffc05d2968>] nf_nat_ipv4_out+0x48/0xf0 [nf_nat_ipv4]
[702076.892120] [<ffffffffc0518580>] ? br_parse_ip_options+0x1b0/0x1b0 [br_netfilter]
[702076.903384] [<ffffffffc05d7085>] iptable_nat_ipv4_out+0x15/0x20 [iptable_nat]
[702076.914648] [<ffffffff816dcb1a>] nf_iterate+0x9a/0xb0
[702076.920275] [<ffffffffc0518580>] ? br_parse_ip_options+0x1b0/0x1b0 [br_netfilter]
[702076.931324] [<ffffffff816dcba4>] nf_hook_slow+0x74/0x130
[702076.936913] [<ffffffffc0518580>] ? br_parse_ip_options+0x1b0/0x1b0 [br_netfilter]
[702076.947932] [<ffffffffc05188a3>] br_nf_post_routing+0x283/0x370 [br_netfilter]
[702076.959286] [<ffffffffc0458c30>] ? deliver_clone+0x60/0x60 [bridge]
[702076.965049] [<ffffffffc0458c30>] ? deliver_clone+0x60/0x60 [bridge]
[702076.970612] [<ffffffff816dcb1a>] nf_iterate+0x9a/0xb0
[702076.976050] [<ffffffffc0458c30>] ? deliver_clone+0x60/0x60 [bridge]
[702076.981437] [<ffffffff816dcba4>] nf_hook_slow+0x74/0x130
[702076.986715] [<ffffffffc0458c30>] ? deliver_clone+0x60/0x60 [bridge]
[702076.992071] [<ffffffffc0458fc0>] ? br_flood+0x160/0x160 [bridge]
[702076.997487] [<ffffffffc0459012>] br_forward_finish+0x52/0x60 [bridge]
[702077.002855] [<ffffffffc0518a5b>] br_nf_forward_finish+0xcb/0x1d0 [br_netfilter]
[702077.013278] [<ffffffffc0518d8c>] br_nf_forward_ip+0x22c/0x400 [br_netfilter]
[702077.023700] [<ffffffffc0458fc0>] ? br_flood+0x160/0x160 [bridge]
[702077.029033] [<ffffffffc0458fc0>] ? br_flood+0x160/0x160 [bridge]
[702077.034166] [<ffffffff816dcb1a>] nf_iterate+0x9a/0xb0
[702077.039182] [<ffffffffc0458fc0>] ? br_flood+0x160/0x160 [bridge]
[702077.044116] [<ffffffff816dcba4>] nf_hook_slow+0x74/0x130
[702077.048952] [<ffffffffc0458fc0>] ? br_flood+0x160/0x160 [bridge]
[702077.053787] [<ffffffffc04591d0>] __br_forward+0xb0/0xf0 [bridge]
[702077.058482] [<ffffffffc04594c3>] br_forward+0x93/0xb0 [bridge]
[702077.063064] [<ffffffffc045a3cb>] br_handle_frame_finish+0x13b/0x540 [bridge]
[702077.071975] [<ffffffffc0519438>] br_nf_pre_routing_finish+0x138/0x3e0 [br_netfilter]
[702077.081089] [<ffffffffc0519993>] br_nf_pre_routing+0x2b3/0x67d [br_netfilter]
[702077.090349] [<ffffffffc045a290>] ? br_handle_local_finish+0x90/0x90 [bridge]
[702077.099691] [<ffffffff816dcb1a>] nf_iterate+0x9a/0xb0
[702077.104349] [<ffffffffc045a290>] ? br_handle_local_finish+0x90/0x90 [bridge]
[702077.113680] [<ffffffff816dcba4>] nf_hook_slow+0x74/0x130
[702077.118542] [<ffffffffc045a290>] ? br_handle_local_finish+0x90/0x90 [bridge]
[702077.128284] [<ffffffffc045a958>] br_handle_frame+0x188/0x250 [bridge]
[702077.133363] [<ffffffffc00c36cd>] ? ixgbe_clean_rx_irq+0x84d/0x1030 [ixgbe]
[702077.138488] [<ffffffff816a9e42>] __netif_receive_skb_core+0x1b2/0x790
[702077.143624] [<ffffffff816aa438>] __netif_receive_skb+0x18/0x60
[702077.148671] [<ffffffff816ab1a6>] process_backlog+0xa6/0x150
[702077.153592] [<ffffffff816aa8f9>] net_rx_action+0x159/0x340
[702077.158403] [<ffffffff81078ee4>] __do_softirq+0xe4/0x270
[702077.163088] [<ffffffff810792ad>] irq_exit+0x9d/0xb0
[702077.167676] [<ffffffff817b986a>] do_IRQ+0x5a/0xf0
[702077.172319] [<ffffffff817b766d>] common_interrupt+0x6d/0x6d
[702077.176967] <EOI>
[702077.177035]
[702077.181593] [<ffffffff8164fed0>] ? cpuidle_enter_state+0x70/0x170
[702077.186279] [<ffffffff8164febd>] ? cpuidle_enter_state+0x5d/0x170
[702077.190914] [<ffffffff81650087>] cpuidle_enter+0x17/0x20
[702077.195573] [<ffffffff810b5404>] cpu_startup_entry+0x334/0x3d0
[702077.200246] [<ffffffff810e9d63>] ? clockevents_register_device+0xe3/0x140
[702077.204961] [<ffffffff81048cf7>] start_secondary+0x197/0x1c0
[702077.209553] Code: c0 0f 84 3e 01 00 00 48 01 d0 48 89 58 10 49 8b 96 98 0b 00 00 4a 8d 14 e2 48 8b 0a 48 89 50 08 48 89 08 48 85 c9 48 89 02 74 04 <48> 89 41 08 48 c7 c7 a0 04 4a c0 e8 5a 82 31 c1 48 81 8b 80 00
[702077.223475] RIP [<ffffffffc049e1c6>] nf_nat_setup_info+0x236/0x360 [nf_nat]
[702077.228017] RSP <ffff88301eb235b8>
[702077.232409] CR2: ffff820504022108
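
As a reading aid for the trace above: the "Oops: 0002" field is the x86 page-fault error code, printed in hexadecimal. The following is a minimal sketch of decoding its low five bits. The type and function names (pf_info, decode_pf_error, print_pf_error, check_logged_error_code) are hypothetical and local to this sketch; only the bit meanings come from the x86 architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Low bits of the x86 page-fault error code (the number after "Oops:"). */
#define PF_PROT  (1u << 0) /* 0 = page not present, 1 = protection violation */
#define PF_WRITE (1u << 1) /* 0 = read access, 1 = write access */
#define PF_USER  (1u << 2) /* 0 = kernel mode, 1 = user mode */
#define PF_RSVD  (1u << 3) /* reserved bit set in a paging-structure entry */
#define PF_INSTR (1u << 4) /* fault happened on an instruction fetch */

/* Hypothetical helper type for this sketch, not a kernel structure. */
struct pf_info {
    bool protection;
    bool write;
    bool user;
    bool rsvd;
    bool instr;
};

/* Split an error code into its individual flags. */
static inline struct pf_info decode_pf_error(uint32_t code)
{
    struct pf_info info = {
        .protection = (code & PF_PROT) != 0,
        .write      = (code & PF_WRITE) != 0,
        .user       = (code & PF_USER) != 0,
        .rsvd       = (code & PF_RSVD) != 0,
        .instr      = (code & PF_INSTR) != 0,
    };
    return info;
}

/* Print a one-line, human-readable summary of an error code. */
static inline void print_pf_error(uint32_t code)
{
    struct pf_info i = decode_pf_error(code);

    printf("%04x: %s, %s access, %s mode\n", (unsigned)code,
           i.protection ? "protection violation" : "page not present",
           i.write ? "write" : "read",
           i.user ? "user" : "kernel");
}

/* The value logged in this report: "Oops: 0002". */
static inline void check_logged_error_code(void)
{
    struct pf_info i = decode_pf_error(0x0002);

    assert(!i.protection && i.write && !i.user && !i.rsvd && !i.instr);
}
```

Read this way, 0002 describes a kernel-mode write to an address with no page mapped, which is consistent with the faulting instruction being a store.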

Tags: trusty
Neil (loyou) wrote :

I attached the kernel log

Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1615681

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Neil (loyou) wrote :

The machine runs Ubuntu Server and cannot run apport-collect, so I attached the kernel log.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a kernel version where you were not having this particular problem? This will help determine if the problem you are seeing is the result of a regression, and when this regression was introduced. If this is a regression, we can perform a kernel bisect to identify the commit that introduced the problem.

Changed in linux (Ubuntu):
importance: Undecided → Medium
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.8 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.8-rc3

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Neil (loyou) wrote :

Thanks for your suggestion.
This issue did not appear after any update/upgrade, and it has happened only once.
Because we have no known steps to reproduce it, testing the latest upstream kernel would not tell us whether the problem is fixed.
Anyway, I will try to share more information from the current kdump.

Neil (loyou) wrote :

The kernel panicked on a paging request at ffff820504022108.
```
[702076.560806] BUG: unable to handle kernel paging request at ffff820504022108
[702076.564428] IP: [<ffffffffc049e1c6>] nf_nat_setup_info+0x236/0x360 [nf_nat]
```
The disassembly shows the instructions leading up to the faulting one; at that point rcx=ffff820504022100.
```
crash> dis -lr ffffffffc049e1c6
...
/root/ubuntu-trusty/net/netfilter/nf_nat_core.c: 429
0xffffffffc049e1a9 <nf_nat_setup_info+537>: mov 0xb98(%r14),%rdx
0xffffffffc049e1b0 <nf_nat_setup_info+544>: lea (%rdx,%r12,8),%rdx
/root/ubuntu-trusty/include/linux/rculist.h: 398
0xffffffffc049e1b4 <nf_nat_setup_info+548>: mov (%rdx),%rcx
/root/ubuntu-trusty/include/linux/rculist.h: 401
0xffffffffc049e1b7 <nf_nat_setup_info+551>: mov %rdx,0x8(%rax)
/root/ubuntu-trusty/include/linux/rculist.h: 400
0xffffffffc049e1bb <nf_nat_setup_info+555>: mov %rcx,(%rax)
/root/ubuntu-trusty/include/linux/rculist.h: 403
0xffffffffc049e1be <nf_nat_setup_info+558>: test %rcx,%rcx
/root/ubuntu-trusty/include/linux/rculist.h: 402
0xffffffffc049e1c1 <nf_nat_setup_info+561>: mov %rax,(%rdx)
/root/ubuntu-trusty/include/linux/rculist.h: 403
0xffffffffc049e1c4 <nf_nat_setup_info+564>: je 0xffffffffc049e1ca <nf_nat_setup_info+570>
/root/ubuntu-trusty/include/linux/rculist.h: 404
0xffffffffc049e1c6 <nf_nat_setup_info+566>: mov %rax,0x8(%rcx)
```
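
The register dump can be cross-checked against the lea at nf_nat_setup_info+544, which computes the bucket address as rdx + r12 * 8. The sketch below redoes that arithmetic with the logged RDX and R12 values. It assumes that RDX still holds the bucket address at the time of the fault (no later instruction in the listing writes to it), that R12 is the bucket index, and that each bucket is 8 bytes; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed size of one bucket: a struct hlist_head holds a single pointer,
 * which is 8 bytes on x86_64 and matches the scale factor in the lea. */
#define BUCKET_SIZE 8u

/* Bucket address as the lea computes it: base + index * 8. */
static inline uint64_t bucket_addr(uint64_t base, uint64_t index)
{
    return base + index * BUCKET_SIZE;
}

/* Inverse: recover the table base from a bucket address and its index. */
static inline uint64_t table_base(uint64_t bucket, uint64_t index)
{
    return bucket - index * BUCKET_SIZE;
}

/* Values taken from the register dump in this report. */
static inline void check_logged_registers(void)
{
    const uint64_t rdx = 0xffff88180fb7f038ULL; /* bucket address after the lea */
    const uint64_t r12 = 0x6fe07ULL;            /* index, presumably srchash */

    assert(table_base(rdx, r12) == 0xffff88180f800000ULL);
    assert(bucket_addr(0xffff88180f800000ULL, r12) == rdx);
}
```

Under these assumptions the table base comes out as ffff88180f800000, a round address. That hints that the bucket address itself was sane and that the bad value was the pointer stored in the bucket, but this is an inference from the dump rather than something established in this report.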
Source code around the faulting store, in rculist.h:
```
395 static inline void hlist_add_head_rcu(struct hlist_node *n,
396                                       struct hlist_head *h)
397 {
398         struct hlist_node *first = h->first;
399
400         n->next = first;
401         n->pprev = &h->first;
402         rcu_assign_pointer(hlist_first_rcu(h), n);
403         if (first)
404                 first->pprev = &n->next;
405 }
```
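On line 404, first=rcx=ffff820504022100, so the store to first->pprev (offset 8) faults at ffff820504022108, which matches CR2.
first is the head pointer of the hash bucket net->ct.nat_bysource[srchash], so that hlist must have been corrupted somewhere earlier.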
```
375 unsigned int
376 nf_nat_setup_info(struct nf_conn *ct,
377                   const struct nf_nat_range *range,
378                   enum nf_nat_manip_type maniptype)
379 {
...
420         if (maniptype == NF_NAT_MANIP_SRC) {
421                 unsigned int srchash;
422
423                 srchash = hash_by_src(net, nf_ct_zone(ct),
424                                       &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
425                 spin_lock_bh(&nf_nat_lock);
426                 /* nf_conntrack_alter_reply might re-allocate extension aera */
427                 nat = nfct_nat(ct);
428                 nat->ct = ct;
429                 hlist_add_head_rcu(&nat->bysource,
430                                    &net->ct.nat_bysource[srchash]);
431                 spin_unlock_bh(&nf_nat_lock);
432         }
```
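
To make the link between the bad bucket pointer and the fault address concrete, here is a minimal user-space sketch of the same insertion. It is not the kernel code: it drops the RCU barrier and uses hypothetical names (hlist_add_head_sketch, pprev_store_addr, check_fault_address). It assumes a 64-bit (LP64) build, where pprev sits at offset 8 in struct hlist_node.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same two-pointer layout as the kernel's hlist types. */
struct hlist_node {
    struct hlist_node *next;
    struct hlist_node **pprev;
};

struct hlist_head {
    struct hlist_node *first;
};

/* Non-RCU sketch of the insertion: same pointer updates, no barriers. */
static inline void hlist_add_head_sketch(struct hlist_node *n,
                                         struct hlist_head *h)
{
    struct hlist_node *first = h->first;

    n->next = first;
    n->pprev = &h->first;
    h->first = n;
    if (first)
        first->pprev = &n->next; /* writes to first + offsetof(pprev) */
}

/* Address written by the final store, given the value read from h->first. */
static inline uint64_t pprev_store_addr(uint64_t first)
{
    return first + offsetof(struct hlist_node, pprev);
}

/* With the logged rcx as "first", the store lands on the logged CR2
 * (assumes an LP64 build, where offsetof(pprev) is 8). */
static inline void check_fault_address(void)
{
    assert(pprev_store_addr(0xffff820504022100ULL) == 0xffff820504022108ULL);
}
```

The final store dereferences whatever was already in the bucket, so a stale or overwritten head pointer only shows up when the next entry is inserted into that bucket. The sketch shows where the fault surfaces, not where the corruption happened.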

penalvch (penalvch) wrote :

Neil, as per https://wiki.ubuntu.com/Kernel/LTSEnablementStack kernel series 3.19.x is EOL as of August 2016.

If you update to the supported enablement stack as per that article, is this reproducible?

Neil (loyou) wrote :

It is not reproducible; we have seen this issue only once.
The corrupted pointer points into a reserved part of the kernel address space (the guard hole reserved for the hypervisor), so I cannot tell whether it was caused by a logic error in the code or by memory corruption.
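
A rough way to see which region an address from the dump falls into is to compare it with the x86_64 virtual memory map. The sketch below assumes the 4-level-paging layout documented for kernels of this era (Documentation/x86/x86_64/mm.txt): guard hole at ffff800000000000 - ffff87ffffffffff and direct mapping at ffff880000000000 - ffffc7ffffffffff. The enum and function names are hypothetical, and all other kernel regions are lumped together.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical region labels for this sketch. */
enum addr_region {
    ADDR_USER,          /* 0000000000000000 - 00007fffffffffff */
    ADDR_NON_CANONICAL, /* hole created by sign extension of bit 47 */
    ADDR_GUARD_HOLE,    /* ffff800000000000 - ffff87ffffffffff (assumed) */
    ADDR_DIRECT_MAP,    /* ffff880000000000 - ffffc7ffffffffff (assumed) */
    ADDR_OTHER_KERNEL   /* vmalloc, modules, kernel text, and so on */
};

/* Classify a 64-bit virtual address against the assumed layout. */
static inline enum addr_region classify_addr(uint64_t addr)
{
    if (addr <= 0x00007fffffffffffULL)
        return ADDR_USER;
    if (addr < 0xffff800000000000ULL)
        return ADDR_NON_CANONICAL;
    if (addr <= 0xffff87ffffffffffULL)
        return ADDR_GUARD_HOLE;
    if (addr <= 0xffffc7ffffffffffULL)
        return ADDR_DIRECT_MAP;
    return ADDR_OTHER_KERNEL;
}

/* Addresses taken from the register dump in this report. */
static inline void check_logged_addresses(void)
{
    /* CR2, the faulting address */
    assert(classify_addr(0xffff820504022108ULL) == ADDR_GUARD_HOLE);
    /* RAX, the node being inserted */
    assert(classify_addr(0xffff882fbcd21700ULL) == ADDR_DIRECT_MAP);
}
```

If the assumed ranges hold for this kernel, the faulting address lies below the direct mapping, in a range that is never backed by kernel memory. That fits the "PGD 0" line in the oops, but it does not by itself distinguish a software bug from a hardware memory error.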

penalvch (penalvch) wrote :

Neil, closing out for now. If it's reproducible with a supported release/stack, please feel free to reopen.

Changed in linux (Ubuntu):
status: Incomplete → Invalid