openvswitch, ppc64el: oops when calling kmem_cache_free from flow_free
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | In Progress | Medium | Unassigned |
Bug Description
[Impact]
Users of openvswitch on ppc64el 4.1+ kernels may run into the following kernel oops:
Faulting instruction address: 0xc000000000291d60
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=2048 NUMA PowerNV
Modules linked in: veth openvswitch libcrc32c ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_
CPU: 16 PID: 996 Comm: kworker/16:1 Not tainted 4.1.0-1-generic #1~rc2-Ubuntu
Workqueue: events od_dbs_timer
task: c000000fe49ea400 ti: c000000fe1380000 task.ti: c000000fe1380000
NIP: c000000000291d60 LR: c000000000292254 CTR: c000000000292140
REGS: c000000fe1383170 TRAP: 0300 Not tainted (4.1.0-1-generic)
MSR: 9000000000009033 <SF,HV,
CFAR: c000000000008468 DAR: 00000000ffffffff DSISR: 42000000 SOFTE: 1
GPR00: c000000000292254 c000000fe13833f0 c0000000014bda00 c000000ff701f800
GPR04: f0000000003fffc0 00000000ffffffff d0000000120cd694 0000000000002fb2
GPR08: 0000000000000000 0000000000000000 0000000000210d00 d0000000120d2520
GPR12: c000000000292140 c00000000fb89000 c0000000000de5f8 000000000000000a
GPR16: c000000febbe3828 0000000000000001 0000000000000000 c0000000013d0c80
GPR20: c000000000aa36f8 7fffffffffffffff 0000000000000000 0000000000000001
GPR24: c0000000013c9200 0000000000210d00 00000000ffffffff c000000ff701f800
GPR28: 0000000000000001 0000000000000000 0000000000000000 f0000000003fffc0
NIP [c000000000291d60] __slab_
LR [c000000000292254] kmem_cache_
Call Trace:
[c000000fe13833f0] [c000000fe1383500] 0xc000000fe1383500 (unreliable)
[c000000fe13834f0] [c000000000292254] kmem_cache_
[c000000fe1383570] [d0000000120cd694] flow_free+
[c000000fe13835b0] [c000000000136090] rcu_process_
[c000000fe1383660] [c0000000000b824c] __do_softirq+
[c000000fe1383760] [c0000000000b86d8] irq_exit+0xc8/0x100
[c000000fe1383780] [c00000000003ed78] doorbell_
[c000000fe13837b0] [c000000000003314] h_doorbell_
--- interrupt: e81 at osq_lock+0xb8/0x1f0
LR = mutex_optimisti
[c000000fe1383aa0] [c00000000013e280] add_timer_
[c000000fe1383ad0] [c000000000116f4c] mutex_optimisti
[c000000fe1383b30] [c000000000a60d04] __mutex_
[c000000fe1383bb0] [c000000000a60f18] mutex_lock+
[c000000fe1383be0] [c0000000008965bc] od_dbs_
[c000000fe1383c50] [c0000000000d6434] process_
[c000000fe1383ce0] [c0000000000d68e4] worker_
[c000000fe1383d80] [c0000000000de700] kthread+0x110/0x130
[c000000fe1383e30] [c0000000000094f4] ret_from_
Instruction dump:
614a0d00 7d484838 2fa80000 409e029c 3f200021 3b800001 63390d00 408200a4
e93b0022 82ff0018 ebdf0010 92e10078 <7fda492a> a1210078 3929ffff 79290420
---[ end trace bd509c1e05c7f71f ]---
This does not appear to happen in VMs running on ppc64el, and it does not occur on x86_64. I have also tested the latest mainline tree, and the oops occurs there as well.
With numa=off this problem doesn't occur.
reverting commit 3af229f2071f5b5
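Since the oops was not seen with numa=off, it can help to confirm how the running kernel was booted before trying to reproduce it. The following is a minimal sketch, not part of the original report: the helper name has_numa_off is hypothetical, and it only checks the kernel command line for the numa=off parameter.

```shell
#!/bin/sh
# Hypothetical helper (not from the bug report): succeeds if the given
# kernel command line contains the numa=off parameter as a whole word.
has_numa_off() {
    case " $1 " in
        *" numa=off "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report on the running kernel when /proc/cmdline is readable.
if [ -r /proc/cmdline ]; then
    if has_numa_off "$(cat /proc/cmdline)"; then
        echo "numa=off is set; the oops was not observed in this configuration"
    else
        echo "numa=off is not set"
    fi
fi
```

Running this on the affected ppc64el host before the test case shows whether the reported workaround is already in effect.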
[Test Case]
# apt-get install openvswitch-switch
# ip link add type veth peer name testveth0
# ovs-vsctl add-br integbr
description: | updated |
Changed in linux (Ubuntu): | |
assignee: | Chris J Arges (arges) → nobody |
Patch submitted upstream: https://lkml.org/lkml/2015/7/21/555