[zesty] mlx5 OVS vxlan ipv6 LNST test cause Oops

Bug #1682418 reported by Talat Batheesh
This bug affects 1 person
Affects          Status    Importance  Assigned to  Milestone
linux (Ubuntu)   Expired   Medium      Unassigned
Zesty            Expired   Medium      Unassigned

Bug Description

The kernel crashes after the offload-enabled LNST IPv6 VXLAN OVS test (recipes/ovs_offload/1_virt_ovs_vxlan_ipv6.xml), together with the setup it creates, has been run repeatedly.
The test itself and other LNST tests pass; the crash happens during the shutdown phase.
The stack traces differ but usually relate to some kind of allocation (or ext4, inode); see the examples below.

Scenario:
1. Install the LNST tests:
   git clone https://github.com/jpirko/lnst.git && cd lnst && ./setup.py install
2. Prepare an OVS-offload-enabled setup (2 machines) connected back to back.
3. Enable 2 VMs on the mlx5 Physical Function on each machine.
4. Set up LNST on the VMs and the hypervisors (run lnst-slave).
5. Run the IPv6 VXLAN LNST test in a loop, for example:
   # lnst-ctl -d --pools=talat run recipes/ovs_offload/1_virt_ovs_vxlan_ipv6.xml
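
Step 5 can be scripted. The following is a minimal sketch and not part of the original report: `run_in_loop` is a hypothetical helper name, and the iteration limit is an arbitrary assumption. It stops only when the command exits non-zero or the limit is reached. A kernel Oops itself would show up in the kernel log, not as an exit status.

```shell
# Hypothetical helper: repeat a command until it fails or a limit is reached.
# Usage: run_in_loop MAX_ITERATIONS COMMAND [ARGS...]
run_in_loop() {
    ril_max="$1"
    shift
    ril_i=0
    while [ "$ril_i" -lt "$ril_max" ]; do
        ril_i=$((ril_i + 1))
        echo "iteration $ril_i"
        # Stop at the first non-zero exit status so the failing run is easy to find.
        "$@" || { echo "command failed on iteration $ril_i"; return 1; }
    done
    echo "completed $ril_i iterations"
}
```

For example, `run_in_loop 100 lnst-ctl -d --pools=talat run recipes/ovs_offload/1_virt_ovs_vxlan_ipv6.xml` would repeat the recipe up to 100 times.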

Call trace
 kernel: [76406.381439] Oops: 0000 [#1] SMP
 kernel: [76406.419297] Modules linked in: act_mirred act_gact act_tunnel_key cls_flower sch_ingress vport_vxlan vxlan ip6_udp_tunnel udp_tunnel vfio_pci vfio_iommu_type1 vfio_virqfd vfio mlx5_ib ib_core nfsv3 nfs fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bridge stp llc openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_defrag_ipv6 nf_nat nf_conntrack intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd ipmi_ssif intel_cstate ipmi_si input_leds joydev ipmi_devintf
 kernel: [76406.981750] mei_me dcdbas intel_rapl_perf shpchp mei ipmi_msghandler lpc_ich mac_hid acpi_power_meter configfs nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear mlx4_en hid_generic tg3 mlx5_core usbhid mlx4_core ahci ptp mxm_wmi hid libahci megaraid_sas devlink pps_core fjes wmi
 kernel: [76407.335099] CPU: 25 PID: 5253 Comm: ip Not tainted 4.10.0-19-generic #21-Ubuntu
 kernel: [76407.446475] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
 kernel: [76407.558645] task: ffff9a2b76f89680 task.stack: ffffbda6c76a8000
 kernel: [76407.618666] RIP: 0010:rb_erase+0x194/0x350
 kernel: [76407.676596] RSP: 0018:ffffbda6c76ab4f0 EFLAGS: 00010046
 kernel: [76407.735460] RAX: ffff9a2c2cc30bc0 RBX: ffff9a2c53372d18 RCX: 0000000000000000
 kernel: [76407.797100] RDX: 0000000000000000 RSI: ffff9a2c53372d20 RDI: ffff9a2c2cc30a40
 kernel: [76407.858831] RBP: ffffbda6c76ab4f0 R08: 0000000000000000 R09: 000000018040002e
 kernel: [76407.921323] R10: ffff9a2c2cc30b40 R11: 00000000000f9e00 R12: ffff9a2c2cc30a40
 kernel: [76407.984793] R13: ffff9a2c53372d18 R14: 0000000000000046 R15: ffff9a2c5536b800
 kernel: [76408.048453] FS: 00007f3d96082d80(0000) GS:ffff9a2c5f300000(0000) knlGS:0000000000000000
 kernel: [76408.166912] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 kernel: [76408.227997] CR2: 0000000000000000 CR3: 00000010181b5000 CR4: 00000000003426e0
 kernel: [76408.290488] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 kernel: [76408.351513] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 kernel: [76408.410855] Call Trace:
 kernel: [76408.463227] private_free_iova+0x37/0x80
 kernel: [76408.516647] iova_magazine_free_pfns+0x41/0x80
 kernel: [76408.569805] free_iova_fast+0xd4/0x210
 kernel: [76408.620830] flush_unmaps_timeout+0xc5/0x1c0
 kernel: [76408.671569] intel_unmap+0x15a/0x210
 kernel: [76408.720173] intel_unmap_page+0xe/0x10
 kernel: [76408.767571] mlx5e_page_release+0xae/0x110 [mlx5_core]
 kernel: [76408.816057] mlx5e_destroy_rq+0xd8/0x130 [mlx5_core]
 kernel: [76408.863341] mlx5e_close_channel+0xd1/0x1d0 [mlx5_core]
 kernel: [76408.910105] mlx5e_close_channels+0xd9/0x110 [mlx5_core]
 kernel: [76408.955876] mlx5e_close_locked+0x60/0x80 [mlx5_core]
 kernel: [76409.000267] mlx5e_close+0x33/0x50 [mlx5_core]
 kernel: [76409.042831] __dev_close_many+0x99/0x100
 kernel: [76409.083537] __dev_close+0x45/0x70
 kernel: [76409.122035] __dev_change_flags+0x9d/0x160
 kernel: [76409.160600] dev_change_flags+0x29/0x60
 kernel: [76409.198250] do_setlink+0x338/0xd20
 kernel: [76409.235159] ? nla_parse+0x31/0x110
 kernel: [76409.270899] rtnl_newlink+0x5c6/0x860
 kernel: [76409.306535] ? security_capable+0x20/0x60
 kernel: [76409.341989] ? ns_capable_common+0x68/0x80
 kernel: [76409.377215] ? ns_capable+0x13/0x20
 kernel: [76409.411543] rtnetlink_rcv_msg+0xe6/0x210
 kernel: [76409.445368] ? __kmalloc_node_track_caller+0x1de/0x2a0
 kernel: [76409.480369] ? __alloc_skb+0x87/0x1e0
 kernel: [76409.512880] ? rtnl_newlink+0x860/0x860
 kernel: [76409.544854] netlink_rcv_skb+0xa4/0xc0
 kernel: [76409.576628] rtnetlink_rcv+0x28/0x30
 kernel: [76409.608717] netlink_unicast+0x18c/0x220
 kernel: [76409.640348] netlink_sendmsg+0x2f7/0x3b0
 kernel: [76409.671522] ? aa_sock_msg_perm+0x61/0x150
 kernel: [76409.702695] sock_sendmsg+0x38/0x50
 kernel: [76409.733084] ___sys_sendmsg+0x2c2/0x2d0
 kernel: [76409.763826] ? mem_cgroup_commit_charge+0x7e/0x510
 kernel: [76409.796237] ? lru_cache_add_active_or_unevictable+0x36/0xb0
 kernel: [76409.830626] ? handle_mm_fault+0xf9b/0x1360
 kernel: [76409.863080] ? __dentry_kill+0x110/0x160
 kernel: [76409.894984] __sys_sendmsg+0x54/0x90
 kernel: [76409.926068] SyS_sendmsg+0x12/0x20
 kernel: [76409.956873] entry_SYSCALL_64_fastpath+0x1e/0xad
 kernel: [76409.989499] RIP: 0033:0x7f3d95799237
 kernel: [76410.020795] RSP: 002b:00007ffd54522db8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
 kernel: [76410.087104] RAX: ffffffffffffffda RBX: 00007ffd5452aec0 RCX: 00007f3d95799237
 kernel: [76410.127078] RDX: 0000000000000000 RSI: 00007ffd54522e00 RDI: 0000000000000003
 kernel: [76410.167215] RBP: 00007ffd54522e00 R08: 0000000000000001 R09: fefefeff77686d74
 kernel: [76410.207897] R10: 00007ffd5452c7c0 R11: 0000000000000246 R12: 00007ffd54522e40
 kernel: [76410.248850] R13: 00005612b473a020 R14: 00007ffd5452aec0 R15: 0000000000000000
 kernel: [76410.290185] Code: 10 f6 c2 01 0f 84 d3 00 00 00 48 83 e2 fc 0f 84 1e ff ff ff 48 89 c1 48 89 d0 48 8b 50 08 48 39 ca 0f 85 71 ff ff ff 48 8b 50 10 <f6> 02 01 75 3a 48 8b 7a 08 48 89 c1 48 83 c9 01 48 89 78 10 48
 kernel: [76410.416172] RIP: rb_erase+0x194/0x350 RSP: ffffbda6c76ab4f0
 kernel: [76410.458747] CR2: 0000000000000000
 kernel: [76410.498095] ---[ end trace 8d9a539d70087300 ]---

other traces

kernel: general protection fault: 0000 [#1] SMP
 kernel: Modules linked in: act_mirred act_gact act_tunnel_key cls_flower sch_ingress vport_vxlan vxlan ip6_udp_tunnel udp_tunnel vfio_pci vfio_iommu_type1 vfio_virqfd vfio mlx5_ib ib_core nfsv3 rpcsec_gss_kr
 kernel: lpc_ich dcdbas intel_rapl_perf ipmi_devintf shpchp ipmi_msghandler mac_hid acpi_power_meter nfsd configfs auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 btrfs raid10 raid456 async
 kernel: CPU: 19 PID: 1902 Comm: ovs-vswitchd Not tainted 4.10.8+ #13
 kernel: Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
 kernel: task: ffff8d28f135ad00 task.stack: ffffadb848858000
 kernel: RIP: 0010:kmem_cache_alloc_trace+0x7b/0x1c0
 kernel: RSP: 0018:ffffadb84885bda0 EFLAGS: 00010286
 kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000000028da
 kernel: RDX: 00000000000028d9 RSI: 00000000014000c0 RDI: 000000000001c5c0
 kernel: RBP: ffffadb84885bde0 R08: ffff8d311f25c5c0 R09: ffff8d291f407980
 kernel: R10: ffff006400000000 R11: ffff8d28f56c5280 R12: 00000000014000c0
 kernel: R13: ffffffff9218255b R14: 00007ffce91a4c70 R15: ffff8d291f407980
 kernel: FS: 00007fda84415940(0000) GS:ffff8d311f240000(0000) knlGS:0000000000000000
 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 kernel: CR2: 0000564f3e7bc224 CR3: 0000000830bfc000 CR4: 00000000003426e0
 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 kernel: Call Trace:
 kernel: ? kmem_cache_alloc+0xd7/0x1b0
 kernel: sock_alloc_inode+0x3b/0xc0
 kernel: alloc_inode+0x1d/0x90
 kernel: new_inode_pseudo+0x11/0x60
 kernel: sock_alloc+0x1c/0x80
 kernel: SYSC_accept4+0x71/0x210
 kernel: ? ____fput+0xe/0x10
 kernel: ? task_work_run+0x83/0xa0
 kernel: SyS_accept+0x10/0x20
 kernel: entry_SYSCALL_64_fastpath+0x1e/0xad
 kernel: RIP: 0033:0x7fda8394f8ed
 kernel: RSP: 002b:00007ffce91a4c50 EFLAGS: 00000293 ORIG_RAX: 000000000000002b
 kernel: RAX: ffffffffffffffda RBX: 000055e38935c6a0 RCX: 00007fda8394f8ed
 kernel: RDX: 00007ffce91a4c6c RSI: 00007ffce91a4c70 RDI: 0000000000000069
 kernel: RBP: 000055e3893800d0 R08: 0000000000000000 R09: 0000000000000001
 kernel: R10: 000000000003516c R11: 0000000000000293 R12: 00007ffce91a4da0
 kernel: R13: 00007ffce91a4c70 R14: 00007ffce91a4d40 R15: 000055e38937dc40
 kernel: Code: 08 65 4c 03 05 f7 1d 3f 6e 49 83 78 10 00 4d 8b 10 0f 84 f0 00 00 00 4d 85 d2 0f 84 e7 00 00 00 49 63 41 20 48 8d 4a 01 49 8b 39 <49> 8b 1c 02 4c 89 d0 65 48 0f c7 0f 0f 94 c0 84 c0 74 bb 49 63

 kernel: general protection fault: 0000 [#1] SMP
 kernel: Modules linked in: act_tunnel_key act_gact act_mirred cls_flower mlx5_ib mlx5_core mst_pciconf(OE) mst_pci(OE) ib_umad nfsv3 nfs fscache vfio_pci vfio_iommu_type1 vfio_virqfd vfio netconsole ib_core
 kernel: stp llc ipmi_si intel_cstate joydev input_leds mei_me ipmi_devintf mei lpc_ich intel_rapl_perf shpchp dcdbas ipmi_msghandler mac_hid acpi_power_meter nfsd auth_rpcgss nfs_acl lockd grace sunrpc conf
 kernel: CPU: 37 PID: 12495 Comm: modprobe Tainted: G OE 4.10.6+ #8
 kernel: Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
 kernel: task: ffffa0b485b60000 task.stack: ffffb84c89260000
 kernel: RIP: 0010:__kmalloc+0xbc/0x200
 kernel: RSP: 0018:ffffb84c89263be0 EFLAGS: 00010286
 kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000018061
 kernel: RDX: 0000000000018060 RSI: 0000000000000000 RDI: 000000000001c5c0
 kernel: RBP: ffffb84c89263c18 R08: ffffa0b49f49c5c0 R09: ffffa0ac9f407980
 kernel: R10: ffff006400000000 R11: 000000006f6e736f R12: 00000000014080c0
 kernel: R13: 0000000000000040 R14: ffffffff8c6d0f3e R15: ffffa0ac9f407980
 kernel: FS: 00007f4ef6650700(0000) GS:ffffa0b49f480000(0000) knlGS:0000000000000000
 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 kernel: CR2: 000055cdb53d4228 CR3: 000000103ace8000 CR4: 00000000003426e0
 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 kernel: Call Trace:
 kernel: ? ext4fs_dirhash+0xc2/0x2b0
 kernel: ext4_htree_store_dirent+0x3e/0x120
 kernel: htree_dirblock_to_tree+0xf3/0x290
 kernel: ? dput+0x34/0x250
 kernel: ext4_htree_fill_tree+0xb5/0x320
 kernel: ? kmem_cache_alloc_trace+0xdb/0x1c0
 kernel: ext4_readdir+0x701/0xa20
 kernel: ? lru_cache_add_active_or_unevictable+0x36/0xb0
 kernel: iterate_dir+0x172/0x1a0
 kernel: SyS_getdents+0x99/0x120
 kernel: ? fillonedir+0x100/0x100
 kernel: entry_SYSCALL_64_fastpath+0x1e/0xad
 kernel: RIP: 0033:0x7f4ef614331b
 kernel: RSP: 002b:00007fffd33e0b60 EFLAGS: 00000206 ORIG_RAX: 000000000000004e
 kernel: RAX: ffffffffffffffda RBX: 00007f4ef643cb58 RCX: 00007f4ef614331b
 kernel: RDX: 0000000000008000 RSI: 000055cdb53cc220 RDI: 0000000000000000
 kernel: RBP: 00007f4ef643cb00 R08: 00007f4ef643cbb8 R09: 0000000000000000
 kernel: R10: 000055cdb53cc1f0 R11: 0000000000000206 R12: 00007f4ef643cb58
 kernel: R13: 0000000000008040 R14: 00007f4ef643cb58 R15: 000000000000270f
 kernel: Code: 08 65 4c 03 05 c6 08 9f 73 49 83 78 10 00 4d 8b 10 0f 84 d5 00 00 00 4d 85 d2 0f 84 cc 00 00 00 49 63 41 20 48 8d 4a 01 49 8b 39 <49> 8b 1c 02 4c 89 d0 65 48 0f c7 0f 0f 94 c0 84 c0 74 bb 49 63
 kernel: RIP: __kmalloc+0xbc/0x200 RSP: ffffb84c89263be0
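
Since the faulting function differs between traces, it can help to pull just the kernel-mode RIP symbol out of each one. This is a minimal sketch, assuming the log is fed on standard input and uses the `RIP: 0010:<symbol>+<offset>/<size>` form seen in the traces above. `oops_rip_symbols` is a hypothetical helper name, not an existing tool.

```shell
# Hypothetical helper: print the faulting kernel symbol for every
# "RIP: 0010:<symbol>+0x<offset>/0x<size>" line read from standard input.
# User-space RIP lines (segment 0033) and the summary "RIP: <symbol>..." line
# do not match the pattern and are skipped.
oops_rip_symbols() {
    sed -n 's/.*RIP: 0010:\([A-Za-z0-9_.]*\)+0x[0-9a-f]*\/0x[0-9a-f]*.*/\1/p'
}
```

Piping a kernel log through `oops_rip_symbols | sort | uniq -c` gives a count per faulting function.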

tags: added: zesty
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1682418

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.11 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.11-rc7

Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: kernel-da-key
Talat Batheesh (talat-b87) wrote :

An additional trace from this test is below:

[80921.628988] general protection fault: 0000 [#1] SMP
[80921.630449] Modules linked in: act_mirred act_tunnel_key cls_flower sch_ingress vport_vxlan vxlan ip6_udp_tunnel udp_tunnel vfio_pci vfio_iommu_type1 vfio_virqfd vfio mlx5_ib ib_core openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_defrag_ipv6 nfsv3 nfs fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat libcrc32c nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bridge stp llc binfmt_misc ipmi_ssif intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd intel_cstate ipmi_si mei_me ipmi_devintf
[80921.652556] dcdbas intel_rapl_perf mei shpchp lpc_ich ipmi_msghandler acpi_power_meter mac_hid nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 mlx4_en mxm_wmi mlx5_core mlx4_core tg3 ahci megaraid_sas ptp libahci devlink pps_core fjes wmi
[80921.659813] CPU: 7 PID: 5743 Comm: kworker/u82:0 Not tainted 4.10.0-19-generic #21-Ubuntu
[80921.662346] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
[80921.664686] Workqueue: events_freezable_power_ disk_events_workfn
[80921.666586] task: ffff96f69a285a00 task.stack: ffffacc627394000
[80921.668437] RIP: 0010:kmem_cache_alloc+0x77/0x1a0
[80921.669902] RSP: 0018:ffffacc6273978a8 EFLAGS: 00010086
[80921.671531] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000011c41
[80921.673758] RDX: 0000000000011c40 RSI: 0000000001080020 RDI: 000000000001c5c0
[80921.675985] RBP: ffffacc6273978d8 R08: ffff96f6ff0dc5c0 R09: ffff006400000000
[80921.729078] R10: ffff96f6af006630 R11: 0000000000000000 R12: 0000000001080020
[80921.782687] R13: ffffffff957acd99 R14: ffff96eedf407980 R15: ffff96eedf407980
[80921.836867] FS: 0000000000000000(0000) GS:ffff96f6ff0c0000(0000) knlGS:0000000000000000
[80921.944601] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[80921.998508] CR2: 00007fd1e8a75a08 CR3: 0000001062c09000 CR4: 00000000003426e0
[80922.052471] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[80922.105264] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[80922.156346] Call Trace:
[80922.205695] ? kmem_cache_alloc+0xd3/0x1a0
[80922.254432] alloc_iova+0x49/0x240
[80922.301753] alloc_iova_fast+0x55/0x200
[80922.347916] intel_alloc_iova+0xac/0xe0
[80922.392736] intel_map_sg+0xc2/0x220
[80922.436095] ? lock_timer_base+0x81/0xa0
[80922.478325] ata_qc_issue+0x204/0x320

Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu Zesty) because there has been no activity for 60 days.]

Changed in linux (Ubuntu Zesty):
status: Incomplete → Expired