OVS causes kernel oops SMP, unhandled paging request on one compute node. Fuel 7.0 HA neutron+gre deployment.
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Fuel for OpenStack | Invalid | High | Ivan Suzdal | | |
Bug Description
This deployment is our second round of testing. It is deployed as part of our production network, using hardware we already had on site.
Steps of the attempted deployment using Fuel 7.0:
Create a new cluster selecting neutron with gre tunneling.
Deploy three controllers, two ceph storage nodes, four compute nodes (see hardware of nodes for specific failure).
Options changed from default include:
Select Neutron L2 population and Neutron DVR. Select HTTPS and TLS.
Verify networks & deploy cluster.
Expected result:
Deployment succeeds on all nodes.
Actual result:
Deployment fails due to timeout.
Deployment hangs at "(/Stage[
Workaround:
Remove the supermicro node and redeploy.
Impact:
Deployment cannot be completed with the desired hardware.
Description of environment:
Fuel 7, Kilo on Ubuntu 14.04, HA, Neutron+GRE
Fuel version:
{"build_id": "301", "build_number": "301", "release_versions": {"2015.1.0-7.0": {"VERSION": {"build_id": "301", "build_number": "301", "api": "1.0", "fuel-library_sha": "5d50055aeca1dd
All nodes except one compute node are PowerEdge T410s with an Intel E5620 processor, 32 GB DDR3 1333 MHz ECC RAM, DP Broadcom net extreme 3, DP Intel 82576, and varying HDD configurations. Deployment to all of these nodes succeeds when the Supermicro node is removed (see below).
Three of the above are compute nodes; the fourth compute node is a Supermicro H8DG6/H8DGi with an AMD Opteron 6344, 64 GB DDR3 1333 MHz ECC RAM, DP Intel 82576, DP Intel I350, and 6x 146 GB 15k SAS drives in RAID 10. (Hardware dumps included.)
On this Supermicro compute node, OVS processes begin to hang starting with
(/Stage[
For each bridge, the new OVS process hangs until the kernel oopses at
(/Stage[
where the deployment hangs.
Here is the dmesg output:
[ 358.721109] gre: GRE over IPv4 demultiplexor driver
[ 358.721314] openvswitch: module verification failed: signature and/or required key missing - tainting kernel
[ 358.721969] openvswitch: Open vSwitch switching datapath 2.3.1, built Oct 13 2015 20:19:34
[ 362.877270] bonding: Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
[ 363.028297] Bridge firewalling registered
[ 363.179052] 8021q: 802.1Q VLAN Support v1.8
[ 363.179075] 8021q: adding VLAN 0 to HW filter on device eth2
[ 364.374112] device eth2 entered promiscuous mode
[ 364.377476] br-fw-admin: port 1(eth2) entered forwarding state
[ 364.377512] br-fw-admin: port 1(eth2) entered forwarding state
[ 379.429473] br-fw-admin: port 1(eth2) entered forwarding state
[ 397.631393] IPv6: ADDRCONF(
[ 397.631402] 8021q: adding VLAN 0 to HW filter on device eth3
[ 397.634577] device eth3.101 entered promiscuous mode
[ 397.637758] device eth3 entered promiscuous mode
[ 397.637955] IPv6: ADDRCONF(
[ 399.834616] igb: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 399.834902] IPv6: ADDRCONF(
[ 399.835237] IPv6: ADDRCONF(
[ 399.835312] br-mgmt: port 1(eth3.101) entered forwarding state
[ 399.835338] br-mgmt: port 1(eth3.101) entered forwarding state
[ 410.142175] IPv6: ADDRCONF(
[ 410.142182] 8021q: adding VLAN 0 to HW filter on device eth1
[ 410.146786] device eth1.103 entered promiscuous mode
[ 410.148362] device eth1 entered promiscuous mode
[ 410.148541] IPv6: ADDRCONF(
[ 413.485542] igb: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 413.485723] IPv6: ADDRCONF(
[ 413.486055] IPv6: ADDRCONF(
[ 413.486128] br-storage: port 1(eth1.103) entered forwarding state
[ 413.486145] br-storage: port 1(eth1.103) entered forwarding state
[ 414.893500] br-mgmt: port 1(eth3.101) entered forwarding state
[ 422.624921] IPv6: ADDRCONF(
[ 422.624930] 8021q: adding VLAN 0 to HW filter on device eth0
[ 422.628780] device eth0.104 entered promiscuous mode
[ 422.630994] device eth0 entered promiscuous mode
[ 422.631183] IPv6: ADDRCONF(
[ 422.878019] device ovs-system entered promiscuous mode
[ 422.878166] BUG: unable to handle kernel paging request at 0000000000001e08
[ 422.880136] IP: [<ffffffff81158
[ 422.881775] PGD 7fb708067 PUD 7fa6d4067 PMD 0
[ 422.882814] Oops: 0000 [#1] SMP
[ 422.883771] Modules linked in: 8021q garp mrp bridge stp llc bonding openvswitch(OX) gre vxlan ip_tunnel iptable_filter ip_tables x_tables kvm_amd kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper joydev cryptd amd64_edac_mod shpchp edac_core edac_mce_amd k10temp fam15h_power i2c_piix4 serio_raw mac_hid nf_conntrack_
[ 422.904913] CPU: 6 PID: 25833 Comm: ovs-vswitchd Tainted: G OX 3.13.0-65-generic #105-Ubuntu
[ 422.907350] Hardware name: Supermicro H8DG6/H8DGi/
[ 422.909653] task: ffff8807fa2ce000 ti: ffff8807f85a6000 task.ti: ffff8807f85a6000
[ 422.911712] RIP: 0010:[<
[ 422.914120] RSP: 0018:ffff8807f8
[ 422.915287] RAX: 0000000000001e00 RBX: 00000000002012d0 RCX: 0000000000000000
[ 422.916980] RDX: 0000000000001e00 RSI: 0000000000000000 RDI: 00000000002012d0
[ 422.918874] RBP: ffff8807f85a77f8 R08: 0000000040000000 R09: ffffea001fdf1de0
[ 422.920900] R10: ffffffffa0351100 R11: ffff880807400f90 R12: 0000000000000080
[ 422.922763] R13: 00000000002012d0 R14: 0000000000000000 R15: 0000000000000000
[ 422.924675] FS: 00007f1de84cf98
[ 422.926799] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 422.928325] CR2: 0000000000001e08 CR3: 00000007fa27b000 CR4: 00000000000407e0
[ 422.930370] Stack:
[ 422.930899] ffff8807f8427000 ffff8807f7cb8300 ffffffff81ce4260 ffff8807fa267800
[ 423.045957] ffff8807fa267a58 ffff8807f85a7708 ffffffff816c8f5a ffff8807f85a78c0
[ 423.161270] ffffffff81638235 0000000000000000 ffff8807fa2ce000 ffff8807fa2ce000
[ 423.276231] Call Trace:
[ 423.389802] [<ffffffff816c8
[ 423.502871] [<ffffffff81638
[ 423.616877] [<ffffffff811a2
[ 423.730458] [<ffffffff81197
[ 423.844103] [<ffffffff811a0
[ 423.957402] [<ffffffff81720
[ 424.071890] [<ffffffffa0351
[ 424.187761] [<ffffffff8136d
[ 424.304359] [<ffffffff8147c
[ 424.422442] [<ffffffff811a4
[ 424.540677] [<ffffffff811a3
[ 424.659841] [<ffffffffa0351
[ 424.779445] [<ffffffffa0349
[ 424.898701] [<ffffffffa0349
[ 425.015557] [<ffffffff81654
[ 425.131605] [<ffffffff8138d
[ 425.245315] [<ffffffff81315
[ 425.358257] [<ffffffff8138d
[ 425.468903] [<ffffffff81656
[ 425.579213] [<ffffffff81656
[ 425.688153] [<ffffffff81656
[ 425.796983] [<ffffffff81654
[ 425.910393] [<ffffffff81655
[ 426.026483] [<ffffffff81654
[ 426.134650] [<ffffffff81654
[ 426.238983] [<ffffffff81651
[ 426.342257] [<ffffffff81652
[ 426.445303] [<ffffffff8160e
[ 426.546880] [<ffffffff8160e
[ 426.646218] [<ffffffff81653
[ 426.742477] [<ffffffff811a3
[ 426.836976] [<ffffffff81316
[ 426.930216] [<ffffffff8160e
[ 427.019472] [<ffffffff811db
[ 427.106684] [<ffffffff8160f
[ 427.192348] [<ffffffff8160f
[ 427.277234] [<ffffffff81734
[ 427.361734] Code: c1 e8 13 41 83 e7 02 83 e0 01 41 09 c7 23 1d ba da bb 00 48 c7 45 b8 00 00 00 00 f6 c3 10 41 89 dd 0f 85 5e 02 00 00 48 8b 45 98 <48> 83 78 08 00 0f 84 a6 01 00 00 66 66 66 66 90 0f b6 4d a0 b8
[ 427.540019] RIP [<ffffffff81158
[ 427.629146] RSP <ffff8807f85a76d0>
[ 427.715253] CR2: 0000000000001e08
[ 427.798145] ---[ end trace d91033ced793f0ec ]---
[ 427.878825] igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 427.958666] IPv6: ADDRCONF(
[ 428.035881] IPv6: ADDRCONF(
[ 428.111518] br-ex: port 1(eth0.104) entered forwarding state
[ 428.185472] br-ex: port 1(eth0.104) entered forwarding state
[ 428.528555] br-storage: port 1(eth1.103) entered forwarding state
[ 443.223230] br-ex: port 1(eth0.104) entered forwarding state
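When triaging a log like the one above, it can help to pull the key fields of the oops out programmatically. The following is a minimal sketch, not part of the original report: the function name `summarize_oops` and the file name in the usage note are hypothetical, and the regular expressions assume the dmesg layout shown above. In this trace, the byte sequence marked in the Code line, `<48> 83 78 08 00`, appears to decode to `cmpq $0x0,0x8(%rax)`, so the faulting address should equal RAX + 8 (0x1e00 + 8 = 0x1e08), which matches CR2.

```python
import re

# Start of the oops report; captures the faulting virtual address.
FAULT_RE = re.compile(r"BUG: unable to handle kernel paging request at ([0-9a-f]+)")
# Line naming the CPU, PID and command that triggered the oops.
TASK_RE = re.compile(r"CPU: (\d+) PID: (\d+) Comm: (\S+)")
# Register dumps such as "RAX: 0000000000001e00" (full 16-digit values only).
REG_RE = re.compile(r"\b(R[A-Z0-9]{1,2}|CR\d): ([0-9a-f]{16})\b")


def summarize_oops(dmesg_text):
    """Return a dict describing the first paging-request oops, or None."""
    fault = FAULT_RE.search(dmesg_text)
    if fault is None:
        return None
    summary = {"fault_address": int(fault.group(1), 16)}
    # Only look at text after the BUG line so earlier output is ignored.
    task = TASK_RE.search(dmesg_text, fault.end())
    if task:
        summary["cpu"] = int(task.group(1))
        summary["pid"] = int(task.group(2))
        summary["comm"] = task.group(3)
    summary["registers"] = {
        name: int(value, 16)
        for name, value in REG_RE.findall(dmesg_text[fault.end():])
    }
    return summary
```

Assuming the log has been saved to a file (for example a hypothetical `dmesg.txt`), `summarize_oops(open("dmesg.txt").read())` returns the faulting address, the offending process, and the register values, so a check such as `registers["RAX"] + 8 == fault_address` can be scripted across several captures.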
I have also attached the full dmesg, dmidecode, lspci, lshw, and everything in /var/log in the following format. supermicro-
Also, I have tested the memory and all disks on this server, just to be on the safe side. All passed.
Any feedback will be greatly appreciated. Please let me know if I can gather any more information.
There now seems to be an issue with the remote connection from which I was transferring the logs. I will attach the logs in the morning.
description: | updated |
Changed in fuel: | |
milestone: | none → 8.0 |
status: | Incomplete → Confirmed |
importance: | Undecided → High |
Changed in fuel: | |
assignee: | nobody → MOS Linux (mos-linux) |
Changed in fuel: | |
assignee: | MOS Linux (mos-linux) → Ivan Suzdal (isuzdal) |
tags: | added: area-linux |
@Rich, could you please provide a diagnostic snapshot?