kernel oops and reboot on steady state AIO-DX controller
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
StarlingX | Fix Released | High | Bin Yang | |
Bug Description
Brief Description
-----------------
After installing an AIO-DX (two-node) system with the stx-openstack application, controller-0 hit a kernel oops and spontaneously rebooted after being up for about 58 hours. There were no nova instances running and no other activity on the system at the time.
Severity
--------
Major: controller should not spontaneously reboot
Steps to Reproduce
------------------
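Install an AIO-DX system with the stx-openstack application and leave it idle. The oops occurred after about 58 hours of uptime.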
Expected Behavior
------------------
No kernel oops or spontaneous reboots.
Actual Behavior
----------------
The following kernel oops occurred:
[211368.883501] BUG: unable to handle kernel paging request at 0000000000001118
[211368.890583] IP: [<ffffffff8aa4c
[211368.897121] PGD b84e39067 PUD 0
[211368.900476] Oops: 0000 [#1] PREEMPT SMP
[211368.904540] Modules linked in: xt_REDIRECT nf_nat_redirect xt_connmark ip6table_raw ip6table_mangle xt_CHECKSUM ebtable_filter ebtables ip6table_filter vxlan ip6_udp_tunnel udp_tunnel ip_gre gre ip6_tables xt_recent rbd libceph dns_resolver xt_statistic nbd tun ipt_REJECT nf_reject_ipv4 openvswitch nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 vfio_pci vfio_iommu_type1 xt_multiport xt_set iptable_mangle iptable_raw ip_set_hash_net ip_set_hash_ip ip_set ipip tunnel4 ip_tunnel veth nf_conntrack_
[211368.976840] sunrpc xfs libcrc32c iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel glue_helper lrw gf128mul ablk_helper cryptd dm_mod i2c_i801 joydev mei_me mei lpc_ich ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common crc32c_intel vfio xprtrdma(O) svcrdma(O) rpcrdma(O) nvmet_rdma(O) nvme_rdma(O) mlx4_en(O) ib_srp(O) ib_isert(O) ib_iser(O) rdma_rxe(O) mlx5_ib(O) mlx5_core(O) mlxfw(O) mlx4_ib(O) mlx4_core(O) devlink rdma_ucm(O) rdma_cm(O) iw_cm(O) ib_ucm(O) ib_uverbs(O) ib_cm(O) ib_core(O) mlx_compat(O) ixgbevf(O) ixgbe(O) dca tpm_tis(O) tpm_tis_core(O) tpm(O) i40evf(O) uas usb_storage ahci libahci nfit libnvdimm i40e(O) e1000e(O) [last unloaded: nf_defrag_ipv4]
[211369.049922]
[211369.050126] CPU: 1 PID: 904505 Comm: nginx Kdump: loaded Tainted: G O ------------ T 3.10.0-
[211369.062079] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.
[211369.072568] task: ffff9466c4ef45c0 ti: ffff9470bdcf4000 task.ti: ffff9470bdcf4000
[211369.080114] RIP: 0010:[<
[211369.089075] RSP: 0018:ffff9470bd
[211369.094458] RAX: 0000000000001120 RBX: 00000000000010c8 RCX: ffffffff8b720e60
[211369.101661] RDX: ffffffff8b65db80 RSI: ffffffff8aa4afd0 RDI: ffff94724125bec8
[211369.108863] RBP: ffff9470bdcf7e90 R08: 0000000000000000 R09: ffffffff8aa4b553
[211369.116065] R10: ffff94732b2f82b0 R11: ffff947143b4ce10 R12: ffff94724125bec8
[211369.123267] R13: ffff94724125bec0 R14: ffff947143b4ceb0 R15: ffff94717b2601a0
[211369.130471] FS: 00007f1b2b3f274
[211369.138625] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[211369.144444] CR2: 0000000000001118 CR3: 00000008516c4000 CR4: 00000000007607e0
[211369.151643] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[211369.158845] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[211369.166041] PKRU: 55555554
[211369.168834] Call Trace:
[211369.171373] [<ffffffff8aa01
[211369.176416] [<ffffffff8aa01
[211369.181374] [<ffffffff8a8a2
[211369.186846] [<ffffffff8a81b
[211369.192579] [<ffffffff8b015
[211369.197789] Code: 65 48 8b 04 25 28 00 00 00 48 89 45 d8 31 c0 e8 27 24 5c 00 49 8b 06 48 89 45 d0 48 8b 45 d0 48 8d 58 a8 49 39 c6 74 3b 0f 1f 00 <4c> 8b 6b 50 4d 8d 65 08 4c 89 e7 e8 00 24 5c 00 48 89 de 4c 89
[211369.218036] RIP [<ffffffff8aa4c
[211369.224651] RSP <ffff9470bdcf7e60>
[211369.228221] CR2: 0000000000001118
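As a quick sanity check of the dump above, the faulting address can be reproduced from the registers. The sketch below is not part of the original report. It decodes only the four bytes at the `<4c>` marker on the Code: line (`4c 8b 6b 50`) and compares base plus displacement with CR2. The helper name `decode_disp8_load` is hypothetical and handles just this one encoding. The register values are copied from the oops. The `-0x58` offset in the last line comes from hand-decoding the preceding bytes `48 8d 58 a8` as `lea -0x58(%rax),%rbx`.

```python
# Values copied from the register dump in the oops above.
RAX = 0x1120
RBX = 0x10C8
CR2 = 0x1118  # faulting address

# The four bytes at the <4c> marker on the "Code:" line.
FAULT_INSN = bytes.fromhex("4c8b6b50")

REG_NAMES = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
             "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def decode_disp8_load(insn: bytes):
    """Decode `REX.W 8B /r disp8`, a 64-bit load from base+disp8.

    Hypothetical helper: handles only this single encoding and
    returns (destination register, base register, displacement).
    """
    rex, opcode, modrm, disp = insn
    if rex & 0xF8 != 0x48 or opcode != 0x8B:
        raise ValueError("not a REX.W-prefixed MOV r64, r/m64")
    if modrm >> 6 != 0b01 or modrm & 7 == 0b100:
        raise ValueError("not a plain base+disp8 operand")
    dest = ((modrm >> 3) & 7) | ((rex & 0x4) << 1)  # REX.R extends reg
    base = (modrm & 7) | ((rex & 0x1) << 3)         # REX.B extends r/m
    if disp >= 0x80:                                # sign-extend disp8
        disp -= 0x100
    return REG_NAMES[dest], REG_NAMES[base], disp


dest, base, disp = decode_disp8_load(FAULT_INSN)
print(f"mov {disp:#x}(%{base}),%{dest}")
print(f"effective address: {RBX + disp:#x}, CR2: {CR2:#x}")
print(f"RAX - 0x58 = {RAX - 0x58:#x}, RBX = {RBX:#x}")
```

The load reads from RBX + 0x50, which equals CR2, and RBX itself equals RAX - 0x58. This fits a read through a small, invalid pointer value rather than a random wild address, but it does not identify the faulting function, since the symbol names in the trace are truncated.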
Reproducibility
---------------
Unsure
System Configuration
-------
AIO-DX (two-node) system
Branch/Pull Time/Commit
-------
Designer load:
BUILD_DATE=
Last Pass
---------
N/A
Timestamp/Logs
--------------
Collect logs and core dump will be attached.
Test Activity
-------------
Developer testing
Changed in starlingx:
assignee: Cindy Xie (xxie1) → Bin Yang (byangintel)
Changed in starlingx:
importance: Undecided → High
I am unable to attach the core dump due to its size - please contact me if you need it.