Compute node continuously reboots in KVM virtual deployment environment (OVS and DPDK related)

Bug #1797320 reported by Austin Sun
This bug affects 1 person
Affects     Status    Importance  Assigned to  Milestone
StarlingX   Invalid   High        Austin Sun

Bug Description

Title
-----
Compute node continuously reboots in KVM virtual deployment environment (OVS and DPDK related)

Brief Description
-----------------
When the system is deployed in a KVM virtual environment, the compute node reboots continuously.
The following alarms are raised:
200.011 - r5-multi-compute-0 experienced a configuration failure.
200.004 - r5-multi-compute-0 experienced a service-affecting failure. Auto-recovery in progress. Manual Lock and Unlock may be required if auto-recovery is unsuccessful.

Severity
--------
<Critical: System/Feature is not usable after the defect>

Steps to Reproduce
------------------
Create the compute node in KVM using the attached KVM XML, with the NIC type set to e1000,
and deploy it on a NUC.

Expected Behavior
------------------
The compute node is provisioned successfully and works.

Actual Behavior
----------------
The compute node reboots continuously.

Reproducibility
---------------
<Reproducible/Intermittent>
With the attached KVM XML, this is 100% reproducible.
stx-2018-10-07-15-r-2018.10.iso can be used to reproduce it.

System Configuration
--------------------
<One node system, Two node system, Multi-node system, Dedicated storage, https, IPv4, IPv6 etc.>
Multi-node system (1 controller and 1 compute) in a KVM VM environment.

Branch/Pull Time/Commit
-----------------------
latest Mainline and r/2018.10 branch

Timestamp/Logs
--------------
openvswitch/ovs-vswitchd.log
2018-10-11T07:02:25.418Z|00075|bridge|INFO|bridge br-phy0: added interface lldpdbfaf9d1-0f on port 1
2018-10-11T07:02:25.727Z|00076|dpdk|ERR|EAL: Driver cannot attach the device (0000:00:06.0)
2018-10-11T07:02:25.727Z|00077|dpdk|ERR|EAL: No port found for device (0000:00:06.0)
2018-10-11T07:02:25.727Z|00078|netdev_dpdk|WARN|Error attaching device '0000:00:06.0' to DPDK
2018-10-11T07:02:25.727Z|00079|netdev|WARN|eth0: could not set configuration (Invalid argument)
2018-10-11T07:02:30.395Z|00080|memory|INFO|105820 kB peak resident set size after 10.1 seconds
2018-10-11T07:02:30.395Z|00081|memory|INFO|handlers:2 ports:2 revalidators:2 rules:5 udpif keys:7
2018-10-11T07:02:33.429Z|00082|dpdk|ERR|EAL: Driver cannot attach the device (0000:00:06.0)
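
The repeated attach failures in the excerpt above can be tallied with a short script. This is a minimal sketch in Python, assuming each ovs-vswitchd.log line follows the `timestamp|sequence|module|level|message` layout seen above. The function name `failed_dpdk_attaches` and the `SAMPLE` constant are made up for illustration and are not part of OVS or StarlingX.

```python
import re
from collections import Counter

# PCI address in domain:bus:device.function form, e.g. (0000:00:06.0)
PCI_ADDR = re.compile(r"\(([0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])\)")


def failed_dpdk_attaches(lines):
    """Count 'Driver cannot attach the device' errors per PCI address."""
    counts = Counter()
    for line in lines:
        # Assumed layout: timestamp|sequence|module|level|message
        fields = line.rstrip("\n").split("|", 4)
        if len(fields) != 5:
            continue
        _timestamp, _sequence, module, level, message = fields
        if module == "dpdk" and level == "ERR" and "cannot attach the device" in message:
            match = PCI_ADDR.search(message)
            if match:
                counts[match.group(1)] += 1
    return counts


# Lines copied from the log excerpt in this report.
SAMPLE = """\
2018-10-11T07:02:25.727Z|00076|dpdk|ERR|EAL: Driver cannot attach the device (0000:00:06.0)
2018-10-11T07:02:25.727Z|00077|dpdk|ERR|EAL: No port found for device (0000:00:06.0)
2018-10-11T07:02:25.727Z|00078|netdev_dpdk|WARN|Error attaching device '0000:00:06.0' to DPDK
2018-10-11T07:02:33.429Z|00082|dpdk|ERR|EAL: Driver cannot attach the device (0000:00:06.0)
"""

if __name__ == "__main__":
    for address, count in failed_dpdk_attaches(SAMPLE.splitlines()).items():
        print(address, count)
```

To run it against a real log, pass an open file object instead of `SAMPLE.splitlines()`.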

Austin Sun (sunausti) wrote :
description: updated
Austin Sun (sunausti) wrote :
Austin Sun (sunausti)
description: updated
Ghada Khalil (gkhalil) wrote :
Austin Sun (sunausti) wrote :

In https://bugs.launchpad.net/starlingx/+bug/1796420, the node is degraded because of "DMAR: intel_iommu_map: iommu width (39) is not sufficient for the mapped address (7fbdc0000000)" in kern.log, but there is no such message in the logs for this bug.

Ghada Khalil (gkhalil) wrote :

Sorry I meant to say: https://bugs.launchpad.net/starlingx/+bug/1797474
1797474 is reported on the virtual environment whereas 1796420 is specific to baremetal.
Note that 1797474 is a duplicate of https://bugs.launchpad.net/starlingx/+bug/1796380

Ghada Khalil (gkhalil) wrote :

I confirmed with Matt Peters (Networking Technical Lead) that this is indeed a duplicate of https://bugs.launchpad.net/starlingx/+bug/1796380. Marking as such

tags: added: stx.networking
tags: added: stx.2018.10
Changed in starlingx:
importance: Undecided → High
assignee: nobody → Steven Webster (swebster-wr)
status: New → Triaged
Austin Sun (sunausti) wrote :

I have built and tested with the fix for bug #1796380 (https://review.openstack.org/611391), but the issue can still be reproduced. I have attached the latest /var/log. The failure is the same as before.

Austin Sun (sunausti) wrote :

After debugging, I found the root cause: DPDK does not support this type of device yet.

With debug logging enabled in SysInv:

2018-10-22 07:08:19.292 1826 DEBUG sysinv.agent.pci [-] driver: e1000 get_pci_driver_name /usr/lib64/python2.7/site-packages/sysinv/agent/pci.py:249
2018-10-22 07:08:20.122 1826 DEBUG sysinv.agent.pci [-] DPDK does not support NIC (vendor: 0x8086 device: 0x100e) pci_get_net_attrs /usr/lib64/python2.7/site-packages/sysinv/agent/pci.py:481

I have changed the bug status to Invalid.
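
The sysinv debug output above identifies the NIC by its PCI vendor and device ID. A rough way to check the same thing by hand is to read those IDs from sysfs, sketched below in Python. The helper names and the `REPORTED_UNSUPPORTED` set are hypothetical: the set holds only the one ID pair from the log above and is not DPDK's real support table, and this is not how sysinv itself implements the check.

```python
from pathlib import Path

# Only the ID pair reported in the sysinv log above (the emulated e1000 NIC).
# Illustrative assumption, not DPDK's actual list of supported devices.
REPORTED_UNSUPPORTED = {("0x8086", "0x100e")}


def pci_ids(address, sysfs_root="/sys/bus/pci/devices"):
    """Return (vendor, device) for a PCI address such as 0000:00:06.0."""
    dev = Path(sysfs_root) / address
    vendor = (dev / "vendor").read_text().strip().lower()
    device = (dev / "device").read_text().strip().lower()
    return vendor, device


def is_reported_unsupported(address, sysfs_root="/sys/bus/pci/devices"):
    """True if the device matches the ID pair flagged in the log above."""
    return pci_ids(address, sysfs_root) in REPORTED_UNSUPPORTED


if __name__ == "__main__":
    addr = "0000:00:06.0"  # address taken from ovs-vswitchd.log in this report
    try:
        print(addr, pci_ids(addr), is_reported_unsupported(addr))
    except FileNotFoundError:
        print("no PCI device at", addr)
```

If the check matches, changing the NIC model in the KVM XML to one that DPDK can drive would be the next thing to try.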

Changed in starlingx:
status: Triaged → Invalid
Ghada Khalil (gkhalil) wrote :

Thanks Austin for handling this bug report.

Changed in starlingx:
assignee: Steven Webster (swebster-wr) → Austin Sun (sunausti)
Ken Young (kenyis)
tags: added: stx.1.0
removed: stx.2018.10