CC13: DPDK vrouter not starting after server reboot

Bug #1795825 reported by Bernhard Koessler
This bug affects 1 person
Affects                                        Status  Importance  Assigned to              Milestone
Juniper Openstack (status tracked in Trunk)
  R5.0                                         New     High        haji mohamed ashraf ali
  Trunk                                        New     High        haji mohamed ashraf ali

Bug Description

In a CC13/Contrail 5.0.1 setup with Intel X710 (Fortville) NICs, the DPDK vrouter failed to start after one of the DPDK vrouter compute node servers was rebooted:

20:03:26 | [root@overcloud63m-compdpdk-60 ~]# reboot

20:24:52 | [root@overcloud63m-compdpdk-60 ~]# contrail-status
20:24:52 | Pod Service Original Name State Status
20:24:52 | vrouter agent contrail-vrouter-agent running Up 21 seconds
20:24:52 | vrouter agent-dpdk contrail-vrouter-agent-dpdk running Up 4 minutes
20:24:52 | vrouter nodemgr contrail-nodemgr running Up 7 minutes
20:24:52 |
20:24:52 | vrouter driver is not PRESENT but agent pod is present
20:24:52 | == Contrail vrouter ==
20:24:52 | nodemgr: active
20:24:52 | agent: initializing

20:25:30 | [root@overcloud63m-compdpdk-60 ~]# vif --list
20:25:30 | Error registering NetLink client: Connection refused (111)
20:25:33 | [root@overcloud63m-compdpdk-60 ~]#
20:25:37 | [root@overcloud63m-compdpdk-60 ~]# /var/lib/docker/overlay2/fec97702bec79042f8007e0c53823adfb48affff0dc18235bb18c6739893a9fb/diff/opt/contrail/bin/dpdk_nic_bind.py -s
20:25:38 |
20:25:38 | Network devices using DPDK-compatible driver
20:25:38 | ============================================
20:25:38 | 0000:06:00.0 'Ethernet Controller X710 for 10GbE SFP+' drv=vfio-pci unused=i40e
20:25:38 | 0000:06:00.1 'Ethernet Controller X710 for 10GbE SFP+' drv=vfio-pci unused=i40e
20:25:38 |
20:25:38 | Network devices using kernel driver
20:25:38 | ===================================
20:25:38 | 0000:16:00.0 'NetXtreme BCM5719 Gigabit Ethernet PCIe' if=eno1 drv=tg3 unused=vfio-pci *Active*
20:25:38 | 0000:16:00.1 'NetXtreme BCM5719 Gigabit Ethernet PCIe' if=eno2 drv=tg3 unused=vfio-pci *Active*
20:25:38 | 0000:16:00.2 'NetXtreme BCM5719 Gigabit Ethernet PCIe' if=eno3 drv=tg3 unused=vfio-pci
20:25:38 | 0000:16:00.3 'NetXtreme BCM5719 Gigabit Ethernet PCIe' if=eno4 drv=tg3 unused=vfio-pci
20:25:38 | 0000:81:00.0 'Ethernet Controller X710 for 10GbE SFP+' if=ens5f0 drv=i40e unused=vfio-pci
20:25:38 | 0000:81:00.1 'Ethernet Controller X710 for 10GbE SFP+' if=ens5f1 drv=i40e unused=vfio-pci
20:25:38 |
20:25:38 | Other network devices
20:25:38 | =====================
20:25:38 | <none>
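To see at a glance which ports sit on which driver, the status listing above can be grouped by its `drv=` field. This is a minimal sketch assuming the `dpdk_nic_bind.py -s` line format shown above (PCI address, quoted device name, `key=value` fields, optional `*Active*` flag); `parse_status` and `by_driver` are hypothetical helper names, not part of the Contrail or DPDK tooling.

```python
import re

# Status output as captured above, without the console timestamp prefix.
SAMPLE = """\
Network devices using DPDK-compatible driver
============================================
0000:06:00.0 'Ethernet Controller X710 for 10GbE SFP+' drv=vfio-pci unused=i40e
0000:06:00.1 'Ethernet Controller X710 for 10GbE SFP+' drv=vfio-pci unused=i40e

Network devices using kernel driver
===================================
0000:16:00.0 'NetXtreme BCM5719 Gigabit Ethernet PCIe' if=eno1 drv=tg3 unused=vfio-pci *Active*
0000:16:00.1 'NetXtreme BCM5719 Gigabit Ethernet PCIe' if=eno2 drv=tg3 unused=vfio-pci *Active*
0000:16:00.2 'NetXtreme BCM5719 Gigabit Ethernet PCIe' if=eno3 drv=tg3 unused=vfio-pci
0000:16:00.3 'NetXtreme BCM5719 Gigabit Ethernet PCIe' if=eno4 drv=tg3 unused=vfio-pci
0000:81:00.0 'Ethernet Controller X710 for 10GbE SFP+' if=ens5f0 drv=i40e unused=vfio-pci
0000:81:00.1 'Ethernet Controller X710 for 10GbE SFP+' if=ens5f1 drv=i40e unused=vfio-pci

Other network devices
=====================
<none>
"""

# Optional "HH:MM:SS | " prefix from the reporter's console logger.
PREFIX_RE = re.compile(r"^\d{2}:\d{2}:\d{2} \| ?")
# PCI address, quoted description, then key=value fields and an optional *Active* flag.
DEVICE_RE = re.compile(
    r"^(?P<pci>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]) '(?P<desc>[^']*)'(?P<rest>.*)$"
)


def parse_status(text):
    """Return {pci: {desc, section, active, plus drv/if/unused when present}}."""
    devices = {}
    section = None
    for raw in text.splitlines():
        line = PREFIX_RE.sub("", raw).strip()
        if line.startswith("Network devices using") or line == "Other network devices":
            section = line
            continue
        m = DEVICE_RE.match(line)
        if not m:
            continue  # blank lines, "====" underlines, "<none>"
        fields = dict(re.findall(r"(\w+)=(\S+)", m.group("rest")))
        fields.update(desc=m.group("desc"), section=section,
                      active="*Active*" in m.group("rest"))
        devices[m.group("pci")] = fields
    return devices


def by_driver(devices, desc_substring):
    """Group PCI addresses whose description contains desc_substring by bound driver."""
    groups = {}
    for pci, info in sorted(devices.items()):
        if desc_substring in info["desc"]:
            groups.setdefault(info.get("drv", "<unbound>"), []).append(pci)
    return groups


if __name__ == "__main__":
    print(by_driver(parse_status(SAMPLE), "X710"))
```

On the captured output this places the X710 ports 0000:06:00.0 and 0000:06:00.1 under `vfio-pci` and 0000:81:00.0 and 0000:81:00.1 under `i40e`. The report does not say which of these ports the vrouter is configured to use.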

20:26:52 | [root@overcloud63m-compdpdk-60 ~]# ls -ltr /var/crashes/
20:26:52 | total 11841712
20:26:52 | -rw-------. 1 root root 2386378752 Sep 24 13:07 core.contrail-vroute.162.overcloud63m-compdpdk-60.nuremberg-cc13.de.1537787242
20:26:52 | -rw-------. 1 root root 2394767360 Sep 24 19:29 core.contrail-vroute.161.overcloud63m-compdpdk-60.nuremberg-cc13.de.1537810172
20:26:52 | -rw-------. 1 root root 2304565248 Sep 24 19:35 core.contrail-vroute.163.overcloud63m-compdpdk-60.nuremberg-cc13.de.1537810531
20:26:52 | -rw-------. 1 root root 2500673536 Sep 25 10:59 core.contrail-vroute.162.overcloud63m-compdpdk-60.nuremberg-cc13.de.1537865966
20:26:52 | -rw-------. 1 root root 2483900416 Oct 2 19:30 core.contrail-vroute.162.overcloud63m-compdpdk-60.nuremberg-cc13.de.1538501420
20:26:52 | -rw-------. 1 root root 102932480 Oct 2 20:19 core.contrail-vroute.237.overcloud63m-compdpdk-60.nuremberg-cc13.de.1538504349
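The core file names appear to follow a `core.<comm>.<pid>.<host>.<epoch>` pattern (an assumption inferred from the names themselves, not from the node's `core_pattern` setting). Converting the trailing Unix timestamp makes it easier to line crashes up with log entries. A minimal sketch; `parse_core_name` is a hypothetical helper, and the fixed UTC+2 offset for CEST is assumed from the agent log timestamps. The kernel truncates the `<comm>` field to 15 characters, so `contrail-vroute` alone does not tell which `contrail-vrouter-*` binary dumped core.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offset: the agent log timestamps are in CEST (UTC+2).
CEST = timezone(timedelta(hours=2), "CEST")

CORES = [
    "core.contrail-vroute.162.overcloud63m-compdpdk-60.nuremberg-cc13.de.1537787242",
    "core.contrail-vroute.161.overcloud63m-compdpdk-60.nuremberg-cc13.de.1537810172",
    "core.contrail-vroute.163.overcloud63m-compdpdk-60.nuremberg-cc13.de.1537810531",
    "core.contrail-vroute.162.overcloud63m-compdpdk-60.nuremberg-cc13.de.1537865966",
    "core.contrail-vroute.162.overcloud63m-compdpdk-60.nuremberg-cc13.de.1538501420",
    "core.contrail-vroute.237.overcloud63m-compdpdk-60.nuremberg-cc13.de.1538504349",
]


def parse_core_name(name):
    """Split core.<comm>.<pid>.<host>.<epoch>; the host part may itself contain dots."""
    head, epoch = name.rsplit(".", 1)
    prefix, comm, pid, host = head.split(".", 3)
    if prefix != "core":
        raise ValueError(f"not a core file name: {name}")
    utc = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    return {"comm": comm, "pid": int(pid), "host": host,
            "utc": utc, "local": utc.astimezone(CEST)}


if __name__ == "__main__":
    for name in CORES:
        core = parse_core_name(name)
        print(core["pid"], core["comm"], core["local"].strftime("%Y-%m-%d %H:%M:%S %Z"))
```

For the newest file (PID 237, epoch 1538504349) this gives 2018-10-02 20:19:09 CEST, matching the `ls` timestamp and falling three seconds after the `Pid 237` entry at 20:19:06 in the agent log quoted below; whether both refer to the same process is not established by the report.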

The Docker logs for the DPDK and agent containers are attached as a file.

Later I also saw this in the contrail-vrouter-agent.log:

20:54:14 | [root@overcloud63m-compdpdk-60 ~]# cat /var/log/containers/contrail/contrail-vrouter-agent.log
20:54:15 | 2018-10-02 Tue 20:19:06:186.610 CEST overcloud63m-compdpdk-60.nuremberg-cc13.de [Thread 140168239367936, Pid 237]: KsyncTxQueue CPU pinning policy <>. KsyncTxQueuen not pinned to CPU
20:54:15 | 2018-10-02 Tue 20:51:26:082.760 CEST overcloud63m-compdpdk-60.nuremberg-cc13.de [Thread 140645128644352, Pid 162]: KsyncTxQueue CPU pinning policy <>. KsyncTxQueuen not pinned to CPU
20:54:15 | 2018-10-02 Tue 20:53:46:372.378 CEST overcloud63m-compdpdk-60.nuremberg-cc13.de [Thread 139898531747584, Pid 237]: KsyncTxQueue CPU pinning policy <>. KsyncTxQueuen not pinned to CPU
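The three log lines come from different agent PIDs, which suggests the agent process was restarted more than once. A minimal sketch that extracts time and PID from such lines, assuming the timestamp layout is `HH:MM:SS:mmm.uuu` (milliseconds, then microseconds) and that the `KsyncTxQueue CPU pinning policy` message is logged once per agent start; neither assumption is confirmed in the report, and `parse_agent_line` and `starts_by_pid` are hypothetical helpers.

```python
import re
from datetime import datetime

# Strip the "HH:MM:SS | " prefix added by the reporter's console logger.
PREFIX_RE = re.compile(r"^\d{2}:\d{2}:\d{2} \| ?")
# Assumed layout: date, weekday, HH:MM:SS:mmm.uuu, zone, host, [Thread N, Pid N]: message
LOG_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) \w{3} "
    r"(?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2}):(?P<ms>\d{3})\.(?P<us>\d{3}) "
    r"(?P<tz>\S+) (?P<host>\S+) \[Thread (?P<thread>\d+), Pid (?P<pid>\d+)\]: (?P<msg>.*)$"
)

LOG = [
    "20:54:15 | 2018-10-02 Tue 20:19:06:186.610 CEST overcloud63m-compdpdk-60.nuremberg-cc13.de [Thread 140168239367936, Pid 237]: KsyncTxQueue CPU pinning policy <>. KsyncTxQueuen not pinned to CPU",
    "20:54:15 | 2018-10-02 Tue 20:51:26:082.760 CEST overcloud63m-compdpdk-60.nuremberg-cc13.de [Thread 140645128644352, Pid 162]: KsyncTxQueue CPU pinning policy <>. KsyncTxQueuen not pinned to CPU",
    "20:54:15 | 2018-10-02 Tue 20:53:46:372.378 CEST overcloud63m-compdpdk-60.nuremberg-cc13.de [Thread 139898531747584, Pid 237]: KsyncTxQueue CPU pinning policy <>. KsyncTxQueuen not pinned to CPU",
]


def parse_agent_line(raw):
    """Return time (naive, in the log's own zone), zone name, PID and message, or None."""
    m = LOG_RE.match(PREFIX_RE.sub("", raw.strip()))
    if not m:
        return None
    ts = datetime.strptime(m["date"], "%Y-%m-%d").replace(
        hour=int(m["h"]), minute=int(m["m"]), second=int(m["s"]),
        microsecond=int(m["ms"]) * 1000 + int(m["us"]),
    )
    return {"time": ts, "tz": m["tz"], "pid": int(m["pid"]), "msg": m["msg"]}


def starts_by_pid(lines, marker="KsyncTxQueue CPU pinning policy"):
    """List (HH:MM:SS, pid) for every parsed line containing marker, in time order."""
    events = [e for e in map(parse_agent_line, lines) if e and marker in e["msg"]]
    events.sort(key=lambda e: e["time"])
    return [(e["time"].strftime("%H:%M:%S"), e["pid"]) for e in events]


if __name__ == "__main__":
    for when, pid in starts_by_pid(LOG):
        print(when, "pid", pid)
```

On the captured lines this yields PID 237 at 20:19:06, PID 162 at 20:51:26 and PID 237 again at 20:53:46. Both PID values also occur in the core file names above.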

Tags: dpdk vrouter cc13
Bernhard Koessler (bkoessler) wrote :
tags: added: dpdk vrouter
Bernhard Koessler (bkoessler) wrote :
description: updated
bassim (aly12) wrote :

I have the same issue in Contrail 4.1.12 as well. The workaround is to kill all vrouter processes, unbind the DPDK interfaces from the kernel driver, bind them to the DPDK-compatible driver, and then start the vrouter.
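The workaround described above can be written down as an ordered command plan. This is a dry-run sketch assuming the `-u` (unbind) and `-b <driver>` (bind) options of DPDK's `dpdk_nic_bind.py`; the script path, the container names and the PCI addresses in the usage block are hypothetical placeholders, and how the vrouter processes are actually stopped and started depends on the deployment.

```python
import shlex
import subprocess

# Hypothetical location of the binding script; adjust to wherever it lives on the node.
NIC_BIND = "/opt/contrail/bin/dpdk_nic_bind.py"


def rebind_plan(pci_addrs, stop_cmds, start_cmds, dpdk_driver="vfio-pci", tool=NIC_BIND):
    """Return argv lists: stop vrouter, unbind NICs, bind them to the DPDK driver, start vrouter."""
    plan = [list(cmd) for cmd in stop_cmds]
    # Detach each port from its current (kernel) driver.
    plan += [[tool, "-u", pci] for pci in pci_addrs]
    # Attach all ports to the DPDK-compatible driver in one call.
    plan.append([tool, "-b", dpdk_driver, *pci_addrs])
    plan += [list(cmd) for cmd in start_cmds]
    return plan


def run(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in plan:
        print("+", " ".join(shlex.quote(arg) for arg in argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    plan = rebind_plan(
        ["0000:81:00.0", "0000:81:00.1"],  # hypothetical: the ports the vrouter should own
        stop_cmds=[["docker", "stop", "contrail-vrouter-agent"],  # hypothetical container names
                   ["docker", "stop", "contrail-vrouter-agent-dpdk"]],
        start_cmds=[["docker", "start", "contrail-vrouter-agent-dpdk"],
                    ["docker", "start", "contrail-vrouter-agent"]],
    )
    run(plan)  # dry run: prints the commands without executing them
```

Keeping the plan as data and printing it first lets the sequence be reviewed on the affected node before anything is executed.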
