CC13: DPDK vrouter not starting after server reboot
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Juniper Openstack | Status tracked in Trunk | | |
R5.0 | New | High | haji mohamed ashraf ali |
Trunk | New | High | haji mohamed ashraf ali |
Bug Description
In a CC13/Contrail 5.0.1 setup with Intel X710 (Fortville) NICs, the DPDK vrouter failed to start after one of the DPDK vrouter compute node servers was rebooted.
20:03:26 | [root@overcloud
20:24:52 | [root@overcloud
20:24:52 | Pod Service Original Name State Status
20:24:52 | vrouter agent contrail-
20:24:52 | vrouter agent-dpdk contrail-
20:24:52 | vrouter nodemgr contrail-nodemgr running Up 7 minutes
20:24:52 |
20:24:52 | vrouter driver is not PRESENT but agent pod is present
20:24:52 | == Contrail vrouter ==
20:24:52 | nodemgr: active
20:24:52 | agent: initializing
20:25:30 | [root@overcloud
20:25:30 | Error registering NetLink client: Connection refused (111)
20:25:33 | [root@overcloud
20:25:33 | [root@overcloud
20:25:34 | [root@overcloud
20:25:37 | [root@overcloud
20:25:38 |
20:25:38 | Network devices using DPDK-compatible driver
20:25:38 | =======
20:25:38 | 0000:06:00.0 'Ethernet Controller X710 for 10GbE SFP+' drv=vfio-pci unused=i40e
20:25:38 | 0000:06:00.1 'Ethernet Controller X710 for 10GbE SFP+' drv=vfio-pci unused=i40e
20:25:38 |
20:25:38 | Network devices using kernel driver
20:25:38 | =======
20:25:38 | 0000:16:00.0 'NetXtreme BCM5719 Gigabit Ethernet PCIe' if=eno1 drv=tg3 unused=vfio-pci *Active*
20:25:38 | 0000:16:00.1 'NetXtreme BCM5719 Gigabit Ethernet PCIe' if=eno2 drv=tg3 unused=vfio-pci *Active*
20:25:38 | 0000:16:00.2 'NetXtreme BCM5719 Gigabit Ethernet PCIe' if=eno3 drv=tg3 unused=vfio-pci
20:25:38 | 0000:16:00.3 'NetXtreme BCM5719 Gigabit Ethernet PCIe' if=eno4 drv=tg3 unused=vfio-pci
20:25:38 | 0000:81:00.0 'Ethernet Controller X710 for 10GbE SFP+' if=ens5f0 drv=i40e unused=vfio-pci
20:25:38 | 0000:81:00.1 'Ethernet Controller X710 for 10GbE SFP+' if=ens5f1 drv=i40e unused=vfio-pci
20:25:38 |
20:25:38 | Other network devices
20:25:38 | =======
20:25:38 | <none>
20:26:52 | [root@overcloud
20:26:52 | total 11841712
20:26:52 | -rw-------. 1 root root 2386378752 Sep 24 13:07 core.contrail-
20:26:52 | -rw-------. 1 root root 2394767360 Sep 24 19:29 core.contrail-
20:26:52 | -rw-------. 1 root root 2304565248 Sep 24 19:35 core.contrail-
20:26:52 | -rw-------. 1 root root 2500673536 Sep 25 10:59 core.contrail-
20:26:52 | -rw-------. 1 root root 2483900416 Oct 2 19:30 core.contrail-
20:26:52 | -rw-------. 1 root root 102932480 Oct 2 20:19 core.contrail-
The docker logs for the dpdk and agent containers are attached in a file.
Later I also saw this in the contrail-
20:54:14 | [root@overcloud
20:54:15 | 2018-10-02 Tue 20:19:06:186.610 CEST overcloud63m-
20:54:15 | 2018-10-02 Tue 20:51:26:082.760 CEST overcloud63m-
20:54:15 | 2018-10-02 Tue 20:53:46:372.378 CEST overcloud63m-
tags: | added: dpdk vrouter |
I have the same issue in Contrail 4.1.12 as well. The workaround is to kill all vrouter processes, unbind the DPDK interfaces from the kernel driver, bind them to the DPDK-compatible driver, and then start the vrouter.
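For the unbind/bind part of that workaround, here is a rough sketch of how the rebind commands could be derived from the device status listing pasted in the bug description. It only prints commands and does not run anything. Several things in it are assumptions rather than facts from this report: that the tool is DPDK's `dpdk-devbind.py` (the actual command is truncated in the log above), that the target driver is `vfio-pci`, and that the X710 ports still on `i40e` are the ones meant for the vrouter. The helper names `parse_status` and `rebind_commands` are made up for illustration.

```python
import re

# Sample lines copied from the device status output in the bug description.
STATUS = """\
Network devices using DPDK-compatible driver
0000:06:00.0 'Ethernet Controller X710 for 10GbE SFP+' drv=vfio-pci unused=i40e
0000:06:00.1 'Ethernet Controller X710 for 10GbE SFP+' drv=vfio-pci unused=i40e
Network devices using kernel driver
0000:16:00.0 'NetXtreme BCM5719 Gigabit Ethernet PCIe' if=eno1 drv=tg3 unused=vfio-pci *Active*
0000:81:00.0 'Ethernet Controller X710 for 10GbE SFP+' if=ens5f0 drv=i40e unused=vfio-pci
0000:81:00.1 'Ethernet Controller X710 for 10GbE SFP+' if=ens5f1 drv=i40e unused=vfio-pci
"""

# PCI address, quoted device name, then space-separated key=value fields.
LINE_RE = re.compile(
    r"^(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+"
    r"'(?P<name>[^']*)'\s+(?P<rest>.*)$"
)


def parse_status(text):
    """Return one dict per device line; section headings are skipped."""
    devices = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        rest = match.group("rest")
        fields = dict(kv.split("=", 1) for kv in rest.split() if "=" in kv)
        devices.append({
            "pci": match.group("pci"),
            "name": match.group("name"),
            "drv": fields.get("drv"),
            "active": "*Active*" in rest,
        })
    return devices


def rebind_commands(devices, name_filter="X710", kernel_driver="i40e",
                    dpdk_driver="vfio-pci"):
    """Build unbind/bind commands for matching ports (all defaults are assumptions)."""
    commands = []
    for dev in devices:
        # Never touch ports marked *Active*: they may carry management traffic.
        if dev["active"]:
            continue
        if name_filter in dev["name"] and dev["drv"] == kernel_driver:
            commands.append(f"dpdk-devbind.py -u {dev['pci']}")
            commands.append(f"dpdk-devbind.py -b {dpdk_driver} {dev['pci']}")
    return commands


if __name__ == "__main__":
    for command in rebind_commands(parse_status(STATUS)):
        print(command)
```

Which PCI addresses really belong to the vrouter depends on the deployment, so the filter arguments should be adjusted before using any of the printed commands. Killing the vrouter processes and starting the vrouter again are left out, since the container names are not fully visible in the logs here.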