CC13: restarting DPDK vrouter hangs
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Juniper Openstack | Incomplete | High | alexey-mr |
R5.0 | Won't Fix | High | alexey-mr |
Trunk | Won't Fix | High | alexey-mr |
Bug Description
CC13/Contrail 5.0.1 setup with Intel X710 Fortville NICs.
With CC13 and Contrail 5.0.1, the procedure to restart the DPDK vrouter is:
ifdown vhost0
ifup vhost0
This unbinds the DPDK driver, stops and removes the container, removes the vhost interface, and then adds and starts everything again.
However, most of the time the script hangs while unbinding the DPDK driver. In some cases the container has to be stopped manually and the UIO driver unloaded to recover.
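For anyone reproducing this, here is a minimal sketch for checking which driver a NIC is currently bound to without calling the bind script (which may itself hang). It reads the standard sysfs `driver` symlink of the PCI device. The function name `current_driver` and the `sysfs_root` parameter are illustrative and not part of Contrail or DPDK.

```python
import os


def current_driver(pci_addr, sysfs_root="/sys"):
    """Return the name of the driver bound to a PCI device, or None.

    Hypothetical helper: it only reads the sysfs symlink
    <sysfs_root>/bus/pci/devices/<pci_addr>/driver and never writes
    to bind/unbind, so it should not block on a stuck unbind.
    """
    link = os.path.join(sysfs_root, "bus", "pci", "devices", pci_addr, "driver")
    try:
        # The link target ends with the driver name, e.g. .../drivers/vfio-pci
        return os.path.basename(os.readlink(link))
    except OSError:
        # No driver link (device unbound) or device not present
        return None


if __name__ == "__main__":
    # PCI addresses taken from the logs in this report
    for addr in ("0000:06:00.0", "0000:06:00.1"):
        print(addr, current_driver(addr))
```

Run before and after `ifdown vhost0`, this would show whether the device is still attached to vfio-pci, already back on i40e, or left with no driver at all.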
After issuing ifdown vhost0, the unbind is stuck and the shell does not return:
19:24:26 | [root@overcloud
19:24:27 | INFO: rebind device 0000:06:00.0 from vfio-pci to driver i40e
19:24:27 | INFO: unbind 0000:06:00.0 from vfio-pci
19:24:55 | ^Z
19:24:55 |
19:24:55 |
19:24:56 | ^Z
19:24:56 |
19:24:56 |
19:24:56 |
19:25:58 |
19:25:58 |
19:25:58 |
19:25:58 |
19:26:01 |
19:26:01 | [1]+ Stopped ifdown vhost0
19:26:01 | [root@overcloud
From another shell, stop the DPDK container (the script in the previous shell returns at this point; see the last lines above):
19:26:01 | [root@overcloud
19:26:02 | contrail-
The DPDK driver unbind gets stuck, and the interfaces look like this:
19:26:09 | [root@overcloud
19:26:09 |
19:26:09 | Network devices using DPDK-compatible driver
19:26:09 | =======
19:26:09 | 0000:06:00.1 'Ethernet Controller X710 for 10GbE SFP+' drv=vfio-pci unused=i40e
19:26:09 |
19:26:09 | Network devices using kernel driver
19:26:09 | =======
19:26:09 | 0000:16:00.0 'NetXtreme BCM5719 Gigabit Ethernet PCIe' if=eno1 drv=tg3 unused=vfio-pci *Active*
19:26:09 | 0000:16:00.1 'NetXtreme BCM5719 Gigabit Ethernet PCIe' if=eno2 drv=tg3 unused=vfio-pci *Active*
19:26:09 | 0000:16:00.2 'NetXtreme BCM5719 Gigabit Ethernet PCIe' if=eno3 drv=tg3 unused=vfio-pci
19:26:09 | 0000:16:00.3 'NetXtreme BCM5719 Gigabit Ethernet PCIe' if=eno4 drv=tg3 unused=vfio-pci
19:26:09 | 0000:81:00.0 'Ethernet Controller X710 for 10GbE SFP+' if=ens5f0 drv=i40e unused=vfio-pci
19:26:09 | 0000:81:00.1 'Ethernet Controller X710 for 10GbE SFP+' if=ens5f1 drv=i40e unused=vfio-pci
19:26:09 |
19:26:09 | Other network devices
19:26:09 | =======
19:26:09 | 0000:06:00.0 'Ethernet Controller X710 for 10GbE SFP+' unused=
19:27:20 | [root@overcloud
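A status listing like the one above can be checked mechanically. The sketch below is a hypothetical helper, not part of dpdk_nic_bind.py: it groups PCI addresses by the section heading they appear under, on the assumption that a device listed under "Other network devices" currently has no driver bound. It tolerates the "HH:MM:SS | " prefix used in the pasted logs.

```python
import re

# Optional "HH:MM:SS | " prefix as seen in the pasted logs
PREFIX_RE = re.compile(r"^\d{2}:\d{2}:\d{2} \|\s?")
# PCI address in domain:bus:device.function form, e.g. 0000:06:00.0
PCI_RE = re.compile(r"^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]\b")


def parse_status(text):
    """Map each section heading to the PCI addresses listed under it."""
    sections = {}
    current = None
    previous = ""
    for raw in text.splitlines():
        line = PREFIX_RE.sub("", raw).strip()
        if line and set(line) == {"="}:
            # A row of '=' underlines the heading on the previous line
            current = previous
            sections[current] = []
        elif current is not None and PCI_RE.match(line):
            sections[current].append(line.split()[0])
        previous = line
    return sections


def unbound_devices(text):
    """Return devices listed under 'Other network devices' (assumed unbound)."""
    return parse_status(text).get("Other network devices", [])
```

Applied to the listing captured at 19:26:09, `unbound_devices` would report 0000:06:00.0, the port whose unbind from vfio-pci never completed.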
Also, manually running dpdk_nic_bind.py -u gets stuck: it does not terminate and cannot be interrupted.
Sometimes the NIC unbinds after the DPDK container is killed, but sometimes it remains stuck as shown above and the UIO driver needs to be unloaded:
19:28:15 | [root@overcloud
19:28:25 | [root@overcloud
19:28:26 |
19:28:26 | Network devices using DPDK-compatible driver
19:28:26 | =======
19:28:26 | <none>
19:28:26 |
19:28:26 | Network devices using kernel driver
19:28:26 | =======
19:28:26 | 0000:16:00.0 'NetXtreme BCM5719 Gigabit Ethernet PCIe' if=eno1 drv=tg3 unused= *Active*
19:28:26 | 0000:16:00.1 'NetXtreme BCM5719 Gigabit Ethernet PCIe' if=eno2 drv=tg3 unused= *Active*
19:28:26 | 0000:16:00.2 'NetXtreme BCM5719 Gigabit Ethernet PCIe' if=eno3 drv=tg3 unused=
19:28:26 | 0000:16:00.3 'NetXtreme BCM5719 Gigabit Ethernet PCIe' if=eno4 drv=tg3 unused=
19:28:26 | 0000:81:00.0 'Ethernet Controller X710 for 10GbE SFP+' if=ens5f0 drv=i40e unused=
19:28:26 | 0000:81:00.1 'Ethernet Controller X710 for 10GbE SFP+' if=ens5f1 drv=i40e unused=
19:28:26 |
19:28:26 | Other network devices
19:28:26 | =======
19:28:26 | 0000:06:00.0 'Ethernet Controller X710 for 10GbE SFP+' unused=i40e
19:28:26 | 0000:06:00.1 'Ethernet Controller X710 for 10GbE SFP+' unused=i40e
19:28:40 | [root@overcloud
19:28:43 | [root@overcloud
19:28:43 |
19:28:43 | Network devices using DPDK-compatible driver
19:28:43 | =======
19:28:43 | <none>
19:28:43 |
19:28:43 | Network devices using kernel driver
19:28:43 | =======
19:28:43 | 0000:16:00.0 'NetXtreme BCM5719 Gigabit Ethernet PCIe' if=eno1 drv=tg3 unused=vfio-pci *Active*
19:28:43 | 0000:16:00.1 'NetXtreme BCM5719 Gigabit Ethernet PCIe' if=eno2 drv=tg3 unused=vfio-pci *Active*
19:28:43 | 0000:16:00.2 'NetXtreme BCM5719 Gigabit Ethernet PCIe' if=eno3 drv=tg3 unused=vfio-pci
19:28:43 | 0000:16:00.3 'NetXtreme BCM5719 Gigabit Ethernet PCIe' if=eno4 drv=tg3 unused=vfio-pci
19:28:43 | 0000:81:00.0 'Ethernet Controller X710 for 10GbE SFP+' if=ens5f0 drv=i40e unused=vfio-pci
19:28:43 | 0000:81:00.1 'Ethernet Controller X710 for 10GbE SFP+' if=ens5f1 drv=i40e unused=vfio-pci
19:28:43 |
19:28:43 | Other network devices
19:28:43 | =======
19:28:43 | 0000:06:00.0 'Ethernet Controller X710 for 10GbE SFP+' unused=
19:28:43 | 0000:06:00.1 'Ethernet Controller X710 for 10GbE SFP+' unused=
19:28:44 | [root@overcloud
19:28:45 | [root@overcloud
19:28:45 | [root@overcloud
19:28:55 | [root@overcloud
19:28:55 | /usr/bin/
19:28:55 | See '/usr/bin/
19:28:55 | Cannot find device "vhost0"
19:28:55 | Cannot find device "vhost0"
19:28:55 | ERROR : [/etc/sysconfig
19:28:57 | [root@overcloud
19:28:57 | [root@overcloud
19:28:58 | [root@overcloud
19:29:25 | [root@overcloud
19:29:28 | [root@overcloud
19:29:28 | 3962908a23ae4f7
19:29:28 | INFO: wait DPDK agent to run... 1
19:29:33 | INFO: wait DPDK agent to run... 2
19:29:38 | INFO: wait DPDK agent to run... 3
19:29:43 | INFO: wait DPDK agent to run... 4
19:29:48 | INFO: wait DPDK agent to run... 5
19:29:53 | INFO: wait DPDK agent to run... 6
19:29:58 | INFO: wait DPDK agent to run... 7
19:30:03 | INFO: wait DPDK agent to run... 8
19:30:08 | INFO: wait DPDK agent to run... 9
19:30:13 | INFO: wait DPDK agent to run... 10
19:30:18 | INFO: wait DPDK agent to run... 11
19:30:18 | INFO: wait vhost0 to be initilaized... 0/60
19:30:45 | INFO: wait vhost0 to be initilaized... 1/60
19:30:45 | INFO: wait vhost0 to be initilaized... 2/60
19:30:45 | INFO: vhost0 is ready.
Hi Bernhard,
The vrouter should be restarted by restarting the vrouter container. Please check whether you see any issue when restarting the container.