CC13: restarting DPDK vrouter hangs

Bug #1795828 reported by Bernhard Koessler
This bug affects 1 person
Affects            Status      Importance  Assigned to  Milestone
Juniper Openstack  Incomplete  High        alexey-mr
R5.0               Won't Fix   High        alexey-mr
Trunk              Won't Fix   High        alexey-mr

Bug Description

CC13/Contrail 5.0.1 setup with Intel X710 Fortville NICs.

With CC13 and Contrail 5.0.1, the procedure to restart the DPDK vrouter is:
ifdown vhost0
ifup vhost0

This unbinds the DPDK driver, stops and removes the container, removes the vhost interface, and then adds and starts everything again.
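
The two-step restart above can be wrapped so that it reports which step failed. This is only a minimal sketch: the function name and the IFDOWN_CMD, IFUP_CMD and CHECK_CMD override variables are hypothetical hooks for dry runs and are not part of the Contrail scripts. It does not protect against the hang described in this report, because a command blocked inside the kernel does not return control to the wrapper.

```shell
#!/usr/bin/env bash
# Sketch: restart the DPDK vrouter through vhost0 and report the failing step.
# IFDOWN_CMD, IFUP_CMD and CHECK_CMD are hypothetical overrides for dry runs.
restart_vhost0() {
    local ifdown_cmd=${IFDOWN_CMD:-"ifdown vhost0"}
    local ifup_cmd=${IFUP_CMD:-"ifup vhost0"}
    local check_cmd=${CHECK_CMD:-"ip link show vhost0"}

    # Each command string is intentionally left unquoted so it splits into words.
    $ifdown_cmd || { echo "ifdown failed" >&2; return 1; }
    $ifup_cmd   || { echo "ifup failed" >&2; return 2; }
    $check_cmd >/dev/null 2>&1 || { echo "vhost0 missing after ifup" >&2; return 3; }
    echo "vhost0 restarted"
}

# Usage (as root on the compute node):
#   restart_vhost0
```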

However, most of the time the script hangs while unbinding the DPDK driver. In some cases the container has to be stopped manually and the UIO driver unloaded to recover.

Issuing ifdown vhost0: the unbind is stuck and the shell does not return:

19:24:26 | [root@overcloud63m-compdpdk-60 ~]# ifdown vhost0
19:24:27 | INFO: rebind device 0000:06:00.0 from vfio-pci to driver i40e
19:24:27 | INFO: unbind 0000:06:00.0 from vfio-pci
19:24:55 | ^Z
19:24:55 |
19:24:55 |
19:24:56 | ^Z
19:24:56 |
19:24:56 |
19:24:56 |
19:25:58 |
19:25:58 |
19:25:58 |
19:25:58 |
19:26:01 |
19:26:01 | [1]+ Stopped ifdown vhost0
19:26:01 | [root@overcloud63m-compdpdk-60 ~]#

From another shell, stop the dpdk container (the script in the previous shell returns at this point; see the last lines above):
19:26:01 | [root@overcloud63m-compdpdk-60 ~]# docker stop contrail-vrouter-agent-dpdk
19:26:02 | contrail-vrouter-agent-dpdk

DPDK driver unbind gets stuck and the interfaces look like this:
19:26:09 | [root@overcloud63m-compdpdk-60 ~]# /var/lib/docker/overlay2/fec97702bec79042f8007e0c53823adfb48affff0dc18235bb18c6739893a9fb/diff/opt/contrail/bin/dpdk_nic_bind.py -s
19:26:09 |
19:26:09 | Network devices using DPDK-compatible driver
19:26:09 | ============================================
19:26:09 | 0000:06:00.1 'Ethernet Controller X710 for 10GbE SFP+' drv=vfio-pci unused=i40e
19:26:09 |
19:26:09 | Network devices using kernel driver
19:26:09 | ===================================
19:26:09 | 0000:16:00.0 'NetXtreme BCM5719 Gigabit Ethernet PCIe' if=eno1 drv=tg3 unused=vfio-pci *Active*
19:26:09 | 0000:16:00.1 'NetXtreme BCM5719 Gigabit Ethernet PCIe' if=eno2 drv=tg3 unused=vfio-pci *Active*
19:26:09 | 0000:16:00.2 'NetXtreme BCM5719 Gigabit Ethernet PCIe' if=eno3 drv=tg3 unused=vfio-pci
19:26:09 | 0000:16:00.3 'NetXtreme BCM5719 Gigabit Ethernet PCIe' if=eno4 drv=tg3 unused=vfio-pci
19:26:09 | 0000:81:00.0 'Ethernet Controller X710 for 10GbE SFP+' if=ens5f0 drv=i40e unused=vfio-pci
19:26:09 | 0000:81:00.1 'Ethernet Controller X710 for 10GbE SFP+' if=ens5f1 drv=i40e unused=vfio-pci
19:26:09 |
19:26:09 | Other network devices
19:26:09 | =====================
19:26:09 | 0000:06:00.0 'Ethernet Controller X710 for 10GbE SFP+' unused=i40e,vfio-pci
19:27:20 | [root@overcloud63m-compdpdk-60 ~]# docker stop contrail-vrouter-agent-dpdk
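
The status listing above groups devices under three headings, and the stuck NIC is the one that ends up under "Other network devices". A small filter can reduce that output to one "section address" pair per device. This is a sketch only: the function name is hypothetical, and it assumes the raw output of dpdk_nic_bind.py -s without the timestamp prefix added by the terminal log.

```shell
#!/usr/bin/env bash
# Sketch: condense `dpdk_nic_bind.py -s` output to "<section> <pci-address>" lines.
# Sections are labelled dpdk, kernel and other, following the three headings.
nic_bind_sections() {
    awk '
        /^Network devices using DPDK-compatible driver/ { section = "dpdk"; next }
        /^Network devices using kernel driver/          { section = "kernel"; next }
        /^Other network devices/                        { section = "other"; next }
        # Device lines start with a PCI address such as 0000:06:00.0
        /^[0-9a-fA-F]+:[0-9a-fA-F]+:[0-9a-fA-F]+\.[0-9a-fA-F]/ { print section, $1 }
    '
}

# Usage: list devices that are currently bound to no driver at all
#   dpdk_nic_bind.py -s | nic_bind_sections | awk '$1 == "other"'
```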

Also, manually running dpdk_nic_bind.py -u gets stuck: it does not terminate and cannot be interrupted.

Sometimes the NIC unbinds after killing the dpdk container, but sometimes it is still stuck as shown above and the UIO driver (here the vfio_pci module) needs to be unloaded:

19:28:15 | [root@overcloud63m-compdpdk-60 ~]# modprobe -r vfio_pci
19:28:25 | [root@overcloud63m-compdpdk-60 ~]# /var/lib/docker/overlay2/fec97702bec79042f8007e0c53823adfb48affff0dc18235bb18c6739893a9fb/diff/opt/contrail/bin/dpdk_nic_bind.py -s
19:28:26 |
19:28:26 | Network devices using DPDK-compatible driver
19:28:26 | ============================================
19:28:26 | <none>
19:28:26 |
19:28:26 | Network devices using kernel driver
19:28:26 | ===================================
19:28:26 | 0000:16:00.0 'NetXtreme BCM5719 Gigabit Ethernet PCIe' if=eno1 drv=tg3 unused= *Active*
19:28:26 | 0000:16:00.1 'NetXtreme BCM5719 Gigabit Ethernet PCIe' if=eno2 drv=tg3 unused= *Active*
19:28:26 | 0000:16:00.2 'NetXtreme BCM5719 Gigabit Ethernet PCIe' if=eno3 drv=tg3 unused=
19:28:26 | 0000:16:00.3 'NetXtreme BCM5719 Gigabit Ethernet PCIe' if=eno4 drv=tg3 unused=
19:28:26 | 0000:81:00.0 'Ethernet Controller X710 for 10GbE SFP+' if=ens5f0 drv=i40e unused=
19:28:26 | 0000:81:00.1 'Ethernet Controller X710 for 10GbE SFP+' if=ens5f1 drv=i40e unused=
19:28:26 |
19:28:26 | Other network devices
19:28:26 | =====================
19:28:26 | 0000:06:00.0 'Ethernet Controller X710 for 10GbE SFP+' unused=i40e
19:28:26 | 0000:06:00.1 'Ethernet Controller X710 for 10GbE SFP+' unused=i40e
19:28:40 | [root@overcloud63m-compdpdk-60 ~]# modprobe vfio_pci

19:28:43 | [root@overcloud63m-compdpdk-60 ~]# /var/lib/docker/overlay2/fec97702bec79042f8007e0c53823adfb48affff0dc18235bb18c6739893a9fb/diff/opt/contrail/bin/dpdk_nic_bind.py -s
19:28:43 |
19:28:43 | Network devices using DPDK-compatible driver
19:28:43 | ============================================
19:28:43 | <none>
19:28:43 |
19:28:43 | Network devices using kernel driver
19:28:43 | ===================================
19:28:43 | 0000:16:00.0 'NetXtreme BCM5719 Gigabit Ethernet PCIe' if=eno1 drv=tg3 unused=vfio-pci *Active*
19:28:43 | 0000:16:00.1 'NetXtreme BCM5719 Gigabit Ethernet PCIe' if=eno2 drv=tg3 unused=vfio-pci *Active*
19:28:43 | 0000:16:00.2 'NetXtreme BCM5719 Gigabit Ethernet PCIe' if=eno3 drv=tg3 unused=vfio-pci
19:28:43 | 0000:16:00.3 'NetXtreme BCM5719 Gigabit Ethernet PCIe' if=eno4 drv=tg3 unused=vfio-pci
19:28:43 | 0000:81:00.0 'Ethernet Controller X710 for 10GbE SFP+' if=ens5f0 drv=i40e unused=vfio-pci
19:28:43 | 0000:81:00.1 'Ethernet Controller X710 for 10GbE SFP+' if=ens5f1 drv=i40e unused=vfio-pci
19:28:43 |
19:28:43 | Other network devices
19:28:43 | =====================
19:28:43 | 0000:06:00.0 'Ethernet Controller X710 for 10GbE SFP+' unused=i40e,vfio-pci
19:28:43 | 0000:06:00.1 'Ethernet Controller X710 for 10GbE SFP+' unused=i40e,vfio-pci
19:28:44 | [root@overcloud63m-compdpdk-60 ~]#
19:28:45 | [root@overcloud63m-compdpdk-60 ~]#
19:28:45 | [root@overcloud63m-compdpdk-60 ~]#
19:28:55 | [root@overcloud63m-compdpdk-60 ~]# ifup vhost0
19:28:55 | /usr/bin/docker-current: Error response from daemon: Conflict. The container name "/contrail-vrouter-agent-dpdk" is already in use by container a2535a9c1807937b3484cb89538d341c8d733647fa8bd9f49e8a43523ccf0b5b. You have to remove (or rename) that container to be able to reuse that name..
19:28:55 | See '/usr/bin/docker-current run --help'.
19:28:55 | Cannot find device "vhost0"
19:28:55 | Cannot find device "vhost0"
19:28:55 | ERROR : [/etc/sysconfig/network-scripts/ifup-vhost] Failed to bring up vhost0.
19:28:57 | [root@overcloud63m-compdpdk-60 ~]#
19:28:57 | [root@overcloud63m-compdpdk-60 ~]#
19:28:58 | [root@overcloud63m-compdpdk-60 ~]#
19:29:25 | [root@overcloud63m-compdpdk-60 ~]# docker rename contrail-vrouter-agent-dpdk contrail-vrouter-agent-dpdk-xx
19:29:28 | [root@overcloud63m-compdpdk-60 ~]# ifup vhost0
19:29:28 | 3962908a23ae4f75991db4f6ad5b542805706f671004286ad89bb5f7eb1c387e
19:29:28 | INFO: wait DPDK agent to run... 1
19:29:33 | INFO: wait DPDK agent to run... 2
19:29:38 | INFO: wait DPDK agent to run... 3
19:29:43 | INFO: wait DPDK agent to run... 4
19:29:48 | INFO: wait DPDK agent to run... 5
19:29:53 | INFO: wait DPDK agent to run... 6
19:29:58 | INFO: wait DPDK agent to run... 7
19:30:03 | INFO: wait DPDK agent to run... 8
19:30:08 | INFO: wait DPDK agent to run... 9
19:30:13 | INFO: wait DPDK agent to run... 10
19:30:18 | INFO: wait DPDK agent to run... 11
19:30:18 | INFO: wait vhost0 to be initilaized... 0/60
19:30:45 | INFO: wait vhost0 to be initilaized... 1/60
19:30:45 | INFO: wait vhost0 to be initilaized... 2/60
19:30:45 | INFO: vhost0 is ready.
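
The first ifup attempt in the transcript above fails because a stopped container still owns the name contrail-vrouter-agent-dpdk. A check along the following lines could detect that state before ifup is run. It is a sketch under assumptions: the function name is hypothetical, and it expects tab-separated name and status columns as produced by the docker ps format string shown in the usage comment.

```shell
#!/usr/bin/env bash
# Sketch: succeed if a container with the given name exists but is not running.
# Reads "<name><TAB><status>" lines on stdin; a running container has a status
# that starts with "Up".
is_stale_container() {
    awk -F '\t' -v name="$1" '
        $1 == name && $2 !~ /^Up/ { found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Usage:
#   docker ps -a --format '{{.Names}}\t{{.Status}}' \
#       | is_stale_container contrail-vrouter-agent-dpdk \
#       && echo "stale container holds the name; remove or rename it before ifup"
```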

Tags: dpdk cc13
description: updated
Vinod Nair (vinodnair) wrote :

Hi Bernhard

The vrouter is restarted by restarting the vrouter container. Please check whether you see any issue when restarting the container.

Changed in juniperopenstack:
assignee: Vinod Nair (vinodnair) → Bernhard Koessler (bkoessler)
Bernhard Koessler (bkoessler) wrote :

From Michael Henkel:
ifdown stops the dpdk pmd container and attaches the interfaces back to the kernel. ifup attaches the interfaces to dpdk and starts the pmd container.
The dpdk pmd container should not be restarted manually.

Changed in juniperopenstack:
assignee: Bernhard Koessler (bkoessler) → Vinod Nair (vinodnair)
Vinod Nair (vinodnair) wrote :

Michael - If this is the model we want to follow for osp13, then ifdown should stop the vrouter container first and then bind the interfaces back to the kernel, with ifup doing the reverse.

Sivakumar Ganapathy (hotlava51) wrote :

The changes to fix this bug need to be part of provisioning or part of the ifup/ifdown scripts. Hence removing the vrouter tag.

tags: removed: vrouter
Jeba Paulaiyan (jebap)
tags: added: blocker
alexey-mr (alexey-morlang) wrote :

@Vinod: why should ifup/ifdown touch the agent container? To bring down the vhost0 interface it is enough to stop the dpdk container and bind the NICs back to the system; ifup binds the NICs to dpdk again and initializes vhost0.

@Bernhard: have you ever seen the same with any other driver, say, uio_pci_generic?

alexey-mr (alexey-morlang) wrote :

May I have access to the setup?

Jeba Paulaiyan (jebap)
tags: removed: blocker
alexey-mr (alexey-morlang) wrote :

The issue was in a manually patched /etc/sysconfig/network-scripts-vrouter-dpdk-env file. The container names in it were commented out, so the ifdown script was not able to stop the dpdk container, and the rebind script hung because the device descriptors were still open.
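
Since the root cause was a set of commented-out entries in a hand-edited environment file, a quick scan for commented assignments can flag the same mistake on other nodes. The following is a sketch only: the function name is hypothetical, and because the variable names used in that file are not given in this report, the pattern matches any commented-out NAME=value line.

```shell
#!/usr/bin/env bash
# Sketch: print (with line numbers) every commented-out NAME=value line in a file.
# Plain comments without an assignment are ignored.
commented_assignments() {
    grep -nE '^[[:space:]]*#[[:space:]]*[A-Za-z_][A-Za-z0-9_]*=' "$1"
}

# Usage (path as quoted in this report):
#   commented_assignments /etc/sysconfig/network-scripts-vrouter-dpdk-env
# Any hit on a container-name setting deserves a closer look before running
# ifdown vhost0. While the dpdk container is still running, a tool such as
# lsof or fuser can show which process keeps the /dev/vfio device files open.
```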
