5.0: Ansible: DPDK compute stopped forwarding no mbufs

Bug #1765162 reported by Vinod Nair
This bug affects 1 person
Affects             Status          Importance  Assigned to         Milestone
Juniper Openstack   Status tracked in Trunk
  R5.0              Fix Committed   Critical    Jeya ganesh babu J
  Trunk             Fix Committed   Critical    Jeya ganesh babu J

Bug Description

Contrail 5.0 DPDK compute stopped forwarding

Vrouter Interface Table

Flags: P=Policy, X=Cross Connect, S=Service Chain, Mr=Receive Mirror
       Mt=Transmit Mirror, Tc=Transmit Checksum Offload, L3=Layer 3, L2=Layer 2
       D=DHCP, Vp=Vhost Physical, Pr=Promiscuous, Vnt=Native Vlan Tagged
       Mnp=No MAC Proxy, Dpdk=DPDK PMD Interface, Rfl=Receive Filtering Offload, Mon=Interface is Monitored
       Uuf=Unknown Unicast Flood, Vof=VLAN insert/strip offload, Df=Drop New Flows, L=MAC Learning Enabled
       Proxy=MAC Requests Proxied Always, Er=Etree Root, Mn=Mirror without Vlan Tag, Ig=Igmp Trap Enabled

vif0/0 PCI: 0000:00:00.0 (Speed 20000, Duplex 0)
            Type:Physical HWaddr:90:e2:ba:50:ae:d8 IPaddr:0.0.0.0
            Vrf:0 Mcast Vrf:65535 Flags:TcL3L2VpEr QOS:-1 Ref:22
            RX device packets:5 bytes:361 errors:0 no mbufs:80501815
            RX port packets:2 errors:0
            RX queue packets:0 errors:0
            RX queue errors to lcore 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
            RX packets:2 bytes:168 errors:0
            TX packets:2 bytes:123 errors:0
            Drops:148539300
            TX queue packets:0 errors:0
            TX port packets:2 errors:0
            TX device packets:9 bytes:799 errors:0

2018-04-18 11:36:06 -0700
2018-04-18 10:36:33,178 UVHOST: Client _tap03f6009e-aa: handling message 18
2018-04-18 10:36:33,178 UVHOST: Client _tap03f6009e-aa: setting vring 1 ready state 1
2018-04-18 10:59:22,741 PMD: Bond 2: slave id 0 distributing started.
2018-04-18 11:02:23,427 PMD: Bond 2: slave id 0 distributing started.
2018-04-18 11:15:52,523 PMD: Bond 2: slave id 0 distributing started.
2018-04-18 11:15:52,523 PMD: Bond 2: slave id 1 distributing started.
2018-04-18 11:17:34,381 VROUTER: Adding monitoring vif 4348 (gen. 9) device mon3 to monitor vif 3
2018-04-18 11:17:34,381 VROUTER: KNI is not available
2018-04-18 11:17:34,381 VROUTER: creating TAP device mon3
2018-04-18 11:17:34,381 VROUTER: lcore 14 TX to HW queue 0
2018-04-18 11:17:34,381 VROUTER: lcore 15 TX to HW queue 1
2018-04-18 11:17:34,381 VROUTER: lcore 16 TX to HW queue 2
2018-04-18 11:17:34,381 VROUTER: lcore 17 TX to HW queue 3
2018-04-18 11:17:34,381 VROUTER: lcore 8 TX to HW queue 4
2018-04-18 11:17:34,381 VROUTER: lcore 9 TX to HW queue 5
2018-04-18 11:17:34,381 VROUTER: lcore 10 TX to HW queue 6
2018-04-18 11:17:34,381 VROUTER: lcore 11 TX to HW queue 7
2018-04-18 11:17:34,381 VROUTER: lcore 12 TX to HW queue 8
2018-04-18 11:17:34,381 VROUTER: lcore 13 TX to HW queue 9
2018-04-18 11:17:42,514 VROUTER: Deleting monitoring vif 4348 device to monitor vif 3
2018-04-18 11:17:42,514 VROUTER: releasing lcore 8 TX queue 0
2018-04-18 11:17:42,514 VROUTER: releasing lcore 9 TX queue 0
2018-04-18 11:17:42,514 VROUTER: releasing lcore 10 TX queue 0
2018-04-18 11:17:42,514 VROUTER: releasing lcore 11 TX queue 0
2018-04-18 11:17:42,514 VROUTER: releasing lcore 12 TX queue 0
2018-04-18 11:17:42,514 VROUTER: releasing lcore 13 TX queue 0
2018-04-18 11:17:42,515 VROUTER: releasing lcore 14 TX queue 0
2018-04-18 11:17:42,515 VROUTER: releasing lcore 15 TX queue 0
2018-04-18 11:17:42,515 VROUTER: releasing lcore 16 TX queue 0
2018-04-18 11:17:42,515 VROUTER: releasing lcore 17 TX queue 0
2018-04-18 11:17:42,515 VROUTER: releasing vif 4348 TAP device
2018-04-18 11:19:23,223 PMD: Bond 2: slave id 0 distributing started.
2018-04-18 11:22:24,213 PMD: Bond 2: slave id 0 distributing started.

Tags: dpdk
Vinod Nair (vinodnair)
summary: - 5.0: Ansible: DPDK compute stpped forwarding no mbufs
+ 5.0: Ansible: DPDK compute stopped forwarding no mbufs
OpenContrail Admin (ci-admin-f) wrote : [Review update] R5.0

Review in progress for https://review.opencontrail.org/42240
Submitter: Jeya ganesh babu (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/42240
Committed: http://github.com/Juniper/contrail-vnc/commit/b2fec73e93f970a54fddb535764c9910bb1d90df
Submitter: Zuul v3 CI (<email address hidden>)
Branch: R5.0

commit b2fec73e93f970a54fddb535764c9910bb1d90df
Author: Jeya ganesh babu J <email address hidden>
Date: Thu Apr 19 21:44:01 2018 -0700

DPDK forwarding issue

Partial-bug: #1765162
Reverting dpdk version to 17.02 to unblock qa tests.

Change-Id: Ieaa6093933ff5480d38bde0d303ef558368cfd2b

Jeya ganesh babu J (jjeya) wrote :

Removing the blocker tag and changing the release, as DPDK has been reverted to 17.02.

tags: removed: blocker
Jeba Paulaiyan (jebap)
information type: Proprietary → Public
OpenContrail Admin (ci-admin-f) wrote : [Review update] R5.0

Review in progress for https://review.opencontrail.org/43229
Submitter: Jeya ganesh babu (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/43229
Committed: http://github.com/Juniper/contrail-vnc/commit/096cc0907d0f0e75500f8c827247ab78788e0af7
Submitter: Zuul v3 CI (<email address hidden>)
Branch: R5.0

commit 096cc0907d0f0e75500f8c827247ab78788e0af7
Author: Jeya ganesh babu J <email address hidden>
Date: Tue May 22 21:07:41 2018 -0700

Reverting back to DPDK 17.11

Partial-bug: #1765162
With 5.0 released, reverting back to dpdk 17.11.

Change-Id: I59f15485e197e3bc90d07a0fecd2713216779df5

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/43939
Submitter: Yi Yang (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R5.0

Review in progress for https://review.opencontrail.org/43943
Submitter: Jeya ganesh babu (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/43939
Committed: http://github.com/Juniper/contrail-vrouter/commit/4db65eb46bacc78d264ac6239b665720cb93b3b8
Submitter: Zuul v3 CI (<email address hidden>)
Branch: master

commit 4db65eb46bacc78d264ac6239b665720cb93b3b8
Author: Yi Yang <email address hidden>
Date: Tue Jun 19 08:21:08 2018 +0800

Fix nombufs issue in DPDK 17.11

The DPDK APIs rte_ring_sp_enqueue_bulk and rte_ring_mp_enqueue_bulk
changed the meaning of their return value. Before DPDK 17.05, 0 meant
success and -ERRORNUM meant error. From DPDK 17.05 onward (17.05
included), the return value is either 0 or n, where 0 means error and
n means success. Details are below:

/**
 * Enqueue several objects on the ring (multi-producers safe).
 *
 * This function uses a "compare and set" instruction to move the
 * producer index atomically.
 *
 * @param r
 *   A pointer to the ring structure.
 * @param obj_table
 *   A pointer to a table of void * pointers (objects).
 * @param n
 *   The number of objects to add in the ring from the obj_table.
 * @param free_space
 *   if non-NULL, returns the amount of space in the ring after the
 *   enqueue operation has finished.
 * @return
 *   The number of objects enqueued, either 0 or n
 */
static __rte_always_inline unsigned int
rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
                         unsigned int n, unsigned int *free_space)

The previous code works as long as no enqueue fails, which is why we
could not reproduce the nombufs issue.

Closes-Bug: #1765162
Change-Id: Ieb48e216ca8136a8127d4dbaf33b999a181e09f7
Signed-off-by: Yi Yang <email address hidden>

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/43943
Committed: http://github.com/Juniper/contrail-vrouter/commit/a2bcb3d18a21c0ad39f47548a690f926ef2ec448
Submitter: Zuul v3 CI (<email address hidden>)
Branch: R5.0

commit a2bcb3d18a21c0ad39f47548a690f926ef2ec448
Author: Yi Yang <email address hidden>
Date: Tue Jun 19 08:21:08 2018 +0800

Fix nombufs issue in DPDK 17.11

Closes-Bug: #1765162
Change-Id: Ieb48e216ca8136a8127d4dbaf33b999a181e09f7
Signed-off-by: Yi Yang <email address hidden>
