5.0: Ansible: DPDK compute stopped forwarding no mbufs

Bug #1765162 reported by Vinod Nair on 2018-04-18
Affects                Status          Importance   Assigned to          Milestone
Juniper Openstack (status tracked in Trunk)
  R5.0                 Fix Committed   Critical     Jeya ganesh babu J
  Trunk                Fix Committed   Critical     Jeya ganesh babu J

Bug Description

Contrail 5.0 DPDK compute stopped forwarding

Vrouter Interface Table

Flags: P=Policy, X=Cross Connect, S=Service Chain, Mr=Receive Mirror
       Mt=Transmit Mirror, Tc=Transmit Checksum Offload, L3=Layer 3, L2=Layer 2
       D=DHCP, Vp=Vhost Physical, Pr=Promiscuous, Vnt=Native Vlan Tagged
       Mnp=No MAC Proxy, Dpdk=DPDK PMD Interface, Rfl=Receive Filtering Offload, Mon=Interface is Monitored
       Uuf=Unknown Unicast Flood, Vof=VLAN insert/strip offload, Df=Drop New Flows, L=MAC Learning Enabled
       Proxy=MAC Requests Proxied Always, Er=Etree Root, Mn=Mirror without Vlan Tag, Ig=Igmp Trap Enabled

vif0/0 PCI: 0000:00:00.0 (Speed 20000, Duplex 0)
            Type:Physical HWaddr:90:e2:ba:50:ae:d8 IPaddr:0.0.0.0
            Vrf:0 Mcast Vrf:65535 Flags:TcL3L2VpEr QOS:-1 Ref:22
            RX device packets:5 bytes:361 errors:0 no mbufs:80501815
            RX port packets:2 errors:0
            RX queue packets:0 errors:0
            RX queue errors to lcore 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
            RX packets:2 bytes:168 errors:0
            TX packets:2 bytes:123 errors:0
            Drops:148539300
            TX queue packets:0 errors:0
            TX port packets:2 errors:0
            TX device packets:9 bytes:799 errors:0

2018-04-18 11:36:06 -0700
2018-04-18 10:36:33,178 UVHOST: Client _tap03f6009e-aa: handling message 18
2018-04-18 10:36:33,178 UVHOST: Client _tap03f6009e-aa: setting vring 1 ready state 1
2018-04-18 10:59:22,741 PMD: Bond 2: slave id 0 distributing started.
2018-04-18 11:02:23,427 PMD: Bond 2: slave id 0 distributing started.
2018-04-18 11:15:52,523 PMD: Bond 2: slave id 0 distributing started.
2018-04-18 11:15:52,523 PMD: Bond 2: slave id 1 distributing started.
2018-04-18 11:17:34,381 VROUTER: Adding monitoring vif 4348 (gen. 9) device mon3 to monitor vif 3
2018-04-18 11:17:34,381 VROUTER: KNI is not available
2018-04-18 11:17:34,381 VROUTER: creating TAP device mon3
2018-04-18 11:17:34,381 VROUTER: lcore 14 TX to HW queue 0
2018-04-18 11:17:34,381 VROUTER: lcore 15 TX to HW queue 1
2018-04-18 11:17:34,381 VROUTER: lcore 16 TX to HW queue 2
2018-04-18 11:17:34,381 VROUTER: lcore 17 TX to HW queue 3
2018-04-18 11:17:34,381 VROUTER: lcore 8 TX to HW queue 4
2018-04-18 11:17:34,381 VROUTER: lcore 9 TX to HW queue 5
2018-04-18 11:17:34,381 VROUTER: lcore 10 TX to HW queue 6
2018-04-18 11:17:34,381 VROUTER: lcore 11 TX to HW queue 7
2018-04-18 11:17:34,381 VROUTER: lcore 12 TX to HW queue 8
2018-04-18 11:17:34,381 VROUTER: lcore 13 TX to HW queue 9
2018-04-18 11:17:42,514 VROUTER: Deleting monitoring vif 4348 device to monitor vif 3
2018-04-18 11:17:42,514 VROUTER: releasing lcore 8 TX queue 0
2018-04-18 11:17:42,514 VROUTER: releasing lcore 9 TX queue 0
2018-04-18 11:17:42,514 VROUTER: releasing lcore 10 TX queue 0
2018-04-18 11:17:42,514 VROUTER: releasing lcore 11 TX queue 0
2018-04-18 11:17:42,514 VROUTER: releasing lcore 12 TX queue 0
2018-04-18 11:17:42,514 VROUTER: releasing lcore 13 TX queue 0
2018-04-18 11:17:42,515 VROUTER: releasing lcore 14 TX queue 0
2018-04-18 11:17:42,515 VROUTER: releasing lcore 15 TX queue 0
2018-04-18 11:17:42,515 VROUTER: releasing lcore 16 TX queue 0
2018-04-18 11:17:42,515 VROUTER: releasing lcore 17 TX queue 0
2018-04-18 11:17:42,515 VROUTER: releasing vif 4348 TAP device
2018-04-18 11:19:23,223 PMD: Bond 2: slave id 0 distributing started.
2018-04-18 11:22:24,213 PMD: Bond 2: slave id 0 distributing started.

Vinod Nair (vinodnair) on 2018-04-18
summary: - 5.0: Ansible: DPDK compute stpped forwarding no mbufs
+ 5.0: Ansible: DPDK compute stopped forwarding no mbufs

Review in progress for https://review.opencontrail.org/42240
Submitter: Jeya ganesh babu (<email address hidden>)

Reviewed: https://review.opencontrail.org/42240
Committed: http://github.com/Juniper/contrail-vnc/commit/b2fec73e93f970a54fddb535764c9910bb1d90df
Submitter: Zuul v3 CI (<email address hidden>)
Branch: R5.0

commit b2fec73e93f970a54fddb535764c9910bb1d90df
Author: Jeya ganesh babu J <email address hidden>
Date: Thu Apr 19 21:44:01 2018 -0700

DPDK forwarding issue

Partial-bug: #1765162
Reverting dpdk version to 17.02 to unblock qa tests.

Change-Id: Ieaa6093933ff5480d38bde0d303ef558368cfd2b

Jeya ganesh babu J (jjeya) wrote :

Removing the blocker tag and changing the release, since DPDK has been reverted to 17.02.

tags: removed: blocker
Jeba Paulaiyan (jebap) on 2018-04-27
information type: Proprietary → Public

Review in progress for https://review.opencontrail.org/43229
Submitter: Jeya ganesh babu (<email address hidden>)

Reviewed: https://review.opencontrail.org/43229
Committed: http://github.com/Juniper/contrail-vnc/commit/096cc0907d0f0e75500f8c827247ab78788e0af7
Submitter: Zuul v3 CI (<email address hidden>)
Branch: R5.0

commit 096cc0907d0f0e75500f8c827247ab78788e0af7
Author: Jeya ganesh babu J <email address hidden>
Date: Tue May 22 21:07:41 2018 -0700

Reverting back to DPDK 17.11

Partial-bug: #1765162
With 5.0 released, reverting back to dpdk 17.11.

Change-Id: I59f15485e197e3bc90d07a0fecd2713216779df5

Review in progress for https://review.opencontrail.org/43939
Submitter: Yi Yang (<email address hidden>)

Review in progress for https://review.opencontrail.org/43943
Submitter: Jeya ganesh babu (<email address hidden>)

Reviewed: https://review.opencontrail.org/43939
Committed: http://github.com/Juniper/contrail-vrouter/commit/4db65eb46bacc78d264ac6239b665720cb93b3b8
Submitter: Zuul v3 CI (<email address hidden>)
Branch: master

commit 4db65eb46bacc78d264ac6239b665720cb93b3b8
Author: Yi Yang <email address hidden>
Date: Tue Jun 19 08:21:08 2018 +0800

Fix nombufs issue in DPDK 17.11

The DPDK APIs rte_ring_sp_enqueue_bulk and rte_ring_mp_enqueue_bulk
changed their return-value convention in DPDK 17.05. Before 17.05,
0 meant success and -ERRORNUM meant error. From 17.05 onward
(including 17.05) the return value is either 0 or n, where 0 means
error and n means success. Details are below:

/**
 * Enqueue several objects on the ring (multi-producers safe).
 *
 * This function uses a "compare and set" instruction to move the
 * producer index atomically.
 *
 * @param r
 *   A pointer to the ring structure.
 * @param obj_table
 *   A pointer to a table of void * pointers (objects).
 * @param n
 *   The number of objects to add in the ring from the obj_table.
 * @param free_space
 *   if non-NULL, returns the amount of space in the ring after the
 *   enqueue operation has finished.
 * @return
 *   The number of objects enqueued, either 0 or n
 */
static __rte_always_inline unsigned int
rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
        unsigned int n, unsigned int *free_space)

The previous code works as long as no enqueue fails, which is why we
could not reproduce the nombufs issue.

Closes-Bug: #1765162
Change-Id: Ieb48e216ca8136a8127d4dbaf33b999a181e09f7
Signed-off-by: Yi Yang <email address hidden>
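
The commit message above describes the changed return convention but does not show the affected call site. The following is a minimal sketch of how such a mismatch could leak buffers, not the actual vrouter code. It assumes the old caller only treated a negative return as failure, which the report does not confirm. `toy_ring`, `toy_enqueue_bulk`, `toy_free_bulk`, `send_old_style` and `send_new_style` are hypothetical stand-ins so the example compiles without DPDK.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_RING_CAPACITY 8u

/* Hypothetical stand-in for a ring; NOT the real struct rte_ring. */
struct toy_ring {
    void *slots[TOY_RING_CAPACITY];
    unsigned int count;
};

/* Mimics the DPDK >= 17.05 bulk contract: all-or-nothing, returns 0 or n. */
static unsigned int
toy_enqueue_bulk(struct toy_ring *r, void *const *objs, unsigned int n)
{
    if (n > TOY_RING_CAPACITY - r->count)
        return 0;                       /* failure: nothing was enqueued */
    for (unsigned int i = 0; i < n; i++)
        r->slots[r->count++] = objs[i];
    return n;                           /* success: all n were enqueued */
}

/* Counts buffers handed back to an imaginary pool. */
static unsigned int toy_freed;

static void
toy_free_bulk(void *const *objs, unsigned int n)
{
    (void)objs;
    toy_freed += n;
}

/* Assumed pre-17.05 style check: only a negative value counts as failure. */
static void
send_old_style(struct toy_ring *r, void *const *objs, unsigned int n)
{
    int ret = (int)toy_enqueue_bulk(r, objs, n);
    if (ret < 0)                        /* never true with 0-or-n semantics */
        toy_free_bulk(objs, n);
}

/* 17.05+ style check: a return of 0 means the bulk enqueue failed. */
static void
send_new_style(struct toy_ring *r, void *const *objs, unsigned int n)
{
    if (toy_enqueue_bulk(r, objs, n) == 0)
        toy_free_bulk(objs, n);         /* give the buffers back */
}
```

With this sketch, the old-style check behaves correctly until the ring is full. After that, every failed enqueue leaves its buffers neither queued nor freed, which is consistent with a steadily growing "no mbufs" counter only under load.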

Reviewed: https://review.opencontrail.org/43943
Committed: http://github.com/Juniper/contrail-vrouter/commit/a2bcb3d18a21c0ad39f47548a690f926ef2ec448
Submitter: Zuul v3 CI (<email address hidden>)
Branch: R5.0

commit a2bcb3d18a21c0ad39f47548a690f926ef2ec448
Author: Yi Yang <email address hidden>
Date: Tue Jun 19 08:21:08 2018 +0800

Fix nombufs issue in DPDK 17.11

The DPDK APIs rte_ring_sp_enqueue_bulk and rte_ring_mp_enqueue_bulk
changed their return-value convention in DPDK 17.05. Before 17.05,
0 meant success and -ERRORNUM meant error. From 17.05 onward
(including 17.05) the return value is either 0 or n, where 0 means
error and n means success. Details are below:

/**
 * Enqueue several objects on the ring (multi-producers safe).
 *
 * This function uses a "compare and set" instruction to move the
 * producer index atomically.
 *
 * @param r
 *   A pointer to the ring structure.
 * @param obj_table
 *   A pointer to a table of void * pointers (objects).
 * @param n
 *   The number of objects to add in the ring from the obj_table.
 * @param free_space
 *   if non-NULL, returns the amount of space in the ring after the
 *   enqueue operation has finished.
 * @return
 *   The number of objects enqueued, either 0 or n
 */
static __rte_always_inline unsigned int
rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
        unsigned int n, unsigned int *free_space)

The previous code works as long as no enqueue fails, which is why we
could not reproduce the nombufs issue.

Closes-Bug: #1765162
Change-Id: Ieb48e216ca8136a8127d4dbaf33b999a181e09f7
Signed-off-by: Yi Yang <email address hidden>
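
Because this bug involved moving between DPDK 17.02 and 17.11, it may help to see the two conventions side by side as plain predicates. This is an illustrative sketch only: `bulk_ok_pre_17_05` and `bulk_ok_17_05` are hypothetical helper names, not part of DPDK or vrouter, and `-ENOBUFS` is used as one example of the negative error value that the commit message writes as -ERRORNUM.

```c
#include <assert.h>
#include <errno.h>

/* Old convention (before DPDK 17.05): 0 is success, negative is an error. */
static int
bulk_ok_pre_17_05(int ret)
{
    return ret == 0;
}

/* New convention (DPDK 17.05 and later): the call returns 0 or n.
 * Comparing against n also treats the trivial n == 0 case as success. */
static int
bulk_ok_17_05(unsigned int ret, unsigned int n)
{
    return ret == n;
}
```

The literal 0 flips meaning between the two conventions, so applying the old predicate to a new-style return value gives the wrong answer in both directions: a failed enqueue (0) looks like success, and a successful enqueue of n objects looks like failure.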
