'Flow table limit' is different from the actual one

Bug #1699425 reported by mehul
This bug affects 1 person
Affects            Status        Importance  Assigned to            Milestone
Juniper Openstack  Invalid       High        Divakar Dharanalakota
R2.21.x            Fix Released  High        Divakar Dharanalakota

Bug Description

Hi Team,

The customer reported the issue below in Contrail 2.21.3-55 and 2.21.3-57.

Even though 'Flow Table limit' is 524288, the number of flow entries stops growing at around 200K and the 'Flow Table Full' drop counter keeps increasing.

According to their observation, the issue does not occur in Contrail 2.21.2-36.

*Build36

root@JunCom00:~# vrouter --info
vRouter module version 2.21.2 (Built by contrail-builder@contrail-ec-build09 on 2016-04-21 20:23:07.478944)
Interfaces limit 4352
VRF tables limit 4096
NextHops limit 65536
MPLS Labels limit 11520
Bridge Table limit 262144
Bridge Table Overflow limit 4096
Flow Table limit 524288
Flow Table overflow limit 8192
Mirror entries limit 255

root@JunCom00:~# cat /etc/modprobe.d/vrouter.conf
options vrouter vr_mpls_labels=11520

2017-06-20 19:26:46 +0900
Flow Statistics
---------------
    Total Entries --- Total = 486969, new = 623
    Active Entries --- Total = 482874, new = 633
    Hold Entries --- Total = 4095, new = -10
    Fwd flow Entries - Total = 482874
    drop flow Entries - Total = 0
    NAT flow Entries - Total = 0

    Rate of change of Active Entries
    --------------------------------
        current rate = 894
        Avg setup rate = 3475
        Avg teardown rate = 0
    Rate of change of Flow Entries
    ------------------------------
        current rate = 879

root@JunCom00:~# date ; dropstats | grep -v " 0"
Tue Jun 20 19:26:57 JST 2017

Flow Unusable 3336327
Flow Table Full 349639
Flow Action Drop 1

Discards 218761
Cloned Original 18

===========================================
*Build57 default

root@JunCom01:~# vrouter --info
vRouter module version 2.21.3 (Built by contrail-builder@contrail-ec-build09 on 2017-01-25 01:34:56.990837)
Interfaces limit 4352
VRF tables limit 4096
NextHops limit 65536
MPLS Labels limit 11520
Bridge Table limit 262144
Bridge Table Overflow limit 4096
Flow Table limit 524288
Flow Table overflow limit 8192
Mirror entries limit 255

root@JunCom01:~# cat /etc/modprobe.d/vrouter.conf
options vrouter vr_mpls_labels=11520

2017-06-20 19:12:55 +0900
Flow Statistics
---------------
    Total Entries --- Total = 270210, new = -4
    Active Entries --- Total = 266113, new = -4
    Hold Entries --- Total = 4097, new = 0
    Fwd flow Entries - Total = 266096
    drop flow Entries - Total = 17
    NAT flow Entries - Total = 0

    Rate of change of Active Entries
    --------------------------------
        current rate = -6
        Avg setup rate = 5784
        Avg teardown rate = 0
    Rate of change of Flow Entries
    ------------------------------
        current rate = -6

root@JunCom01:~# date ; dropstats | grep -v " 0"
Tue Jun 20 19:13:05 JST 2017

Flow Unusable 1697824
Flow Table Full 1886
Flow Action Drop 120

Discards 122792
Cloned Original 193

root@JunCom01:~# date ; dropstats | grep -v " 0"
Tue Jun 20 19:13:05 JST 2017

Flow Unusable 1713891
Flow Table Full 1894
Flow Action Drop 120

Discards 122818
Cloned Original 193

===========================================
*Build57 vrouter.conf was changed.

root@JunCom01:~# vrouter --info
vRouter module version 2.21.3 (Built by contrail-builder@contrail-ec-build09 on 2017-01-25 01:34:56.990837)
Interfaces limit 4352
VRF tables limit 4096
NextHops limit 65536
MPLS Labels limit 11520
Bridge Table limit 262144
Bridge Table Overflow limit 4096
Flow Table limit 1048576 <<<<<< 2 times
Flow Table overflow limit 8192
Mirror entries limit 255

root@JunCom01:~# cat /etc/modprobe.d/vrouter.conf
options vrouter vr_mpls_labels=11520 vr_flow_entries=1048576 <<<<<< 2 times

2017-06-20 18:57:05 +0900
Flow Statistics
---------------
    Total Entries --- Total = 437197, new = 33
    Active Entries --- Total = 433106, new = 39
    Hold Entries --- Total = 4091, new = -6
    Fwd flow Entries - Total = 433059
    drop flow Entries - Total = 47
    NAT flow Entries - Total = 0

    Rate of change of Active Entries
    --------------------------------
        current rate = 51
        Avg setup rate = 5538
        Avg teardown rate = 3047
    Rate of change of Flow Entries
    ------------------------------
        current rate = 43

root@JunCom01:~# date ; dropstats | grep -v " 0"
Tue Jun 20 18:57:08 JST 2017

Flow Unusable 17615548
Flow Table Full 4507
Flow Action Drop 85704

Discards 827012
Cloned Original 3099546

root@JunCom01:~# date ; dropstats | grep -v " 0"
Tue Jun 20 18:57:08 JST 2017

Flow Unusable 17628874
Flow Table Full 4511
Flow Action Drop 85704

Discards 827068
Cloned Original 3099546

===========================================

They changed /etc/modprobe.d/vrouter.conf to set: options vrouter vr_flow_entries=1048576

The upper limit of flow entries then rose to around 430K.

Whenever they change this value, the limit actually reached appears to be about half of the configured value.
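The "about half" observation can be checked against the captures above. The following is only a small arithmetic sketch: the numbers are copied from the Total Entries and Flow Table limit lines in this report, and the helper name utilization_percent is made up for illustration.

```python
# Figures copied from the three flow-statistics captures in this report:
# (Total Entries, Flow Table limit).
captures = {
    "build 36, default vr_flow_entries": (486969, 524288),
    "build 57, default vr_flow_entries": (270210, 524288),
    "build 57, vr_flow_entries=1048576": (437197, 1048576),
}


def utilization_percent(total_entries, table_limit):
    """Return flow table utilization as a percentage, rounded to one decimal."""
    return round(100.0 * total_entries / table_limit, 1)


for label, (total, limit) in captures.items():
    print(f"{label}: {total}/{limit} = {utilization_percent(total, limit)}%")
```

On these captures, build 36 fills roughly 93% of the table, while build 57 stops at roughly 52% and 42% of the configured limit.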

They need answers to the following points.

1. They do not want to reboot the server to recover from the problem. They need a workaround for this issue.

2. What is the root cause of this issue?

3. Does any other parameter affect this behavior? In particular, the customer uses max_vm_flows, so they would like to know whether it has an impact.

-Regards,
Mehul Patel

Tags: vrouter
Revision history for this message
mehul (pmehul) wrote :

Hi Team,

This issue occurred in a production environment, so please treat it as very high priority.

-Regards,
Mehul Patel

information type: Proprietary → Public
Revision history for this message
mehul (pmehul) wrote :

Hi Team,

Because the 'Flow Table Full' counter began to rise, they concluded that the limit was reached at 270210 entries. They are unable to create flow entries beyond 270210, and the counter keeps increasing as shown below:

Flow Table Full 1886

Flow Table Full 1894

After setting vr_mpls_labels=11520 vr_flow_entries=1048576 (twice the previous flow table size) in /etc/modprobe.d/vrouter.conf, they were able to create 437197 flow entries, which exceeds the earlier limit of 270210. However, the count does not grow beyond 437197, and the 'Flow Table Full' counter keeps increasing as shown below:

Flow Table Full 4507

Flow Table Full 4511
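To see which counters are still moving between two dropstats samples, the outputs can be diffed. This is a minimal sketch, assuming each counter line has the form "Name <count>" as in the pastes in this report; parse_dropstats and counter_deltas are hypothetical helper names, not part of any Contrail tool, and the sample text is the build 57 (default) capture from the bug description.

```python
import re


def parse_dropstats(text):
    """Parse 'Counter Name <count>' lines into a dict of integer counters.

    Only pass counter lines: a date line such as 'Tue Jun 20 ... 2017'
    would also match this simple pattern.
    """
    counters = {}
    for line in text.splitlines():
        match = re.match(r"^\s*(.+?)\s+(\d+)\s*$", line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def counter_deltas(before, after):
    """Return only the counters whose value changed between two samples."""
    deltas = {}
    for name, value in after.items():
        change = value - before.get(name, 0)
        if change != 0:
            deltas[name] = change
    return deltas


first_sample = """
Flow Unusable 1697824
Flow Table Full 1886
Flow Action Drop 120
Discards 122792
Cloned Original 193
"""

second_sample = """
Flow Unusable 1713891
Flow Table Full 1894
Flow Action Drop 120
Discards 122818
Cloned Original 193
"""

deltas = counter_deltas(parse_dropstats(first_sample), parse_dropstats(second_sample))
for name, change in deltas.items():
    print(f"{name}: +{change}")
```

For these two samples, only Flow Unusable, Flow Table Full and Discards change.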

In build 36, flows were created up to about 500K, which is close to the limit set by vr_flow_entries. Builds 55 and 57, however, did not create flows up to that limit. When they measured this in their lab, flow creation stopped at about 300K flows.

They suspect that the Hold Entries count shown below is what stops Total Entries from growing.

==============================
2017-06-21 17:26:13 +0900
Flow Statistics
---------------
     Total Entries --- Total = 297297, new = 0
     Active Entries --- Total = 293200, new = 0
     Hold Entries --- Total = 4097, new = 0 <<<<<<< Here
     Fwd flow Entries - Total = 293200
     drop flow Entries - Total = 0
     NAT flow Entries - Total = 0

==============================

-Regards,
Mehul Patel

Revision history for this message
mehul (pmehul) wrote :

Hi Team,

The following messages appear in the log of the affected compute node.

* contrail-vrouter-agent.log
======================
2017-06-20 Tue 03:17:08:680.577 UTC kw1ap-vscp0257n [Thread 139801262483200, Pid 29317]: VRouter [SYS_ERR]: FlowLog: 340153 FlowAudit : Converting HOLD entry to short flow controller/src/vnsw/agent/vrouter/ksync/flowtable_ksync.cc 653
2017-06-20 Tue 03:17:58:686.875 UTC kw1ap-vscp0257n [Thread 139801396127488, Pid 29317]: VRouter [SYS_ERR]: FlowLog: 366349 FlowAudit : Converting HOLD entry to short flow controller/src/vnsw/agent/vrouter/ksync/flowtable_ksync.cc 653
2017-06-20 Tue 03:18:44:470.723 UTC kw1ap-vscp0257n [Thread 139800648611584, Pid 29317]: VrResponseMsg Error: Bad file descriptor
===================

Does this mean that the hold entries have reached their upper limit, as they reported from their lab environment?

Please tell us what you need to debug this issue (logs, core dump, and so on).

-Regards,

Mehul Patel

tags: added: vrouter
Changed in juniperopenstack:
importance: Undecided → Critical
importance: Critical → High
assignee: nobody → Hari Prasad Killi (haripk)
milestone: none → r2.21
Revision history for this message
Hari Prasad Killi (haripk) wrote :

The issue was caused by vrouter returning the wrong error code to vrouter-agent. When a forward flow is created and the agent tries to create the reverse flow but cannot get a flow table entry (because the corresponding hash bucket and overflow entries are full), the error response was incorrect. As a result, these flows are not deleted and remain in hold state. Once enough such instances accumulate, the hold limit is hit and new flow requests are no longer accepted. Naveen and Divakar have tested a fix, which showed 450K+ active flow entries with build 72.

This issue shouldn’t be present in R3.1.

A note on the test being done: the VM sends continuous flows with changing port numbers, without checking whether the connection succeeds (only dropstats are validated). At around 50% of the flow table size, flow requests start to be dropped (in both build 36 and the new build) because a free flow entry cannot be found: even though the flow table as a whole still has free entries, the corresponding hash bucket and the overflow table are full. In practice, these drops would also be an issue, so the flow table would have to be sized larger to accommodate such requests.
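The effect described above (inserts failing while the table as a whole still has free slots) can be reproduced with a toy model. This is a rough sketch and not the vrouter data structure: it assumes uniformly random hashing, 4 entries per hash bucket (an assumption, not stated in this report), and an overflow table that is 1/64 of the main table, matching the 8192/524288 ratio shown by vrouter --info. The function name is hypothetical.

```python
import random


def first_insert_failure(num_buckets, bucket_size, overflow_size, seed=0):
    """Insert flows with random hashes until one cannot be placed.

    A flow goes into its hash bucket if the bucket has room, otherwise
    into the shared overflow table. Returns (inserted, capacity), where
    inserted is the number of flows placed before the first failure,
    or None if every slot was filled without a failure.
    """
    rng = random.Random(seed)
    bucket_fill = [0] * num_buckets
    overflow_used = 0
    capacity = num_buckets * bucket_size + overflow_size

    for inserted in range(capacity):
        bucket = rng.randrange(num_buckets)
        if bucket_fill[bucket] < bucket_size:
            bucket_fill[bucket] += 1
        elif overflow_used < overflow_size:
            overflow_used += 1
        else:
            # Bucket and overflow are both full: a "flow table full" drop,
            # even though other buckets still have free entries.
            return inserted, capacity
    return None, capacity


# Scaled-down table: 4096 buckets x 4 entries = 16384, overflow = 16384 / 64.
inserted, capacity = first_insert_failure(4096, 4, 256)
print(f"first failure after {inserted} of {capacity} slots "
      f"({100.0 * inserted / capacity:.1f}% full)")
```

The exact percentage depends on the assumed bucket size and hash distribution, so it should be read as an illustration of the mechanism rather than a prediction for a real compute node.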

Changed in juniperopenstack:
assignee: Hari Prasad Killi (haripk) → Divakar Dharanalakota (ddivakar)
Revision history for this message
mehul (pmehul) wrote :

Hi Hari,

Will you provide a new build that fixes the problem? If so, could you tell me the expected delivery date?

-Regards,
Mehul Patel

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.21.x

Review in progress for https://review.opencontrail.org/33404
Submitter: Hari Prasad Killi (<email address hidden>)

Revision history for this message
Hari Prasad Killi (haripk) wrote :

Issue is not present in branches other than 2.21.x.

Changed in juniperopenstack:
status: New → Invalid
Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/33404
Committed: http://github.com/Juniper/contrail-vrouter/commit/3ee94b5c00aad86920113531ccf960a14e912a48
Submitter: Zuul (<email address hidden>)
Branch: R2.21.x

commit 3ee94b5c00aad86920113531ccf960a14e912a48
Author: Hari Prasad Killi <email address hidden>
Date: Tue Jul 4 14:20:40 2017 +0530

Correct the return status of ENOSPC when flow table FULL

When the Agent adds a flow entry to Vrouter kernel module
(in particular reverse flow entry as forward flow entry is
always added by data path) and if the flow table is full,
Agent expects the return status as ENOSPC. But Vrouter is
returning EEXISTS wrongly. This is because, vr_find_free_entry()
is manipulating the flow index to 0 (which is a valid flow index)
even when the free entry is not found.

As a fix, the returned flow index is not manipulated if the free
flow entry is not found

Change-Id: Ia8e68006fdea1f36fedf5b99ed9cc7884e04e3b9
closes-bug: #1699425
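The failure mode in the commit message above can be illustrated with a small model. This is not the vrouter code: the table is a plain Python list, all function names are hypothetical, and the errno values come from Python's errno module (where the standard name is EEXIST). It only shows why resetting the index to 0 on failure turns "table full" into "entry exists".

```python
import errno

NO_FREE_ENTRY = -1  # sentinel that can never be a valid flow index


def find_free_entry_buggy(table):
    """Model of the defect: the index is reset to 0 when nothing is free."""
    for index, slot in enumerate(table):
        if slot is None:
            return index
    return 0  # bug: 0 is a valid flow index, so the caller sees a "hit"


def find_free_entry_fixed(table):
    """Model of the fix: leave the index as 'not found' on failure."""
    for index, slot in enumerate(table):
        if slot is None:
            return index
    return NO_FREE_ENTRY


def add_reverse_flow(table, flow, find_free_entry):
    """Return the new flow index, or a negative errno value on failure."""
    index = find_free_entry(table)
    if index == NO_FREE_ENTRY:
        return -errno.ENOSPC  # what the agent expects for a full table
    if table[index] is not None:
        return -errno.EEXIST  # slot already taken
    table[index] = flow
    return index


full_table = ["flow-a", "flow-b", "flow-c"]
buggy_status = add_reverse_flow(list(full_table), "rev", find_free_entry_buggy)
fixed_status = add_reverse_flow(list(full_table), "rev", find_free_entry_fixed)
print(errno.errorcode[-buggy_status], errno.errorcode[-fixed_status])
```

With a full table, the buggy lookup makes the caller report EEXIST, while the fixed lookup lets it report ENOSPC, which is the status the agent relies on to clean up the flow.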

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.21.x

Review in progress for https://review.opencontrail.org/33601
Submitter: Hari Prasad Killi (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/33601
Committed: http://github.com/Juniper/contrail-vrouter/commit/026a2eaf32b0a4f76ef170f12d16906f1d5f31ab
Submitter: Zuul (<email address hidden>)
Branch: R2.21.x

commit 026a2eaf32b0a4f76ef170f12d16906f1d5f31ab
Author: Hari Prasad Killi <email address hidden>
Date: Thu Jul 13 16:58:56 2017 +0530

Dont update flow index if find flow fails

While searching the flow table, if the flow lookup fails,
the flow index is still modified though search has failed.
This is resulting in caller returning a different error status
though the lookup resulted in failure. As a fix, the flow index
is updated only if the flow lookup succeeds

This fix is continuation of the fix https://review.opencontrail.org/#/c/33404/

closes-bug: #1699425

Change-Id: I5acd2b4da30304f889ed6ba790211b1534f54cfa
