Agent: flow got stuck in hold state after overnight traffic was run

Bug #1580855 reported by Sandip Dey on 2016-05-12
This bug affects 1 person
Affects: Juniper Openstack (status tracked in Trunk)

Series   Status          Importance   Assigned to
R3.0     Fix Committed   High         Anand H. Krishnan
R3.1     Fix Committed   High         Anand H. Krishnan
Trunk    Fix Committed   High         Anand H. Krishnan

Bug Description

Build: R3.02 35 (Kilo)

The setup was as follows:

Configuration
===============
1. NAT service chain with 8-node ECMP.
2. A TCP server running on a node outside this cluster, listening on 1000 ports.
3. 20 VMs launched with a TCP client which, on bootup, connects to this TCP server on all 1000 ports and keeps connecting to and disconnecting from it.
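
The client behaviour in item 3 can be approximated with the short Python sketch below. The real client used in the test is not part of this report; the `churn` function, its bounded `rounds` argument and the 1-second timeout are illustrative assumptions. Each attempt opens a TCP connection to one server port and closes it at once, so repeated rounds keep creating and tearing down flows on the compute node hosting the VM.

```python
import socket


def churn(server_ip, ports, rounds=1, timeout=1.0):
    """Connect to and immediately disconnect from each port, `rounds` times.

    Hypothetical helper; returns a (succeeded, failed) count of attempts.
    """
    succeeded = failed = 0
    for _ in range(rounds):
        for port in ports:
            try:
                # create_connection completes the TCP handshake; leaving the
                # `with` block closes the socket straight away.
                with socket.create_connection((server_ip, port), timeout=timeout):
                    succeeded += 1
            except OSError:  # refused, unreachable or timed out
                failed += 1
    return succeeded, failed
```

Inside a VM, the test would correspond roughly to calling `churn(server_ip, range(first_port, first_port + 1000))` in an endless loop; `server_ip` and `first_port` are placeholders, since the report gives neither the server address nor the port range.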

This setup ran overnight.
In the morning, all 20 VMs were deleted. On nodel11, one flow was left stuck in the Hold state.

Logs saved at: http://10.204.216.50/Docs/bugs/<bug-id>

Setup:
------
host1 = 'root@10.204.217.139'
host2 = 'root@10.204.217.140'
host3 = 'root@10.204.217.147'
host4 = 'root@10.204.217.144'
host5 = 'root@10.204.217.147'
host6 = 'root@10.204.217.148'
host7 = 'root@10.204.217.149'
host8 = 'root@10.204.217.150'
host9 = 'root@10.204.217.210'
host10 = 'root@10.204.217.217'
host11 = 'root@10.204.217.218'
host12 = 'root@10.204.217.220'
host13 = 'root@10.204.217.247'
host14 = 'root@10.204.217.248'
host15 = 'root@10.204.217.249'
host16 = 'root@10.204.217.118'
host17 = 'root@10.204.217.119'
host18 = 'root@10.204.217.120'
host19 = 'root@10.204.217.121'
host20 = 'root@10.204.217.122'
host21 = 'root@10.204.217.123'
host22 = 'root@10.204.217.124'
host23 = 'root@10.204.217.131'

ext_routers = [('blr-mx2', '10.204.216.245')]
router_asn = 64512
public_vn_rtgt = 30001
#public_vn_subnet = "10.204.219.72/29"

host_build = 'vjoshi@10.204.216.56'

env.roledefs = {
    'all': [host1, host2, host3, host4, host5, host6, host7, host8, host9, host10, host11, host12,
            host13, host14, host15, host16, host17, host18, host19, host20, host21, host22, host23],
    'cfgm': [host1, host2, host3],
    'openstack': [host4, host5, host6],
    'webui': [host1, host2, host3],
    'control': [host1, host2, host3],
    'compute': [host7, host8, host9, host10, host11, host12, host13, host14, host15,
                host16, host17, host18, host19, host20, host21, host22, host23],
    'collector': [host1, host2, host3],
    'database': [host1, host2, host3],
    'build': [host_build],
}

env.hostnames = {
    'all': ['nodei27', 'nodei28', 'nodei35', 'nodei32', 'nodei35', 'nodei36', 'nodei37', 'nodei38',
            'nodel4', 'nodel7', 'nodel8', 'nodel9', 'nodel10', 'nodel11', 'nodel12',
            'nodei6', 'nodei7', 'nodei8', 'nodei9', 'nodei10', 'nodei11', 'nodei12', 'nodei19']
}
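
The testbed pairs `env.roledefs['all']` with `env.hostnames['all']` by position, so when reading logs by hostname it helps to resolve a name back to its host entry. The sketch below is a hypothetical helper (not part of Fabric or the Contrail tooling) that re-lists the values above, resolves a hostname, and reports repeated entries. Run against these values it flags that host3 and host5 share 10.204.217.147 and that `nodei35` appears twice, which may be a transcription slip in the testbed rather than the real layout.

```python
from collections import Counter

# Values copied from env.roledefs['all'] and env.hostnames['all'] above (same order).
ALL_HOSTS = [
    'root@10.204.217.139', 'root@10.204.217.140', 'root@10.204.217.147',
    'root@10.204.217.144', 'root@10.204.217.147', 'root@10.204.217.148',
    'root@10.204.217.149', 'root@10.204.217.150', 'root@10.204.217.210',
    'root@10.204.217.217', 'root@10.204.217.218', 'root@10.204.217.220',
    'root@10.204.217.247', 'root@10.204.217.248', 'root@10.204.217.249',
    'root@10.204.217.118', 'root@10.204.217.119', 'root@10.204.217.120',
    'root@10.204.217.121', 'root@10.204.217.122', 'root@10.204.217.123',
    'root@10.204.217.124', 'root@10.204.217.131',
]
ALL_NAMES = [
    'nodei27', 'nodei28', 'nodei35', 'nodei32', 'nodei35', 'nodei36',
    'nodei37', 'nodei38', 'nodel4', 'nodel7', 'nodel8', 'nodel9',
    'nodel10', 'nodel11', 'nodel12', 'nodei6', 'nodei7', 'nodei8',
    'nodei9', 'nodei10', 'nodei11', 'nodei12', 'nodei19',
]


def duplicates(items):
    """Return the values that occur more than once, in first-seen order."""
    counts = Counter(items)
    return [item for item in dict.fromkeys(items) if counts[item] > 1]


def host_for(name):
    """Map a hostname to its host string using the positional pairing of the lists."""
    if len(ALL_HOSTS) != len(ALL_NAMES):
        raise ValueError("roledefs['all'] and hostnames['all'] differ in length")
    return dict(zip(ALL_NAMES, ALL_HOSTS))[name]
```

For example, `host_for('nodel11')` returns `'root@10.204.217.248'` (host14, a compute node), and `duplicates(ALL_HOSTS)` returns `['root@10.204.217.147']`.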

Sandip Dey (sandipd) wrote:

[5/12/16, 9:16:17 AM] Sandip Dey: Nodel11…
[5/12/16, 9:16:25 AM] Sandip Dey: this flow seems to be stuck
[5/12/16, 9:16:26 AM] Sandip Dey: (Gen: 169, K(nh):158, Action:F, Flags:, S(nh):104, Stats:1/28, SPort 56550)

  4644474 192.168.16.17:42233 6 (6)
                         10.204.219.157:1738
(Gen: 80, K(nh):134, Action:H, Flags:M, TCP:S, S(nh):0, Stats:23/1380, SPort 0)
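
The stuck entry can be decoded field by field. The sketch below parses the parenthesised summary line into a dictionary and expands the single-letter codes. The code meanings used here (Action `H` = Hold, `F` = Forward; Flags `M` = Modified, `N` = New, `E` = Evicted) are believed to follow the legend printed by the vRouter `flow -l` utility, but treat the mapping as an assumption and check it against the legend on your build; the function names are hypothetical.

```python
import re

# Assumed subset of the legend printed by the vRouter `flow -l` utility.
ACTIONS = {"F": "Forward", "D": "Drop", "N": "NAT", "H": "Hold"}
FLAGS = {"E": "Evicted", "Ec": "Evict Candidate", "N": "New",
         "M": "Modified", "Dm": "Delete Marked"}


def parse_flow_summary(line):
    """Parse '(Gen: 80, K(nh):134, ..., SPort 0)' into a {field: value} dict."""
    fields = {}
    for part in line.strip().strip("()").split(","):
        part = part.strip()
        if ":" in part:
            key, value = part.split(":", 1)
        else:  # e.g. "SPort 0", which uses a space instead of a colon
            key, _, value = part.rpartition(" ")
        fields[key.strip()] = value.strip()
    return fields


def describe(fields):
    """Expand the Action/Flags codes and derive the average packet size from Stats."""
    packets, byte_count = (int(n) for n in fields["Stats"].split("/"))
    # Two-letter codes are listed first so "Ec" is not read as "E".
    flag_codes = re.findall(r"Ec|Dm|E|N|M", fields.get("Flags", ""))
    return {
        "action": ACTIONS.get(fields["Action"], fields["Action"]),
        "flags": [FLAGS[code] for code in flag_codes],
        "packets": packets,
        "bytes": byte_count,
        "avg_packet_bytes": byte_count / packets if packets else 0.0,
    }
```

On the stuck entry this reports action `Hold`, flags `['Modified']` and 23 packets / 1380 bytes, an average of 60 bytes per packet. Together with `TCP:S`, that average is consistent with the client repeatedly retrying its SYN, although the counters alone do not prove it.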

[5/12/16, 9:53:39 AM] vedujoshi: Somehow, the flow got added and deleted very quickly:
[5/12/16, 9:53:39 AM] vedujoshi: 2016-05-12 09:48:15.258 FlowTrace: operation = ADD info= [ gen_id = 80 flow_index = 4644474 nh_id = 134 source_ip = 192.168.16.17 source_port = 42233 destination_ip = 10.204.219.157 destination_port = 1738 protocol = 6 vrf = 4294967295 mirror_l= [ [ ] ] mirror_vrf = 65535 implicit_deny = 0 short_flow = 1 source_vn_list= [ [ (iter95) = UNKNOWN_, ] ] dest_vn_list= [ [ (iter96) = UNKNOWN_, ] ] source_vn_match = dest_vn_match = source_sg_id_l= [ [ ] ] dest_sg_id_l= [ [ ] ] vrf_assign = default-domain:sandipd:sandipd_internal_floating_ip_net:service-b0f9dfe2-43f0-4232-a3b8-2b1d2eb31d33-default-domain_sandipd_pt_instance l3_flow = 1 smac = 00:00:00:00:00:00 dmac = 00:00:00:00:00:00 drop_reason = UNKNOWN table_id = 3 ] file = controller/src/vnsw/agent/pkt/flow_mgmt.cc line = 484
2016-05-12 09:48:15.303 FlowTrace: operation = DEL info= [ gen_id = 80 flow_index = 4644474 nh_id = 134 source_ip = 192.168.16.17 source_port = 42233 destination_ip = 10.204.219.157 destination_port = 1738 protocol = 6 vrf = 4294967295 mirror_l= [ [ ] ] mirror_vrf = 65535 implicit_deny = 0 short_flow = 1 source_vn_list= [ [ (iter95) = UNKNOWN_, ] ] dest_vn_list= [ [ (iter96) = UNKNOWN_, ] ] source_vn_match = dest_vn_match = source_sg_id_l= [ [ ] ] dest_sg_id_l= [ [ ] ] vrf_assign = default-domain:sandipd:sandipd_internal_floating_ip_net:service-b0f9dfe2-43f0-4232-a3b8-2b1d2eb31d33-default-domain_sandipd_pt_instance l3_flow = 1 smac = 00:00:00:00:00:00 dmac = 00:00:00:00:00:00 drop_reason = UNKNOWN table_id = 3 ] file = controller/src/vnsw/agent/pkt/flow_mgmt.cc line = 484
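
The two FlowTrace records for flow index 4644474 are 45 ms apart (09:48:15.258 to 09:48:15.303). A small parser makes that gap easy to pull out of a larger agent log. The regular expression below relies only on the `timestamp FlowTrace: operation = X info= [ ... flow_index = N ...` shape visible in this excerpt; it is a sketch with hypothetical function names, not a parser for the full agent trace format.

```python
import re
from datetime import datetime, timedelta

# Matches the leading timestamp, the operation and the flow_index of a FlowTrace line.
TRACE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) FlowTrace: "
    r"operation = (?P<op>\w+) .*?flow_index = (?P<index>\d+)"
)


def parse_trace(line):
    """Return (timestamp, operation, flow_index), or None if not a FlowTrace line."""
    m = TRACE.search(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
    return ts, m.group("op"), int(m.group("index"))


def add_to_delete_gaps(lines):
    """Pair each ADD with the next DEL of the same flow index; gaps are in ms."""
    pending, gaps = {}, {}
    for line in lines:
        parsed = parse_trace(line)
        if parsed is None:
            continue
        ts, op, index = parsed
        if op == "ADD":
            pending[index] = ts
        elif op == "DEL" and index in pending:
            delta = ts - pending.pop(index)
            gaps.setdefault(index, []).append(delta / timedelta(milliseconds=1))
    return gaps
```

Given the two records above, `add_to_delete_gaps` returns `{4644474: [45.0]}`.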

Changed in juniperopenstack:
assignee: Hari Prasad Killi (haripk) → Anand H. Krishnan (anandhk)
importance: Undecided → High

Review in progress for https://review.opencontrail.org/20195
Submitter: Anand H. Krishnan (<email address hidden>)

Jeba Paulaiyan (jebap) on 2016-05-16
information type: Proprietary → Public

Reviewed: https://review.opencontrail.org/20195
Committed: http://github.org/Juniper/contrail-vrouter/commit/b091cbb773cdeb67c8d7012b7c1cc36558eaaf0f
Submitter: Zuul
Branch: R3.0

commit b091cbb773cdeb67c8d7012b7c1cc36558eaaf0f
Author: Anand H. Krishnan <email address hidden>
Date: Fri May 13 16:29:24 2016 +0530

Do not allow agent to modify a "NEW" flow

The "NEW" flag is set whenever a flow becomes active and is in
the transient state. If agent tries to modify the entry in that
state, a possibility because of reuse of an entry due to
eviction, a condition could happen where the flags used by datapath
could come from the flags set by agent, more specifically the
Modified flag, and thus be in a state where nothing can be done
in the entry. Hence, prevent agent from acting upon NEW flows.
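
The race described in this commit message can be illustrated with a toy model. Everything below is hypothetical: the class, flag names and functions are invented for illustration and do not mirror the contrail-vrouter C structures or code paths. The model assumes only what the message states: a reused (evicted) entry is marked NEW while the datapath is still setting it up, and if the agent writes its flags at that point, its Modified flag can end up in the flags the datapath uses. The `guard` parameter switches the fix on.

```python
from dataclasses import dataclass, field

# Hypothetical flag names for a toy model; not the contrail-vrouter constants.
ACTIVE, NEW, MODIFIED = "Active", "New", "Modified"


@dataclass
class FlowEntry:
    action: str = "Hold"
    flags: set = field(default_factory=set)


def datapath_activate(entry):
    """Datapath claims a (possibly evicted and reused) entry: active, still NEW, in Hold."""
    entry.action = "Hold"
    entry.flags = {ACTIVE, NEW}


def agent_write(entry, request_flags, guard):
    """Apply an agent update; with guard=True, updates to NEW entries are refused."""
    if guard and NEW in entry.flags:
        return False  # the fix: leave transient NEW flows to the datapath
    # The hazard: agent-supplied flags (including MODIFIED) replace the datapath's own.
    entry.flags = set(request_flags)
    return True


def looks_stuck(entry):
    """True for the state seen in this bug: action Hold with Modified set, NEW gone."""
    return entry.action == "Hold" and MODIFIED in entry.flags and NEW not in entry.flags


def scenario(guard):
    """An entry is reused, then a stale agent update meant for the old flow arrives."""
    entry = FlowEntry()
    datapath_activate(entry)
    accepted = agent_write(entry, {ACTIVE, MODIFIED}, guard)
    return accepted, looks_stuck(entry)
```

`scenario(guard=False)` returns `(True, True)`: the stale write is accepted and the entry ends up in Hold with Modified set, matching the `Action:H, Flags:M` entry seen on nodel11. `scenario(guard=True)` returns `(False, False)` because the update is refused while the entry is NEW.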

Change-Id: I017fd7d32f0488cef90a17c491c6021bbdd181c7
Closes-BUG: #1580855

Review in progress for https://review.opencontrail.org/22138
Submitter: Anand H. Krishnan (<email address hidden>)

Review in progress for https://review.opencontrail.org/22139
Submitter: Anand H. Krishnan (<email address hidden>)

Reviewed: https://review.opencontrail.org/22139
Committed: http://github.org/Juniper/contrail-vrouter/commit/4a5554eb30a167e07eb946cc3dc8b2188599301a
Submitter: Zuul
Branch: R3.1

commit 4a5554eb30a167e07eb946cc3dc8b2188599301a
Author: Anand H. Krishnan <email address hidden>
Date: Fri May 13 16:29:24 2016 +0530

Do not allow agent to modify a "NEW" flow

The "NEW" flag is set whenever a flow becomes active and is in
the transient state. If agent tries to modify the entry in that
state, a possibility because of reuse of an entry due to
eviction, a condition could happen where the flags used by datapath
could come from the flags set by agent, more specifically the
Modified flag, and thus be in a state where nothing can be done
in the entry. Hence, prevent agent from acting upon NEW flows.

Change-Id: I017fd7d32f0488cef90a17c491c6021bbdd181c7
Closes-BUG: #1580855

Reviewed: https://review.opencontrail.org/22138
Committed: http://github.org/Juniper/contrail-vrouter/commit/97a3556e76f1b1fa16b2b37c3ba273e853aeb44c
Submitter: Zuul
Branch: master

commit 97a3556e76f1b1fa16b2b37c3ba273e853aeb44c
Author: Anand H. Krishnan <email address hidden>
Date: Fri May 13 16:29:24 2016 +0530

Do not allow agent to modify a "NEW" flow

The "NEW" flag is set whenever a flow becomes active and is in
the transient state. If agent tries to modify the entry in that
state, a possibility because of reuse of an entry due to
eviction, a condition could happen where the flags used by datapath
could come from the flags set by agent, more specifically the
Modified flag, and thus be in a state where nothing can be done
in the entry. Hence, prevent agent from acting upon NEW flows.

Change-Id: I017fd7d32f0488cef90a17c491c6021bbdd181c7
Closes-BUG: #1580855
