[3.0-2713~kilo] Multi-Inline SVC with Port-Tuples not working

Bug #1546073 reported by Ganesha HV
This bug affects 1 person
Affects              Status                    Importance  Assigned to        Milestone
Juniper Openstack    Status tracked in Trunk
  R3.0               Fix Committed             Critical    Hari Prasad Killi
  Trunk              Fix Committed             Critical    Hari Prasad Killi

Bug Description

Setup
=====
nodeg25, g26 & g27 - cfgm
nodeg25 - webUI/Horizon
nodeg26 & g27 - ctrl
nodek8, k9 & k10 - compute

Steps
=====
1]. Created a service chain between networks l-vn (10.10.10.0/24) and r-vn (20.20.20.0/24).
2]. Launched 2 instances with three ports each: svm-trans on nodek8 and svm-nat on node
3]. Started a ping from l-vm (10.10.10.3), launched on nodek9, to r-vm (20.20.20.3) on nodek10.
4]. Observed that the ICMP echo requests are seen up to the right interface of svm-trans, but nothing arrives at the left interface of svm-nat.
5]. The VRF listed in the service_vlan_list of the right interface of svm-trans does not have a route to 20.20.20.3.
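
To illustrate why a missing route in step 5 leads to dropped traffic, here is a minimal longest-prefix-match sketch. The contents of `service_vrf` are an assumption made for illustration only; the actual route table of the service VRF is not included in this report.

```python
import ipaddress


def lookup(routes, dst):
    """Return the most specific prefix in `routes` that covers `dst`, or None."""
    addr = ipaddress.ip_address(dst)
    matches = [
        ipaddress.ip_network(prefix)
        for prefix in routes
        if addr in ipaddress.ip_network(prefix)
    ]
    return max(matches, key=lambda net: net.prefixlen, default=None)


# Hypothetical service VRF that only knows the left network.
service_vrf = ["10.10.10.0/24"]

# No covering prefix: a forwarder has nowhere to send the packet.
print(lookup(service_vrf, "20.20.20.3"))

# With the right network's prefix present, the lookup succeeds.
print(lookup(service_vrf + ["20.20.20.0/24"], "20.20.20.3"))
```

When the lookup returns nothing, a drop for lack of a destination route is the expected result, which matches the `D(NoDstRt)` action in the flow dump.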

root@nodek8:/var/log/nova# flow -l
Flow table(size 68157440, entries 532480)

Entries: Created 9098 Added 9098 Processed 9098 Used Overflow entries 0
(Created Flows/CPU: 1699 1542 346 208 146 115 103 77 263 116 677 349 210 165 134 100 42 37 15 13 10 18 22 1458 7 3 6 8 9 5 12 1183)(oflows 0)

Action:F=Forward, D=Drop N=NAT(S=SNAT, D=DNAT, Ps=SPAT, Pd=DPAT, L=Link Local Port)
 Other:K(nh)=Key_Nexthop, S(nh)=RPF_Nexthop
 Flags:E=Evicted, Ec=Evict Candidate, N=New Flow, M=Modified
TCP(r=reverse):S=SYN, F=FIN, R=RST, C=HalfClose, E=Established, D=Dead

    Index Source:Port/Destination:Port Proto(V)
-----------------------------------------------------------------------------------
   221568<=>243004 20.20.20.3:49152 1 (5)
                         10.10.10.3:0
    (K(nh):25, Action:D(NoDstRt), Flags:, S(nh):2, Statistics:0/0 UdpSrcPort 61327

   243004<=>221568 10.10.10.3:49152 1 (5)
                         20.20.20.3:0
    (K(nh):25, Action:D(NoDstRt), Flags:, S(nh):25, Statistics:1/102 UdpSrcPort 58757

The setup is intact.
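
For larger flow tables, picking out the dropped entries by eye gets tedious. The sketch below parses text in the shape of the `flow -l` output above and lists flows whose action is a drop. The regular expressions are assumptions derived only from the lines pasted in this report; other vRouter versions may print a different layout.

```python
import re

# Flow entries copied from the output above.
SAMPLE = """\
   221568<=>243004 20.20.20.3:49152 1 (5)
                         10.10.10.3:0
    (K(nh):25, Action:D(NoDstRt), Flags:, S(nh):2, Statistics:0/0 UdpSrcPort 61327

   243004<=>221568 10.10.10.3:49152 1 (5)
                         20.20.20.3:0
    (K(nh):25, Action:D(NoDstRt), Flags:, S(nh):25, Statistics:1/102 UdpSrcPort 58757
"""

# "index<=>reverse src:port proto (v)"
HEAD = re.compile(r"^\s*(\d+)<=>(\d+)\s+(\S+):(\d+)\s+(\d+)\s+\((\d+)\)")
# Indented "dst:port" line that follows the head line.
DEST = re.compile(r"^\s+(\S+):(\d+)\s*$")
# "Action:D(NoDstRt)" or a bare "Action:F".
ACTION = re.compile(r"Action:([A-Za-z]+)(?:\(([^)]*)\))?")


def parse_flows(text):
    """Parse flow entries into a list of dicts."""
    flows, cur = [], None
    for line in text.splitlines():
        m = HEAD.match(line)
        if m:
            cur = {
                "index": int(m.group(1)),
                "reverse": int(m.group(2)),
                "src": m.group(3),
                "proto": int(m.group(5)),
                "v": int(m.group(6)),  # value from the Proto(V) column
            }
            flows.append(cur)
            continue
        if cur is None:
            continue
        m = DEST.match(line)
        if m and "dst" not in cur:
            cur["dst"] = m.group(1)
            continue
        m = ACTION.search(line)
        if m:
            cur["action"], cur["reason"] = m.group(1), m.group(2)
    return flows


dropped = [f for f in parse_flows(SAMPLE) if f.get("action") == "D"]
for f in dropped:
    print(f["index"], f["src"], "->", f["dst"], "drop reason:", f["reason"])
```

On a live node the text could be captured from `flow -l` and passed to `parse_flows` in place of `SAMPLE`.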

tags: added: blocker
Ganesha HV (ganeshahv) wrote :

Observed vrouter cores with the following BT during the tests:

(gdb) bt
#0 0x00007ffa627c0cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007ffa627c40d8 in __GI_abort () at abort.c:89
#2 0x00007ffa627b9b86 in __assert_fail_base (fmt=0x7ffa6290a830 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x11c9135 "0", file=file@entry=0x11daa90 "controller/src/vnsw/agent/oper/vrf.cc",
    line=line@entry=344, function=function@entry=0x11db000 <VrfEntry::DeleteTimeout()::__PRETTY_FUNCTION__> "bool VrfEntry::DeleteTimeout()") at assert.c:92
#3 0x00007ffa627b9c32 in __GI___assert_fail (assertion=0x11c9135 "0", file=0x11daa90 "controller/src/vnsw/agent/oper/vrf.cc", line=344,
    function=0x11db000 <VrfEntry::DeleteTimeout()::__PRETTY_FUNCTION__> "bool VrfEntry::DeleteTimeout()") at assert.c:101
#4 0x0000000000ab2dc2 in VrfEntry::DeleteTimeout (this=0x7ffa402aea20) at controller/src/vnsw/agent/oper/vrf.cc:344
#5 0x000000000117de49 in operator() (this=<optimized out>) at /usr/include/boost/function/function_template.hpp:767
#6 Timer::TimerTask::Run (this=0x2fcc090) at controller/src/base/timer.cc:42
#7 0x0000000001176eac in TaskImpl::execute (this=0x7ffa5c027c40) at controller/src/base/task.cc:253
#8 0x00007ffa6338fb3a in ?? () from /usr/lib/libtbb.so.2
#9 0x00007ffa6338b816 in ?? () from /usr/lib/libtbb.so.2
#10 0x00007ffa6338af4b in ?? () from /usr/lib/libtbb.so.2
#11 0x00007ffa633870ff in ?? () from /usr/lib/libtbb.so.2
#12 0x00007ffa633872f9 in ?? () from /usr/lib/libtbb.so.2
#13 0x00007ffa635ab182 in start_thread (arg=0x7ffa5ae43700) at pthread_create.c:312
#14 0x00007ffa6288447d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Moved the files here :
bhushana@10.204.216.50:/home/bhushana/Documents/technical/bugs/1546073

Prakash Bailkeri (prakashmb) wrote :

Debugged the issue further to see why the service chain routes are not published.
Found the following:
1. The agent does not subscribe to "default-domain:admin:l-vn:service-621cc9db-d8f4-4fda-9982-e2012c0e1719-default-domain_admin_si-nat" because it has not handled the previous delete.
2. As per Manish, the delete is not complete due to a pending short flow. Due to this, vrouter-agent crashes with the DeleteTimeout backtrace after ~15 mins and the setup recovers.

So the root cause of the crash and of the missing service chain route in left-vn is the same (pending delete processing of the internal routing instance for the NAT instance).

Sachin Bansal (sbansal)
tags: removed: config
Manish Singh (manishs) wrote :

The agent crash was caused by a flow that was not deleted and was still holding the VRF.
The flow was not marked for delete even though it was a short flow.
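
The mechanism described here can be pictured with a toy model: an object whose delete cannot complete while something still references it, plus a timer callback that asserts if the delete is still pending. This is only an illustrative sketch; the `Vrf` class, its methods, and the holder name `short-flow` are hypothetical and are not the agent's actual code.

```python
class Vrf:
    """Toy reference-held object; not the vrouter-agent VrfEntry."""

    def __init__(self, name):
        self.name = name
        self.refs = set()              # holders (e.g. flows) still pointing here
        self.delete_requested = False
        self.deleted = False

    def request_delete(self):
        self.delete_requested = True
        self._try_finish()

    def release(self, holder):
        self.refs.discard(holder)
        self._try_finish()

    def _try_finish(self):
        # The delete only completes once no holder is left.
        if self.delete_requested and not self.refs:
            self.deleted = True

    def delete_timeout(self):
        """Stand-in for a timer callback: fail loudly if the delete is stuck."""
        if not self.deleted:
            raise AssertionError(
                f"delete of {self.name} still pending; held by {sorted(self.refs)}"
            )


vrf = Vrf("service-vrf")
vrf.refs.add("short-flow")     # a flow that was never marked for delete
vrf.request_delete()

try:
    vrf.delete_timeout()       # timer fires while the flow still holds the VRF
except AssertionError as err:
    print("assert:", err)

vrf.release("short-flow")      # once the flow goes away the delete completes
print("deleted:", vrf.deleted)
```

In this model, a new subscription for the same name would have to wait until `deleted` becomes true, which mirrors the observation that the agent did not re-subscribe to the routing instance while the old delete was pending.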

Ganesha HV (ganeshahv)
information type: Proprietary → Public
OpenContrail Admin (ci-admin-f) wrote :

Need to reproduce.

tags: removed: blocker
Ganesha HV (ganeshahv) wrote :

Not seeing the issue in latest builds.
