[CI]Deleted VN's RI still seen in Control node

Bug #1717231 reported by Ganesha HV
This bug report is a duplicate of:  Edit Remove
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Juniper Openstack
Status tracked in Trunk
R4.1
New
High
Zhiqiang Cui
Trunk
New
High
Zhiqiang Cui

Bug Description

Testscript
==========
TestSvcRegr.test_svc_in_network_datapath

This is a CI setup issue.

The CI subslave is accesible at 10.84.35.233(single-node setup).

I am trying to commit a service chain testcase to CI_test.

The testcase does the following:
Creates 3 networks(left, right and mgmt).
Launches 2 end VMs, one in left and right network.
Launches a Service VM with three interfaces, one in left, right and mgmt.
Uses this Service VM as a in-network Service Instance.
Creates a Service chain between left and right network using the Service VM.
During cleanup, the order of deletion is:
policy
SI
Service VM.
Service template.
end VMs.
right, left and then the mgmt VN.

During the right VN deletion, we see that the Routing Instance(default-domain:ctest-TestSvcRegr-45272876:ctest-right_vn-63594071:ctest-right_vn-63594071), corresponding to the right VN(default-domain:ctest-TestSvcRegr-45272876:ctest-right_vn-63594071) still exists in the ControlNode.
In the introspect, we observe that the Routing Instance is said to be deleted and the deleted_at timestamp is marked as well.

There is a single table entry(inet6.0) that exists and even that table is marked as deleted.

Because of this entry, the script is failing during cleanup.

Revision history for this message
Pramodh D'Souza (psdsouza) wrote :

There appears to be a listener on the table.

listeners
id
2
BgpConditionListener

CaB you reproduce it again? if your using the same setup that will be deleted after a timeout, please try to do it at the end of your day so we have more time.

Revision history for this message
Ganesha HV (ganeshahv) wrote :
information type: Proprietary → Public
Changed in juniperopenstack:
milestone: none → r4.1.0.0-fcs
tags: added: sanityblocker
Ganesha HV (ganeshahv)
description: updated
Revision history for this message
Ganesha HV (ganeshahv) wrote :
Download full text (3.7 KiB)

The introspect logs show that there is a single inet.0 table that exists. It is also marked as deleted, but with one Listener:

2017-09-15 11:45:09,637 - DEBUG - Requesting: http://10.204.216.153:8083/Snh_ShowRoutingInstanceReq?name=default-domain:ctest-TestSvcRegr-94375503:ctest-right_vn-70324953:ctest-right_vn-70324953
2017-09-15 11:45:09,641 - DEBUG - <ShowRoutingInstanceResp type="sandesh">
  <instances type="list" identifier="1">
    <list type="struct" size="1">
      <ShowRoutingInstance>
        <name type="string" identifier="1" link="ShowRoutingInstanceReq">default-domain:ctest-TestSvcRegr-94375503:ctest-right_vn-70324953:ctest-right_vn-70324953</name>
        <virtual_network type="string" identifier="3">default-domain:ctest-TestSvcRegr-94375503:ctest-right_vn-70324953</virtual_network>
        <vn_index type="i32" identifier="6">0</vn_index>
        <vxlan_id type="i32" identifier="8">0</vxlan_id>
        <import_target type="list" identifier="4">
          <list type="string" size="0"/>
        </import_target>
        <export_target type="list" identifier="5">
          <list type="string" size="0"/>
        </export_target>
        <always_subscribe type="bool" identifier="10">false</always_subscribe>
        <allow_transit type="bool" identifier="12">false</allow_transit>
        <pbb_evpn_enable type="bool" identifier="13">false</pbb_evpn_enable>
        <deleted type="bool" identifier="7">true</deleted>
        <deleted_at type="string" identifier="9">2017-Sep-15 06:13:27.594036</deleted_at>
        <tables type="list" identifier="2">
          <list type="struct" size="1">
            <ShowRoutingInstanceTable>
              <name type="string" identifier="1" link="ShowRouteReq">default-domain:ctest-TestSvcRegr-94375503:ctest-right_vn-70324953:ctest-right_vn-70324953.inet.0</name>
              <deleted type="bool" identifier="13">true</deleted>
              <prefixes type="u64" identifier="3">0</prefixes>
              <paths type="u64" identifier="4">0</paths>
              <primary_paths type="u64" identifier="5">0</primary_paths>
              <secondary_paths type="u64" identifier="6">0</secondary_paths>
              <infeasible_paths type="u64" identifier="7">0</infeasible_paths>
              <stale_paths type="u64" identifier="18">0</stale_paths>
              <llgr_stale_paths type="u64" identifier="19">0</llgr_stale_paths>
              <walk_requests type="u64" identifier="8">3</walk_requests>
              <walk_again_requests type="u64" identifier="16">0</walk_again_requests>
              <actual_walks type="u64" identifier="17">0</actual_walks>
              <walk_completes type="u64" identifier="9">3</walk_completes>
              <walk_cancels type="u64" identifier="10">0</walk_cancels>
              <pending_updates type="u64" identifier="11">0</pending_updates>
              <markers type="u64" identifier="12">0</markers>
              <listeners type="u64" identifier="14">1</listeners>
              <walkers type="u64" identifier="15">1</walkers>
              <membership type="struct" identifier="2">
                <ShowTableMembershipInfo>
                  <requests type="u32" identifier="1">0</req...

Read more...

Revision history for this message
Rudra Rugge (rrugge) wrote :

Please share the setup with Pramodh once you reproduce the issue.

tags: added: sanity
removed: sanityblocker
Revision history for this message
Ganesha HV (ganeshahv) wrote :

The following logs seen in the control logs:

contrail-control.log:2017-09-20 Wed 04:52:35:514.981 IST nodem7 [Thread 47794119300864, Pid 38485]: IFMap [SYS_WARN]: IFMapIdentifierNotFound: Cant find identifier default-domain:ctest-TestSvcHC-21447592:ctest-left_vn-87598583 controller/src/ifmap/ifmap_server_table.cc 294

Revision history for this message
Pramodh D'Souza (psdsouza) wrote :

When the problem is seen, checking the following interospect http://10.84.5.28:8083/service_chaining.xml#Snh_ShowServiceChainReq
causes the control node to crash. Please collect the coredump and BT.

Revision history for this message
Ganesha HV (ganeshahv) wrote :
Download full text (3.3 KiB)

The core is placed at
bhushana@10.204.216.50:/home/bhushana/Documents/technical/bugs/1717231
Use bhu@123 as the password.

bhushana@mayamruga:~/Documents/technical/bugs/1717231$ pwd
/home/bhushana/Documents/technical/bugs/1717231
bhushana@mayamruga:~/Documents/technical/bugs/1717231$ ls -lrt
total 579128
-rwxrwxrwx 1 bhushana bhushana 225251328 Sep 27 12:10 core.contrail-contro.29945.nodei21.1506492734
-rwxrwxrwx 1 bhushana bhushana 367772744 Sep 27 12:11 contrail-control
bhushana@mayamruga:~/Documents/technical/bugs/1717231$

BT
==
(gdb) bt
#0 BgpInstanceConfig::service_chain_info (this=0x0, family=family@entry=Address::INET) at controller/src/bgp/bgp_config.cc:360
#1 0x00000000006a13af in FillServiceChainInfo (family=Address::INET, rtinstance=0x2b660c038d60, info=..., search_expr=..., search_string="") at controller/src/bgp/routing-instance/show_service_chaining.cc:42
#2 BgpShowHandler<ShowServiceChainReq, ShowServiceChainReqIterate, ShowServiceChainResp, ShowServicechainInfo>::CallbackCommon (bsc=bsc@entry=0x7ffd5d5d7720, data=data@entry=0x2b661c009740) at controller/src/bgp/routing-instance/show_service_chaining.cc:78
#3 0x00000000006a275e in BgpShowHandler<ShowServiceChainReq, ShowServiceChainReqIterate, ShowServiceChainResp, ShowServicechainInfo>::Callback (sr=sr@entry=0x2b661400e870, ps=..., stage=stage@entry=0, instNum=instNum@entry=0, data=data@entry=0x2b661c009740)
    at controller/src/bgp/bgp_show_handler.h:172
#4 0x000000000069b1dc in operator()<bool, bool (*)(const Sandesh*, RequestPipeline::PipeSpec, int, int, RequestPipeline::InstData*), boost::_bi::list5<const Sandesh*&, const RequestPipeline::PipeSpec&, int&, int&, RequestPipeline::InstData*&> > (this=<optimized out>,
    f=<optimized out>, a=<synthetic pointer>) at /usr/include/boost/bind/bind.hpp:515
#5 operator()<const Sandesh*, const RequestPipeline::PipeSpec, int, int, RequestPipeline::InstData*> (a5=<synthetic pointer>, a4=<synthetic pointer>, a3=<synthetic pointer>, a2=..., a1=<synthetic pointer>, this=<optimized out>)
    at /usr/include/boost/bind/bind_template.hpp:174
#6 boost::detail::function::function_obj_invoker5<boost::_bi::bind_t<bool, bool (*)(Sandesh const*, RequestPipeline::PipeSpec, int, int, RequestPipeline::InstData*), boost::_bi::list5<boost::arg<1>, boost::arg<2>, boost::arg<3>, boost::arg<4>, boost::arg<5> > >, bool, Sandesh const*, RequestPipeline::PipeSpec const&, int, int, RequestPipeline::InstData*>::invoke (function_obj_ptr=..., a0=0x2b661400e870, a1=..., a2=0, a3=0, a4=0x2b661c009740) at /usr/include/boost/function/function_template.hpp:132
#7 0x0000000000d9a4d3 in operator() (a4=<optimized out>, a3=<optimized out>, a2=<optimized out>, a1=..., a0=<optimized out>, this=<optimized out>) at /usr/include/boost/function/function_template.hpp:767
#8 RequestPipeline::PipeImpl::RunInstance (this=<optimized out>, instNum=<optimized out>) at tools/sandesh/library/cpp/request_pipeline.cc:189
#9 0x0000000000d9baf5 in RequestPipeline::StageWorker::Run (this=0x2b661c06be00) at tools/sandesh/library/cpp/request_pipeline.cc:84
#10 0x00000000006fff37 in TaskImpl::execute (this=0x2b65b1aaaf40) at controller/src/base/task.cc:277
#11 0x00002b65a999cb3a ...

Read more...

Revision history for this message
Pramodh D'Souza (psdsouza) wrote :

Ganesh,
Can you indicate which image you used?

Revision history for this message
Pramodh D'Souza (psdsouza) wrote :

The issue appears to be that the service chain routing instance is not deleted.
This could be a script issue or a config node issue..
would like to see the node in this state to debug.

-------------------------------

$29 = (RoutingInstance *) 0x2b6608082d00
(gdb) p $27->src_->name_
$30 = "default-domain:ctest-TestSvcRegr-94073141:ctest-right_vn-02869489:service-3912637f-a21b-4212-b285-7e51b84ed804-default-domain_ctest-TestSvcRegr-94073141_ctest-si-97604586"
(gdb) p $27->src_->deleter_.px
$31 = (RoutingInstance::DeleteActor *) 0x2b660801f190
(gdb) p *$27->src_->deleter_.px
$32 = (RoutingInstance::DeleteActor) {
  <LifetimeActor> = {
    _vptr.LifetimeActor = 0xf36410 <vtable for RoutingInstance::DeleteActor+16>,
    mutex_ = {
      static is_rw_mutex = false,
      static is_recursive_mutex = false,
      static is_fair_mutex = false,
      impl = {
        __data = {
          __lock = 0,
          __count = 0,
          __owner = 0,
          __nusers = 0,
          __kind = 0,
          __spins = 0,
          __elision = 0,
          __list = {
            __prev = 0x0,
            __next = 0x0
          }
        },
        __size = '\000' <repeats 39 times>,
        __align = 0
---Type <return> to continue, or q <return> to quit---
      }
    },
    manager_ = 0x1fda620,
    deleted_ = (tbb::atomic) false,

Revision history for this message
Pramodh D'Souza (psdsouza) wrote :
Download full text (16.8 KiB)

Checked rabbit trace on the control node, see an IMPLICIT_UPDATE on this routing instance.
This confirms that it is not a control-node issue. its either a script issue or a config node issue.

2017-09-26 23:06:55.811013 ConfigClientRabbitMQMsgTrace: str1 = "Received " oper = "CREATE" str3 = " for object type:" obj_type = "access_control_list" str5 = " with FQ name:" fqname = "default-domain:ctest-TestSvcRegr-94073141:ctest-right_vn-02869489:ctest-right_vn-02869489" str7 = " and uuid:" uuid = "3f2802e0-b618-42b3-896e-2b89516e600d" file = "controller/src/config/config-client-mgr/config_amqp_client.cc" line = 314 ^M
2017-09-26 23:06:56.053407 ConfigClientRabbitMQMsgTrace: str1 = "Received " oper = "CREATE" str3 = " for object type:" obj_type = "routing_instance" str5 = " with FQ name:" fqname = "default-domain:ctest-TestSvcRegr-94073141:ctest-left_vn-90771067:service-3912637f-a21b-4212-b285-7e51b84ed804-default-domain_ctest-TestSvcRegr-94073141_ctest-si-97604586" str7 = " and uuid:" uuid = "c43fb10c-461e-44b0-885e-8858c53380b4" file = "controller/src/config/config-client-mgr/config_amqp_client.cc" line = 314 ^M
2017-09-26 23:06:56.212372 ConfigClientRabbitMQMsgTrace: str1 = "Received " oper = "UPDATE" str3 = " for object type:" obj_type = "routing_instance" str5 = " with FQ name:" fqname = "default-domain:ctest-TestSvcRegr-94073141:ctest-left_vn-90771067:service-3912637f-a21b-4212-b285-7e51b84ed804-default-domain_ctest-TestSvcRegr-94073141_ctest-si-97604586" str7 = " and uuid:" uuid = "c43fb10c-461e-44b0-885e-8858c53380b4" file = "controller/src/config/config-client-mgr/config_amqp_client.cc" line = 314 ^M
2017-09-26 23:06:56.678502 ConfigClientRabbitMQMsgTrace: str1 = "Received " oper = "UPDATE" str3 = " for object type:" obj_type = "routing_instance" str5 = " with FQ name:" fqname = "default-domain:ctest-TestSvcRegr-94073141:ctest-left_vn-90771067:ctest-left_vn-90771067" str7 = " and uuid:" uuid = "bd2e32b2-8402-40a1-8958-ec2c8560aab6" file = "controller/src/config/config-client-mgr/config_amqp_client.cc" line = 314 ^M
2017-09-26 23:06:56.678997 ConfigClientRabbitMQMsgTrace: str1 = "Received " oper = "UPDATE-IMPLICIT" str3 = " for object type:" obj_type = "routing_instance" str5 = " with FQ name:" fqname = "default-domain:ctest-TestSvcRegr-94073141:ctest-left_vn-90771067:service-3912637f-a21b-4212-b285-7e51b84ed804-default-domain_ctest-TestSvcRegr-94073141_ctest-si-97604586" str7 = " and uuid:" uuid = "c43fb10c-461e-44b0-885e-8858c53380b4" file = "controller/src/config/config-client-mgr/config_amqp_client.cc" line = 314 ^M
2017-09-26 23:06:56.822124 ConfigClientRabbitMQMsgTrace: str1 = "Received " oper = "CREATE" str3 = " for object type:" obj_type = "routing_instance" str5 = " with FQ name:" fqname = "default-domain:ctest-TestSvcRegr-94073141:ctest-right_vn-02869489:service-3912637f-a21b-4212-b285-7e51b84ed804-default-domain_ctest-TestSvcRegr-94073141_ctest-si-97604586" str7 = " and uuid:" uuid = "1779da4c-0a1f-460a-bd8a-bbc31755aac6" file = "controller/src/config/config-client-mgr/config_amqp_client.cc" line = 314 ^M
2017-09-26 23:06:56.961507 ConfigClientRabbitMQMsgTrace: str1 = "Received " oper = "UPDATE" str3 = " for object type:" obj_ty...

Revision history for this message
Ganesha HV (ganeshahv) wrote :

Setup
=====
nodei30(10.204.217.142)

Project
=======
ctest-TestSvcRegr-69328648

VN, ctest-right_vn-37692781 info is still present in the control node.
The same info has been deleted in the api-server and agent.
Only the control-node has the info.

Kindly take a look at the setup.

Ganesha HV (ganeshahv)
no longer affects: juniperopenstack/r4.0
Revision history for this message
Zhiqiang Cui (zcui) wrote :

The commit has been merged, but mistakenly use Error BUG id.

Please see the review link:
For R4.1:
https://review.opencontrail.org/#/c/36567/.
For master:
https://review.opencontrail.org/#/c/36532/

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.
  • Duplicate of a private bug Remove

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.