control-node assertion in IFMapExporter::StateUpdateOnDequeue on deleting logical interfaces

Bug #1430091 reported by Daisuke Nakajima
This bug affects 2 people
Project: Juniper Openstack (status tracked in Trunk)

Affects  Status         Importance  Assigned to  Milestone
R2.0     Fix Committed  High        Tapan Karwa
R2.1     Fix Committed  High        Tapan Karwa
R2.20    Fix Committed  High        Tapan Karwa
Trunk    Fix Committed  High        Tapan Karwa

Bug Description

The control-node and ToR agent crashed, and the collector went into the initializing state, while logical ports were being deleted.
There are about 4000 virtual networks and 8000 logical ports in the Contrail system.

root@system001:~# contrail-status
== Contrail Control ==
supervisor-control: active
contrail-control initializing
contrail-control-nodemgr active
contrail-dns active
contrail-named active

== Contrail Analytics ==
supervisor-analytics: active
contrail-analytics-api active
contrail-analytics-nodemgr active
contrail-collector initializing (Discovery:Collector connection down)
contrail-query-engine active
contrail-snmp-collector active
contrail-topology active

== Contrail Config ==
supervisor-config: active
contrail-api:0 initializing (Discovery:IfmapServer, Discovery:Collector connection down)
contrail-config-nodemgr active
contrail-device-manager initializing (Discovery:Collector connection down)
contrail-discovery:0 active
contrail-schema active
contrail-svc-monitor active
ifmap active

== Contrail Web UI ==
supervisor-webui: active
contrail-webui active
contrail-webui-middleware active

== Contrail Database ==
supervisor-database: active
contrail-database active
contrail-database-nodemgr active

== Contrail Support Services ==
supervisor-support-service: active
rabbitmq-server active

Revision history for this message
Daisuke Nakajima (dnakajima) wrote :
Prakash Bailkeri (prakashmb) wrote :

1. Core core.contrail-contro.20833.system001.1425942501 has the following backtrace (BT).

#0 0x00007ff6fe578bb9 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007ff6fe57bfc8 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007ff6fe571a76 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x00007ff6fe571b22 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x000000000045a966 in IFMapExporter::StateUpdateOnDequeue (this=0x29b1fc0, update=update@entry=0x7ff6dd94a8b0, dequeue_set=..., is_delete=<optimized out>)
    at controller/src/ifmap/ifmap_exporter.cc:548
#5 0x0000000000488fd2 in IFMapUpdateSender::ProcessUpdate (this=this@entry=0x29b2010, update=update@entry=0x7ff6dd94a8b0, base_send_set=...) at controller/src/ifmap/ifmap_update_sender.cc:225
#6 0x0000000000489514 in IFMapUpdateSender::Send (this=0x29b2010, imarker=<optimized out>) at controller/src/ifmap/ifmap_update_sender.cc:184
#7 0x0000000000489c1b in IFMapUpdateSender::SendTask::Run (this=0x7ff69bdf59b0) at controller/src/ifmap/ifmap_update_sender.cc:41
#8 0x0000000000a5e490 in TaskImpl::execute (this=0x7ff6f7dbfb40) at controller/src/base/task.cc:232
#9 0x00007ff6ff350b3a in ?? () from /usr/lib/libtbb.so.2
#10 0x00007ff6ff34c816 in ?? () from /usr/lib/libtbb.so.2
#11 0x00007ff6ff34bf4b in ?? () from /usr/lib/libtbb.so.2
#12 0x00007ff6ff3480ff in ?? () from /usr/lib/libtbb.so.2
#13 0x00007ff6ff3482f9 in ?? () from /usr/lib/libtbb.so.2
#14 0x00007ff6ff56c182 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#15 0x00007ff6fe63cfbd in clone () from /lib/x86_64-linux-gnu/libc.so.6

(gdb) fr 4
#4 0x000000000045a966 in IFMapExporter::StateUpdateOnDequeue (this=0x29b1fc0, update=update@entry=0x7ff6dd94a8b0, dequeue_set=..., is_delete=<optimized out>)
    at controller/src/ifmap/ifmap_exporter.cc:548
548 in controller/src/ifmap/ifmap_exporter.cc
(gdb) p db_entry
$10 = (IFMapNode *) 0x7ff6ea755ff0
(gdb) set pagination off
(gdb) p *db_entry
$11 = (IFMapNode) {
  <DBGraphVertex> = {
    <DBEntry> = {
      <DBEntryBase> = {
        _vptr.DBEntryBase = 0xae6bd0 <vtable for IFMapNode+16>,
        chg_list_ = {
          <boost::intrusive::detail::generic_hook<boost::intrusive::get_list_node_algo<void*>, boost::intrusive::member_tag, (boost::intrusive::link_mode_type)1, 0>> = {
            <boost::intrusive::detail::no_default_definer> = {<No data fields>},
            <boost::intrusive::list_node<void*>> = {
              next_ = 0x0,
              prev_ = 0x0
            }, <No data fields>}, <No data fields>},
        tpart_ = 0x29c2ee0,
        state_ = {
          _M_t = {
            _M_impl = {
              <std::allocator<std::_Rb_tree_node<std::pair<int const, DBState*> > >> = {
                <__gnu_cxx::new_allocator<std::_Rb_tree_node<std::pair<int const, DBState*> > >> = {<No data fields>}, <No data fields>},
              members of std::_Rb_tree<int, std::pair<int const, DBState*>, std::_Select1st<std::pair<int const, DBState*> >, std::less<int>, std::allocator<std::pair<int const, DBState*> > >::_Rb_tree_impl<std::less<int>, false>:
              _M_key_compare = {
                <std::binary_function<int, int, bool>> = {<No data fields>}, <No dat...

Prakash Bailkeri (prakashmb) wrote :

BT in core.contrail-contro.31697.system001.1425942458

(gdb) bt
#0 0x00007f87939aabb9 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f87939adfc8 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007f87939a3a76 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x00007f87939a3b22 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x0000000000a47f70 in LabelBlock::~LabelBlock (this=0x7f877e9e1040, __in_chrg=<optimized out>) at controller/src/base/label_block.cc:69
#5 0x0000000000450498 in intrusive_ptr_release (block=0x7f877e9e1040) at controller/src/base/label_block.h:119
#6 0x00000000005aae46 in ~intrusive_ptr (this=0x7f877e718658, __in_chrg=<optimized out>) at /usr/include/boost/smart_ptr/intrusive_ptr.hpp:97
#7 ~Edge (this=0x7f877e718650, __in_chrg=<optimized out>) at controller/src/bgp/bgp_attr.h:275
#8 STLDeleteValues<std::vector<EdgeDiscovery::Edge*> > (container=0x7f877e907510) at controller/src/base/util.h:77
#9 EdgeDiscovery::~EdgeDiscovery (this=0x7f877e907510, __in_chrg=<optimized out>) at controller/src/bgp/bgp_attr.cc:310
#10 0x00000000005afe48 in intrusive_ptr_release (ediscovery=0x7f877e907510) at controller/src/bgp/bgp_attr.h:299
#11 ~intrusive_ptr (this=0x7f877e8fdd50, __in_chrg=<optimized out>) at /usr/include/boost/smart_ptr/intrusive_ptr.hpp:97
#12 BgpAttr::~BgpAttr (this=0x7f877e8fdca0, __in_chrg=<optimized out>) at controller/src/bgp/bgp_attr.h:499
#13 0x00000000005aff39 in BgpAttr::~BgpAttr (this=0x7f877e8fdca0, __in_chrg=<optimized out>) at controller/src/bgp/bgp_attr.h:499
#14 0x00000000005d5b7b in intrusive_ptr_release (cattrp=0x7f877e8fdca0) at controller/src/bgp/bgp_attr.h:603
#15 ~intrusive_ptr (this=0x7f877e8eab90, __in_chrg=<optimized out>) at /usr/include/boost/smart_ptr/intrusive_ptr.hpp:97
#16 ~BgpPath (this=0x7f877e8eab60, __in_chrg=<optimized out>) at controller/src/bgp/bgp_path.h:117
#17 ~BgpSecondaryPath (this=0x7f877e8eab60, __in_chrg=<optimized out>) at controller/src/bgp/bgp_path.h:146
#18 BgpSecondaryPath::~BgpSecondaryPath (this=0x7f877e8eab60, __in_chrg=<optimized out>) at controller/src/bgp/bgp_path.h:147
#19 0x0000000000660565 in BgpRoute::RemoveSecondaryPath (this=this@entry=0x7f877e8c56d0, src_rt=src_rt@entry=0x7f877e8a4260, src=BgpPath::Local, peer=peer@entry=0x0, path_id=path_id@entry=0) at controller/src/bgp/bgp_route.cc:224
#20 0x00000000007a245f in RoutePathReplicator::DeleteSecondaryPath (this=this@entry=0x1c47fb0, table=table@entry=0x7f87802e6b30, rt=rt@entry=0x7f877e8a4260, rtinfo=...) at controller/src/bgp/routing-instance/routepath_replicator.cc:549
#21 0x00000000007a3a7c in RoutePathReplicator::DBStateSync (this=0x1c47fb0, table=0x7f87802e6b30, rt=0x7f877e8a4260, id=1, dbstate=0x7f877e63cc40, current=...) at controller/src/bgp/routing-instance/routepath_replicator.cc:326
#22 0x00000000007a5b12 in RoutePathReplicator::BgpTableListener (this=0x1c47fb0, root=<optimized out>, entry=0x7f877e8a4260) at controller/src/bgp/routing-instance/routepath_replicator.cc:413
#23 0x0000000000a03192 in operator() (a1=0x7f877e8a4260, a0=0x7f8780dc6680, this=0x7f878c621ac0) at /usr/include/boost/function/function_template.hpp:767
#24 RunNotify (entry=0x7f877e8a4260, tpar...

Prakash Bailkeri (prakashmb) wrote :

For core.contrail-contro.31697.system001.1425942458

(gdb) p secondary_table
$23 = (ErmVpnTable *) 0x1c62790
(gdb) p secondary_table->name_
$24 = "bgp.ermvpn.0"
(gdb) p table->name
Cannot take address of method name.
(gdb) p table->name_
$25 = "default-domain:demo:net2062:net2062.ermvpn.0"
(gdb) p secondary_table->name_
$26 = "bgp.ermvpn.0"
(gdb) p *rt
$27 = (ErmVpnRoute) {
  <BgpRoute> = {
    <Route> = {
      <DBEntry> = {
        <DBEntryBase> = {
          _vptr.DBEntryBase = 0xa89370 <vtable for ErmVpnRoute+16>,
          chg_list_ = <boost::intrusive_hook> next = 0x0 prev = 0x0,
          tpart_ = 0x7f8780dc6680,
          state_ = std::map with 1 elements = {
            [1] = 0x7f877e63cc40
          },
          flags = 3 '\003',
          onremoveq_ = {
            <tbb::internal::atomic_impl<bool>> = {
              my_storage = {
                my_value = false
              }
            }, <No data fields>},
          last_change_at_ = 1425942458412059
        },
        members of DBEntry:
        node_ = <boost::intrusive_hook> parent = 0x7f8780dc6708 left = 0x0 right = 0x0
      },
      members of Route:
      path_ = boost::intrusive::list<Path> with 0 elements
    }, <No data fields>},
  members of ErmVpnRoute:
  prefix_ = {
    type_ = 1 '\001',
    rd_ = {
      static kSize = 8,
      static kZeroRd = {
        static kSize = 8,
        static kZeroRd = <same as static member of an already seen type>,
        data_ = "\000\000\000\000\000\000\000"
      },
      data_ = "\000\000\000\000\000\000\000"
    },
    router_id_ = {
      addr_ = {
        s_addr = 20075530
      }
    },
    group_ = {
      addr_ = {
        s_addr = 4294967295
      }
    },
    source_ = {
      addr_ = {
        s_addr = 0
      }
    }
  }
}
(gdb) p *rt_secondary
$28 = (ErmVpnRoute) {
  <BgpRoute> = {
    <Route> = {
      <DBEntry> = {
        <DBEntryBase> = {
          _vptr.DBEntryBase = 0xa89370 <vtable for ErmVpnRoute+16>,
          chg_list_ = <boost::intrusive_hook> next = 0x0 prev = 0x0,
          tpart_ = 0x1c62ad0,
          state_ = std::map with 1 elements = {
            [0] = 0x7f877e8f6ed0
          },
          flags = 0 '\000',
          onremoveq_ = {
            <tbb::internal::atomic_impl<bool>> = {
              my_storage = {
                my_value = false
              }
            }, <No data fields>},
          last_change_at_ = 1425942458412081
        },
        members of DBEntry:
        node_ = <boost::intrusive_hook> parent = 0x7f877a860030 left = 0x7f878a88a290 right = 0x7f878a8397f0
      },
      members of Route:
      path_ = boost::intrusive::list<Path> with 0 elements
    }, <No data fields>},
  members of ErmVpnRoute:
  prefix_ = {
    type_ = 1 '\001',
    rd_ = {
      static kSize = 8,
      static kZeroRd = {
        static kSize = 8,
        static kZeroRd = <same as static member of an already seen type>,
        data_ = "\000\000\000\000\000\000\000"
      },
      data_ = "\000\001\nT2\003\b\016"
    },
    router_id_ = {
      addr_ = {
        s_addr = 20075530
      }
    },
    group_ = {
      addr_ = {
        s_addr = 4294967295
      }
    },
    source_ = {
      addr_ = {
   ...
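
The raw integers in the dumps above can be decoded by hand. Below is a minimal sketch, assuming the core came from a little-endian x86_64 host (the library paths in the backtraces suggest so) and that rd_.data_ holds a standard 8-byte type-1 route distinguisher (2-byte type, 4-byte IPv4 address, 2-byte number). The helper names SAddrToDotted and DecodeType1Rd are hypothetical and are not part of the Contrail code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>

// Format an in_addr.s_addr value as gdb prints it on a little-endian host:
// the lowest-order byte of the integer is the first octet of the address.
std::string SAddrToDotted(uint32_t s_addr) {
    std::string out;
    for (int i = 0; i < 4; ++i) {
        if (i != 0) out += '.';
        out += std::to_string((s_addr >> (8 * i)) & 0xff);
    }
    return out;
}

// Decoded form of a type-1 route distinguisher (IPv4 address : number).
struct Type1Rd {
    std::string ip;
    uint16_t number;
};

// Decode an 8-byte RD given in wire (big-endian) order.
// Returns false if the 2-byte type field is not 1.
bool DecodeType1Rd(const std::array<uint8_t, 8>& rd, Type1Rd* out) {
    assert(out != nullptr);
    const uint16_t type = static_cast<uint16_t>((rd[0] << 8) | rd[1]);
    if (type != 1) return false;
    out->ip = std::to_string(rd[2]) + "." + std::to_string(rd[3]) + "." +
              std::to_string(rd[4]) + "." + std::to_string(rd[5]);
    out->number = static_cast<uint16_t>((rd[6] << 8) | rd[7]);
    return true;
}
```

Under those assumptions, s_addr = 20075530 reads as 10.84.50.1, s_addr = 4294967295 reads as 255.255.255.255, and the RD bytes "\000\001\nT2\003\b\016" read as 10.84.50.3:2062, which lines up with the net2062 table name.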


information type: Proprietary → Public
no longer affects: juniperopenstack/r2.20
Nischal Sheth (nsheth) wrote :

@Prakash

Could you add a brief description of your findings for both cores?
Looks like they are unrelated - one is in the IFMapExporter and
the other is in the multicast code.

Nischal Sheth (nsheth) wrote :

Will use this bug to track assertion in IFMapExporter::StateUpdateOnDequeue
and open a new bug for the other assertion in ~LabelBlock.

summary: - [2.1-Build 39] control-node crashed and collector got initializing while
+ control-node assertion in IFMapExporter::StateUpdateOnDequeue on
deleting logical interfaces
Vedamurthy Joshi (vedujoshi) wrote :

Seen again on build 44 on the tor-scale testbed. The core is at
http://10.204.216.50/Docs/bugs/1431297/core.contrail-contro.14084.nodei36.1427918555.gz

#0 0x00007f1277f19bb9 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f1277f1cfc8 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007f1277f12a76 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x00007f1277f12b22 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x000000000045a996 in IFMapExporter::StateUpdateOnDequeue (this=0x16a8d90, update=update@entry=0x7f11e8bdd2f0, dequeue_set=...,
    is_delete=<optimized out>) at controller/src/ifmap/ifmap_exporter.cc:548
#5 0x0000000000489002 in IFMapUpdateSender::ProcessUpdate (this=this@entry=0x16aa090, update=update@entry=0x7f11e8bdd2f0, base_send_set=...)
    at controller/src/ifmap/ifmap_update_sender.cc:225
#6 0x0000000000489544 in IFMapUpdateSender::Send (this=0x16aa090, imarker=<optimized out>) at controller/src/ifmap/ifmap_update_sender.cc:184
#7 0x0000000000489c4b in IFMapUpdateSender::SendTask::Run (this=0x7f121db16fe0) at controller/src/ifmap/ifmap_update_sender.cc:41
#8 0x0000000000a5e930 in TaskImpl::execute (this=0x7f12716df940) at controller/src/base/task.cc:243

tags: added: bms contrail-control scale
OpenContrail Admin (ci-admin-f) wrote :

To be fixed in 2.20 onwards

Tapan Karwa (tkarwa) wrote :

The sequence is:
1. Exporter gets an add or change. Exporter processes it and adds it to the queue as an 'update'.
2. The queue 'update' has not been processed yet and exporter gets a delete and does the following:

    } else if (state != NULL) {
        // Link deletes must precede node deletes.
        state->ClearValid();            // <-- the valid flag has been cleared
        if (!state->HasDependents()) {  // <-- HasDependents() is still true, so this block (and EnqueueDelete) is skipped
            // enqueue delete.
            EnqueueDelete(node, state);
            if (state->update_list().empty()) {
                entry->ClearState(table, tinfo->id());
                delete state;
            }
        }
    }

We call ClearValid(), but HasDependents() returns true, so we skip enqueuing the delete.
Now, the following code is executed on the queue side.

void IFMapUpdateSender::ProcessUpdate(IFMapUpdate *update,
                                      const BitSet &base_send_set) {
    LogAndCountSentUpdate(update, base_send_set);

    // Append the contents of the update-node to the message.
    message_->EncodeUpdate(update);
    // Clean up the node if everybody has seen it.
    update->AdvertiseReset(base_send_set);
    if (update->advertise().empty()) {
        queue_->Dequeue(update);
    }
    // Update may be freed.
    // Note: here 'update' points to an 'update', not a 'delete'.
    server_->exporter()->StateUpdateOnDequeue(update, base_send_set,
                                              update->IsDelete());
}

Then, in StateUpdateOnDequeue():

        if (state->update_list().empty() && state->IsInvalid()) {  // <-- both are true
            assert(state->advertised().empty());
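
The sequence above can be modeled in a few lines. This is a toy sketch, not the Contrail code: ToyState, EnqueueUpdate, OnDelete and DequeueUpdate are hypothetical names, and it assumes that dequeuing a non-delete 'update' records the receiving clients as advertised, which is what would leave advertised() non-empty when the assert is reached.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical, heavily simplified stand-in for the per-node exporter state.
struct ToyState {
    bool valid = true;
    bool has_dependents = false;
    std::vector<int> update_list;  // ids of updates still sitting in the queue
    std::set<int> advertised;      // ids of clients that have seen the node
    bool IsInvalid() const { return !valid; }
};

// Step 1: an add or change is queued as an 'update'.
void EnqueueUpdate(ToyState& s, int update_id) {
    s.update_list.push_back(update_id);
}

// Step 2: a delete arrives. The valid flag is always cleared, but the delete
// is only enqueued when the node has no dependents.
// Returns true if the delete was enqueued.
bool OnDelete(ToyState& s) {
    s.valid = false;
    if (s.has_dependents) return false;  // delete skipped, 'update' stays queued
    s.update_list.clear();  // assumption: the delete supersedes pending updates
    return true;
}

// Step 3: the queue side processes the pending 'update' for send_set.
// Returns true if the invariant checked by the assert would hold afterwards.
bool DequeueUpdate(ToyState& s, const std::set<int>& send_set) {
    assert(!s.update_list.empty());
    // Assumption: a non-delete update marks its receivers as advertised.
    s.advertised.insert(send_set.begin(), send_set.end());
    s.update_list.pop_back();
    if (s.update_list.empty() && s.IsInvalid()) {
        return s.advertised.empty();
    }
    return true;
}
```

With has_dependents set, calling EnqueueUpdate, then OnDelete, then DequeueUpdate makes the last call return false: the state is invalid, its update list is empty, and yet clients are still marked as advertised.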

OpenContrail Admin (ci-admin-f) wrote : master

Review in progress for https://review.opencontrail.org/9711
Submitter: Tapan Karwa (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : R2.20

Review in progress for https://review.opencontrail.org/9712
Submitter: Tapan Karwa (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : R2.1

Review in progress for https://review.opencontrail.org/9793
Submitter: Tapan Karwa (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : R2.0

Review in progress for https://review.opencontrail.org/9794
Submitter: Tapan Karwa (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/9794
Committed: http://github.org/Juniper/contrail-controller/commit/472be591cfa498a1c6462434caacb0da90cdeef6
Submitter: Zuul
Branch: R2.0

commit 472be591cfa498a1c6462434caacb0da90cdeef6
Author: Tapan Karwa <email address hidden>
Date: Wed Apr 29 16:13:59 2015 -0700

Fix for assert in IFMapExporter::StateUpdateOnDequeue

The following sequences causes the problem. Fixing it:
The sequence is:
1. Exporter gets an add or change. Exporter processes it and adds it to the
queue as an 'update'.
2. The queue 'update' has not been processed yet and exporter gets a delete for
the node.
3. The queue 'update' (not 'delete') is processed leading to the assert being
true.

Change-Id: I38b6d1dbedaaafe8d631276d5c97b1485e6a2f42
Closes-Bug: 1430091

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/9712
Committed: http://github.org/Juniper/contrail-controller/commit/93e1e4ee14fe1b9058791bcd5c262db20e536e6d
Submitter: Zuul
Branch: R2.20

commit 93e1e4ee14fe1b9058791bcd5c262db20e536e6d
Author: Tapan Karwa <email address hidden>
Date: Wed Apr 29 16:13:59 2015 -0700

Fix for assert in IFMapExporter::StateUpdateOnDequeue

The following sequences causes the problem. Fixing it:
The sequence is:
1. Exporter gets an add or change. Exporter processes it and adds it to the
queue as an 'update'.
2. The queue 'update' has not been processed yet and exporter gets a delete for
the node.
3. The queue 'update' (not 'delete') is processed leading to the assert being
true.

Change-Id: I38b6d1dbedaaafe8d631276d5c97b1485e6a2f42
Closes-Bug: 1430091

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/9711
Committed: http://github.org/Juniper/contrail-controller/commit/3753c17235dd203bddcda7077e6bfbc780318e6e
Submitter: Zuul
Branch: master

commit 3753c17235dd203bddcda7077e6bfbc780318e6e
Author: Tapan Karwa <email address hidden>
Date: Wed Apr 29 16:13:59 2015 -0700

Fix for assert in IFMapExporter::StateUpdateOnDequeue

The following sequences causes the problem. Fixing it:
The sequence is:
1. Exporter gets an add or change. Exporter processes it and adds it to the
queue as an 'update'.
2. The queue 'update' has not been processed yet and exporter gets a delete for
the node.
3. The queue 'update' (not 'delete') is processed leading to the assert being
true.

Change-Id: I38b6d1dbedaaafe8d631276d5c97b1485e6a2f42
Closes-Bug: 1430091

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/9793
Committed: http://github.org/Juniper/contrail-controller/commit/444010ea3c3183af00f9726795a446481949c5c2
Submitter: Zuul
Branch: R2.1

commit 444010ea3c3183af00f9726795a446481949c5c2
Author: Tapan Karwa <email address hidden>
Date: Wed Apr 29 16:13:59 2015 -0700

Fix for assert in IFMapExporter::StateUpdateOnDequeue

The following sequences causes the problem. Fixing it:
The sequence is:
1. Exporter gets an add or change. Exporter processes it and adds it to the
queue as an 'update'.
2. The queue 'update' has not been processed yet and exporter gets a delete for
the node.
3. The queue 'update' (not 'delete') is processed leading to the assert being
true.

Change-Id: I38b6d1dbedaaafe8d631276d5c97b1485e6a2f42
Closes-Bug: 1430091
