control-node crash at state->HasDependency(), IFMapExporter::LinkTableExport on tor scale setup

Bug #1454380 reported by Vedamurthy Joshi
This bug affects 1 person
Affects              Status                   Importance  Assigned to  Milestone
Juniper Openstack    Status tracked in Trunk
  R1.1               Fix Committed            High        Tapan Karwa
  R2.0               Fix Committed            High        Tapan Karwa
  R2.1               Fix Committed            High        Tapan Karwa
  R2.20              Fix Committed            High        Tapan Karwa
  Trunk              Fix Committed            High        Tapan Karwa

Bug Description

R2.20 Build 13, Ubuntu 14.04, Juno, multi-node setup

env.roledefs = {
    'all': [host2, host3, host4, host5, host6],
    'cfgm': [host2, host3],
    'openstack': [host2],
    'webui': [host3],
    'control': [host3, host4],
    'compute': [host5, host6],
    'collector': [host2, host3],
    'database': [host2, host3, host4],
    'toragent': [host6],
    'tsn': [host6],
    'build': [host_build],
}

env.hostnames = {
    'all': ['nodei34', 'nodei35', 'nodei36', 'nodei37', 'nodei38']
}

The crash below was seen while re-adding LIFs and VMIs on a few ToRs.

Core will be in http://10.204.216.50/Docs/bugs/#

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/contrail-control'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f693e69acc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007f693e69acc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f693e69e0d8 in __GI_abort () at abort.c:89
#2 0x00007f693e693b86 in __assert_fail_base (fmt=0x7f693e7e4830 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=assertion@entry=0xadebfd "state->HasDependency()", file=file@entry=0xadec78 "controller/src/ifmap/ifmap_exporter.cc", line=line@entry=449,
    function=function@entry=0xadf140 "void IFMapExporter::LinkTableExport(DBTablePartBase*, DBEntryBase*)") at assert.c:92
#3 0x00007f693e693c32 in __GI___assert_fail (assertion=0xadebfd "state->HasDependency()", file=0xadec78 "controller/src/ifmap/ifmap_exporter.cc", line=449,
    function=0xadf140 "void IFMapExporter::LinkTableExport(DBTablePartBase*, DBEntryBase*)") at assert.c:101
#4 0x000000000045ee71 in ?? ()
#5 0x0000000000a4868a in ?? ()
#6 0x0000000000a4ab78 in ?? ()
#7 0x0000000000a471ad in ?? ()
#8 0x0000000000ab2ad0 in ?? ()
#9 0x00007f693f471b3a in ?? () from /usr/lib/libtbb.so.2
#10 0x00007f693f46d816 in ?? () from /usr/lib/libtbb.so.2
#11 0x00007f693f46cf4b in ?? () from /usr/lib/libtbb.so.2
#12 0x00007f693f4690ff in ?? () from /usr/lib/libtbb.so.2
#13 0x00007f693f4692f9 in ?? () from /usr/lib/libtbb.so.2
#14 0x00007f693f68d182 in start_thread (arg=0x7f6934f0a700) at pthread_create.c:312
#15 0x00007f693e75e47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) quit
root@nodei35:/var/crashes#

Tapan Karwa (tkarwa) wrote :

Can you please confirm the build info etc? Or maybe give me the path to the binary that I should use?

I tried the following binaries and none of them worked on a server running 14.04.

/github-build/R2.20/13/ubuntu-14-04/juno/store/sandbox/build/production/control-node/contrail-control

/github-build/R2.20/13/ubuntu-14-04/icehouse/store/sandbox/build/production/control-node/contrail-control
(just in case it was not juno)

/github-build/R2.20/13/ubuntu-14-04/juno/store/binaries/contrail-control

root@a1s28:~# more /etc/issue
Ubuntu 14.04 LTS \n \l

Vedamurthy Joshi (vedujoshi) wrote :

As I mentioned in the bug, it's 2.20 Build 13.
I used the standard /usr/bin/contrail-control to get the backtrace. Can you do the same on nodei35 itself?

Tapan Karwa (tkarwa) wrote :

This core seems to be a simple case of a link getting revived. Consider a link add, followed by a delete, followed by another add.

When we delete the link, we unlink it from its nodes, remove it from the graph, and mark it deleted.
When the exporter gets the delete, it calls:
 state->RemoveDependency();
 state->ClearValid();
 EnqueueDelete(link, state);

Now, suppose that we get a second link add before the update_list is drained. The state is non-NULL and state->update_list() is not empty, but the dependencies are gone and valid has been cleared.

The ifmap_server_table code will do this when it gets the second add:

    string link_name = LinkKey(metadata, left, right);  // creates the name
    IFMapLink *link = FindLink(link_name);  // found, since it's still in the link table
    if (link) {
        assert(link->IsDeleted());
        link->ClearDelete();  // revive the link
        link->set_last_change_at_to_now();
        partition->Change(link);
    }

Note, at this point, state exists. But, the state's dependencies are gone.

Now, when the exporter gets this add,

    IFMapLinkState *state = static_cast<IFMapLinkState *>(entry_state);
    ...
    if (state == NULL) {
        state = new IFMapLinkState(link);
        entry->SetState(table, tinfo->id(), state);
        s_left = NodeStateLocate(link->left());
        s_right = NodeStateLocate(link->right());
        add_link = true;
    } else {
        assert(state->HasDependency());  // asserts here
        s_left = state->left();
        s_right = state->right();
    }
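The add, delete, add sequence described above can be modeled with a small toy program. This is only a sketch: ToyLink, ToyLinkState, and the helper functions are hypothetical stand-ins, not the real IFMapLink or IFMapLinkState classes, and the flags only mirror the calls quoted in this comment.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the exporter's per-link state.
struct ToyLinkState {
    bool has_dependency = false;  // mirrors state->HasDependency()
    bool valid = false;           // mirrors the valid flag
    bool update_pending = false;  // mirrors a non-empty update_list()
};

// Hypothetical stand-in for a link entry in the link table.
struct ToyLink {
    bool deleted = false;
    std::unique_ptr<ToyLinkState> state;
};

// First add: no state exists yet, so the exporter creates it.
void ExporterOnFirstAdd(ToyLink &link) {
    link.state.reset(new ToyLinkState());
    link.state->has_dependency = true;
    link.state->valid = true;
}

// Delete: mirrors RemoveDependency(), ClearValid(), EnqueueDelete().
void ExporterOnDelete(ToyLink &link) {
    link.deleted = true;
    link.state->has_dependency = false;
    link.state->valid = false;
    link.state->update_pending = true;  // delete is still queued
}

// Second add before the queue drains: the table revives the same entry.
void TableRevive(ToyLink &link) {
    link.deleted = false;
}

// True when the exporter would not trip assert(state->HasDependency()):
// either there is no state (the 'if' branch) or the dependency is intact.
bool HasDependencyAssertHolds(const ToyLink &link) {
    return link.state == nullptr || link.state->has_dependency;
}
```

Running ExporterOnFirstAdd, ExporterOnDelete, and TableRevive in that order leaves a non-null state without dependencies, which is the condition the 'else' branch asserts against.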

We are seeing this problem now because we recently changed the code in IFMapLinkTable::AddLink() to fix another issue, and that fix invalidates the assert in the exporter.

Consider the second add. We won't find glink below since it's not in the graph.

    IFMapLink *glink =
        static_cast<IFMapLink *>(graph_->GetEdge(first, second));
    if (glink == NULL) {
        DBGraph::Edge edge = graph_->Link(first, second);
        LinkNodeAdd(edge, first, second, data->metadata,
                    key->id_seq_num, data->origin);
    }

Then, in the older code, we would call FindLink() with the new edge. FindLink() would not find the link since the edge is new, so we would take the 'else' branch.

    IFMapLink *link = FindLink(edge);  // earlier we would use the edge to do the lookup
    if (link) {  // we would NOT find it since the link has been removed from the graph
        assert(link->IsDeleted());
        link->ClearDelete();
        link->set_last_change_at_to_now();
        partition->Change(link);
    } else {
        link = new IFMapLink(link_name);  // earlier we would do this
        partition->Add(link);
    }

In the new code, FindLink() finds the link since it's still in the partition and we are looking up by name. We take the 'if' branch, which ends with the exporter receiving a change and asserting.
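The difference between the two lookups can be shown with a toy table. This is a rough sketch under simplifying assumptions: ToyTable and its members are hypothetical, the graph is reduced to a set of link names, and the partition is reduced to a map from link name to a deleted flag.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical model of the link table: the partition keeps a deleted entry
// until it is fully cleaned up, while the graph drops it immediately.
struct ToyTable {
    std::map<std::string, bool> partition;  // link name -> deleted flag
    std::set<std::string> graph;            // links currently in the graph

    void Add(const std::string &name) {
        partition[name] = false;
        graph.insert(name);
    }

    // Delete removes the link from the graph but only marks the
    // partition entry as deleted.
    void Delete(const std::string &name) {
        partition[name] = true;
        graph.erase(name);
    }

    // Older behaviour (simplified): lookup goes through the graph.
    bool FindByEdge(const std::string &name) const {
        return graph.count(name) > 0;
    }

    // Newer behaviour (simplified): lookup goes through the partition by name.
    bool FindByName(const std::string &name) const {
        return partition.count(name) > 0;
    }
};
```

After an Add followed by a Delete, FindByEdge misses the link (so the older code would create a new one), while FindByName still finds the deleted entry (so the newer code revives it).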

54 ...


OpenContrail Admin (ci-admin-f) wrote : R2.20

Review in progress for https://review.opencontrail.org/10564
Submitter: Tapan Karwa (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : R2.1

Review in progress for https://review.opencontrail.org/10574
Submitter: Tapan Karwa (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : R1.10

Review in progress for https://review.opencontrail.org/10575
Submitter: Tapan Karwa (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : master

Review in progress for https://review.opencontrail.org/10576
Submitter: Tapan Karwa (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : R2.0

Review in progress for https://review.opencontrail.org/10627
Submitter: Tapan Karwa (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/10564
Committed: http://github.org/Juniper/contrail-controller/commit/b8df6fe70d498b9603ad0964530bc77d02eed41d
Submitter: Zuul
Branch: R2.20

commit b8df6fe70d498b9603ad0964530bc77d02eed41d
Author: Tapan Karwa <email address hidden>
Date: Tue May 19 11:05:34 2015 -0700

Fix exporter code to handle delete-link followed by add-link.

We recently changed the code in IFMapLinkTable::AddLink() to fix 1426175 and
that fix is invalidating an assert in the exporter. In the case where we get a
delete-link followed by add-link, before the link is completely cleaned up, the
new code in IFMapLinkTable::AddLink() will find the older link and send a
change on it. This will result in the exporter asserting. Fix this by checking
if the link is being revived and calling appropriate asserts for the revival
case and the change case. Add test-case and fix test case that fails.

Change-Id: Ib594d73847c80b3e3d3f3650565b785d8aaf7f25
Closes-Bug: 1454380
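
The commit message describes the fix only in words: distinguish the revival case from the change case and assert differently in each. One way to picture that split is the sketch below. It is not the code from the commit; SketchState, Path, and Classify are hypothetical names, and the conditions are inferred from the delete handling quoted earlier in this report (dependencies removed and valid cleared).

```cpp
#include <cassert>

// Hypothetical stand-in for the exporter's per-link state flags.
struct SketchState {
    bool has_dependency = false;
    bool valid = false;
};

enum class Path { kNewState, kRevival, kChange };

// Classifies an add/change notification received by the exporter.
// state_exists says whether a state object is already attached to the link.
Path Classify(bool state_exists, const SketchState &state) {
    if (!state_exists) {
        return Path::kNewState;  // first add: state must be created
    }
    if (!state.has_dependency) {
        // Revival: an earlier delete removed the dependencies and cleared
        // valid, so the assumption here is that valid must still be false.
        assert(!state.valid);
        return Path::kRevival;
    }
    // Plain change: the dependencies are still in place.
    return Path::kChange;
}
```

In this sketch, a revived link would have its dependencies set up again before being exported, while a plain change would reuse the existing left and right node states.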

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/10576
Committed: http://github.org/Juniper/contrail-controller/commit/86c14bf81ede41633715af703de9f97abb40ca14
Submitter: Zuul
Branch: master

commit 86c14bf81ede41633715af703de9f97abb40ca14
Author: Tapan Karwa <email address hidden>
Date: Tue May 19 11:05:34 2015 -0700

Fix exporter code to handle delete-link followed by add-link.

We recently changed the code in IFMapLinkTable::AddLink() to fix 1426175 and
that fix is invalidating an assert in the exporter. In the case where we get a
delete-link followed by add-link, before the link is completely cleaned up, the
new code in IFMapLinkTable::AddLink() will find the older link and send a
change on it. This will result in the exporter asserting. Fix this by checking
if the link is being revived and calling appropriate asserts for the revival
case and the change case. Add test-case and fix test case that fails.

Change-Id: Ib594d73847c80b3e3d3f3650565b785d8aaf7f25
Closes-Bug: 1454380

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/10627
Committed: http://github.org/Juniper/contrail-controller/commit/81102c705d656c95bd7afa3edffb5ef3aef3f4cd
Submitter: Zuul
Branch: R2.0

commit 81102c705d656c95bd7afa3edffb5ef3aef3f4cd
Author: Tapan Karwa <email address hidden>
Date: Tue May 19 11:05:34 2015 -0700

Fix exporter code to handle delete-link followed by add-link.

We recently changed the code in IFMapLinkTable::AddLink() to fix 1426175 and
that fix is invalidating an assert in the exporter. In the case where we get a
delete-link followed by add-link, before the link is completely cleaned up, the
new code in IFMapLinkTable::AddLink() will find the older link and send a
change on it. This will result in the exporter asserting. Fix this by checking
if the link is being revived and calling appropriate asserts for the revival
case and the change case. Add test-case and fix test case that fails.

Change-Id: Ib594d73847c80b3e3d3f3650565b785d8aaf7f25
Closes-Bug: 1454380

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/10574
Committed: http://github.org/Juniper/contrail-controller/commit/be520461dec4d4d0dc999d651bae189d73fc9cde
Submitter: Zuul
Branch: R2.1

commit be520461dec4d4d0dc999d651bae189d73fc9cde
Author: Tapan Karwa <email address hidden>
Date: Tue May 19 11:05:34 2015 -0700

Fix exporter code to handle delete-link followed by add-link.

We recently changed the code in IFMapLinkTable::AddLink() to fix 1426175 and
that fix is invalidating an assert in the exporter. In the case where we get a
delete-link followed by add-link, before the link is completely cleaned up, the
new code in IFMapLinkTable::AddLink() will find the older link and send a
change on it. This will result in the exporter asserting. Fix this by checking
if the link is being revived and calling appropriate asserts for the revival
case and the change case. Add test-case and fix test case that fails.

Change-Id: Ib594d73847c80b3e3d3f3650565b785d8aaf7f25
Closes-Bug: 1454380

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/10575
Committed: http://github.org/Juniper/contrail-controller/commit/d138ea3c67e98d08113f1af9682a3d2ccda0dfd0
Submitter: Zuul
Branch: R1.10

commit d138ea3c67e98d08113f1af9682a3d2ccda0dfd0
Author: Tapan Karwa <email address hidden>
Date: Tue May 19 11:05:34 2015 -0700

Fix exporter code to handle delete-link followed by add-link.

We recently changed the code in IFMapLinkTable::AddLink() to fix 1426175 and
that fix is invalidating an assert in the exporter. In the case where we get a
delete-link followed by add-link, before the link is completely cleaned up, the
new code in IFMapLinkTable::AddLink() will find the older link and send a
change on it. This will result in the exporter asserting. Fix this by checking
if the link is being revived and calling appropriate asserts for the revival
case and the change case. Add test-case and fix test case that fails.

Change-Id: Ib594d73847c80b3e3d3f3650565b785d8aaf7f25
Closes-Bug: 1454380
