[SYMC] VM not reachable for some time followed for the corresponding vrouter crash

Bug #1538789 reported by Varun Lodaya
This bug affects 1 person
Affects                                       Status          Importance   Assigned to
Juniper Openstack (status tracked in Trunk)
  R2.20                                       Fix Committed   High         Hari Prasad Killi
  R2.21.x                                     Fix Committed   High         Hari Prasad Killi
  R2.22.x                                     Fix Committed   High         Hari Prasad Killi
  Trunk                                       Fix Committed   High         Hari Prasad Killi
OpenContrail                                  Fix Committed   High         Hari Prasad Killi

Bug Description

We are seeing an issue where a VM is occasionally unreachable for a long time after creation. Further inspection shows that while it is unreachable, the tap interface of that VM is not ACTIVE and shows -1. The vrf field in introspect is empty for the tap interface.
Within 15-20 minutes of the VM launch, the vrouter where the VM landed crashes with the following backtrace:
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7ffc8adba000

Core was generated by `/usr/bin/contrail-vrouter-agent'.
Program terminated with signal 6, Aborted.
#0 0x00007fe54c48c0d5 in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0 0x00007fe54c48c0d5 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007fe54c48f83b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007fe54c484d9e in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x00007fe54c484e42 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x00000000009d7753 in VrfEntry::DeleteTimeout() ()
#5 0x000000000105c828 in Timer::TimerTask::Run() ()
#6 0x0000000001051235 in TaskImpl::execute() ()
#7 0x00007fe54d044e52 in ?? () from /usr/lib/libtbb.so.2
#8 0x00007fe54d040c2d in ?? () from /usr/lib/libtbb.so.2
#9 0x00007fe54d0400db in ?? () from /usr/lib/libtbb.so.2
#10 0x00007fe54d03dc1f in ?? () from /usr/lib/libtbb.so.2
#11 0x00007fe54d03de59 in ?? () from /usr/lib/libtbb.so.2
#12 0x00007fe54d25be9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#13 0x00007fe54c54938d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#14 0x0000000000000000 in ?? ()
(gdb)

The vrouter-agent-0-stdout.log shows the following:
contrail-vrouter-agent: controller/src/vnsw/agent/oper/vrf.cc:333: bool VrfEntry::DeleteTimeout(): Assertion `0' failed.

Tags: vmware vrouter
summary: - VM routing not setup occasionally followed by the vrouter_agent crash
+ VM not reachable for some time followed for the corresponding vrouter
+ crash
description: updated
tags: added: vrouter
Changed in juniperopenstack:
importance: Undecided → High
Changed in opencontrail:
importance: Undecided → High
Changed in juniperopenstack:
assignee: nobody → Hari Prasad Killi (haripk)
Changed in opencontrail:
assignee: nobody → Hari Prasad Killi (haripk)
shajuvk (shajuvk) wrote : Re: VM not reachable for some time followed for the corresponding vrouter crash

The same crash was seen on a vCenter-as-compute setup, build 2706 (Kilo):

#0 0x00007fede509bcc9 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007fede509f0d8 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007fede5094b86 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x00007fede5094c32 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x0000000000aadc22 in VrfEntry::DeleteTimeout() ()
#5 0x0000000001175509 in Timer::TimerTask::Run() ()
#6 0x000000000116e59c in TaskImpl::execute() ()
#7 0x00007fede5c6ab3a in ?? () from /usr/lib/libtbb.so.2
#8 0x00007fede5c66816 in ?? () from /usr/lib/libtbb.so.2
#9 0x00007fede5c65f4b in ?? () from /usr/lib/libtbb.so.2
#10 0x00007fede5c620ff in ?? () from /usr/lib/libtbb.so.2
#11 0x00007fede5c622f9 in ?? () from /usr/lib/libtbb.so.2
#12 0x00007fede5e86182 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#13 0x00007fede515f47d in clone () from /lib/x86_64-linux-gnu/libc.so.6

tags: added: vmware
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/17087
Submitter: Hari Prasad Killi (<email address hidden>)

Hari Prasad Killi (haripk) wrote : Re: VM not reachable for some time followed for the corresponding vrouter crash

Root cause identified by Praveen:
The ARP module has not de-registered from the inet-uc route table; as a result, the clients are not de-registered. Because of this, the VRF reference is not removed.
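
To illustrate the root cause above, here is a minimal sketch in Python (not the agent's C++ code). The class and function names (`Vrf`, `RouteTable`, `simulate`) and the VRF name are hypothetical, and the model assumes that every listener registered on a route table pins the owning VRF. In the real agent the delete timer firing at all is treated as a failure; the model simplifies this to an assertion that no references remain when the timer fires.

```python
class Vrf:
    """Toy stand-in for a VRF entry that can only be freed once unreferenced."""

    def __init__(self, name):
        self.name = name
        self.refs = set()          # names of clients still holding a reference
        self.delete_marked = False

    def delete_timeout(self):
        """Called when the (modelled) delete timer fires."""
        # Simplified analogue of the assertion seen in the backtrace.
        assert not self.refs, (
            f"VRF {self.name} still referenced by {sorted(self.refs)}"
        )
        return True


class RouteTable:
    """Toy route table: every registered listener pins the owning VRF."""

    def __init__(self, vrf):
        self.vrf = vrf

    def register(self, client):
        self.vrf.refs.add(client)

    def unregister(self, client):
        self.vrf.refs.discard(client)


def simulate(arp_unregisters):
    """Return True if the VRF delete completes, False if the timeout asserts."""
    vrf = Vrf("example-vrf")       # hypothetical VRF name
    table = RouteTable(vrf)
    table.register("arp")          # ARP module listens on the route table
    vrf.delete_marked = True       # VRF is marked for deletion
    if arp_unregisters:
        table.unregister("arp")
    try:
        return vrf.delete_timeout()
    except AssertionError:
        return False
```

In this model, `simulate(False)` corresponds to the reported behaviour: the ARP listener never de-registers, the reference stays, and the delete timeout asserts.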

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.21.x

Review in progress for https://review.opencontrail.org/17088
Submitter: Hari Prasad Killi (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/17087
Submitter: Hari Prasad Killi (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.22.x

Review in progress for https://review.opencontrail.org/17123
Submitter: Hari Prasad Killi (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.20

Review in progress for https://review.opencontrail.org/17124
Submitter: Hari Prasad Killi (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/17088
Committed: http://github.org/Juniper/contrail-controller/commit/7b95333210d48e7e3e0e38577136e5d8e9e39f01
Submitter: Zuul
Branch: R2.21.x

commit 7b95333210d48e7e3e0e38577136e5d8e9e39f01
Author: Hari <email address hidden>
Date: Wed Feb 10 18:17:51 2016 +0530

Clear state on VRF entry only on the VRF delete notify.

The VRF state set by arp proto is being deleted upon ARP entry
deletion, if VRF is delete marked. If the VRF delete notification
comes later, as the state is already cleared, the rest of cleanup
is skipped. The VRF state should be cleared only on VRF delete
notification.

Change-Id: I8a0ae368e4964f54d7a8a5048b9913944ba2c987
closes-bug: 1538789
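
The ordering problem described in this commit message can be sketched as follows. This is a simplified Python model under stated assumptions, not the actual patch: `ArpProtoModel`, its methods, the `clear_state_on_entry_delete` switch, and the sample VRF name and IP address are all hypothetical. The switch toggles between the pre-fix behaviour (state cleared on ARP entry deletion when the VRF is delete marked) and the post-fix behaviour (state cleared only on the VRF delete notification).

```python
class ArpProtoModel:
    """Toy model of per-VRF state kept by an ARP protocol handler."""

    def __init__(self, clear_state_on_entry_delete):
        self.clear_state_on_entry_delete = clear_state_on_entry_delete
        self.vrf_state = {}            # VRF name -> per-VRF state
        self.route_listeners = set()   # VRFs whose route table we listen on

    def vrf_added(self, vrf):
        self.vrf_state[vrf] = {"arp_entries": set()}
        self.route_listeners.add(vrf)

    def arp_entry_added(self, vrf, ip):
        self.vrf_state[vrf]["arp_entries"].add(ip)

    def arp_entry_deleted(self, vrf, ip, vrf_delete_marked):
        state = self.vrf_state.get(vrf)
        if state is None:
            return
        state["arp_entries"].discard(ip)
        if self.clear_state_on_entry_delete and vrf_delete_marked:
            # Pre-fix behaviour: state is dropped too early.
            del self.vrf_state[vrf]

    def vrf_delete_notify(self, vrf):
        state = self.vrf_state.pop(vrf, None)
        if state is None:
            # State already cleared, so the rest of the cleanup is skipped.
            return
        self.route_listeners.discard(vrf)


def leaked_listeners(clear_state_on_entry_delete):
    """Run one add/delete cycle and return listeners left registered."""
    proto = ArpProtoModel(clear_state_on_entry_delete)
    proto.vrf_added("vrf1")
    proto.arp_entry_added("vrf1", "10.0.0.3")
    # The ARP entry goes away while the VRF is already delete marked.
    proto.arp_entry_deleted("vrf1", "10.0.0.3", vrf_delete_marked=True)
    # The VRF delete notification arrives afterwards.
    proto.vrf_delete_notify("vrf1")
    return proto.route_listeners
```

With `clear_state_on_entry_delete=True` the model leaves a route-table listener registered for `vrf1`, which is the kind of leftover registration that keeps a VRF reference alive; with `False` the listener set ends up empty.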

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/17087
Committed: http://github.org/Juniper/contrail-controller/commit/9908363189bc02455e279e699227200f109a983d
Submitter: Zuul
Branch: master

commit 9908363189bc02455e279e699227200f109a983d
Author: Hari <email address hidden>
Date: Wed Feb 10 18:15:19 2016 +0530

Clear state on VRF entry only on the VRF delete notify.

The VRF state set by arp proto is being deleted upon ARP entry
deletion, if VRF is delete marked. If the VRF delete notification
comes later, as the state is already cleared, the rest of cleanup
is skipped. The VRF state should be cleared only on VRF delete
notification.

Change-Id: Ie8d43832e2dd936ba87fa314cdb63f1e0a5886f6
closes-bug: 1538789

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/17123
Committed: http://github.org/Juniper/contrail-controller/commit/f33b26d3205cb8ab7b9bfdd6d7027734cc26d0bf
Submitter: Zuul
Branch: R2.22.x

commit f33b26d3205cb8ab7b9bfdd6d7027734cc26d0bf
Author: Hari <email address hidden>
Date: Thu Feb 11 11:02:09 2016 +0530

Clear state on VRF entry only on the VRF delete notify.

The VRF state set by arp proto is being deleted upon ARP entry
deletion, if VRF is delete marked. If the VRF delete notification
comes later, as the state is already cleared, the rest of cleanup
is skipped. The VRF state should be cleared only on VRF delete
notification.

Change-Id: I87f4f14703e50d5e9b790bb5b9268d0f9b29533a
closes-bug: 1538789

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/17124
Committed: http://github.org/Juniper/contrail-controller/commit/23c809dfa9a97d06a42905b0d4fc3d0c12d286d4
Submitter: Zuul
Branch: R2.20

commit 23c809dfa9a97d06a42905b0d4fc3d0c12d286d4
Author: Hari <email address hidden>
Date: Thu Feb 11 11:03:33 2016 +0530

Clear state on VRF entry only on the VRF delete notify.

The VRF state set by arp proto is being deleted upon ARP entry
deletion, if VRF is delete marked. If the VRF delete notification
comes later, as the state is already cleared, the rest of cleanup
is skipped. The VRF state should be cleared only on VRF delete
notification.

Change-Id: Id10a256d27bc90a0b65d84577646f41a9484ad03
closes-bug: 1538789

Changed in opencontrail:
status: New → Fix Committed
summary: - VM not reachable for some time followed for the corresponding vrouter
- crash
+ [SYMC] VM not reachable for some time followed for the corresponding
+ vrouter crash