vrouter-agent crash on tor-agent-compute node at DBEntryBase::SetState

Bug #1418192 reported by Vedamurthy Joshi
This bug affects 1 person
Affects            Status         Importance  Assigned to        Milestone
Juniper Openstack  Fix Committed  High        Hari Prasad Killi
  R2.1             Fix Committed  High        Hari Prasad Killi

Bug Description

R2.1 Build 16 Ubuntu 14.04 Multi-node setup

The vrouter-agent crash below was seen multiple times on this setup.

Core files and logs will be at http://10.204.216.50/Docs/bugs/#

root@nodeg11:/var/crashes# ls -ltr
total 152132
-rw------- 1 root root 23863296 Jan 30 08:36 core.contrail-tor-ag.12716.nodeg11.1422587193
-rw------- 1 root root 23764992 Jan 30 08:40 core.contrail-tor-ag.13930.nodeg11.1422587446
-rw------- 1 root root 23650304 Jan 30 10:20 core.contrail-tor-ag.1273.nodeg11.1422593456
-rw------- 1 root root 24371200 Jan 30 11:21 core.contrail-tor-ag.1447.nodeg11.1422597108
-rw------- 1 root root 23552000 Jan 30 11:23 core.contrail-tor-ag.12688.nodeg11.1422597180
-rw------- 1 root root 23662592 Jan 30 11:46 core.contrail-tor-ag.17450.nodeg11.1422598590
-rw------- 1 root root 23764992 Jan 30 11:54 core.contrail-tor-ag.17678.nodeg11.1422599064
-rw------- 1 root root 23752704 Jan 30 12:02 core.contrail-tor-ag.20463.nodeg11.1422599579
-rw------- 1 root root 23871488 Jan 30 12:10 core.contrail-tor-ag.21159.nodeg11.1422600047
-rw------- 1 root root 25546752 Jan 30 15:09 core.contrail-tor-ag.22438.nodeg11.1422610759
-rw------- 1 root root 26378240 Jan 30 17:10 core.contrail-tor-ag.23971.nodeg11.1422618013
-rw------- 1 root root 25358336 Feb 4 15:06 core.contrail-vroute.16533.nodeg11.1423042577
-rw------- 1 root root 25849856 Feb 4 15:16 core.contrail-tor-ag.16535.nodeg11.1423043215
-rw------- 1 root root 23605248 Feb 4 15:17 core.contrail-tor-ag.23591.nodeg11.1423043226
-rw------- 1 root root 25673728 Feb 4 15:17 core.contrail-tor-ag.16534.nodeg11.1423043273
-rw------- 1 root root 23674880 Feb 4 15:28 core.contrail-tor-ag.23692.nodeg11.1423043880
-rw------- 1 root root 25251840 Feb 4 16:29 core.contrail-vroute.21302.nodeg11.1423047557
-rw------- 1 root root 25178112 Feb 4 16:39 core.contrail-vroute.5160.nodeg11.1423048148
-rw------- 1 root root 25161728 Feb 5 00:14 core.contrail-vroute.20381.nodeg11.1423075450
-rw------- 1 root root 24772608 Feb 5 00:31 core.contrail-vroute.632.nodeg11.1423076486
-rw------- 1 root root 23900160 Feb 5 00:44 core.contrail-tor-ag.2313.nodeg11.1423077287
root@nodeg11:/var/crashes#

#0 0x00007f1beff90bb9 in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0 0x00007f1beff90bb9 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f1beff93fc8 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007f1beff89a76 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x00007f1beff89b22 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x0000000000e3df0f in DBEntryBase::SetState(DBTableBase*, int, DBState*) ()
#5 0x00000000008f8d0e in MulticastHandler::HandleTorRoute(DBTablePartBase*, DBEntryBase*) ()
#6 0x0000000000e4a362 in DBTableBase::RunNotify(DBTablePartBase*, DBEntryBase*) ()
#7 0x0000000000e4c1b8 in DBTablePartBase::RunNotify() ()
#8 0x0000000000e4901d in DBPartition::QueueRunner::Run() ()
#9 0x0000000000f30e20 in TaskImpl::execute() ()
#10 0x00007f1bf0b60b3a in ?? () from /usr/lib/libtbb.so.2
#11 0x00007f1bf0b5c816 in ?? () from /usr/lib/libtbb.so.2
#12 0x00007f1bf0b5bf4b in ?? () from /usr/lib/libtbb.so.2
#13 0x00007f1bf0b580ff in ?? () from /usr/lib/libtbb.so.2
#14 0x00007f1bf0b582f9 in ?? () from /usr/lib/libtbb.so.2
#15 0x00007f1bf0d7c182 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#16 0x00007f1bf0054fbd in clone () from /lib/x86_64-linux-gnu/libc.so.6
(gdb)

tags: added: blocker
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/7141
Committed: http://github.org/Juniper/contrail-controller/commit/8e7e130c0d8336a43bd4d74483b0211664027d6c
Submitter: Zuul
Branch: R2.1

commit 8e7e130c0d8336a43bd4d74483b0211664027d6c
Author: manishsingh <email address hidden>
Date: Fri Feb 6 16:38:10 2015 +0530

Problem: Agent crashes when one TOR is removed from a setup with two TORs.
Fix: The multicast handler was deleting the DB state and then adding it back
on a deleted physical_device_vn entry. The fix moves the DB state add
code under the non-delete check.
Closes-bug: #1418192

Change-Id: I9f5b32f409bd88b1058f1dfdc8f454aff58f50a6
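
The ordering problem described in this commit message can be sketched with a toy model. This is only an illustration, not the contrail-controller code: Entry, State, HandleBuggy and HandleFixed are hypothetical names, and the rule that new state may not be attached to a deleted entry is an assumption inferred from the __assert_fail frame under DBEntryBase::SetState in the backtrace. The toy SetState throws instead of asserting so the failure can be observed without aborting.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <utility>

// Hypothetical stand-ins; these are NOT the contrail-controller classes.
struct State {};

struct Entry {
    bool deleted = false;
    std::map<int, std::unique_ptr<State>> states;  // keyed by listener id

    // Assumed invariant: no new state may be attached to a deleted entry.
    // Throws (rather than asserting) so callers can observe the violation.
    void SetState(int listener, std::unique_ptr<State> s) {
        if (deleted && states.find(listener) == states.end())
            throw std::logic_error("new state on deleted entry");
        states[listener] = std::move(s);
    }

    void ClearState(int listener) { states.erase(listener); }
};

// Pre-fix ordering: state is cleared on delete, then added back anyway.
void HandleBuggy(Entry &e, int listener) {
    if (e.deleted)
        e.ClearState(listener);
    e.SetState(listener, std::make_unique<State>());  // throws when deleted
}

// Post-fix ordering: state is added only under the non-delete check.
void HandleFixed(Entry &e, int listener) {
    if (e.deleted) {
        e.ClearState(listener);
        return;
    }
    assert(!e.deleted);
    e.SetState(listener, std::make_unique<State>());
}
```

With this model, a delete notification handled by HandleBuggy trips the invariant, while HandleFixed clears the state and returns.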

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/7463
Committed: http://github.org/Juniper/contrail-controller/commit/a9f90b4857b78c121ffdc2592efb23b4f78ef448
Submitter: Zuul
Branch: master

commit a9f90b4857b78c121ffdc2592efb23b4f78ef448
Author: Praveen K V <email address hidden>
Date: Thu Jan 29 00:43:29 2015 +0530

Merge following commits from R2.1 branch.

Don't allocate nh-index 0

nh-index 0 is reserved by vrouter. So, pre-allocate the first index so that
next hops added by the agent use index 1 and above.

Change-Id: Ieed6d8666fc45399a51b280f1aae075425cee52c
(cherry picked from commit 3f2005b4c7be556cfdec9929b33b330ecc6ade3a)
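
A minimal sketch of the pre-allocation idea, assuming a lowest-free-index allocator. IndexAllocator and MakeNhIndexAllocator are hypothetical names used only for illustration; the agent's real index table is not shown in this report.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical lowest-free-index allocator; not the agent's real class.
class IndexAllocator {
public:
    // Returns the lowest free index and marks it as used.
    std::size_t Alloc() {
        for (std::size_t i = 0; i < used_.size(); ++i) {
            if (!used_[i]) {
                used_[i] = true;
                return i;
            }
        }
        used_.push_back(true);
        return used_.size() - 1;
    }

    // Releases an index so a later Alloc() can reuse it.
    void Free(std::size_t i) {
        if (i < used_.size())
            used_[i] = false;
    }

private:
    std::vector<bool> used_;
};

// Consumes index 0 up front so later allocations start at 1.
IndexAllocator MakeNhIndexAllocator() {
    IndexAllocator a;
    std::size_t reserved = a.Alloc();
    assert(reserved == 0);
    (void)reserved;  // silence unused warning when asserts are compiled out
    return a;
}
```

Because index 0 is never freed, callers of the returned allocator can only ever receive index 1 or higher.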

Problem: TOR path not deleted on VRF delete.
When a VRF is deleted in TSN mode, a notification is received for the
physical_device_vn entry. This entry contains a vn_entry, which in turn can be
used to extract the VRF. On adding the entry, the multicast handler attaches a
state that stores the VRF name so that it can be used at the time of deletion,
because when the VRF is deleted the vn_entry from physical_device_vn may not
contain any VRF.
The issue was that the search for the multicast object was being done using the
VRF from the VN at the time of delete. It should be done using the vrf_name
stored in the state. Fixed that here. Along with this fix, also added a check
to verify that the VRF is in deleted state when IsTorDeleted is calculated.
Closes-bug: #1416808

Change-Id: Ica526b736a07723db40c6bb31a2f91f756cb1373
(cherry picked from commit 84347759dde36c758f37f4d12a7540de97786dbd)
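
The lookup change described above can be sketched as follows. All names here (Vn, TorState, MulticastObjects, OnAdd, OnDeleteUsingVn, OnDeleteUsingState) are hypothetical and chosen for illustration; the sketch assumes the VN no longer exposes a VRF name by the time the delete notification arrives, as the commit message describes.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins; not the agent's real types.
struct Vn { std::string vrf_name; };        // empty once the VRF is gone
struct TorState { std::string vrf_name; };  // copy taken at add time

// Maps VRF name to a placeholder multicast object.
using MulticastObjects = std::map<std::string, int>;

// On add: create the object and remember the VRF name in the state.
TorState OnAdd(const Vn &vn, MulticastObjects &objs) {
    assert(!vn.vrf_name.empty());
    objs[vn.vrf_name] = 1;
    return TorState{vn.vrf_name};
}

// Pre-fix lookup: keys on the VN, which may have lost its VRF by now.
bool OnDeleteUsingVn(const Vn &vn, MulticastObjects &objs) {
    return objs.erase(vn.vrf_name) > 0;
}

// Post-fix lookup: keys on the name stored in the state at add time.
bool OnDeleteUsingState(const TorState &st, MulticastObjects &objs) {
    return objs.erase(st.vrf_name) > 0;
}
```

In this model the VN-based lookup finds nothing once the VRF name has been cleared, so the object is left behind, while the state-based lookup removes it.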

Remove forwarding mode setting using IPAM gateway attributes.

A 0.0.0.0 gateway is still allowed, but it is not used for
forwarding mode.
Partial-Bug: #1415014

Change-Id: I72c2a3faf3ededfdd05c677d06ce654b8798618f
(cherry picked from commit 78bae98b52173a754669168478cd2c4151955f8c)

DNS disable knob: when the DNS server is given as 0.0.0.0 in the dns_servers
field for the subnet or in the DHCP options list, vrouter doesn't provide
the DNS proxy function any more. Do not send DNS server or DNS domain options
in the DHCP response in such a case.

When there is any DNS server specified in the DHCP options list, do not
add the vrouter DNS server address any more.

If the DNS server in IPAM subnet config comes unspecified, set it to be
the same as the GW address.

Change-Id: If3e66bcfd7c137064de1471cc515e4742ea14b73
Fixes-Bug: 1416711
(cherry picked from commit 2a0d85dc4cc521e8c417182eedebe14dc9b4848b)

Conflicts:

 src/vnsw/agent/oper/vn.cc

Fix the VLAN-ID comparison in logical interface

Added checks for the following:

1. VLAN-ID is not reused on multiple logical-ports on a single
physical-port
2. VLAN-ID must be set during creation of logical-interface
3. Do not allow change of vlan-id after logical-interface is
created
4. Rename interface_object to li_object to avoid confusion in
_check_interface_name

Closes-Bug: #1416323
Change-Id: I444f83d8cf8d34a6848026fe3fd0b8e23a7798f4
(cherry picked from commit 0a778eb1dab3e94c94cf0a37f36ad30f593c4e3b)

Adding 10Sec delay for ovsdb session to start txn

Issue:
------
On reconnect, the TOR Agent used to start the audit process immediately,
causing all the previously installed entries to get deleted as
stale and then re-added on config availability

Fix:
----
For R2.1, on connection ...


Changed in juniperopenstack:
status: New → Fix Committed