Agent crash in virtual void InstanceTaskExecvp::Terminate()

Bug #1527429 reported by amit surana
This bug affects 2 people
Affects             Status          Importance   Assigned to             Milestone
Juniper Openstack   (status tracked in Trunk)
  R2.20             Fix Committed   High         Divakar Dharanalakota
  R2.21.x           Fix Committed   High         Divakar Dharanalakota
  R2.22.x           Fix Committed   High         Divakar Dharanalakota
  R3.0              Fix Committed   High         Divakar Dharanalakota
  Trunk             Fix Committed   High         Divakar Dharanalakota

Bug Description

Contrail version 2.22, build 116

core file is at: http://mayamruga.englab.juniper.net/bugs/1527427/

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/contrail-vrouter-agent'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f2d2e060cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007f2d2e060cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f2d2e0640d8 in __GI_abort () at abort.c:89
#2 0x00007f2d2e059b86 in __assert_fail_base (fmt=0x7f2d2e1aa830 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=assertion@entry=0x1056898 "pid_", file=file@entry=0x1056920 "controller/src/vnsw/agent/oper/instance_task.cc",
    line=line@entry=58, function=function@entry=0x1056980 "virtual void InstanceTaskExecvp::Terminate()") at assert.c:92
#3 0x00007f2d2e059c32 in __GI___assert_fail (assertion=0x1056898 "pid_", file=0x1056920 "controller/src/vnsw/agent/oper/instance_task.cc",
    line=58, function=0x1056980 "virtual void InstanceTaskExecvp::Terminate()") at assert.c:101
#4 0x000000000098d59e in InstanceTaskExecvp::Terminate() ()
#5 0x0000000000984331 in InstanceManager::ScheduleNextTask(InstanceTaskQueue*) ()
#6 0x000000000098494e in InstanceManager::Enqueue(InstanceTask*, boost::uuids::uuid const&) ()
#7 0x00000000009850d7 in InstanceManager::StopServiceInstance(ServiceInstance*, InstanceState*) ()
#8 0x0000000000985bc4 in InstanceManager::EventObserver(DBTablePartBase*, DBEntryBase*) ()
#9 0x0000000000f1edda in DBTableBase::RunNotify(DBTablePartBase*, DBEntryBase*) ()
#10 0x0000000000f218e8 in DBTablePartBase::RunNotify() ()
#11 0x0000000000f1d6cd in DBPartition::QueueRunner::Run() ()
#12 0x000000000101e5f0 in TaskImpl::execute() ()
#13 0x00007f2d2ec2fb3a in ?? () from /usr/lib/libtbb.so.2
#14 0x00007f2d2ec2b816 in ?? () from /usr/lib/libtbb.so.2
#15 0x00007f2d2ec2af4b in ?? () from /usr/lib/libtbb.so.2
#16 0x00007f2d2ec270ff in ?? () from /usr/lib/libtbb.so.2
#17 0x00007f2d2ec272f9 in ?? () from /usr/lib/libtbb.so.2
#18 0x00007f2d2ee4b182 in start_thread (arg=0x7f2cebfff700) at pthread_create.c:312
#19 0x00007f2d2e12447d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb)
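
The backtrace shows the abort coming from the assertion on `pid_` at instance_task.cc:58 in `InstanceTaskExecvp::Terminate()`, reached from `InstanceManager::ScheduleNextTask()`. In other words, Terminate() was called on a task that had no child pid. A minimal sketch of that failure mode and of a non-aborting guard, using a hypothetical stand-in class (`GuardedTask`, `TerminateIfStarted` and `DemoNeverStarted` are illustrative names, not the agent's code, and this is not the fix that was committed):

```cpp
#include <sys/types.h>
#include <signal.h>
#include <cassert>

// Hypothetical stand-in for an instance task; not the agent's real class.
class GuardedTask {
public:
    explicit GuardedTask(pid_t pid) : pid_(pid) {}

    // Instead of assert(pid_), report that there is nothing to signal
    // when the child process was never started.
    bool TerminateIfStarted(int sig = SIGTERM) {
        if (pid_ <= 0) {
            return false;  // start never completed, so there is no child
        }
        return kill(pid_, sig) == 0;
    }

private:
    pid_t pid_;
};

// Usage: a task whose child never started is skipped rather than aborted on.
inline void DemoNeverStarted() {
    GuardedTask never_started(0);
    assert(!never_started.TerminateIfStarted());
}
```

The guard only avoids the abort. It does not answer why a task with no pid reached the head of the queue, which is what the fix commits further down this report address.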

Tags: vrouter soln
amit surana (asurana-t)
tags: added: vrouter
amit surana (asurana-t)
information type: Proprietary → Public
amit surana (asurana-t)
tags: added: soln
amit surana (asurana-t)
description: updated
amit surana (asurana-t)
no longer affects: juniperopenstack/r3.0
OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0

Review in progress for https://review.opencontrail.org/17553
Submitter: Divakar Dharanalakota (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote :

Review in progress for https://review.opencontrail.org/17557
Submitter: Divakar Dharanalakota (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/17634
Submitter: Divakar Dharanalakota (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0

Review in progress for https://review.opencontrail.org/17557
Submitter: Divakar Dharanalakota (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote :

Review in progress for https://review.opencontrail.org/17743
Submitter: Divakar Dharanalakota (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/17743
Committed: http://github.org/Juniper/contrail-controller/commit/ac633ca5ccef1dfb69e20ead058e7b5fd0c28ef4
Submitter: Zuul
Branch: R3.0

commit ac633ca5ccef1dfb69e20ead058e7b5fd0c28ef4
Author: Divakar <email address hidden>
Date: Mon Feb 22 21:27:25 2016 +0530

Handle Instance Task's creation errors

When instance manager starts and runs an Instance Task, it sets up a
pipe between parent and child and also adds the parent's pipe FDs to
boost asio. Creating the pipe and assigning the pipe fd to boost can
both fail. These errors are not handled in instance manager.
This leads to error scenarios where the instance task is not completely
set up but instance manager acts on it, which results in crashes.

As a fix, the following behaviour is added.
1) If creation of the pipe fails, the task's "is_running" is set to false
and the task is not moved out of task_queue. Using a timer, two more
attempts are made to start it. If the task still fails to start after
this fixed number of attempts, the task is deleted.

2) If the child process starts successfully (and pid is available) but
assigning the fd to boost fails, "is_running" is set to true and
instance manager waits for netns_timeout_ before deleting this task
rather than relying on the pipe event. In this case the child process
very likely completes execution, but as the pipe event is no longer
tracked, a maximum of netns_timeout_ is waited before the task is deleted.

3) The stale netns on host machines (if agent restarts) are deleted directly
rather than relying on the task_queue infrastructure, to make sure stale
tasks are not at the head of task_queue.

Change-Id: I90070a9e9ea740a467ac688b214e7bf1ce706295
partial-bug: #1527429
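
Point 1 of the commit message above (leave the task at the head of the queue, retry on a timer, delete after a fixed number of attempts) can be sketched roughly as follows. This is only an illustration under stated assumptions: `RetryTask`, `RetryQueue`, `TryStartHead` and `kMaxStartAttempts` are hypothetical names, the timer is modelled as one call to `TryStartHead()` per expiry, and the limit of three reflects the "two more times" wording rather than a value read from the agent source.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical stand-ins; the names do not come from the agent source.
struct RetryTask {
    std::function<bool()> start;  // returns false if, e.g., pipe() failed
    bool is_running = false;
    int start_attempts = 0;
};

class RetryQueue {
public:
    static constexpr int kMaxStartAttempts = 3;  // first try + two retries

    void Enqueue(std::function<bool()> start) {
        RetryTask task;
        task.start = std::move(start);
        tasks_.push_back(std::move(task));
    }

    // Called on enqueue and again on each retry-timer expiry.
    // Returns true if the head task is running after this call.
    bool TryStartHead() {
        if (tasks_.empty()) return false;
        RetryTask &head = tasks_.front();
        if (head.is_running) return true;
        ++head.start_attempts;
        const bool started = head.start();
        head.is_running = started;
        if (!started && head.start_attempts >= kMaxStartAttempts) {
            tasks_.pop_front();  // give up: delete the task
        }
        return started;
    }

    std::size_t Size() const { return tasks_.size(); }

private:
    std::deque<RetryTask> tasks_;
};
```

A start function that always fails is dropped from the queue on the third call to `TryStartHead()`. One that succeeds on a retry stays at the head with `is_running` set, so later code never acts on a task that was only partly set up.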

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0

Review in progress for https://review.opencontrail.org/17747
Submitter: Divakar Dharanalakota (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/17747
Committed: http://github.org/Juniper/contrail-controller/commit/f7403a2b04a770308a1d0de94f49d057d84d4af1
Submitter: Zuul
Branch: R3.0

commit f7403a2b04a770308a1d0de94f49d057d84d4af1
Author: Divakar <email address hidden>
Date: Mon Feb 22 23:35:11 2016 +0530

Tracebuffer Support for Instance Manager

Currently there are no trace messages for Instance Manager. This change
adds tracebuffer support.

closes-bug: #1527429

Conflicts:
 src/vnsw/agent/oper/instance_manager.cc

Change-Id: Id07b022fd12d85ac6a9b8d644429d9b9c4a76d1c

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/17634
Committed: http://github.org/Juniper/contrail-controller/commit/6bc6a4204f51deedc343a17469f0060679fead86
Submitter: Zuul
Branch: master

commit 6bc6a4204f51deedc343a17469f0060679fead86
Author: Divakar <email address hidden>
Date: Mon Feb 22 21:27:25 2016 +0530

Handle Instance Task's creation errors

When instance manager starts and runs an Instance Task, it sets up a
pipe between parent and child and also adds the parent's pipe FDs to
boost asio. Creating the pipe and assigning the pipe fd to boost can
both fail. These errors are not handled in instance manager.
This leads to error scenarios where the instance task is not completely
set up but instance manager acts on it, which results in crashes.

As a fix, the following behaviour is added.
1) If creation of the pipe fails, the task's "is_running" is set to false
and the task is not moved out of task_queue. Using a timer, two more
attempts are made to start it. If the task still fails to start after
this fixed number of attempts, the task is deleted.

2) If the child process starts successfully (and pid is available) but
assigning the fd to boost fails, "is_running" is set to true and
instance manager waits for netns_timeout_ before deleting this task
rather than relying on the pipe event. In this case the child process
very likely completes execution, but as the pipe event is no longer
tracked, a maximum of netns_timeout_ is waited before the task is deleted.

3) The stale netns on host machines (if agent restarts) are deleted directly
rather than relying on the task_queue infrastructure, to make sure stale
tasks are not at the head of task_queue.

Change-Id: I99b783f0e0c9a666f340e779e778f668e843b245
partial-bug: #1527429

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/18001
Submitter: Divakar Dharanalakota (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/18001
Committed: http://github.org/Juniper/contrail-controller/commit/da6b41192dae4a076a9aa723de2688c9fd3db234
Submitter: Zuul
Branch: master

commit da6b41192dae4a076a9aa723de2688c9fd3db234
Author: Divakar <email address hidden>
Date: Mon Feb 22 23:35:11 2016 +0530

Tracebuffer Support for Instance Manager

Currently there are no trace messages for Instance Manager. This change
adds tracebuffer support.

closes-bug: #1527429

Change-Id: Id07b022fd12d85ac6a9b8d644429d9b9c4a76d1c

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.22.x

Review in progress for https://review.opencontrail.org/18194
Submitter: Hari Prasad Killi (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.20

Review in progress for https://review.opencontrail.org/18195
Submitter: Hari Prasad Killi (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.21.x

Review in progress for https://review.opencontrail.org/18196
Submitter: Hari Prasad Killi (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/18194
Committed: http://github.org/Juniper/contrail-controller/commit/e4b3216c85365d85eb0abec12ba30a86daa847ab
Submitter: Zuul
Branch: R2.22.x

commit e4b3216c85365d85eb0abec12ba30a86daa847ab
Author: Divakar <email address hidden>
Date: Mon Feb 22 21:27:25 2016 +0530

Handle Instance Task's creation errors

When instance manager starts and runs an Instance Task, it sets up a
pipe between parent and child and also adds the parent's pipe FDs to
boost asio. Creating the pipe and assigning the pipe fd to boost can
both fail. These errors are not handled in instance manager.
This leads to error scenarios where the instance task is not completely
set up but instance manager acts on it, which results in crashes.

As a fix, the following behaviour is added.
1) If creation of the pipe fails, the task's "is_running" is set to false
and the task is not moved out of task_queue. Using a timer, two more
attempts are made to start it. If the task still fails to start after
this fixed number of attempts, the task is deleted.

2) If the child process starts successfully (and pid is available) but
assigning the fd to boost fails, "is_running" is set to true and
instance manager waits for netns_timeout_ before deleting this task
rather than relying on the pipe event. In this case the child process
very likely completes execution, but as the pipe event is no longer
tracked, a maximum of netns_timeout_ is waited before the task is deleted.

3) The stale netns on host machines (if agent restarts) are deleted directly
rather than relying on the task_queue infrastructure, to make sure stale
tasks are not at the head of task_queue.

partial-bug: #1527429
(cherry picked from commit 6bc6a4204f51deedc343a17469f0060679fead86)

Conflicts:
 src/vnsw/agent/oper/instance_manager.cc
 src/vnsw/agent/oper/instance_task.h

Change-Id: Ifd4cacce42e8fc6136cf15a9c82e678ad3b1ea9e

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/18195
Committed: http://github.org/Juniper/contrail-controller/commit/baf1438ed9c4b1dfbe312339e7d6e9edefb775ef
Submitter: Zuul
Branch: R2.20

commit baf1438ed9c4b1dfbe312339e7d6e9edefb775ef
Author: Divakar <email address hidden>
Date: Mon Feb 22 21:27:25 2016 +0530

Handle Instance Task's creation errors

When instance manager starts and runs an Instance Task, it sets up a
pipe between parent and child and also adds the parent's pipe FDs to
boost asio. Creating the pipe and assigning the pipe fd to boost can
both fail. These errors are not handled in instance manager.
This leads to error scenarios where the instance task is not completely
set up but instance manager acts on it, which results in crashes.

As a fix, the following behaviour is added.
1) If creation of the pipe fails, the task's "is_running" is set to false
and the task is not moved out of task_queue. Using a timer, two more
attempts are made to start it. If the task still fails to start after
this fixed number of attempts, the task is deleted.

2) If the child process starts successfully (and pid is available) but
assigning the fd to boost fails, "is_running" is set to true and
instance manager waits for netns_timeout_ before deleting this task
rather than relying on the pipe event. In this case the child process
very likely completes execution, but as the pipe event is no longer
tracked, a maximum of netns_timeout_ is waited before the task is deleted.

3) The stale netns on host machines (if agent restarts) are deleted directly
rather than relying on the task_queue infrastructure, to make sure stale
tasks are not at the head of task_queue.

partial-bug: #1527429
(cherry picked from commit 6bc6a4204f51deedc343a17469f0060679fead86)

Conflicts:
 src/vnsw/agent/oper/instance_manager.cc
 src/vnsw/agent/oper/instance_task.h

Change-Id: Ifd4cacce42e8fc6136cf15a9c82e678ad3b1ea9e
(cherry picked from commit e4b3216c85365d85eb0abec12ba30a86daa847ab)

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/18196
Committed: http://github.org/Juniper/contrail-controller/commit/f9edeba2e7b6615d6312d359b9d2915f8359d5dd
Submitter: Zuul
Branch: R2.21.x

commit f9edeba2e7b6615d6312d359b9d2915f8359d5dd
Author: Divakar <email address hidden>
Date: Mon Feb 22 21:27:25 2016 +0530

Handle Instance Task's creation errors

When instance manager starts and runs an Instance Task, it sets up a
pipe between parent and child and also adds the parent's pipe FDs to
boost asio. Creating the pipe and assigning the pipe fd to boost can
both fail. These errors are not handled in instance manager.
This leads to error scenarios where the instance task is not completely
set up but instance manager acts on it, which results in crashes.

As a fix, the following behaviour is added.
1) If creation of the pipe fails, the task's "is_running" is set to false
and the task is not moved out of task_queue. Using a timer, two more
attempts are made to start it. If the task still fails to start after
this fixed number of attempts, the task is deleted.

2) If the child process starts successfully (and pid is available) but
assigning the fd to boost fails, "is_running" is set to true and
instance manager waits for netns_timeout_ before deleting this task
rather than relying on the pipe event. In this case the child process
very likely completes execution, but as the pipe event is no longer
tracked, a maximum of netns_timeout_ is waited before the task is deleted.

3) The stale netns on host machines (if agent restarts) are deleted directly
rather than relying on the task_queue infrastructure, to make sure stale
tasks are not at the head of task_queue.

partial-bug: #1527429
(cherry picked from commit 6bc6a4204f51deedc343a17469f0060679fead86)

Conflicts:
 src/vnsw/agent/oper/instance_manager.cc
 src/vnsw/agent/oper/instance_task.h

Change-Id: Ifd4cacce42e8fc6136cf15a9c82e678ad3b1ea9e
(cherry picked from commit e4b3216c85365d85eb0abec12ba30a86daa847ab)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.22.x

Review in progress for https://review.opencontrail.org/20693
Submitter: Divakar Dharanalakota (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.21.x

Review in progress for https://review.opencontrail.org/20694
Submitter: Divakar Dharanalakota (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.20

Review in progress for https://review.opencontrail.org/20695
Submitter: Divakar Dharanalakota (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/20695
Committed: http://github.org/Juniper/contrail-controller/commit/52dcd9798383b8da2835ed50d0712041d2122555
Submitter: Zuul
Branch: R2.20

commit 52dcd9798383b8da2835ed50d0712041d2122555
Author: Divakar <email address hidden>
Date: Mon Feb 22 23:35:11 2016 +0530

Tracebuffer Support for Instance Manager

Currently there are no trace messages for Instance Manager. This change
adds tracebuffer support.

closes-bug: #1527429

Change-Id: Ia39dce9f1280324935b6c54937baacc013167a59

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/20693
Committed: http://github.org/Juniper/contrail-controller/commit/41e6e9d36b38e5e647d1c368dc988c221243eb46
Submitter: Zuul
Branch: R2.22.x

commit 41e6e9d36b38e5e647d1c368dc988c221243eb46
Author: Divakar <email address hidden>
Date: Mon Feb 22 23:35:11 2016 +0530

Tracebuffer Support for Instance Manager

Currently there are no trace messages for Instance Manager. This change
adds tracebuffer support.

closes-bug: #1527429

Conflicts:
 src/vnsw/agent/oper/agent.sandesh
 src/vnsw/agent/oper/instance_manager.cc

Change-Id: Ia39dce9f1280324935b6c54937baacc013167a59

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/20694
Committed: http://github.org/Juniper/contrail-controller/commit/7344d71f8e9b38036e6bcea2bd3f84c5b71a3673
Submitter: Zuul
Branch: R2.21.x

commit 7344d71f8e9b38036e6bcea2bd3f84c5b71a3673
Author: Divakar <email address hidden>
Date: Mon Feb 22 23:35:11 2016 +0530

Tracebuffer Support for Instance Manager

Currently there are no trace messages for Instance Manager. This change
adds tracebuffer support.

closes-bug: #1527429

Change-Id: Ia39dce9f1280324935b6c54937baacc013167a59
