Comment 12 for bug 1527429

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/17743
Committed: http://github.org/Juniper/contrail-controller/commit/ac633ca5ccef1dfb69e20ead058e7b5fd0c28ef4
Submitter: Zuul
Branch: R3.0

commit ac633ca5ccef1dfb69e20ead058e7b5fd0c28ef4
Author: Divakar <email address hidden>
Date: Mon Feb 22 21:27:25 2016 +0530

Handle Instance Task's creation errors

When the instance manager starts and runs an Instance Task, it sets up a
pipe between parent and child and adds the parent's pipe FDs to
boost asio. Both creating the pipe and assigning the pipe fd to boost
can fail. These errors are not handled in the instance manager.
As a result, the instance manager can act on an instance task that was
never completely set up, which results in crashes.
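
The two failure points described above can be sketched roughly as
follows. This is an illustrative sketch only, not the code from this
commit: SetupResult, PipeSetup and SetupTaskPipe are hypothetical names,
and the assign_fd callback merely stands in for handing the parent's
read end to boost asio.

```cpp
#include <cassert>
#include <functional>
#include <unistd.h>

// Hypothetical outcome of setting up the pipe for one task.
enum class SetupResult { kOk, kPipeFailed, kAssignFailed };

struct PipeSetup {
    SetupResult result;
    int read_fd;   // parent's end, -1 if the pipe was never created
    int write_fd;  // child's end, -1 if the pipe was never created
};

// assign_fd is an assumption: it models registering the read end with
// an event loop and returns false when that registration fails.
PipeSetup SetupTaskPipe(const std::function<bool(int)>& assign_fd) {
    assert(static_cast<bool>(assign_fd));
    int fds[2] = {-1, -1};
    if (pipe(fds) == -1) {
        // First failure point: no pipe, so the task cannot start.
        return {SetupResult::kPipeFailed, -1, -1};
    }
    if (!assign_fd(fds[0])) {
        // Second failure point: the pipe exists but nobody watches it,
        // so the caller cannot rely on a pipe event for this task.
        return {SetupResult::kAssignFailed, fds[0], fds[1]};
    }
    return {SetupResult::kOk, fds[0], fds[1]};
}
```

The caller is expected to check the result before touching the task,
which is the check the instance manager was missing.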

As a fix, the following behaviour is added.
1) If creation of the pipe fails, the task's "is_running" is set to
false and the task is not moved out of task_queue. A timer retries the
start two more times. If the task still fails to start after this fixed
number of attempts, the task is deleted.

2) If the child process starts successfully (and its pid is available)
but assigning the fd to boost fails, "is_running" is set to true and the
instance manager waits for netns_timeout_ before deleting this task
rather than relying on the pipe event. In this case the child process
very likely completes its execution, but because the pipe event is no
longer tracked, the task is deleted only after waiting up to
netns_timeout_.

3) Stale netns on host machines (if the agent restarts) are deleted
directly rather than relying on the task_queue infrastructure, so that
stale tasks do not sit at the head of task_queue.
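
The decisions in 1) and 2) can be summarised in a small sketch. It is
not the committed implementation: Task, StartOutcome, Action,
HandleStart and kMaxStartAttempts are hypothetical names, and the limit
of three attempts is an assumption that reads "two more times" as one
initial attempt plus two retries.

```cpp
#include <cassert>

// Assumed limit: the first attempt plus two timer-driven retries.
constexpr int kMaxStartAttempts = 3;

struct Task {
    bool is_running = false;
    int attempts = 0;
};

enum class StartOutcome { kStarted, kPipeFailed, kAssignFailed };
enum class Action { kTrackPipe, kRetryLater, kWaitTimeout, kDeleteTask };

// Decide what the manager should do after one start attempt.
Action HandleStart(Task& task, StartOutcome outcome) {
    assert(task.attempts < kMaxStartAttempts);
    ++task.attempts;
    switch (outcome) {
    case StartOutcome::kStarted:
        task.is_running = true;
        return Action::kTrackPipe;    // normal path, pipe event is tracked
    case StartOutcome::kAssignFailed:
        task.is_running = true;       // child is running, pipe is untracked
        return Action::kWaitTimeout;  // delete after the netns timeout
    case StartOutcome::kPipeFailed:
        task.is_running = false;      // task stays in the queue for a retry
        return task.attempts < kMaxStartAttempts ? Action::kRetryLater
                                                 : Action::kDeleteTask;
    }
    return Action::kDeleteTask;  // not reached for valid outcomes
}
```

With these assumptions, three consecutive pipe failures yield
kRetryLater, kRetryLater and then kDeleteTask, while a failed fd
assignment leaves the task marked as running and returns kWaitTimeout.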

Change-Id: I90070a9e9ea740a467ac688b214e7bf1ce706295
partial-bug: #1527429