Comment 16 for bug 1527429

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/17634
Committed: http://github.org/Juniper/contrail-controller/commit/6bc6a4204f51deedc343a17469f0060679fead86
Submitter: Zuul
Branch: master

commit 6bc6a4204f51deedc343a17469f0060679fead86
Author: Divakar <email address hidden>
Date: Mon Feb 22 21:27:25 2016 +0530

Handle Instance Task's creation errors

When the instance manager starts and runs an Instance Task, it sets up a
pipe between parent and child and adds the parent's pipe FDs to
boost asio. Both creating the pipe and assigning the pipe fd to boost
can fail, and the instance manager does not handle these errors.
As a result, the instance manager can act on an instance task that was
only partially set up, which results in crashes.

As a fix, the following behaviour is added.
1) If creation of the pipe fails, the task's "is_running" is set to
false and the task is not moved out of task_queue. A timer retries the
start two more times. If the task still fails to start after this fixed
number of attempts, it is deleted.

2) If the child process starts successfully (and a pid is available) but
assigning the fd to boost fails, "is_running" is set to true and the
instance manager waits for netns_timeout_ before deleting this task
rather than relying on the pipe event. In this case the child process
very likely completes successfully, but because the pipe event is no
longer tracked, the task is deleted after waiting at most netns_timeout_.

3) Stale netns on host machines (if the agent restarts) are deleted
directly rather than relying on the task_queue infrastructure, so that
stale tasks do not sit at the head of task_queue.
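
The first two behaviours above can be sketched roughly as follows. This
is an illustration only, not the code from this commit: StartResult,
Task, TaskQueue, StartHead, OnTimeout and kMaxStartAttempts are
hypothetical names, the start step is injected as a callback instead of
calling pipe/fork/boost, and the direct deletion of stale netns (item 3)
is not modelled.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Hypothetical outcome of one start attempt (not the agent's real type).
enum class StartResult {
    kStarted,         // pipe created, child started, fd registered
    kPipeFailed,      // pipe creation failed, no child process exists
    kFdAssignFailed   // child has a pid, but the fd was not registered
};

// Hypothetical task record; only "is_running" is named in the commit.
struct Task {
    std::string name;
    bool is_running = false;
    bool pipe_tracked = false;   // true when the pipe event can be relied on
    int start_attempts = 0;
    std::function<StartResult()> start;  // injected start step (assumption)
};

class TaskQueue {
public:
    // First attempt plus two retries, as described in item 1.
    static constexpr int kMaxStartAttempts = 3;

    void Push(Task task) { tasks_.push_back(std::move(task)); }
    std::size_t Size() const { return tasks_.size(); }
    const Task *Head() const {
        return tasks_.empty() ? nullptr : &tasks_.front();
    }

    // Called once initially and again whenever the retry timer fires.
    void StartHead() {
        if (tasks_.empty() || tasks_.front().is_running) return;
        Task &task = tasks_.front();
        ++task.start_attempts;
        switch (task.start()) {
        case StartResult::kStarted:
            task.is_running = true;
            task.pipe_tracked = true;
            break;
        case StartResult::kFdAssignFailed:
            // Item 2: the child runs, but only the timeout can end the task.
            task.is_running = true;
            task.pipe_tracked = false;
            break;
        case StartResult::kPipeFailed:
            // Item 1: keep the task queued until the attempts are used up.
            task.is_running = false;
            if (task.start_attempts >= kMaxStartAttempts) tasks_.pop_front();
            break;
        }
    }

    // Stands in for netns_timeout_ expiring: drop the running head task.
    void OnTimeout() {
        if (!tasks_.empty() && tasks_.front().is_running) tasks_.pop_front();
    }

private:
    std::deque<Task> tasks_;
};
```

In this sketch a task whose pipe creation keeps failing stays at the head
of the queue for three calls to StartHead and is then removed, while a
task whose fd assignment failed is removed only by OnTimeout.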

Change-Id: I99b783f0e0c9a666f340e779e778f668e843b245
partial-bug: #1527429