contrail-collector crash immediately after provisioning
Root cause:
A race condition on state_machine_, which is
(1) allocated by sandesh_connection, and
(2) used by the generator.
When the problem occurs, the generator receives a resource update
message and enqueues the resource update to state_machine_; at the
same time it updates the stats immediately. The stats update
sometimes has to wait for a mutex, which makes the thread yield
the CPU. Call this thread 1.
Meanwhile, a connection close is triggered and the destructor
runs. The destructor calls the termination routine, and all memory
related to this connection is released. Call this thread 2.
If thread 2 finishes before thread 1 resumes, thread 1 touches
memory that has already been freed and the collector crashes.
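
The interleaving described above can be replayed in a single thread to
make the ordering explicit. This is only a sketch: StateMachineModel
and ReplayInterleaving are hypothetical names, not contrail code, and
std::weak_ptr is used purely so that the freed state becomes observable
instead of crashing (the real code presumably holds a raw pointer).

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the state machine owned by the connection.
struct StateMachineModel {
    std::vector<std::string> queue;
};

// Replays the interleaving from the commit message step by step.
// Returns true if "thread 1" would have resumed on freed memory.
bool ReplayInterleaving() {
    // (1) allocated by the connection, (2) used by the generator.
    auto owner = std::make_shared<StateMachineModel>();
    std::weak_ptr<StateMachineModel> generator_view = owner;

    // Thread 1: enqueue the resource update, then block on the
    // stats mutex and yield the CPU.
    owner->queue.push_back("resource_update");

    // Thread 2: connection close runs the destructor path and
    // releases everything related to the connection.
    owner.reset();

    // Thread 1 resumes. With a raw pointer the next access would be
    // a use-after-free; here the expired weak_ptr exposes it.
    return generator_view.expired();
}
```

Calling ReplayInterleaving() reports that the object is gone by the
time thread 1 continues, which is the window the fix has to close.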
Solution:
The state machine has to account for this problem, so its
destruction is split into two steps:
(1) call the termination routine to free the memory allocated by
    its sub-structures;
(2) start a timer that frees the state machine itself.
Between step 1 and step 2, deleted_ indicates whether the state
machine can still be used.
We add a shutdown function to the stats structure so that this
state is passed on to it.
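
A minimal sketch of the two-step teardown, assuming a layout that is
not taken from the actual patch: only deleted_ appears in the commit
message, while StatsModel, StateMachineModel, Terminate,
EnqueueResourceUpdate and the accessors are hypothetical names. The
timer of step 2 is left out; the point is that between the two steps
both the state machine and its stats refuse new work.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical stats holder. Shutdown() plays the role of the
// shutdown function that the commit adds to the stats structure.
class StatsModel {
public:
    // Returns false and changes nothing once Shutdown() was called.
    bool Update(uint64_t bytes) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (shutdown_) return false;
        bytes_ += bytes;
        return true;
    }
    void Shutdown() {
        std::lock_guard<std::mutex> lock(mutex_);
        shutdown_ = true;
    }
    uint64_t bytes() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return bytes_;
    }
private:
    mutable std::mutex mutex_;
    bool shutdown_ = false;
    uint64_t bytes_ = 0;
};

// Hypothetical state machine with a two-step teardown.
class StateMachineModel {
public:
    // Step 1: mark the object as deleted and pass that state on to
    // the stats. Step 2 (freeing the object from a timer) would
    // follow later and is not modelled here.
    void Terminate() {
        deleted_.store(true);
        stats_.Shutdown();
    }
    // Called by the generator; rejected between step 1 and step 2.
    bool EnqueueResourceUpdate(uint64_t bytes) {
        if (deleted_.load()) return false;
        ++queued_;
        return stats_.Update(bytes);
    }
    int queued() const { return queued_.load(); }
    uint64_t stats_bytes() const { return stats_.bytes(); }
private:
    std::atomic<bool> deleted_{false};
    std::atomic<int> queued_{0};
    StatsModel stats_;
};
```

In this model an update enqueued before Terminate() is counted, while
one arriving afterwards is dropped without touching the stats.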
Change-Id: I15db0a1c1a6999758ed5cd2400d5d3ff8ab85232
Closes-Bug: 1755649
(cherry picked from commit b8a8de2a2ef2db849d96f8bc2cd983e4275b6b53)
Reviewed: https://review.opencontrail.org/43763
Committed: http://github.com/Juniper/contrail-common/commit/76997a86307ff70469eac51910fe18ec544fbfec
Submitter: Zuul v3 CI (<email address hidden>)
Branch: R5.0
commit 76997a86307ff70469eac51910fe18ec544fbfec
Author: zcui <email address hidden>
Date: Tue Jun 12 16:18:06 2018 -0700