Comment 1 for bug 1459769

Revision history for this message
Sundaresan Rajangam (srajanga) wrote :

Using the contrail-logs utility, captured the SandeshModuleTrace [with reset_time] for the generators [nodei28:Compute:contrail-tor-agent:85, nodei38:Compute:contrail-tor-agent:68 and nodei38:Compute:contrail-tor-agent:107], whose UVEs were not deleted from redis after session disconnect.

[contrail-logs o/p]
2015 May 27 20:20:30.886329 nodei36 [Analytics:contrail-collector:0:None][INVALID] : SandeshModuleServerTrace:184320 [ModuleServerState: name = nodei28:Compute:contrail-tor-agent:85, [generator_info: [hostname = nodei36, [GeneratorInfoAttr: connects = 1, connect_time = 1432662701172809, resets = 1, reset_time = 1432738230768822, in_clear = false]]]]
2015 May 27 20:20:32.166892 nodei36 [Analytics:contrail-collector:0:None][INVALID] : SandeshModuleServerTrace:184321 [ModuleServerState: name = nodei28:Compute:contrail-vrouter-agent:0, [generator_info: [hostname = nodei36, [GeneratorInfoAttr: connects = 3, connect_time = 1432662701653925, resets = 3, reset_time = 1432738218352851, in_clear = false]]]]

[redis log]
[23085] 27 May 20:20:18.353 * DelRequest for nodei28:Compute:contrail-vrouter-agent:0
....
....
[23085] 27 May 20:20:23.370 # Lua slow script detected: still in execution after 5017 milliseconds. You can try killing the script using the SCRIPT KILL command. <<<<<<
....
[23085] 27 May 20:20:32.119 * Delete Request for nodei28:Compute:contrail-vrouter-agent:0 successful

From the redis log, it may be noted that deletion of [nodei28:Compute:contrail-vrouter-agent:0] took ~14seonds [default lua-time-limit is set to 5seconds in redis.conf]. If the lua script runs for more than the configured time, then redis returns error to subsequent requests till the execution of the script is completed.
From the contrail-logs [see above], it may be observed that deletion request for the generator [nodei28:Compute:contrail-tor-agent:85] was sent [reset_time - 27 May 2015, 20:20:30] while the deletion script for the generator [nodei28:Compute:contrail-vrouter-agent:0] was still running and redis had already raised the red flag @ 27 May 20:20:23.370 due to time over run.

The UVEs of the generators nodei38:Compute:contrail-tor-agent:68 and nodei38:Compute:contrail-tor-agent:107 were not deleted from redis for the same reason:

[contrail-logs o/p]
2015 May 27 20:20:46.432991 nodei36 [Analytics:contrail-collector:0:None][INVALID] : SandeshModuleServerTrace:184364 [ModuleServerState: name = nodei38:Compute:contrail-tor-agent:68, [generator_info: [hostname = nodei36, [GeneratorInfoAttr: connects = 3, connect_time = 1432660322547729, resets = 3, reset_time = 1432738237680826, in_clear = false]]]]
2015 May 27 20:20:46.507011 nodei36 [Analytics:contrail-collector:0:None][INVALID] : SandeshModuleServerTrace:184365 [ModuleServerState: name = nodei38:Compute:contrail-tor-agent:107, [generator_info: [hostname = nodei36, [GeneratorInfoAttr: connects = 3, connect_time = 1432696866885598, resets = 3, reset_time = 1432738237744824, in_clear = false]]]]
....
....
2015 May 27 20:21:07.713827 nodei36 [Analytics:contrail-collector:0:None][INVALID] : SandeshModuleServerTrace:184366 [ModuleServerState: name = nodei38:Compute:contrail-vrouter-agent:0, [generator_info: [hostname = nodei36, [GeneratorInfoAttr: connects = 1, connect_time = 1432660322772313, resets = 1, reset_time = 1432738236784990, in_clear = false]]]]

[redis log]
[23085] 27 May 20:20:41.119 * DelRequest for nodei38:Compute:contrail-vrouter-agent:0
.....
[23085] 27 May 20:20:46.284 # Lua slow script detected: still in execution after 5165 milliseconds. You can try killing the script using the SCRIPT KILL command.
.....
[23085] 27 May 20:21:07.576 * Delete Request for nodei38:Compute:contrail-vrouter-agent:0 successful

lua script for deletion of the generator took ~26 seconds => deletion request for generators nodei38:Compute:contrail-tor-agent:68 and nodei38:Compute:contrail-tor-agent:107 failed.

>> the redis error logs in contrail-collector.log were overwritten - should use sandesh systemlog instead of log4cplus