Few vrouter uves remained even after the vrouters were deleted on tor-scale setup

Bug #1459769 reported by Vedamurthy Joshi
This bug affects 1 person
Affects             Status          Importance   Assigned to           Milestone
Juniper Openstack   Status tracked in Trunk
  R2.20             Fix Committed   High         Sundaresan Rajangam
  Trunk             Fix Committed   High         Sundaresan Rajangam

Bug Description

R2.20 Build 30 Ubuntu 14.04 Juno multi-node setup

In this scale setup, host5 and host6 each had 128 tor-agents in active-backup mode.
I then added another two nodes, host7 and host9, for ToRs 65-128, and then deleted the vrouter (tor-agent) objects 65-128 that had been created on host5 (nodei38) and host6 (nodei28).

After that, the UVEs for three vrouter objects (nodei38-107, nodei28-85 and nodei38-68) were still left undeleted.

Sundar is looking at it. Per his initial analysis, during the bulk deletion around 8:20:37 PM on 27 May, the collector got a connection reset from the tor-agents, but that did not trigger deletion of the tor-agents' UVEs.

env.roledefs = {
    'all': [host2, host3, host4, host5, host6, host7, host8, host9],
    'cfgm': [host2, host3, host4],
    'openstack': [host2, host3, host4],
    'webui': [host3],
    'control': [host2, host3, host4],
    'compute': [host5, host6, host7, host8, host9],
    'collector': [host2, host3, host4],
    'database': [host2, host3, host4],
    'toragent': [host5, host6, host7, host9 ],
    'tsn': [host5, host6, host7,host9 ],
    'build': [host_build],
}

env.hostnames = {
    'all': ['nodei34', 'nodei35', 'nodei36', 'nodei37', 'nodei38', 'nodei28', 'nodei27', 'nodei30']
}

Logs will be in http://10.204.216.50/Docs/bugs/#

Sundaresan Rajangam (srajanga) wrote :

Using the contrail-logs utility, captured the SandeshModuleTrace [with reset_time] for the generators [nodei28:Compute:contrail-tor-agent:85, nodei38:Compute:contrail-tor-agent:68 and nodei38:Compute:contrail-tor-agent:107], whose UVEs were not deleted from redis after session disconnect.

[contrail-logs o/p]
2015 May 27 20:20:30.886329 nodei36 [Analytics:contrail-collector:0:None][INVALID] : SandeshModuleServerTrace:184320 [ModuleServerState: name = nodei28:Compute:contrail-tor-agent:85, [generator_info: [hostname = nodei36, [GeneratorInfoAttr: connects = 1, connect_time = 1432662701172809, resets = 1, reset_time = 1432738230768822, in_clear = false]]]]
2015 May 27 20:20:32.166892 nodei36 [Analytics:contrail-collector:0:None][INVALID] : SandeshModuleServerTrace:184321 [ModuleServerState: name = nodei28:Compute:contrail-vrouter-agent:0, [generator_info: [hostname = nodei36, [GeneratorInfoAttr: connects = 3, connect_time = 1432662701653925, resets = 3, reset_time = 1432738218352851, in_clear = false]]]]

[redis log]
[23085] 27 May 20:20:18.353 * DelRequest for nodei28:Compute:contrail-vrouter-agent:0
....
....
[23085] 27 May 20:20:23.370 # Lua slow script detected: still in execution after 5017 milliseconds. You can try killing the script using the SCRIPT KILL command. <<<<<<
....
[23085] 27 May 20:20:32.119 * Delete Request for nodei28:Compute:contrail-vrouter-agent:0 successful

From the redis log, it may be noted that deletion of [nodei28:Compute:contrail-vrouter-agent:0] took ~14 seconds [the default lua-time-limit is set to 5 seconds in redis.conf]. If a Lua script runs for longer than the configured time, redis returns an error to subsequent requests until the script completes.
From the contrail-logs [see above], it may be observed that the deletion request for the generator [nodei28:Compute:contrail-tor-agent:85] was sent [reset_time - 27 May 2015, 20:20:30] while the deletion script for the generator [nodei28:Compute:contrail-vrouter-agent:0] was still running, after redis had already flagged the time overrun at 27 May 20:20:23.370.
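
The timing argument above can be checked with a small sketch. It is only an illustration under stated assumptions: the log timestamps are taken to be wall-clock UTC+05:30 (that offset makes reset_time line up with the printed log times), reset_time is taken as the moment the delete request reached redis, and the helper names (from_epoch_us, busy_window, is_rejected) are hypothetical and not part of contrail or redis.

```python
from datetime import datetime, timedelta, timezone

# Assumption: log timestamps are wall-clock UTC+05:30, inferred because that
# offset makes the epoch-based reset_time match the printed log times.
LOG_TZ = timezone(timedelta(hours=5, minutes=30))
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_epoch_us(value_us):
    """Convert a reset_time (microseconds since the epoch) to LOG_TZ."""
    return (EPOCH + timedelta(microseconds=value_us)).astimezone(LOG_TZ)


def busy_window(script_start, script_end, lua_time_limit_ms):
    """Interval in which redis answers other requests with an error, or None
    if the script finishes before the time limit is reached."""
    busy_from = script_start + timedelta(milliseconds=lua_time_limit_ms)
    return (busy_from, script_end) if busy_from < script_end else None


def is_rejected(request_time, window):
    """True if the request falls inside the error interval."""
    return window is not None and window[0] <= request_time <= window[1]


# Values copied from the redis log and the SandeshModuleServerTrace above.
script_start = datetime(2015, 5, 27, 20, 20, 18, 353000, tzinfo=LOG_TZ)
script_end = datetime(2015, 5, 27, 20, 20, 32, 119000, tzinfo=LOG_TZ)
tor_agent_85_reset = from_epoch_us(1432738230768822)

print("script ran for", (script_end - script_start).total_seconds(), "s")
for limit_ms in (5000, 15000):
    window = busy_window(script_start, script_end, limit_ms)
    verdict = "rejected" if is_rejected(tor_agent_85_reset, window) else "not rejected"
    print("lua-time-limit", limit_ms, "ms: tor-agent:85 delete request", verdict)
```

Under these assumptions, a 5000 ms limit puts the tor-agent:85 request inside the error interval, while a 15000 ms limit does not, because the script finishes before the limit is reached. In the second case the request would presumably just wait for the running script to complete.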

The UVEs of the generators nodei38:Compute:contrail-tor-agent:68 and nodei38:Compute:contrail-tor-agent:107 were not deleted from redis for the same reason:

[contrail-logs o/p]
2015 May 27 20:20:46.432991 nodei36 [Analytics:contrail-collector:0:None][INVALID] : SandeshModuleServerTrace:184364 [ModuleServerState: name = nodei38:Compute:contrail-tor-agent:68, [generator_info: [hostname = nodei36, [GeneratorInfoAttr: connects = 3, connect_time = 1432660322547729, resets = 3, reset_time = 1432738237680826, in_clear = false]]]]
2015 May 27 20:20:46.507011 nodei36 [Analytics:contrail-collector:0:None][INVALID] : SandeshModuleServerTrace:184365 [ModuleServerState: name = nodei38:Compute:contrail-tor-agent:107, [generator_info: [hostname = nodei36, [GeneratorInfoAttr: connects = 3, connect_time = 1432696866885598, resets = 3, reset_time = 1432738237744824, in_clear = false]]]]
....
....
2015 May 27 20:21:07.713827 nodei36 [Analytics:contrail-collector:0:None][INVALID] : SandeshModuleServerTrace:184366 [ModuleServerState: name = nodei38:Compute:contrail-vrouter-agent:0, [generator_info: [hos...

OpenContrail Admin (ci-admin-f) wrote : R2.20

Review in progress for https://review.opencontrail.org/11098
Submitter: Sundaresan Rajangam (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote :

Review in progress for https://review.opencontrail.org/11104
Submitter: Sundaresan Rajangam (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : master

Review in progress for https://review.opencontrail.org/11107
Submitter: Sundaresan Rajangam (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.20

Review in progress for https://review.opencontrail.org/11429
Submitter: Sundaresan Rajangam (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote :

Review in progress for https://review.opencontrail.org/11430
Submitter: Sundaresan Rajangam (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote :

Review in progress for https://review.opencontrail.org/11431
Submitter: Sundaresan Rajangam (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/11451
Submitter: Sundaresan Rajangam (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/11431
Committed: http://github.org/Juniper/contrail-puppet/commit/f554d96571a76d8a12e394d3939cf559125f2f80
Submitter: Zuul
Branch: R2.20

commit f554d96571a76d8a12e394d3939cf559125f2f80
Author: Sundaresan Rajangam <email address hidden>
Date: Tue Jun 9 13:00:39 2015 -0700

Increase the lua-time-limit to 15sec from 5sec

contrail-collector executes delrequest.lua script when a Generator
gets disconnected. On a scale setup, some Generators (vrouter-agent,
tor-agent) originates more UVEs. Therefore, in some cases,
delrequest.lua takes more than 5 seconds [default lua-time-limit] to
complete. If the lua script runs for more than the configured time
limit, then redis returns error to other requests till the script is
completed.

This patch increases the lua-time-limit to 15000 milliseconds
in the redis config file.

Change-Id: Ieb7d543c22edad13936f58b04abb41632578f51e
Partial-Bug: #1459769
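
A quick way to confirm that a deployed redis config picked up this change is to read the directive back from the file. The sketch below is a hedged illustration, not part of the patch: the function name lua_time_limit_ms and the sample config text are hypothetical, and it assumes that when a directive appears more than once the last occurrence wins.

```python
# redis uses 5000 ms when lua-time-limit is not set in the config file.
REDIS_DEFAULT_LUA_TIME_LIMIT_MS = 5000


def lua_time_limit_ms(conf_text):
    """Return the effective lua-time-limit (ms) from redis.conf-style text."""
    limit = REDIS_DEFAULT_LUA_TIME_LIMIT_MS
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        if parts[0].lower() == "lua-time-limit" and len(parts) == 2:
            # Assumption: a later directive overrides an earlier one.
            limit = int(parts[1])
    return limit


# Hypothetical excerpt of a redis.conf after the patch is applied.
sample_conf = """
# illustrative settings only
port 6379
lua-time-limit 15000
"""

print("effective lua-time-limit:", lua_time_limit_ms(sample_conf), "ms")
```

In practice the text would come from the redis config file on the analytics nodes; its path depends on the deployment and is not given in this report.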

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/11430
Committed: http://github.org/Juniper/contrail-fabric-utils/commit/b981f0d33df6e70561f4729f9e61325cfa5cbef6
Submitter: Zuul
Branch: R2.20

commit b981f0d33df6e70561f4729f9e61325cfa5cbef6
Author: Sundaresan Rajangam <email address hidden>
Date: Tue Jun 9 12:33:11 2015 -0700

Increase the lua-time-limit to 15sec from 5sec

contrail-collector executes delrequest.lua script when a Generator
gets disconnected. On a scale setup, some Generators (vrouter-agent,
tor-agent) originates more UVEs. Therefore, in some cases,
delrequest.lua takes more than 5 seconds [default lua-time-limit] to
complete. If the lua script runs for more than the configured time
limit, then redis returns error to other requests till the script is
completed.

This patch increases the lua-time-limit to 15000 milliseconds
in the redis config file.

Change-Id: Idbba259b115015bc86a87b8851ca9271ac4e312c
Partial-Bug: #1459769

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/11455
Submitter: Sundaresan Rajangam (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote :

Review in progress for https://review.opencontrail.org/11458
Submitter: Sundaresan Rajangam (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/11429
Committed: http://github.org/Juniper/contrail-controller/commit/6776108e8151835807a1b566b39c4df3bc51af91
Submitter: Zuul
Branch: R2.20

commit 6776108e8151835807a1b566b39c4df3bc51af91
Author: Sundaresan Rajangam <email address hidden>
Date: Tue Jun 9 11:58:00 2015 -0700

Handle delrequest.lua and uve async request failure in redis

contrail-collector executes delrequest.lua script when a Generator
gets disconnected. On a scale setup, some Generators (vrouter-agent,
tor-agent) originate more UVEs and therefore delrequest.lua takes more
than 5 seconds [default lua-time-limit] to complete. If the lua script
runs for more than the configured time limit, then redis returns error
to other requests till the script is completed.

- Removed the chatty log from delrequest.lua as it affects the
performance. sub_del() should be called only for UVEs with
aggtype="stats" fields.
- Added assert if redis returns error for delrequest.lua and async
uveupdate and uvedelete requests.

Change-Id: I797a552ebae11adea49c3b58c9775c0014bd6731
Partial-Bug: #1459769

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.20

Review in progress for https://review.opencontrail.org/11620
Submitter: Sundaresan Rajangam (<email address hidden>)

Sundaresan Rajangam (srajanga) wrote :