Comment 7 for bug 1715061

Hari Prasad Killi (haripk) wrote :

Timeout scenario:
The operational DB in the core dump holds 6,486 VRFs and 43,022 interfaces. When config is restarted, the entire configuration is downloaded to the agent again, and the agent has to process all of it afresh (every object, its adjacency links, and so on). Other objects, such as the routes in all these VRFs, would be added during the same period. The agent would be busy during this time and may not respond to the contrail-status requests in time. We need to check how long the timeout status was seen. With the given data, I can't tell from the gcore that the agent was in any error state. Considering the scale involved, I have no way to conclude that everything is in order, but the things I checked (such as dumping some of the operational DB entries and objects) seemed fine. This scenario can be tried out during our scale tests to characterize the behavior.
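To answer the "how long was the timeout seen" question, one option is to poll the status periodically and measure each continuous run of timeouts. The following is only a minimal sketch: the sample format (a timestamp in seconds paired with a status string) and the status value "timeout" are assumptions made for illustration, not actual contrail-status output, and the function name `timeout_windows` is hypothetical.

```python
from typing import Iterable, List, Tuple


def timeout_windows(
    samples: Iterable[Tuple[float, str]], bad: str = "timeout"
) -> List[Tuple[float, float]]:
    """Return (start, end) timestamps for each run of consecutive `bad` samples.

    `samples` is an assumed format: (timestamp_seconds, status_string),
    ordered by time. It is not real contrail-status output.
    """
    windows = []
    start = None  # timestamp of the first bad sample in the current run
    last = None   # timestamp of the most recent bad sample in the current run
    for ts, status in samples:
        if status == bad:
            if start is None:
                start = ts
            last = ts
        elif start is not None:
            # A good sample closes the current run.
            windows.append((start, last))
            start = None
    if start is not None:
        # The samples ended while still in a bad run.
        windows.append((start, last))
    return windows


# Hypothetical samples taken every 10 seconds after a config restart.
samples = [
    (0, "active"),
    (10, "timeout"),
    (20, "timeout"),
    (30, "timeout"),
    (40, "active"),
    (50, "timeout"),
]
windows = timeout_windows(samples)
longest = max(end - start for start, end in windows)
```

A short window that clears once the config download finishes would be consistent with the agent simply being busy; a window that never closes would point to something else.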

Crash reason:
The config received by the agent is inconsistent, and this is what causes the crashes (for example, a link is named "physical-router-physical-interface", but the left and right nodes of that link are not of those types). Since the test was not run from a sane state (there was an ISSU failure, which was patched before the test continued), we should ignore these crashes and have the tests redone.
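The kind of mismatch described above can be expressed as a simple consistency check. This is an illustrative sketch only, not the agent's code: it assumes a link name is formed by joining the two node types with a hyphen (as in the example from the core), and the function name `link_is_consistent` and the second node-type pair are hypothetical.

```python
def link_is_consistent(link_name: str, left_type: str, right_type: str) -> bool:
    """Return True if the link name matches its two node types.

    Assumption for illustration: a link name is "<type>-<type>" built from
    the types of its two end nodes, in either order.
    """
    return link_name in (
        f"{left_type}-{right_type}",
        f"{right_type}-{left_type}",
    )


# Consistent: the link name agrees with its end nodes.
ok = link_is_consistent(
    "physical-router-physical-interface",
    "physical-router",
    "physical-interface",
)

# Inconsistent: same link name, but the end nodes are of other types
# (these node types are picked arbitrarily for the example).
bad = link_is_consistent(
    "physical-router-physical-interface",
    "virtual-network",
    "routing-instance",
)
```

Running a check of this kind over the links received after the ISSU patch would show how widespread the inconsistency is before the tests are redone.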