Comment 13 for bug 1050512

Gary Kotton (garyk) wrote :

Hi,
I have spent quite a lot of time trying to reproduce this. I have the following findings:
1. The L2 agent polls OVS every 2 seconds to learn whether the attached devices have changed. If they have, it queries the quantum plugin for the details. This is where we have a number of interesting things:
    i. If the quantum service has not started (which could be the case here), the requests will hang for the default timeout (rpc_response_timeout=60, in seconds). This still does not explain why the tag was removed. I think that this value should be changed to at most 5 seconds.
    ii. If the agent gets a response from the plugin that is inconsistent with its configuration, it will set the tag to 4095. (I do not think that this was the case here.)
2. When the appliance is rebooted the OVS entries are persistent, which means that the created devices and their tags remain unchanged after reboot. The L2 agent can either delete the entry or set the tag to 4095.
3. I am not sure if the processes were started manually or via systemd. If it was systemd, then I think that the packages need to declare dependencies that enforce the startup order:
    i. rpc message service (if running on host)
    ii. database (mysql if running on host)
    iii. quantum service (if running on host)
    iv. quantum agents (if running on host)
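To make point 1 easier to follow, here is a rough sketch of the behaviour as I understand it. This is not the agent code: the function names, the device names and the shape of the plugin response are all made up for illustration. Only the 2 second interval and the 4095 tag come from the findings above.

```python
# Illustrative sketch only; every name here is hypothetical, not the real agent API.

POLL_INTERVAL = 2      # seconds between OVS polls, as described in point 1
DEAD_VLAN_TAG = 4095   # tag applied when the plugin response cannot be used


def scan_ports(previous, current):
    """Compare two polls and return (added, removed) device sets."""
    return current - previous, previous - current


def assign_tags(added, get_device_details, local_vlans):
    """Ask the plugin about each new device and pick a VLAN tag for it.

    get_device_details stands in for the RPC call to the plugin.
    local_vlans maps a network id to the VLAN tag the agent has configured.
    A missing or inconsistent answer results in DEAD_VLAN_TAG.
    """
    tags = {}
    for device in sorted(added):
        details = get_device_details(device)
        network_id = details.get("network_id") if details else None
        tags[device] = local_vlans.get(network_id, DEAD_VLAN_TAG)
    return tags


def fake_plugin(device):
    """Stand-in for the plugin: it only knows about tap2."""
    return {"network_id": "net-a"} if device == "tap2" else None


added, removed = scan_ports({"tap1"}, {"tap1", "tap2", "tap3"})
tags = assign_tags(added, fake_plugin, {"net-a": 1})
print(tags)
```

In this sketch tap2 gets the tag of its network, while tap3, which the stand-in plugin does not know about, falls back to 4095. In the real agent the same comparison would run once per POLL_INTERVAL.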
I am going to take the liberty of moving this to Incomplete. Hopefully someone has a scenario that reproduces the problem so that we can fix it.
I think that we should also address the default timeout for the RPC call.
Thanks
Gary