Comment 36 for bug 1253896

Salvatore Orlando (salvatore-orlando) wrote :

Nine hours after the change merged, we still have a few occurrences of this bug, which I think will stay with us until the end of days.
Perhaps the problem is that I am the root cause of the bug, I don't know...

Seriously: 10 occurrences in the 9 hours since the patch merged.
However:
* 5 failures are in the same job, an experimental one, which started before the patch merged. The job is probably also parallelized, which adds new failure modes for this bug that we're addressing with other patches.
* 4 failures are in non-neutron jobs; I have looked at the logs, but have no clue about the root cause.
* 1 failure is "genuine". The error is the usual one, the VM not getting DHCP, but the fault this time is different. [1] suggests the port is added incorrectly, or at least too early, because the next command fails to find the VIF on the integration bridge. Note that the VIF had instead been found a few seconds earlier, as detected by the ovsdb monitor. It's hard to understand what happened here. It might have been a KVM issue, an OVS issue, or for some reason nova unplugged the VIF (unlikely). But for this failure we should rule out buggy behaviour from neutron.

So things are not bad so far. I will keep looking into each failure, and will put forward a proposal to start filing separate bugs with logstash queries that look for faults and errors rather than failures. This way it would be easier to understand how much progress we are making, and to have multiple people looking at this kind of issue.
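
To make the "faults rather than failures" idea concrete, here is a rough sketch of bucketing failed runs by the fault seen in the logs instead of by the failing test. Everything in it is an assumption for illustration: the signature names, the matched substrings, and the sample log lines are hypothetical and are not real neutron log messages or logstash queries.

```python
from collections import Counter

# Hypothetical fault signatures: a label mapped to a substring that would
# identify that fault in an agent log. These strings are illustrative only.
FAULT_SIGNATURES = {
    "vif-missing-on-bridge": "not found on integration bridge",
    "dhcp-lease-missing": "no DHCP lease",
}


def classify(log_text):
    """Return the label of the first signature found in log_text."""
    for label, needle in FAULT_SIGNATURES.items():
        if needle in log_text:
            return label
    return "unclassified"


def tally(logs):
    """Count failed runs per fault label rather than per failing test."""
    return Counter(classify(text) for text in logs)


if __name__ == "__main__":
    # Made-up log excerpts, one per failed run.
    sample_logs = [
        "port abc not found on integration bridge",
        "instance got no DHCP lease",
        "something else entirely",
    ]
    for label, count in tally(sample_logs).items():
        print(label, count)
```

Each label would then map to its own bug and its own query, so a drop in one bucket shows progress on that specific fault even while the overall failure count stays noisy.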

[1] http://logs.openstack.org/21/60721/2/gate/gate-tempest-dsvm-neutron/353705a/logs/screen-q-agt.txt.gz#_2013-12-24_05_51_45_093