Comment 40 for bug 1253896

Salvatore Orlando (salvatore-orlando) wrote :

I've run queries [1] and [2], each over a 24-hour period. Query 1 covers Dec 23 and query 2 covers Jan 2 (both in the GMT time zone).
Experimental builds are excluded.

Query 1 showed 145 failures (elastic-recheck reports 154 because it also counts 9 experimental jobs; 145/154 is a ratio of 94%), of which:
126 in non-isolated jobs
8 in isolated jobs
11 in 'full jobs', which are non-voting and run parallel tests. With the current code we expect this kind of failure in parallel tests.

1471 non-isolated jobs were run in this interval, with a failure rate of 8.56%, which is rather bad.
290 isolated jobs were run in this interval, with a failure rate of 2.75%, which is barely OK.

Query 2 showed 29 failures (elastic-recheck reports 38 because it also counts 9 experimental jobs; 29/38 is a ratio of 76%), of which:
0 in non-isolated jobs
4 in non-neutron jobs
5 in 'full jobs', which are non-voting and run parallel tests. With the current code we expect this kind of failure in parallel tests.
14 in grenade jobs. It is unclear at the moment whether grenade is running parallel tests for neutron, but it seems this job is non-voting as well.
6 in isolated jobs. These will lead to a Jenkins failure. The last patch we merged (on Christmas Eve?) was only for non-isolated jobs.

98 isolated jobs were run in this interval, with a failure rate of 6.12%, which is unfortunately still far from acceptable and suggests there has probably been a regression.
632 non-isolated jobs were run in this interval, with a failure rate of 0%, which is much more acceptable and shows that the last devstack patch merged cleared the timeout messages for non-isolated jobs.
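
For anyone who wants to re-check the arithmetic, here is a minimal sketch that recomputes the failure rates from the counts quoted above. The helper names (failure_rate, truncate) and the dictionary layout are only illustrative and are not part of elastic-recheck or any other tool. The percentages in this comment appear to be truncated to two decimals rather than rounded, so the sketch truncates as well.

```python
import math


def failure_rate(failures, total):
    """Failure rate as a percentage of the jobs that were run."""
    return 100.0 * failures / total


def truncate(value, digits=2):
    """Cut a value to `digits` decimals without rounding up."""
    factor = 10 ** digits
    return math.floor(value * factor) / factor


# (failures, jobs run) per interval and job type, copied from the comment.
runs = {
    ("dec-23", "non-isolated"): (126, 1471),
    ("dec-23", "isolated"): (8, 290),
    ("jan-2", "non-isolated"): (0, 632),
    ("jan-2", "isolated"): (6, 98),
}

rates = {key: truncate(failure_rate(f, n)) for key, (f, n) in runs.items()}

# The per-category breakdowns should add up to the totals for each query.
query1_total = 126 + 8 + 11        # non-isolated + isolated + full
query2_total = 0 + 4 + 5 + 14 + 6  # non-isolated + non-neutron + full + grenade + isolated

for (day, kind), rate in rates.items():
    print(f"{day} {kind}: {rate:.2f}%")
```

With ordinary rounding the two Dec 23 figures would each come out 0.01 higher, which does not change any of the conclusions.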

ACTION ITEMS:
1 - investigate the grenade failures. Are those jobs running parallel tests? If not, the root cause should be tracked down and fixed (it might be either flakiness in the test or an underlying problem in neutron).
2 - investigate the rise in failures in isolated jobs. The failure rate increased from 2.75% to 6.12% between the two intervals considered.

[1] http://logstash.openstack.org/#eyJmaWVsZHMiOlsiYnVpbGRfcXVldWUiXSwic2VhcmNoIjoibWVzc2FnZTpcIlNTSFRpbWVvdXQ6IENvbm5lY3Rpb24gdG8gdGhlXCIgQU5EIG1lc3NhZ2U6XCJ2aWEgU1NIIHRpbWVkIG91dC5cIiBBTkQgZmlsZW5hbWU6XCJjb25zb2xlLmh0bWxcIiBBTkQgTk9UIGJ1aWxkX3F1ZXVlOlwiZXhwZXJpbWVudGFsXCIiLCJ0aW1lZnJhbWUiOiJjdXN0b20iLCJncmFwaG1vZGUiOiJjb3VudCIsIm9mZnNldCI6MCwidGltZSI6eyJmcm9tIjoiMjAxMy0xMi0yM1QwMDowMDowMCswMDowMCIsInRvIjoiMjAxMy0xMi0yNFQwMDowMDowMCswMDowMCIsInVzZXJfaW50ZXJ2YWwiOiIwIn0sIm1vZGUiOiIiLCJhbmFseXplX2ZpZWxkIjoiIiwic3RhbXAiOjEzODg3MzU5OTUyNTl9
[2] http://logstash.openstack.org/#eyJmaWVsZHMiOlsiYnVpbGRfcXVldWUiXSwic2VhcmNoIjoibWVzc2FnZTpcIlNTSFRpbWVvdXQ6IENvbm5lY3Rpb24gdG8gdGhlXCIgQU5EIG1lc3NhZ2U6XCJ2aWEgU1NIIHRpbWVkIG91dC5cIiBBTkQgZmlsZW5hbWU6XCJjb25zb2xlLmh0bWxcIiBBTkQgTk9UIGJ1aWxkX3F1ZXVlOlwiZXhwZXJpbWVudGFsXCIiLCJ0aW1lZnJhbWUiOiJjdXN0b20iLCJncmFwaG1vZGUiOiJjb3VudCIsIm9mZnNldCI6MCwidGltZSI6eyJmcm9tIjoiMjAxNC0wMS0wMlQwMDowMDowMCswMDowMCIsInRvIjoiMjAxNC0wMS0wM1QwMDowMDowMCswMDowMCIsInVzZXJfaW50ZXJ2YWwiOiIwIn0sIm1vZGUiOiIiLCJhbmFseXplX2ZpZWxkIjoiIiwic3RhbXAiOjEzODg3MzYxMzg4Mzl9
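
The part after '#' in the two links looks like a base64-encoded JSON document holding the search string and the time window. Below is a minimal sketch for reading such a fragment without opening the web UI. It assumes standard base64 and UTF-8 JSON. The helper names are hypothetical, and the sample query is an illustrative stand-in modelled on what the fragments appear to decode to, not a copy of either link.

```python
import base64
import json


def encode_query(query):
    """Serialize a query dict to compact JSON and base64-encode it."""
    raw = json.dumps(query, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_fragment(url):
    """Return the JSON object stored after '#' in a logstash-style URL."""
    fragment = url.split("#", 1)[1]
    # Restore any '=' padding that was dropped from the fragment.
    fragment += "=" * (-len(fragment) % 4)
    return json.loads(base64.b64decode(fragment).decode("utf-8"))


# Illustrative query only; field names mirror what the links seem to contain.
sample_query = {
    "search": 'filename:"console.html" AND NOT build_queue:"experimental"',
    "timeframe": "custom",
    "time": {
        "from": "2013-12-23T00:00:00+00:00",
        "to": "2013-12-24T00:00:00+00:00",
    },
}

sample_url = "http://logstash.openstack.org/#" + encode_query(sample_query)
decoded = decode_fragment(sample_url)
print(decoded["search"])
print(decoded["time"]["from"], "->", decoded["time"]["to"])
```

Passing link [1] or [2] to decode_fragment should show the exact search string and the from/to timestamps used for each 24-hour interval.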