I've run the queries [1] and [2] over a 24-hour period each. Query 1 is for Dec 23 (GMT), Query 2 is for Jan 2 (GMT).
Experimental builds are excluded.
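As a side note, the hash fragment in those logstash URLs is just base64-encoded JSON, so the exact search is easy to recover. A minimal sketch in Python, with the fragment copied from [1] ([2] decodes to the same search; only the time window and stamp differ):

    import base64, json

    # hash fragment from [1], split for readability
    frag = ("eyJmaWVsZHMiOlsiYnVpbGRfcXVldWUiXSwic2VhcmNoIjoibWVzc2FnZTpcIlNTSFRpbW"
            "VvdXQ6IENvbm5lY3Rpb24gdG8gdGhlXCIgQU5EIG1lc3NhZ2U6XCJ2aWEgU1NIIHRpbWVkIG91d"
            "C5cIiBBTkQgZmlsZW5hbWU6XCJjb25zb2xlLmh0bWxcIiBBTkQgTk9UIGJ1aWxkX3F1ZXVlOlwi"
            "ZXhwZXJpbWVudGFsXCIiLCJ0aW1lZnJhbWUiOiJjdXN0b20iLCJncmFwaG1vZGUiOiJjb3VudCI"
            "sIm9mZnNldCI6MCwidGltZSI6eyJmcm9tIjoiMjAxMy0xMi0yM1QwMDowMDowMCswMDowMCIsIn"
            "RvIjoiMjAxMy0xMi0yNFQwMDowMDowMCswMDowMCIsInVzZXJfaW50ZXJ2YWwiOiIwIn0sIm1vZ"
            "GUiOiIiLCJhbmFseXplX2ZpZWxkIjoiIiwic3RhbXAiOjEzODg3MzU5OTUyNTl9")
    params = json.loads(base64.b64decode(frag + "=" * (-len(frag) % 4)))
    print(params["search"])
    # message:"SSHTimeout: Connection to the" AND message:"via SSH timed out."
    #   AND filename:"console.html" AND NOT build_queue:"experimental"
    print(params["time"]["from"], "->", params["time"]["to"])
    # 2013-12-23T00:00:00+00:00 -> 2013-12-24T00:00:00+00:00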
Query 1 showed 145 failures (elastic-recheck says 154 because of 9 experimental jobs - a 94% ratio), of which:
126 in non-isolated jobs
8 in isolated jobs
11 in 'full' jobs, which are non-voting and run parallel tests. With the current code we expect this kind of failure in parallel tests.
1471 non-isolated jobs were run in this interval, with a failure rate of 8.56%, which is rather bad.
290 isolated jobs were run in this interval, with a failure rate of 2.75%, which is barely acceptable (both rates are derived in the sketch below).
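For clarity, the rates above are just failures divided by job runs; a quick sketch with the Query 1 counts (the figures above appear to be truncated rather than rounded):

    # (failures, runs) for the Dec 23 interval, counts from above
    for name, failures, runs in [("non-isolated", 126, 1471),
                                 ("isolated", 8, 290)]:
        print("%s: %.2f%%" % (name, 100.0 * failures / runs))
    # non-isolated: 8.57%  (quoted above as 8.56%)
    # isolated:     2.76%  (quoted above as 2.75%)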
Query 2 showed 29 failures (elastic-recheck says 38 because of 9 experimental jobs - a 76% ratio), of which:
0 in non-isolated jobs
4 in non-neutron jobs
5 in 'full' jobs, which are non-voting and run parallel tests. With the current code we expect this kind of failure in parallel tests.
14 in grenade jobs. It is unclear at the moment whether grenade runs parallel tests for neutron, but this job seems to be non-voting as well.
6 in isolated jobs. These will lead to a Jenkins failure. The last patch we merged (on Christmas Eve?) was only for non-isolated jobs.
98 isolated jobs were run in this interval, with a failure rate of 6.12%, which is unfortunately still far from acceptable and suggests there has probably been a regression.
632 non-isolated jobs were run in this interval, with a failure rate of 0%, which is much more acceptable and shows that the last devstack patch we merged cleared the timeout messages for non-isolated jobs (the two intervals are compared in the sketch below).
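Putting the two intervals side by side makes the isolated-job regression explicit; a small sketch using the counts above:

    def rate(failures, runs):
        return 100.0 * failures / runs

    dec23 = rate(8, 290)   # isolated jobs, Dec 23 interval
    jan02 = rate(6, 98)    # isolated jobs, Jan 2 interval
    print("isolated: %.2f%% -> %.2f%% (%.1fx)" % (dec23, jan02, jan02 / dec23))
    # isolated: 2.76% -> 6.12% (2.2x)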
ACTION ITEMS:
1 - Investigate the grenade failures. Are they running parallel tests? If not, the root cause should be tracked down and fixed (it might be either flakiness in the tests or an underlying problem in neutron).
2 - Investigate the rise in failures in isolated jobs. The failure rate increased from 2.75% to 6.12% between the two intervals considered (a sketch for tracking this follows below).
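For item 2, one way to keep tracking the isolated-job failure count without going through the logstash UI would be to run the same search against Elasticsearch's _count API. A rough sketch only, assuming direct access to the cluster at ES_URL and the usual logstash-YYYY.MM.DD daily indices (both assumptions; the public logstash frontend may not expose this):

    import requests

    ES_URL = "http://elasticsearch.example.org:9200"  # hypothetical endpoint
    INDEX = "logstash-2014.01.02"                     # assumed daily-index naming
    QUERY = ('message:"SSHTimeout: Connection to the" '
             'AND message:"via SSH timed out." '
             'AND filename:"console.html" '
             'AND NOT build_queue:"experimental"')

    # Counts matching log events (not builds); add a build_name term to the
    # query to split isolated jobs from non-isolated ones.
    resp = requests.post("%s/%s/_count" % (ES_URL, INDEX),
                         json={"query": {"query_string": {"query": QUERY}}})
    resp.raise_for_status()
    print(resp.json()["count"])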
[1] http://logstash.openstack.org/#eyJmaWVsZHMiOlsiYnVpbGRfcXVldWUiXSwic2VhcmNoIjoibWVzc2FnZTpcIlNTSFRpbWVvdXQ6IENvbm5lY3Rpb24gdG8gdGhlXCIgQU5EIG1lc3NhZ2U6XCJ2aWEgU1NIIHRpbWVkIG91dC5cIiBBTkQgZmlsZW5hbWU6XCJjb25zb2xlLmh0bWxcIiBBTkQgTk9UIGJ1aWxkX3F1ZXVlOlwiZXhwZXJpbWVudGFsXCIiLCJ0aW1lZnJhbWUiOiJjdXN0b20iLCJncmFwaG1vZGUiOiJjb3VudCIsIm9mZnNldCI6MCwidGltZSI6eyJmcm9tIjoiMjAxMy0xMi0yM1QwMDowMDowMCswMDowMCIsInRvIjoiMjAxMy0xMi0yNFQwMDowMDowMCswMDowMCIsInVzZXJfaW50ZXJ2YWwiOiIwIn0sIm1vZGUiOiIiLCJhbmFseXplX2ZpZWxkIjoiIiwic3RhbXAiOjEzODg3MzU5OTUyNTl9
[2] http://logstash.openstack.org/#eyJmaWVsZHMiOlsiYnVpbGRfcXVldWUiXSwic2VhcmNoIjoibWVzc2FnZTpcIlNTSFRpbWVvdXQ6IENvbm5lY3Rpb24gdG8gdGhlXCIgQU5EIG1lc3NhZ2U6XCJ2aWEgU1NIIHRpbWVkIG91dC5cIiBBTkQgZmlsZW5hbWU6XCJjb25zb2xlLmh0bWxcIiBBTkQgTk9UIGJ1aWxkX3F1ZXVlOlwiZXhwZXJpbWVudGFsXCIiLCJ0aW1lZnJhbWUiOiJjdXN0b20iLCJncmFwaG1vZGUiOiJjb3VudCIsIm9mZnNldCI6MCwidGltZSI6eyJmcm9tIjoiMjAxNC0wMS0wMlQwMDowMDowMCswMDowMCIsInRvIjoiMjAxNC0wMS0wM1QwMDowMDowMCswMDowMCIsInVzZXJfaW50ZXJ2YWwiOiIwIn0sIm1vZGUiOiIiLCJhbmFseXplX2ZpZWxkIjoiIiwic3RhbXAiOjEzODg3MzYxMzg4Mzl9