2018-10-24 14:22:45 |
Bogdan Dobrelya |
bug |
|
|
added bug |
2018-10-24 14:22:53 |
Bogdan Dobrelya |
tripleo: importance |
Undecided |
High |
|
2018-10-24 14:22:56 |
Bogdan Dobrelya |
tripleo: milestone |
|
stein-1 |
|
2018-10-24 14:22:59 |
Bogdan Dobrelya |
tripleo: status |
New |
Triaged |
|
2018-10-24 14:23:09 |
Bogdan Dobrelya |
tags |
|
ci |
|
2018-10-24 14:28:58 |
Bogdan Dobrelya |
description |
The error message is most likely a red herring pointing to some other sort of issue, such as a system under pressure.
Examples:
Single error: http://logs.openstack.org/01/582301/28/check/tripleo-ci-centos-7-scenario003-multinode-oooq-container/7009de4/logs/undercloud/var/log/journal.txt.gz#_Oct_23_19_48_00
Multiple errors: http://logs.openstack.org/01/582301/28/check/tripleo-ci-centos-7-scenario003-multinode-oooq-container/7009de4/logs/undercloud/var/log/journal.txt.gz#_Oct_23_19_15_16
dstat: http://logs.openstack.org/01/582301/28/check/tripleo-ci-centos-7-scenario003-multinode-oooq-container/7009de4/logs/undercloud/var/log/extra/dstat.html.gz
dstat shows a correlation with high CPU wait numbers (>70%)
See also the elastic-recheck stats for that error pattern:
total hits: 7
build_branch
85% master
14% stable/rocky
build_change
14% 591540 610728 582301
14% 611447 608354
14% 582735
14% 610087
14% 610491
build_name
14% tripleo-ci-centos-7-containers-multinode tripleo-ci-centos-7-scenario001-multinode-oooq-container
14% tripleo-ci-centos-7-undercloud-containers tripleo-ci-centos-7-scenario000-multinode-oooq-container-updates tripleo-ci-centos-7-scenario003-multinode-oooq-container
14% tripleo-ci-centos-7-containers-multinode
14% tripleo-ci-centos-7-scenario000-multinode-oooq-container-upgrades
14% tripleo-ci-centos-7-scenario004-multinode-oooq-container
build_node
100% centos-7
build_queue
85% check
14% gate
build_status
71% FAILURE
14% FAILURE SUCCESS
14% SUCCESS FAILURE
build_zuul_url
100% N/A
filename
71% logs/undercloud/var/log/extra/logstash.txt
28% logs/undercloud/var/log/extra/errors.txt
log_url
14% http://logs.openstack.org/40/591540/69/check/tripleo-ci-centos-7-undercloud-containers/7c3676b/logs/undercloud/var/log/extra/logstash.txt http://logs.openstack.org/28/610728/5/check/tripleo-ci-centos-7-scenario000-multinode-oooq-container-updates/a299956/logs/undercloud/var/log/extra/logstash.txt http://logs.openstack.org/01/582301/28/check/tripleo-ci-centos-7-scenario003-multinode-oooq-container/7009de4/logs/undercloud/var/log/extra/logstash.txt
14% http://logs.openstack.org/47/611447/2/check/tripleo-ci-centos-7-containers-multinode/3ede27e/logs/undercloud/var/log/extra/logstash.txt http://logs.openstack.org/54/608354/3/check/tripleo-ci-centos-7-scenario001-multinode-oooq-container/9690552/logs/undercloud/var/log/extra/logstash.txt
14% http://logs.openstack.org/15/610515/1/check/tripleo-ci-centos-7-scenario004-multinode-oooq-container/02cd3e4/logs/undercloud/var/log/extra/logstash.txt
14% http://logs.openstack.org/35/582735/10/check/tripleo-ci-centos-7-containers-multinode/72c2e19/logs/undercloud/var/log/extra/logstash.txt
14% http://logs.openstack.org/87/610087/3/check/tripleo-ci-centos-7-scenario007-multinode-oooq-container/6002e49/logs/undercloud/var/log/extra/logstash.txt
node_provider
57% inap-mtl01
14% inap-mtl01 rax-dfw
14% ovh-gra1 rax-iad inap-mtl01
14% rax-iad
port
14% 35486
14% 38788
14% 42428
14% 42552
14% 45124
project
28% openstack/tripleo-heat-templates
28% openstack/tripleo-quickstart-extras
14% openstack/tripleo-heat-templates openstack/congress
14% openstack/tripleo-quickstart openstack/tripleo-quickstart-extras openstack/tripleo-heat-templates
14% openstack/tripleo-common
severity
71% INFO
28% ERROR
tags
71% logstash.txt console postci multiline _grokparsefailure
28% errors.txt console errors multiline _grokparsefailure
voting
57% 1
28% 0
14% 1 0
zuul_executor
28% ze09.openstack.org
14% ze07.openstack.org ze02.openstack.org ze01.openstack.org
14% ze10.openstack.org ze05.openstack.org
14% ze03.openstack.org
14% ze07.openstack.org
So jobs do not always fail with that error. I think it is CPU-pressure related instead. |
The error message is most likely a red herring pointing to some other sort of issue, such as a system under pressure.
Examples:
Single error: http://logs.openstack.org/01/582301/28/check/tripleo-ci-centos-7-scenario003-multinode-oooq-container/7009de4/logs/undercloud/var/log/journal.txt.gz#_Oct_23_19_48_00
Multiple errors: http://logs.openstack.org/01/582301/28/check/tripleo-ci-centos-7-scenario003-multinode-oooq-container/7009de4/logs/undercloud/var/log/journal.txt.gz#_Oct_23_19_15_16
dstat: http://logs.openstack.org/01/582301/28/check/tripleo-ci-centos-7-scenario003-multinode-oooq-container/7009de4/logs/undercloud/var/log/extra/dstat.html.gz
dstat shows a correlation with high CPU wait numbers (>70%) and increased memory use (see 19h 48m 44s and onward)
See also the elastic-recheck stats for that error pattern:
total hits: 7
build_branch
85% master
14% stable/rocky
build_change
14% 591540 610728 582301
14% 611447 608354
14% 582735
14% 610087
14% 610491
build_name
14% tripleo-ci-centos-7-containers-multinode tripleo-ci-centos-7-scenario001-multinode-oooq-container
14% tripleo-ci-centos-7-undercloud-containers tripleo-ci-centos-7-scenario000-multinode-oooq-container-updates tripleo-ci-centos-7-scenario003-multinode-oooq-container
14% tripleo-ci-centos-7-containers-multinode
14% tripleo-ci-centos-7-scenario000-multinode-oooq-container-upgrades
14% tripleo-ci-centos-7-scenario004-multinode-oooq-container
build_node
100% centos-7
build_queue
85% check
14% gate
build_status
71% FAILURE
14% FAILURE SUCCESS
14% SUCCESS FAILURE
build_zuul_url
100% N/A
filename
71% logs/undercloud/var/log/extra/logstash.txt
28% logs/undercloud/var/log/extra/errors.txt
log_url
14% http://logs.openstack.org/40/591540/69/check/tripleo-ci-centos-7-undercloud-containers/7c3676b/logs/undercloud/var/log/extra/logstash.txt http://logs.openstack.org/28/610728/5/check/tripleo-ci-centos-7-scenario000-multinode-oooq-container-updates/a299956/logs/undercloud/var/log/extra/logstash.txt http://logs.openstack.org/01/582301/28/check/tripleo-ci-centos-7-scenario003-multinode-oooq-container/7009de4/logs/undercloud/var/log/extra/logstash.txt
14% http://logs.openstack.org/47/611447/2/check/tripleo-ci-centos-7-containers-multinode/3ede27e/logs/undercloud/var/log/extra/logstash.txt http://logs.openstack.org/54/608354/3/check/tripleo-ci-centos-7-scenario001-multinode-oooq-container/9690552/logs/undercloud/var/log/extra/logstash.txt
14% http://logs.openstack.org/15/610515/1/check/tripleo-ci-centos-7-scenario004-multinode-oooq-container/02cd3e4/logs/undercloud/var/log/extra/logstash.txt
14% http://logs.openstack.org/35/582735/10/check/tripleo-ci-centos-7-containers-multinode/72c2e19/logs/undercloud/var/log/extra/logstash.txt
14% http://logs.openstack.org/87/610087/3/check/tripleo-ci-centos-7-scenario007-multinode-oooq-container/6002e49/logs/undercloud/var/log/extra/logstash.txt
node_provider
57% inap-mtl01
14% inap-mtl01 rax-dfw
14% ovh-gra1 rax-iad inap-mtl01
14% rax-iad
port
14% 35486
14% 38788
14% 42428
14% 42552
14% 45124
project
28% openstack/tripleo-heat-templates
28% openstack/tripleo-quickstart-extras
14% openstack/tripleo-heat-templates openstack/congress
14% openstack/tripleo-quickstart openstack/tripleo-quickstart-extras openstack/tripleo-heat-templates
14% openstack/tripleo-common
severity
71% INFO
28% ERROR
tags
71% logstash.txt console postci multiline _grokparsefailure
28% errors.txt console errors multiline _grokparsefailure
voting
57% 1
28% 0
14% 1 0
zuul_executor
28% ze09.openstack.org
14% ze07.openstack.org ze02.openstack.org ze01.openstack.org
14% ze10.openstack.org ze05.openstack.org
14% ze03.openstack.org
14% ze07.openstack.org
So jobs do not always fail with that error. I think it is CPU-pressure related instead. |
|
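The dstat correlation described in the report above can be checked mechanically. A minimal sketch, assuming the dstat samples have already been parsed into dicts with a "wai" (CPU wait %) column; the actual column names in a dstat CSV export vary by dstat version, so this is illustrative only:

```python
# Sketch: flag CPU-pressure intervals in dstat-style samples.
# Assumes each sample is a dict carrying a "wai" (CPU wait %) value,
# e.g. from parsing `dstat --output` CSV. Column names are a hypothetical
# layout for illustration, not a guaranteed dstat schema.

def pressure_intervals(samples, wait_threshold=70.0):
    """Return indexes of samples whose CPU wait exceeds the threshold."""
    return [i for i, s in enumerate(samples) if float(s["wai"]) > wait_threshold]

samples = [
    {"time": "19:48:00", "wai": "12.0"},
    {"time": "19:48:01", "wai": "81.5"},  # high IO wait, as in the linked dstat graph
    {"time": "19:48:02", "wai": "74.2"},
]
print(pressure_intervals(samples))  # [1, 2]
```

Correlating such flagged intervals against the journal timestamps of the error occurrences would confirm or refute the pressure hypothesis in the report.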
2018-10-24 14:34:15 |
Bogdan Dobrelya |
description |
The error message is most likely a red herring pointing to some other sort of issue, such as a system under pressure.
Examples:
Single error: http://logs.openstack.org/01/582301/28/check/tripleo-ci-centos-7-scenario003-multinode-oooq-container/7009de4/logs/undercloud/var/log/journal.txt.gz#_Oct_23_19_48_00
Multiple errors: http://logs.openstack.org/01/582301/28/check/tripleo-ci-centos-7-scenario003-multinode-oooq-container/7009de4/logs/undercloud/var/log/journal.txt.gz#_Oct_23_19_15_16
dstat: http://logs.openstack.org/01/582301/28/check/tripleo-ci-centos-7-scenario003-multinode-oooq-container/7009de4/logs/undercloud/var/log/extra/dstat.html.gz
dstat shows a correlation with high CPU wait numbers (>70%) and increased memory use (see 19h 48m 44s and onward)
See also the elastic-recheck stats for that error pattern:
total hits: 7
build_branch
85% master
14% stable/rocky
build_change
14% 591540 610728 582301
14% 611447 608354
14% 582735
14% 610087
14% 610491
build_name
14% tripleo-ci-centos-7-containers-multinode tripleo-ci-centos-7-scenario001-multinode-oooq-container
14% tripleo-ci-centos-7-undercloud-containers tripleo-ci-centos-7-scenario000-multinode-oooq-container-updates tripleo-ci-centos-7-scenario003-multinode-oooq-container
14% tripleo-ci-centos-7-containers-multinode
14% tripleo-ci-centos-7-scenario000-multinode-oooq-container-upgrades
14% tripleo-ci-centos-7-scenario004-multinode-oooq-container
build_node
100% centos-7
build_queue
85% check
14% gate
build_status
71% FAILURE
14% FAILURE SUCCESS
14% SUCCESS FAILURE
build_zuul_url
100% N/A
filename
71% logs/undercloud/var/log/extra/logstash.txt
28% logs/undercloud/var/log/extra/errors.txt
log_url
14% http://logs.openstack.org/40/591540/69/check/tripleo-ci-centos-7-undercloud-containers/7c3676b/logs/undercloud/var/log/extra/logstash.txt http://logs.openstack.org/28/610728/5/check/tripleo-ci-centos-7-scenario000-multinode-oooq-container-updates/a299956/logs/undercloud/var/log/extra/logstash.txt http://logs.openstack.org/01/582301/28/check/tripleo-ci-centos-7-scenario003-multinode-oooq-container/7009de4/logs/undercloud/var/log/extra/logstash.txt
14% http://logs.openstack.org/47/611447/2/check/tripleo-ci-centos-7-containers-multinode/3ede27e/logs/undercloud/var/log/extra/logstash.txt http://logs.openstack.org/54/608354/3/check/tripleo-ci-centos-7-scenario001-multinode-oooq-container/9690552/logs/undercloud/var/log/extra/logstash.txt
14% http://logs.openstack.org/15/610515/1/check/tripleo-ci-centos-7-scenario004-multinode-oooq-container/02cd3e4/logs/undercloud/var/log/extra/logstash.txt
14% http://logs.openstack.org/35/582735/10/check/tripleo-ci-centos-7-containers-multinode/72c2e19/logs/undercloud/var/log/extra/logstash.txt
14% http://logs.openstack.org/87/610087/3/check/tripleo-ci-centos-7-scenario007-multinode-oooq-container/6002e49/logs/undercloud/var/log/extra/logstash.txt
node_provider
57% inap-mtl01
14% inap-mtl01 rax-dfw
14% ovh-gra1 rax-iad inap-mtl01
14% rax-iad
port
14% 35486
14% 38788
14% 42428
14% 42552
14% 45124
project
28% openstack/tripleo-heat-templates
28% openstack/tripleo-quickstart-extras
14% openstack/tripleo-heat-templates openstack/congress
14% openstack/tripleo-quickstart openstack/tripleo-quickstart-extras openstack/tripleo-heat-templates
14% openstack/tripleo-common
severity
71% INFO
28% ERROR
tags
71% logstash.txt console postci multiline _grokparsefailure
28% errors.txt console errors multiline _grokparsefailure
voting
57% 1
28% 0
14% 1 0
zuul_executor
28% ze09.openstack.org
14% ze07.openstack.org ze02.openstack.org ze01.openstack.org
14% ze10.openstack.org ze05.openstack.org
14% ze03.openstack.org
14% ze07.openstack.org
So jobs do not always fail with that error. It is likely related to CPU wait (IO) and memory pressure instead. |
|
2018-10-30 16:17:02 |
Juan Antonio Osorio Robles |
tripleo: milestone |
stein-1 |
stein-2 |
|
2019-01-13 22:51:14 |
Emilien Macchi |
tripleo: milestone |
stein-2 |
stein-3 |
|
2019-02-12 14:02:15 |
Bogdan Dobrelya |
tripleo: status |
Triaged |
Invalid |
|