Comment 9 for bug 1866202

Revision history for this message
yatin (yatinkarel) wrote :

Some updates, it turned out to be garbage ifcfg-ens3 network-scripts on overcloud nodes, to confirm this for testing we cleaned it up before running NetworkConfig https://review.opendev.org/#/c/711755/ and the deployment moved forward and we seen a green run for CentOS8. Test results can be seen in https://review.rdoproject.org/r/#/c/25332/.

Green runs:-
https://logserver.rdoproject.org/32/25332/53/check/periodic-tripleo-ci-centos-8-ovb-1ctlr_1comp-featureset001-master/820ce3c/
https://logserver.rdoproject.org/32/25332/53/check/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-master/d02fa66/

Fix for the ens3 issue:-
ens3 can be cleaned up in the base image used to prepare overcloud-full or in some disk image element.

Though we seen two green runs for c8 ovb, there were couple of random issues noticed during multiple runs:-

1) Mar 07 17:13:27 overcloud-novacompute-0 network[10026]: Bringing up interface eth4: Error: Connection activation failed: IP configuration could not be reserved (no available address, timeout, etc.)
Mar 07 17:13:27 overcloud-novacompute-0 network[10026]: Hint: use 'journalctl -xe NM_CONNECTION=84d43311-57c8-8986-f205-9c78cd6ef5d2 + NM_DEVICE=eth4' to get more details.
Mar 07 17:13:27 overcloud-novacompute-0 network[10026]: [FAILED]
Logs:- https://logserver.rdoproject.org/32/25332/52/check/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-master/7d36dbc/logs/overcloud-novacompute-0/var/log/journal.txt.gz

2) Baremetal nodes goes to deploy_failed state randomly:-
Logs:-
https://logserver.rdoproject.org/32/25332/53/check/periodic-tripleo-ci-centos-8-ovb-1ctlr_1comp-featureset001-master-test1/2edc8af/logs/undercloud/var/log/extra/baremetal_list.txt.gz

3) Tempest failures:-
Logs:- https://logserver.rdoproject.org/32/25332/53/check/periodic-tripleo-ci-centos-8-ovb-1ctlr_1comp-featureset001-master-test2/0c36de4/logs/undercloud/var/log/tempest/stestr_results.html.gz

4) Baremetal nodes provision failed due to qemu scientific notation bug:- https://bugs.launchpad.net/oslo.utils/+bug/1864529
This is fixed already in oslo.utils but required patch is not yet available in current-tripleo. /me didn't got why it failed randomly with same overcloud-full image.

5) overcloud deployment just stucks
For this logs didn't get collected, but i noticed in other underecloud job too, and there we got collected vm console log and it turned out to be kernel panic http://paste.openstack.org/show/790477/, so same could be true here. From kernel panic logs we seen "atop Tainted", don't know if atop binary from el7 on CentOS8 can cause that, but that's being fixed at https://review.opendev.org/711894.

I think all these random issues need to be fixed/diagnosed seperately as needs different expertise.