periodic-tripleo-ci-centos-8-9-multinode-mixed-os 'no route to host' for compute deployment

Bug #1983601 reported by Marios Andreou
This bug affects 1 person
Affects   Status    Importance   Assigned to   Milestone
tripleo   Invalid   High         Unassigned

Bug Description

In [1][2], the periodic-tripleo-ci-centos-8-9-multinode-mixed-os job fails very early in the compute deployment because the undercloud cannot reach the compute ctlplane IP 192.168.24.4:

 2022-08-03 11:02:14 | 2022-08-03 11:02:14.438707 | fa163e7a-0c81-c7f5-c5e9-000000000038 | FATAL | Wait for connection to become available | 192.168.24.4 | error={"changed": false, "elapsed": 2404, "msg": "timed out waiting for ping module test: Data could not be sent to remote host \"192.168.24.4\". Make sure this host can be reached over ssh: ssh: connect to host 192.168.24.4 port 22: No route to host\r\n"}
 2022-08-03 11:02:14 | 2022-08-03 11:02:14.441010 | fa163e7a-0c81-c7f5-c5e9-000000000038 | TIMING | Wait for connection to become available | 192.168.24.4 | 0:40:07.511350 | 2404.17s
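
To pull failures like the one above out of a long overcloud_deploy.log, something along these lines may help. It is only a sketch: the function name fatal_tasks is made up for illustration, and it assumes every entry uses the " | "-separated layout shown in the excerpt (timestamp, timestamp, task UUID, level, task name, host, details).

```shell
# Hypothetical helper (not part of tripleo): read a deploy log on stdin and
# print "<host>: <task name>" for every entry whose level field is FATAL.
# Assumes the " | "-separated field layout seen in the excerpt above.
fatal_tasks() {
    awk -F' [|] ' '$4 == "FATAL" { print $6 ": " $5 }'
}
```

With a locally downloaded and decompressed copy of the log (the file name here is an assumption), usage would be: fatal_tasks < overcloud_deploy.log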

The job is currently under development [3][4], so it is not part of any of our integration promotion lines yet. However, it *is* running in upstream opendev, gating across the tripleo repos [5] for stable/wallaby, and we are not seeing this issue there.

[1] https://logserver.rdoproject.org/34/44234/5/check/periodic-tripleo-ci-centos-8-9-multinode-mixed-os/01e9b12/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz
[2] https://logserver.rdoproject.org/34/44234/4/check/periodic-tripleo-ci-centos-8-9-multinode-mixed-os/07564eb/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz
[3] https://review.rdoproject.org/r/q/topic:oooci_mixed_rhel
[4] https://review.opendev.org/q/topic:oooci_mixed_rhel
[5] https://zuul.opendev.org/t/openstack/builds?job_name=tripleo-ci-centos-8-9-multinode-mixed-os

Marios Andreou (marios-b) wrote:

I held some nodes today and tried to debug. Long story short, I had to start ovsdb-server because it was completely down (e.g. ovs-vsctl show was not available until I started it). I then defined /etc/os-net-config/config.yaml, installed os-net-config, and ran it on the compute node. After this, ping worked between the compute and the other nodes.

I am adding that workaround to the compute deployment playbook for now [1]; it is basically:

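            # Define br-ex as an OVS bridge carrying the compute ctlplane IP.
            cat > /home/zuul/config.yaml <<EOF
            network_config:
            - type: ovs_bridge
              name: br-ex
              use_dhcp: false
              addresses:
              - ip_netmask: 192.168.24.4/24
            EOF
            # ovsdb-server was down on the held node, so start it first.
            sudo service ovsdb-server start
            # Move the config into /etc/os-net-config, owned by root.
            sudo mkdir -p /etc/os-net-config
            sudo mv /home/zuul/config.yaml /etc/os-net-config/config.yaml
            sudo chown root:root /etc/os-net-config/config.yaml
            # Install os-net-config and apply the configuration.
            sudo dnf -y install os-net-config
            sudo os-net-config -c /etc/os-net-config/config.yaml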

[1] https://review.opendev.org/c/openstack/tripleo-quickstart-extras/+/851757/7/playbooks/multinode-overcloud-mixed-os-deploy-compute.yml#41
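
If the same workaround were needed for a node with a different ctlplane address, the config file could be generated rather than hard-coded. The following is a rough sketch only: render_net_config is a hypothetical helper, not something from os-net-config or tripleo-quickstart-extras, and it simply reproduces the YAML used in the workaround with the IP/netmask and bridge name as parameters.

```shell
# Hypothetical helper: print an os-net-config definition like the one in the
# workaround above. $1 is the IP/netmask to assign; $2 is the bridge name
# (defaults to br-ex, as in the workaround).
render_net_config() {
    ip_netmask=$1
    bridge=${2:-br-ex}
    cat <<EOF
network_config:
- type: ovs_bridge
  name: ${bridge}
  use_dhcp: false
  addresses:
  - ip_netmask: ${ip_netmask}
EOF
}
```

For example, render_net_config 192.168.24.4/24 > /home/zuul/config.yaml would produce the same file as the heredoc in the workaround.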

Marios Andreou (marios-b) wrote:

It seems that this bug is invalid. It is not clear why yet, but moving repo-setup to just before the compute deployment, i.e. [1] and its depends-on [2], seems to cause tripleo/+bug/1983601 [3].

This is unexpected, since all we have there is the repo-setup role and the clearing of the hosts file.

Using just https://review.opendev.org/c/openstack/tripleo-quickstart-extras/+/853167, we have good results upstream [4] and in periodic [5], without hitting [3] or needing any other workaround (as we have in [2]).

Marking the bug invalid.

[1] https://review.opendev.org/c/openstack/tripleo-ci/+/851758
[2] https://review.opendev.org/c/openstack/tripleo-ci/+/851758
[3] https://bugs.launchpad.net/tripleo/+bug/1983601
[4] https://review.opendev.org/c/openstack/tripleo-heat-templates/+/852990/2#message-a3a52169a216290500c6b6929bec75fdfed5b6fa
[5] https://review.rdoproject.org/r/c/testproject/+/44234/7#message-00db167308fc1087e5ff6959008a633d5c2c56c2

Changed in tripleo:
status: Triaged → Invalid