mixed Centos-8-9 job - os_tempest unable to ping neutron router

Bug #1981322 reported by Marios Andreou
This bug affects 1 person
Affects   Status      Importance   Assigned to   Milestone
tripleo   Won't Fix   Undecided    Unassigned

Bug Description

At [1] we are trying to create a mixed-OS Wallaby job with a CentOS 9 controller/undercloud and a CentOS 8 compute. This setup uses multi-stack for the overcloud deployment: the controller is deployed first (using c9/wallaby containers), followed by the compute deployment (c8/wallaby).

The second (compute) deployment completes, and during os_tempest setup an overcloud neutron router is created, but we are unable to ping it from the undercloud. You can see this in the logs at [2]:

 2022-07-11 05:18:46.224731 | primary | TASK [os_tempest : Ping router ip address] *************************************
 2022-07-11 05:18:46.224765 | primary | Monday 11 July 2022 09:18:46 +0000 (0:00:00.114) 0:02:05.574 ***********
 2022-07-11 05:18:51.231873 | primary | FAILED - RETRYING: Ping router ip address (5 retries left).
 2022-07-11 05:19:06.061119 | primary | FAILED - RETRYING: Ping router ip address (4 retries left).
 2022-07-11 05:19:20.998833 | primary | FAILED - RETRYING: Ping router ip address (3 retries left).
 2022-07-11 05:19:36.024830 | primary | FAILED - RETRYING: Ping router ip address (2 retries left).
 2022-07-11 05:19:51.161485 | primary | FAILED - RETRYING: Ping router ip address (1 retries left).
 2022-07-11 05:20:06.168826 | primary | fatal: [undercloud]: FAILED! => {"attempts": 5, "changed": true, "cmd": "set -e\nping -c2 \"192.168.24.169\"\n", "delta": "0:00:03.088602", "end": "2022-07-11 09:20:05.873811", "msg": "non-zero return code", "rc": 1, "start": "2022-07-11 09:20:02.785209", "stderr": "", "stderr_lines": [], "stdout": "PING 192.168.24.169 (192.168.24.169) 56(84) bytes of data.\nFrom 192.168.24.2 icmp_seq=1 Destination Host Unreachable\nFrom 192.168.24.2 icmp_seq=2 Destination Host Unreachable\n\n--- 192.168.24.169 ping statistics ---\n2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1031ms\npipe 2", "stdout_lines": ["PING 192.168.24.169 (192.168.24.169) 56(84) bytes of data.", "From 192.168.24.2 icmp_seq=1 Destination Host Unreachable", "From 192.168.24.2 icmp_seq=2 Destination Host Unreachable", "", "--- 192.168.24.169 ping statistics ---", "2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1031ms", "pipe 2"]}
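
For reference, the failure above can be read mechanically from the ping stdout that Ansible captured: two packets sent, none received, and both replies being "Destination Host Unreachable" from 192.168.24.2. A minimal Python sketch that extracts those counts; the `summarize_ping` helper and `PingSummary` class are hypothetical, and the parsing assumes the iputils ping output format shown in the log:

```python
import re
from dataclasses import dataclass, field

# Matches the statistics line, e.g. "2 packets transmitted, 0 received, ..."
_STATS_RE = re.compile(r"(\d+) packets transmitted, (\d+) received")
# Matches per-packet errors, e.g. "From 192.168.24.2 icmp_seq=1 Destination Host Unreachable"
_UNREACH_RE = re.compile(r"From (\S+) icmp_seq=\d+ Destination Host Unreachable")


@dataclass
class PingSummary:
    """Counts pulled from the text printed by `ping -cN HOST` (hypothetical helper)."""
    transmitted: int
    received: int
    # Source address of every "Destination Host Unreachable" line, in order.
    unreachable_from: list = field(default_factory=list)

    @property
    def all_lost(self) -> bool:
        """True when at least one packet was sent and none came back."""
        return self.transmitted > 0 and self.received == 0


def summarize_ping(stdout: str) -> PingSummary:
    """Parse ping stdout; raises ValueError if no statistics line is present."""
    stats = _STATS_RE.search(stdout)
    if stats is None:
        raise ValueError("no ping statistics line found")
    return PingSummary(
        transmitted=int(stats.group(1)),
        received=int(stats.group(2)),
        unreachable_from=_UNREACH_RE.findall(stdout),
    )
```

Feeding it the `stdout` from the failed "Ping router ip address" task gives `transmitted=2`, `received=0` and two unreachable replies from 192.168.24.2.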

Communication between the undercloud and the overcloud nodes is fine (e.g. ping works), and the undercloud can clearly talk to the overcloud neutron API, since the router is created successfully; you can see the query of the overcloud router details in the attached file from comment #1 below.

The environment uses the ci/multinode.j2 [3] template, which is also used by the c9 multinode job (example logs for comparison at [4]), and we do not face the same issue there, i.e. the neutron router can be reached. So with multinode.j2 we add br-ex, and we have [5] to pass in the explicit datacentre:br-ex bridge mappings (although that is the default anyway).
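
The bridge mapping value itself is easy to sanity-check. Neutron's OVS agent expresses `bridge_mappings` as a comma-separated list of `physnet:bridge` pairs, so `datacentre:br-ex` maps the `datacentre` physical network to the `br-ex` bridge. The sketch below is a hypothetical helper (not part of TripleO or Neutron) that parses such a value and, given the bridge names a node reports (for example from `ovs-vsctl list-br`), lists any mapped bridge that does not exist:

```python
def parse_bridge_mappings(value: str) -> dict:
    """Parse 'physnet:bridge[,physnet:bridge...]' into {physnet: bridge}."""
    mappings = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray commas / empty entries
        physnet, sep, bridge = item.partition(":")
        if not sep or not physnet or not bridge:
            raise ValueError(f"malformed mapping: {item!r}")
        if physnet in mappings:
            raise ValueError(f"duplicate physical network: {physnet!r}")
        mappings[physnet] = bridge
    return mappings


def missing_bridges(mappings: dict, existing_bridges) -> list:
    """Return mapped bridges that are absent from existing_bridges, sorted."""
    existing = set(existing_bridges)
    return sorted(b for b in mappings.values() if b not in existing)
```

For example, `parse_bridge_mappings("datacentre:br-ex")` returns `{"datacentre": "br-ex"}`, and `missing_bridges` on that result is empty for a node whose bridge list includes `br-ex`.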

The attached file (comment #1 below) contains the output of "ovs-vsctl show", "ip a" and "ip r" from the 3 nodes (the same information is also in the job logs, e.g. [6] and [7], where subnode-1 is the controller and subnode-2 is the compute).
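
When comparing the `ip r` output, one question is which route the undercloud uses for 192.168.24.169: a route without a `via` gateway means the address is treated as directly reachable on that link. A minimal sketch of that longest-prefix lookup, assuming only simple `PREFIX [via GW] dev DEV ...` and `default via GW dev DEV` lines (route types such as `unreachable` or `blackhole` are not handled); the helper names are hypothetical:

```python
import ipaddress


def parse_routes(ip_r_output: str):
    """Turn simplified `ip route` lines into (network, gateway, device) tuples."""
    routes = []
    for line in ip_r_output.splitlines():
        words = line.split()
        if not words:
            continue
        dest = "0.0.0.0/0" if words[0] == "default" else words[0]
        # gateway is None for on-link routes (no "via" keyword)
        gateway = words[words.index("via") + 1] if "via" in words else None
        device = words[words.index("dev") + 1] if "dev" in words else None
        routes.append((ipaddress.ip_network(dest, strict=False), gateway, device))
    return routes


def lookup(routes, address: str):
    """Longest-prefix match; returns (network, gateway, device) or None."""
    addr = ipaddress.ip_address(address)
    matches = [r for r in routes if addr in r[0]]
    return max(matches, key=lambda r: r[0].prefixlen, default=None)
```

With a hypothetical table (not taken from the attached logs) holding a default route plus `192.168.24.0/24 dev br-ctlplane proto kernel scope link`, `lookup(routes, "192.168.24.169")` returns the /24 entry with no gateway, i.e. an on-link destination.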

What am I missing? Why can't we ping the overcloud neutron router IP address from the undercloud?

[1] https://review.opendev.org/q/topic:oooci_mixed_rhel
[2] https://logserver.rdoproject.org/58/43558/12/check/tripleo-ci-centos-8-9-mixed-os/27914d7/job-output.txt
[3] https://opendev.org/openstack/tripleo-ansible/src/commit/76f875f45d8bca830c0564b7f5f18d75f5b90843/tripleo_ansible/roles/tripleo_network_config/templates/ci/multinode.j2
[4] https://0976aa7a5fe989ae2839-16c03009b76be00de1a48ecc1775c29e.ssl.cf1.rackcdn.com/848817/1/check/tripleo-ci-centos-9-containers-multinode/5640a8b/job-output.txt
[5] https://review.opendev.org/c/openstack/tripleo-heat-templates/+/847962/5/ci/environments/multinode-containers-mixed-os-control.yaml#48
[6] https://logserver.rdoproject.org/58/43558/12/check/tripleo-ci-centos-8-9-mixed-os/27914d7/logs/subnode-1/var/log/extra/network.txt.gz
[7] https://logserver.rdoproject.org/58/43558/12/check/tripleo-ci-centos-8-9-mixed-os/27914d7/logs/subnode-2/var/log/extra/network.txt.gz

Marios Andreou (marios-b) wrote:
description: updated

Marios Andreou (marios-b) wrote:

Marking this as invalid. It is still not clear why the router ping was not working; however, after skipping this check the job went on to run tempest to completion, so the network seems to be OK, at least from a user point of view:

https://logserver.rdoproject.org/58/43558/13/check/tripleo-ci-centos-8-9-multinode-mixed-os/e621742/logs/undercloud/var/log/tempest/stestr_results.html.gz

Changed in tripleo:
status: New → Won't Fix

Marios Andreou (marios-b) wrote:

Adding a note for future reference: as commented in #2 above, the ping was skipped by setting a dedicated variable, 'tempest_ping_router', in the featureset: https://opendev.org/openstack/tripleo-quickstart/src/commit/46b5df17c37a294f4c6ad91bf9f780c7115c6395/config/general_config/featureset066.yml#L52
