periodic-tripleo-ci-centos-9-8-multinode-mixed-os consistently failing in NetworkBasicOps tests

Bug #2012240 reported by Arx Cruz
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Unassigned

Bug Description

This seems to be related to network:

Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/tempest/common/utils/__init__.py", line 70, in wrapper
    return f(*func_args, **func_kwargs)
  File "/usr/lib/python3.9/site-packages/tempest/scenario/test_network_basic_ops.py", line 494, in test_connectivity_between_vms_on_different_networks
    self._check_public_network_connectivity(should_connect=True)
  File "/usr/lib/python3.9/site-packages/tempest/scenario/test_network_basic_ops.py", line 212, in _check_public_network_connectivity
    self.check_vm_connectivity(
  File "/usr/lib/python3.9/site-packages/tempest/scenario/manager.py", line 957, in check_vm_connectivity
    self.assertTrue(self.ping_ip_address(ip_address,
  File "/usr/lib64/python3.9/unittest/case.py", line 688, in assertTrue
    raise self.failureException(msg)
AssertionError: False is not true : Public network connectivity check failed
Timed out waiting for 192.168.24.190 to become reachable

All tempest.scenario.test_network_basic_ops.TestNetworkBasicOps tests are failing with the same error: the VMs spawn properly, but the host cannot reach them.
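
For readers unfamiliar with this kind of check, the failure above is a reachability poll that ran out of time. The following is a minimal sketch of that pattern, not tempest's actual implementation: `wait_until_reachable` and `ping_once` are hypothetical names, and the `ping` flags assume Linux iputils.

```python
import subprocess
import time
from typing import Callable


def ping_once(ip: str) -> bool:
    """Send a single ICMP echo request (assumes Linux iputils `ping`)."""
    # -c 1: one request; -W 1: wait at most one second for the reply.
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wait_until_reachable(
    probe: Callable[[], bool],
    timeout: float,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe() repeatedly until it succeeds or `timeout` seconds pass."""
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            # This is the situation reported in the traceback: the address
            # never answered before the deadline.
            return False
        sleep(interval)
```

On the undercloud, something like `wait_until_reachable(lambda: ping_once("192.168.24.190"), timeout=120)` would be expected to return False while the bug is present; the 120-second timeout is an arbitrary value chosen for illustration.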

Example logs:

https://logserver.rdoproject.org/openstack-periodic-integration-stable1/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-8-multinode-mixed-os/645cdef/logs/undercloud/var/log/tempest/stestr_results.html

Marios Andreou (marios-b) wrote :
Arx Cruz (arxcruz)
Changed in tripleo:
importance: Undecided → Critical
Sandeep Yadav (sandeepyadav93) wrote :

We are seeing the same issue on the 17.1/9 line.

The issue is not related to the cloud where the job runs; we saw the same error on both vexx and psi infra.

Sandeep Yadav (sandeepyadav93) wrote :

The same tests passed in the mixed-OS check job.

https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_fe3/877154/6/check/tripleo-ci-centos-8-9-multinode-mixed-os/fe304dd/logs/undercloud/var/log/tempest/tempest_run.log

~~~
{0} tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_connectivity_between_vms_on_different_networks [71.935301s] ... ok
{0} tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_hotplug_nic [52.315096s] ... ok
{0} tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_mtu_sized_frames [48.261266s] ... ok
{0} tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_network_basic_ops [84.821696s] ... ok
{0} tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_preserve_preexisting_port [35.494422s] ... ok
{0} tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_router_rescheduling ... SKIPPED: Skipped because network extension: l3_agent_scheduler is not enabled
{0} tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_subnet_details [50.079280s] ... ok
{0} tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_update_instance_port_admin_state [73.266858s] ... ok
{0} tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_update_router_admin_state [45.091668s] ... ok
~~~

Sandeep Yadav (sandeepyadav93) wrote :

RPM diff for the neutron-related packages, compared with the last good job:

Downstream diff:
~~~
Last good run: openstack-neutron-18.6.1-1.20230310151048.2563be4.el9osttrunk
Affected run: openstack-neutron-18.6.1-1.20230315171018.889d86f.el9osttrunk

Last good run: ovn-bgp-agent-0.3.1-1.20230308161522.3c21775.el9osttrunk
Affected run: ovn-bgp-agent-0.3.1-1.20230315171018.2e750e2.el9osttrunk
~~~

Upstream wallaby diff:
~~~
Last good run: openstack-neutron-18.6.1-0.20230310194647.0e97381.el9
Affected run: openstack-neutron-18.6.1-0.20230315202416.e941180.el9

Last good run: ovn-bgp-agent-0.3.1-0.20230310162437.8cc374e.el9
Affected run: ovn-bgp-agent-0.3.1-0.20230316161447.d9c7de6.el9
~~~
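
A comparison like the one above can be scripted. The sketch below uses the upstream wallaby build strings from this comment; the helper names are hypothetical, and `short_hash` assumes the release field has the shape `<n>.<timestamp>.<shorthash>.<dist>` seen in these builds.

```python
import re

# Build strings copied from the upstream wallaby diff in this comment.
LAST_GOOD = [
    "openstack-neutron-18.6.1-0.20230310194647.0e97381.el9",
    "ovn-bgp-agent-0.3.1-0.20230310162437.8cc374e.el9",
]
AFFECTED = [
    "openstack-neutron-18.6.1-0.20230315202416.e941180.el9",
    "ovn-bgp-agent-0.3.1-0.20230316161447.d9c7de6.el9",
]

# name-version-release: version and release never contain a hyphen.
NVR_RE = re.compile(r"^(?P<name>.+)-(?P<version>[^-]+)-(?P<release>[^-]+)$")


def split_nvr(nvr: str) -> tuple[str, str, str]:
    """Split 'name-version-release' into its three parts."""
    match = NVR_RE.match(nvr)
    if match is None:
        raise ValueError(f"not a name-version-release string: {nvr!r}")
    return match["name"], match["version"], match["release"]


def changed_packages(old: list[str], new: list[str]) -> dict[str, tuple[str, str]]:
    """Map package name to (old, new) version-release where the two differ."""
    old_by_name = {name: f"{ver}-{rel}" for name, ver, rel in map(split_nvr, old)}
    new_by_name = {name: f"{ver}-{rel}" for name, ver, rel in map(split_nvr, new)}
    return {
        name: (old_by_name[name], new_by_name[name])
        for name in old_by_name.keys() & new_by_name.keys()
        if old_by_name[name] != new_by_name[name]
    }


def short_hash(release: str) -> str:
    """Return the commit short hash, assuming '<n>.<timestamp>.<hash>.<dist>'."""
    return release.split(".")[2]


if __name__ == "__main__":
    for name, (old_vr, new_vr) in sorted(changed_packages(LAST_GOOD, AFFECTED).items()):
        old_rel = old_vr.split("-", 1)[1]
        new_rel = new_vr.split("-", 1)[1]
        print(f"{name}: {short_hash(old_rel)} -> {short_hash(new_rel)}")
```

The pair of short hashes per package gives the commit range to inspect in each source repository.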

yatin (yatinkarel) wrote :

I checked on the live environment shared by Sandeep, and the issue is triggered by https://github.com/openstack/neutron/commit/786d89fee0dca2914a0fce2dc39761c55c917f81

Clearing[1] the new option on the LRP, or setting options:redirect-type=overlay, restores traffic; setting[2] the option back to bridged makes the traffic stop again.

# ovn-nbctl list logical_router_port bf380a19-d427-4e1c-ba1e-e8c7bf1d994b
_uuid : bf380a19-d427-4e1c-ba1e-e8c7bf1d994b
enabled : []
external_ids : {"neutron:network_name"=neutron-79a4dc47-f1ad-47d6-9e77-502f9d9d72de, "neutron:revision_number"="1", "neutron:router_name"="d8ec774d-3471-4a65-ac2f-1abdab483f53", "neutron:subnet_ids"="0735f90f-7fca-4d74-949c-c7d7c89763df"}
gateway_chassis : [11c3d3f6-b4c3-4412-8872-f155009cd8f0, c3ba2c1b-242e-484f-8c8e-63d3466eca31]
ha_chassis_group : []
ipv6_prefix : []
ipv6_ra_configs : {}
mac : "fa:16:3e:18:78:23"
name : lrp-bf2c50dc-7117-4c7b-8f4c-fa46832c992f
networks : ["192.168.24.154/24"]
options : {redirect-type=bridged}
peer : []

Will check with Luis as he may have an idea on how to handle it.

[1] ovn-nbctl clear logical_router_port bf380a19-d427-4e1c-ba1e-e8c7bf1d994b options
[2] ovn-nbctl set logical_router_port bf380a19-d427-4e1c-ba1e-e8c7bf1d994b options:redirect-type=bridged
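
To check other router ports for the same setting, the `ovn-nbctl list` output can be parsed. This is a rough sketch under the assumption that each record is printed as `column : value` lines, as in the paste above; `parse_record`, `parse_options` and `toggle_commands` are hypothetical helpers, and the commands are only built, not executed.

```python
# Excerpt of the record pasted in this comment.
SAMPLE = """\
_uuid : bf380a19-d427-4e1c-ba1e-e8c7bf1d994b
mac : "fa:16:3e:18:78:23"
name : lrp-bf2c50dc-7117-4c7b-8f4c-fa46832c992f
networks : ["192.168.24.154/24"]
options : {redirect-type=bridged}
peer : []
"""


def parse_record(text: str) -> dict[str, str]:
    """Parse the 'column : value' lines of a single record into a dict."""
    record = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        # Split on the first colon only; values such as MACs contain colons.
        key, _, value = line.partition(":")
        record[key.strip()] = value.strip()
    return record


def parse_options(value: str) -> dict[str, str]:
    """Turn a map column such as '{redirect-type=bridged}' into a dict."""
    inner = value.strip().strip("{}").strip()
    options = {}
    if not inner:
        return options
    for pair in inner.split(","):
        key, _, val = pair.partition("=")
        options[key.strip().strip('"')] = val.strip().strip('"')
    return options


def toggle_commands(uuid: str) -> tuple[list[str], list[str]]:
    """Build the argv lists for commands [1] and [2] from this comment."""
    clear = ["ovn-nbctl", "clear", "logical_router_port", uuid, "options"]
    set_bridged = ["ovn-nbctl", "set", "logical_router_port", uuid,
                   "options:redirect-type=bridged"]
    return clear, set_bridged


if __name__ == "__main__":
    record = parse_record(SAMPLE)
    if parse_options(record["options"]).get("redirect-type") == "bridged":
        clear_cmd, _ = toggle_commands(record["_uuid"])
        print(" ".join(clear_cmd))
```

Note that `clear` empties the whole `options` column, so any other option set on the port would be dropped as well.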

yatin (yatinkarel) wrote :

<< Will check with Luis as he may have an idea on how to handle it.

I checked further with Luis on this; the summary is:
- The neutron patch uncovered the issue[1] in core OVN, which is already fixed in ovn22.12+ with patch [2]
- We verified the fix by backporting [2] to ovn22.06 using scratch builds[3][4]; the issue does not reproduce.
- We can try to get ovn22.12 in wallaby+ once it's available[5] or request a pre build
- We are also syncing with the core OVN folks to get the patch backported as far back as ovn-2021-21.12, as it will be required for some downstream releases

[1] https://bugzilla.redhat.com/show_bug.cgi?id=2007120
[2] https://github.com/ovn-org/ovn/commit/54b635204dfdf6020c297203bfc2d1cebab14769
[3] https://cbs.centos.org/koji/taskinfo?taskID=3289835
[4] https://cbs.centos.org/koji/taskinfo?taskID=3289842
[5] https://ftp.redhat.com/pub/redhat/linux/enterprise/9Base/en/Fast-Datapath/SRPMS/

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/878203
Committed: https://opendev.org/openstack/tripleo-heat-templates/commit/f02b1690cdabf368f1f4d045dfdf2c6e42ce7717
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit f02b1690cdabf368f1f4d045dfdf2c6e42ce7717
Author: yatinkarel <email address hidden>
Date: Wed Mar 22 14:51:53 2023 +0530

    [Wallaby Only] Switch mixed-os job to OVN raft

    Noticed while investigating the related bug, the
    job should validate default raft method.

    Also enable Controller Nodes to act as gateway.

    Related-Bug: #2012240
    Change-Id: I16ff70102e8ec46b6dd1f7362e44f691153bbfb0

tags: added: in-stable-wallaby
yatin (yatinkarel) wrote :

<< - We can try to get ovn22.12 in wallaby+ once it's available[5] or request a pre build
Updates in master are in progress[1]; wallaby will follow.

[1] https://review.rdoproject.org/r/q/topic:ovn22.12

Alan Pevec (apevec) wrote :
Changed in tripleo:
status: Triaged → Fix Released