Can't deploy OpenStack Zed on Rocky 9.1: openvswitch3.1 fails to install

Bug #2013189 reported by Tom Jensen
This bug affects 4 people
Affects: OpenStack-Ansible
Status: Fix Released
Importance: Critical
Assigned to: Unassigned

Bug Description

The OpenStack-Ansible deployment fails because openvswitch3.1 cannot be installed:
[root@maahcoscom01 yum.repos.d]# yum install openvswitch3.1
Last metadata expiration check: 0:29:50 ago on Wed Mar 29 10:21:24 2023.
Error:
 Problem: conflicting requests
  - nothing provides libmlx5.so.1(MLX5_1.24)(64bit) needed by openvswitch3.1-3.1.0-3.el9s.x86_64
(try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)

Nothing in the enabled repositories provides MLX5_1.24, so the package is not installable.
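
A minimal sketch of how the unmet capability could be pulled out of the dnf error above and looked up in the enabled repositories. The helper name `missing_caps` is hypothetical, and the commented commands assume a dnf version with `repoquery` available.

```shell
# Print the capability named in each "nothing provides X needed by Y" line
# read from stdin (helper name is illustrative, not part of dnf).
missing_caps() {
  sed -n 's/^ *- nothing provides \(.*\) needed by .*$/\1/p'
}

# Possible use on an affected host (assumption: run as root; output not shown):
#   dnf install --assumeno openvswitch3.1 2>&1 | missing_caps
#   dnf repoquery --whatprovides 'libmlx5.so.1(MLX5_1.24)(64bit)'
```

If the `repoquery` call returns nothing, no enabled repository ships a library build new enough to satisfy the package.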

Dmitriy Rabotyagov (noonedeadpunk) wrote :

Hi Tom,

Yes, we caught that issue just today. It is caused by the RDO team's major OVS update from 2.17 to 3.1, which landed yesterday: https://review.rdoproject.org/r/c/rdoinfo/+/48000

Installing rdma-core will most likely fix this issue, but we have not finished testing that yet.

Note that existing CentOS installations that use OVN and have fqdn != hostname might also be affected by this change.

Changed in openstack-ansible:
status: New → Confirmed
importance: Undecided → Critical
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to openstack-ansible-os_neutron (master)
OpenStack Infra (hudson-openstack) wrote : Change abandoned on openstack-ansible-os_neutron (master)
Dmitriy Rabotyagov (noonedeadpunk) wrote :

The following patch is a quick fix for the issue: https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/878926/

The dependency required by the RDO package is simply missing from the Rocky repos. The OS maintainers are already aware and are working on building a newer version, and also on creating a workflow to prevent such cases in the future. So a proper solution might take a while to land.

We actually have a whole series of patches that need to land to fix issues related to this OVS version bump: https://review.opendev.org/q/topic:osa%252Frdo_ovs_3.1

This is because the OVS 3.1 being installed also has a regression in ovs-vsctl behaviour, which was discovered during investigation with the RDO team: https://bugzilla.redhat.com/show_bug.cgi?id=2182767

OpenStack Infra (hudson-openstack) wrote : Related fix merged to openstack-ansible-os_neutron (master)

Reviewed: https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/878911
Committed: https://opendev.org/openstack/openstack-ansible-os_neutron/commit/f1a8c358531bdf86d8aeda725bc0b1c347d325c1
Submitter: "Zuul (22348)"
Branch: master

commit f1a8c358531bdf86d8aeda725bc0b1c347d325c1
Author: Dmitriy Rabotyagov <email address hidden>
Date: Wed Mar 29 16:24:34 2023 +0200

    Workaround ovs bug that resets hostname with add command

    After RDO bumped OVS version to 3.1 from 2.17 CentOS/Rocky fails
    tempest testing due to systemd unit calling adding hostname [1]
    while ovs-vsctl add in 3.1 actually behaves exactly as `set` which
    simply resets defined hostname on each service restart. To avoid that
    we're adding `--no-record-hostname` flag that will prevent such
    behaviour.

    [1] https://github.com/openvswitch/ovs/blob/branch-3.1/utilities/ovs-ctl.in#L51

    Change-Id: I8bee1850e3a120f7b76f586909e6d74361696e32
    Related-Bug: #2013189
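
The effect described in this commit message could be checked on a host roughly as follows. This is a sketch under assumptions: the helper name `check_recorded_hostname` is hypothetical, and the expected value is whatever hostname the deployment originally configured for OVS.

```shell
# Compare the hostname the deployment expects with the value OVS has recorded.
# Usage: check_recorded_hostname EXPECTED RECORDED
# Returns 0 and prints "ok" on a match, returns 1 and prints the mismatch otherwise.
check_recorded_hostname() {
  expected=$1
  # ovs-vsctl may print string values wrapped in double quotes; strip them.
  recorded=$(printf '%s' "$2" | tr -d '"')
  if [ "$recorded" = "$expected" ]; then
    echo "ok"
  else
    echo "mismatch: expected $expected, got $recorded"
    return 1
  fi
}

# Possible use after restarting the openvswitch service (assumption: run as root):
#   check_recorded_hostname "host1.example.com" \
#     "$(ovs-vsctl get Open_vSwitch . external_ids:hostname)"
```

A mismatch after a service restart would be consistent with the reset behaviour that `--no-record-hostname` is meant to avoid.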

OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible-openstack_hosts (stable/zed)
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to openstack-ansible-os_neutron (stable/zed)

Related fix proposed to branch: stable/zed
Review: https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/879103

OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible-openstack_hosts (stable/yoga)
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to openstack-ansible-os_neutron (stable/yoga)

Related fix proposed to branch: stable/yoga
Review: https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/879173

OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible-openstack_hosts (stable/zed)

Reviewed: https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/879100
Committed: https://opendev.org/openstack/openstack-ansible-openstack_hosts/commit/80b9efcd423f72a6c16d1d2295fc14a3b8d89325
Submitter: "Zuul (22348)"
Branch: stable/zed

commit 80b9efcd423f72a6c16d1d2295fc14a3b8d89325
Author: Dmitriy Rabotyagov <email address hidden>
Date: Wed Mar 29 18:06:00 2023 +0200

    Pin openvswitch package on RHEL to 2.17

    OVS 3.1 was released by CentOS NFV SIG which is built against newer
    rdma-core libraries leading to uninstallable openvswitch3.1 on
    Rocky Linux due to missing libmlx5.so.1(MLX5_1.24).

    While CentOS doesn't need this specific rollback, it will be easier to
    fix gates this way.

    Closes-Bug: #2013189
    Change-Id: I388c115d368c0c0638d1dd4f9f11f4448a13a6b1
    (cherry picked from commit 181036c13bd2e9b09dabba6faea827c99a8adf6c)
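
After applying the pin, one way to confirm which OVS series actually ended up installed is to parse the version banner. A small sketch, with the helper name `ovs_series` being hypothetical:

```shell
# Extract "MAJOR.MINOR" from the first line of `ovs-vsctl --version` output
# read from stdin, e.g. "ovs-vsctl (Open vSwitch) 2.17.6".
ovs_series() {
  sed -n '1s/.*(Open vSwitch) \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Possible use on a deployed host:
#   if [ "$(ovs-vsctl --version | ovs_series)" = "2.17" ]; then
#     echo "OVS is pinned to the 2.17 series"
#   fi
```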

tags: added: in-stable-zed
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible-openstack_hosts (stable/yoga)

Reviewed: https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/879171
Committed: https://opendev.org/openstack/openstack-ansible-openstack_hosts/commit/0fb35ebe7d7faf6dae1b30c66ac12b240daa9a51
Submitter: "Zuul (22348)"
Branch: stable/yoga

commit 0fb35ebe7d7faf6dae1b30c66ac12b240daa9a51
Author: Dmitriy Rabotyagov <email address hidden>
Date: Wed Mar 29 18:06:00 2023 +0200

    Pin openvswitch package on RHEL to 2.17

    OVS 3.1 was released by CentOS NFV SIG which is built against newer
    rdma-core libraries leading to uninstallable openvswitch3.1 on
    Rocky Linux due to missing libmlx5.so.1(MLX5_1.24).

    While CentOS doesn't need this specific rollback, it will be easier to
    fix gates this way.

    Closes-Bug: #2013189
    Change-Id: I388c115d368c0c0638d1dd4f9f11f4448a13a6b1
    (cherry picked from commit 181036c13bd2e9b09dabba6faea827c99a8adf6c)

tags: added: in-stable-yoga
OpenStack Infra (hudson-openstack) wrote : Related fix merged to openstack-ansible-os_neutron (stable/zed)

Reviewed: https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/879103
Committed: https://opendev.org/openstack/openstack-ansible-os_neutron/commit/9bd23d7d39b71d27da68059457249a1f6eb3a538
Submitter: "Zuul (22348)"
Branch: stable/zed

commit 9bd23d7d39b71d27da68059457249a1f6eb3a538
Author: Dmitriy Rabotyagov <email address hidden>
Date: Wed Mar 29 16:24:34 2023 +0200

    Workaround ovs bug that resets hostname with add command

    After RDO bumped OVS version to 3.1 from 2.17 CentOS/Rocky fails
    tempest testing due to systemd unit calling adding hostname [1]
    while ovs-vsctl add in 3.1 actually behaves exactly as `set` which
    simply resets defined hostname on each service restart. To avoid that
    we're adding `--no-record-hostname` flag that will prevent such
    behaviour.

    [1] https://github.com/openvswitch/ovs/blob/branch-3.1/utilities/ovs-ctl.in#L51

    Change-Id: I8bee1850e3a120f7b76f586909e6d74361696e32
    Related-Bug: #2013189
    (cherry picked from commit f1a8c358531bdf86d8aeda725bc0b1c347d325c1)

OpenStack Infra (hudson-openstack) wrote : Change abandoned on openstack-ansible-os_neutron (stable/yoga)

Change abandoned by "Dmitriy Rabotyagov <email address hidden>" on branch: stable/yoga
Review: https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/879173

Mathias Gonzalez (mathgonzlez) wrote :

Hi! I see you recommended pinning the OVS package to version 2.17, but in newer deployments that's not possible because the only available version is 3.1-2. So is a fresh installation on RHEL 9 not possible now?

Thanks!

Dmitriy Rabotyagov (noonedeadpunk) wrote :

Hi Mathias,

So there are 2 things in this issue.
1. OVS 3.1 requires a Mellanox deliverable that is not present for Rocky Linux.
2. There was a bug in ovs-vsctl that made the "add" command misbehave.

While the 2nd issue is solved now, the issue on Rocky is still present.

However, OVS 2.17 is still available in the RDO repository. The RDO maintainers also said they don't remove older versions of packages from the repository, so installing 2.17 should work, and we saw CI passing quite recently.

There are more bugs in CentOS, though, that make deployment fail, and the systemd-udev packaging mess, where wrong dependencies were set, is still hitting the distro. So CentOS 9 Stream is indeed broken because of that and must be fixed manually until the CentOS maintainers release a fix: https://bugzilla.redhat.com/show_bug.cgi?id=2183279

That said, the systemd issue does not affect Rocky Linux.

Changed in openstack-ansible:
status: Confirmed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/openstack-ansible-openstack_hosts yoga-eom

This issue was fixed in the openstack/openstack-ansible-openstack_hosts yoga-eom release.
