kolla-ansible unconditionally configures ironic-inspector with rabbitmq transport

Bug #2054705 reported by Sven Kieske
This bug affects 2 people
Affects        Status       Importance  Assigned to  Milestone
kolla-ansible  In Progress  Medium      Sven Kieske
Antelope       In Progress  Medium      Unassigned
Bobcat         In Progress  Medium      Unassigned
Caracal        In Progress  Medium      Sven Kieske
Yoga           In Progress  Medium      Unassigned
Zed            In Progress  Medium      Unassigned

Bug Description

Hi,

A user reported issues in #openstack-ironic when deploying ironic-inspector with kolla-ansible in HA fashion on three controllers.

In the discussion it came to light that we always configure a rabbitmq transport, even when ironic-inspector is not deployed in HA mode, where it is not necessary (instead we should use "transport_url=fake://").

Back in the day, rabbitmq support was conditional on whether TLS was enabled, but this was changed in: https://review.opendev.org/c/openstack/kolla-ansible/+/868305/5/ansible/roles/ironic/templates/ironic-inspector.conf.j2

I therefore suggest that we implement a check for whether ironic-inspector is deployed in HA mode and, if it is not, set the transport URL accordingly.
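The proposed check could look roughly like the sketch below. It is only an illustration of the decision logic: in kolla-ansible the real change would live in the Ansible role and the Jinja2 template, and the function name `select_transport_url`, the host list, and the example RabbitMQ URL are all hypothetical.

```python
from typing import Sequence

# Transport URL suggested in this report for non-HA deployments.
FAKE_TRANSPORT_URL = "fake://"


def select_transport_url(inspector_hosts: Sequence[str], rabbit_url: str) -> str:
    """Pick the transport URL for ironic-inspector (hypothetical helper).

    Assumption: "deployed HA" means more than one host runs ironic-inspector.
    """
    if len(inspector_hosts) > 1:
        # Several inspector instances need a real message bus to talk to each other.
        return rabbit_url
    # A single instance (or none) does not need rabbitmq at all.
    return FAKE_TRANSPORT_URL


if __name__ == "__main__":
    # Example values only; the URL is made up for illustration.
    example_rabbit_url = "rabbit://openstack:secret@192.0.2.10:5672//"
    print(select_transport_url(["controller1"], example_rabbit_url))
    print(select_transport_url(["controller1", "controller2", "controller3"], example_rabbit_url))
```

Keeping the decision in one place like this would make it easy to change later, for example to always return "fake://" if HA turns out not to be supportable.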

Relevant ironic specs (unfortunately there seem to be no more detailed docs about this):

https://specs.openstack.org/openstack/ironic-inspector-specs/specs/splitting-service-on-API-and-worker.html

https://specs.openstack.org/openstack/ironic-specs/specs/approved/merge-inspector.html (this is for future work)

I was also told that running ironic-inspector in HA mode is still considered experimental. I have no personal experience of whether there are any known bugs or whether we should disallow HA deployments for this component (I guess we can't do the latter, because HA deployments were already possible in the past).

Note that this is only about ironic-inspector, not about ironic itself or other subcomponents of ironic.

Going forward, if we reach consensus, I will implement the solution described above.

Thanks.

Sven Kieske (s-kieske)
description: updated
Sven Kieske (s-kieske)
Changed in kolla-ansible:
assignee: nobody → Sven Kieske (s-kieske)
Sven Kieske (s-kieske) wrote :

So it seems this is not really stable in an HA environment because of issues with the rabbitmq transport; see these error logs from the user:

https://paste.opendev.org/show/bscQiIEnqqfwzW2Vv3yx/

https://paste.opendev.org/show/bTfJ1XRpmkqDw2JisoZr/

This was on stable/2023.2.

Relevant error snippet:

2024-02-22 12:57:03.472 7 ERROR ironic_inspector.main oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 4abead957e784d18b3a98e38b6d83e48
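
To check whether other log files show the same failure, a line like the one above can be picked apart with a small script. This is a minimal sketch: the regular expression is inferred from this single quoted line only, so the field layout is an assumption, and `parse_messaging_timeout` is a hypothetical helper name.

```python
import re
from typing import Dict, Optional

# Layout assumed from the one log line quoted in this report:
# "<date> <time> <pid> <level> <logger> ...MessagingTimeout: ... message ID <hex id>"
TIMEOUT_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) (?P<logger>\S+) "
    r".*MessagingTimeout: Timed out waiting for a reply to message ID "
    r"(?P<message_id>[0-9a-f]+)"
)


def parse_messaging_timeout(line: str) -> Optional[Dict[str, str]]:
    """Return the parsed fields of a MessagingTimeout log line, or None if it does not match."""
    match = TIMEOUT_RE.search(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = (
        "2024-02-22 12:57:03.472 7 ERROR ironic_inspector.main "
        "oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a "
        "reply to message ID 4abead957e784d18b3a98e38b6d83e48"
    )
    print(parse_messaging_timeout(sample))
```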

The suggestion from the ironic team (dtantsur) is to always use the "fake://" transport and start only one copy of inspector until the upstream HA work is done and merged (see the second spec).

Sven Kieske (s-kieske) wrote :

The weird thing, though, is that "rabbit://" is the default upstream transport URL according to the docs:
https://docs.openstack.org/ironic-inspector/latest/configuration/ironic-inspector.html#DEFAULT.transport_url
and it has been since the "stein" release:

https://docs.openstack.org/ironic-inspector/stein/configuration/ironic-inspector.html#DEFAULT.transport_url

I asked the ironic team for clarification in this regard.

Sven Kieske (s-kieske) wrote :

So it was seemingly planned in the past to make use of rabbitmq, but this never really worked out. To quote from IRC:

"we had different plans back in the days. They never come to reality, instead we have the plan to freeze inspector as it is and migrate its functionality gradually into Ironic."

So I guess we really should ignore ironic-inspector's default in this case.

Sven Kieske (s-kieske) wrote :

I'm not sure setting this to "fake://" is really best practice, see:

https://docs.openstack.org/oslo.messaging/latest/user/FAQ.html#i-don-t-need-notifications-on-the-message-bus-how-do-i-disable-them

"Notification messages can be disabled using the noop notify driver. Set driver = noop in your configuration file under the [oslo_messaging_notifications] section."

But that's only for notifications.

Then again https://docs.openstack.org/oslo.messaging/latest/admin/drivers.html states:

"fake

Fake driver used for testing. This driver passes messages in memory, and should only be used for unit tests."

So this really doesn't sound like you are meant to deploy the "fake://" driver into production.
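
Whichever driver we end up choosing, it helps to be able to see which one a rendered ironic-inspector.conf actually selects. The sketch below does that with the Python standard library only; it does not use oslo.messaging itself. The helper name `transport_driver` and the sample config contents are hypothetical, and the fallback to "rabbit://" when the option is unset is based on the upstream default mentioned earlier in this report.

```python
import configparser
from urllib.parse import urlsplit

# Default according to the ironic-inspector configuration docs linked in this report.
UPSTREAM_DEFAULT_TRANSPORT_URL = "rabbit://"


def transport_driver(conf_text: str) -> str:
    """Return the URL scheme (driver name) of [DEFAULT]/transport_url (hypothetical helper)."""
    # Disable interpolation so '%' characters in passwords are not treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    url = parser.defaults().get("transport_url", UPSTREAM_DEFAULT_TRANSPORT_URL)
    return urlsplit(url).scheme


if __name__ == "__main__":
    # Made-up config content for illustration.
    sample_conf = "[DEFAULT]\ntransport_url = fake://\n"
    print(transport_driver(sample_conf))
```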

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to kolla-ansible (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/kolla-ansible/+/914107

OpenStack Infra (hudson-openstack) wrote : Related fix merged to kolla-ansible (master)

Reviewed: https://review.opendev.org/c/openstack/kolla-ansible/+/914107
Committed: https://opendev.org/openstack/kolla-ansible/commit/88aa51ac36923b3522cb314be73c9b16859f0446
Submitter: "Zuul (22348)"
Branch: master

commit 88aa51ac36923b3522cb314be73c9b16859f0446
Author: Michal Nasiadka <email address hidden>
Date: Mon Mar 25 16:49:40 2024 +0100

    ironic: disable heartbeat_in_pthreads

    inspector is not running as a WSGI

    Related-Bug: #2054705
    Change-Id: I20dbaef29b2ef2d6ceffc21c156c6fa4b5e8d205

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to kolla-ansible (stable/2023.2)

Related fix proposed to branch: stable/2023.2
Review: https://review.opendev.org/c/openstack/kolla-ansible/+/915532

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to kolla-ansible (stable/2023.1)

Related fix proposed to branch: stable/2023.1
Review: https://review.opendev.org/c/openstack/kolla-ansible/+/915533

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to kolla-ansible (stable/zed)

Related fix proposed to branch: stable/zed
Review: https://review.opendev.org/c/openstack/kolla-ansible/+/915534

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to kolla-ansible (unmaintained/yoga)

Related fix proposed to branch: unmaintained/yoga
Review: https://review.opendev.org/c/openstack/kolla-ansible/+/915535

OpenStack Infra (hudson-openstack) wrote : Related fix merged to kolla-ansible (stable/2023.2)

Reviewed: https://review.opendev.org/c/openstack/kolla-ansible/+/915532
Committed: https://opendev.org/openstack/kolla-ansible/commit/3e63d4a29bea91951c502c05982a125016c27eef
Submitter: "Zuul (22348)"
Branch: stable/2023.2

commit 3e63d4a29bea91951c502c05982a125016c27eef
Author: Michal Nasiadka <email address hidden>
Date: Mon Mar 25 16:49:40 2024 +0100

    ironic: disable heartbeat_in_pthreads

    inspector is not running as a WSGI

    Related-Bug: #2054705
    Change-Id: I20dbaef29b2ef2d6ceffc21c156c6fa4b5e8d205
    (cherry picked from commit 88aa51ac36923b3522cb314be73c9b16859f0446)

OpenStack Infra (hudson-openstack) wrote : Related fix merged to kolla-ansible (stable/2023.1)

Reviewed: https://review.opendev.org/c/openstack/kolla-ansible/+/915533
Committed: https://opendev.org/openstack/kolla-ansible/commit/f8d383128781235329b8a08ffd81beec7c596e0e
Submitter: "Zuul (22348)"
Branch: stable/2023.1

commit f8d383128781235329b8a08ffd81beec7c596e0e
Author: Michal Nasiadka <email address hidden>
Date: Mon Mar 25 16:49:40 2024 +0100

    ironic: disable heartbeat_in_pthreads

    inspector is not running as a WSGI

    Related-Bug: #2054705
    Change-Id: I20dbaef29b2ef2d6ceffc21c156c6fa4b5e8d205
    (cherry picked from commit 88aa51ac36923b3522cb314be73c9b16859f0446)

tags: added: in-stable-zed
OpenStack Infra (hudson-openstack) wrote : Related fix merged to kolla-ansible (stable/zed)

Reviewed: https://review.opendev.org/c/openstack/kolla-ansible/+/915534
Committed: https://opendev.org/openstack/kolla-ansible/commit/4fcd3fb39dabfdf58153ca4985a84f4c4215acc0
Submitter: "Zuul (22348)"
Branch: stable/zed

commit 4fcd3fb39dabfdf58153ca4985a84f4c4215acc0
Author: Michal Nasiadka <email address hidden>
Date: Mon Mar 25 16:49:40 2024 +0100

    ironic: disable heartbeat_in_pthreads

    inspector is not running as a WSGI

    Related-Bug: #2054705
    Change-Id: I20dbaef29b2ef2d6ceffc21c156c6fa4b5e8d205
    (cherry picked from commit 88aa51ac36923b3522cb314be73c9b16859f0446)

tags: added: in-unmaintained-yoga
OpenStack Infra (hudson-openstack) wrote : Related fix merged to kolla-ansible (unmaintained/yoga)

Reviewed: https://review.opendev.org/c/openstack/kolla-ansible/+/915535
Committed: https://opendev.org/openstack/kolla-ansible/commit/2177b9be0bc3f1a5c3521d61dae028d258d73415
Submitter: "Zuul (22348)"
Branch: unmaintained/yoga

commit 2177b9be0bc3f1a5c3521d61dae028d258d73415
Author: Michal Nasiadka <email address hidden>
Date: Mon Mar 25 16:49:40 2024 +0100

    ironic: disable heartbeat_in_pthreads

    inspector is not running as a WSGI

    Related-Bug: #2054705
    Change-Id: I20dbaef29b2ef2d6ceffc21c156c6fa4b5e8d205
    (cherry picked from commit 88aa51ac36923b3522cb314be73c9b16859f0446)

Will Szumski (willjs) wrote :

For what it is worth, I'm seeing issues with CLI commands hanging with a single instance of ironic inspector. This also seemed to be resolved when using: "transport_url=fake://". I've not tested I20dbaef29b2ef2d6ceffc21c156c6fa4b5e8d205 yet.

Sven Kieske (s-kieske) wrote :

Note that I20dbaef29b2ef2d6ceffc21c156c6fa4b5e8d205 does _not_ implement the transport_url change; it only fixes a related issue, not the main issue this bug report is about.
