After oslo.messaging release Ironic gate is broken

Bug #1461182 reported by John L. Villalovos
Affects         Status        Importance  Assigned to    Milestone
Designate       Invalid       Critical    Unassigned
Ironic          Invalid       Critical    Unassigned
oslo.messaging  Fix Released  Undecided   Doug Hellmann
tripleo         Invalid       Critical    Unassigned

Bug Description

Example failures:
https://review.openstack.org/#/c/186208/

We are seeing all the 'tempest' jobs fail on multiple patches
https://review.openstack.org/#/q/project:openstack/ironic+status:open,n,z

We are seeing lots of "NoSuchOptError: no such option: rpc_response_timeout" errors, and there was an oslo.messaging release today.

From dhellmann: it looks like that option is being registered, so I think this is an initialization sequencing issue or something; still looking

Opening this bug to track the issue.

Revision history for this message
Doug Hellmann (doug-hellmann) wrote :

Testing ironic master without oslo.messaging 1.12.0 - https://review.openstack.org/#/c/187699/1

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.messaging (master)

Fix proposed to branch: master
Review: https://review.openstack.org/187722

Changed in oslo.messaging:
assignee: nobody → Doug Hellmann (doug-hellmann)
status: New → In Progress
Revision history for this message
Doug Hellmann (doug-hellmann) wrote :

My hypothesis is that ironic-conductor is seeing this issue because it is a server that does not also make outgoing calls, so it never creates a Client instance, and therefore the option isn't being registered. Since the option is needed by both the client and the server, it should be registered by both in oslo.messaging, so that's what the patch in comment #2 does. There's also a patch to ironic to instantiate a Client just to test the theory (https://review.openstack.org/#/c/187713), but we wouldn't want to land that patch.
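
To illustrate the hypothesis, here is a minimal sketch of the failure mode. It uses a toy, standard-library-only stand-in for a configuration object with explicit option registration; it is not the real oslo.config or oslo.messaging code. The names ToyConf, RPCClient, driver_timeout and driver_timeout_fixed, and the default of 60, are assumptions made up for this example.

```python
class NoSuchOptError(AttributeError):
    """Raised when an option is read before anything registered it."""


class ToyConf:
    """Toy stand-in for a config object (hypothetical, not oslo.config)."""

    def __init__(self):
        self._opts = {}

    def register_opt(self, name, default):
        # Registering an identical option a second time is a no-op.
        existing = self._opts.setdefault(name, default)
        if existing != default:
            raise ValueError("conflicting definition for %s" % name)

    def get(self, name):
        try:
            return self._opts[name]
        except KeyError:
            raise NoSuchOptError("no such option: %s" % name)


class RPCClient:
    """Hypothetical client: the only place the option gets registered."""

    def __init__(self, conf):
        conf.register_opt("rpc_response_timeout", 60)
        self.conf = conf


def driver_timeout(conf):
    # Reads the option without registering it. This only works if some
    # other code (here, RPCClient) happened to register it first.
    return conf.get("rpc_response_timeout")


def driver_timeout_fixed(conf):
    # The first fix, sketched: register the option at the point of use too.
    conf.register_opt("rpc_response_timeout", 60)
    return conf.get("rpc_response_timeout")


if __name__ == "__main__":
    server_only = ToyConf()  # a process that never builds an RPCClient
    try:
        driver_timeout(server_only)
    except NoSuchOptError as exc:
        print("server-only process fails:", exc)
    print("with registration at point of use:", driver_timeout_fixed(server_only))
```

In this sketch a process that creates an RPCClient never sees the error, which would be consistent with the problem showing up only in services such as ironic-conductor.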

Ben Nemec (bnemec)
Changed in tripleo:
status: New → Triaged
importance: Undecided → Critical
Revision history for this message
Matt Riedemann (mriedem) wrote :
Revision history for this message
Graham Hayes (grahamhayes) wrote :
Changed in designate:
importance: Undecided → Critical
status: New → Confirmed
status: Confirmed → Triaged
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.messaging (master)

Reviewed: https://review.openstack.org/187722
Committed: https://git.openstack.org/cgit/openstack/oslo.messaging/commit/?id=887e5a042359ed2b424d6e079f13d460213eeff6
Submitter: Jenkins
Branch: master

commit 887e5a042359ed2b424d6e079f13d460213eeff6
Author: Doug Hellmann <email address hidden>
Date: Tue Jun 2 18:48:33 2015 +0000

    Ensure rpc_response_timeout is registered before using it

    The response code in the rabbit driver doesn't use a Client object, so
    the option is not being registered in servers that don't instantiate
    Client instances (ironic-conductor, for example).

    Change-Id: I7def5e6d4960938a17344db024585a0492d6969d
    Partial-bug: #1461182

Changed in ironic:
importance: Undecided → Critical
status: New → Triaged
Revision history for this message
Doug Hellmann (doug-hellmann) wrote :

This problem only appeared in oslo.messaging, and we have released a hacky fix. I'm leaving the bug open because I intend to produce a non-hacky fix as well.

Changed in designate:
status: Triaged → Invalid
Changed in ironic:
status: Triaged → Invalid
Changed in tripleo:
status: Triaged → Invalid
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.messaging (master)

Fix proposed to branch: master
Review: https://review.openstack.org/188163

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.messaging (master)

Reviewed: https://review.openstack.org/188163
Committed: https://git.openstack.org/cgit/openstack/oslo.messaging/commit/?id=a0b33a46598499c47ac90eba26f27bff4eab991c
Submitter: Jenkins
Branch: master

commit a0b33a46598499c47ac90eba26f27bff4eab991c
Author: Doug Hellmann <email address hidden>
Date: Wed Jun 3 19:53:27 2015 +0000

    replace rpc_response_timeout use in rabbit driver

    The rabbit driver was using the rpc_response_timeout configuration
    option as a reconnect timeout value, even though the option was defined
    only in oslo_messaging.client and was not always being registered before
    being accessed. An earlier patch fixed this by registering the option
    here in the driver, too, but that breaks several levels of
    abstraction. This changes the driver to define a new option with the
    same default value, so that the driver is only using options it defines
    itself. It also removes the old hacky fix.

    Closes-bug: #1461182

    Change-Id: Ia96c815d157219e12a10d94b87b0156503369a6b
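
The approach described in this commit message can be sketched as follows. This is a toy, standard-library-only illustration, not the actual oslo.messaging change: ToyConf, ToyDriver and the option name driver_reconnect_timeout are hypothetical, and the default of 60 is assumed for the example.

```python
class NoSuchOptError(AttributeError):
    """Raised when an option is read before anything registered it."""


class ToyConf:
    """Toy stand-in for a config object (hypothetical, not oslo.config)."""

    def __init__(self):
        self._opts = {}

    def register_opt(self, name, default):
        self._opts.setdefault(name, default)

    def get(self, name):
        try:
            return self._opts[name]
        except KeyError:
            raise NoSuchOptError("no such option: %s" % name)


# Options owned by the driver module itself. The name below is made up
# for this sketch; only the idea of "same default, separate option"
# comes from the commit message.
DRIVER_OPTS = {"driver_reconnect_timeout": 60}


class ToyDriver:
    """Hypothetical driver that reads only the options it defines."""

    def __init__(self, conf):
        for name, default in DRIVER_OPTS.items():
            conf.register_opt(name, default)
        self.conf = conf

    def reconnect_timeout(self):
        # No dependency on the client-side rpc_response_timeout option.
        return self.conf.get("driver_reconnect_timeout")


if __name__ == "__main__":
    driver = ToyDriver(ToyConf())
    print("reconnect timeout:", driver.reconnect_timeout())
```

The design point is that the driver no longer depends on another module having registered an option first, so the order in which clients and servers are initialized stops mattering.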

Changed in oslo.messaging:
status: In Progress → Fix Committed
Changed in oslo.messaging:
milestone: none → 1.8.3
status: Fix Committed → Fix Released