ironic-conductor does not retry connecting to RPC after a connection failure

Bug #1564075 reported by Emilien Macchi
This bug affects 2 people
Affects: Ironic
Status: Fix Released
Importance: High
Assigned to: Galyna Zholtkevych

Bug Description

If Ironic Conductor can't connect to AMQP, it doesn't loop and try again later, as many other OpenStack services do; the process simply fails.

See:
Ironic Conductor tries to start:
http://logs.openstack.org/65/299065/1/check/gate-puppet-ironic-puppet-beaker-rspec-devstack-centos7/ad230b6/logs/ironic/ironic-conductor.txt.gz#_2016-03-30_19_55_17_110

But RabbitMQ resources for Ironic are created after:
http://logs.openstack.org/65/299065/1/check/gate-puppet-ironic-puppet-beaker-rspec-devstack-centos7/ad230b6/console.html#_2016-03-30_19_55_22_059

Ironic Conductor fails to start:
http://logs.openstack.org/65/299065/1/check/gate-puppet-ironic-puppet-beaker-rspec-devstack-centos7/ad230b6/logs/ironic/ironic-conductor.txt.gz#_2016-03-30_19_55_20_301

And it never tries again. That's a bug, because other OpenStack services keep retrying in a loop.
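The expected behavior can be sketched roughly as follows. This is a minimal illustration only, not Ironic or oslo.messaging code: `connect_with_retry`, its `connect` callable, and the `interval`, `max_attempts` and `sleep` parameters are all hypothetical names, and `ConnectionError` stands in for whatever error the real transport raises.

```python
import time


def connect_with_retry(connect, interval=1.0, max_attempts=None, sleep=time.sleep):
    """Call connect() until it succeeds, waiting between attempts.

    connect      -- hypothetical zero-argument callable that raises
                    ConnectionError while the broker is not reachable
    max_attempts -- None means keep trying forever
    sleep        -- injectable so the loop can be exercised without waiting
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            return connect()
        except ConnectionError as exc:
            # Give up only when an explicit attempt limit has been reached.
            if max_attempts is not None and attempt >= max_attempts:
                raise
            print(f"attempt {attempt} failed ({exc}); retrying in {interval}s")
            sleep(interval)
```

With a loop like this, a service started a few seconds before its broker resources exist would log the failures and connect once they appear, instead of exiting.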

Haomeng,Wang (whaom)
Changed in ironic:
assignee: nobody → Haomeng,Wang (whaom)
Haomeng,Wang (whaom) wrote :

I found that the root cause is "AccessRefused: (0, 0): (403) ACCESS_REFUSED". Are you sure other OpenStack services retry in this access-refused case? Can you share the logs if you have them?

I also checked the nova RPC startup code [1] and did not find any retry loop for an RPC start failure, so can you provide more details? I just want to see how other services behave.

[1] https://github.com/openstack/nova/blob/master/nova/service.py#L231

Changed in ironic:
status: New → Incomplete
Emilien Macchi (emilienm) wrote :
Dmitry Tantsur (divius)
Changed in ironic:
status: Incomplete → Confirmed
importance: Undecided → High
Haomeng,Wang (whaom)
Changed in ironic:
assignee: Haomeng,Wang (whaom) → nobody
Vadim Hmyrov (vhmyrov)
Changed in ironic:
assignee: nobody → Vadim Hmyrov (vhmyrov)
Ruby Loo (rloo) wrote :

Hi Vadim, are you still working on this?

Vadim Hmyrov (vhmyrov) wrote :

Hi Ruby, thanks for your reminder. No, I am not. I have unassigned it.

Changed in ironic:
assignee: Vadim Hmyrov (vhmyrov) → nobody
Changed in ironic:
assignee: nobody → Galyna Zholtkevych (gzholtkevych)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ironic (master)

Fix proposed to branch: master
Review: https://review.openstack.org/376462

Changed in ironic:
status: Confirmed → In Progress
Galyna Zholtkevych (gzholtkevych) wrote :

It seems the conductor does loop when connecting to AMQP.
I blocked the relevant port, restarted ironic-conductor, and got this output: http://paste.openstack.org/show/598109/
It looks like it retries forever, and the retry interval increases according to the configurable rabbit options in oslo.messaging.

For RPCClient it retries forever (by default retry=None, which means retry forever).

Please provide new logs if you encounter this specific problem again.
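
The retry behavior described in this comment (retry forever when retry=None, with a wait that grows up to a ceiling) can be sketched as below. This is an illustration under stated assumptions, not oslo.messaging code: `retry_intervals` is a hypothetical helper, its `start`, `step` and `maximum` parameters only loosely mirror the configurable rabbit retry options, and the linear growth of the interval is an assumption.

```python
import itertools


def retry_intervals(start=1, step=2, maximum=30, retry=None):
    """Yield the wait in seconds before each reconnect attempt.

    retry=None -> retry forever (an endless sequence of waits)
    retry=N    -> at most N retries
    The wait grows by `step` per attempt and is capped at `maximum`;
    this growth rule is an assumption made for the sketch.
    """
    attempts = itertools.count() if retry is None else range(retry)
    for n in attempts:
        yield min(start + n * step, maximum)


# First five waits when retrying forever with a low ceiling.
print(list(itertools.islice(retry_intervals(maximum=6), 5)))
# A bounded retry count produces a finite schedule.
print(list(retry_intervals(retry=3)))
```

Because the unbounded case is a generator, a caller takes one wait per failed attempt and the sequence never runs out, which matches the "retries forever" output observed above.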

Changed in ironic:
status: In Progress → Incomplete
OpenStack Infra (hudson-openstack) wrote : Change abandoned on ironic (master)

Change abandoned by Galyna Zholtkevych on branch: master
Review: https://review.openstack.org/376462
Reason: Appropriate bug seems incomplete, reported in the bug description

Julia Kreger (juliaashleykreger) wrote :

As far as I'm aware, this was resolved in oslo.messaging.

Changed in ironic:
status: Incomplete → Fix Released