Ironic trying to contact IPA agent using its link local address during IPv6 deployment

Bug #1732692 reported by Derek Higgins
Affects: ironic-python-agent
Status: Fix Released
Importance: High
Assigned to: Derek Higgins
Milestone: (not set)

Bug Description

Description of problem:
During a deployment over IPv6, the IPA agent sends ironic a "callback_url", which ironic then uses to communicate with the agent. In the latest version the agent appears to be picking up the IPv6 link-local address instead of the routable IPv6 address.

This results in ironic failing to contact the IPA agent:
2017-11-16 02:58:08.463 22632 ERROR ironic.drivers.modules.agent_base_vendor IronicException: Error invoking agent command iscsi.start_iscsi_target for node e8b17346-e8b4-4fa2-920e-d50dda81282e. Error: HTTPConnectionPool(host='fe80::f816:3eff:fe4e:27f7', port=9999): Max retries exceeded with url: /v1/commands?wait=true (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x12599110>: Failed to establish a new connection: [Errno 22] EINVAL',))

Version-Release number of selected component (if applicable):
python-ironic-python-agent-2.2.2-0.20171027214812.bd8c6c7.el7ost.noarch
python-ironic-lib-2.10.0-1.el7ost.noarch
openstack-ironic-python-agent-2.2.2-0.20171027214812.bd8c6c7.el7ost.noarch

How reproducible:
every time

Additional info:

tcpdump shows the wrong IP being sent to ironic

POST /v1/heartbeat/e0c67420-4f66-4571-9db3-1190e0370a7b HTTP/1.1
Host: [fd00:1101::0001]:6385
Connection: keep-alive
Accept-Encoding: gzip, deflate
Accept: application/json
User-Agent: python-requests/2.11.1
X-OpenStack-Ironic-API-Version: 1.22
Content-Type: application/json
Content-Length: 59

{"callback_url": "http://[fe80::f816:3eff:fe4e:27f7]:9999"}

I suspect something has changed with the timing. The IPA agent used to select the SLAAC IP address; perhaps it's now running earlier in the process, before the SLAAC address is assigned.

Running some commands on a running IPA agent shows how this can happen if the fd00:1101.... hasn't been assigned to eth0

$ ip -o addr
1: lo inet 127.0.0.1/8 scope host lo\ valid_lft forever preferred_lft forever
1: lo inet6 ::1/128 scope host \ valid_lft forever preferred_lft forever
2: eth0 inet6 fd00:1101::f816:3eff:fe4e:27f7/64 scope global mngtmpaddr dynamic \ valid_lft 86379sec preferred_lft 14379sec
2: eth0 inet6 fe80::f816:3eff:fe4e:27f7/64 scope link \ valid_lft forever preferred_lft forever

$ ip route get fd00:1101::1
fd00:1101::1 dev eth0 proto kernel src fd00:1101::f816:3eff:fe4e:27f7 metric 256

# Removing the fd00:1101:.. address shows the link-local address being returned (this is the command that's run by IPA to find the IP address to use)
# see ironic_python_agent/agent.py _get_route_source()

$ ip addr del fd00:1101::f816:3eff:fe4e:27f7/64 dev eth0
$ ip route get fd00:1101::1
fd00:1101::1 dev eth0 proto kernel src fe80::f816:3eff:fe4e:27f7 metric 256
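
To illustrate the check that would catch this, here is a minimal Python sketch that pulls the "src" field out of `ip route get` output like the above and tests whether it is link-local. The function names `route_source` and `is_link_local` are hypothetical and this is not the actual code in ironic_python_agent/agent.py; it only assumes the output format shown in the two commands above.

```python
import ipaddress
import re


def route_source(ip_route_output):
    """Return the 'src' address from `ip route get` output, or None."""
    match = re.search(r"\bsrc\s+(\S+)", ip_route_output)
    return match.group(1) if match else None


def is_link_local(address):
    """Return True if address is an IPv6 link-local (fe80::/10) address."""
    # Drop a zone index such as '%eth0' before parsing, if one is present.
    ip = ipaddress.ip_address(address.split("%")[0])
    return ip.version == 6 and ip.is_link_local


before = "fd00:1101::1 dev eth0 proto kernel src fd00:1101::f816:3eff:fe4e:27f7 metric 256"
after = "fd00:1101::1 dev eth0 proto kernel src fe80::f816:3eff:fe4e:27f7 metric 256"

print(is_link_local(route_source(before)))
print(is_link_local(route_source(after)))
```

With the SLAAC address present the source is the routable fd00:1101:: address; once it is removed the source falls back to fe80::, which is what ends up in the callback_url.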

As a test I've inserted a 5 second sleep into the IPA agent code, and deployment now works. So I think the agent is running this command before the interface gets its SLAAC address; in previous versions it must have run after.

Dmitry Tantsur (divius) wrote :

The real fix seems to belong in IPA.

affects: ironic → ironic-python-agent
Changed in ironic-python-agent:
status: New → Triaged
status: Triaged → In Progress
importance: Undecided → High
assignee: nobody → Derek Higgins (derekh)
OpenStack Infra (hudson-openstack) wrote : Fix merged to ironic-python-agent (master)

Reviewed: https://review.openstack.org/520582
Committed: https://git.openstack.org/cgit/openstack/ironic-python-agent/commit/?id=214790d17e35edd48e1b18f3ceabba91859e3247
Submitter: Zuul
Branch: master

commit 214790d17e35edd48e1b18f3ceabba91859e3247
Author: Derek Higgins <email address hidden>
Date: Thu Nov 16 12:38:21 2017 +0000

    Ignore IPv6 link local addresses

    Prevent IPA from picking up the IPv6 link-local address
    as a callback_url in cases where it gets tried before other
    addressing methods havn't complete yet. In this scenario IPA
    sleeps for 10 seconds and then retries giving the nic a chance to
    configure its routable IP address.

    Change-Id: Ic53334c630180f0d77bb0231e548d2c44bfe55ca
    Closes-Bug: #1732692

Changed in ironic-python-agent:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ironic-python-agent (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/521949

OpenStack Infra (hudson-openstack) wrote : Fix merged to ironic-python-agent (stable/pike)

Reviewed: https://review.openstack.org/521949
Committed: https://git.openstack.org/cgit/openstack/ironic-python-agent/commit/?id=07050db1d1b27f0d7c7e662324feceac1420f8d9
Submitter: Zuul
Branch: stable/pike

commit 07050db1d1b27f0d7c7e662324feceac1420f8d9
Author: Derek Higgins <email address hidden>
Date: Thu Nov 16 12:38:21 2017 +0000

    Ignore IPv6 link local addresses

    Prevent IPA from picking up the IPv6 link-local address
    as a callback_url in cases where it gets tried before other
    addressing methods havn't complete yet. In this scenario IPA
    sleeps for 10 seconds and then retries giving the nic a chance to
    configure its routable IP address.

    Change-Id: Ic53334c630180f0d77bb0231e548d2c44bfe55ca
    Closes-Bug: #1732692
    (cherry picked from commit 214790d17e35edd48e1b18f3ceabba91859e3247)

tags: added: in-stable-pike
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/ironic-python-agent 2.2.3

This issue was fixed in the openstack/ironic-python-agent 2.2.3 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/ironic-python-agent 3.1.0

This issue was fixed in the openstack/ironic-python-agent 3.1.0 release.
