Incorrect notification to nova about ironic baremetal port (for nodes in 'cleaning' state)

Bug #1656010 reported by George Shuklin
This bug affects 1 person
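+------------------+--------------+------------+-------------+-----------+
| Affects          | Status       | Importance | Assigned to | Milestone |
+------------------+--------------+------------+-------------+-----------+
| Ironic           | Fix Released | Undecided  | Sam Betts   |           |
| neutron          | Fix Released | Low        | Sam Betts   |           |
| ironic (Ubuntu)  | Fix Released | High       | Unassigned  |           |
| neutron (Ubuntu) | Fix Released | Low        | Unassigned  |           |
+------------------+--------------+------------+-------------+-----------+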

Bug Description

version: newton (2:9.0.0-0ubuntu1~cloud0)

When neutron tries to bind a port for an Ironic baremetal node, it sends nova an incorrect notification about the port being ready. Neutron sends it with 'device_id' == ironic-node-id, and nova rejects it as 'not found' (there is no nova instance with that ID).

Log:
neutron.db.provisioning_blocks[22265]: DEBUG Provisioning for port db3766ad-f82b-437d-b8b2-4133a92b1b86 completed by entity DHCP. [req-49434e88-4952-4e9d-a1c4-41dbf6c0091a - - - - -] provisioning_complete /usr/lib/python2.7/dist-packages/neutron/db/provisioning_blocks.py:147
neutron.db.provisioning_blocks[22265]: DEBUG Provisioning complete for port db3766ad-f82b-437d-b8b2-4133a92b1b86 [req-49434e88-4952-4e9d-a1c4-41dbf6c0091a - - - - -] provisioning_complete /usr/lib/python2.7/dist-packages/neutron/db/provisioning_blocks.py:153
neutron.callbacks.manager[22265]: DEBUG Notify callbacks [('neutron.plugins.ml2.plugin.Ml2Plugin._port_provisioned--9223372036854150578', <bound method Ml2Plugin._port_provisioned of <neutron.plugins.ml2.plugin.Ml2Plugin object at 0x7fc005834550>>)] for port, provisioning_complete [req-49434e88-4952-4e9d-a1c4-41dbf6c0091a - - - - -] _notify_loop /usr/lib/python2.7/dist-packages/neutron/callbacks/manager.py:142
neutron.plugins.ml2.plugin[22265]: DEBUG Port db3766ad-f82b-437d-b8b2-4133a92b1b86 cannot update to ACTIVE because it is not bound. [req-49434e88-4952-4e9d-a1c4-41dbf6c0091a - - - - -] _port_provisioned /usr/lib/python2.7/dist-packages/neutron/plugins/ml2/plugin.py:224
oslo_messaging._drivers.amqpdriver[22265]: DEBUG sending reply msg_id: 254703530cd3440584c980d72ed93011 reply queue: reply_8b6e70ad5191401a9512147c4e94ca71 time elapsed: 0.0452275519492s [req-49434e88-4952-4e9d-a1c4-41dbf6c0091a - - - - -] _send_reply /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:73
neutron.notifiers.nova[22263]: DEBUG Sending events: [{'name': 'network-changed', 'server_uuid': u'd02c7361-5e3a-4fdf-89b5-f29b3901f0fc'}] send_events /usr/lib/python2.7/dist-packages/neutron/notifiers/nova.py:257
novaclient.v2.client[22263]: DEBUG REQ: curl -g -i --insecure -X POST http://nova-api.p.ironic-dal-1.servers.com:28774/v2/93c697ef6c2649eb9966900a8d6a73d8/os-server-external-events -H "User-Agent: python-novaclient" -H "Content-Type: application/json" -H "Accept: application/json" -H "X-Auth-Token: {SHA1}592539c9fcd820d7e369ea58454ee17fe7084d5e" -d '{"events": [{"name": "network-changed", "server_uuid": "d02c7361-5e3a-4fdf-89b5-f29b3901f0fc"}]}' _http_log_request /usr/lib/python2.7/dist-packages/keystoneauth1/session.py:337
novaclient.v2.client[22263]: DEBUG RESP: [404] Content-Type: application/json; charset=UTF-8 Content-Length: 78 X-Compute-Request-Id: req-a029af9e-e460-476f-9993-4551f3b210d6 Date: Thu, 12 Jan 2017 15:43:37 GMT Connection: keep-alive
RESP BODY: {"itemNotFound": {"message": "No instances found for any event", "code": 404}}
 _http_log_response /usr/lib/python2.7/dist-packages/keystoneauth1/session.py:366
novaclient.v2.client[22263]: DEBUG POST call to compute for http://nova-api.p.ironic-dal-1.servers.com:28774/v2/93c697ef6c2649eb9966900a8d6a73d8/os-server-external-events used request id req-a029af9e-e460-476f-9993-4551f3b210d6 _log_request_id /usr/lib/python2.7/dist-packages/novaclient/client.py:85
neutron.notifiers.nova[22263]: DEBUG Nova returned NotFound for event: [{'name': 'network-changed', 'server_uuid': u'd02c7361-5e3a-4fdf-89b5-f29b3901f0fc'}] send_events /usr/lib/python2.7/dist-packages/neutron/notifiers/nova.py:263
oslo_messaging._drivers.amqpdriver[22265]: DEBUG received message msg_id: 0bf04ac8fedd4234bd6cd6c04547beca reply to reply_8b6e70ad5191401a9512147c4e94ca71 __call__ /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:194
neutron.db.provisioning_blocks[22265]: DEBUG Provisioning complete for port db3766ad-f82b-437d-b8b2-4133a92b1b86 [req-47c505d7-4eb5-4c71-9656-9e0927408822 - - - - -] provisioning_complete /usr/lib/python2.7/dist-packages/neutron/db/provisioning_blocks.py:153

Port info:
+---------------------+---------------------------------------------------------------------------------------+
| Field | Value |
+---------------------+---------------------------------------------------------------------------------------+
| admin_state_up | True |
| binding:host_id | d02c7361-5e3a-4fdf-89b5-f29b3901f0fc |
| binding:profile | {"local_link_information": [{"switch_info": "c426s1", "port_id": "1/1/21", |
| | "switch_id": "60:96:9f:69:b4:b4"}]} |
| binding:vif_details | {} |
| binding:vif_type | binding_failed |
| binding:vnic_type | baremetal |
| created_at | 2017-01-12T15:23:36Z |
| description | |
| device_id | d02c7361-5e3a-4fdf-89b5-f29b3901f0fc |
| device_owner | baremetal:none |
| extra_dhcp_opts | {"opt_value": "204.74.228.4", "ip_version": 4, "opt_name": "tftp-server"} |
| | {"opt_value": "204.74.228.4", "ip_version": 4, "opt_name": "server-ip-address"} |
| | {"opt_value": "pxelinux.0", "ip_version": 4, "opt_name": "bootfile-name"} |
| fixed_ips | {"subnet_id": "5402755a-0d8b-447d-9753-f3ba1ec39c22", "ip_address": "hidden"} |
| id | bc46cbdf-a82e-409d-9332-9eeb81aa0a94 |
| mac_address | 18:66:ee:aa:dd:cc |
| name | |
| network_id | 4b352ae7-141b-4c3f-a132-f5c006dc056c |
| project_id | 7d450ecf00d64399aeb93bc122cb6dae |
| revision_number | 8 |
| status | DOWN |
| tenant_id | 7d450ecf00d64399aeb93bc122cb6dae |
| updated_at | 2017-01-12T15:23:37Z |
+---------------------+---------------------------------------------------------------------------------------+

ironic node:

ironic node-list
/usr/lib/python2.7/dist-packages/urllib3/connection.py:303: SubjectAltNameWarning: Certificate for baremetal.ironic-dal-1.mgm.servers.com has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
  SubjectAltNameWarning
+--------------------------------------+-------+---------------+-------------+--------------------+-------------+
| UUID | Name | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+-------+---------------+-------------+--------------------+-------------+
| d02c7361-5e3a-4fdf-89b5-f29b3901f0fc | s8002 | None | power on | clean wait | True |
+--------------------------------------+-------+---------------+-------------+--------------------+-------------+

Boden R (boden) wrote :

Still looking into this...

Based on the code I see, the port.device_id is used as the server_uuid in the notification, which corresponds to the logs provided above.

It's not yet clear to me why/how this should be different for this ironic case.
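A minimal sketch of the mapping described above, assuming a simplified notifier that copies the port's device_id straight into the event's server_uuid. The function name `build_nova_event` is hypothetical and this is not neutron's actual code; fed the cleaning port from this report, it yields the same name/server_uuid pair as the "Sending events" log line.

```python
# Hypothetical, simplified model of the device_id -> server_uuid mapping;
# not neutron's actual nova notifier code.

def build_nova_event(port, event_name="network-changed"):
    """Build a nova external event from a neutron port dict.

    device_id is copied verbatim into server_uuid, so for an Ironic-created
    port the event carries the Ironic node UUID, not a nova instance UUID.
    """
    return {"name": event_name, "server_uuid": port["device_id"]}


# Port fields taken from the port-show output in this report.
cleaning_port = {
    "id": "bc46cbdf-a82e-409d-9332-9eeb81aa0a94",
    "device_id": "d02c7361-5e3a-4fdf-89b5-f29b3901f0fc",  # Ironic node UUID
    "device_owner": "baremetal:none",
}

event = build_nova_event(cleaning_port)
print(event)
```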

George Shuklin (george-shuklin) wrote :

The problem is that the 'ironic node uuid' is not the same as the 'nova instance uuid'. When an instance provisioned through ironic boots, this is (probably) proper behavior. But when ironic cleans a node, there is no instance in nova.

(Actually, in my setup there are no nova instances at all at that moment, only ironic nodes, and I still see those notifications.)
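For illustration, a hypothetical and heavily simplified model of the nova side: events whose server_uuid matches no known instance are dropped, and when none match, the API answers 404 with the message seen in the log. `handle_external_events` and its arguments are assumptions, not nova's code, and the success branch is simplified (the real API reports per-event status).

```python
# Hypothetical, simplified model of nova's os-server-external-events
# handling; not nova's actual code.

def handle_external_events(events, instance_uuids):
    """Return (status_code, body) for a batch of external events."""
    # Keep only events that point at an instance nova actually knows about.
    accepted = [e for e in events if e["server_uuid"] in instance_uuids]
    if not accepted:
        return 404, {"itemNotFound": {
            "message": "No instances found for any event", "code": 404}}
    # Simplified success path.
    return 200, {"events": accepted}


# During cleaning, nova has no instance for the Ironic node UUID
# (in the reporter's setup, no instances at all).
status, body = handle_external_events(
    [{"name": "network-changed",
      "server_uuid": "d02c7361-5e3a-4fdf-89b5-f29b3901f0fc"}],
    instance_uuids=set(),
)
print(status, body["itemNotFound"]["message"])
```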

Boden R (boden) wrote :

Thanks for the comment.

This area of the code is new to me, so I'm waiting for some neutron folks who can help me triage this to come online.

Based on what I do know, I'm a little confused. How is neutron supposed to change the port.device_id in the clean node case so that it no longer has the ironic node id as the port.device_id? Is there an event/call from nova or elsewhere that's supposed to update it?

James Anziano (janzian) wrote :

I asked a colleague who works on Ironic about this, and I'm left with the same question as Boden. My coworker did mention, however, that "the instance is deleted immediately when Ironic is set to clean." If I understand him correctly, it's possible that Nova takes a while to actually delete the instance, so when this call goes out the device_id still refers to an existing instance, whereas with Ironic it does not. That, at least, would explain why this happens with Ironic and not with Nova alone.

Sam Betts (sambetts) wrote :

When a Nova instance is deleted, it is fully removed from Nova when the node enters the cleaning state. From that point onwards, the node has no ties to a Nova instance.

As part of cleaning, Ironic needs to create a Neutron port to attach the node to the cleaning network; to do this, Ironic talks directly to Neutron and issues a port-create.
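A sketch of the port-create request body such a cleaning port would carry, assuming the field values shown in the port-show output earlier in this report. `build_cleaning_port_body` is a hypothetical helper, not Ironic's actual code; it only illustrates why device_owner is baremetal:none and device_id is the node UUID.

```python
# Hypothetical helper; field names and values mirror the port-show output
# in this report. Not Ironic's actual implementation.

def build_cleaning_port_body(node_uuid, network_id, mac_address,
                             local_link_information):
    return {
        "port": {
            "network_id": network_id,
            "admin_state_up": True,
            "mac_address": mac_address,
            # Ironic, not nova, owns this port: the owner is baremetal:none
            # and device_id carries the Ironic node UUID.
            "device_owner": "baremetal:none",
            "device_id": node_uuid,
            "binding:vnic_type": "baremetal",
            "binding:host_id": node_uuid,
            "binding:profile": {
                "local_link_information": local_link_information,
            },
        }
    }


body = build_cleaning_port_body(
    node_uuid="d02c7361-5e3a-4fdf-89b5-f29b3901f0fc",
    network_id="4b352ae7-141b-4c3f-a132-f5c006dc056c",
    mac_address="18:66:ee:aa:dd:cc",
    local_link_information=[{"switch_info": "c426s1", "port_id": "1/1/21",
                             "switch_id": "60:96:9f:69:b4:b4"}],
)
print(body["port"]["device_owner"], body["port"]["device_id"])
```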

Neutron has a component called the nova notifier, which sends a notification to nova whenever a neutron port is created, to inform nova that a new port exists so that it can take any action it needs, for example binding the port to an instance.

When the port is created by Ironic, the owner is baremetal:none and the device_id is set to the Ironic node ID, as this correctly reflects who owns the node: nova, AKA "compute:", isn't the owner of these ports.

It appears that a "fix" went into neutron (https://github.com/openstack/neutron/commit/8b69189fdd87592a2a98be1a8bdfa20e76744cb1) which makes the nova notifier also send notifications for Ironic-owned ports that are unrelated to nova.
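A hypothetical sketch of the kind of device_owner check involved, assuming the notifier only reports ports whose device_owner starts with "compute:" and that the linked change extended this to the "baremetal:" prefix. The function name and the `include_baremetal` flag are illustrative, not neutron's code; `include_baremetal=True` models the behaviour after that change, `False` the behaviour before it.

```python
# Hypothetical sketch of a device_owner filter for the nova notifier.
# The prefixes are the device_owner values discussed in this bug; the
# function and flag are illustrative only.

COMPUTE_PREFIX = "compute:"      # ports owned by nova instances
BAREMETAL_PREFIX = "baremetal:"  # ports owned by Ironic (e.g. cleaning)


def should_notify_nova(port, include_baremetal):
    """Return True if a change to this port would be reported to nova."""
    prefixes = (COMPUTE_PREFIX,)
    if include_baremetal:
        prefixes += (BAREMETAL_PREFIX,)
    # str.startswith accepts a tuple of candidate prefixes.
    return port["device_owner"].startswith(prefixes)


cleaning_port = {"device_owner": "baremetal:none",
                 "device_id": "d02c7361-5e3a-4fdf-89b5-f29b3901f0fc"}
# Hypothetical tenant port owned by a nova instance (made-up UUID).
tenant_port = {"device_owner": "compute:nova",
               "device_id": "11111111-2222-3333-4444-555555555555"}

print(should_notify_nova(cleaning_port, include_baremetal=True))
print(should_notify_nova(cleaning_port, include_baremetal=False))
print(should_notify_nova(tenant_port, include_baremetal=False))
```

With the baremetal prefix included, the cleaning port is reported to nova even though no nova instance exists for its device_id, which is the situation in the logs above.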

Sam Betts (sambetts) wrote :

This "fix" was made to address this bug:

https://bugs.launchpad.net/neutron/+bug/1606229

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/424247

Changed in neutron:
assignee: nobody → Sam Betts (sambetts)
status: New → In Progress
Changed in ironic:
assignee: nobody → Sam Betts (sambetts)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ironic (master)

Fix proposed to branch: master
Review: https://review.openstack.org/424248

OpenStack Infra (hudson-openstack) wrote : Fix merged to ironic (master)

Reviewed: https://review.openstack.org/424248
Committed: https://git.openstack.org/cgit/openstack/ironic/commit/?id=cbdf5076d37df61d2e9c46a0a73c7ad65652b866
Submitter: Jenkins
Branch: master

commit cbdf5076d37df61d2e9c46a0a73c7ad65652b866
Author: Sam Betts <email address hidden>
Date: Mon Jan 23 17:08:35 2017 +0000

    Don't override device_owner for tenant network ports

    When a vif is passed to us from nova as a tenant port we shouldn't
    change the device_owner or device_id because that is what links the port
    to the nova instance. This enables the neutron nova notifier to trigger
    the correct events in nova for when the neutron port changes, e.g. being
    deleted, triggers the detach interface endpoint.

    Change-Id: I43c3af9f424a65211ef5a39f13e4810072997339
    Closes-Bug: #1656010
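A hypothetical sketch of the idea in this commit: when Ironic binds a port, it always sets the binding fields, but it only claims ownership (device_owner/device_id) for ports it created itself, such as cleaning ports. Tenant ports passed in by nova keep nova's values, so the nova notifier can still link them to the instance. `build_port_update` and `is_tenant_vif` are illustrative names, not Ironic's actual code.

```python
# Hypothetical sketch; names are illustrative, not Ironic's implementation.

def build_port_update(node_uuid, local_link_information, is_tenant_vif):
    body = {
        "binding:vnic_type": "baremetal",
        "binding:host_id": node_uuid,
        "binding:profile": {"local_link_information": local_link_information},
    }
    if not is_tenant_vif:
        # Ironic-owned ports (e.g. cleaning): Ironic is the owner and the
        # node is the device.
        body["device_owner"] = "baremetal:none"
        body["device_id"] = node_uuid
    # For tenant VIFs, device_owner/device_id are left untouched so they
    # keep pointing at the nova instance.
    return {"port": body}


lli = [{"switch_info": "c426s1", "port_id": "1/1/21",
        "switch_id": "60:96:9f:69:b4:b4"}]
node = "d02c7361-5e3a-4fdf-89b5-f29b3901f0fc"
tenant_update = build_port_update(node, lli, is_tenant_vif=True)
cleaning_update = build_port_update(node, lli, is_tenant_vif=False)
print(sorted(tenant_update["port"]))
print(sorted(cleaning_update["port"]))
```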

Changed in ironic:
status: In Progress → Fix Released
Changed in ironic (Ubuntu):
status: New → Triaged
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/ironic 7.0.0

This issue was fixed in the openstack/ironic 7.0.0 release.

tags: added: mitaka-backport-potential
Changed in neutron:
status: In Progress → Incomplete
assignee: Sam Betts (sambetts) → nobody
status: Incomplete → In Progress
importance: Undecided → Low
assignee: nobody → Sam Betts (sambetts)
milestone: none → pike-1
Chuck Short (zulcss)
Changed in ironic (Ubuntu):
status: Triaged → Fix Released
Changed in neutron (Ubuntu):
status: New → Triaged
importance: Undecided → Low
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/424247
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=cb6eae20fae1f5e46cb2af4f0b0dc5368074580b
Submitter: Jenkins
Branch: master

commit cb6eae20fae1f5e46cb2af4f0b0dc5368074580b
Author: Sam Betts <email address hidden>
Date: Mon Jan 23 17:02:01 2017 +0000

    Remove baremetal notification from nova notifier

    This is a revert of change I3d53bff8278dabafd929ecbea0b4b3b441c9e1cf

    The nova notifier was updated to notify nova on ports with the
    baremetal: device_owner, these ports are owned by Ironic not Nova, so
    nova is getting notifications that it doesn't understand.

    Change-Id: I8318a682163f6a5b739be68ce56973c43d0e32f2
    Closes-Bug: #1656010
    Depends-On: I43c3af9f424a65211ef5a39f13e4810072997339

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 11.0.0.0b1

This issue was fixed in the openstack/neutron 11.0.0.0b1 development milestone.

James Page (james-page) wrote :

Marking Ubuntu neutron task Fix Released (as pike shipped with Artful and in the Pike Ubuntu Cloud Archive)

Changed in neutron (Ubuntu):
status: Triaged → Fix Released