Potential race between neutron port update and node pxe boot

Bug #1334447 reported by Adam Gandelman
This bug affects 1 person
Affects   Status      Importance  Assigned to  Milestone
Ironic    Triaged     Medium      Unassigned
neutron   Incomplete  Undecided   Unassigned

Bug Description

Splitting off a new bug from Bug #1300589 to track progress on one specific issue.

There is a potential race condition in the pxe driver, which updates neutron ports with DHCP data for a soon-to-be-booted ironic node. This data sets, among other things, the TFTP next-server address. The update is asynchronous on the Neutron side, and the node is powered on immediately after the update request is sent. With the ssh power driver and fast-booting virtual machines, the node can attempt to PXE boot before the neutron agents have processed the update and reconfigured the DHCP servers. Copying from the original bug, Robert describes the specific issue in the driver:

This is the problem:
    _create_token_file(task, node)
    _update_neutron(task, node)
    manager_utils.node_set_boot_device(task, node, 'pxe', persistent=True)
    manager_utils.node_power_action(task, node, states.REBOOT)

There's no synchronisation with neutron (neither poll nor call-back) to know that (all) the dnsmasq processes serving that port have been updated - and a VM that boots quickly may DHCP before the dnsmasq is hupped.
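
For illustration only, the missing poll-style synchronisation could look roughly like the sketch below. It is not code from Ironic or Neutron: `wait_until_ready` and the `is_ready` callable are hypothetical names, and the report above says there is currently nothing to poll or be called back by, so what `is_ready` would actually check is left as an assumption.

```python
import time


def wait_until_ready(is_ready, timeout=30.0, interval=1.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll is_ready() until it returns True or the timeout elapses.

    is_ready is a hypothetical callable standing in for "the DHCP
    servers for this port have been reconfigured"; no such check is
    described in this bug report.
    Returns True if ready in time, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Under this assumption, the driver would call something like `wait_until_ready(...)` between the neutron update and the power action, and treat a `False` return as a deployment error rather than booting the node anyway.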

Tags: pxe
Dmitry Tantsur (divius)
Changed in ironic:
status: New → Triaged
importance: Undecided → Medium
tags: added: pxe
Adam Gandelman (gandelman-a) wrote :

Temporary workaround merged @ https://review.openstack.org/#/c/91719/

Changed in neutron:
status: New → Incomplete
Armando Migliaccio (armando-migliaccio) wrote :

This bug is > 172 days without activity. We are unsetting assignee and milestone and setting status to Incomplete in order to allow its expiry in 60 days.

If the bug is still valid, then update the bug status.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to ironic (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/327589

Pavlo Shchelokovskyy (pshchelo) wrote :

In fact, this affects not only (fast) booting virtual BMs on devstack. It could also affect booting real BMs when Neutron is backed by a slow SDN.

There is already a config option for setting the delay:
http://git.openstack.org/cgit/openstack/ironic/tree/ironic/dhcp/neutron.py#n205

Operators can set this value to account for the real port update delay in their deployment.

I'd say the bug is fixed.
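
As a rough sketch of the delay-based approach described in this comment (not the actual Ironic code): the function and parameter names below are hypothetical, and the real config option name is not stated in this thread. The idea is simply to pause for an operator-chosen number of seconds between the port update and the power action.

```python
import time


def update_then_boot(update_dhcp, power_on, delay_seconds=0,
                     sleep=time.sleep):
    """Send the DHCP update, wait a fixed delay, then power on.

    update_dhcp and power_on are hypothetical callables standing in
    for the neutron port update and the node power action.
    delay_seconds is an assumed operator-tunable value; 0 disables
    the wait.
    """
    update_dhcp()
    if delay_seconds > 0:
        # Give the neutron agents time to reconfigure DHCP.
        sleep(delay_seconds)
    power_on()
```

A fixed delay only narrows the race window; it does not confirm that the DHCP servers were actually updated, so the value has to be chosen per deployment.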

OpenStack Infra (hudson-openstack) wrote : Change abandoned on ironic (master)

Change abandoned by Lucas Alvares Gomes (<email address hidden>) on branch: master
Review: https://review.openstack.org/327589
Reason: Thanks anton for pointing it out
