Timeout reached while waiting for callback for node

Bug #1421835 reported by Ben Nemec
This bug affects 2 people
Affects  Status        Importance  Assigned to      Milestone
neutron  Fix Released  High        Ihar Hrachyshka
tripleo  Fix Released  Critical    Ben Nemec

Bug Description

All of our overcloud jobs are failing this way now. The symptoms appear to be the same as https://bugs.launchpad.net/bugs/1417026, but that bug is marked as fixed. It looks like a bunch more stuff just merged to ironic, so chances are one of those changes is the culprit. Will try some temporary reverts to figure out which.

Ben Nemec (bnemec) wrote :

I tried the reverts locally, but the problem doesn't appear to be caused by any of the changes I suspected.

Locally, I get a "file not found" error when the instance tries to PXE boot the deploy ramdisk (I think it's the deploy ramdisk, anyway). I haven't been able to figure out why. The files appear to be in the proper locations on the TFTP server, and I can download them from the devtest host just fine.

Ben Nemec (bnemec) wrote :

Still not sure what's going on, but the boot error I get seems to be this: http://ipxe.org/err/2d03e1

That suggests the DHCP server is no longer providing the necessary PXE boot information. I can confirm that a tcpdump I ran on br-ctlplane in the seed captured all the DHCP traffic, and none of it contained boot server entries.

Will continue on this tomorrow if it isn't solved by then.

Derek Higgins (derekh) wrote :

Looks like 2 problems have snuck in. Reverting this neutron commit allows the overcloud servers to boot, but we then hit a problem completing the heat stack:
    https://review.openstack.org/#/c/141044/

Derek Higgins (derekh) wrote :

I've done a little debugging into both problems. With the patch https://review.openstack.org/#/c/141044/8 applied, the DHCP response back to the node is missing these two options:
    Option: (67) Bootfile name
        Length: 11
        Bootfile name: pxelinux.0
    Option: (66) TFTP Server Name
        Length: 10
        TFTP Server Name: 192.0.2.1
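Both missing entries are ordinary DHCP options, which RFC 2132 encodes as type-length-value triples: one byte for the option code, one byte for the length, then the value. The lengths in the capture above (11 for the 10-character `pxelinux.0`, 10 for the 9-character `192.0.2.1`) suggest each value is sent with a trailing NUL byte. The following is a minimal Python sketch, assuming that NUL-terminated encoding, of how those two options are laid out and how one might check a reply for them. The helper names (`encode_option`, `parse_options`, `missing_pxe_options`) are hypothetical, and the sketch works only on the options region that follows the DHCP magic cookie, not on a full captured packet.

```python
from __future__ import annotations

# Option codes from RFC 2132.
TFTP_SERVER_NAME = 66
BOOTFILE_NAME = 67
PAD, END = 0, 255


def encode_option(code: int, value: str) -> bytes:
    """Encode one option as code, length, value (assumes a NUL-terminated string)."""
    data = value.encode("ascii") + b"\x00"
    if len(data) > 255:
        raise ValueError("DHCP option value too long")
    return bytes([code, len(data)]) + data


def parse_options(buf: bytes) -> dict[int, bytes]:
    """Walk an options region and return {code: raw value}."""
    opts: dict[int, bytes] = {}
    i = 0
    while i < len(buf):
        code = buf[i]
        if code == PAD:  # single-byte padding, no length field
            i += 1
            continue
        if code == END:  # end of options
            break
        if i + 1 >= len(buf):  # truncated: no length byte
            break
        length = buf[i + 1]
        opts[code] = buf[i + 2 : i + 2 + length]
        i += 2 + length
    return opts


def missing_pxe_options(buf: bytes) -> list[int]:
    """Return which of options 66/67 are absent from a reply's options region."""
    opts = parse_options(buf)
    return [c for c in (TFTP_SERVER_NAME, BOOTFILE_NAME) if c not in opts]
```

For example, a reply built from `encode_option(67, "pxelinux.0") + encode_option(66, "192.0.2.1") + b"\xff"` yields an empty list, while a reply with neither option yields `[66, 67]`, which is the situation seen in the capture with the patch applied.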

I've opened another bug for the heat problem:
https://bugs.launchpad.net/tripleo/+bug/1423126

Changed in neutron:
assignee: nobody → Derek Higgins (derekh)
status: New → In Progress
Derek Higgins (derekh) wrote :
Derek Higgins (derekh)
Changed in neutron:
assignee: Derek Higgins (derekh) → nobody
Ben Nemec (bnemec)
Changed in tripleo:
assignee: nobody → Ben Nemec (bnemec)
Changed in neutron:
assignee: nobody → Dariusz Smigiel (smigiel-dariusz)
Changed in neutron:
assignee: Dariusz Smigiel (smigiel-dariusz) → nobody
Ihar Hrachyshka (ihar-hrachyshka) wrote :

The regression breaks Ironic, raising severity to High.

Changed in neutron:
milestone: none → kilo-3
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/162260

Changed in neutron:
assignee: nobody → Ihar Hrachyshka (ihar-hrachyshka)
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/156853
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=0dec0aca4e4d12c38ee034c16a84433814bcbc74
Submitter: Jenkins
Branch: master

commit 0dec0aca4e4d12c38ee034c16a84433814bcbc74
Author: Derek Higgins <email address hidden>
Date: Wed Feb 18 11:46:55 2015 +0000

    Revert "Add the rebinding chance in _bind_port_if_needed"

    This reverts commit 67e45d324af39a66fbd4ad175e410407b2720e68.

    This commit caused a regression in tripleo-ci where some dhcp
    options ended up missing (tftp options).

    Closes-Bug: #1421835
    Change-Id: Ibe68eceb2f5a36cf40cc1c378c1a59a35bfcbf7f

Changed in neutron:
status: In Progress → Fix Committed
James Slagle (james-slagle) wrote :

Need to remove the cherry-pick fix in CI: https://review.openstack.org/#/c/162212/

Changed in tripleo:
status: Triaged → Fix Committed
Thierry Carrez (ttx)
Changed in neutron:
status: Fix Committed → Fix Released
Jay Dobies (jdob)
Changed in tripleo:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in neutron:
milestone: kilo-3 → 2015.1.0