Introspection timeouts for bulks of nodes on VMs

Bug #1473024 reported by Dmitry Tantsur
This bug affects 1 person
Affects: Ironic Inspector
Status: Fix Released
Importance: High
Assigned to: Dmitry Tantsur

Bug Description

The KVM PXE firmware seems buggy. In short, when multiple nodes PXE-boot at the same time, some nodes ignore NACKs from dnsmasq and keep requesting an already occupied IP address until they time out.

We probably have to work around it. Currently we insert a sleep 5, which is definitely a broken approach. Maybe we can reboot nodes if we don't get an answer after some time.

Actually, we can do a couple of things:
1. Optionally prevent nodes from going on introspection too often by tracking the last introspection time
2. Retry introspection if we don't get a response after some time
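
For illustration only, the two ideas above could be sketched roughly as follows. This is not ironic-inspector code: the names IntrospectionScheduler, may_start, mark_started and introspect_with_retry are hypothetical, and the start and got_response callables stand in for whatever actually powers on a node and checks whether it reported back.

```python
import time
from typing import Callable, Dict


class IntrospectionScheduler:
    """Hypothetical helper: remembers when each node last started introspection."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic):
        self.min_interval = min_interval
        self.clock = clock
        self._last_started: Dict[str, float] = {}

    def may_start(self, node_id: str) -> bool:
        # Idea 1: refuse to start again if the node was introspected too recently.
        last = self._last_started.get(node_id)
        return last is None or self.clock() - last >= self.min_interval

    def mark_started(self, node_id: str) -> None:
        self._last_started[node_id] = self.clock()


def introspect_with_retry(start: Callable[[], None],
                          got_response: Callable[[], bool],
                          attempts: int,
                          wait_timeout: float,
                          poll_interval: float = 1.0,
                          clock: Callable[[], float] = time.monotonic,
                          sleep: Callable[[float], None] = time.sleep) -> bool:
    # Idea 2: (re)start introspection and wait for a response; if none
    # arrives within wait_timeout, start again, up to `attempts` times.
    for _ in range(attempts):
        start()
        deadline = clock() + wait_timeout
        while clock() < deadline:
            if got_response():
                return True
            sleep(poll_interval)
    return False
```

The clock and sleep parameters are injectable only so the sketch can be exercised without real waiting; a real workaround would also need to decide how a retry interacts with the per-node minimum interval.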

Dmitry Tantsur (divius)
description: updated
Changed in ironic-inspector:
assignee: nobody → Dmitry Tantsur (divius)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ironic-inspector (master)

Fix proposed to branch: master
Review: https://review.openstack.org/203040

Changed in ironic-inspector:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to ironic-inspector (master)

Reviewed: https://review.openstack.org/203040
Committed: https://git.openstack.org/cgit/openstack/ironic-inspector/commit/?id=f15aee4a7f8531eb88a78789e3765858998c409a
Submitter: Jenkins
Branch: master

commit f15aee4a7f8531eb88a78789e3765858998c409a
Author: Dmitry Tantsur <email address hidden>
Date: Fri Jul 17 15:55:34 2015 +0200

    Insert artificial delay between sending virtual nodes on introspection

    KVM PXE code seems broken in an interesting way, when you try to PXE
    boot too many nodes. This change makes inspector sleep configurable
    amount of time between powering on nodes with *_ssh driver.

    Work around in devstack/exercise.sh is no longer needed and is dropped.

    Note that this change is not HA, so we might revisit it in the future.

    Change-Id: I9b16592f9b5130e90c02fce1b421887f451e397b
    Closes-Bug: #1473024
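
As a rough sketch of what the commit message describes (a configurable pause between powering on virtual nodes), something like the following could work. It is not the merged change itself: PowerOnThrottle, wait_turn, power_on_all and the needs_delay predicate are hypothetical names, and the state is kept in a single process, which is consistent with the commit's caveat that the approach is not HA.

```python
import threading
import time
from typing import Callable, Iterable, Optional


class PowerOnThrottle:
    """Hypothetical helper: enforces a minimum gap between power-on calls."""

    def __init__(self, delay: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.delay = delay
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()   # serializes callers within one process
        self._last: Optional[float] = None

    def wait_turn(self) -> None:
        # Sleep while holding the lock so concurrent callers queue up
        # and each one is spaced at least `delay` seconds after the last.
        with self._lock:
            if self._last is not None:
                remaining = self.delay - (self._clock() - self._last)
                if remaining > 0:
                    self._sleep(remaining)
            self._last = self._clock()


def power_on_all(nodes: Iterable[str],
                 power_on: Callable[[str], None],
                 throttle: PowerOnThrottle,
                 needs_delay: Callable[[str], bool]) -> None:
    # Only nodes matching needs_delay (e.g. virtual nodes) are throttled.
    for node in nodes:
        if needs_delay(node):
            throttle.wait_turn()
        power_on(node)
```

Because the last power-on time lives in process memory, two inspector processes would not coordinate with each other; sharing that timestamp through external storage would be one way to revisit this.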

Changed in ironic-inspector:
status: In Progress → Fix Committed
Dmitry Tantsur (divius)
Changed in ironic-inspector:
status: Fix Committed → Fix Released