_build_instance does not have a recoverable network_api.allocate_for_instance

Bug #1173417 reported by Joshua Harlow
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Wishlist
Assigned to: jichenjc
Milestone: (none)

Bug Description

It appears that if allocating an instance's network fails due to a race condition at the network_api layer, the instance is set to the error state instead of recovering from the failure gracefully (retrying could be one solution, for example). Looking at the NetworkManager code, if a race happens at the code level, an exception is passed back, which stops the instance from continuing to build.

Michael Still (mikal)
Changed in nova:
status: New → Triaged
importance: Undecided → Wishlist
jichenjc (jichenjc)
Changed in nova:
assignee: nobody → jichenjc (jichenjc)
jichenjc (jichenjc) wrote :

commit f0cf1c0fc14ba44ae6af5aad93ccd2fe010094a5
may already fix this bug

jichenjc (jichenjc) wrote :

Allow retrying network allocations separately

    Introduce a new config option, 'network_allocate_retries', that allows
    one to retry network allocations. The default is 0 for no retries to
    match the current behavior.

    The network allocations currently get retried by a full retry of a build
    via the scheduler, if those are enabled. This patch reduces the need to
    re-schedule for simple network allocation issues.

    The retrying happens in the network alloc async greenthread, so for virt
    drivers that support the new NetworkModel, the retrying potentially
    happens in the background while the image is being downloaded, etc.
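
A minimal sketch of the retry behavior described in the commit message above, not the actual nova code. Only the option name 'network_allocate_retries' and its default of 0 come from the commit message. The function allocate_with_retries, the exception AllocationRaceError, and the doubling delay between attempts are assumptions made for illustration. The sketch also leaves out the async greenthread and runs the retries inline.

```python
import time


class AllocationRaceError(Exception):
    """Hypothetical stand-in for the exception raised when allocation races."""


def allocate_with_retries(allocate, retries=0, base_delay=1.0,
                          max_delay=30.0, sleep=time.sleep):
    """Call allocate() and retry up to `retries` extra times on failure.

    `retries` plays the role of 'network_allocate_retries'. With the
    default of 0 the first failure is re-raised immediately, which
    matches the behavior reported in this bug.
    The doubling delay is an assumption, not taken from the commit.
    """
    attempts = retries + 1
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return allocate()
        except AllocationRaceError:
            if attempt == attempts:
                # Out of attempts: let the caller fail the build.
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay)


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_allocate():
        # Fails twice, then succeeds, to imitate a transient race.
        state["calls"] += 1
        if state["calls"] < 3:
            raise AllocationRaceError("simulated race")
        return "network-info"

    # sleep is replaced with a no-op so the demo does not wait.
    result = allocate_with_retries(flaky_allocate, retries=2,
                                   sleep=lambda seconds: None)
    print(result, "after", state["calls"], "calls")
```

With retries=0 the same flaky allocator raises on its first call, so the build fails just as the bug description reports. Any non-zero value lets a transient race resolve without a full re-schedule.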

Changed in nova:
status: Triaged → Fix Released