Instance boot fails if image cache is full

Bug #1439012 reported by Chris Buccella
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Confirmed
Importance: Low
Assigned to: Unassigned

Bug Description

If a compute node's image cache is full, booting an instance from an image that is not in the cache fails with a "no valid host" error, because the new instance's image can't be copied into the cache. This occurs even if there are no instances running on the compute node. The user's only recourse is to wait for the periodic cleanup of the cache or to have an admin clean up the cache manually.

The fix would be to immediately remove unused images older than remove_unused_original_minimum_age_seconds from the cache whenever a new image needs to be copied into the cache.
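
As a rough illustration of the proposed behaviour (not nova's actual image cache code), the sketch below evicts unused images on demand. The names CachedImage and make_room, and the idea of tracking a last_used timestamp per image, are assumptions made for the example; min_age_seconds stands in for remove_unused_original_minimum_age_seconds.

```python
import time
from dataclasses import dataclass


@dataclass
class CachedImage:
    """Hypothetical record for one image in the cache."""
    name: str
    size: int          # size in arbitrary units
    last_used: float   # timestamp of the last instance using it
    in_use: bool = False


def make_room(cache, capacity, needed, min_age_seconds, now=None):
    """Evict unused, sufficiently old images until `needed` units fit.

    `cache` maps image name -> CachedImage. Returns the names evicted.
    Nothing is evicted if enough space cannot be freed.
    """
    now = time.time() if now is None else now
    free = capacity - sum(img.size for img in cache.values())

    # Only images that no instance uses and that exceed the minimum age
    # are eligible; the least recently used ones go first.
    candidates = sorted(
        (img for img in cache.values()
         if not img.in_use and now - img.last_used >= min_age_seconds),
        key=lambda img: img.last_used,
    )

    to_evict = []
    for img in candidates:
        if free >= needed:
            break
        to_evict.append(img.name)
        free += img.size

    if free < needed:
        raise RuntimeError("image cache full; nothing more can be evicted")

    for name in to_evict:
        del cache[name]
    return to_evict
```

Planning the evictions before deleting anything means a request that cannot be satisfied leaves the cache untouched.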

Eli Qiao (taget-9)
Changed in nova:
assignee: nobody → Eli Qiao (taget-9)
Davanum Srinivas (DIMS) (dims-v) wrote :

Chris,

I agree with the fix, but I'm still curious what exactly the traceback is that you see in the api/compute/cond logs just before this "no valid host" error.

Changed in nova:
status: New → Confirmed
importance: Undecided → Low
Michael Still (mikal) wrote :

So, I don't think this cleanup can be prompted by the boot request -- the boot request never made it back to the hypervisor, as the hypervisor has reported that it's out of disk and been filtered out of the list of valid hosts.

Instead, I think we're saying that nova-compute could tweak the frequency of the cleanup periodic task as the local disk starts to fill up. That would have strange side effects on hypervisor nodes running with shared storage though (as the shared storage fills up, all of the hypervisors using that storage start running the cleanup loop more frequently, even if they have nothing to clean up). This is also a change of "user interface" in that the periodic task frequency is currently set by an integer flag value.
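
To make the idea concrete, here is a minimal sketch of a cleanup interval that shrinks as the disk fills. It is only an illustration under stated assumptions: cleanup_interval, used_fraction, the 80% threshold, and the 60-second floor are all hypothetical and not part of nova; only shutil.disk_usage is a real standard-library call.

```python
import shutil


def used_fraction(path):
    """Fraction of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def cleanup_interval(base_seconds, fraction_used, min_seconds=60, threshold=0.8):
    """Return how long to wait before the next cache cleanup.

    Below `threshold` the configured base interval is used unchanged.
    Above it, the interval shrinks linearly and reaches `min_seconds`
    when the disk is completely full.
    """
    fraction_used = min(max(fraction_used, 0.0), 1.0)
    if fraction_used <= threshold:
        return base_seconds
    # 1.0 at the threshold, 0.0 when the disk is full.
    remaining = (1.0 - fraction_used) / (1.0 - threshold)
    return round(min_seconds + (base_seconds - min_seconds) * remaining)
```

On shared storage every hypervisor would compute the same shortened interval, which is exactly the side effect described above.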

Eli Qiao (taget-9)
Changed in nova:
assignee: Eli Qiao (taget-9) → nobody
Changed in nova:
assignee: nobody → Zhenzan Zhou (zhenzan-zhou)
tags: added: image-cache
Markus Zoeller (markus_z) (mzoeller) wrote :

This bug report has had an assignee for over a month, but there is no patch
for it. The chance of getting a patch looks low.
I'm going to remove the assignee to signal to others that they can take
over if they like.

If you want to work on this, please:
* add yourself as assignee AND
* set the status to "In Progress" AND
* provide a (WIP) patch within the next 2 weeks after that.

If you need assistance, reach out on the IRC channel #openstack-nova or
use the mailing list.

Changed in nova:
assignee: Zhenzan Zhou (zhenzan-zhou) → nobody
Changed in nova:
assignee: nobody → karthik (karthik-kalakodimi)
status: Confirmed → In Progress
karthik (karthik-kalakodimi) wrote :

Chris,

Can you please provide the exact reproduction steps? I also need to know how many VMs you created, which images you used, and which flavors you selected for each VM.

Chris Buccella (chris-buccella) wrote :

Karthik,

Preconditions:

- Image cache size = x
- ImageA size = y
- ImageB size = z
- x < y+z

1) Disable all but 1 nova-compute host
2) Boot a VM using ImageA
3) Delete VM
4) Attempt to boot a VM using ImageB

The flavor used is irrelevant.
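
The preconditions can be modelled with a toy cache in which deleting a VM does not free its cached image. This is a sketch only: the ImageCache class and the concrete sizes (x = 10, y = 6, z = 7) are made up for illustration and are not taken from nova.

```python
class ImageCache:
    """Toy model of a size-limited image cache on one compute host."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.images = {}  # image name -> size

    def boot(self, image, size):
        """Copy the image into the cache if absent; fail if it does not fit."""
        if image in self.images:
            return
        if sum(self.images.values()) + size > self.capacity:
            raise RuntimeError(f"no room in cache for {image}")
        self.images[image] = size

    def delete_vm(self, image):
        """Deleting a VM leaves its image cached until the periodic cleanup."""


def reproduce(x=10, y=6, z=7):
    """Return True if the second boot fails, given x < y + z."""
    cache = ImageCache(capacity=x)
    cache.boot("ImageA", y)      # first image is cached
    cache.delete_vm("ImageA")    # image stays in the cache
    try:
        cache.boot("ImageB", z)  # does not fit next to ImageA
    except RuntimeError:
        return True
    return False
```

With sizes where x >= y + z, reproduce returns False, since both images fit in the cache side by side.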

karthik (karthik-kalakodimi) wrote :

Chris,

It looks like "no valid host" is a generic error.

Can you please provide the logs you saw when the error occurred?

How do you determine the cache size?

Sean Dague (sdague) wrote :

There are currently no open reviews on this bug, so I am changing
the status back to the previous state and unassigning. If
there are active reviews related to this bug, please include
links in comments.

Changed in nova:
status: In Progress → Confirmed
assignee: karthik (karthik-kalakodimi) → nobody