Locate and clean up orphaned VDIs in XenServer
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Fix Released | High | Josh Kearney |
Bug Description
During instance provisioning, if an exception stops the build and the failed instance is then deleted, the files pulled from Glance are never cleaned up.
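One way to close the leak at its source is to record every VDI as soon as it is created during a build and destroy those VDIs if a later step raises. This is a minimal sketch, not nova's actual spawn code: `fetch_image`, `boot` and `destroy_vdi` are hypothetical hooks standing in for the Glance download, VM creation and the XenAPI `VDI.destroy` call.

```python
from typing import Callable, List


def spawn_with_cleanup(
    fetch_image: Callable[[List[str]], None],
    boot: Callable[[List[str]], None],
    destroy_vdi: Callable[[str], None],
) -> List[str]:
    """Run a build and destroy any VDIs it created if the build fails.

    All three callables are hypothetical stand-ins for the real XenServer steps.
    """
    created: List[str] = []
    try:
        # fetch_image appends each VDI ref to `created` as soon as it exists,
        # so a failure halfway through the download is still cleaned up.
        fetch_image(created)
        boot(created)
    except Exception:
        # Destroy in reverse creation order; keep going if one destroy fails,
        # then re-raise the original build error (e.g. FixedIpNotFound).
        for ref in reversed(created):
            try:
                destroy_vdi(ref)
            except Exception:
                pass  # anything left over is for a periodic sweep to catch
        raise
    return created
```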
Over time these leftovers stack up and become a very large problem, because a lot of junk disks accumulate. From what I observed, the disk image is downloaded to the host machine and has been scanned into the SR, since a VDI record exists for it; it is simply never removed. We should also add the instance ID to the VDI name-description so that we can track which VDIs are associated with which instances. At this point, there is no good way to track down and clean up this cruft from failed builds.
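A minimal sketch of the tagging idea, assuming a hypothetical `nova-instance:<id>` token appended to the VDI name-description; the format nova actually adopted may differ. The resulting string would be written with XenAPI's `VDI.set_name_description`, and the parser recovers the owning instance later.

```python
import re
from typing import Optional

# Hypothetical tag format, chosen for illustration only.
_TAG_PREFIX = "nova-instance:"
_TAG_RE = re.compile(r"nova-instance:(\S+)")


def make_vdi_description(instance_id: str, base: str = "") -> str:
    """Append the owning instance ID to an existing name-description."""
    return f"{base} {_TAG_PREFIX}{instance_id}".strip()


def instance_id_from_description(description: str) -> Optional[str]:
    """Return the tagged instance ID, or None if the VDI was never tagged."""
    match = _TAG_RE.search(description)
    return match.group(1) if match else None
```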
For example:
uuid ( RO) : 205b5447-
name-
For reference, I'm running revision 1265. The exception I ran into was: (nova): TRACE: RemoteError: FixedIpNotFound
Changed in nova:
assignee: nobody → Brian Waldon (bcwaldon)
Changed in nova:
assignee: Brian Waldon (bcwaldon) → nobody
Changed in nova:
status: Confirmed → In Progress
assignee: nobody → Josh Kearney (jk0)
Changed in nova:
importance: Medium → Critical
importance: Critical → High
summary: Disk Clean up on Build Failure in XenServer → Locate and clean up orphaned VDIs in XenServer
Changed in nova:
milestone: none → essex-1
Changed in nova:
status: Fix Committed → Fix Released
Changed in nova:
milestone: essex-1 → 2012.1
As I see it, we have a couple of options:
1) Clean up all orphaned disks periodically.
2) Provide Admin API calls to list orphaned disks and delete said disks.
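Both options rest on the same building block: deciding which VDIs are orphaned. This sketch assumes VDI records shaped like the output of XenAPI's `VDI.get_all_records()` (a dict mapping each VDI ref to a record with `name_description` and `VBDs` fields), the same hypothetical `nova-instance:<id>` tag, and a collection of instance IDs that still exist in the nova database. Untagged VDIs and any VDI still attached through a VBD are left alone.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical tag format, as assumed for the name-description.
_TAG_RE = re.compile(r"nova-instance:(\S+)")


def find_orphaned_vdis(
    vdi_records: Dict[str, dict],
    live_instance_ids: Iterable[str],
) -> List[str]:
    """Return refs of tagged, unattached VDIs whose instance no longer exists."""
    live = set(live_instance_ids)
    orphans = []
    for ref, record in vdi_records.items():
        match = _TAG_RE.search(record.get("name_description", ""))
        if match is None:
            continue  # not created by nova (or never tagged): skip it
        if record.get("VBDs"):
            continue  # still plugged into some VM: not safe to touch
        if match.group(1) not in live:
            orphans.append(ref)
    return sorted(orphans)
```

The same list could feed either option: a periodic task would destroy the results, while an Admin API call would just return them.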
I'm a fan of providing this via the Admin API, because then any operations team can decide what to do with that information. We don't have an admin API client yet (is that something we need?).
I don't love the periodic task strategy (it feels like bailing out water when you could be finding the leak), but it might be prudent to write a periodic task now, then file a blueprint to convert it to an Admin API call plus an admin API client that could be run from a cron job?
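If the periodic task comes first, one hedge against racing a build or delete that is still in flight is a grace period: only destroy a VDI after it has been reported orphaned for some time. This is a hypothetical sketch; `first_seen` is an in-memory dict that a cron-run client would have to persist between runs (for example in a file), and `dry_run=True` only reports, in the spirit of letting operators decide.

```python
import time
from typing import Callable, Dict, List, Optional


def sweep_orphans(
    orphan_refs: List[str],
    first_seen: Dict[str, float],
    destroy_vdi: Callable[[str], None],
    grace_seconds: float = 3600.0,
    dry_run: bool = True,
    now: Optional[float] = None,
) -> List[str]:
    """Return (and unless dry_run, destroy) orphans older than grace_seconds."""
    now = time.time() if now is None else now
    current = set(orphan_refs)
    # Forget refs that are no longer reported as orphaned.
    for ref in list(first_seen):
        if ref not in current:
            del first_seen[ref]
    expired = []
    for ref in sorted(current):
        first_seen.setdefault(ref, now)  # start the clock on first sighting
        if now - first_seen[ref] >= grace_seconds:
            expired.append(ref)
    if not dry_run:
        for ref in expired:
            destroy_vdi(ref)
            del first_seen[ref]
    return expired
```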