resource leak when launching pci instance on host that doesn't have enough pci resources

Bug #1482019 reported by Rui Chen
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
OpenStack Compute (nova) | Fix Released | Undecided | Rui Chen |

Bug Description

I forced an instance with a PCI request to boot on a specific host, but that host had no PCI devices. For example:

 nova boot --image cirros-0.3.2-x86_64-disk --nic net-id=d9eee163-f148-4244-92c5-ffda7d9db06a --flavor chenrui_f --availability ::devstack chenrui_pci

An exception is raised from self.pci_stats.apply_requests in HostState.consume_from_instance.

https://github.com/openstack/nova/blob/master/nova/pci/stats.py#L234

By that point, however, part of the host's compute resources (RAM, disk, vCPUs and so on) has already been consumed, and there is no rollback logic to release those resources when the exception is raised. I think this is a resource leak.
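To make the failure mode concrete, here is a minimal Python sketch of the sequence described above. It is not the nova code: ToyHostState and PciRequestFailed are hypothetical stand-ins, and the per-request sizes (512 RAM, 1024 disk, 1 vCPU) are assumed values chosen for illustration.

```python
class PciRequestFailed(Exception):
    """Hypothetical stand-in for the failure raised by apply_requests."""


class ToyHostState:
    """Hypothetical, heavily simplified model of the scheduler's host view."""

    def __init__(self, free_ram_mb, free_disk_mb, vcpus_used, pci_devices):
        self.free_ram_mb = free_ram_mb
        self.free_disk_mb = free_disk_mb
        self.vcpus_used = vcpus_used
        self.pci_devices = pci_devices

    def consume_from_instance(self, ram_mb, disk_mb, vcpus, pci_count):
        # RAM, disk and vCPUs are deducted first ...
        self.free_ram_mb -= ram_mb
        self.free_disk_mb -= disk_mb
        self.vcpus_used += vcpus
        # ... and only then is the PCI request applied, with no rollback.
        if pci_count > self.pci_devices:
            raise PciRequestFailed("not enough PCI devices")
        self.pci_devices -= pci_count


# A host without PCI devices receives three forced requests that need one each.
host = ToyHostState(free_ram_mb=14509, free_disk_mb=76800,
                    vcpus_used=0, pci_devices=0)
for _ in range(3):
    try:
        host.consume_from_instance(ram_mb=512, disk_mb=1024,
                                   vcpus=1, pci_count=1)
    except PciRequestFailed:
        pass  # the boot fails, but the deductions made above are kept

print(host.free_ram_mb, host.free_disk_mb, host.vcpus_used)
```

Every failed request leaves the in-memory view smaller than before, even though no instance was actually placed on the host.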

I booted 12 instances. The following is from the nova-scheduler log; you can see the available resources decreasing with every request. In the end I had to restart nova-scheduler, otherwise I could not boot any more instances.

stack@devstack:/opt/stack/logs$ tailf screen-n-sch.log | fgrep 'Selected host: WeighedHost'
2015-05-11 15:54:45.735 DEBUG nova.scheduler.filter_scheduler [req-11dcc5ee-586a-472f-afa0-260c282676e3 admin admin] Selected host: WeighedHost [host: (devstack, devstack) ram:14509 disk:76800 io_ops:0 instances:2, weight: 0.965914386526] _schedule /opt/stack/nova/nova/scheduler/filter_scheduler.py:158
2015-05-11 15:54:53.620 DEBUG nova.scheduler.filter_scheduler [req-a88af594-2633-4527-8d8b-4db8feef7489 admin admin] Selected host: WeighedHost [host: (devstack, devstack) ram:13997 disk:75776 io_ops:0 instances:3, weight: 0.931828773051] _schedule /opt/stack/nova/nova/scheduler/filter_scheduler.py:158
2015-05-11 15:54:58.849 DEBUG nova.scheduler.filter_scheduler [req-8a79ad56-eb1b-4bc8-8573-d387bfc38184 admin admin] Selected host: WeighedHost [host: (devstack, devstack) ram:13485 disk:74752 io_ops:0 instances:4, weight: 0.897743159577] _schedule /opt/stack/nova/nova/scheduler/filter_scheduler.py:158
2015-05-11 15:55:05.956 DEBUG nova.scheduler.filter_scheduler [req-e2a3577a-e739-406b-957a-3bc8fc16a7d8 admin admin] Selected host: WeighedHost [host: (devstack, devstack) ram:12973 disk:73728 io_ops:0 instances:5, weight: 0.863657546102] _schedule /opt/stack/nova/nova/scheduler/filter_scheduler.py:158
2015-05-11 15:55:10.868 DEBUG nova.scheduler.filter_scheduler [req-6f943265-dfc7-473a-a9df-3e078c7abb08 admin admin] Selected host: WeighedHost [host: (devstack, devstack) ram:12461 disk:72704 io_ops:0 instances:6, weight: 0.829571932628] _schedule /opt/stack/nova/nova/scheduler/filter_scheduler.py:158
2015-05-11 15:55:43.500 DEBUG nova.scheduler.filter_scheduler [req-e171dcfd-373e-4ff9-b7de-e8d8d977b727 admin admin] Selected host: WeighedHost [host: (devstack, devstack) ram:11949 disk:71680 io_ops:0 instances:7, weight: 0.795486319153] _schedule /opt/stack/nova/nova/scheduler/filter_scheduler.py:158
2015-05-11 15:55:55.551 DEBUG nova.scheduler.filter_scheduler [req-522f9d71-35ed-44bb-b308-d3f78374c24e admin admin] Selected host: WeighedHost [host: (devstack, devstack) ram:11437 disk:70656 io_ops:0 instances:8, weight: 0.761400705679] _schedule /opt/stack/nova/nova/scheduler/filter_scheduler.py:158
2015-05-11 15:56:13.723 DEBUG nova.scheduler.filter_scheduler [req-106cccfb-4778-4eb7-90d8-a97d4a62de8c admin admin] Selected host: WeighedHost [host: (devstack, devstack) ram:10925 disk:69632 io_ops:0 instances:9, weight: 0.727315092204] _schedule /opt/stack/nova/nova/scheduler/filter_scheduler.py:158
2015-05-11 15:57:43.972 DEBUG nova.scheduler.filter_scheduler [req-c054d26e-ca44-4375-991c-531418791806 admin admin] Selected host: WeighedHost [host: (devstack, devstack) ram:10413 disk:68608 io_ops:0 instances:10, weight: 0.69322947873] _schedule /opt/stack/nova/nova/scheduler/filter_scheduler.py:158
2015-05-11 15:57:54.557 DEBUG nova.scheduler.filter_scheduler [req-92684590-df86-4c0e-a359-6f661ee0cd23 admin admin] Selected host: WeighedHost [host: (devstack, devstack) ram:9901 disk:67584 io_ops:0 instances:11, weight: 0.659143865255] _schedule /opt/stack/nova/nova/scheduler/filter_scheduler.py:158
2015-05-11 15:58:24.918 DEBUG nova.scheduler.filter_scheduler [req-eb7443d4-8617-4986-8d33-c8f44646d769 admin admin] Selected host: WeighedHost [host: (devstack, devstack) ram:9389 disk:66560 io_ops:0 instances:12, weight: 0.625058251781] _schedule /opt/stack/nova/nova/scheduler/filter_scheduler.py:158
2015-05-11 15:59:53.188 DEBUG nova.scheduler.filter_scheduler [req-416e2d3b-a601-463b-948e-c6fe27341398 admin admin] Selected host: WeighedHost [host: (devstack, devstack) ram:8877 disk:65536 io_ops:0 instances:13, weight: 0.590972638306] _schedule /opt/stack/nova/nova/scheduler/filter_scheduler.py:158
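
The size of the leak per request can be read directly from these lines. The following Python sketch is only a quick check, not part of nova: it takes the ram/disk/instances fragments copied from the twelve log lines above and computes the difference between consecutive entries. The regular expression and variable names are my own.

```python
import re

# Fragments copied from the twelve "Selected host" log lines above.
FRAGMENTS = """\
ram:14509 disk:76800 io_ops:0 instances:2
ram:13997 disk:75776 io_ops:0 instances:3
ram:13485 disk:74752 io_ops:0 instances:4
ram:12973 disk:73728 io_ops:0 instances:5
ram:12461 disk:72704 io_ops:0 instances:6
ram:11949 disk:71680 io_ops:0 instances:7
ram:11437 disk:70656 io_ops:0 instances:8
ram:10925 disk:69632 io_ops:0 instances:9
ram:10413 disk:68608 io_ops:0 instances:10
ram:9901 disk:67584 io_ops:0 instances:11
ram:9389 disk:66560 io_ops:0 instances:12
ram:8877 disk:65536 io_ops:0 instances:13
"""

PATTERN = re.compile(r"ram:(\d+) disk:(\d+) io_ops:\d+ instances:(\d+)")

# One (ram, disk, instances) tuple per log line.
samples = [tuple(int(g) for g in m.groups())
           for m in PATTERN.finditer(FRAGMENTS)]

# Change between consecutive lines: (ram lost, disk lost, instances gained).
deltas = {(prev[0] - cur[0], prev[1] - cur[1], cur[2] - prev[2])
          for prev, cur in zip(samples, samples[1:])}

print(deltas)  # {(512, 1024, 1)}
```

Every request lowers the logged ram value by 512 and the disk value by 1024 and raises the instance count by one, so the decrease is constant per request.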

Code base:

$ git log -1
commit c6a19f36e2ed0addd154c4d8361cd82fa8f790b9
Merge: 7e76b89 c10f8a1
Author: Jenkins <email address hidden>
Date: Wed Aug 5 05:09:26 2015 +0000

    Merge "libvirt: move LibvirtAOEVolumeDriver into it's own module"

Changed in nova:
assignee: nobody → Rui Chen (kiwik-chenrui)
status: New → In Progress
Rui Chen (kiwik-chenrui) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/182165

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/182165
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=26a4f430d1ec45f7b9a61b69f009a705d2d3cd52
Submitter: Jenkins
Branch: master

commit 26a4f430d1ec45f7b9a61b69f009a705d2d3cd52
Author: Rui Chen <email address hidden>
Date: Mon May 11 17:45:04 2015 +0800

    Fix resource leaking when consume_from_instance raise exception

    When we boot instance with force host, and the instance
    request some pci devices, all the scheduler filters
    would been ignored, include: PciPassthroughFilter.
    So if the specified host can't apply the pci request,
    PciDeviceRequestFailed exception would been raised
    from HostState.consume_from_instance, at this time
    the part of compute resource had been updated,
    e.g. vcpus, ram, disk and so on, we should make
    the resource of HostState having chance to been reverted
    in order to avoid resource leaking.

    When the exception is raised from self.pci_stats.apply_requests(),
    catch it and don't set the update field of HostState,
    HostState will be synced with db in next scheduling,
    the resource is reverted at that time.

    Change-Id: I0a87088d56337bcd180606013a5473fa2ec6c608
    Related-Bug: #1391816
    Closes-Bug: #1482019
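
The approach in the commit message above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the merged patch: ToyHostState, PciRequestFailed and the set_updated_on_success decorator are hypothetical names, and the rule that a database record is skipped when the in-memory state is newer is an assumption about how the sync behaves.

```python
import functools
from datetime import datetime, timezone


class PciRequestFailed(Exception):
    """Hypothetical stand-in for the PCI failure named in the commit message."""


def set_updated_on_success(method):
    """Hypothetical helper: stamp `updated` only if the call did not fail."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        try:
            result = method(self, *args, **kwargs)
        except PciRequestFailed:
            # Caught on purpose: `updated` stays as it was, so the next
            # sync from the database overwrites the partial deduction.
            return None
        self.updated = datetime.now(timezone.utc)
        return result
    return wrapper


class ToyHostState:
    """Hypothetical, simplified model of the scheduler's host view."""

    def __init__(self):
        self.free_ram_mb = 0
        self.updated = None  # time of the last successful consumption

    def update_from_compute_node(self, db_free_ram_mb, db_updated_at):
        # Assumed rule: ignore the DB record if the in-memory view is newer.
        if self.updated and self.updated > db_updated_at:
            return
        self.free_ram_mb = db_free_ram_mb

    @set_updated_on_success
    def consume_from_instance(self, ram_mb, pci_ok):
        self.free_ram_mb -= ram_mb
        if not pci_ok:
            # Stand-in for a failing self.pci_stats.apply_requests().
            raise PciRequestFailed("PCI request cannot be applied")


db_time = datetime(2015, 5, 11, 15, 54, tzinfo=timezone.utc)
host = ToyHostState()
host.update_from_compute_node(14509, db_time)

host.consume_from_instance(512, pci_ok=False)  # failure: RAM already deducted
after_failure = host.free_ram_mb
host.update_from_compute_node(14509, db_time)  # next scheduling run re-syncs
after_resync = host.free_ram_mb

host.consume_from_instance(512, pci_ok=True)   # success: `updated` is stamped
host.update_from_compute_node(14509, db_time)  # older DB record is skipped
after_success = host.free_ram_mb

print(after_failure, after_resync, after_success)  # 13997 14509 13997
```

In this model the failed request still deducts RAM for a moment, but because the timestamp is not advanced, the next sync restores the value from the database. A successful request keeps its deduction until a newer database record arrives.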

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → liberty-3
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: liberty-3 → 12.0.0