Existing Ironic instances report negative available RAM for the node after upgrade

Bug #1502177 reported by Jim Rollenhagen
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: High
Assigned to: Jim Rollenhagen

Bug Description

Ironic nodes that have an existing instance will report negative available RAM after upgrading beyond this commit: https://github.com/openstack/nova/commit/b99fb0a51c658301188bbc729d1437a9c8b75d00

The node attached to the instance will not have instance_info[memory_mb] and the related keys set on the node object in Ironic. When this info is unset, the code from that commit causes the driver to report memory_mb_used = memory_mb = 0. The resource tracker then notices that an instance is on that node and sets memory_mb_used to X (the size of the instance), after which the node reports -X available memory.
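The arithmetic can be sketched as follows. This is a simplified illustration, not the actual Nova code: driver_report and tracker_available_mb are hypothetical names, and the 8192 MB instance size is an assumed value.

```python
def driver_report(instance_info):
    """Hypothetical stand-in for the affected driver logic.

    Reads memory_mb from instance_info and reports the node as fully
    used; when the key is missing, both values fall back to 0.
    """
    memory_mb = int(instance_info.get("memory_mb", 0))
    return {"memory_mb": memory_mb, "memory_mb_used": memory_mb}


def tracker_available_mb(report, instance_memory_mb):
    """Hypothetical stand-in for the resource tracker's correction.

    The tracker knows an instance lives on the node, so it overrides
    memory_mb_used with the instance size before computing free memory.
    """
    corrected_used = instance_memory_mb
    return report["memory_mb"] - corrected_used


# Node deployed before the upgrade: instance_info was never populated.
legacy_report = driver_report({})
legacy_available = tracker_available_mb(legacy_report, instance_memory_mb=8192)

# Node deployed after the upgrade: instance_info carries the size.
new_report = driver_report({"memory_mb": 8192})
new_available = tracker_available_mb(new_report, instance_memory_mb=8192)
```

Under these assumptions the legacy node ends up with -8192 MB available, while a node whose instance_info is populated ends up with 0.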

This can wreak havoc on tools that look at total available memory, ranging from capacity reporting tools to scheduler/cells filters. If more than half of the capacity has instances, the total memory available will be negative, which could cause instances not to schedule properly or could generate alerts.
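To see why "more than half" is the tipping point, here is a rough sketch. It assumes identically sized nodes and assumes every occupied node is affected by the bug; total_available_mb is a hypothetical helper, not part of Nova.

```python
def total_available_mb(node_memory_mb, total_nodes, occupied_nodes):
    """Sum available memory across a fleet of identically sized nodes.

    Assumption: each free node contributes +node_memory_mb, and each
    occupied (affected) node contributes -node_memory_mb.
    """
    free_nodes = total_nodes - occupied_nodes
    return free_nodes * node_memory_mb - occupied_nodes * node_memory_mb


below_half = total_available_mb(8192, total_nodes=10, occupied_nodes=4)
exactly_half = total_available_mb(8192, total_nodes=10, occupied_nodes=5)
above_half = total_available_mb(8192, total_nodes=10, occupied_nodes=6)
```

The total is (total_nodes - 2 * occupied_nodes) * node_memory_mb, so it reaches zero at exactly half occupancy and goes negative beyond that.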

tags: added: kilo-rc-potential
tags: added: liberty-rc-potential
removed: kilo-rc-potential
Changed in nova:
assignee: nobody → Jim Rollenhagen (jim-rollenhagen)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/230487
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=047da6498dbb3af71bcb9e6d0e2c38aa23b06615
Submitter: Jenkins
Branch: master

commit 047da6498dbb3af71bcb9e6d0e2c38aa23b06615
Author: Jim Rollenhagen <email address hidden>
Date: Thu Oct 1 21:12:53 2015 -0700

    Ironic: Fix bad capacity reporting if instance_info is unset

    If node.instance_info is unset for a node that has an instance on it
    (for example, when upgrading Nova from before the patch that set these
    values on spawn), the Ironic driver reports 0 for both $resource and
    $resource_used. The resource tracker will correct $resource_used, and
    resources available will be reported as a negative number.

    Correct this by reporting the original value if instance_info is unset.

    Closes-Bug: #1502177
    Change-Id: I13c5e5430fd305cd0ee2f24cd95304660ccf11eb
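
The fallback described in the commit message could look roughly like the sketch below. It is an illustration only, not the merged patch: report_memory is a hypothetical function, and treating the node's properties dict as the source of the "original value" is an assumption.

```python
def report_memory(properties, instance_info, has_instance):
    """Hypothetical sketch of capacity reporting with a fallback.

    properties    -- assumed to hold the node's own memory_mb
    instance_info -- may be empty for instances deployed before the upgrade
    has_instance  -- whether an instance is currently on the node
    """
    memory_mb = int(properties.get("memory_mb", 0))
    memory_mb_used = 0
    if has_instance:
        # Prefer the size recorded in instance_info; if it is unset,
        # keep the node's original value instead of reporting 0.
        memory_mb = int(instance_info.get("memory_mb", memory_mb))
        memory_mb_used = memory_mb
    return {"memory_mb": memory_mb, "memory_mb_used": memory_mb_used}


legacy = report_memory({"memory_mb": 8192}, {}, has_instance=True)
populated = report_memory({"memory_mb": 8192}, {"memory_mb": 4096}, has_instance=True)
empty_node = report_memory({"memory_mb": 8192}, {}, has_instance=False)
```

With this fallback, a node with an instance but no instance_info reports its full memory as both total and used, so the available figure no longer goes negative.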

Changed in nova:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/231179

Changed in nova:
importance: Undecided → High
Changed in nova:
milestone: none → liberty-rc2
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/liberty)

Reviewed: https://review.openstack.org/231179
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8af97531e3fbd2f24c091f14c3a20cacf538cff2
Submitter: Jenkins
Branch: stable/liberty

commit 8af97531e3fbd2f24c091f14c3a20cacf538cff2
Author: Jim Rollenhagen <email address hidden>
Date: Thu Oct 1 21:12:53 2015 -0700

    Ironic: Fix bad capacity reporting if instance_info is unset

    If node.instance_info is unset for a node that has an instance on it
    (for example, when upgrading Nova from before the patch that set these
    values on spawn), the Ironic driver reports 0 for both $resource and
    $resource_used. The resource tracker will correct $resource_used, and
    resources available will be reported as a negative number.

    Correct this by reporting the original value if instance_info is unset.

    Closes-Bug: #1502177
    Change-Id: I13c5e5430fd305cd0ee2f24cd95304660ccf11eb
    (cherry picked from commit 047da6498dbb3af71bcb9e6d0e2c38aa23b06615)

tags: added: in-stable-liberty
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: liberty-rc2 → 12.0.0
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/235181

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/235181
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d8a3b9d6408627ce9a293d1e62d003757af655a1
Submitter: Jenkins
Branch: master

commit 6df6ad3ff32f2b1fe2978df1032002548ad8eb66
Author: Davanum Srinivas <email address hidden>
Date: Wed Oct 7 08:11:35 2015 -0700

    Omnibus stable/liberty fix

    There are currently 3 different blocking issues in stable/liberty due
    to library releases: webob 1.5, oslo.db 3.0.0, and
    oslo.versionedobjects 0.11.0. This is a squashed fix for all of them
    as none can land without the others.

    Issue #1 - oslo.db

    Add testresources used by oslo.db fixture

    If we use oslo.db fixtures, we'll need the package or
    the next version of oslo.db release will break us.

    (Cherry-picked from 4bcc26487837b7ece7797f88622dea1b6d09bd94)

    Closes-Bug: #1503501

    Issue #2 - oslo.versionedobjects

    Drop unused obj_to_primitive() override

    This was a band-aid override until o.vo gained the obj_relationships fix
    that this method overrides. That has been in place since o.vo 0.8.0, which
    means this is long since no longer necessary (and is actually blocking our
    ability to absorb bug fixes to this code in o.vo). Further, we no longer
    use this directly because we're doing backports based on version manifests,
    which means we no longer consult child_versions _or_ obj_relationships.

    (cherry picked from commit 142f1d9cc4ace90956c665c40b1f78795f9f7e29)

    Issue #3 - webob

    Default ConvertedException code to 500

    webob 1.5.0 released on 10/11 has change f6c749011 which
    strictly enforces status codes in exceptions, and 0 is not
    a valid status code so tests fail.

    Change the default to 500 to match the default in the parent
    class in webob.

    Closes-Bug: #1505153
    (cherry picked from commit 10438c0fc34bd088e018e1a5e8ec57b396528792)

    Change-Id: I1e06e77308a7dd23209124f0807d61fb52470188

commit 606204354b5ed96852240020769c81acda9f9fc8
Author: Matt Riedemann <email address hidden>
Date: Mon Oct 5 20:32:58 2015 +0000

    Revert "[libvirt] Move cleanup of imported files to imagebackend"

    This reverts commit 9ba70756de326ffaa8be43acfde12cad04ed0af2

    The change introduced an UnboundLocalError if we fail to
    create the config_drive_image variable. Also, the original
    change didn't have any unit tests and came late in the
    liberty release so I don't really want to mess with fixing
    this given we need the fix in liberty-rc2.

    Change-Id: Ia7b70aa139b67cf58b5c0f9fbcd2a4deb465914e
    Closes-Bug: #1502961

commit ef655379445693443146f8a3ed31cabb011d9937
Author: OpenStack Proposal Bot <email address hidden>
Date: Thu Oct 8 06:41:06 2015 +0000

    Imported Translations from Zanata

    For more information about this automatic import see:
    https://wiki.openstack.org/wiki/Translations/Infrastructure

    Change-Id: Idcac653033ab9808e06451a0dd690db4736834b2

commit eda3029aa74932f421d2992ac24f5ac3c92f347c
Author: Dan Smith <email address hidden>
Date: Tue Oct 6 10:58:18 2...

Thierry Carrez (ttx) wrote : Fix included in openstack/nova 13.0.0.0b1

This issue was fixed in the openstack/nova 13.0.0.0b1 development milestone.
