VMWARE: Intermittent problem with stats reporting

Bug #1252827 reported by Sreeram Yerrapragada
This bug affects 2 people
Affects                    Status         Importance   Assigned to        Milestone
OpenStack Compute (nova)   Fix Released   High         Sabari Murugesan
Havana                     Fix Released   High         Gary Kotton
VMwareAPI-Team             Fix Released   Critical     Sabari Murugesan

Bug Description

The VMware driver sometimes reports 0 for its stats. See the following log file for more information: http://162.209.83.206/logs/51404/6/screen-n-cpu.txt.gz

Excerpts from the log file:
2013-11-18 15:41:03.994 20162 WARNING nova.virt.vmwareapi.vim_util [-] Unable to retrieve value for datastore Reason: None
2013-11-18 15:41:04.029 20162 WARNING nova.virt.vmwareapi.vim_util [-] Unable to retrieve value for host Reason: None
2013-11-18 15:41:04.029 20162 WARNING nova.virt.vmwareapi.vim_util [-] Unable to retrieve value for resourcePool Reason: None
2013-11-18 15:41:04.029 20162 DEBUG nova.compute.resource_tracker [-] Hypervisor: free ram (MB): 0 _report_hypervisor_resource_view /opt/stack/nova/nova/compute/resource_tracker.py:389
2013-11-18 15:41:04.029 20162 DEBUG nova.compute.resource_tracker [-] Hypervisor: free disk (GB): 0 _report_hypervisor_resource_view /opt/stack/nova/nova/compute/resource_tracker.py:390
2013-11-18 15:41:04.030 20162 DEBUG nova.compute.resource_tracker [-] Hypervisor: VCPU information unavailable _report_hypervisor_resource_view /opt/stack/nova/nova/compute/resource_tracker.py:397

While this is happening, no server can be spawned. See the scheduler log: http://162.209.83.206/logs/51404/6/screen-n-sch.txt.gz

Excerpts from the log file:
2013-11-18 15:41:52.475 DEBUG nova.filters [req-dc82a954-3cc5-4627-ae01-b3d1ec2155af InstanceActionsTestXML-tempest-716947327-user InstanceActionsTestXML-tempest-716947327-tenant] Filter AvailabilityZoneFilter returned 1 host(s) get_filtered_objects /opt/stack/nova/nova/filters.py:88
2013-11-18 15:41:52.476 DEBUG nova.scheduler.filters.ram_filter [req-dc82a954-3cc5-4627-ae01-b3d1ec2155af InstanceActionsTestXML-tempest-716947327-user InstanceActionsTestXML-tempest-716947327-tenant] (Ubuntu1204Server, domain-c26(c1)) ram:-576 disk:0 io_ops:0 instances:1 does not have 64 MB usable ram, it only has -576.0 MB usable ram. host_passes /opt/stack/nova/nova/scheduler/filters/ram_filter.py:60
2013-11-18 15:41:52.476 INFO nova.filters [req-dc82a954-3cc5-4627-ae01-b3d1ec2155af InstanceActionsTestXML-tempest-716947327-user InstanceActionsTestXML-tempest-716947327-tenant] Filter RamFilter returned 0 hosts
2013-11-18 15:41:52.477 WARNING nova.scheduler.driver [req-dc82a954-3cc5-4627-ae01-b3d1ec2155af InstanceActionsTestXML-tempest-716947327-user InstanceActionsTestXML-tempest-716947327-tenant] [instance: 1a648022-1783-4874-8b41-c3f4c89d8500] Setting instance to ERROR state.
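
The RamFilter line above can be reproduced with a small sketch. This assumes the filter computes usable RAM as the oversubscription limit minus the RAM already used, and that the RAM allocation ratio is 1.5; the function names and the ratio are illustrative assumptions, not taken from the logs. Under those assumptions, a host that reports 0 MB total and -576 MB free ends up with -576.0 MB usable, so even a 64 MB request is rejected.

```python
def usable_ram_mb(total_mb, free_mb, ram_allocation_ratio=1.5):
    """Return the RAM (MB) a RAM filter would treat as available.

    Assumed formula: (total * ratio) - (total - free). The ratio default
    is an assumption for illustration.
    """
    memory_limit_mb = total_mb * ram_allocation_ratio  # oversubscription ceiling
    used_mb = total_mb - free_mb                       # RAM already accounted for
    return memory_limit_mb - used_mb


def host_passes(total_mb, free_mb, requested_mb, ram_allocation_ratio=1.5):
    """Return True if the host has enough usable RAM for the request."""
    return usable_ram_mb(total_mb, free_mb, ram_allocation_ratio) >= requested_mb


if __name__ == "__main__":
    # Values from the scheduler log: total reported as 0, free RAM of -576 MB.
    print(usable_ram_mb(0, -576))
    print(host_passes(0, -576, requested_mb=64))
```

Because the total is multiplied by the ratio, a total of 0 removes any headroom regardless of the ratio chosen, which is why zeroed stats from the driver make the only host unschedulable.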

Tags: vmware
Ryan Hsu (rhsu)
affects: barbican → nova
Changed in nova:
status: New → Confirmed
importance: Undecided → Critical
Changed in openstack-vmwareapi-team:
importance: Undecided → Critical
status: New → Confirmed
Changed in nova:
importance: Critical → High
Tracy Jones (tjones-i) wrote :

Going to run the debug code on one of the CI slaves to hopefully repro.

Gary Kotton (garyk) wrote :

Moved to critical: when the problem occurs, VMs cannot be booted.

Changed in nova:
importance: High → Critical
Gary Kotton (garyk)
Changed in nova:
milestone: none → icehouse-1
assignee: nobody → Gary Kotton (garyk)
Changed in nova:
status: Confirmed → In Progress
Shawn Hartsock (hartsock) wrote :

Please tell me I don't need to ask Russell Bryant to come here and explain priorities to you.

Changed in nova:
importance: Critical → High
Changed in openstack-vmwareapi-team:
status: Confirmed → In Progress
Gary Kotton (garyk) wrote :

Yeah, an explanation of the priorities would be nice. If I were unable to deploy a VM I would consider that a critical problem, but if high is what we need to settle for, then great.

The problem is addressed by https://review.openstack.org/#/c/58890/. The patch https://review.openstack.org/#/c/58705/ was a quick fix until we found the real issue.

Changed in nova:
assignee: Gary Kotton (garyk) → Sabari Kumar Murugesan (smurugesan)
dan wendlandt (danwent) wrote :

No need for a priority fight here. In terms of priority within the nova project, I believe this should be 'high', as its impact is limited to a single driver, and I think critical is reserved for general items (at least this is what I have been told; I am not sure if it is strictly enforced). For the vmwareapi project, I would consider this critical.

Sabari Murugesan (smurugesan) wrote :
tags: added: havana-backport-potential
Changed in nova:
milestone: icehouse-1 → icehouse-2
Changed in openstack-vmwareapi-team:
assignee: nobody → Sabari Kumar Murugesan (smurugesan)
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/58890
Committed: http://github.com/openstack/nova/commit/6471776b6b25bb4062238f7c1b732b2d6999ec65
Submitter: Jenkins
Branch: master

commit 6471776b6b25bb4062238f7c1b732b2d6999ec65
Author: Sabari Kumar Murugesan <email address hidden>
Date: Wed Nov 27 16:10:59 2013 -0800

    VMware: Fix unhandled session failure issues

    VMware driver has a re-try mechanism to handle session expiration
    failures. Due to a minor bug in the exception handling module, this
    failure was unhandled.

    The patch fixes this issue and has added tests.

    Closes-Bug: #1252827
    Change-Id: Ie91adb4b4b57b7cefeed855cdbe4710da86294f0
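
The re-try mechanism the commit message describes can be illustrated with a minimal sketch. This is not the driver's code: `SessionExpiredError`, `FakeSession`, `call_with_session_retry`, and `get_free_ram_mb` are hypothetical names invented for this example. The point it shows is that the retry only works if the session failure surfaces as an exception the wrapper recognizes; if the failure is swallowed or raised as something else, the call falls through unhandled and the caller sees no data.

```python
import time


class SessionExpiredError(Exception):
    """Hypothetical error raised when the backend rejects a stale session."""


def call_with_session_retry(session, method, *args, max_attempts=3, delay=0.0, **kwargs):
    """Call method(session, ...) and re-login and retry on session expiry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return method(session, *args, **kwargs)
        except SessionExpiredError:
            if attempt == max_attempts:
                raise  # give up: let the caller see the failure
            session.login()    # re-establish the session before retrying
            time.sleep(delay)  # optional pause between attempts


class FakeSession:
    """Stand-in session that starts out expired (illustration only)."""

    def __init__(self):
        self.valid = False
        self.logins = 0

    def login(self):
        self.valid = True
        self.logins += 1


def get_free_ram_mb(session):
    """Pretend stats call that fails while the session is expired."""
    if not session.valid:
        raise SessionExpiredError("session is not authenticated")
    return 2048


if __name__ == "__main__":
    session = FakeSession()
    print(call_with_session_retry(session, get_free_ram_mb))
```

In this sketch the first call fails, the wrapper logs in again, and the second attempt returns the stats value instead of leaving the caller with nothing to report.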

Changed in nova:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/havana)

Fix proposed to branch: stable/havana
Review: https://review.openstack.org/60651

Alan Pevec (apevec)
tags: removed: havana-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/havana)

Reviewed: https://review.openstack.org/60651
Committed: http://github.com/openstack/nova/commit/c2278faae1248ecbd149d0750ab1e27d53ded62d
Submitter: Jenkins
Branch: stable/havana

commit c2278faae1248ecbd149d0750ab1e27d53ded62d
Author: Sabari Kumar Murugesan <email address hidden>
Date: Wed Nov 27 16:10:59 2013 -0800

    VMware: Fix unhandled session failure issues

    VMware driver has a re-try mechanism to handle session expiration
    failures. Due to a minor bug in the exception handling module, this
    failure was unhandled.

    The patch fixes this issue and has added tests.

    Closes-Bug: #1252827
    (cherry picked from commit 6471776b6b25bb4062238f7c1b732b2d6999ec65)

    Conflicts:

     nova/tests/virt/vmwareapi/test_vmwareapi_vim_util.py

    Change-Id: I6b2e0ce664c0f6b479475a4bbc80947e5a1f9101

Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Changed in openstack-vmwareapi-team:
status: In Progress → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: icehouse-2 → 2014.1