cpu_util meter should always be non-negative

Bug #1404192 reported by Kurt
This bug affects 1 person
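+------------+--------------+------------+-------------+-----------+
| Affects    | Status       | Importance | Assigned to | Milestone |
+------------+--------------+------------+-------------+-----------+
| Ceilometer | Fix Released | Medium     | Unassigned  |           |
+------------+--------------+------------+-------------+-----------+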

Bug Description

In a VMware environment, when a VM is shut down, vSphere returns -1 when ceilometer queries it for performance data. As a result, cpu_util is set to -0.01%, as shown below:

[root@cc vmware]# ceilometer sample-list -m cpu_util -l 1
+--------------------------------------+----------+-------+-----------------+------+---------------------+
| Resource ID | Name | Type | Volume | Unit | Timestamp |
+--------------------------------------+----------+-------+-----------------+------+---------------------+
| f8596226-94b7-4a8b-9569-52672bbf5865 | cpu_util | gauge | -0.01 | % | 2014-12-19T10:19:37 |
+--------------------------------------+----------+-------+-----------------+------+---------------------+

Expected results:

There are two ways to handle this:
1. The inspector should recognize "-1" as an error code rather than an actual value, and shouldn't generate a new sample.
2. The cpu_util volume should be set to "0" if it is negative.

The same policy should also be applied to memory usage, disk I/O, network I/O, etc.
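
The two options above can be sketched roughly as follows. This is only an illustration, not the ceilometer inspector code: the function names and the `NO_DATA` constant are hypothetical, and the division by 100 is inferred from the report itself (a raw -1 shows up as -0.01%).

```python
from typing import Optional

# Assumption: -1 is the sentinel vSphere returns when no data is available.
NO_DATA = -1


def cpu_util_skip(raw: int) -> Optional[float]:
    """Option 1: treat -1 as an error code and produce no sample (None)."""
    if raw == NO_DATA:
        return None
    return raw / 100.0


def cpu_util_clamp(raw: int) -> float:
    """Option 2: convert the value, then force negative results to 0."""
    return max(0.0, raw / 100.0)
```

With option 1 the caller has to handle the missing value (for example by not emitting a sample); with option 2 a shut-down VM is indistinguishable from an idle one.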

Kurt (dingyuan-rao)
Changed in ceilometer:
assignee: nobody → Kurt (dingyuan-rao)
ZhiQiang Fan (aji-zqfan) wrote :

Or we can raise an instance-shut-off exception when the instance is shut off.

Kurt (dingyuan-rao) wrote :

Agree with Zhiqiang

gordon chung (chungg)
Changed in ceilometer:
importance: Undecided → High
status: New → Triaged
Kurt (dingyuan-rao) wrote :

The fix should handle the following three cases:

1. When the instance is shut down (or in any other *stopped-like* status), every performance query returns -1. In this case, an exception should be raised indicating that the VM status is invalid for monitoring. No samples are generated at all.

2. When the instance is active but one disk is broken or one NIC is in an error state, some values in the performance query return -1 and the others are valid. Because disk I/O is calculated as the sum over all disks, the -1s should be replaced by 0s and the other values left unchanged. Samples are still generated in this case.

3. When the instance is active but cpu_util or memory usage somehow returns -1, an exception should be raised indicating that the volume is invalid. That specific sample (cpu_util or memory usage) is not generated; the other, correct samples are still generated.
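
A minimal sketch of how the three cases could fit together. Everything here is hypothetical and only meant to illustrate the proposal: the exception classes, the function names, the `"active"` power-state string and the `disk.io` meter name are assumptions, and raw values are passed through without unit conversion.

```python
from typing import Dict, Iterable

NO_DATA = -1  # assumption: sentinel returned when no data is available


class InstanceNotMonitorable(Exception):
    """Case 1: the VM status is invalid for monitoring."""


class InvalidVolume(Exception):
    """Case 3: a single-valued meter returned the sentinel."""


def sum_device_io(values: Iterable[float]) -> float:
    """Case 2: sum per-device values, counting -1 entries as 0."""
    return sum(0 if v == NO_DATA else v for v in values)


def checked_volume(meter: str, raw: float) -> float:
    """Case 3: reject a -1 for a meter that has only one value."""
    if raw == NO_DATA:
        raise InvalidVolume(f"{meter} returned {NO_DATA}")
    return raw


def collect_samples(power_state: str, cpu_raw: float, mem_raw: float,
                    disk_raw: Iterable[float]) -> Dict[str, float]:
    # Case 1: a stopped instance produces no samples at all.
    if power_state != "active":
        raise InstanceNotMonitorable(f"instance is {power_state}")
    samples: Dict[str, float] = {}
    for meter, raw in (("cpu_util", cpu_raw), ("memory.usage", mem_raw)):
        try:
            samples[meter] = checked_volume(meter, raw)
        except InvalidVolume:
            continue  # skip only this meter; the others are still emitted
    samples["disk.io"] = sum_device_io(disk_raw)
    return samples
```

For example, `collect_samples("active", -1, 512, [10, -1, 5])` drops the cpu_util sample but still returns memory usage and a disk total of 15.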

OpenStack Infra (hudson-openstack) wrote : Fix proposed to ceilometer (master)

Fix proposed to branch: master
Review: https://review.openstack.org/144192

Eoghan Glynn (eglynn)
Changed in ceilometer:
milestone: none → kilo-2
Changed in ceilometer:
status: Triaged → In Progress
Haifeng.Yan (yanheven) wrote :

I think we should reuse InstanceShutOffException and NoDataException, so that no further changes are needed to the code in the pollster/ directory.

Eoghan Glynn (eglynn)
Changed in ceilometer:
milestone: kilo-2 → kilo-3
Thierry Carrez (ttx)
Changed in ceilometer:
milestone: kilo-3 → kilo-rc1
Eoghan Glynn (eglynn)
Changed in ceilometer:
milestone: kilo-rc1 → next
Luo Gangyi (luogangyi) wrote :

Hi guys, I have another question.

Should we guarantee that the cpu_util value never exceeds 100%?

In the current implementation, cpu_util values are sometimes larger than 100%. Are such values reasonable?

gordon chung (chungg) wrote :

@Luo Gangyi, that issue can be tracked here: https://bugs.launchpad.net/ceilometer/+bug/1421584

Luo Gangyi (luogangyi) wrote :

Thanks, gordon.

It is still a bit different from my issue, but the issue you pointed to is very useful.

gordon chung (chungg)
Changed in ceilometer:
assignee: Kurt (dingyuan-rao) → nobody
status: In Progress → Triaged
Changed in ceilometer:
assignee: nobody → Dean Daskalantonakis (ddaskal)
gordon chung (chungg) wrote :

this can be done by adding support for growth_only like delta transformer
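
One possible reading of this suggestion, sketched as a standalone function rather than as the actual ceilometer transformer: compute a rate from two consecutive readings and, when a `growth_only` flag is set, drop negative results instead of emitting them. The function name and signature are hypothetical.

```python
from typing import Optional


def rate_of_change(prev_value: float, prev_ts: float,
                   cur_value: float, cur_ts: float,
                   growth_only: bool = False) -> Optional[float]:
    """Return the per-second rate between two readings, or None to drop it."""
    elapsed = cur_ts - prev_ts
    if elapsed <= 0:
        return None  # cannot compute a rate without elapsed time
    rate = (cur_value - prev_value) / elapsed
    if growth_only and rate < 0:
        return None  # negative rate: drop the sample instead of emitting it
    return rate
```

Dropping the value (rather than clamping it to 0) keeps a counter reset or an error code from being reported as real, idle usage.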

Changed in ceilometer:
assignee: Dean Daskalantonakis (ddaskal) → nobody
importance: High → Medium
OpenStack Infra (hudson-openstack) wrote : Change abandoned on ceilometer (master)

Change abandoned by gordon chung (<email address hidden>) on branch: master
Review: https://review.openstack.org/144192
Reason: clean up

Changed in ceilometer:
status: Triaged → Fix Released