ceilometer compute cannot collect the memory.usage meter for any VM on a compute node

Bug #1770295 reported by xiexianbin
This bug affects 3 people
Affects               Status        Importance  Assigned to  Milestone
Ceilometer            In Progress   Undecided   xiexianbin
Ubuntu Cloud Archive  Fix Released  Undecided   Unassigned
  Queens              Incomplete    Low         Unassigned
libvirt (Ubuntu)      Fix Released  Undecided   Unassigned
  Bionic              Incomplete    Low         Unassigned

Bug Description

When shutting down a VM that takes a long time to stop, libvirt
raises "libvirtError: Timed out during operation: cannot
acquire state change lock (held by
remoteDispatchDomainShutdown)". Once this occurs, none of the VMs on
the physical host can report the memory.usage meter anymore!

2018-05-03 07:57:02.211 82162 ERROR ceilometer.compute.virt.libvirt.inspector [-] Get used size of sda in instance-00008a39 failed. The reason is error: Guest agent is not responding: QEMU guest agent is not connected

2018-05-03 07:57:32.474 82162 ERROR ceilometer.compute.pollsters.disk [-] Ignoring instance instance-00008c37 (a7a5433a-9958-4b3c-bb54-b1b693e1c3d7) : Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainShutdown)
2018-05-03 07:57:32.474 82162 ERROR ceilometer.compute.pollsters.disk Traceback (most recent call last):
2018-05-03 07:57:32.474 82162 ERROR ceilometer.compute.pollsters.disk File "/usr/lib/python2.7/site-packages/ceilometer/compute/pollsters/disk.py", line 624, in get_samples
2018-05-03 07:57:32.474 82162 ERROR ceilometer.compute.pollsters.disk instance,
2018-05-03 07:57:32.474 82162 ERROR ceilometer.compute.pollsters.disk File "/usr/lib/python2.7/site-packages/ceilometer/compute/pollsters/disk.py", line 560, in _populate_cache
2018-05-03 07:57:32.474 82162 ERROR ceilometer.compute.pollsters.disk for disk, info in disk_info:
2018-05-03 07:57:32.474 82162 ERROR ceilometer.compute.pollsters.disk File "/usr/lib/python2.7/site-packages/ceilometer/compute/virt/libvirt/inspector.py", line 242, in inspect_disk_info
2018-05-03 07:57:32.474 82162 ERROR ceilometer.compute.pollsters.disk block_info = domain.blockInfo(device)
2018-05-03 07:57:32.474 82162 ERROR ceilometer.compute.pollsters.disk File "/usr/lib64/python2.7/site-packages/libvirt.py", line 690, in blockInfo
2018-05-03 07:57:32.474 82162 ERROR ceilometer.compute.pollsters.disk if ret is None: raise libvirtError ('virDomainGetBlockInfo() failed', dom=self)
2018-05-03 07:57:32.474 82162 ERROR ceilometer.compute.pollsters.disk libvirtError: Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainShutdown)
2018-05-03 07:57:32.474 82162 ERROR ceilometer.compute.pollsters.disk
2018-05-03 07:57:35.142 82162 ERROR ceilometer.compute.virt.libvirt.inspector [-] Get used size of vda in instance-00008a5d failed. The reason is error: Guest agent is not responding: QEMU guest agent is not connected

2018-05-03 07:58:34.602 82162 WARNING ceilometer.compute.pollsters.memory [-] Cannot inspect data of MemoryUsagePollster for a7a5433a-9958-4b3c-bb54-b1b693e1c3d7: Failed to inspect memory usage of a7a5433a-9958-4b3c-bb54-b1b693e1c3d7, can not get info from libvirt: Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainShutdown)
2018-05-03 07:58:34.606 82162 ERROR ceilometer.agent.manager [-] Prevent pollster memory.usage from polling [<NovaLikeServer: VM-21A-01J14>, <NovaLikeServer: VM-21A-01J2H>, <NovaLikeServer: VM-21A003VB>, <NovaLikeServer: VM-21A-01HXN>, <NovaLikeServer: VM-21A-01HXJ>, <NovaLikeServer: VM-21A-01HXK>, <NovaLikeServer: VM-21A-01J2Z>, <NovaLikeServer: VM-21A-01J3C>, <NovaLikeServer: VM-21A-01HXQ>, <NovaLikeServer: VM-21A-01HXR>, <NovaLikeServer: VM-21A007BS>, <NovaLikeServer: VM-21A-01J3B>, <NovaLikeServer: VM-21A-01J2N>, <NovaLikeServer: VM-21A-01HXU>, <NovaLikeServer: VM-21A-01J3F>, <NovaLikeServer: VM-21A-01J3H>, <NovaLikeServer: VM-21A-01HY0>, <NovaLikeServer: VM-21A-01J3D>] on source memory_utilization anymore!
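The last log line shows the core of the problem: one domain holding the state change lock causes the memory.usage pollster to be disabled for every instance on the host. A minimal sketch of the difference between that behavior and per-instance error isolation (pure Python; all names here are hypothetical stand-ins, not ceilometer's actual code):

```python
class LibvirtTimeoutError(Exception):
    """Stand-in for libvirtError: 'cannot acquire state change lock'."""


def poll_memory(instances, inspect):
    """Per-instance error isolation: a failure on one domain is recorded
    and skipped, so the remaining domains still yield samples, instead of
    the whole pollster being blacklisted for the source."""
    samples, skipped = [], []
    for name in instances:
        try:
            samples.append((name, inspect(name)))
        except LibvirtTimeoutError:
            skipped.append(name)  # log and continue instead of aborting
    return samples, skipped


def fake_inspect(name):
    # 'vm-stuck' simulates the domain whose shutdown holds the state lock
    if name == "vm-stuck":
        raise LibvirtTimeoutError("cannot acquire state change lock")
    return 512  # pretend memory.usage in MiB


samples, skipped = poll_memory(["vm-a", "vm-stuck", "vm-b"], fake_inspect)
print(samples)   # [('vm-a', 512), ('vm-b', 512)]
print(skipped)   # ['vm-stuck']
```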

xiexianbin (xianbin)
Changed in ceilometer:
status: New → In Progress
Changed in ceilometer:
assignee: nobody → xiexianbin (xianbin)
Revision history for this message
buguldey (buguldey) wrote :

Seems to be a related bug in libvirt:

https://bugzilla.redhat.com/show_bug.cgi?id=1530346

My own data:

lsb_release -d
Description: Ubuntu 18.04.3 LTS

libvirtd -v --version
libvirtd (libvirt) 4.0.0

virsh list
 Id Name State
----------------------------------------------------
 8 … running

virsh shutdown 8
error: Failed to shutdown domain 8
error: Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainSuspend)
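The blocking call above is what stalls the poller. On the monitoring side, one mitigation (a sketch only, not what ceilometer or libvirt actually implement) is a client-side timeout around any potentially blocking hypervisor call, so a domain stuck behind the state change lock cannot stall the whole polling cycle. Here `slow_block_info` is a hypothetical stand-in for a call such as `domain.blockInfo()`:

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def slow_block_info():
    # Simulates a libvirt call waiting on the state change lock
    time.sleep(0.5)
    return {"capacity": 10 * 1024**3}


def call_with_timeout(fn, timeout):
    """Run fn in a worker thread; give up after `timeout` seconds so the
    caller can skip this domain and move on to the next one."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fn)
        try:
            return future.result(timeout=timeout)
        except FutureTimeout:
            return None  # caller treats None as "skip this domain"


print(call_with_timeout(slow_block_info, timeout=0.05))  # None
```

Note that the stuck worker thread still blocks until the underlying call returns; the timeout only unblocks the polling loop, it does not cancel the hung libvirt request.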

Excerpts from the bug at https://bugzilla.redhat.com/show_bug.cgi?id=1530346 :

"I am able to reproduce with [libvirt] 3.7.0 but unable to reproduce with current master (v4.2.0-457-gda613819e9). … I'm running bisect now to see which commit fixed this."

"My bisect points at this commit 72adaf2f10509c3682f2c65ffad4176e00e5a2fb (v4.1.0-rc1~337) which is supposed to fix this bug 1536461 which follows my investigation and suspicion. So basically, 4.1.0 is fixed, the problematic commit that introduced the bug is v3.4.0-rc1~119."

Libvirt git for your convenience: https://libvirt.org/git/?p=libvirt.git

Revision history for this message
buguldey (buguldey) wrote :

This bug is stable: I receive such hangups on host: Intel i7 (8 cpu cores) with guest vm compiling Telegram Desktop: cd out/Debug/ && make -j7

Revision history for this message
buguldey (buguldey) wrote :

The guest vm occupies 7 cpu cores. Its .vdi was created with virtualbox and after that, imported into virt-manager.

Revision history for this message
buguldey (buguldey) wrote :

Tested with the latest libvirt git, commit 7a7d36055ce7c161e9309c7bad7f8e61be31c5b8. No hangup while building Telegram. The VM pauses and unpauses correctly. Success.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

This is only in 4.0 in Bionic, not in earlier or later releases.
Thanks buguldey for reporting back on this.

The change only adds/removes a bug from being reported,
so the reported error will change, but it should not change the actual overall behavior.

I haven't yet seen a clear "if you do this, that happens" on this bug that would make the fix important, and I haven't seen other reports of the same issue in all this time.
Therefore I'm prioritizing stability (not changing it until we know more) over a fix, even though we now know the commit/revert.

Since this was reported coming up in ceilometer, are there OpenStack people out there who see this more often and can back this up as a common problem?

Changed in libvirt (Ubuntu Bionic):
status: New → Incomplete
importance: Undecided → Low
Changed in libvirt (Ubuntu):
status: New → Fix Released
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

It certainly doesn't match the initially reported issue about the state lock and memory stats.
Please update the bug if this is still an issue with newer versions; buguldey's recent updates turn out to be nice, but actually unrelated to the initial issue.

Revision history for this message
buguldey (buguldey) wrote :

However, when I switched to a pure qemu/kvm instance with a SPICE display (without libvirt), the hangups continue to occur when building Telegram Desktop in an Ubuntu 14.04 guest (the Ubuntu version of choice for building Telegram Desktop). I set the BIOS memory timings according to the vendor spec and ran a memory test; all is okay.

So I gave up on this.

Revision history for this message
buguldey (buguldey) wrote :

On my desktop, VirtualBox reports 4 physical CPU cores and 8 logical CPU cores. /proc/cpuinfo reports 8 processors. VirtualBox GUI draws 4 CPU cores as green, and the remaining 4 cores as red (with "invalid configuration" warning when one chooses more than 4 cores).

When I choose 4 CPU cores in VirtualBox, the guest still hangs during the TDesktop build.

```
cat /proc/cpuinfo | grep "model name" | head -1
model name : Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
```

Changed in cloud-archive:
status: New → Fix Released