qemu: qemu_thread_create: Resource temporarily unavailable

Bug #1921979 reported by wangzhh
Affects: OpenStack Compute (nova)
Status: Confirmed
Importance: Medium
Assigned to: wangzhh
Milestone: (none)

Bug Description

Description
===========
When I start nova-compute, it always reports errors and breaks the resource update. Here is the log:

2021-03-30 11:13:45.604 2100913 ERROR nova.compute.manager [req-1eed160c-86ea-4d86-abe1-bf80daa1315d - - - - -] Error updating resources for node hw-arm-controller-1.: InvalidDiskInfo: Disk info file is invalid: qemu-img failed to execute on /var/lib/libvirt/images/ubuntu18.04-2.qcow2: Unexpected error while running command.
Command: /usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=30 -- env LC_ALL=C LANG=C qemu-img info /var/lib/libvirt/images/ubuntu18.04-2.qcow2 --force-share
Exit code: -6
Stdout: u''
Stderr: u'qemu: qemu_thread_create: Resource temporarily unavailable\n'
2021-03-30 11:13:45.604 2100913 ERROR nova.compute.manager Traceback (most recent call last):
2021-03-30 11:13:45.604 2100913 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 7408, in update_available_resource_for_node
2021-03-30 11:13:45.604 2100913 ERROR nova.compute.manager rt.update_available_resource(context, nodename)
2021-03-30 11:13:45.604 2100913 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 706, in update_available_resource
2021-03-30 11:13:45.604 2100913 ERROR nova.compute.manager resources = self.driver.get_available_resource(nodename)
2021-03-30 11:13:45.604 2100913 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 7090, in get_available_resource
2021-03-30 11:13:45.604 2100913 ERROR nova.compute.manager disk_over_committed = self._get_disk_over_committed_size_total()
2021-03-30 11:13:45.604 2100913 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 8640, in _get_disk_over_committed_size_total
2021-03-30 11:13:45.604 2100913 ERROR nova.compute.manager config, block_device_info)
2021-03-30 11:13:45.604 2100913 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 8542, in _get_instance_disk_info_from_config
2021-03-30 11:13:45.604 2100913 ERROR nova.compute.manager qemu_img_info = disk_api.get_disk_info(path)
2021-03-30 11:13:45.604 2100913 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/virt/disk/api.py", line 136, in get_disk_info
2021-03-30 11:13:45.604 2100913 ERROR nova.compute.manager return images.qemu_img_info(path)
2021-03-30 11:13:45.604 2100913 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/virt/images.py", line 88, in qemu_img_info
2021-03-30 11:13:45.604 2100913 ERROR nova.compute.manager raise exception.InvalidDiskInfo(reason=msg)
2021-03-30 11:13:45.604 2100913 ERROR nova.compute.manager InvalidDiskInfo: Disk info file is invalid: qemu-img failed to execute on /var/lib/libvirt/images/ubuntu18.04-2.qcow2: Unexpected error while running command.
2021-03-30 11:13:45.604 2100913 ERROR nova.compute.manager Command: /usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=30 -- env LC_ALL=C LANG=C qemu-img info /var/lib/libvirt/images/ubuntu18.04-2.qcow2 --force-share
2021-03-30 11:13:45.604 2100913 ERROR nova.compute.manager Exit code: -6
2021-03-30 11:13:45.604 2100913 ERROR nova.compute.manager Stdout: u''
2021-03-30 11:13:45.604 2100913 ERROR nova.compute.manager Stderr: u'qemu: qemu_thread_create: Resource temporarily unavailable\n'
2021-03-30 11:13:45.604 2100913 ERROR nova.compute.manager
2021-03-30 11:13:46.587 2100913 INFO oslo_service.service [-] Caught SIGINT signal, instantaneous exiting

This seems to be the same bug that was fixed in ironic:
https://storyboard.openstack.org/#!/story/2007644#comment-158124

It worked well after I ported and applied that patch.
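
Roughly, the ported change amounts to raising the address-space limit that nova.privsep.qemu applies when running qemu-img info; that limit is where the "--as=1073741824 --cpu=30" arguments in the failing command come from. A minimal, simplified sketch (the 2 GiB value and the surrounding code are illustrative, not the exact upstream patch):

    from oslo_concurrency import processutils
    from oslo_utils import units

    # Limits applied to `qemu-img info`. Nova passes this object via
    # processutils.execute(..., prlimit=QEMU_IMG_LIMITS), which wraps the
    # command in `python -m oslo_concurrency.prlimit --as=... --cpu=30`.
    QEMU_IMG_LIMITS = processutils.ProcessLimits(
        cpu_time=30,
        address_space=2 * units.Gi)  # previously 1 * units.Gi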

Steps to reproduce
==================
Note: there were several instances launched directly with virsh, outside of nova. After I cleaned them up, nova-compute worked well.

systemctl restart nova-compute

Expected result
===============
nova-compute starts without errors.

Actual result
=============
The compute service starts, but with the error log shown above.

Environment
===========
1. Found in the Queens release, but still present on master.

2. qemu v2.12.0

3. Arm server

4. NeoKylin Linux Advanced Server release V7Update6

Tags: libvirt
wangzhh (wangzhh)
Changed in nova:
assignee: nobody → wangzhh (wangzhh)
Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

I have a couple of questions:

* You stated that using the fix from https://storyboard.openstack.org/#!/story/2007644 fixed your issue. Did you port the fix in that patch from ironic-lib to nova.privsep.qemu.unprivileged_qemu_img_info()?

* Is the path the qemu command fails on (e.g. '/var/lib/libvirt/images/ubuntu18.04-2.qcow2' in your trace above) an image of a nova-managed VM or of a non-nova-managed VM? It seems to me that nova.virt.libvirt.driver.LibvirtDriver._get_disk_over_committed_size_total() iterates over all the domains on the hypervisor, so it checks your non-nova-managed domains too (see the sketch below).
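
For illustration, a small sketch using the libvirt Python bindings (independent of nova code) showing that domains defined directly with virsh are returned alongside nova-managed ones; the instance- name prefix check is only a heuristic:

    import libvirt

    # listAllDomains() returns every domain on the hypervisor, including
    # ones created outside of nova, so nova's disk-over-committed
    # calculation ends up running `qemu-img info` on their images too.
    conn = libvirt.openReadOnly('qemu:///system')
    for dom in conn.listAllDomains():
        # Nova conventionally names its domains instance-XXXXXXXX; anything
        # else was most likely defined directly with virsh.
        managed = dom.name().startswith('instance-')
        print(dom.name(), 'nova-managed' if managed else 'externally defined')
    conn.close()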

Marking this Incomplete until the questions are answered. Please set it back to New once you have done so.

tags: added: libvirt
Changed in nova:
status: New → Incomplete
importance: Undecided → Medium
Revision history for this message
wangzhh (wangzhh) wrote :

@Balazs Gibizer,
* Yes, I ported it to nova.privsep.qemu.unprivileged_qemu_img_info(). https://github.com/openstack/nova/blob/0e7cd9d1a95a30455e3c91916ece590454235e0e/nova/privsep/qemu.py#L31
I temporarily hard-coded address_space to 2G.

* The qemu command fails on an image of a non-nova-managed VM; that domain may simply be the first one in the list. I ran `/usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=30 -- env LC_ALL=C LANG=C qemu-img info ${other image} --force-share` and got the same error (a reproduction sketch follows).
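
For completeness, the same limited execution can be reproduced from Python with oslo.concurrency directly; a sketch only, using the image path from the trace above (other images on this host fail the same way):

    from oslo_concurrency import processutils
    from oslo_utils import units

    # Apply the same limits nova uses: 30 s of CPU time, 1 GiB address space.
    limits = processutils.ProcessLimits(cpu_time=30,
                                        address_space=1 * units.Gi)
    try:
        out, _ = processutils.execute(
            'env', 'LC_ALL=C', 'LANG=C', 'qemu-img', 'info',
            '/var/lib/libvirt/images/ubuntu18.04-2.qcow2', '--force-share',
            prlimit=limits)
        print(out)
    except processutils.ProcessExecutionError as exc:
        # On this ARM host the command aborts (exit code -6) with
        # "qemu: qemu_thread_create: Resource temporarily unavailable".
        print(exc)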

Changed in nova:
status: Incomplete → New
Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

Thanks for the response. I'm marking this as Confirmed, as I believe the ironic issue and the nova issue have the same root cause: a busy machine.

I suggest you propose the ported fix against the nova repo in Gerrit. If you need help with that, let me know here or in the #openstack-nova IRC channel on freenode (my nick is gibi).

Changed in nova:
status: New → Confirmed