[CI] Libvirt error "cannot set up guest memory 'pc.ram': Cannot allocate memory" causes failure while spawning instance

Bug #1902516 reported by Slawek Kaplonski
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned

Bug Description

I have seen at least a couple of times recently that one of the tests in scenario CI jobs fails because spawning an instance fails. In nova-compute's logs there is an error like:

Nov 02 09:13:24.954052 ubuntu-bionic-rax-ord-0021331595 nova-compute[23193]: ERROR nova.virt.libvirt.driver [None req-9c065faf-96c5-42fd-833c-e0c015457188 tempest-TrunkTest-649105798 tempest-TrunkTest-649105798] [instance: 69366a0b-6d15-4e8b-9f90-e9bec06ca126] Failed to start libvirt guest: libvirt.libvirtError: internal error: process exited while connecting to monitor: 2020-11-02T09:13:24.490765Z qemu-system-x86_64: warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]

For example in: https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_bb7/759852/4/check/neutron-tempest-plugin-scenario-linuxbridge-train/bb75daf/controller/logs/screen-n-cpu.txt

Logstash query: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22warning%3A%20TCG%20doesn't%20support%20requested%20feature%3A%20CPUID.01H%3AECX.vmx%5C%22

From logstash it seems that it happens only in jobs which run on Ubuntu Bionic.

Tags: gate-failure
Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

The "warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]" is a red herring. It is logged for successfully launched VMs in green jobs as well.

Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

The interesting log is a bit later in the screen-n-cpu.txt:

Nov 02 09:13:25.177820 ubuntu-bionic-rax-ord-0021331595 nova-compute[23193]: 2020-11-02T09:13:24.491970Z qemu-system-x86_64: cannot set up guest memory 'pc.ram': Cannot allocate memory

So the hypervisor ran out of memory.
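A hedged sketch of what "Cannot allocate memory" means here: qemu could not obtain enough host memory for the guest's RAM block. The parser and the sample /proc/meminfo text below are illustrative assumptions (chosen to match the ~110MB free seen later in the dstat log), not taken from nova or qemu code:

```python
# Illustrative sketch: parse MemAvailable from /proc/meminfo-style text and
# compare it with the guest RAM a flavor requests. The sample text is made up
# to roughly match the ~110MB free observed in the dstat log for this bug.

def mem_available_mb(meminfo_text):
    """Return MemAvailable in MB from /proc/meminfo-style text, or None."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) // 1024  # value is reported in kB
    return None

sample = "MemTotal:       8161280 kB\nMemAvailable:    112640 kB\n"
free_mb = mem_available_mb(sample)
guest_ram_mb = 128  # flavor RAM from this bug

print(free_mb)                  # 110
print(free_mb >= guest_ram_mb)  # False: qemu's guest RAM allocation would fail
```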

Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

The failed tempest run used concurrency=4, so multiple VMs might have been launched in parallel.

Looking at the dstat log around the time the VM failed, there was around 110MB of free memory on the hypervisor, while the VM was booted with a flavor requesting 128MB of RAM.

When the scheduler accepted the host it saw > 4G of free RAM.

Nov 02 09:13:20.149808 ubuntu-bionic-rax-ord-0021331595 nova-scheduler[20312]: DEBUG nova.scheduler.filter_scheduler [None req-9c065faf-96c5-42fd-833c-e0c015457188 tempest-TrunkTest-649105798 tempest-TrunkTest-649105798] [instance: 69366a0b-6d15-4e8b-9f90-e9bec06ca126] Selected host: (ubuntu-bionic-rax-ord-0021331595, ubuntu-bionic-rax-ord-0021331595) ram: 4258MB disk: 29696MB io_ops: 0 instances: 7

The last compute resource report before the VM boot failure reported the following:

Nov 02 09:13:19.335170 ubuntu-bionic-rax-ord-0021331595 nova-compute[23193]: DEBUG nova.compute.resource_tracker [None req-b4825f7c-63f9-412c-8579-126a178955a8 None None] Final resource view: name=ubuntu-bionic-rax-ord-0021331595 phys_ram=7970MB used_ram=3712MB phys_disk=76GB used_disk=31GB total_vcpus=8 used_vcpus=7 pci_stats=[] {{(pid=23193) _report_final_resource_view /opt/stack/nova/nova/compute/resource_tracker.py:1061}}

This is consistent with what the scheduler used for its decision that the request fits on this host.
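The arithmetic behind the resource tracker's view, using the numbers from the log lines above (a back-of-the-envelope check, not nova code):

```python
# Back-of-the-envelope check of the scheduler's view, using the values from
# the "Final resource view" log line above. This is not nova code, just the
# arithmetic that its tracked view implies.

phys_ram_mb = 7970   # phys_ram in the resource tracker log
used_ram_mb = 3712   # used_ram: instance flavors plus reserved memory

scheduler_free_mb = phys_ram_mb - used_ram_mb
print(scheduler_free_mb)  # 4258, matching "ram: 4258MB" in the scheduler log

# Meanwhile dstat showed only ~110MB actually free, so the tracked view
# overestimated free host memory by roughly 4GB.
actual_free_mb = 110
overestimate_mb = scheduler_free_mb - actual_free_mb
print(overestimate_mb)    # 4148
```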

So I think the value of reserved_host_memory_mb in nova-cpu.conf (which is 512) does not reflect reality on the hypervisor.
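For reference, that option lives in the [DEFAULT] section of nova's config. Only the value 512 is confirmed by this discussion; the excerpt below is an illustrative fragment, and raising the value is one way to make nova withhold more host RAM from scheduling:

```ini
# Illustrative excerpt from nova-cpu.conf. reserved_host_memory_mb tells the
# resource tracker how much host RAM to withhold from instances; per this bug
# it was 512 here, which did not match the host's actual memory consumption.
[DEFAULT]
reserved_host_memory_mb = 512
```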

summary: - [CI] Libvirt error "TCG doesn't support requested feature:
- CPUID.01H:ECX.vmx" cause failure while spawning instance
+ [CI] Libvirt error "cannot set up guest memory 'pc.ram': Cannot allocate
+ memory" cause failure while spawning instance
Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

I'm marking this as invalid from nova perspective as this is more like a deployment configuration error.

Changed in nova:
status: New → Invalid
Revision history for this message
Slawek Kaplonski (slaweq) wrote :

Thx Gibi for checking that. I proposed patch https://review.opendev.org/#/c/761022/ to disable Swift services in neutron-tempest-plugin jobs. This should help a bit with memory consumption.
