libvirtError: internal error: process exited while connecting to monitor: Cannot set up guest memory 'pc.ram': Cannot allocate memory

Bug #1366931 reported by Joe Gordon
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Invalid
Importance: High
Assigned to: Matthew Treinish
Milestone: (none listed)

Bug Description

http://logs.openstack.org/41/116741/5/gate/gate-grenade-dsvm/7064ee5/logs/old/screen-n-cpu.txt.gz?level=TRACE#_2014-09-07_02_27_10_408

Libvirt stacktrace in n-cpu:

 Traceback (most recent call last):
   File "/opt/stack/old/nova/nova/compute/manager.py", line 1329, in _build_instance
     set_access_ip=set_access_ip)
   File "/opt/stack/old/nova/nova/compute/manager.py", line 393, in decorated_function
     return function(self, context, *args, **kwargs)
   File "/opt/stack/old/nova/nova/compute/manager.py", line 1741, in _spawn
     LOG.exception(_('Instance failed to spawn'), instance=instance)
   File "/opt/stack/old/nova/nova/openstack/common/excutils.py", line 68, in __exit__
     six.reraise(self.type_, self.value, self.tb)
   File "/opt/stack/old/nova/nova/compute/manager.py", line 1738, in _spawn
     block_device_info)
   File "/opt/stack/old/nova/nova/virt/libvirt/driver.py", line 2286, in spawn
     block_device_info)
   File "/opt/stack/old/nova/nova/virt/libvirt/driver.py", line 3686, in _create_domain_and_network
     power_on=power_on)
   File "/opt/stack/old/nova/nova/virt/libvirt/driver.py", line 3588, in _create_domain
     domain.XMLDesc(0))
   File "/opt/stack/old/nova/nova/openstack/common/excutils.py", line 68, in __exit__
     six.reraise(self.type_, self.value, self.tb)
   File "/opt/stack/old/nova/nova/virt/libvirt/driver.py", line 3583, in _create_domain
     domain.createWithFlags(launch_flags)
   File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 179, in doit
     result = proxy_call(self._autowrap, f, *args, **kwargs)
   File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 139, in proxy_call
     rv = execute(f,*args,**kwargs)
   File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 77, in tworker
     rv = meth(*args,**kwargs)
   File "/usr/lib/python2.7/dist-packages/libvirt.py", line 896, in createWithFlags
     if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
 libvirtError: internal error: process exited while connecting to monitor: Cannot set up guest memory 'pc.ram': Cannot allocate memory

query: message:"Cannot set up guest memory 'pc.ram': Cannot allocate memory" AND tags:"screen-n-cpu.txt" AND tags:"multiline"

Revision history for this message
Matt Riedemann (mriedem) wrote :

http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwibGlidmlydEVycm9yOiBpbnRlcm5hbCBlcnJvcjogcHJvY2VzcyBleGl0ZWQgd2hpbGUgY29ubmVjdGluZyB0byBtb25pdG9yOiBDYW5ub3Qgc2V0IHVwIGd1ZXN0IG1lbW9yeSAncGMucmFtJzogQ2Fubm90IGFsbG9jYXRlIG1lbW9yeVwiIEFORCB0YWdzOlwic2NyZWVuLW4tY3B1LnR4dFwiIiwiZmllbGRzIjpbXSwib2Zmc2V0IjowLCJ0aW1lZnJhbWUiOiI2MDQ4MDAiLCJncmFwaG1vZGUiOiJjb3VudCIsInRpbWUiOnsidXNlcl9pbnRlcnZhbCI6MH0sInN0YW1wIjoxNDEwMjAyMjkxNzQyfQ==

18 hits in 7 days, in both the check and gate queues, including grenade jobs (on the icehouse side). All hits are failures.

The instance that fails to spawn is named TelemetryNotificationAPITestJSON, so this appears related to ceilometer hitting the nova API too hard and causing an out-of-memory condition (though there are other related out-of-memory bugs, which I think involve neutron).

Can we rate limit ceilometer hitting the nova API in test? For Tempest runs we have disabled rate limiting for the nova v2 API since Havana.

Changed in nova:
status: New → Confirmed
Revision history for this message
Joe Gordon (jogo) wrote :

I don't think this is just ceilometer related. Some other hits for this bug also hit the oom-killer, such as in check-tempest-dsvm-neutron-full.

summary: - ibvirtError: internal error: process exited while connecting to monitor:
- Cannot set up guest memory 'pc.ram': Cannot allocate memory
+ libvirtError: internal error: process exited while connecting to
+ monitor: Cannot set up guest memory 'pc.ram': Cannot allocate memory
description: updated
description: updated
Revision history for this message
Matt Riedemann (mriedem) wrote :

Yeah saw that myself, removing ceilometer.

no longer affects: ceilometer
Revision history for this message
Matt Riedemann (mriedem) wrote :

It sounds like there are too many nova-api processes when we hit this: about 24 on an 8 CPU VM, and that's without running the nova-metadata-api service.

We changed nova-api/metadata-api/conductor workers to be equal to the number of CPUs in Icehouse by default. The same has been done for trove api/conductor, glance api/registry and cinder volume API in Juno.
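
As a rough illustration of why tying worker counts to the CPU count inflates memory use on a small test VM, the sketch below counts worker processes when each API listener forks one worker per CPU. The listener names and the per-worker memory figure are assumptions made for illustration, not measurements from the gate.

```python
def total_workers(cpu_count, listeners):
    """Count worker processes when every listener forks one worker per CPU."""
    return cpu_count * len(listeners)


def estimated_memory_mb(worker_count, mb_per_worker):
    """Rough resident-memory estimate; mb_per_worker is a guessed figure."""
    return worker_count * mb_per_worker


# Assumption: nova-api serves three API listeners, which would account for
# 24 processes on an 8 CPU VM (3 listeners x 8 workers each).
NOVA_API_LISTENERS = ["ec2", "osapi_compute", "metadata"]

workers = total_workers(8, NOVA_API_LISTENERS)
print(workers)
print(estimated_memory_mb(workers, 100))  # 100 MB per worker is hypothetical
```

The same arithmetic applies to every other service that defaults its workers to the CPU count, so the total grows with each service enabled on the node.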

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/119894

Changed in nova:
assignee: nobody → Matthew Treinish (treinish)
status: Confirmed → In Progress
Matt Riedemann (mriedem)
Changed in nova:
importance: Undecided → High
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Matthew Treinish (<email address hidden>) on branch: master
Review: https://review.openstack.org/119894

Changed in nova:
status: In Progress → Confirmed
Revision history for this message
Matt Riedemann (mriedem) wrote :

We haven't seen this in the gate since we turned down the number of workers in devstack. Should we mark this closed?

Revision history for this message
Joe Gordon (jogo) wrote :

Looks like this was an out of memory issue and not something wrong with nova per se.

Changed in nova:
status: Confirmed → Invalid
Revision history for this message
Zhengwei Gao (multi-task) wrote :

If the compute node has insufficient memory, this error will occur. One day, we could use cgroups to provide a fixed amount of resources for libvirt.
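
A minimal sketch of how one might check whether a host has enough memory before starting a guest, assuming a Linux host that exposes /proc/meminfo. This is not how nova or libvirt decide placement; the function names and the headroom_mb default are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])
    return values


def available_kb(meminfo):
    """Prefer MemAvailable; otherwise approximate with free + buffers + cache."""
    if "MemAvailable" in meminfo:
        return meminfo["MemAvailable"]
    return sum(meminfo.get(key, 0) for key in ("MemFree", "Buffers", "Cached"))


def can_fit_guest(meminfo, guest_mb, headroom_mb=256):
    """Return True if the guest plus a hypothetical headroom fits in memory."""
    return available_kb(meminfo) >= (guest_mb + headroom_mb) * 1024


# Sample input standing in for the contents of /proc/meminfo.
SAMPLE = "MemTotal: 8000000 kB\nMemFree: 100000 kB\nBuffers: 50000 kB\nCached: 150000 kB\n"
print(can_fit_guest(parse_meminfo(SAMPLE), guest_mb=64))
```

On a real host the text would come from reading /proc/meminfo. Older kernels do not report MemAvailable, which is why the sketch falls back to a cruder estimate.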
