libvirtError: internal error unable to add domain xxx to cgroup: No space left on device

Bug #1295876 reported by Joe Gordon
This bug affects 3 people
Affects                     Status         Importance   Assigned to   Milestone
OpenStack Compute (nova)    Invalid        Medium       Unassigned
libvirt (Ubuntu)            Fix Released   Undecided    Unassigned

Bug Description

logstash query: message:"cgroup\: No space left on device" AND filename:logs*screen-n-cpu.txt

http://logs.openstack.org/12/80412/8/check/check-tempest-dsvm-postgres-full/f9f6158/logs/screen-n-cpu.txt.gz?level=TRACE#_2014-03-21_17_45_12_490

ERROR nova.compute.manager [req-630b71d6-0fbe-4e9e-99fe-019da7d29a3a FixedIPsNegativeTestJson-475659359 FixedIPsNegativeTestJson-265680949] [instance: 3f281136-ed69-4bfb-bf36-a7d4aa1c0640] Error: internal error unable to add domain instance-00000002 task 3057 to cgroup: No space left on device
26028 TRACE nova.compute.manager [instance: 3f281136-ed69-4bfb-bf36-a7d4aa1c0640] Traceback (most recent call last):
26028 TRACE nova.compute.manager [instance: 3f281136-ed69-4bfb-bf36-a7d4aa1c0640] File "/opt/stack/new/nova/nova/compute/manager.py", line 1304, in _build_instance
26028 TRACE nova.compute.manager [instance: 3f281136-ed69-4bfb-bf36-a7d4aa1c0640] set_access_ip=set_access_ip)
26028 TRACE nova.compute.manager [instance: 3f281136-ed69-4bfb-bf36-a7d4aa1c0640] File "/opt/stack/new/nova/nova/compute/manager.py", line 394, in decorated_function
26028 TRACE nova.compute.manager [instance: 3f281136-ed69-4bfb-bf36-a7d4aa1c0640] return function(self, context, *args, **kwargs)
26028 TRACE nova.compute.manager [instance: 3f281136-ed69-4bfb-bf36-a7d4aa1c0640] File "/opt/stack/new/nova/nova/compute/manager.py", line 1716, in _spawn
26028 TRACE nova.compute.manager [instance: 3f281136-ed69-4bfb-bf36-a7d4aa1c0640] LOG.exception(_('Instance failed to spawn'), instance=instance)
26028 TRACE nova.compute.manager [instance: 3f281136-ed69-4bfb-bf36-a7d4aa1c0640] File "/opt/stack/new/nova/nova/openstack/common/excutils.py", line 68, in __exit__
26028 TRACE nova.compute.manager [instance: 3f281136-ed69-4bfb-bf36-a7d4aa1c0640] six.reraise(self.type_, self.value, self.tb)
26028 TRACE nova.compute.manager [instance: 3f281136-ed69-4bfb-bf36-a7d4aa1c0640] File "/opt/stack/new/nova/nova/compute/manager.py", line 1713, in _spawn
26028 TRACE nova.compute.manager [instance: 3f281136-ed69-4bfb-bf36-a7d4aa1c0640] block_device_info)
26028 TRACE nova.compute.manager [instance: 3f281136-ed69-4bfb-bf36-a7d4aa1c0640] File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 2241, in spawn
26028 TRACE nova.compute.manager [instance: 3f281136-ed69-4bfb-bf36-a7d4aa1c0640] block_device_info)
26028 TRACE nova.compute.manager [instance: 3f281136-ed69-4bfb-bf36-a7d4aa1c0640] File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 3621, in _create_domain_and_network
26028 TRACE nova.compute.manager [instance: 3f281136-ed69-4bfb-bf36-a7d4aa1c0640] power_on=power_on)
26028 TRACE nova.compute.manager [instance: 3f281136-ed69-4bfb-bf36-a7d4aa1c0640] File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 3531, in _create_domain
26028 TRACE nova.compute.manager [instance: 3f281136-ed69-4bfb-bf36-a7d4aa1c0640] domain.XMLDesc(0))
26028 TRACE nova.compute.manager [instance: 3f281136-ed69-4bfb-bf36-a7d4aa1c0640] File "/opt/stack/new/nova/nova/openstack/common/excutils.py", line 68, in __exit__
26028 TRACE nova.compute.manager [instance: 3f281136-ed69-4bfb-bf36-a7d4aa1c0640] six.reraise(self.type_, self.value, self.tb)
26028 TRACE nova.compute.manager [instance: 3f281136-ed69-4bfb-bf36-a7d4aa1c0640] File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 3526, in _create_domain
26028 TRACE nova.compute.manager [instance: 3f281136-ed69-4bfb-bf36-a7d4aa1c0640] domain.createWithFlags(launch_flags)
26028 TRACE nova.compute.manager [instance: 3f281136-ed69-4bfb-bf36-a7d4aa1c0640] File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 179, in doit
26028 TRACE nova.compute.manager [instance: 3f281136-ed69-4bfb-bf36-a7d4aa1c0640] result = proxy_call(self._autowrap, f, *args, **kwargs)
26028 TRACE nova.compute.manager [instance: 3f281136-ed69-4bfb-bf36-a7d4aa1c0640] File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 139, in proxy_call
26028 TRACE nova.compute.manager [instance: 3f281136-ed69-4bfb-bf36-a7d4aa1c0640] rv = execute(f,*args,**kwargs)
26028 TRACE nova.compute.manager [instance: 3f281136-ed69-4bfb-bf36-a7d4aa1c0640] File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 77, in tworker
26028 TRACE nova.compute.manager [instance: 3f281136-ed69-4bfb-bf36-a7d4aa1c0640] rv = meth(*args,**kwargs)
26028 TRACE nova.compute.manager [instance: 3f281136-ed69-4bfb-bf36-a7d4aa1c0640] File "/usr/lib/python2.7/dist-packages/libvirt.py", line 581, in createWithFlags
26028 TRACE nova.compute.manager [instance: 3f281136-ed69-4bfb-bf36-a7d4aa1c0640] if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
26028 TRACE nova.compute.manager [instance: 3f281136-ed69-4bfb-bf36-a7d4aa1c0640] libvirtError: internal error unable to add domain instance-00000002 task 3057 to cgroup: No space left on device
26028 TRACE nova.compute.manager [instance: 3f281136-ed69-4bfb-bf36-a7d4aa1c0640]
26028 TRACE nova.compute.utils [instance: 3f281136-ed69-4bfb-bf36-a7d4aa1c0640] Traceback (most recent call last):
26028 TRACE nova.compute.utils [instance: 3f281136-ed69-4bfb-bf36-a7d4aa1c0640] File "/opt/stack/new/nova/nova/compute/manager.py", line 1184, in _run_instance
26028 TRACE nova.compute.utils [instance: 3f281136-ed69-4bfb-bf36-a7d4aa1c0640] instance, image_meta, legacy_bdm_in_spec)
26028 TRACE nova.compute.utils [instance: 3f281136-ed69-4bfb-bf36-a7d4aa1c0640] File "/opt/stack/new/nova/nova/compute/manager.py", line 1354, in _build_instance
26028 TRACE nova.compute.utils [instance: 3f281136-ed69-4bfb-bf36-a7d4aa1c0640] reason=unicode(exc_info[1]))
26028 TRACE nova.compute.utils [instance: 3f281136-ed69-4bfb-bf36-a7d4aa1c0640] RescheduledException: Build of instance 3f281136-ed69-4bfb-bf36-a7d4aa1c0640 was re-scheduled: internal error unable to add domain instance-00000002 task 3057 to cgroup: No space left on device
26028 TRACE nova.compute.utils [instance: 3f281136-ed69-4bfb-bf36-a7d4aa1c0640]

This started right around the time we started using https://review.openstack.org/#/c/79816/ (Serge's libvirt fix for https://bugs.launchpad.net/nova/+bug/1254872).

Revision history for this message
Joe Gordon (jogo) wrote :

We have seen 10 hits in the last 12 hours across all jobs, so this bug appears to be less frequent than https://bugs.launchpad.net/nova/+bug/1254872 was.

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

ENOSPC is what you get when you try to move a task into a cpuset which has an uninitialized cpuset.cpus or cpuset.mems. This will happen if cgroup.clone_children is unset and libvirt does not set those values itself.

Please try ensuring that something does echo 1 > /sys/fs/cgroup/cpuset/cgroup.clone_children at boot and see if that helps.
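
The condition described above can be checked by hand before starting a guest. The following is a minimal sketch, assuming a cgroup v1 cpuset hierarchy mounted at /sys/fs/cgroup/cpuset; the helper names (read_cgroup_value, cpuset_problems) are hypothetical and are not part of libvirt or nova.

```python
import os


def read_cgroup_value(cgroup_dir, name):
    """Return the stripped contents of a cgroup control file, or None if unreadable."""
    path = os.path.join(cgroup_dir, name)
    try:
        with open(path) as f:
            return f.read().strip()
    except (IOError, OSError):
        return None


def cpuset_problems(cgroup_dir):
    """List reasons why moving a task into this cpuset cgroup may fail with ENOSPC."""
    problems = []
    for name in ("cpuset.cpus", "cpuset.mems"):
        value = read_cgroup_value(cgroup_dir, name)
        if value is None:
            problems.append("%s: file not readable" % name)
        elif value == "":
            # An uninitialized cpuset is the ENOSPC case described in the comment above.
            problems.append("%s: empty (uninitialized)" % name)
    if read_cgroup_value(cgroup_dir, "cgroup.clone_children") != "1":
        problems.append("cgroup.clone_children is not 1; children will not inherit cpus/mems")
    return problems


if __name__ == "__main__":
    # Assumed mount point for the cgroup v1 cpuset controller.
    for problem in cpuset_problems("/sys/fs/cgroup/cpuset"):
        print(problem)
```

Pointing cpuset_problems() at the per-domain cgroup directory instead of the controller root would show whether a specific guest's cpuset was left uninitialized; the location of that directory depends on the libvirt version and is not assumed here.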

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Hm, looking at the code in libvirt which sets up the inheritance of those values, there seems to be no way there could be a race.

Is there any way to get log output with log_level=1 in /etc/libvirt/libvirtd.conf? (I realize that's probably not convenient, and it also may end up masking whatever is going on)

Matt Riedemann (mriedem)
tags: added: libvirt testing
Revision history for this message
Matt Riedemann (mriedem) wrote :

This makes me nervous: this merged on 3/20, and logstash shows that's when this started failing:

https://review.openstack.org/77593

Revision history for this message
Matt Riedemann (mriedem) wrote :

This also merged on 3/20 and deals with vcpus, which looks bad given what Serge said, but the change looks pretty tame:

https://review.openstack.org/#/c/73548/

The logic seems OK to me.

Revision history for this message
Joe Gordon (jogo) wrote :

Serge, you mentioned we should set the cgroups at boot, so does that mean this won't work: https://review.openstack.org/#/c/82630/

Joe Gordon (jogo)
Changed in nova:
status: New → Confirmed
importance: Undecided → Medium
Revision history for this message
Serge Hallyn (serge-hallyn) wrote : Re: [Bug 1295876] Re: libvirtError: internal error unable to add domain xxx to cgroup: No space left on device

Quoting Joe Gordon (<email address hidden>):
> Serge, you mentioned we should set the cgroups at boot, so does that
> mean this won't work: https://review.openstack.org/#/c/82630/

Hi Joe - yes, this *definitely* should work. However it's a workaround
and *should* not be needed, and given the mysterious nature of the
current failure, it's quite possible that the root cause will go on to
cause another symptom.

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

@Matt,

yeah, regarding https://review.openstack.org/#/c/73548/14/nova/virt/libvirt/driver.py,unified - cursory glance suggests that get_vcpu_used() is just for reporting on vms, so shouldn't have anything to do with this bug.

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in libvirt (Ubuntu):
status: New → Confirmed
Revision history for this message
Joe Gordon (jogo) wrote :

It looks like this bug returned in the gate yesterday, April 22nd: http://status.openstack.org/elastic-recheck/

Revision history for this message
Sean Dague (sdague) wrote :

@serge unfortunately we can't really set libvirt to that log level for a race bug, as that generates more than 100 MB of log per libvirt run. If there is a more targeted log filter, we can look at it.

Revision history for this message
Kashyap Chamarthy (kashyapc) wrote :

@sean: For cgroups, below are the specific filter variables to log debug level messages:

    LIBVIRT_LOG_FILTERS="1:cgroup"
    LIBVIRT_LOG_OUTPUTS="1:file:/var/tmp/libvirt.log"
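
If libvirtd is started from a script rather than a shell, the same two variables can be placed in the child environment. This is a rough sketch only: the function name libvirt_debug_env and its parameters are hypothetical, and it assumes the daemon is launched in a way that lets you control its environment.

```python
import os


def libvirt_debug_env(filters="1:cgroup", log_path="/var/tmp/libvirt.log", base_env=None):
    """Return a copy of the environment with the libvirt cgroup debug logging variables set."""
    env = dict(os.environ if base_env is None else base_env)
    # Level 1 is debug; restrict it to the cgroup module to keep the log small.
    env["LIBVIRT_LOG_FILTERS"] = filters
    # Send debug-level output to a file instead of the default destination.
    env["LIBVIRT_LOG_OUTPUTS"] = "1:file:%s" % log_path
    return env
```

The returned dict could then be passed as env= to subprocess.Popen(["libvirtd"], ...); how the daemon is actually started on the gate nodes is not stated in this report.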

Revision history for this message
Joe Gordon (jogo) wrote :

No hits in a while; it looks like changing the version of libvirt fixed this.

Changed in nova:
status: Confirmed → Incomplete
status: Incomplete → Invalid
Changed in libvirt (Ubuntu):
status: Confirmed → Fix Released