I ran an instance knowing that the compute node did not have enough RAM available. Errors show up in nova-compute.log and the instance stays in the pending state indefinitely.
nova-compute.log
-------------------------------------------------------------
DEBUG:root:instance 1: starting...
DEBUG:root:Running cmd (subprocess): ifconfig vlan101
DEBUG:root:Result was 1
DEBUG:root:Starting VLAN inteface vlan101
DEBUG:root:Running cmd (subprocess): sudo vconfig set_name_type VLAN_PLUS_VID_NO_PAD
DEBUG:root:Running cmd (subprocess): sudo vconfig add eth0 101
DEBUG:root:Running cmd (subprocess): sudo ifconfig vlan101 up
DEBUG:root:Running cmd (subprocess): ifconfig br101
DEBUG:root:Result was 1
DEBUG:root:Starting Bridge interface for vlan101
DEBUG:root:Running cmd (subprocess): sudo brctl addbr br101
DEBUG:root:Running cmd (subprocess): sudo brctl setfd br101 0
DEBUG:root:Running cmd (subprocess): sudo brctl stp br101 off
DEBUG:root:Running cmd (subprocess): sudo brctl addif br101 vlan101
DEBUG:root:Running cmd (subprocess): sudo ifconfig br101 up
DEBUG:root:Running cmd (subprocess): sudo iptables --delete FORWARD --in-interface br101 -j ACCEPT
DEBUG:root:Result was 1
DEBUG:root:Running cmd (subprocess): sudo iptables -I FORWARD --in-interface br101 -j ACCEPT
DEBUG:root:Running cmd (subprocess): sudo iptables --delete FORWARD --out-interface br101 -j ACCEPT
DEBUG:root:Result was 1
DEBUG:root:Running cmd (subprocess): sudo iptables -I FORWARD --out-interface br101 -j ACCEPT
DEBUG:root:instance instance-1165315330: starting toXML method
DEBUG:root:instance instance-1165315330: finished toXML method
DEBUG:root:Running cmd (subprocess): mkdir -p /home/tpatil/nova/nova/..//instances/instance-1165315330/
DEBUG:root:Running cmd (subprocess): chmod 0777 /home/tpatil/nova/nova/..//instances/instance-1165315330/
INFO:root:instance instance-1165315330: Creating image
DEBUG:root:Running cmd (subprocess): /usr/bin/curl --fail --silent http://10.2.3.150:3333/_images/ami-tiny/image -H "Date: Thu, 06 Jan 2011 21:49:39 GMT" -H "Authorization: AWS admin:admin:tg82sggZV1c1dPYNPK3cL49wDX0=" -o /home/tpatil/nova/nova/..//instances/instance-1165315330/disk-raw
DEBUG:root:Running cmd (subprocess): /usr/bin/curl --fail --silent http://10.2.3.150:3333/_images/aki-lucid/image -H "Date: Thu, 06 Jan 2011 21:49:41 GMT" -H "Authorization: AWS admin:admin:sOXhKpkknKQk4aJ+LKf1BMyY+Qo=" -o /home/tpatil/nova/nova/..//instances/instance-1165315330/kernel
DEBUG:root:Running cmd (subprocess): /usr/bin/curl --fail --silent http://10.2.3.150:3333/_images/ari-lucid/image -H "Date: Thu, 06 Jan 2011 21:49:41 GMT" -H "Authorization: AWS admin:admin:jnZiGj8T1zmMh9JyZxbzde+UUeo=" -o /home/tpatil/nova/nova/..//instances/instance-1165315330/ramdisk
INFO:root:instance instance-1165315330: injecting key into image ami-tiny
DEBUG:root:Running cmd (subprocess): sudo losetup --find --show /home/tpatil/nova/nova/..//instances/instance-1165315330/disk-raw
DEBUG:root:Running cmd (subprocess): sudo tune2fs -c 0 -i 0 /dev/loop2
DEBUG:root:Running cmd (subprocess): sudo mount /dev/loop2 /tmp/tmpIejhuu
DEBUG:root:Running cmd (subprocess): sudo mkdir -p /tmp/tmpIejhuu/root/.ssh
DEBUG:root:Running cmd (subprocess): sudo chown root /tmp/tmpIejhuu/root/.ssh
DEBUG:root:Running cmd (subprocess): sudo chmod 700 /tmp/tmpIejhuu/root/.ssh
DEBUG:root:Running cmd (subprocess): sudo tee -a /tmp/tmpIejhuu/root/.ssh/authorized_keys
DEBUG:root:Running cmd (subprocess): sudo umount /dev/loop2
DEBUG:root:Running cmd (subprocess): rmdir /tmp/tmpIejhuu
DEBUG:root:Running cmd (subprocess): sudo losetup --detach /dev/loop2
DEBUG:root:Running cmd (subprocess): dd if=/dev/zero of=/home/tpatil/nova/nova/..//instances/instance-1165315330/disk-raw count=1 seek=20971519 bs=512
DEBUG:root:Running cmd (subprocess): e2fsck -fp /home/tpatil/nova/nova/..//instances/instance-1165315330/disk-raw
DEBUG:root:Result was 1
DEBUG:root:Running cmd (subprocess): resize2fs /home/tpatil/nova/nova/..//instances/instance-1165315330/disk-raw
DEBUG:root:Running cmd (subprocess): dd if=/dev/zero of=/home/tpatil/nova/nova/..//instances/instance-1165315330/disk count=1 seek=62 bs=512
DEBUG:root:Running cmd (subprocess): parted --script /home/tpatil/nova/nova/..//instances/instance-1165315330/disk mklabel msdos
DEBUG:root:Running cmd (subprocess): dd if=/home/tpatil/nova/nova/..//instances/instance-1165315330/disk-raw of=/home/tpatil/nova/nova/..//instances/instance-1165315330/disk bs=268435456 conv=notrunc,fsync oflag=append
DEBUG:root:Running cmd (subprocess): parted --script /home/tpatil/nova/nova/..//instances/instance-1165315330/disk mkpart primary 63s 20971582s
DEBUG:root:Running cmd (subprocess): dd if=/dev/zero of=/home/tpatil/nova/nova/..//instances/instance-1165315330/disk count=1 seek=188743742 bs=512
DEBUG:root:Running cmd (subprocess): parted --script /home/tpatil/nova/nova/..//instances/instance-1165315330/disk mkpartfs primary ext2 20971583s 188743742s
libvir: QEMU error : internal error process exited while connecting to monitor: qemu: at most 2047 MB RAM can be simulated
ERROR:root:Uncaught exception
Traceback (most recent call last):
File "/home/tpatil/nova/nova/exception.py", line 83, in _wrap
return f(*args, **kw)
File "/home/tpatil/nova/nova/virt/libvirt_conn.py", line 355, in spawn
self._conn.createXML(xml, 0)
File "/usr/lib/python2.6/dist-packages/libvirt.py", line 1289, in createXML
if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: internal error process exited while connecting to monitor: qemu: at most 2047 MB RAM can be simulated
ERROR:root:instance instance-1165315330: Failed to spawn
Traceback (most recent call last):
File "/home/tpatil/nova/nova/compute/manager.py", line 146, in run_instance
self.driver.spawn(instance_ref)
File "/home/tpatil/nova/nova/exception.py", line 89, in _wrap
raise Error(str(e))
Error: internal error process exited while connecting to monitor: qemu: at most 2047 MB RAM can be simulated
libvir: QEMU error : Domain not found: no domain with matching name 'instance-1165315330'
With the current code, the instance apparently ends up in a "failed to spawn" state. Are we supposed to recover from that? And how do we handle the case where the scheduler sends the request to a host that is already full (vCPUs, memory, ...)?
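To make the second question concrete: one option is for the scheduler to filter out hosts that cannot satisfy the requested resources before dispatching the run_instance request. The sketch below is only an illustration, not Nova code; the host dict fields (`free_ram_mb`, `free_vcpus`) are hypothetical names for capacity data the scheduler would have to track:

```python
def hosts_with_capacity(hosts, req_ram_mb, req_vcpus):
    """Return names of hosts that can hold an instance of the given size.

    `hosts` is a hypothetical list of dicts describing each compute node,
    e.g. {'name': 'node1', 'free_ram_mb': 8192, 'free_vcpus': 8}.
    A host qualifies only if both its free RAM and free vCPUs cover
    the request, so an overcommitted node is never picked.
    """
    return [h['name'] for h in hosts
            if h['free_ram_mb'] >= req_ram_mb and h['free_vcpus'] >= req_vcpus]
```

With a filter like this in place, a request that no host can satisfy would fail fast in the scheduler instead of reaching libvirt and dying with the "at most 2047 MB RAM can be simulated" error seen above.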