I was able to reproduce this issue a third time, this time on a clean install. I believe anybody can reproduce it with the following steps:
STEP 1: Do a clean Ubuntu Server 10.04 installation. When I installed, I selected "OpenSSH server" as the only additional installation package.
STEP 2: Install libvirt-bin package.
I installed it indirectly with: apt-get install eucalyptus-nc
This pulled in libvirt-bin as a dependency.
After I installed Eucalyptus NC and configured the node, I verified that the node could run my images. Everything worked fine. As part of configuring Eucalyptus NC, I also created a bridge br0 on top of my eth0 device.
STEP 3: Edit /etc/libvirt/qemu.conf
Config change:
security_driver = "none"
I restarted libvirt-bin and verified again that everything was working fine.
This step may be harmless on its own, or it may act in combination with STEP 4 to cause the issue.
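For anyone repeating STEP 3, a small check like the following may help confirm which value is active in the file after the edit. It is only a sketch: get_security_driver is a hypothetical helper name (not part of libvirt), and it assumes the setting sits on a single line in the quoted form shown above.

```shell
# Print the active (uncommented) security_driver value from a qemu.conf-style file.
# get_security_driver is a hypothetical helper, not a libvirt tool.
# Usage: get_security_driver            (reads /etc/libvirt/qemu.conf)
#        get_security_driver /some/file (reads the given file)
get_security_driver() {
    conf="${1:-/etc/libvirt/qemu.conf}"
    # Match lines such as: security_driver = "none"
    # Lines starting with '#' do not match because the pattern is anchored.
    # If the setting appears more than once, report the last occurrence.
    sed -n 's/^[[:space:]]*security_driver[[:space:]]*=[[:space:]]*"\([^"]*\)".*$/\1/p' "$conf" | tail -n 1
}
```

An empty result means no uncommented security_driver line was found, in which case libvirt falls back to whatever default it was built with.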
STEP 4: Purge apparmor package & reboot
root@srv-uec-qa-node01:~# apt-get purge apparmor
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages will be REMOVED:
apparmor* apparmor-utils*
0 upgraded, 0 newly installed, 2 to remove and 9 not upgraded.
After this operation, 4,067kB disk space will be freed.
Do you want to continue [Y/n]? y
(Reading database ... 44060 files and directories currently installed.)
Removing apparmor-utils ...
Purging configuration files for apparmor-utils ...
Removing apparmor ...
* Unloading AppArmor profiles [ OK ]
Purging configuration files for apparmor ...
dpkg: warning: while removing apparmor, directory '/etc/apparmor.d/cache' not empty so not removed.
Processing triggers for man-db ...
Processing triggers for ureadahead ...
ureadahead will be reprofiled on next reboot
After STEP 4 (purge and reboot), the problem is reproduced. I get the following error when I try to start VM instances through libvirtd:
root@srv-uec-qa-node01:/var/lib/eucalyptus/instances/admin/i-5AF109A2# virsh start i-5AF109A2
error: Failed to start domain i-5AF109A2
error: monitor socket did not show up.: Connection refused
Thus, the problem is caused either by STEP 4 by itself or by the combination of STEP 3 and STEP 4.
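The "monitor socket did not show up" message usually means the qemu process exited before libvirtd could connect to it, so the per-domain log is a natural place to look next. The sketch below prints the last lines of that log. It assumes libvirt's usual log location of /var/log/libvirt/qemu/<domain>.log, which I have not verified on this node, and domain_log_tail is a hypothetical helper name.

```shell
# Show the last lines of a domain's qemu log.
# domain_log_tail is a hypothetical helper; the default log directory is an
# assumption based on libvirt's usual layout and may differ on this system.
# Usage: domain_log_tail i-5AF109A2 [line_count] [log_dir]
domain_log_tail() {
    domain="$1"
    lines="${2:-20}"
    logdir="${3:-/var/log/libvirt/qemu}"
    log="$logdir/$domain.log"
    if [ ! -r "$log" ]; then
        echo "no readable log at $log" >&2
        return 1
    fi
    tail -n "$lines" "$log"
}
```

Running it right after a failed "virsh start" should show whatever qemu printed before it exited, if anything was logged.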
------
Even if I don't purge apparmor but only disable it with a kernel parameter (apparmor=0), the problem recurs, which suggests that libvirtd relies on the presence of apparmor in order to function properly.
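To tell the two cases apart (package purged versus disabled at boot), it can help to record what the running kernel reports. The following is a sketch under the assumption that the kernel exposes the module parameter at /sys/module/apparmor/parameters/enabled with a value of Y or N; apparmor_state is a hypothetical helper name.

```shell
# Report whether the running kernel has AppArmor enabled.
# apparmor_state is a hypothetical helper; the sysfs path is an assumption
# and may not exist on every kernel build.
# Prints one of: enabled, disabled, absent
apparmor_state() {
    param="${1:-/sys/module/apparmor/parameters/enabled}"
    if [ ! -r "$param" ]; then
        # No parameter file: AppArmor is not built in or not exposed here.
        echo "absent"
        return 0
    fi
    case "$(cat "$param")" in
        Y*) echo "enabled" ;;
        *)  echo "disabled" ;;
    esac
}
```

Including this output together with the contents of /proc/cmdline in a report makes it clear which of the two configurations a given failure was seen on.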
------
I have not been able to fix libvirt reliably after this.
In one case, reinstalling apparmor seemed to make libvirtd work again.
However, in another case I wasn't successful. On that system I had disabled apparmor using a kernel boot parameter. Additionally, the system had been in a "broken" state for several days and may have seen other changes besides the boot parameter. I undid my modifications to the kernel boot parameters, rebooted, and reinstalled apparmor, but the problem persisted.