Comment 97 for bug 997978

Matt Hilt (mjhilt-x) wrote :

Soren,

We have a 12.04-based OpenStack cluster with 4 host nodes, currently running about 30 VMs.
We followed the steps to add the kvm-network-hang repo, updated to the latest version on the host machines, and then rebooted the instances. My understanding is that this should pick up the update, since a new KVM command is run on reboot.
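One way to check that assumption is to look for hypervisor processes still running a binary that has since been replaced on disk. The sketch below is only an illustration and not something we ran as part of this report: on Linux, the /proc/<pid>/exe link of such a process ends in " (deleted)". The function names and the "qemu" name fragment are assumptions (the actual binary name may differ per host), and the script needs enough privileges to read other users' /proc entries.

```python
import os


def is_stale_exe(link_target):
    """Return True if a /proc/<pid>/exe target refers to a binary removed or replaced on disk."""
    return link_target.endswith(" (deleted)")


def find_stale_processes(name_fragment="qemu", proc_root="/proc"):
    """List (pid, exe target) pairs for matching processes that run a replaced binary.

    name_fragment is an assumption: adjust it to whatever the hypervisor
    binary is called on the host.
    """
    stale = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            target = os.readlink(os.path.join(proc_root, entry, "exe"))
        except OSError:
            continue  # process exited, kernel thread, or permission denied
        if name_fragment in os.path.basename(target) and is_stale_exe(target):
            stale.append((int(entry), target))
    return stale


if __name__ == "__main__":
    for pid, target in find_stale_processes():
        print(pid, target)
```

An empty result only says that no matching process is running a replaced executable. It does not cover shared libraries that were upgraded separately.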

I caught the first failure ~12 hours after the upgrade. It had the usual symptoms: networking was lost, but the VM was still up and an active VNC session was possible. I thought I might simply have missed a reboot on one of the VMs, so I didn't report anything. The second failure happened yesterday, but someone else caught it and rebooted the VM. As best we can tell after the fact, it looks like the usual failure (no full hard drive, no kernel panic, and nothing in the logs).

As I mentioned before, we used to see at least one failure per day, usually many more. This patch has at least greatly reduced how often the failure occurs. These non-deterministic bugs are hard to track down.