Comment 51 for bug 997978

Gary Cuozzo (ua5r) wrote :

I have seen this issue on 2 different servers which use bridging but not bonding.

One server was a customer system, and we were forced to roll the OS back to an earlier release. They were experiencing the issue up to once a day and quickly became impatient to have it resolved.

The other server is an internal system which runs multiple VMs. We have only seen the issue on one of the VMs, and only once every 2-3 weeks. The VM which experiences the issue is our LTSP server.

I have been testing a small cluster of 3 host machines which use both bonding and bridging. I have not seen this issue affect them, but the usage is quite light and the VMs come and go since it's a testing environment right now. Due to this bug, we have halted any plans to upgrade VM hosts to Precise until we can verify it's fixed.

We've seen the following when the issue has occurred:
* Absolutely nothing in any logs, dmesg, etc.
* Host machine cannot ping the guest
* arp shows the guest as incomplete
* Guest machine can ping its own IP, but nothing else (host, gateway, etc.)
* Restarting the networking subsystem is successful (no errors) but has no effect on the problem
* Rebooting the guest fixes the problem until it happens again. The reboot does not actually kill the kvm process (the process ID stays the same), but somehow having the guest go through init again fixes it, until the issue recurs some time later.
* This issue has occurred on one 12.04 guest and one 11.10 guest
* Both of the servers on which this occurred are Dell 2950 series machines. I have not seen this issue on any of our HP ProLiant (mostly DL360) machines.
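
Since nothing shows up in the logs, one way to at least timestamp when a guest drops off would be to poll the host's ARP table for incomplete entries. The following is only a rough sketch, assuming the usual Linux /proc/net/arp column layout (IP, HW type, Flags, HW address, Mask, Device) and treating a cleared ATF_COM bit (0x2) in the Flags column as "incomplete". The names incomplete_arp_entries and watch are made up for illustration.

```python
import time
from typing import List, NamedTuple

ATF_COM = 0x02  # "completed entry" bit in the Flags column (assumed, per if_arp.h)


class ArpEntry(NamedTuple):
    ip: str
    mac: str
    device: str


def incomplete_arp_entries(arp_table: str) -> List[ArpEntry]:
    """Return the entries of a /proc/net/arp dump whose ATF_COM bit is unset."""
    entries = []
    for line in arp_table.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        ip, _hw_type, flags, mac, _mask, device = fields[:6]
        if (int(flags, 16) & ATF_COM) == 0:
            entries.append(ArpEntry(ip, mac, device))
    return entries


def watch(interval: float = 10.0, path: str = "/proc/net/arp") -> None:
    """Print a timestamped line for every incomplete entry, forever."""
    while True:
        with open(path) as f:
            for entry in incomplete_arp_entries(f.read()):
                stamp = time.strftime("%Y-%m-%d %H:%M:%S")
                print(stamp, "incomplete:", entry.ip, "on", entry.device, flush=True)
        time.sleep(interval)
```

Calling watch() on the host and redirecting the output to a file would give a rough time of onset to line up against other activity on the machine. It would not explain the cause.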

If there is some sort of test I can run to help debug, I'm happy to do that.

Thank you for trying to address this. This is a huge bug for us.

Thanks,
gary