OpenStack Compute (nova)

Bug #1350766
Comment #7

Michael Steffens (michael-steffens-b) wrote on 2014-08-12:

I'd be even more concerned about a vulnerability that is triggered by normal user behavior (such as putting load on a system) and can cause corruption across different users' instances than about one that needs special actions. On the other hand, I agree that modulating the load so as to produce a specific corruption (such as selectively dropping chunks) would require very sophisticated actions.

Nevertheless, yesterday, after a regular Ubuntu nova-compute update reverted my local fix and restored the defective behavior, I observed a new variant of corruption: a new snapshot booted fine, but then exposed filesystem errors. After reapplying the fsync patch and redoing the whole exercise with the same image, everything was fine.
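For illustration only, here is a minimal sketch of the general idea behind an fsync-style fix: force the written data to stable storage before the file is treated as complete. This is not the actual patch and not nova's code; the function name `write_image_durably` and its parameters are hypothetical.

```python
import os


def write_image_durably(chunks, path):
    """Write an iterable of byte chunks to `path` and force them to disk.

    Hypothetical helper, not taken from nova.
    """
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
        f.flush()             # push Python's userspace buffer to the OS
        os.fsync(f.fileno())  # ask the OS to commit its cached pages to storage
    return path
```

Without the `os.fsync` call, a reader that accesses the file through another path (or after a crash) may see data that has not yet reached the disk.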

I wouldn't be surprised if such issues already surface in production now and then (less frequently than in my environment, though), but are then blamed on the guest OS instead. Let me illustrate:

This is how it looks to the end user: take a snapshot, launch it, and it fails. Launch the same snapshot again, and it fails the same way. Looks like the snapshot itself is defective, doesn't it? The prime suspect: the filesystem was in an inconsistent state when the snapshot was taken. So let's take a new snapshot. And indeed, that one either works or fails consistently in a different way than the first.

Who wouldn't conclude that the fault lies with the guest OS or with the way the snapshot was taken (neither of which OpenStack could do anything about), rather than with the image being corrupted after download from glance and then cached?
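One way to tell the two causes apart would be to compare the cached image file on the compute node against the checksum recorded for the image. The following is a minimal sketch, assuming the expected value is an MD5 hex digest (which, as far as I know, is what glance reports in the image's checksum field); the function names and the path argument are hypothetical.

```python
import hashlib


def file_md5(path, block_size=1024 * 1024):
    """Return the MD5 hex digest of the file at `path`, read in blocks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def cached_image_is_intact(path, expected_md5):
    """True if the cached file matches the expected checksum (assumed MD5)."""
    return file_md5(path) == expected_md5.lower()
```

If the check fails for the cached copy while a fresh download of the same image passes, the snapshot itself is fine and the corruption happened on the compute node.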

Is there anything I can provide to get this ticket out of the incomplete and unassigned state?