Comment 86 for bug 317781

Volodymyr M. Lisivka (vlisivka-gmail) wrote:

> Hopefully neither is true, but in that case, the chances of a file getting replaced by a zero-length file are very small indeed.

I expect that Ext4 performs much better than Ext3 and will save me about 1 minute per day on average (6 hours per year, about 1 additional working day), which is very good.

On the other hand, I can lose a few hours per data corruption incident, which is comparable. My notebook has problems resuming from hibernate (there is no proprietary code on my notebook at all), so unexpected hangups a few minutes after resume are common for me (2-3 per month). Thus I could lose a few days per year with Ext4 - about a working week.

Formula to calculate the actual benefit: benefit_value*benefit_probability - loss_value*loss_probability [ - loss_value*loss_probability ]..., where benefit_probability is the probability that no failure occurs.

For Ext3: benefit_value is zero compared to Ext4 (Ext3 is slower than Ext4), but loss_value is small too - about 1 minute per failure.

Ext3_benefit = 0*(1-k) - 1m*k; where k is the probability of a failure per working day;
Ext4_benefit = 1m*(1-k) - 2h*k;
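
Setting the two benefits equal gives the break-even failure rate (with 2h = 120m):

  1m*(1-k) - 120m*k = 0*(1-k) - 1m*k
  1m - 1m*k - 120m*k = -1m*k
  1m = 120m*k
  k = 1/120 per working day

With roughly 240 working days in a year, that is about 2 failures per year.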

If you see failures fewer than twice a year, then Ext4 is better for you; if you see failures more than twice a year, then Ext3 is better.

> And again, I will note that XFS has been doing this all along, and other newer file systems will also be doing delayed allocation, and will be subject to the same pitfalls. Maybe they will also encode the same hacks to work around broken expectations, and people with crappy proprietary binary drivers. But folks really shouldn't be counting on this....

I, personally, have had very bad experience with XFS. We used it on the linux.org.ua site, and I spent a few days of my short life fixing corrupted files manually after a few power failures in the data centre (all the files had been created or modified recently, so backup was not helpful in this case). I recommend staying away from XFS and similar filesystems in favour of Ext3, which has the optimal balance between speed and robustness.

I used a crash test in 2003 to check the maturity of the Ext3 filesystem. I set up a computer to soft-reset itself every 5 minutes while executing filesystem-intensive operations, and then left it for a few days (Thursday to Monday). Ext3 passed that test just fine.

Can we create a few test cases with common filesystem usage patterns, run them continuously in Qemu on a raw device, and then use "pkill -9 qemu; qemu &" to simulate a crash and restart? Such a crash test would help much more than talk about this problem. Run it for a few days to gather statistics about the number of data corruption incidents per failure.
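
A minimal sketch of such a harness, assuming the guest image is set up to run the filesystem workload on boot; the device path and the check_fs.sh result checker are hypothetical placeholders, not an existing tool:

  #!/bin/bash
  # Crash-test loop: boot the guest, let the workload run for a random
  # interval, then kill qemu to simulate a power failure.
  # IMAGE and check_fs.sh are hypothetical - adjust to your own setup.
  IMAGE=/dev/sdb                      # raw device under test

  for i in $(seq 1 1000); do
    qemu -hda "$IMAGE" -nographic &   # guest runs the workload on boot
    sleep $(( 60 + RANDOM % 240 ))    # run for 1-5 minutes
    pkill -9 qemu                     # simulated power failure
    wait                              # reap the killed qemu process
    ./check_fs.sh "$IMAGE" >> crash-test.log  # count corrupted files
  done

Each iteration appends one corruption count to the log, so after a few days of runs you have the per-failure statistics.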