Data loss on ext3, maybe related to data=journal
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
linux (Ubuntu) | Won't Fix | Medium | Unassigned | |
Bug Description
I'm currently testing a backup scheme on a new karmic installation. The procedure worked flawlessly on jaunty and older Ubuntu/Debian distributions (although those used hardware RAID, whereas the new machine uses software RAID). With karmic, however, I'm experiencing data loss (at least on the designated backup partition).
The partition in question gets mounted once per hour. The respective entry in /etc/fstab is
UUID="7420cd8f-
The partition is an LVM2 logical volume which runs on a single PV on a RAID 1 composed of 2 disks (driver is AHCI).
I noticed the data loss because I use sitecopy to push the backups to another machine after each backup run. On about one out of three backup runs, sitecopy complains about a corrupted state file. I haven't checked the integrity of the backups themselves yet, as I can easily reproduce the problem with sitecopy alone.
To reproduce it I do:
# cd /srv/backup/
# cp data.1001.
# sitecopy -r /srv/backup/
# cd /
# umount /srv/backup
# mount /srv/backup
# less /srv/backup/
In about one out of three runs, the last step shows a corrupted file: the old contents with the rest filled with zeros, or a truncated file.
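For anyone trying to reproduce this without inspecting the file by eye, the check after the remount could be scripted roughly as follows. This is only a sketch: the function name `check_file` and the paths in the usage comment are hypothetical and do not come from the report. It assumes the reference is the original source of the copy, on a different filesystem, so the freshly written file is not read back before the unmount (the second workaround involves reading a file with less, so reading it early might mask the problem).

```shell
# Sketch only: check_file and the paths in the usage comment are hypothetical.
# Compare a file on the remounted filesystem against a reference copy that
# lives on a different filesystem (for example, the source of the cp).
check_file() {
    # $1 = file on the remounted filesystem, $2 = reference copy
    if cmp -s "$1" "$2"; then
        echo "OK"
    else
        echo "CORRUPT: $1 differs from $2"
        return 1
    fi
}

# Usage (as root; all paths are placeholders):
#   cp /tmp/source.dat /srv/backup/test.dat
#   cd / && umount /srv/backup && mount /srv/backup
#   check_file /srv/backup/test.dat /tmp/source.dat
```

Running the usage steps in a loop and counting the "CORRUPT" results would give a rough failure rate to compare between mount options.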
dmesg and syslog show nothing. In particular no journal-replay related message. Adding a "fsck.ext3 -f /dev/vg0/
So far I've discovered two ways to work around the problem:
* Don't use "data=journal". Both data=writeback and data=ordered seem to work fine
* Do "less /srv/backup/
Especially the latter seems to suggest a strange flush problem with the data=journal code in karmic's current x86-64 kernel (2.6.31.15.28).
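When comparing the journaling modes, it may help to confirm which data= mode a mount is actually using. A minimal sketch, assuming the kernel lists the option in /proc/mounts (whether it does may depend on the kernel version); the helper names `mount_opts` and `data_mode` are made up for illustration:

```shell
# Sketch only: helper names are hypothetical, not part of any tool.

# Print the options field for a mount point from a mounts table
# (defaults to /proc/mounts; the last matching entry wins).
mount_opts() {
    awk -v mp="$1" '$2 == mp { print $4 }' "${2:-/proc/mounts}" | tail -n 1
}

# Print the value of the data= option from a comma-separated option string,
# or "unspecified" if no data= option is listed.
data_mode() {
    mode=$(printf '%s\n' "$1" | tr ',' '\n' | sed -n 's/^data=//p' | head -n 1)
    echo "${mode:-unspecified}"
}

# Usage:
#   data_mode "$(mount_opts /srv/backup)"
```

An "unspecified" result only means the option is not listed; it does not tell which default the kernel applied.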
# sudo lvdisplay /dev/vg0/srv_backup
--- Logical volume ---
LV Name /dev/vg0/srv_backup
VG Name vg0
LV UUID KXZqxv-
LV Write Access read/write
LV Status available
# open 0
LV Size 128.00 GB
Current LE 32768
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:12
# sudo pvdisplay
--- Physical volume ---
PV Name /dev/md2
VG Name vg0
PV Size 693.63 GB / not usable 4.12 MB
Allocatable yes
PE Size (KByte) 4096
Total PE 177567
Free PE 64927
Allocated PE 112640
PV UUID FHAWPv-
# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sda3[0] sdb3[1]
727318656 blocks [2/2] [UU]
md1 : active raid1 sda2[0] sdb2[1]
1052160 blocks [2/2] [UU]
md0 : active raid1 sda1[0] sdb1[1]
4192896 blocks [2/2] [UU]
unused devices: <none>
Changed in linux (Ubuntu): | |
importance: | Undecided → High |
status: | New → Triaged |
tags: | added: kernel-series-unknown |
tags: | added: karmic removed: kernel-series-unknown |
Changed in linux (Ubuntu): | |
importance: | High → Medium |
Jürgen Kreileder, is it possible to attach the complete dmesg? Thanks!