Ubuntu
linux package

Data loss on ext3, maybe related to data=journal

Bug #485562 reported by Jürgen Kreileder on 2009-11-19

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	linux (Ubuntu)	Won't Fix	Medium	Unassigned

Bug Description

I'm currently testing a backup scheme on a new karmic installation. The procedure worked flawlessly on jaunty and older Ubuntu/Debian distributions (those machines used hardware RAID, whereas the new machine uses software RAID). With karmic, however, I'm experiencing data loss (at least on the designated backup partition).

The partition in question gets mounted once per hour. The respective entry in /etc/fstab is

UUID="7420cd8f-dd47-4fdb-b64e-4fd02f945e43" /srv/backup ext3 noatime,nodiratime,user_xattr,acl,noauto,nodev,nosuid,data=journal 0 2

The partition is an LVM2 logical volume which runs on a single PV on a RAID 1 composed of 2 disks (driver is AHCI).

I noticed the data loss because I use sitecopy to push the backups to another machine after each backup run. On about 1 out of 3 backup runs sitecopy complains about a corrupted state file. I haven't checked the integrity of the backups themselves yet, because I can easily reproduce the problem with sitecopy alone.

To reproduce it I do:

# cd /srv/backup/backup2l/scripts/
# cp data.1001.all.tar.gpg xxxx # change something so sitecopy has something to push
# sitecopy -r /srv/backup/backup2l/scripts/.sitecopyrc -p /srv/backup/backup2l/scripts/.sitecopy -q -u backup
# cd /
# umount /srv/backup
# mount /srv/backup
# less /srv/backup/backup2l/scripts/.sitecopy/backup

In about one out of three runs, the last step shows a corrupted file: either the old contents with the rest filled with zeros, or a truncated file.
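The last step of the reproduction can be checked without eyeballing the file in less. The following is a minimal sketch, not part of the original report: the function names are hypothetical, it assumes GNU coreutils, and it assumes that an intact sitecopy state file is non-empty and contains no NUL bytes. It deliberately does not read the file before unmounting, since the report notes that doing so hides the problem. A file that is truncated but still non-empty would not be caught by this check.

```shell
#!/bin/sh
# Hypothetical helpers; names are illustrative and not from the bug report.

# Count NUL bytes in a file (a zero-filled tail shows up as a non-zero count).
nul_bytes() {
    tr -cd '\000' < "$1" | wc -c
}

# Exit 0 if the file looks damaged: missing, empty, or containing NUL bytes.
looks_corrupt() {
    [ -s "$1" ] || return 0
    [ "$(nul_bytes "$1")" -gt 0 ]
}

# Remount a filesystem listed in /etc/fstab, then inspect the file.
# The file is intentionally NOT read before the unmount.
# Usage (as root):
#   remount_and_check /srv/backup /srv/backup/backup2l/scripts/.sitecopy/backup
remount_and_check() {
    cd / || return 2
    umount "$1" && mount "$1" || return 2
    if looks_corrupt "$2"; then
        echo "CORRUPT: $2"
        return 1
    fi
    echo "OK: $2"
}
```

Run remount_and_check right after the sitecopy command in place of the manual umount, mount and less steps.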

dmesg and syslog show nothing, in particular no journal-replay related message. Running "fsck.ext3 -f /dev/vg0/srv_backup" before mounting shows no problem either, yet the file still gets corrupted every now and then.

So far I've discovered two ways to work around the problem:
* Don't use "data=journal". Both data=writeback and data=ordered seem to work fine
* Do "less /srv/backup/backup2l/scripts/.sitecopy/backup" before the unmount

Especially the latter seems to suggest a strange flush problem with the data=journal code in karmic's current x86-64 kernel (2.6.31.15.28).
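When testing the first workaround it helps to confirm which journaling mode a mount is actually using. The sketch below is an illustration only: the function names are hypothetical, and it assumes the kernel lists the data= option in /proc/mounts, which may not hold on every kernel version (hence the "unspecified" fallback).

```shell
#!/bin/sh
# Hypothetical helpers; names are illustrative and not from the bug report.

# Print the data= journaling mode found in a comma-separated option string,
# or "unspecified" if the string contains no data= option.
data_mode() {
    mode=$(printf '%s\n' "$1" | tr ',' '\n' | sed -n 's/^data=//p' | tail -n 1)
    echo "${mode:-unspecified}"
}

# Print the options currently in effect for a mount point, from /proc/mounts.
mount_opts() {
    awk -v m="$1" '$2 == m { print $4 }' /proc/mounts
}

# Usage, with the partition mounted:
#   data_mode "$(mount_opts /srv/backup)"
```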

# sudo lvdisplay /dev/vg0/srv_backup
  --- Logical volume ---
  LV Name /dev/vg0/srv_backup
  VG Name vg0
  LV UUID KXZqxv-v8MQ-UD4x-41Vf-2c2t-0wsr-etUNjQ
  LV Write Access read/write
  LV Status available
  # open 0
  LV Size 128.00 GB
  Current LE 32768
  Segments 1
  Allocation inherit
  Read ahead sectors auto
  - currently set to 256
  Block device 253:12

# sudo pvdisplay
  --- Physical volume ---
  PV Name /dev/md2
  VG Name vg0
  PV Size 693.63 GB / not usable 4.12 MB
  Allocatable yes
  PE Size (KByte) 4096
  Total PE 177567
  Free PE 64927
  Allocated PE 112640
  PV UUID FHAWPv-otHj-jpDD-x35T-nE0Q-13uB-30GuSt

# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sda3[0] sdb3[1]
727318656 blocks [2/2] [UU]

md1 : active raid1 sda2[0] sdb2[1]
1052160 blocks [2/2] [UU]

md0 : active raid1 sda1[0] sdb1[1]
4192896 blocks [2/2] [UU]

unused devices: <none>
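All three arrays above report [UU], i.e. both mirror halves are active. For repeated test runs, the same check can be scripted. This is a rough sketch with a hypothetical function name, and it assumes the mdstat layout shown above, where the status line follows the line naming the array.

```shell
#!/bin/sh
# Hypothetical helper; the name is illustrative and not from the bug report.

# Read /proc/mdstat-style text on stdin and print the name of every array
# whose status field (e.g. [U_]) shows at least one missing member.
degraded_arrays() {
    awk '/^md[0-9]+ :/ { name = $1 }
         /\[[U_]*_[U_]*\]/ { print name }'
}

# Usage:
#   degraded_arrays < /proc/mdstat
```

Empty output means no array was reported as degraded.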

Jürgen Kreileder (jk) wrote on 2009-11-19:

dmesg.txt (1.4 KiB, text/plain)

Jürgen Kreileder (jk) wrote on 2009-11-19:

lspci.txt (29.9 KiB, text/plain)

Jürgen Kreileder (jk) wrote on 2009-11-19:

cpuinfo.txt (6.3 KiB, text/plain)

Leann Ogasawara (leannogasawara) on 2009-11-22

Changed in linux (Ubuntu):
importance:	Undecided → High
status:	New → Triaged

Andy Whitcroft (apw) on 2009-11-30

tags:

added: kernel-series-unknown

^_Pepe_^ (jose-angel-fernandez-freire) on 2009-12-14

tags:

added: karmic
removed: kernel-series-unknown

Surbhi Palande (csurbhi) wrote on 2010-03-22:

Jürgen Kreileder, could you please attach the complete dmesg? Thanks!

Jürgen Kreileder (jk) wrote on 2010-03-22:

dmesg2.txt (83.5 KiB, text/plain)

dmesg2.txt is obviously from a different kernel (the bug report was filed 4 months ago).
I can't tell whether the problem still occurs with this kernel: the machine is in production and I won't experiment on it.

Surbhi Palande (csurbhi) wrote on 2010-03-29:

Jürgen Kreileder, there is a patch in the Ubuntu kernel which we believe fixes this error:
commit 56fcad29d4b3cbcbb2ed47a9d3ceca3f57175417
Author: Jan Kara <email address hidden>
Date: Tue Sep 8 14:59:42 2009 +0200
ext3: Flush disk caches on fsync when needed

Please let me know whether this bug persists for you whenever you are able to experiment again. If it does, I will need to investigate further. Thanks!
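Because the corruption reportedly shows up in only about one out of three runs, a single clean run on a patched kernel says little. The following is a minimal sketch for repeating a reproduction command and counting failures. count_failures is a hypothetical name, and the reproduction command is assumed to be any script that exits non-zero when the state file is found corrupted.

```shell
#!/bin/sh
# Hypothetical helper; the name is illustrative and not from the bug report.

# Run a command N times and print how many of the runs exited non-zero.
# Usage: count_failures N command [args...]
count_failures() {
    n=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}

# Usage, with ./repro.sh standing in for the reproduction steps:
#   count_failures 30 ./repro.sh
```

With a failure rate near one in three, 30 runs without a single failure would be reasonably strong evidence that the problem is gone.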

Surbhi Palande (csurbhi) on 2010-04-01

Changed in linux (Ubuntu):
importance:	High → Medium

Brad Figg (brad-figg) wrote on 2011-07-14: Unsupported series, setting status to "Won't Fix".

This bug was filed against a series that is no longer supported and so is being marked as Won't Fix. If this issue still exists in a supported series, please file a new bug.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status:	Triaged → Won't Fix
