LVM snapshot causes kernel memory corruption

Reported by Greg Hudson on 2008-02-28
Affects: linux-source-2.6.22 (Ubuntu)

Bug Description

Binary package hint: linux-image-2.6.22-14-generic

I have seen strong evidence of intermittent kernel memory corruption when creating LVM snapshots. To reproduce this, I run:

  rm -rf /tmp/test
  mkdir /tmp/test
  <put about 60MB of files into /tmp/test>
  find /tmp/test -type f | xargs md5sum > /tmp/sum.pre
  lvcreate --size 2G --snapshot /dev/dink/gutsy-i386-sbuild --name testsnapshot
  find /tmp/test -type f | xargs md5sum > /tmp/sum.post
  lvremove -f /dev/dink/testsnapshot
  diff -u /tmp/sum.pre /tmp/sum.post

where /dev/dink/gutsy-i386-sbuild is an unmounted volume unrelated to the /tmp filesystem. Intermittently, not on every run, /tmp/sum.post shows a different md5sum for one of the files than /tmp/sum.pre did. If I reboot the machine, the md5sum of the apparently changed file reverts to the correct value. Thus the file on disk is not corrupted, but its copy in the page cache is.
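The steps above can be wrapped in a small helper so the check is easy to repeat, which matters because the failure is intermittent. This is only a sketch, assuming GNU find, sort, xargs, and mktemp; the function names sum_tree and check_tree_across are hypothetical and do not come from the report.

```shell
# Checksum every regular file under a directory, in a stable order.
# -print0 / -z / -0 keep file names with spaces or newlines intact.
sum_tree() {
    find "$1" -type f -print0 | sort -z | xargs -0 -r md5sum
}

# Checksum a tree, run a command, checksum again, and show what changed.
# Returns 0 if no checksum changed, 1 if any did.
check_tree_across() {
    dir=$1
    shift
    pre=$(mktemp)
    post=$(mktemp)
    sum_tree "$dir" > "$pre"
    "$@"                          # the operation under suspicion
    sum_tree "$dir" > "$post"
    rc=0
    diff -u "$pre" "$post" || rc=$?
    rm -f "$pre" "$post"
    return "$rc"
}
```

With the volume names from the report, one run (as root) would look like:

  check_tree_across /tmp/test sh -c 'lvcreate --size 2G --snapshot /dev/dink/gutsy-i386-sbuild --name testsnapshot && lvremove -f /dev/dink/testsnapshot'

A non-zero return means at least one file read back differently after the snapshot was created.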

The corruption is always a single changed byte, always at offset 156 within a 1K block (a different block each time). The incorrect value is always one less than the correct value. For example, here the byte at file offset 0x73189c (0x73189c mod 1024 = 156) changes from e8 to e7:

@@ -471431,7 +471431,7 @@
 0731860: 4d46 6ae3 0252 6864 e634 15eb 7ac1 f0ee MFj..Rhd.4..z...
 0731870: 9f2b 8d82 33e3 138b 31a2 8da5 4594 5648 .+..3...1...E.VH
 0731880: 74fd 00e0 bc48 fe09 d557 f501 70a8 7dfd t....H...W..p.}.
-0731890: ea8f 5010 b963 e2ec 7b84 8ef7 e851 fdfa ..P..c..{....Q..
+0731890: ea8f 5010 b963 e2ec 7b84 8ef7 e751 fdfa ..P..c..{....Q..
 07318a0: 6031 670b cd54 fe01 20d6 f3fb c662 dfc3 `1g..T.. ....b..
 07318b0: 7605 acd2 1be6 3fee 54ff e15b bc60 77fa v.....?.T..[.`w.
 07318c0: 368e 99f9 60a0 a1a2 fbdf ef0d 4bca a201 6...`.......K...
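
The offset pattern can be checked mechanically. The sketch below assumes a known-good copy of the file is available to compare against the corrupted read; the helper names block_offset and diff_bytes are hypothetical. It relies on cmp -l, which prints 1-based byte positions and octal byte values.

```shell
# Offset of a byte within its 1K block.
# Accepts decimal or 0x-prefixed hex, as shell arithmetic does.
block_offset() {
    echo $(( $1 % 1024 ))
}

# For each differing byte between two files, print:
#   absolute offset (0-based), offset within the 1K block,
#   byte value in the first file, byte value in the second (both hex).
diff_bytes() {
    # cmp exits 1 when the files differ; only other failures are errors.
    { cmp -l "$1" "$2" || [ "$?" -eq 1 ]; } | while read -r pos a b; do
        off=$(( pos - 1 ))
        # A leading 0 makes the shell read cmp's values as octal.
        printf '%d %d %02x %02x\n' "$off" "$(( off % 1024 ))" "$(( 0$a ))" "$(( 0$b ))"
    done
}
```

For the hexdump above, the changed byte is the thirteenth on the line starting at 0731890, so its file offset is 0x73189c, and block_offset 0x73189c prints 156.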

/tmp is located in a different volume group than the volume I am snapshotting.

Some version information:

root@linux-build-10:~# uname -a
Linux linux-build-10 2.6.22-14-server #1 SMP Thu Jan 31 23:57:25 UTC 2008 x86_64 GNU/Linux
root@linux-build-10:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 7.10
Release: 7.10
Codename: gutsy
root@linux-build-10:~# dpkg -s lvm2 | grep Version
Version: 2.02.26-1ubuntu4
root@linux-build-10:~# pvscan
  PV /dev/sdb VG dink lvm2 [136.73 GB / 110.73 GB free]
  PV /dev/sda5 VG LINUX-BUILD-10.mit.edu lvm2 [68.12 GB / 0 free]
  Total: 2 [204.85 GB] / in use: 2 [204.85 GB] / in no VG: 0 [0 ]
root@linux-build-10:~# vgscan
  Reading all physical volumes. This may take a while...
  Found volume group "dink" using metadata type lvm2
  Found volume group "LINUX-BUILD-10.mit.edu" using metadata type lvm2

Anders Kaseorg (anders-kaseorg) wrote :

I ran into exactly this problem several months ago on Gutsy amd64, but had blamed it on hardware until ghudson reproduced it by doing the same operations. We are both working on the same project, but using completely different hardware. We’re now convinced that there is a kernel bug here.

I have since upgraded to Hardy amd64, and was not immediately able to reproduce again. On a different Gutsy i386 machine, I eventually got lvcreate to segfault and lvremove to hang forever, but that’s less interesting. I’ll keep working on this.

The problem was also reported to linux-lvm:

Kjell Braden (afflux) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better.
The Hardy Heron Alpha series is currently under development and contains an updated version of the kernel. You can download and try the new Hardy Heron Alpha release from http://cdimage.ubuntu.com/releases/hardy/ . You should be able to test the new kernel using the LiveCD. If you can, please verify if this bug still exists or not and report back your results. General information regarding the release can also be found here: http://www.ubuntu.com/testing/ . Thanks.

Changed in linux-source-2.6.22:
status: New → Incomplete

Kjell Braden (afflux) wrote :

We are closing this bug report because it lacks the information we need to investigate the problem, as described in the previous comments. Please reopen it if you can give us the missing information, and don't hesitate to submit bug reports in the future. To reopen the bug report you can click on the current status, under the Status column, and change the Status back to "New". Thanks again!

Changed in linux-source-2.6.22:
status: Incomplete → Invalid