md truncates devices with kernel >=2.6.32-37-server

Bug #974275 reported by annunaki2k2
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Expired
Importance: High
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

After updating the kernel to one >=2.6.32-37, our MD device shows up as only a fraction of its previous size.

With the previous kernel, md0 shows up (correctly) as 85681.66 GiB, but with the newer kernel it shows up as only 8192.00 GiB, roughly a tenth of its real size. With the incorrect size being reported and used, the md device is unusable and the partitions on it cannot be accessed.

Whilst running 2.6.32-33-server:
root@saturn:~# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90
  Creation Time : Mon Aug 9 09:15:41 2010
     Raid Level : raid0
     Array Size : 89843731712 (85681.66 GiB 91999.98 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Mon Aug 9 09:15:41 2010
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 128K

           UUID : 1c27f277:0fbb69c8:852484a6:e390a598 (local to host saturn)
         Events : 0.1

    Number Major Minor RaidDevice State
       0 8 32 0 active sync /dev/sdc
       1 8 48 1 active sync /dev/sdd
root@saturn:~# cat /proc/partitions
major minor #blocks name

   8 0 488386584 sda
   8 1 468581376 sda1
   8 2 1 sda2
   8 5 19803136 sda5
   8 16 488386584 sdb
   8 17 468581376 sdb1
   8 18 1 sdb2
   8 21 19803136 sdb5
   8 32 44921865984 sdc
   9 2 468581312 md2
   8 48 44921865984 sdd
   9 0 89843731712 md0
 259 0 89843731200 md0p1
root@saturn:~# fdisk -l /dev/md0

WARNING: GPT (GUID Partition Table) detected on '/dev/md0'! The util fdisk doesn't support GPT. Use GNU Parted.

Disk /dev/md0: 92000.0 GB, 91999981273088 bytes
255 heads, 63 sectors/track, 11185027 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 131072 bytes / 262144 bytes
Disk identifier: 0x00000000

    Device Boot Start End Blocks Id System
/dev/md0p1 1 267350 2147483647+ ee GPT
Partition 1 does not start on physical sector boundary.

But whilst running 2.6.32-40-server:
root@saturn:~# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90
  Creation Time : Mon Aug 9 09:15:41 2010
     Raid Level : raid0
     Array Size : 8589934336 (8192.00 GiB 8796.09 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Mon Aug 9 09:15:41 2010
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 128K

           UUID : 1c27f277:0fbb69c8:852484a6:e390a598 (local to host saturn)
         Events : 0.1

    Number Major Minor RaidDevice State
       0 8 32 0 active sync /dev/sdc
       1 8 48 1 active sync /dev/sdd
root@saturn:~# cat /proc/partitions
major minor #blocks name

   8 0 488386584 sda
   8 1 468581376 sda1
   8 2 1 sda2
   8 5 19803136 sda5
   8 16 488386584 sdb
   8 17 468581376 sdb1
   8 18 1 sdb2
   8 21 19803136 sdb5
   8 48 44921865984 sdd
   8 32 44921865984 sdc
   9 2 468581312 md2
   9 0 8589934336 md0
root@saturn:~# fdisk -l /dev/md0

WARNING: GPT (GUID Partition Table) detected on '/dev/md0'! The util fdisk doesn't support GPT. Use GNU Parted.

Disk /dev/md0: 8796.1 GB, 8796092760064 bytes
255 heads, 63 sectors/track, 1069397 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 131072 bytes / 262144 bytes
Disk identifier: 0x00000000

    Device Boot Start End Blocks Id System
/dev/md0p1 1 267350 2147483647+ ee GPT
Partition 1 does not start on physical sector boundary.

This is a major bug introduced on (what should be) a mature LTS release, and will need addressing ASAP.
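
The two /proc/partitions listings above can be compared mechanically. The following is a minimal sketch, not part of the original report: it parses rows copied from those listings and measures how far md0 falls short of the sum of its RAID0 members. The function names and the 1 GiB tolerance are hypothetical choices made for illustration.

```python
# Rows copied from the two /proc/partitions listings in this report.
# Sizes are in 1 KiB blocks.
GOOD = """\
major minor #blocks name
   8 32 44921865984 sdc
   8 48 44921865984 sdd
   9 0 89843731712 md0
"""
BAD = """\
major minor #blocks name
   8 48 44921865984 sdd
   8 32 44921865984 sdc
   9 0 8589934336 md0
"""


def sizes_kib(text):
    """Map device name -> size in KiB from /proc/partitions-style rows."""
    sizes = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the header row and blank lines; data rows have a numeric size.
        if len(fields) == 4 and fields[2].isdigit():
            sizes[fields[3]] = int(fields[2])
    return sizes


def raid0_shortfall_kib(text, array, members):
    """KiB by which the array is smaller than the sum of its members."""
    sizes = sizes_kib(text)
    return sum(sizes[m] for m in members) - sizes[array]


def looks_truncated(text, array, members, tolerance_kib=1024 * 1024):
    # Hypothetical threshold: in the good listing the shortfall is only a
    # few hundred KiB, so anything above 1 GiB is treated as suspicious.
    return raid0_shortfall_kib(text, array, members) > tolerance_kib
```

On a live system the same functions could be fed the contents of /proc/partitions instead of the embedded samples.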

annunaki2k2 (russell-knighton) wrote :

I meant to add that this is so far the only other reference I have found to this issue. I guess there aren't many others running with 92TB stores out there...

http://lists.debian.org/debian-kernel/2012/02/msg00774.html

Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 974275

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: lucid
Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: kernel-da-key
annunaki2k2 (russell-knighton) wrote : Re: md truncates devices with kernel 2.6.32-40-server

Unfortunately, for security reasons, the machine in question does not have any form of Internet access (proxied or otherwise), so I cannot run the requested "apport-collect 974275" command on it. If there is an alternative offline process I can run, please let me know.

annunaki2k2 (russell-knighton) wrote :

In response to the automated message.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Could you run the following, then attach the resulting /tmp/report.974275 file:

apport-bug --save /tmp/report.974275 linux

Another option would be to run the following and attach the output files:

1) uname -a > uname-a.log
2) dmesg > dmesg.log
3) sudo lspci -vvnn > lspci-vvnn.log
4) cat /proc/version_signature > version.log

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
annunaki2k2 (russell-knighton) wrote :

Added requested log file output.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Does the device show up with the correct size if you reboot back into the 2.6.32-33-server kernel?

annunaki2k2 (russell-knighton) wrote :

Yes, luckily it does - so no lasting damage is/was caused. Needless to say though, until the fault was pinned down to the kernel change, there was a serious moment of panic about the possible loss of 70TiB + of data!

According to the person from the link in my first post, the problem didn't appear to exist in 2.6.32-39 but was there in 2.6.32-41, whereas I encountered it in 2.6.32-40. As far as I can tell they were using vanilla Debian; assuming these kernel numbers tally with Ubuntu's, it appears the problem crept in somewhere in the 2.6.32-40 release.

Looking in the Debian kernel change-log, I wouldn't mind betting this would have had something to do with it:
   [ Ben Hutchings ]
   * Add longterm releases 2.6.32.47 and 2.6.32.48, including:
.....
     - md: Fix handling for devices from 2TB to 4TB in 0.90 metadata.
.....
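
The sizes in this report are at least consistent with a per-member limit tied to 0.90 metadata. The sketch below is only an arithmetic check under two labeled assumptions, not a description of the kernel code: (a) the 0.90 superblock reserves the last 64 KiB-aligned 64 KiB of each member, and (b) the usable size of each member is capped at what an unsigned 32-bit count of KiB can hold. All names are hypothetical.

```python
CHUNK_KIB = 128                # "Chunk Size : 128K" from mdadm --detail
MEMBER_KIB = 44921865984       # sdc and sdd in /proc/partitions
GOOD_ARRAY_KIB = 89843731712   # md0 under 2.6.32-33-server
BAD_ARRAY_KIB = 8589934336     # md0 under 2.6.32-40-server

# Assumption (b): largest value of an unsigned 32-bit KiB counter,
# i.e. just under 4 TiB per member.
U32_MAX_KIB = 2**32 - 1


def round_down(value, multiple):
    """Round value down to the nearest multiple."""
    return value - value % multiple


def expected_raid0_kib(member_kib, members, chunk_kib, cap_kib=None):
    """Predicted RAID0 size for equal-sized members, in KiB."""
    # Assumption (a): superblock reservation at the end of each member.
    usable = round_down(member_kib, 64) - 64
    if cap_kib is not None:
        usable = min(usable, cap_kib)
    # RAID0 only uses whole chunks of each member.
    return members * round_down(usable, chunk_kib)


uncapped = expected_raid0_kib(MEMBER_KIB, 2, CHUNK_KIB)
capped = expected_raid0_kib(MEMBER_KIB, 2, CHUNK_KIB, cap_kib=U32_MAX_KIB)
```

Under these assumptions `uncapped` equals the size reported by the older kernel and `capped` equals the size reported by the newer one, which suggests each member is being limited to just under 4 TiB. It does not identify the responsible commit.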

Unfortunately, this is a very busy production machine and I have very limited time to pinpoint the issue further.

Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest Precise kernel? It would be good to know if this bug has been fixed already or not.

Changed in linux (Ubuntu):
importance: Medium → High
Joseph Salisbury (jsalisbury) wrote :

The latest Precise kernel can be downloaded from the following page (under the "Builds" section):
https://launchpad.net/ubuntu/+source/linux/3.2.0-22.35

annunaki2k2 (russell-knighton) wrote :

I have not yet had an opportunity to try a 3.2.x kernel, but I have managed to pinpoint the introduction of the bug to Ubuntu kernel release linux-image-2.6.32-37-server:

>=2.6.32-37 contain the MD truncation bug.
<=2.6.32-36 do not contain the bug and are safe for use.

This does not do much to help solve the bug, but it might at least be useful information for anyone else who stumbles upon this bug report.
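
The range reported above can be expressed as a small check on a kernel release string such as the output of `uname -r`. This is a minimal sketch with hypothetical function names. It encodes only what this report states for the Ubuntu 2.6.32 series: the report gives no upper bound for the affected range, and a `False` result for any other kernel series means "not covered by this report", not "known safe".

```python
import re

# From the report: >=2.6.32-37 truncates, <=2.6.32-36 does not.
FIRST_BAD_ABI = 37


def lucid_abi(release):
    """Return N for a '2.6.32-N' or '2.6.32-N-flavour' string, else None."""
    match = re.fullmatch(r"2\.6\.32-(\d+)(?:-\w+)?", release)
    return int(match.group(1)) if match else None


def in_reported_bad_range(release):
    """True if the release falls in the range this report found affected."""
    abi = lucid_abi(release)
    return abi is not None and abi >= FIRST_BAD_ABI
```

For the running kernel, the string could be obtained with `platform.release()` from the standard library and passed to `in_reported_bad_range`.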

penalvch (penalvch) wrote :

annunaki2k2, the next step is to perform a full commit bisect between Ubuntu kernels 2.6.32-36 and 2.6.32-37 in order to identify the offending commit. Could you please do this by following https://wiki.ubuntu.com/Kernel/KernelBisection ?

description: updated
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
summary: - md truncates devices with kernel 2.6.32-40-server
+ md truncates devices with kernel >=2.6.32-37-server
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired