Software RAID fails to rebuild after testing degraded cold boot

Bug #925280 reported by Jeff Lane on 2012-02-02
This bug affects 8 people

Affects: mdadm (Ubuntu)
Importance: Medium
Assigned to: Dimitri John Ledkov

Bug Description

Attempting the RAID install test with Precise server AMD64.

Hardware config is a 1U server with 2 SATA drives with the following partitions:

sda: 500GB SATA
sda1: 50GB RAID
sda2: 20GB RAID
sda3: 180GB RAID

sdb: 250GB SATA
sdb1: 50GB RAID
sdb2: 20GB RAID
sdb3: 180GB RAID

Using the instructions found here: http://testcases.qa.ubuntu.com/Install/ServerRAID1

I created the three partitions on each physical disk. I then created three RAID devices, md0 through md2, as follows:

md0: 50GB RAID1 using sda1 and sdb1 for /
md1: 20GB RAID1 using sda2 and sdb2 for swap
md2: 180GB RAID1 using sda3 and sdb3 for /home
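For reference, a layout like the one above can be reproduced outside the installer with commands along these lines (a sketch only; the partition names come from this report, and the mkfs/mkswap steps are my assumption about how the filesystems were laid down):

```shell
# Sketch: assumes sda1/sdb1, sda2/sdb2, sda3/sdb3 already exist as
# Linux RAID partitions. Run as root. These commands are destructive.
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sda3 /dev/sdb3

mkfs.ext4 /dev/md0   # /
mkswap    /dev/md1   # swap
mkfs.ext4 /dev/md2   # /home
```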

I then completed the install and rebooted. On the initial boot, I verified that all three RAID devices were present and active:

bladernr@ubuntu:~$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sda1[0] sdb1[1]
      48826296 blocks super 1.2 [2/2] [UU]

md2 : active raid1 sda3[0] sdb3[1]
      175838136 blocks super 1.2 [2/2] [UU]

md1 : active raid1 sda2[0] sdb2[1]
      19529656 blocks super 1.2 [2/2] [UU]

I then powered the machine down per the test case instructions, removed disk 2 (sdb) and powered back up. On reboot, I verified that the array was active and degraded and powered the system back down, again per the test instructions.

I re-inserted drive 2 (sdb) and powered the system up again. After logging in, I rechecked /proc/mdstat, expecting to see both drives in each md device and a resync in progress. Instead, I found that the second drive was missing from md0 and md2, while md1 (the swap LUN) was fine.

bladernr@ubuntu:~$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sda1[0]
      48826296 blocks super 1.2 [2/1] [U_]

md2 : active raid1 sda3[0]
      175838136 blocks super 1.2 [2/1] [U_]

md1 : active raid1 sda2[0] sdb2[1]
      19529656 blocks super 1.2 [2/2] [UU]

The instructions indicated that I might have to manually re-add the missing drives, so I attempted this:

bladernr@ubuntu:~$ sudo mdadm --add /dev/md0 /dev/sdb1
mdadm: /dev/sdb1 reports being an active member for /dev/md0, but a --re-add fails.
mdadm: not performing --add as that would convert /dev/sdb1 in to a spare.
mdadm: To make this a spare, use "mdadm --zero-superblock /dev/sdb1" first.

I also tried using --re-add:

bladernr@ubuntu:~$ sudo mdadm --re-add /dev/md0 /dev/sdb1
mdadm: --re-add for /dev/sdb1 to /dev/md0 is not possible

So here's the output of mdadm --detail:

/dev/md0:
        Version : 1.2
  Creation Time : Wed Feb  1 20:53:34 2012
     Raid Level : raid1
     Array Size : 48826296 (46.56 GiB 50.00 GB)
  Used Dev Size : 48826296 (46.56 GiB 50.00 GB)
   Raid Devices : 2
  Total Devices : 1
    Persistence : Superblock is persistent

    Update Time : Wed Feb  1 23:54:04 2012
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           Name : ubuntu:0  (local to host ubuntu)
           UUID : 118d60db:4ddc5cf2:040c4cb2:bd896eaf
         Events : 118

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       0        0        1      removed

So according to the test instructions, this test is a failure because I can't rebuild the array (nor is it automatically rebuilt).

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: linux-image-3.2.0-12-generic 3.2.0-12.21
ProcVersionSignature: Ubuntu 3.2.0-12.21-generic 3.2.2
Uname: Linux 3.2.0-12-generic x86_64
AlsaDevices:
 total 0
 crw-rw---T 1 root audio 116, 1 Feb 1 23:35 seq
 crw-rw---T 1 root audio 116, 33 Feb 1 23:35 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 1.91-0ubuntu1
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory
Date: Wed Feb 1 23:38:28 2012
HibernationDevice: RESUME=UUID=e573077c-98b5-42e5-9f37-b8efaa2ba74a
InstallationMedia: Ubuntu-Server 12.04 LTS "Precise Pangolin" - Alpha amd64 (20120201.1)
IwConfig:
 lo no wireless extensions.

 eth1 no wireless extensions.

 eth0 no wireless extensions.
MachineType: Supermicro X7DVL
PciMultimedia:

ProcEnviron:
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.2.0-12-generic root=UUID=a84486b9-e72d-4134-82a8-263f91d7d894 ro
RelatedPackageVersions:
 linux-restricted-modules-3.2.0-12-generic N/A
 linux-backports-modules-3.2.0-12-generic N/A
 linux-firmware 1.68
RfKill: Error: [Errno 2] No such file or directory
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 06/23/2008
dmi.bios.vendor: Phoenix Technologies LTD
dmi.bios.version: 2.1
dmi.board.name: X7DVL
dmi.board.vendor: Supermicro
dmi.board.version: PCB Version
dmi.chassis.type: 1
dmi.chassis.vendor: Supermicro
dmi.chassis.version: 0123456789
dmi.modalias: dmi:bvnPhoenixTechnologiesLTD:bvr2.1:bd06/23/2008:svnSupermicro:pnX7DVL:pvr0123456789:rvnSupermicro:rnX7DVL:rvrPCBVersion:cvnSupermicro:ct1:cvr0123456789:
dmi.product.name: X7DVL
dmi.product.version: 0123456789
dmi.sys.vendor: Supermicro

Jeff Lane (bladernr) wrote :

I finally found a solution. I had to zero the superblocks on the missing partitions:

sudo mdadm --zero-superblock /dev/sdb1
sudo mdadm --zero-superblock /dev/sdb3
sudo mdadm --manage /dev/md0 --add /dev/sdb1
sudo mdadm --manage /dev/md2 --add /dev/sdb3

This now shows md0 currently in recovery and md2 as resync=DELAYED.
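The resync progress after this workaround can be watched with something like the following (a sketch; the device name follows this report):

```shell
# Refresh /proc/mdstat every 5 seconds to follow the rebuild
watch -n 5 cat /proc/mdstat

# Or query one array directly; during recovery, --detail reports
# a "Rebuild Status" line with a percentage
sudo mdadm --detail /dev/md0 | grep -E 'State|Rebuild Status'
```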

So either the instructions are incorrect, or there is an issue in how these devices are handled. This test really should not have degraded the array (or disks) to the extent that I had to zero the second disk's superblocks to re-add it.

Also, it's worrisome that this only occurred on the ext4 LUNs and did not affect the swap LUN.

Ubuntu QA Website (ubuntuqa) wrote :

This bug has been reported on the Ubuntu ISO testing tracker.

A list of all reports related to this bug can be found here:
http://iso.qa.ubuntu.com/qatracker/reports/bugs/925280

tags: added: iso-testing
Brad Figg (brad-figg) on 2012-02-02
Changed in linux (Ubuntu):
status: New → Confirmed
Jeff Lane (bladernr) wrote :

So after letting the server sit for a few hours, all missing members of the arrays have been re-synced and are functional again. Perhaps this behaviour IS correct. Most of my RAID experience is with hardware RAID, admittedly, so I'm used to just swapping a disk out and the controller would handle adding/re-syncing automatically. I've not done a lot of degradation testing with software RAID and having to go to these extra lengths may well be standard behaviour.

In any case, if this is truly expected behaviour, then the test case

http://testcases.qa.ubuntu.com/Install/ServerRAID1

should be updated to reflect this (assuming, of course, it is decided that this is not a bug after all).

Joseph Salisbury (jsalisbury) wrote :

@Jeff

It would be great if you could have someone from the server team review the test case. I see hggdh2 edited that page last. Maybe he could provide some feedback.

Changed in linux (Ubuntu):
importance: Undecided → Medium

On 02/02/2012 10:39 AM, Joseph Salisbury wrote:
> @Jeff
>
> It would be great if you could have someone from the server team review
> the test case. I see hggdh2 edited that page last. Maybe he could
> provide some feedback.

Quote hggdh: "Yep, looks like a genuine bug"

--
Jeff Lane - Hardware Certification Engineer and Test Tools Developer
Ubuntu Ham: W4KDH
Freenode IRC: bladernr or bladernr_
gpg: 1024D/3A14B2DD 8C88 B076 0DD7 B404 1417 C466 4ABD 3635 3A14 B2DD

Thank you for taking the time to file a bug report on this issue.

However, given the number of bugs that the Kernel Team receives during any development cycle it is impossible for us to review them all. Therefore, we occasionally resort to using automated bots to request further testing. This is such a request.

We have noted that there is a newer version of the development kernel than the one you last tested when this issue was found. Please test again with the newer kernel and indicate in the bug if this issue still exists or not.

You can update to the latest development kernel by simply running the following commands in a terminal window:

    sudo apt-get update
    sudo apt-get upgrade

If the bug still exists, change the bug status from Incomplete to Confirmed. If the bug no longer exists, change the bug status from Incomplete to Fix Released.

If you want this bot to quit automatically requesting kernel tests, add a tag named: bot-stop-nagging.

 Thank you for your help, we really do appreciate it.

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
tags: added: kernel-request-3.2.0-13.22
Phillip Susi (psusi) wrote :

I wonder if you did not let the drives get fully synced the first time before removing one? Could you try again now that the drives are in sync?

Jeff Lane (bladernr) wrote :

Hrmmm... that is a good point. I "think" that they were fully synced, but I could be mistaken... had a lot going on that day. I had to flatten that machine for other development work, but I'll try to find time next week to rebuild the array and verify that the drives are synced then see if I can recreate this.

Changed in mdadm (Ubuntu):
status: New → Incomplete
importance: Undecided → Medium
Jeff Lane (bladernr) wrote :

OK... so I retried... Here is mdstat after installing and booting, and waiting to ensure all syncing had completed:

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sda2[0] sdb2[1]
      19529656 blocks super 1.2 [2/2] [UU]

md0 : active raid1 sda1[0] sdb1[1]
      48826296 blocks super 1.2 [2/2] [UU]

md2 : active raid1 sda3[0] sdb3[1]
      175779768 blocks super 1.2 [2/2] [UU]

unused devices: <none>

Next, I shut the system down and removed disk 2 (sdb). On reboot, I ran mdstat and noted the degraded arrays with missing members:

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active (auto-read-only) raid1 sda2[0]
      19529656 blocks super 1.2 [2/1] [U_]

md0 : active raid1 sda1[0]
      48826296 blocks super 1.2 [2/1] [U_]

md2 : active raid1 sda3[0]
      175779768 blocks super 1.2 [2/1] [U_]

unused devices: <none>

Then I shut down and re-insert drive 2:

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sda2[0] sdb2[1]
      19529656 blocks super 1.2 [2/2] [UU]

md0 : active raid1 sda1[0]
      48826296 blocks super 1.2 [2/1] [U_]

md2 : active raid1 sda3[0]
      175779768 blocks super 1.2 [2/1] [U_]

unused devices: <none>

Then I try manually adding the disks per the test case:
bladernr@ubuntu:~$ sudo mdadm --add /dev/md0 /dev/sdb1
mdadm: /dev/sdb1 reports being an active member for /dev/md0, but a --re-add fails.
mdadm: not performing --add as that would convert /dev/sdb1 in to a spare.
mdadm: To make this a spare, use "mdadm --zero-superblock /dev/sdb1" first.
bladernr@ubuntu:~$ sudo mdadm --re-add /dev/md0 /dev/sdb1
mdadm: --re-add for /dev/sdb1 to /dev/md0 is not possible.

I got the same for /dev/md2 when trying to re-add /dev/sdb3, so I zeroed the superblocks, which essentially blanks the disk's RAID metadata so it can be added to the array as though it were a brand-new disk.

bladernr@ubuntu:~$ sudo mdadm --zero-superblock /dev/sdb3
bladernr@ubuntu:~$ sudo mdadm --zero-superblock /dev/sdb1
bladernr@ubuntu:~$ sudo mdadm --add /dev/md0 /dev/sdb1
mdadm: added /dev/sdb1
bladernr@ubuntu:~$ sudo mdadm --add /dev/md2 /dev/sdb3
mdadm: added /dev/sdb3

bladernr@ubuntu:~$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sda2[0] sdb2[1]
      19529656 blocks super 1.2 [2/2] [UU]

md0 : active raid1 sdb1[2] sda1[0]
      48826296 blocks super 1.2 [2/1] [U_]
      [==>..................] recovery = 11.9% (5819456/48826296) finish=11.9min speed=59970K/sec

md2 : active raid1 sdb3[2] sda3[0]
      175779768 blocks super 1.2 [2/1] [U_]
       resync=DELAYED

unused devices: <none>

According to the test case, the most I should have to do is plug the disk back in and reboot the server, which should cause mdadm to automatically re-add the disk and start re-syncing; at worst, I should only need the --add (or --re-add) command to add the disk back manually.

What I am actually having to do is essentially destroy the partitions for the ext4 LUNs ...


Changed in mdadm (Ubuntu):
status: Incomplete → Confirmed
Changed in linux (Ubuntu):
status: Incomplete → New
Changed in mdadm (Ubuntu):
status: Confirmed → New
Jeff Lane (bladernr) wrote :

Not sure if I should set this to Confirmed or back to New now that I've replied, so setting back to New.

Thank you for taking the time to file a bug report on this issue.

However, given the number of bugs that the Kernel Team receives during any development cycle it is impossible for us to review them all. Therefore, we occasionally resort to using automated bots to request further testing. This is such a request.

We have noted that there is a newer version of the development kernel than the one you last tested when this issue was found. Please test again with the newer kernel and indicate in the bug if this issue still exists or not.

You can update to the latest development kernel by simply running the following commands in a terminal window:

    sudo apt-get update
    sudo apt-get upgrade

If the bug still exists, change the bug status from Incomplete to Confirmed. If the bug no longer exists, change the bug status from Incomplete to Fix Released.

If you want this bot to quit automatically requesting kernel tests, add a tag named: bot-stop-nagging.

 Thank you for your help, we really do appreciate it.

Changed in linux (Ubuntu):
status: New → Confirmed
status: Confirmed → Incomplete
tags: added: kernel-request-3.2.0-15.24
Phillip Susi (psusi) wrote :

It looks like there have been some changes in mdadm behavior. They don't appear to be documented in the announcements, so they could be bugs.

First, it appears that --re-add now requires a write-intent bitmap. Second, incremental mode refuses to (re-)add the disk once the array has been started degraded. Finally, --add now seems to require --run to reinsert the disk.

If these changes were intentional, then I suppose the test procedure needs to be updated; I'll ask on the mailing list. In that case the test procedure should say that you need to enable the write-intent bitmap and manually --re-add the disk, or use --add --run without a bitmap.
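For what it's worth, the bitmap route described here would look roughly like this (a sketch, not verified against this mdadm version; device names follow the report, and the --add --run combination is taken from the comment above rather than confirmed):

```shell
# Add a write-intent bitmap while the array is healthy. With a bitmap,
# a returning member can usually be --re-added, and only the regions
# dirtied since it left are resynced.
sudo mdadm --grow /dev/md0 --bitmap=internal

# After the disk comes back:
sudo mdadm /dev/md0 --re-add /dev/sdb1

# Without a bitmap, the other path mentioned above (unverified):
sudo mdadm /dev/md0 --add --run /dev/sdb1
```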

Changed in mdadm (Ubuntu):
status: New → Confirmed
Christian Heimes (heimes) wrote :

The bug is still present in 12.04 beta 2 with all current updates. My software RAID arrays neither recovered automatically on boot, nor did mdadm --add work. I had to zero out the superblocks and manually assemble the RAID arrays.

Ubuntu : Ubuntu precise (development branch)
Kernel: 3.2.0-23-generic x86_64
mdadm: 3.2.3-2ubuntu1

iMac (imac-netstatz) wrote :

IMHO, this is the *new* expected behavior. If both RAID members left the array in a good state (i.e. you unplugged one while the system was off), then you need to zero the superblock to get it back into the array.

I suspect your test case would work with a disk that only had the RAID structures on it, not a clean copy of the data. Perhaps do a live pull on the cable (simulating a controller failure) for your test, in an environment where you don't care about the data. In that case, upon restart, I would expect the "dirty" and "old" md disk to be rebuilt automatically.

In one of my use cases, where I use mdadm slightly differently across two computers, this behavior solves a problem where the older disk is sometimes mounted when both md members are clean; in that case the new data is overwritten by the old, which can be a real issue under the old behavior.

The factors that previously let old data overwrite new data are individual disk spin-up times and the availability of disks at boot (especially with remote block devices), which is probably the reason for this 'feature'. My observations are in the duplicate bug below.

Your test case should probably use a real 'spare' rather than an old member in a good state (which should probably not be overwritten by default).

https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/945786

Christian Heimes (heimes) wrote :

I *did* unplug the cable while the system was running and the RAIDs didn't recover automatically.

But I'd better start from the beginning:
I'm in the process of preparing replacement disks and a new OS installation for a production server. The current disks are several years old and reaching the end of their life. The new disks are three SSDs with SATA interfaces. I only need some small disks for the OS, temporary files and caches, because the majority of the data is stored on several fibre channel enclosures.

I've created four equal partitions (not counting the container for logical partitions) on all three SSDs:

* /boot with RAID 1
* / with RAID 5
* swap with RAID 5
* data partition for caches with RAID 5

During my tests I pulled the SATA cable from one of the SSDs to test the behaviour of the RAID sets and the SMTP server. The system noticed the missing disk within seconds and sent out notification emails as expected. Then I tested how the system handles a reboot with a missing disk. GRUB loaded successfully but the system did not (which is an entirely different issue I need to investigate later). I plugged the disk back in; the OS came up fine, but it didn't add the formerly disconnected disk back to the RAID sets.

I tried mdadm --add and mdadm --re-add without success. mdadm --detail clearly showed that the system was aware that the disconnected partitions used to belong to the RAID sets, because they had the same UUID. On Ubuntu 10.04 LTS I never had to zero out the superblock to re-join a disconnected disk.

IMHO a user should expect mdadm to rejoin a formerly disconnected RAID member as soon as possible, without any user interaction.
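One way to see why md refuses the re-add is to compare the event counters and array-state view recorded in each member's superblock (a sketch; device names as in this report):

```shell
# Each member's superblock records an event counter and its own view of
# the array state. A member whose counter lags the others (and no
# write-intent bitmap exists) is treated as stale and cannot be
# --re-added without zeroing its superblock.
sudo mdadm -E /dev/sda1 /dev/sdb1 /dev/sdc1 | grep -E '^/dev|Events|Array State'
```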

Christian Heimes (heimes) wrote :

I did another test. First of all, I shut down and powered off the server properly. Then I unplugged one device and started the computer again. mdadm detected the missing device. After a couple of minutes I re-attached the SATA cable. mdadm made no attempt to integrate the partitions.

Here is the output of mdadm -D for the MD device and mdadm -E for all three partitions. /dev/sdc1 is the partition of the disk that was missing during boot. The 'Array State : AAA' of /dev/sdc1 looks suspicious to me.

# mdadm -D /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Tue Apr 17 15:09:30 2012
     Raid Level : raid1
     Array Size : 487412 (476.07 MiB 499.11 MB)
  Used Dev Size : 487412 (476.07 MiB 499.11 MB)
   Raid Devices : 3
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Mon Apr 23 14:11:48 2012
          State : clean, degraded
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           Name : vlserver:0 (local to host vlserver)
           UUID : b3b367ee:f2f4f4c0:c22cdc7b:4ac1ea2c
         Events : 89

    Number   Major   Minor   RaidDevice State
       3       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       2       0        0        2      removed

# mdadm -E /dev/sda1
/dev/sda1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : b3b367ee:f2f4f4c0:c22cdc7b:4ac1ea2c
           Name : vlserver:0 (local to host vlserver)
  Creation Time : Tue Apr 17 15:09:30 2012
     Raid Level : raid1
   Raid Devices : 3

 Avail Dev Size : 974824 (476.07 MiB 499.11 MB)
     Array Size : 974824 (476.07 MiB 499.11 MB)
    Data Offset : 24 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : caee6e67:ea75cde1:bd3ec883:f6c4f709

    Update Time : Mon Apr 23 14:11:48 2012
       Checksum : 4ea9d4d9 - correct
         Events : 89

   Device Role : Active device 0
   Array State : AA. ('A' == active, '.' == missing)

# mdadm -E /dev/sdb1
/dev/sdb1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : b3b367ee:f2f4f4c0:c22cdc7b:4ac1ea2c
           Name : vlserver:0 (local to host vlserver)
  Creation Time : Tue Apr 17 15:09:30 2012
     Raid Level : raid1
   Raid Devices : 3

 Avail Dev Size : 974824 (476.07 MiB 499.11 MB)
     Array Size : 974824 (476.07 MiB 499.11 MB)
    Data Offset : 24 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 8ee52fbe:6f8dae0c:ffc93d08:bfcd4870

    Update Time : Mon Apr 23 14:11:48 2012
       Checksum : bb12772a - correct
         Events : 89

   Device Role : Active device 1
   Array State : AA. ('A' == active, '.' == missing)
# mdadm -E /dev/sdc1
/dev/sdc1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : b3b367ee:f2f4f4c0:c22cdc7b:4ac1ea2c
           Name : vlserver:0 (local to host vlserver)
  Creation Time : Tue Apr 17 15:09:30 2012
     Raid Level : raid1
   Raid Devices : 3

 Avail Dev Size : 974824 (476.07 MiB 499.11 MB)
     Array Size : 974824 (476.07 MiB 499.11 MB)
    Data Offset : 24 sectors
   Super Offset ...


Christian Heimes (heimes) wrote :

Before reboot:

# mdadm /dev/md0 --add /dev/sdc1
mdadm: Cannot open /dev/sdc1: Device or resource busy

After reboot (partition order has changed, /dev/sdc is now /dev/sda)

# mdadm /dev/md0 --add /dev/sda1
mdadm: /dev/sda1 reports being an active member for /dev/md0, but a --re-add fails.
mdadm: not performing --add as that would convert /dev/sda1 in to a spare.
mdadm: To make this a spare, use "mdadm --zero-superblock /dev/sda1" first.

no longer affects: linux (Ubuntu)
Changed in mdadm (Ubuntu):
assignee: nobody → Dmitrijs Ledkovs (dmitrij.ledkov)
Dimitri John Ledkov (xnox) wrote :

Instead of unplugging the device, can you do this:
1) boot, fully synced
2) # mdadm --fail /dev/md0 /dev/sda1
3) # reboot
4) # mdadm --detail /dev/md0
5) # mdadm --add /dev/md0 /dev/sda1
6) # mdadm --detail /dev/md0

With unplugging the device:
1) boot, fully synced
2) shutdown
3) unplug the harddrive
4) boot degraded
5) # mdadm --fail /dev/md0 detached
6) shutdown
7) attach the harddrive back
8) # mdadm --detail /dev/md0
9) # mdadm --add /dev/md0 /dev/sda1

Christian Heimes (heimes) wrote :

I'm sorry but I can no longer test the RAID array on the server. The machine was delivered and went into production over two months ago.
