mdadm device hung, problem with kernel/EBS

Bug #909563 reported by David Taylor
This bug affects 1 person
Affects: mdadm (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone:

Bug Description

I'm using Ubuntu 10.10, ami-af7e2eea in us-west-1 on c1.xlarge.

I have 8 x 128GB EBS volumes in a RAID10 array using mdadm.

After a while, /dev/md0 freezes and the load average climbs from below 5 to 300-400 or higher.

Any attempts to interrogate the mounted filesystem hang and are uninterruptible.
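On Linux the load average counts tasks in uninterruptible sleep (state D) as well as runnable ones, so a pile-up of processes blocked on the array would be consistent with the load figures above. A minimal sketch for listing such tasks by reading /proc directly follows; the function names are illustrative and not part of any existing tool.

```python
import os

def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lparen = line.index("(")
    rparen = line.rindex(")")
    pid = int(line[:lparen])
    comm = line[lparen + 1:rparen]
    state = line[rparen + 1:].split()[0]
    return pid, comm, state

def uninterruptible_tasks(proc_root="/proc"):
    """Return [(pid, comm)] for every process currently in state D."""
    tasks = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or the line was malformed
        if state == "D":
            tasks.append((pid, comm))
    return tasks
```

Calling uninterruptible_tasks() on the affected instance and printing the result would show which processes are stuck. This only inspects main process entries, not individual threads.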

I ran "mdadm --examine" on each of the member devices. All except one returned "state: clean". On the remaining device the command never returned, and I had to interrupt it with Ctrl-C.
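The per-device check described above can be scripted so that a single unresponsive device does not stall the whole pass. This is a sketch under stated assumptions: the 30-second timeout is arbitrary, the device paths are placeholders (the report does not name them), and the `run` parameter exists only so the function can be exercised without mdadm installed. It records the exit status only and does not parse mdadm's output.

```python
import subprocess

def examine_all(devices, timeout=30, run=subprocess.run):
    """Run `mdadm --examine` on each device and return {device: status}."""
    results = {}
    for dev in devices:
        try:
            proc = run(
                ["mdadm", "--examine", dev],
                capture_output=True,  # keep mdadm's output off the terminal
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            # The command did not finish in time; flag the device and move on.
            results[dev] = "timed out"
            continue
        results[dev] = "ok" if proc.returncode == 0 else "exit %d" % proc.returncode
    return results
```

Usage would look like examine_all(["/dev/sdX1", "/dev/sdX2"]) with the real member devices substituted. One caveat: when the timeout fires, subprocess.run kills the child and waits for it, so if the child is stuck in uninterruptible sleep the wait itself may block. In this report Ctrl-C was enough to interrupt the command, so the timeout would be expected to work here.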

In /var/log/syslog:

Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230054] INFO: task md0_raid10:625 blocked for more than 120 seconds.
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230072] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230082] md0_raid10 D ffff880003f579c0 0 625 2 0x00000000
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230089] ffff8801b7ea1ca0 0000000000000246 0000000000000000 00000000000159c0
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230097] ffff8801b7ea1fd8 00000000000159c0 ffff8801b7ea1fd8 ffff8801b61616e0
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230105] 00000000000159c0 00000000000159c0 ffff8801b7ea1fd8 00000000000159c0
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230113] Call Trace:
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230126] [<ffffffff814643e1>] md_super_wait+0xd1/0xf0
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230133] [<ffffffff8107fa10>] ? autoremove_wake_function+0x0/0x40
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230138] [<ffffffff814649b8>] md_update_sb+0x268/0x3e0
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230144] [<ffffffff815a6dce>] ? _raw_spin_unlock_irqrestore+0x1e/0x30
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230155] [<ffffffff8146a1a2>] md_check_recovery+0x212/0x540
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230163] [<ffffffffa0061fbf>] raid10d+0x3f/0x400 [raid10]
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230170] [<ffffffff810072df>] ? xen_restore_fl_direct_end+0x0/0x1
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230173] [<ffffffff815a6dce>] ? _raw_spin_unlock_irqrestore+0x1e/0x30
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230177] [<ffffffff81464109>] md_thread+0x119/0x150
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230181] [<ffffffff8107fa10>] ? autoremove_wake_function+0x0/0x40
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230185] [<ffffffff81463ff0>] ? md_thread+0x0/0x150
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230189] [<ffffffff8107f4b6>] kthread+0x96/0xa0
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230194] [<ffffffff8100aee4>] kernel_thread_helper+0x4/0x10
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230199] [<ffffffff8100a313>] ? int_ret_from_sys_call+0x7/0x1b
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230203] [<ffffffff815a735d>] ? retint_restore_args+0x5/0x6
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230207] [<ffffffff8100aee0>] ? kernel_thread_helper+0x0/0x10
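To make traces like the one above easier to compare across incidents, the hung-task header and the call chain can be extracted with two regular expressions. This is a rough sketch written against the log format shown here, not a general kernel log parser. Frames prefixed with "?" are addresses the kernel found on the stack but could not confirm as part of the call chain, so they are skipped.

```python
import re

# Matches e.g. "INFO: task md0_raid10:625 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
# Matches e.g. "[<ffffffff814643e1>] md_super_wait+0xd1/0xf0"
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\] (?P<unreliable>\? )?(?P<func>[^\s+]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

def parse_hung_task(lines):
    """Return the first hung-task report found in `lines`, or None."""
    report = None
    for line in lines:
        header = HUNG_RE.search(line)
        if header:
            if report is not None:
                break  # a second report starts here; stop at the first one
            report = {
                "comm": header.group("comm"),
                "pid": int(header.group("pid")),
                "secs": int(header.group("secs")),
                "frames": [],
            }
            continue
        frame = FRAME_RE.search(line)
        if frame and report is not None and not frame.group("unreliable"):
            report["frames"].append(frame.group("func"))
    return report
```

Applied to the trace above, the reliable frames begin with md_super_wait, called from md_update_sb, md_check_recovery and raid10d. That suggests the raid10 thread is waiting for a superblock write to a member device to complete, which would fit with one device not responding.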

Is this a problem with the AMI? The AKI? Any suggestions?

Thanks.

Cheers,
David.

ProblemType: Bug
DistroRelease: Ubuntu 10.10
Package: linux-image-2.6.35-31-virtual 2.6.35-31.63
Regression: Yes
Reproducible: Yes
ProcVersionSignature: Ubuntu 2.6.35-31.63-virtual 2.6.35.13
Uname: Linux 2.6.35-31-virtual x86_64
AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access /dev/snd/: No such file or directory
AplayDevices: Error: [Errno 2] No such file or directory
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
Date: Thu Dec 29 03:20:28 2011
Ec2AMI: ami-af7e2eea
Ec2AMIManifest: (unknown)
Ec2AvailabilityZone: us-west-1a
Ec2InstanceType: c1.xlarge
Ec2Kernel: aki-9ba0f1de
Ec2Ramdisk: unavailable
Lspci:

Lsusb: Error: command ['lsusb'] failed with exit code 1:
ProcCmdLine: root=LABEL=uec-rootfs ro console=hvc0
ProcEnviron:
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: linux
