mdadm raid gets frozen

Bug #1607239 reported by akos.kuczi
This bug affects 1 person
Affects         Status   Importance  Assigned to  Milestone
linux (Ubuntu)  Expired  Medium      Unassigned

Bug Description

Servers periodically become unresponsive and report the following problem:

Jul 28 01:26:36 osd1-05 kernel: [1644406.270794] kworker/12:2 D ffff88085fcd3180 0 25000 2 0x00000000
Jul 28 01:26:36 osd1-05 kernel: [1644406.270813] Workqueue: dio/bdev dio_aio_complete_work
Jul 28 01:26:36 osd1-05 kernel: [1644406.270821] ffff88003694dc10 0000000000000046 ffff8805621b4800 0000000000013180
Jul 28 01:26:36 osd1-05 kernel: [1644406.270830] ffff88003694dfd8 0000000000013180 ffff8805621b4800 ffff88003694dd38
Jul 28 01:26:36 osd1-05 kernel: [1644406.270836] ffff88003694dd40 7fffffffffffffff ffff8805621b4800 00000000a6fd7000
Jul 28 01:26:36 osd1-05 kernel: [1644406.270842] Call Trace:
Jul 28 01:26:36 osd1-05 kernel: [1644406.270857] [<ffffffff8172dce9>] schedule+0x29/0x70
Jul 28 01:26:36 osd1-05 kernel: [1644406.270870] [<ffffffff8172cf39>] schedule_timeout+0x279/0x310
Jul 28 01:26:36 osd1-05 kernel: [1644406.270880] [<ffffffff815ae3f5>] ? md_make_request+0xd5/0x220
Jul 28 01:26:36 osd1-05 kernel: [1644406.270889] [<ffffffff8172e7f6>] wait_for_completion+0xa6/0x150
Jul 28 01:26:36 osd1-05 kernel: [1644406.270900] [<ffffffff8109d2a0>] ? wake_up_state+0x20/0x20
Jul 28 01:26:36 osd1-05 kernel: [1644406.270906] [<ffffffff811f7f7e>] submit_bio_wait+0x5e/0x70
Jul 28 01:26:36 osd1-05 kernel: [1644406.270914] [<ffffffff81343c3a>] blkdev_issue_flush+0x5a/0x90
Jul 28 01:26:36 osd1-05 kernel: [1644406.270919] [<ffffffff811fa7b5>] blkdev_fsync+0x35/0x50
Jul 28 01:26:36 osd1-05 kernel: [1644406.270928] [<ffffffff811f16d8>] generic_write_sync+0x48/0x60
Jul 28 01:26:36 osd1-05 kernel: [1644406.270934] [<ffffffff811fcc73>] dio_complete+0x103/0x120
Jul 28 01:26:36 osd1-05 kernel: [1644406.270939] [<ffffffff811fcde1>] dio_aio_complete_work+0x21/0x30
Jul 28 01:26:36 osd1-05 kernel: [1644406.270948] [<ffffffff81086028>] process_one_work+0x178/0x470
Jul 28 01:26:36 osd1-05 kernel: [1644406.270953] [<ffffffff81086807>] ? manage_workers.isra.25+0x1f7/0x2e0
Jul 28 01:26:36 osd1-05 kernel: [1644406.270959] [<ffffffff81086e41>] worker_thread+0x121/0x410
Jul 28 01:26:36 osd1-05 kernel: [1644406.270964] [<ffffffff81086d20>] ? rescuer_thread+0x430/0x430
Jul 28 01:26:36 osd1-05 kernel: [1644406.270971] [<ffffffff8108dc29>] kthread+0xc9/0xe0
Jul 28 01:26:36 osd1-05 kernel: [1644406.270976] [<ffffffff8108db60>] ? kthread_create_on_node+0x1c0/0x1c0
Jul 28 01:26:36 osd1-05 kernel: [1644406.270984] [<ffffffff8173a328>] ret_from_fork+0x58/0x90
Jul 28 01:26:36 osd1-05 kernel: [1644406.270988] [<ffffffff8108db60>] ? kthread_create_on_node+0x1c0/0x1c0
Jul 28 01:28:36 osd1-05 kernel: [1644526.317504] INFO: task kworker/12:2:25000 blocked for more than 120 seconds.
Jul 28 01:28:36 osd1-05 kernel: [1644526.326399] Not tainted 3.13.0-87-generic #133-Ubuntu
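
For anyone triaging similar logs, the trace above can be summarised mechanically. The following is a minimal sketch, assuming syslog lines in the same format as this report; the function names `hung_tasks` and `reliable_frames` are hypothetical helpers, not part of any existing tool. It extracts the hung-task warning and lists only the stack frames the kernel did not mark with `?` (frames marked `?` are addresses found on the stack that the unwinder could not confirm).

```python
import re

# Matches the kernel hung-task warning, for example:
# "INFO: task kworker/12:2:25000 blocked for more than 120 seconds."
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)

# Matches one stack frame, for example:
# "[<ffffffff8172dce9>] schedule+0x29/0x70"
# An optional "? " prefix marks a frame the unwinder is unsure about.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\] (?P<unsure>\? )?(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def hung_tasks(lines):
    """Return (task name, pid, seconds) for each hung-task warning."""
    found = []
    for line in lines:
        m = HUNG_TASK_RE.search(line)
        if m:
            found.append((m.group("comm"), int(m.group("pid")), int(m.group("secs"))))
    return found


def reliable_frames(lines):
    """Return function names of frames not marked with '?'."""
    funcs = []
    for line in lines:
        m = FRAME_RE.search(line)
        if m and not m.group("unsure"):
            funcs.append(m.group("func"))
    return funcs
```

Run against the log in this report, the unmarked frames show the worker waiting in `blkdev_issue_flush` via `submit_bio_wait`, which is where the task is blocked.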

The current RAID configuration is:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sdb1[1] sda1[0]
      937123648 blocks super 1.2 [2/2] [UU]

md1 : active raid1 sdb2[1] sda2[0]
      39473024 blocks super 1.2 [2/2] [UU]

md60 : active raid6 sdw[0] sdz[3] sdaa[4] sdx[1] sdy[2]
      585689088 blocks super 1.2 level 6, 512k chunk, algorithm 2 [5/5] [UUUUU]

unused devices: <none>

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: linux-image-3.13.0-87-generic 3.13.0-87.133
ProcVersionSignature: Ubuntu 3.13.0-87.133-generic 3.13.11-ckt39
Uname: Linux 3.13.0-87-generic x86_64
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Jul 28 06:17 seq
 crw-rw---- 1 root audio 116, 33 Jul 28 06:17 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.14.1-0ubuntu3.21
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory: 'iw'
CurrentDmesg: [ 29.441344] init: plymouth-upstart-bridge main process ended, respawning
Date: Thu Jul 28 07:33:28 2016
HibernationDevice: RESUME=UUID=a049dacc-3f5b-4282-b507-24d3a7382389
Lsusb:
 Bus 002 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
 Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 001 Device 003: ID 0557:2221 ATEN International Co., Ltd Winbond Hermon
 Bus 001 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: Supermicro SSG-6047R-E1R36L
PciMultimedia:

ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 VESA VGA
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.13.0-87-generic root=UUID=7052bcbe-dccf-47e4-a7f7-a95a09176541 ro quiet nomdmonddf nomdmonisw crashkernel=384M-:512M
RelatedPackageVersions:
 linux-restricted-modules-3.13.0-87-generic N/A
 linux-backports-modules-3.13.0-87-generic N/A
 linux-firmware 1.127.22
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
WifiSyslog:

dmi.bios.date: 12/05/2013
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 3.0a
dmi.board.asset.tag: To be filled by O.E.M.
dmi.board.name: X9DRD-7LN4F
dmi.board.vendor: Supermicro
dmi.board.version: REV:1.02
dmi.chassis.asset.tag: To Be Filled By O.E.M.
dmi.chassis.type: 1
dmi.chassis.vendor: Supermicro
dmi.chassis.version: 0123456789
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr3.0a:bd12/05/2013:svnSupermicro:pnSSG-6047R-E1R36L:pvr0123456789:rvnSupermicro:rnX9DRD-7LN4F:rvrREV1.02:cvnSupermicro:ct1:cvr0123456789:
dmi.product.name: SSG-6047R-E1R36L
dmi.product.version: 0123456789
dmi.sys.vendor: Supermicro

akos.kuczi (kuczi-akos) wrote :
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.7 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.7

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
akos.kuczi (kuczi-akos) wrote : Re: [Bug 1607239] Re: mdadm raid gets frozen

Hi Joseph,
This problem has existed from the beginning. The first kernel installed
on the affected hosts was 3.13.0-57-generic.
I will try the v4.7 kernel.

Thanks in advance.


Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired