mdadm SEGFAULT

Bug #108553 reported by tyler
This bug affects 3 people

Affects: mdadm (Ubuntu)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Nominated for Jaunty by Geoffrey Pursell

Bug Description

Binary package hint: mdadm

I noticed this line in dmesg:
[ 28.138245] mdadm[2586]: segfault at 0000000000000004 rip 000000000041567c rsp 00007fff8184b9e0 error 4

Segfaults never sound OK, all the more so because I'm using a software RAID 5.
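
For readers decoding the kernel message: "segfault at 0000000000000004 ... error 4" is a user-mode read fault at address 0x4, which usually means mdadm dereferenced a member at offset 4 of a NULL struct pointer. The faulting instruction pointer can be mapped back to a source location roughly like this (a sketch only; it assumes the stock non-relocated /sbin/mdadm binary and that matching debug symbols are installed):

# Use the ip/rip value from the dmesg line (here 0x41567c); mdadm is a
# non-PIE binary loaded at 0x400000, so the address maps directly:
$ addr2line -C -f -e /sbin/mdadm 0x41567c

# Or ask gdb for the nearest symbol (works even without full debug info):
$ gdb -q /sbin/mdadm -ex 'info symbol 0x41567c' -ex quit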

tyler (durdon-tyler) wrote :

pl4nkton (pl4nkton) wrote :

Hi,
I have nearly the same hardware and software setup and also get this segfault:

[ 27.274021] mdadm[2560]: segfault at 0000000000000004 rip 000000000041724c rsp 00007fffc35d0730 error 4

Erik de Castro Lopo (erikd) wrote :

I'm using raid1 (three separate volumes) and I get this:

[ 6.833270] mdadm[2473]: segfault at 0000000000000004 rip 000000000041724c rsp 00007fffc4787940 error 4

Top part of my dmesg output:

[ 0.000000] Linux version 2.6.19-4-generic-amd64 (root@crested) (gcc version 4.1.2 (Ubuntu 4.1.2-0ubuntu4)) #2 SMP Thu Apr 5 05:57:13 UTC 2007
[ 0.000000] Command line: root=/dev/sda1 ro console=tty0
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] Xen: 0000000000000000 - 0000000075fc2000 (usable)
[ 0.000000] Entering add_active_range(0, 0, 483266) 0 entries of 256 used
[ 0.000000] end_pfn_map = 483266
[ 0.000000] Entering add_active_range(0, 0, 483266) 0 entries of 256 used

Simon Wong (wongy) wrote :

mdadm segfaults for me using software RAID1 with this error:

 mdadm[2888]: segfault at 0000000000000004 rip 000000000041724c rsp 00007fff2e59e6a0 error 4

All seems fine in operation though...

tyler (durdon-tyler) wrote :

Hi there!

The bug is still here in the current Hardy development version.
(This kernel is really buggy for me, though: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/204064)

I've attached my dmesg.

Martin Pool (mbp) wrote :

I'm seeing this in current hardy too:

Jun 30 08:56:54 grace kernel: [ 135.750773] ACPI: PCI Interrupt 0000:00:1d.7[A] -> GSI 23 (level, low) -> IRQ 19
Jun 30 08:56:54 grace kernel: [ 135.750787] PCI: Setting latency timer of device 0000:00:1d.7 to 64
Jun 30 08:56:54 grace kernel: [ 135.750791] ehci_hcd 0000:00:1d.7: EHCI Host Controller
Jun 30 08:56:54 grace kernel: [ 135.750815] ehci_hcd 0000:00:1d.7: new USB bus registered, assigned bus number 7
Jun 30 08:56:54 grace kernel: [ 135.754710] PCI: cache line size of 32 is not supported by device 0000:00:1d.7
Jun 30 08:56:54 grace kernel: [ 135.754715] ehci_hcd 0000:00:1d.7: irq 19, io mem 0xf9105000
Jun 30 08:56:54 grace kernel: [ 135.799734] md: md0 stopped.
Jun 30 08:56:54 grace kernel: [ 135.843628] mdadm[2700]: segfault at 00000004 eip 08061751 esp bfe0b9c0 error 4
Jun 30 08:56:54 grace kernel: [ 135.853747] md: md0 stopped.
Jun 30 08:56:54 grace kernel: [ 135.867480] usb 5-1: new full speed USB device using uhci_hcd and address 2
Jun 30 08:56:54 grace kernel: [ 135.878515] ehci_hcd 0000:00:1d.7: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
Jun 30 08:56:54 grace kernel: [ 135.878688] usb usb7: configuration #1 chosen from 1 choice

albert (alarme94) wrote :

Hi folks,

I have the same problem on 3 boxes, 2 with amd64 kernels and 1 with a 32-bit kernel.
Every time I boot, I get:

[ 26.898212] mdadm[2911]: segfault at 4 rip 41751c rsp 7fffee98ba50 error 4

Maybe it's not related, but about 1 time in 30 the system drops to BusyBox on startup, saying "md1 device busy" and "can't mount /root"
(my / is on the md1 RAID 0 device).
After rebooting everything is fine, but I can't rely on it for wake-on-LAN and remote access.
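
If the BusyBox prompt comes up again, it may be possible to continue the boot without a power cycle by re-assembling the array by hand; a rough sketch, using the md1 device named in this report:

# from the (initramfs) BusyBox shell:
mdadm --stop /dev/md1        # release the half-started, "busy" array
mdadm --assemble --scan      # re-assemble the arrays listed in mdadm.conf
cat /proc/mdstat             # confirm md1 is active before continuing
exit                         # resume the normal boot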

Daniel T Chen (crimsun) wrote :

Is this symptom still reproducible in 8.10 beta?

Changed in mdadm:
status: New → Incomplete
Erik de Castro Lopo (erikd) wrote : Re: [Bug 108553] Re: mdadm SEGFAULT

Daniel T Chen wrote:

> Is this symptom still reproducible in 8.10 beta?

On the machine where I did see this I am now running 8.04, still
running mdadm, but haven't seen this issue for a while.

Erik
--
-----------------------------------------------------------------
Erik de Castro Lopo
-----------------------------------------------------------------
"Don't hate the media. Become the media."
- Jello Biafra

Simon IJskes (sim-nyx) wrote :

I have this problem on 8.10 amd64.

mdadm[2345]: segfault at 4 ip 0000000000418d7d sp 00007fff643653e0 error 4 in mdadm[400000+2a000]

Most of the time I didn't notice until my system refused to mount anything during boot.
I could not resolve the problem by stopping and starting the RAID arrays in the initramfs shell.

Michael Olson (rosciol+launchpad) wrote :

I have this problem on 8.10 i386.

If you want context, read below; otherwise, this is the line in question:
Dec 10 22:04:23 daedalus kernel: [ 12.175868] mdadm[2342]: segfault at 4 ip 08062dbc sp bf8efc50 error 4 in mdadm[8048000+2c000]

Dec 10 22:04:23 daedalus kernel: [ 11.929983] Driver 'sd' needs updating - please use bus_type methods
Dec 10 22:04:23 daedalus kernel: [ 11.930064] sd 2:0:0:0: [sda] 488281250 512-byte hardware sectors (250000 MB)
Dec 10 22:04:23 daedalus kernel: [ 11.932240] sd 2:0:0:0: [sda] Write Protect is off
Dec 10 22:04:23 daedalus kernel: [ 11.932318] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Dec 10 22:04:23 daedalus kernel: [ 11.932383] sd 2:0:0:0: [sda] 488281250 512-byte hardware sectors (250000 MB)
Dec 10 22:04:23 daedalus kernel: [ 11.932399] sd 2:0:0:0: [sda] Write Protect is off
Dec 10 22:04:23 daedalus kernel: [ 11.932428] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Dec 10 22:04:23 daedalus kernel: [ 11.932432] sda: sda1 sda2
Dec 10 22:04:23 daedalus kernel: [ 11.945560] sd 2:0:0:0: [sda] Attached SCSI disk
Dec 10 22:04:23 daedalus kernel: [ 11.945622] sd 3:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
Dec 10 22:04:23 daedalus kernel: [ 11.945640] sd 3:0:0:0: [sdb] Write Protect is off
Dec 10 22:04:23 daedalus kernel: [ 11.945674] sd 3:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Dec 10 22:04:23 daedalus kernel: [ 11.945739] sd 3:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
Dec 10 22:04:23 daedalus kernel: [ 11.945757] sd 3:0:0:0: [sdb] Write Protect is off
Dec 10 22:04:23 daedalus kernel: [ 11.945790] sd 3:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Dec 10 22:04:23 daedalus kernel: [ 11.945794] sdb: sdb1 sdb2
Dec 10 22:04:23 daedalus kernel: [ 11.970706] sd 3:0:0:0: [sdb] Attached SCSI disk
Dec 10 22:04:23 daedalus kernel: [ 11.970789] sd 4:0:0:0: [sdc] 976773168 512-byte hardware sectors (500108 MB)
Dec 10 22:04:23 daedalus kernel: [ 11.970820] sd 4:0:0:0: [sdc] Write Protect is off
Dec 10 22:04:23 daedalus kernel: [ 11.970853] sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Dec 10 22:04:23 daedalus kernel: [ 11.970907] sd 4:0:0:0: [sdc] 976773168 512-byte hardware sectors (500108 MB)
Dec 10 22:04:23 daedalus kernel: [ 11.970925] sd 4:0:0:0: [sdc] Write Protect is off
Dec 10 22:04:23 daedalus kernel: [ 11.970958] sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Dec 10 22:04:23 daedalus kernel: [ 11.970961] sdc: sdc1 sdc2
Dec 10 22:04:23 daedalus kernel: [ 12.030044] sd 4:0:0:0: [sdc] Attached SCSI disk
Dec 10 22:04:23 daedalus kernel: [ 12.145440] md: md0 stopped.
Dec 10 22:04:23 daedalus kernel: [ 12.175868] mdadm[2342]: segfault at 4 ip 08062dbc sp bf8efc50 error 4 in mdadm[8048000+2c000]
Dec 10 22:04:23 daedalus kernel: [ 12.189552] md: md0 stopped.
Dec 10 22:04:23 daedalus kernel: [ 12.193400] md: bind<sdc2>
Dec 10 22:04:23 daedalus kernel: [ 12.193562] md: bind<sdb...


LCID Fire (lcid-fire) wrote :

Same here on 8.10.
I have 2 RAID arrays running: the root filesystem as RAID 0 and a data mount as RAID 1.
The RAID 0 comes up fine; only the RAID 1 screws up.
Strangely, starting it manually works like a charm!?
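
When manual assembly works but boot-time assembly does not, one thing worth comparing is whether /etc/mdadm/mdadm.conf (and the copy baked into the initramfs) still matches the arrays' current UUIDs; a rough check, assuming the standard Ubuntu file locations:

$ sudo mdadm --detail --scan           # ARRAY lines for what is actually running
$ grep ^ARRAY /etc/mdadm/mdadm.conf    # ARRAY lines used during boot
$ sudo update-initramfs -u             # regenerate the initramfs if the file was changed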

Changed in mdadm:
status: Incomplete → Confirmed
LCID Fire (lcid-fire) wrote :

I just noticed that the segfault is not related to whether the RAID 1 comes up or not. I have it configured correctly and the segfault still shows up (although it does not seem to have any consequences!?)

Mark Carey (careym) wrote :

For what it is worth, I'll add a "me too".

I'm using mdadm RAID 1 (mirroring) across a pair of Samsung drives on separate IDE channels (one on the primary, one on the secondary); mdadm was configured under 8.04.

While it's not nice to paste directly, it's much easier than having to open attachments, so here is a snip of the relevant dmesg:

[ 21.715338] hub 1-0:1.0: 2 ports detected
[ 21.752048] SCSI subsystem initialized
[ 21.802773] 8139cp: 10/100 PCI Ethernet driver v1.3 (Mar 22, 2004)
[ 21.811071] libata version 3.00 loaded.
[ 21.821253] ACPI: PCI Interrupt 0000:00:10.1[B] -> Link [ALKB] -> GSI 21 (level, low) -> IRQ 18
[ 21.821275] uhci_hcd 0000:00:10.1: UHCI Host Controller
[ 21.821307] uhci_hcd 0000:00:10.1: new USB bus registered, assigned bus number 2
[ 21.821334] uhci_hcd 0000:00:10.1: irq 18, io base 0x0000dc00
[ 21.821486] usb usb2: configuration #1 chosen from 1 choice
[ 21.821515] hub 2-0:1.0: USB hub found
[ 21.821524] hub 2-0:1.0: 2 ports detected
[ 21.931187] ACPI: PCI Interrupt 0000:00:10.2[C] -> Link [ALKB] -> GSI 21 (level, low) -> IRQ 18
[ 21.931207] uhci_hcd 0000:00:10.2: UHCI Host Controller
[ 21.931238] uhci_hcd 0000:00:10.2: new USB bus registered, assigned bus number 3
[ 21.931265] uhci_hcd 0000:00:10.2: irq 18, io base 0x0000e000
[ 21.931415] usb usb3: configuration #1 chosen from 1 choice
[ 21.931445] hub 3-0:1.0: USB hub found
[ 21.931453] hub 3-0:1.0: 2 ports detected
[ 22.041357] ACPI: PCI Interrupt 0000:00:10.3[D] -> Link [ALKB] -> GSI 21 (level, low) -> IRQ 18
[ 22.041380] ehci_hcd 0000:00:10.3: EHCI Host Controller
[ 22.041422] ehci_hcd 0000:00:10.3: new USB bus registered, assigned bus number 4
[ 22.041477] ehci_hcd 0000:00:10.3: irq 18, io mem 0xdd002000
[ 22.060888] ehci_hcd 0000:00:10.3: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
[ 22.061047] usb usb4: configuration #1 chosen from 1 choice
[ 22.061082] hub 4-0:1.0: USB hub found
[ 22.061092] hub 4-0:1.0: 6 ports detected
[ 22.171198] 8139cp 0000:00:13.0: This (id 10ec:8139 rev 10) is not an 8139C+ compatible chip
[ 22.171260] 8139cp 0000:00:13.0: Try the "8139too" driver instead.
[ 22.175717] 8139too Fast Ethernet driver 0.9.28
[ 22.175810] ACPI: PCI Interrupt 0000:00:13.0[A] -> GSI 18 (level, low) -> IRQ 17
[ 22.176420] eth2: RealTek RTL8139 at 0xec00, 00:20:ed:50:95:04, IRQ 17
[ 22.176423] eth2: Identified 8139 chip type 'RTL-8100B/8139D'
[ 22.182270] ACPI: PCI Interrupt Link [ALKA] BIOS reported IRQ 0, using IRQ 20
[ 22.182277] ACPI: PCI Interrupt Link [ALKA] enabled at IRQ 20
[ 22.182288] ACPI: PCI Interrupt 0000:00:11.1[A] -> Link [ALKA] -> GSI 20 (level, low) -> IRQ 19
[ 22.182377] ACPI: PCI interrupt for device 0000:00:11.1 disabled
[ 22.187173] pata_via 0000:00:11.1: version 0.3.3
[ 22.187213] ACPI: PCI Interrupt 0000:00:11.1[A] -> Link [ALKA] -> GSI 20 (level, low) -> IRQ 19
[ 22.188705] scsi0 : pata_via
[ 22.190041] scsi1 : pata_via
[ 22.192508] ata1: PATA max UDMA/133 cmd 0x1f0 ctl 0x3f6 bmdma 0xe400 irq 14
[ 22.192512] ata2: PATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0xe408 irq 15
[ 22.391290] ata1.00: ATA-4: QUANTUM FIREBALL CX13.0A, A3F.0B00, max UDMA...

Michael Kofler (michael-kofler) wrote :

I also get mdadm segfaults during boot, but my RAID 1 array (2 partitions) has worked just fine for about a week now; /proc/mdstat shows no errors.

Setup: Hardy with all updates as of Dec. 19th
  / is a normal partition
  /data is /dev/md0 (RAID 1)

> dmesg | grep md
[ 19.663740] md: linear personality registered for level -1
[ 19.666716] md: multipath personality registered for level -4
[ 19.669555] md: raid0 personality registered for level 0
[ 19.673083] md: raid1 personality registered for level 1
[ 20.374436] md: raid6 personality registered for level 6
[ 20.374438] md: raid5 personality registered for level 5
[ 20.374439] md: raid4 personality registered for level 4
[ 20.393293] md: raid10 personality registered for level 10
[ 20.938949] ata1: PATA max UDMA/133 cmd 0x1f0 ctl 0x3f6 bmdma 0xff00 irq 14
[ 20.938951] ata2: PATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0xff08 irq 15
[ 22.239194] md: md0 stopped.
[ 22.336026] mdadm[2493]: segfault at 00000004 eip 08061751 esp bfee0b00 error 4 <------ !!!
[ 22.346976] md: md0 stopped.
[ 22.368559] md: bind<sdb7>
[ 22.368705] md: bind<sda7>
[ 22.380431] raid1: raid set md0 active with 2 out of 2 mirrors
[ 35.163917] EXT3 FS on md0, internal journal

Strange: on another server, also running Hardy with a much more complicated setup (three RAID 1 arrays for /boot, swap and LVM), I see no mdadm segfaults at all.

Half (jcornes) wrote :

It seems to recover from the crash:

$ lsb_release -rd
Description: Ubuntu 8.10
Release: 8.10

$ sudo apt-cache policy mdadm
mdadm:
  Installed: 2.6.7-3ubuntu8
  Candidate: 2.6.7-3ubuntu8
  Version table:
 *** 2.6.7-3ubuntu8 0
        500 http://archive.ubuntu.com intrepid/main Packages
        100 /var/lib/dpkg/status

$ dmesg | grep md
[ 1.376917] md: linear personality registered for level -1
[ 1.378823] md: multipath personality registered for level -4
[ 1.380482] md: raid0 personality registered for level 0
[ 1.382991] md: raid1 personality registered for level 1
[ 1.880116] md: raid6 personality registered for level 6
[ 1.880174] md: raid5 personality registered for level 5
[ 1.880233] md: raid4 personality registered for level 4
[ 1.912048] md: raid10 personality registered for level 10
[ 2.996675] ata1: SATA max UDMA/133 cmd 0x1f0 ctl 0x3f6 bmdma 0xf000 irq 14
[ 2.996723] ata2: SATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0xf008 irq 15
[ 3.995320] ata3: SATA max UDMA/133 cmd 0xe700 ctl 0xe800 bmdma 0xeb00 irq 19
[ 3.995368] ata4: SATA max UDMA/133 cmd 0xe900 ctl 0xea00 bmdma 0xeb08 irq 19
[ 4.878987] ata7: PATA max UDMA/100 cmd 0xb000 ctl 0xb100 bmdma 0xb400 irq 16
[ 4.879048] ata8: PATA max UDMA/100 cmd 0xb200 ctl 0xb300 bmdma 0xb408 irq 16
[ 5.349843] md: md0 stopped.
[ 5.385346] mdadm[2387]: segfault at 4 ip 0000000000418d7d sp 00007fffe964d6c0 error 4 in mdadm[400000+2a000]
[ 5.392609] md: md0 stopped.
[ 5.412165] md: bind<sdb3>
[ 5.412367] md: bind<sda3>
[ 5.416371] md0: setting max_sectors to 128, segment boundary to 32767
[ 5.417097] raid0 : md_size is 1456950784 blocks.

$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid0 sda3[0] sdb3[1]
      1456950784 blocks 64k chunks

unused devices: <none>
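
For a closer look than /proc/mdstat gives, the array can also be queried with mdadm itself; a sketch, using the md0/sda3/sdb3 names from the output above:

$ sudo mdadm --detail /dev/md0              # array state, UUID, active/failed device counts
$ sudo mdadm --examine /dev/sda3 /dev/sdb3  # per-member superblock view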

Martijn Dijksterhuis (martijndijksterhuis) wrote :

A fresh install of Intrepid 64-bit (8.10), updated to the latest packages, gives the same segfault error.

2x 1 TB Seagate SATA-2 drives in hotswap bays form a RAID 1 partition.

[ 0.000000] Command line: root=/dev/md0 ro quiet splash
[ 0.000000] Kernel command line: root=/dev/md0 ro quiet splash
[ 1.693996] md: linear personality registered for level -1
[ 1.696373] md: multipath personality registered for level -4
[ 1.698510] md: raid0 personality registered for level 0
[ 1.701662] md: raid1 personality registered for level 1
[ 2.200507] md: raid6 personality registered for level 6
[ 2.200509] md: raid5 personality registered for level 5
[ 2.200510] md: raid4 personality registered for level 4
[ 2.214459] md: raid10 personality registered for level 10
[ 2.710058] ata1: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0xffa0 irq 14
[ 2.710060] ata2: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0xffa8 irq 15
[ 3.094863] ata3: SATA max UDMA/133 cmd 0xdc00 ctl 0xd880 bmdma 0xd400 irq 19
[ 3.094865] ata4: SATA max UDMA/133 cmd 0xd800 ctl 0xd480 bmdma 0xd408 irq 19
[ 3.769189] md: md0 stopped.
[ 3.791368] mdadm[2334]: segfault at 4 ip 0000000000418d7d sp 00007fff802c8370 error 4 in mdadm[400000+2a000]
[ 3.801317] md: md0 stopped.
[ 3.809901] md: bind<sdb3>
[ 3.810169] md: bind<sda3>
[ 3.810183] md: md0: raid array is not clean -- starting background reconstruction
[ 3.815132] raid1: raid set md0 active with 2 out of 2 mirrors
[ 3.815213] md: resync of RAID array md0
[ 3.815217] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
[ 3.815219] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
[ 3.815224] md: using 128k window, over a total of 974133312 blocks.
[ 3.815226] md: resuming resync of md0 from checkpoint.
[ 12.036726] EXT3 FS on md0, internal journal

Geoffrey Pursell (geoffp) wrote :

I am seeing this as well under 9.04 as of this posting. The array seems to be okay once the machine is booted, at which point the array no longer reports itself as degraded.

$ dmesg | grep -B 2 -A 2 md
[ 3.687708] scsi6 : pata_jmicron
[ 3.687769] scsi7 : pata_jmicron
[ 3.688223] ata7: PATA max UDMA/100 cmd 0xc000 ctl 0xc100 bmdma 0xc400 irq 16
[ 3.688224] ata8: PATA max UDMA/100 cmd 0xc200 ctl 0xc300 bmdma 0xc408 irq 16
[ 3.861682] ata7.00: ATA-6: WDC WD1200JB-00EVA0, 15.05R15, max UDMA/100
[ 3.861684] ata7.00: 234441648 sectors, multi 16: LBA48
--
[ 4.608084] hub 8-0:1.0: USB hub found
[ 4.608088] hub 8-0:1.0: 6 ports detected
[ 4.700959] md: bind<sdb1>
[ 4.701344] mdadm[1303]: segfault at 0 ip 000000000040839b sp 00007fff03660120 error 4 in mdadm[400000+2a000]
[ 5.012513] usb 8-3: new high speed USB device using ehci_hcd and address 2
[ 5.015443] md: bind<sda10>
[ 5.015941] mdadm[1406]: segfault at 0 ip 000000000040839b sp 00007fffb34d5fe0 error 4 in mdadm[400000+2a000]
[ 5.146013] usb 8-3: configuration #1 chosen from 1 choice
[ 5.149496] Initializing USB Mass Storage driver...
--
[ 10.155278] sd 8:0:0:3: [sdf] Attached SCSI removable disk
[ 10.155313] sd 8:0:0:3: Attached scsi generic sg6 type 0
[ 34.742871] md: md1 stopped.
[ 34.742879] md: unbind<sda10>
[ 34.756517] md: export_rdev(sda10)
[ 34.756546] md: unbind<sdb1>
[ 34.768507] md: export_rdev(sdb1)
[ 34.779165] md: bind<sda10>
[ 34.779335] md: bind<sdb1>
[ 34.781552] md: raid10 personality registered for level 10
[ 34.782143] raid10: raid set md1 active with 2 out of 2 devices
[ 34.783148] md1: unknown partition table
[ 40.300865] EXT4-fs: barriers enabled
[ 40.311722] kjournald2 starting. Commit interval 5 seconds
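
Whether an array really spent part of the boot degraded can be checked after the fact via sysfs and the kernel log; a sketch, using the md1 name from the dmesg above:

$ cat /sys/block/md1/md/degraded      # number of missing member devices (0 = complete)
$ cat /sys/block/md1/md/array_state   # e.g. clean, active, read-auto
$ dmesg | grep -iE 'md1|degraded'     # look for "active with N out of M devices"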

Geoffrey Pursell (geoffp) wrote :

This no longer happens for me as of the Jaunty release.

ceg (ceg) wrote :

Setting Fix Released.

Changed in mdadm (Ubuntu):
status: Confirmed → Fix Released