io/fs errors when launching gdm on imx51 with sata

Bug #431963 reported by Paul Larson
This bug affects 2 people
Affects                    Status         Importance   Assigned to   Milestone
linux-fsl-imx51 (Ubuntu)   Fix Released   High         Bryan Wu
Karmic                     Fix Released   High         Bryan Wu
Lucid                      Fix Released   High         Bryan Wu

Bug Description

With the imx51 image on babbage 2.5, if I boot with USB as a disk, everything works fine. With a sata drive attached, it's a different story.

My current configuration is that I have a 2.5" sata disk (tested and shown to work on other systems) connected to the babbage for data, and the power is coming from an external source.

I can boot the live image, and go all the way through the installation with no problems. On reboot, when gdm starts to come up, I get as far as the spinning mouse cursor and it appears to hang.

If I eliminate sreadahead from starting, I'm able to get it to take longer before hanging, but it still eventually does hang.

If I eliminate both sreadahead and gdm, I'm able to boot to a text console and use the system just fine.

Starting sreadahead by itself, I didn't see any problems, no hang, no io errors in dmesg.

Starting gdm by itself, it hangs after several seconds and I start to see errors like this on the serial console within a few minutes:
end_request: I/O error, dev sda, sector 71319871
EXT4-fs error (device sda1): __ext4_get_inode_loc: unable to read inode block - inode=2261007, block=8914976
Aborting journal on device sda1:8.
EXT4-fs error (device sda1): ext4_journal_start_sb: Detected aborted journal
EXT4-fs (sda1): Remounting filesystem read-only
end_request: I/O error, dev sda, sector 71319879
end_request: I/O error, dev sda, sector 71319887
end_request: I/O error, dev sda, sector 239343983
end_request: I/O error, dev sda, sector 239344223

When this happens, I cannot even switch between virtual consoles.

Kernel version signature is 2.6.31-100.7-imx51

Paul Larson (pwlars)
tags: added: armel
Scott James Remnant (Canonical) (canonical-scott) wrote :

To me this looks like your disk is dying (or perhaps your SATA controller is buggy)

Loïc Minier (lool)
affects: linux (Ubuntu) → linux-fsl-imx51 (Ubuntu)
Changed in linux-fsl-imx51 (Ubuntu):
assignee: nobody → Amit Kucheria (amitk)
Loïc Minier (lool)
Changed in linux-fsl-imx51 (Ubuntu Karmic):
milestone: none → ubuntu-9.10
importance: Undecided → Medium
Paul Kline (paul-kline) wrote :

Freescale has seen error-free performance with a Fujitsu laptop SATA drive model MHY2080BH on Babbage 2.0, 2.5 and 3.0
using the FSL Linux BSP (http://www.amazon.com/Fujitsu-MHY2080BH-2-5-Inch-5400RPM-Serial/dp/B001DZDEXQ). This
Fujitsu drive draws .6amps. What maker & model of hard drive is getting these errors? What are its power requirements? Have you
tried a second Babbage board? Have you tried running bonnie++ to exercise the SATA drive?

Paul Larson (pwlars) wrote :

The drive I'm using is a Seagate momentus 250 GB (7200rpm). I was unable to have any success powering it off of the babbage, so I'm powering it from an external source, so I don't think power should be an issue in this case. I have not run bonnie++, but did run badblocks and several other tests on it, using two other machines without any problems. Also I did installs to it with those two other machines and it worked fine on every other system I've tested it with.

Paul Kline (paul-kline) wrote :

Could you please provide the exact model number? I see several model numbers for Seagate momentus 250 GB (7200rpm). For example, Model ST9250421AS, Model: ST90250N1A1, Model ST9250410AS. How did you hook this drive up in order to power it externally?

Tobin Davis (gruemaster) wrote :

I am seeing the same errors on a Hitachi 2.5" 40G drive (originally from my PS3 prior to upgrading it). The drive model is HTS541640J9SA00, 5V 700mA 5400RPM. I am using a SATA extension cable to hook it up (see attached picture). I purchased two of these, and modified one so that I can plug it into an external power supply. Both cables have been tested on a laptop and work flawlessly. The drive also works fine when I plug it into a usb<>sata cable and run it that way on the babbage 2.5.

Paul Larson (pwlars)
Changed in linux-fsl-imx51 (Ubuntu Karmic):
importance: Medium → High
status: New → Confirmed
Paul Larson (pwlars) wrote :

in case it's still important: seagate momentus st9250410as - printed on the front of the drive is: +5V 0.451A

Dinh Nguyen (dinh-nguyen) wrote :

We procured a Hitachi 2.5" 80GB drive, model # HTS541680J9SA00, 5V 700mA. We installed the whole 9.04 filesystem on it and can boot successfully on Babbage 2.5. I believe our process uses an EXT3 format; from the error messages, you are using EXT4?

Do you have just the file-system on the drive or other things? i.e. Redboot, Kernel...?

Tobin Davis (gruemaster) wrote :

Ok, here's some steps to reproduce/debug the issue we are seeing.

1. get latest image from http://cdimage.ubuntu.com/ports/daily-live/current/ and flash it to a SD card following the instructions at https://help.ubuntu.com/community/Installation/FromImgFiles.

2. After booting, Run "Install Ubuntu 9.10" by double clicking on the icon on the desktop. Follow the prompts for time zone and locale. Use the entire drive for install. To switch to ext3, select "Specify partitions manually" and create two partitions (root and swap).

3. Once the system is done installing, reboot. The system will start up and hang once it gets into X. To get dmesg output, change the fconfig script by removing "quiet splash" and adding "console=ttymxc0,115200 console=tty0". To get into a console, add "single" to the kernel cmdline to boot into rescue mode.

Booting into rescue mode will give you several options including networking. This way, you can download and install additional tools like bonnie++ to run some benchmarking. You can run in this mode reliably for days without failure. It is only when starting X that you will start seeing errors like "end_request: I/O error, dev sda, sector 12859015".
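
The command-line change in step 3 can be written down as a small transformation, which may help when the same edit has to be repeated on several boards. This is a sketch only: the starting command line used below is a made-up example rather than the one shipped in the image, and the function name is hypothetical.

```python
def debug_cmdline(cmdline, single_user=False):
    """Drop 'quiet' and 'splash', append the serial and VT consoles,
    and optionally add 'single' to boot into rescue mode."""
    args = [a for a in cmdline.split() if a not in ("quiet", "splash")]
    for extra in ("console=ttymxc0,115200", "console=tty0"):
        if extra not in args:
            args.append(extra)
    if single_user and "single" not in args:
        args.append("single")
    return " ".join(args)


# Hypothetical starting command line, for illustration only.
print(debug_cmdline("root=/dev/sda1 ro quiet splash", single_user=True))
```

The resulting string still has to be entered into the boot script by hand; nothing here talks to the bootloader.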

Loïc Minier (lool) wrote :

Dinh: 9.04 uses 2.6.28; we see the issue on 9.10 with our 2.6.31 kernel.

Tobin Davis (gruemaster) wrote :

Also, with the above image, you can disconnect the drive from the onboard sata port and plug it into a USB<>SATA adapter and the system will boot into X without a problem and with no changes to the image or boot script.

Dinh Nguyen (dinh-nguyen) wrote :

1 more data point...When I use our 2.6.31 port with a 9.04 filesystem on the HDD, it continues to work fine without errors.

I have had a few email exchanges with Brad Figg earlier to help debug the RTC issue. It appears that your port is having some issues with the USB driver. Can you please confirm that the USB driver on your 2.6.31 port is working correctly?

I will try your methods from Tobin.

Dinh Nguyen (dinh-nguyen) wrote :

According to the imx51_defconfig file you have enabled USB HOST1, HOST2 and OTG port and also the Gadget for the OTG port. This is not the recommended configuration for it. Can you please change your KCONFIG to be something like this:

#
# USB Host Controller Drivers
#
# CONFIG_USB_C67X00_HCD is not set
CONFIG_USB_EHCI_HCD=y
CONFIG_USB_EHCI_ARC=y
CONFIG_USB_EHCI_ARC_H1=y
# CONFIG_USB_EHCI_ARC_H2 is not set
# CONFIG_USB_EHCI_ARC_OTG is not set
# CONFIG_USB_STATIC_IRAM is not set
CONFIG_USB_EHCI_ROOT_HUB_TT=y
# CONFIG_USB_EHCI_TT_NEWSCHED is not set
# CONFIG_USB_ISP116X_HCD is not set
# CONFIG_USB_SL811_HCD is not set
# CONFIG_USB_R8A66597_HCD is not set
# CONFIG_USB_HWA_HCD is not set
# CONFIG_USB_GADGET_MUSB_HDRC is not set
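
To check a built kernel's config against the fragment above, something along these lines could be used. It is a rough sketch, not a replacement for the kernel's own config tooling: it only understands "CONFIG_X=value" and "# CONFIG_X is not set" lines, and the helper name is made up.

```python
import re


def parse_kconfig(text):
    """Map CONFIG_* names to their value; '# CONFIG_X is not set' maps to None."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"# (CONFIG_\w+) is not set$", line)
        if m:
            options[m.group(1)] = None
            continue
        m = re.match(r"(CONFIG_\w+)=(.*)$", line)
        if m:
            options[m.group(1)] = m.group(2)
    return options


fragment = """\
CONFIG_USB_EHCI_HCD=y
CONFIG_USB_EHCI_ARC=y
CONFIG_USB_EHCI_ARC_H1=y
# CONFIG_USB_EHCI_ARC_H2 is not set
# CONFIG_USB_EHCI_ARC_OTG is not set
"""
opts = parse_kconfig(fragment)
# Options the suggested configuration wants switched off.
for name in ("CONFIG_USB_EHCI_ARC_H2", "CONFIG_USB_EHCI_ARC_OTG"):
    print(name, "disabled" if opts.get(name) is None else "ENABLED")
```

Note that an option missing from the file altogether is also reported as disabled here, which is usually what is wanted for this kind of check.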

Dinh Nguyen (dinh-nguyen) wrote :

I see that the bootup hangs at the Ubuntu start-up screen, but I don't see any "i/o errors" in the output log. Please see attached file.

Amit Kucheria (amitk) wrote :

Paul/Tobin,

Could you test with this kernel http://people.canonical.com/~amitk/mx51/linux-image-2.6.31-105-imx51_2.6.31-105.14_armel.deb?

It disables OTG as recommended by Dinh. It also switches to the RTC_MC13892 driver for RTC, so look out for your date/time settings on pulling power.

Amit Kucheria (amitk) wrote :

Dinh http://paste.ubuntu.com/293984/ is the result of trying to boot a kernel with OTG turned off.

I wonder why you (FSL) have H1, H2 and OTG turned ON in your imx51*_defconfig in arch/arm/configs/. That was used as the basis for the Ubuntu configs.

Pete Graner (pgraner)
Changed in linux-fsl-imx51 (Ubuntu Karmic):
milestone: ubuntu-9.10 → none
Rick Spencer (rick-rickspencer3) wrote :

We will SRU this once we get a fix

Changed in linux-fsl-imx51 (Ubuntu Karmic):
milestone: none → karmic-updates
Oliver Grawert (ogra) wrote :

i finally got a SATA cable today that enables me to debug the issue ...
i see the same behavior as everybody else on the bug, i don't have to fiddle with sreadahead though, just adding "text" to the cmdline gets me a properly running and stable system (i did plenty of test builds and kept the disk busy)

no I/O errors can be spotted at all in that mode ...

trying to launch X though gets me a locked up system ...

i then created an ~/.xsession file that only fires up metacity and xterm ... this works fine as well, no I/O errors, system is rock-stable ...

now firing up gnome-session from the xterm locks the system, no commands can be run apart from dmesg ...
running dmesg shows the following immediately after gnome-session tries to start:

usb 1-1.6: reset high speed USB device using fsl-ehci and address 4
usb 1-1.6: reset high speed USB device using fsl-ehci and address 4
usb 1-1.6: reset high speed USB device using fsl-ehci and address 4
usb 1-1.6: reset high speed USB device using fsl-ehci and address 4
usb 1-1.6: reset high speed USB device using fsl-ehci and address 4
usb 1-1.6: reset high speed USB device using fsl-ehci and address 4
usb 1-1.6: reset high speed USB device using fsl-ehci and address 4
sd 0:0:0:0: [sda] Unhandled error code
sd 0:0:0:0: [sda] Result: hostbyte=0x05 driverbyte=0x00
end_request: I/O error, dev sda, sector 2022791
Buffer I/O error on device sda1, logical block 252841
lost page write due to I/O error on sda1
JBD2: Detected IO errors while flushing file data on sda1:
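
The numeric hostbyte in the output above can be decoded with a small lookup. The table below is written from memory of the DID_* host codes in the kernel's include/scsi/scsi.h and should be checked against the tree actually being built; the function name is made up. Under that assumption, hostbyte=0x05 would be DID_ABORT.

```python
import re

# Assumed mapping of SCSI host bytes (DID_* in include/scsi/scsi.h).
HOST_BYTES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
    0x09: "DID_BAD_INTR",
}


def summarize(log_text):
    """Return (number of USB reset messages, list of decoded host bytes)."""
    resets = len(re.findall(r"reset high speed USB device", log_text))
    codes = [int(h, 16) for h in re.findall(r"hostbyte=(0x[0-9a-fA-F]+)", log_text)]
    return resets, [HOST_BYTES.get(c, "unknown") for c in codes]


sample = """\
usb 1-1.6: reset high speed USB device using fsl-ehci and address 4
usb 1-1.6: reset high speed USB device using fsl-ehci and address 4
sd 0:0:0:0: [sda] Result: hostbyte=0x05 driverbyte=0x00
"""
print(summarize(sample))
```

A burst of resets followed by an aborted command would be consistent with the error handler giving up on a device that stopped answering, but that reading is a guess from the log alone.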

Dinh Nguyen (dinh-nguyen) wrote :

Does anybody know if gnome-session starts up any other HW? Does it do anything with power management?

Matt Zimmerman (mdz)
Changed in linux-fsl-imx51 (Ubuntu Karmic):
status: Confirmed → Triaged
Loïc Minier (lool) wrote :

@Dinh: yes, gnome-session starts up fine on another armel board and starts up fine if not using the SATA port.

gnome-session is in charge of starting a bunch of things in the GNOME desktop, which could possibly tweak power settings or display settings etc.. You can see what it does by running gnome-session --debug.

Loïc Minier (lool) wrote :

So the next steps are to reduce the test case a bit; I think the debug output should help locate what's launched on startup.

What we can disable are .desktop files with autostart=yes, and gnome-settings-daemon plugins (.so files).

Dinh Nguyen (dinh-nguyen) wrote :

Does this error report sound like what's happening?

http://linuxsysadminblog.com/2008/10/usb-mass-storage-buffer-io-error/

Loïc Minier (lool) wrote :

No, the disk is definitely not idle / standby

Paul Larson (pwlars) wrote :

No, I'm not seeing those other errors mentioned there

Dinh Nguyen (dinh-nguyen) wrote :

I was able to get an external USB/SATA bridge and used it on the same HDD that I have been testing with. With that setup I was able to boot Karmic all the way.

Then after some more fiddling, I made a cable to only use the data path of the Babbage USB/SATA bridge, but used an external power source. This configuration failed.

Then I used the power from Babbage SATA connector to power up the SATA HDD, and used an external USB/SATA bridge, this method was OK in booting up Karmic.

These experiments hint that perhaps the Babbage USB/SATA bridge is not stable under this new Ubuntu version. Perhaps there are power glitches with the power to the SATA bridge? Perhaps there are timing violations with the bridge chip that are exposed in this version of Ubuntu and weren't there before?

We have engaged the manufacturer of the USB/SATA bridge to help with debugging this issue.

Dinh Nguyen (dinh-nguyen) wrote :

A note from Neil Perng at Genesys Logic(maker of USB/SATA bridge chip):

We have a quick analysis as shown in the figure “Linux 9.10 fail.jpg.” After a series of read commands (0x28), the USB Host issued an ATA PASS-THROUGH (16) command (0x85) to GL830; the command description is shown in the figure “ATA Pass-Through command.jpg.” However, there is a parameter problem in this CBW packet: the Host issued an IDENTIFY command (0xEC) and expected to obtain 512 bytes of identify data, but the SECTOR_COUNT field is set to 0. As a result, GL830 responds with a successful status, as shown in “Linux 9.10 fail.jpg,” but with no identify data. Finally, it looks like the Host issues the RESET command.

Two questions need to be clarified:

1. Why does the Host issue the ATA PASS_THROUGH command?

2. Why are the parameters of the ATA PASS_THROUGH command inconsistent?
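
To make the inconsistency concrete, here is a sketch of what a well-formed IDENTIFY DEVICE request looks like when wrapped in ATA PASS-THROUGH (16). The byte offsets follow my reading of the SCSI/ATA Translation (SAT) layout and should be double-checked against the standard; the flag values in bytes 1 and 2 are one plausible choice for a PIO data-in transfer, not a capture of what the host actually sent. Function names are made up.

```python
ATA_PASS_THROUGH_16 = 0x85
ATA_IDENTIFY_DEVICE = 0xEC
PROTOCOL_PIO_DATA_IN = 4


def build_identify_cdb(sector_count):
    """Build a 16-byte ATA PASS-THROUGH CDB carrying IDENTIFY DEVICE."""
    cdb = bytearray(16)
    cdb[0] = ATA_PASS_THROUGH_16
    cdb[1] = PROTOCOL_PIO_DATA_IN << 1      # PROTOCOL field, EXTEND bit clear
    cdb[2] = (1 << 3) | (1 << 2) | 0x02     # T_DIR=from device, BYT_BLOK=blocks,
                                            # T_LENGTH=use the sector count field
    cdb[6] = sector_count & 0xFF            # SECTOR_COUNT (7:0)
    cdb[14] = ATA_IDENTIFY_DEVICE           # ATA COMMAND register
    return bytes(cdb)


def is_valid_identify(cdb):
    """IDENTIFY DEVICE returns one 512-byte block, so SECTOR_COUNT should be 1."""
    return (len(cdb) == 16
            and cdb[0] == ATA_PASS_THROUGH_16
            and cdb[14] == ATA_IDENTIFY_DEVICE
            and cdb[6] == 1)


print(is_valid_identify(build_identify_cdb(1)))  # well-formed request
print(is_valid_identify(build_identify_cdb(0)))  # the shape described above
```

With SECTOR_COUNT at 0 the command asks for IDENTIFY data while declaring that no blocks will be transferred, which matches the mismatch described in the note.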

Dinh Nguyen (dinh-nguyen) wrote :

Another note from Neil Perng at Genesys Logic:

We found that the key issue deciding whether Ubuntu 9.10 boots up or not is “how we handle this ATA PASS-THROUGH command.” The original GL830 just passes the ATA PASS-THROUGH command on, and the SECTOR_COUNT field of its IDENTIFY command is 0, so GL830 sees it as an invalid command. As a result, GL830 just bypasses this ATA PASS-THROUGH command to the HDD and returns a CSW OK to the USB Host. The hang occurs because the HDD is processing the IDENTIFY command, but GL830 did not respond to the HDD.

We have tried making GL830 respond to the USB Host with a CSW FAIL or CSW STALL to skip this ATA PASS-THROUGH command, and Ubuntu 9.10 can then boot up successfully. The USB/SATA cable you’re using must be performing a similar solution, so it can boot the system up.

The root cause of this weakness (the glitch you mentioned) is that GL830 doesn’t check the integrity of the ATA PASS-THROUGH command. Since there are too many exceptions, we assume the command sent by the USB Host is a valid one.

There should be two quick solutions:

1. Since Ubuntu 9.04 does not pass this ATA PASS-THROUGH command, can you ask Canonical to remove it from the USB MSC driver?

2. Ask Canonical to correct this invalid command.
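
The bridge-side behaviour described in this note (answer the malformed command with a failed CSW instead of forwarding it) can be modelled roughly as follows. This is a toy model and not GL830 firmware: the 13-byte CSW layout and status values are those of the USB mass storage bulk-only transport as I remember them, the "pass everything else" policy is a simplification, and the function names are made up.

```python
import struct

CSW_SIGNATURE = 0x53425355   # reads "USBS" when packed little-endian
CSW_PASSED, CSW_FAILED, CSW_PHASE_ERROR = 0, 1, 2


def csw_status_for(cdb):
    """Fail ATA PASS-THROUGH (16) IDENTIFY commands whose SECTOR_COUNT is 0."""
    if len(cdb) == 16 and cdb[0] == 0x85 and cdb[14] == 0xEC and cdb[6] == 0:
        return CSW_FAILED
    return CSW_PASSED


def pack_csw(tag, residue, status):
    """Pack a Command Status Wrapper: signature, tag, data residue, status."""
    return struct.pack("<IIIB", CSW_SIGNATURE, tag, residue, status)


# CDB with IDENTIFY (0xEC) in byte 14 and SECTOR_COUNT (byte 6) left at 0.
bad_cdb = bytes([0x85, 0x08, 0x0E, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xEC, 0])
csw = pack_csw(tag=7, residue=512, status=csw_status_for(bad_cdb))
print(csw.hex())
```

The tag value 7 and the residue of 512 are arbitrary illustration values; a real bridge echoes the tag from the CBW it received.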

tags: added: iso-testing
Dinh Nguyen (dinh-nguyen) wrote :

I think I have found a fix for this issue. I have attached the patch. It is probably not the correct fix at the moment, and more analysis is needed, but for now it appears that with 9.10, there is an invalid command that is coming to the USB/SATA bridge chip. Instead of gracefully handling this command, the USB/SATA bridge chip on the Babbage board is failing on this command.

Brad Figg (brad-figg)
Changed in linux-fsl-imx51 (Ubuntu Karmic):
assignee: Amit Kucheria (amitk) → Bryan Wu (cooloney)
Changed in linux-fsl-imx51 (Ubuntu):
assignee: Amit Kucheria (amitk) → Bryan Wu (cooloney)
Bryan Wu (cooloney) wrote :

@ Dinh,

Thanks a lot for this patch.

After some discussion on the kernel team mailing list, we think this change should only affect the GL830 USB device. Otherwise, the kernel won't send out the 0x85 command to any USB mass storage device connected to the i.MX51 EHCI host port.
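
The idea of limiting the workaround to one device can be pictured as a per-device quirk lookup. The sketch below is only a model of that idea in Python; it is not the attached kernel patch, and the vendor/product IDs are placeholders, not the real GL830 IDs.

```python
ATA_PASS_THROUGH_16 = 0x85

# Placeholder (vendor_id, product_id) pairs. These are NOT the real GL830 IDs.
NO_ATA_PASS_THROUGH_QUIRK = {(0x1234, 0x5678)}


def should_forward(vendor_id, product_id, cdb):
    """Return False only for 0x85 commands aimed at a quirked device."""
    quirked = (vendor_id, product_id) in NO_ATA_PASS_THROUGH_QUIRK
    is_pass_through = len(cdb) > 0 and cdb[0] == ATA_PASS_THROUGH_16
    return not (quirked and is_pass_through)


pass_through = bytes([0x85]) + bytes(15)
read_10 = bytes([0x28]) + bytes(9)

print(should_forward(0x1234, 0x5678, pass_through))  # quirked device: dropped
print(should_forward(0x1234, 0x5678, read_10))       # reads still go through
print(should_forward(0xAAAA, 0xBBBB, pass_through))  # other devices unaffected
```

Keying the filter on the device keeps ATA pass-through available for every other USB storage device on the same host port, which is the concern raised above.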

Please take a look at the patch attached here and could you test this on your side?

-Bryan

Alexander Sack (asac)
Changed in linux-fsl-imx51 (Ubuntu Lucid):
milestone: karmic-updates → lucid-alpha-2
Paul Larson (pwlars) wrote :

Bryan, could you please provide us with a build that contains the patch?

Bryan Wu (cooloney) wrote :

Guys, sorry for the delay.

I rewrote the patch and tested it on both Karmic and Lucid. It works fine. With the patch, the kernel can operate the SATA drive now.

Please find the patch attached first; I will post the kernel deb package here afterwards.

Bryan Wu (cooloney) wrote :

lucid kernel package with the patch.

Bryan Wu (cooloney) wrote :

sorry missing the attachment

Bryan Wu (cooloney) wrote :

karmic kernel package with the patch

Bryan Wu (cooloney)
Changed in linux-fsl-imx51 (Ubuntu Karmic):
status: Triaged → In Progress
Changed in linux-fsl-imx51 (Ubuntu Lucid):
status: Triaged → In Progress
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-fsl-imx51 - 2.6.31-601.3

---------------
linux-fsl-imx51 (2.6.31-601.3) lucid; urgency=low

  [ Andy Whitcroft ]

  * rebase to Ubuntu-2.6.31-17.54

  [ Dinh Nguyen ]

  * SAUCE: Workaround for SATA drive failure on Ubuntu installation
    - LP: #431963

  [ Ubuntu: 2.6.31-17.54 ]

  * security merge of Ubuntu-2.6.31-16.53

  [ Ubuntu: 2.6.31-16.53 ]

  * ext4: Fix insufficient checks in EXT4_IOC_MOVE_EXT
    - LP: #492659
    - CVE-2009-4131
 -- Andy Whitcroft <email address hidden> Wed, 16 Dec 2009 11:59:38 +0000

Changed in linux-fsl-imx51 (Ubuntu Lucid):
status: In Progress → Fix Released
Martin Pitt (pitti) wrote : Please test proposed package

Accepted linux-fsl-imx51 into karmic-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Changed in linux-fsl-imx51 (Ubuntu Karmic):
status: In Progress → Fix Committed
tags: added: verification-needed
Tobin Davis (gruemaster) wrote :

I'm not seeing the updated kernel in karmic-proposed for armel.

Oliver Grawert (ogra) wrote :

This is because it is stuck in the NEW queue and needs an archive admin to approve the binaries
https://edge.launchpad.net/ubuntu/karmic/+source/linux-fsl-imx51/2.6.31-107.18

Steve Langasek (vorlon) wrote : Re: [Bug 431963] Re: io/fs errors when launching gdm on imx51 with sata

On Tue, Jan 05, 2010 at 08:01:30AM -0000, Tobin Davis wrote:
> I'm not seeing the updated kernel in karmic-proposed for armel.

Sorry, was in the new queue. Have processed it now.

--
Steve Langasek Give me a lever long enough and a Free OS
Debian Developer to set it on, and I can move the world.
Ubuntu Developer http://www.debian.org/
<email address hidden> <email address hidden>

Tobin Davis (gruemaster) wrote :

The kernel is still not showing up as an update, but I was able to install it manually with apt-get install linux-image-2.6.31-107-imx51. Rebooting with the drive attached to the on-board SATA controller works now. We just need to fix the update issue.

Tobin Davis (gruemaster) wrote :

Installed daily lucid image for 2100105 with kernel 2.6.31-601-imx51 on sata and confirmed bug fixed here.

Martin Pitt (pitti)
tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-fsl-imx51 - 2.6.31-107.18

---------------
linux-fsl-imx51 (2.6.31-107.18) karmic-proposed; urgency=low

  [ Dinh Nguyen ]

  * SAUCE: Workaround for SATA drive failure on Ubuntu installation
    - LP: #431963

linux-fsl-imx51 (2.6.31-107.17) karmic-proposed; urgency=low

  [ Stefan Bader ]

  * Rebased to 2.6.31-17.54

  [ Ubuntu: 2.6.31-17.54 ]

  * Same as unreleased 2.6.31-17.53 with security release merged.

  [ Ubuntu: 2.6.31-16.53 ]

  * ext4: Fix insufficient checks in EXT4_IOC_MOVE_EXT
    - LP: #492659
    - CVE-2009-4131

linux-fsl-imx51 (2.6.31-107.16) karmic-proposed; urgency=low

  [ Stefan Bader ]

  * Rebased to 2.6.31-17.53

  [ Ubuntu: 2.6.31-17.53 ]

  * SAUCE: AppArmor: Fix oops after profile removal
    - LP: #475619
  * SAUCE: AppArmor: Fix Oops when in apparmor_bprm_set_creds
    - LP: #437258
  * SAUCE: AppArmor: Fix cap audit_caching preemption disabling
    - LP: #479102
  * SAUCE: AppArmor: Fix refcounting bug causing leak of creds
    - LP: #479115
  * SAUCE: AppArmor: Fix oops there is no tracer and doing unsafe
    transition.
    - LP: #480112
  * Revert "[Upstream] (drop after 2.6.31) usb-storage: Workaround devices
    with bogus sense size"
    - LP: #461556
  * Revert "[Upstream] (drop after 2.6.31) Input: synaptics - add another
    Protege M300 to rate blacklist"
    - LP: #480144
  * [Config] udeb: Add squashfs to fs-core-modules
    - LP: #352615
  * Revert "e1000e: swap max hw supported frame size between 82574 and
    82583"
    - LP: #461556
  * Revert "drm/i915: Fix FDI M/N setting according with correct color
    depth"
    - LP: #480144
  * Revert "agp/intel: Add B43 chipset support"
    - LP: #480144
  * Revert "drm/i915: add B43 chipset support"
    - LP: #480144
  * Revert "ACPI: Attach the ACPI device to the ACPI handle as early as
    possible"
    - LP: #327499, #480144
  * SCSI: Retry ADD_TO_MLQUEUE return value for EH commands
    - LP: #461556
  * SCSI: Fix protection scsi_data_buffer leak
    - LP: #461556
  * SCSI: sg: Free data buffers after calling blk_rq_unmap_user
    - LP: #461556
  * ARM: pxa: workaround errata #37 by not using half turbo switching
    - LP: #461556
  * tracing/filters: Fix memory leak when setting a filter
    - LP: #461556
  * x86/paravirt: Use normal calling sequences for irq enable/disable
    - LP: #461556
  * USB: ftdi_sio: remove tty->low_latency
    - LP: #461556
  * USB: ftdi_sio: remove unused rx_byte counter
    - LP: #461556
  * USB: ftdi_sio: clean up read completion handler
    - LP: #461556
  * USB: ftdi_sio: re-implement read processing
    - LP: #461556
  * USB: pl2303: fix error characters not being reported to ldisc
    - LP: #461556
  * USB: digi_acceleport: Fix broken unthrottle.
    - LP: #461556
  * USB: serial: don't call release without attach
    - LP: #461556
  * USB: option: Toshiba G450 device id
    - LP: #461556
  * USB: ipaq: fix oops when device is plugged in
    - LP: #461556
  * USB: cp210x: Add support for the DW700 UART
    - LP: #461556
  * USB: Fix throttling in generic usbserial driver
    - LP: #461556
  * USB: storage: When a device returns no sense data, call it a Hardware
    Error
...

Changed in linux-fsl-imx51 (Ubuntu Karmic):
status: Fix Committed → Fix Released