OS fails to boot certain SATA drives in AHCI mode.

Bug #741799 reported by Kent Baxley
This bug affects 2 people
Affects                Status        Importance  Assigned to  Milestone
OEM Priority Project   Fix Released  High        Unassigned
grub2 (Ubuntu)         Invalid       High        Unassigned
Natty                  Invalid       High        Unassigned

Bug Description

Binary package hint: grub

Ubuntu Release: Maverick and Natty

Summary: With certain 512-byte-sector drives and certain 4K-sector drives in 512e (512-byte emulation) mode, systems fail to boot after the operating system is installed when the SATA controller is set to AHCI mode in the BIOS.

The issue has been observed on two Seagate 4Ke drives and one 4Ke Hitachi drive with the following model numbers and sizes:

ST9500423AS 500GB
ST250LT007 250GB
HTS547564A9E384 640GB

One 512B Hard Drive from Hitachi was also found to be problematic:

HTS723232A7A64 320GB

With the problem drives, the following is observed under each BIOS
SATA setting on certain Dell Latitude machines.

AHCI Mode - The image appears to install fine, but afterward the systems won't boot the OS. We are dropped to a grub prompt or a black screen with blinking cursor.

Raid On - Works without issue.

ATA Mode - Works without issue.

Natty also appears to be affected. We get similar
symptoms where the OS doesn't boot after a seemingly fine installation.

Steps to reproduce:
1) Install Maverick or Natty onto one of the above disks.
2) Reboot

Actual result: In AHCI mode, the system either drops to a black screen (as observed in the factory installations) or, if GRUB is installed to the MBR of the affected drive, drops to a grub or grub rescue prompt.
Switching to either ATA mode or Raid On in the BIOS allows the OS to boot just fine.

Expected result: Operating System boots regardless of the SATA Mode in the BIOS with these hard disks.

Revision history for this message
Kent Baxley (kentb) wrote :

It almost seems like a problem with the way that GRUB is reading the drive.

I say this because you can try to cat /boot/grub/grub.cfg and it's not reading out the right values. If you boot a USB stick while the drive is in AHCI mode, you can cat out those values using the kernel AHCI driver without troubles.

For example:

In AHCI mode, when the problem reproduces, run the following from the grub prompt:

ls (hd0,msdos1)/boot

....grub can't find anything in the /boot directory at all in the majority of cases. If, by chance, you do happen to see a grub/grub.cfg file, running:

cat (hd0,msdos1)/boot/grub/grub.cfg

...will read out garbage on the console.

In ATA mode, where the system boots fine, at the grub menu, get to a grub prompt and run the same command. You'll see that grub *does* see a grub/grub.cfg and you can cat (hd0,msdos1)/boot/grub/grub.cfg successfully.

Revision history for this message
Kent Baxley (kentb) wrote :

We can get the operating system to boot 'by hand' in AHCI mode by doing the following:

The system was reinstalled using the AHCI SATA option in BIOS. After installation, we are dropped to a 'grub rescue' prompt when the OS tries to boot.

Typing the 'set' command at the grub prompt reveals that the prefix is set correctly.

prefix=(hd0,msdos1)/boot/grub

From there, the kernel and initrd can be loaded manually, the root device set, and the OS will boot.

grub> linux /vmlinuz root=/dev/sda1 ro
grub> initrd /initrd.img
grub> boot

Those steps will get the operating system booted in AHCI mode. None of this is required if the SATA mode in the BIOS is changed to ATA or RaidOn.

Chris Van Hoof (vanhoof)
Changed in oem-priority:
importance: Undecided → Critical
status: New → Confirmed
Revision history for this message
Kent Baxley (kentb) wrote :

The 4Ke drives, by the way, are currently set up in 512 emulation mode. We do have a few Western Digital, Toshiba, and one Seagate drive of this type (4k in 512 emulation mode) that do not have any issues when attempting to boot in AHCI mode.
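
For anyone sorting drives into the 512-byte and 4K (512e) groups discussed here, the following is a minimal Python sketch run from a booted Linux system (for example the USB stick mentioned earlier). It assumes the kernel exposes `logical_block_size` and `physical_block_size` under `/sys/block/<device>/queue/`, and that the drive reports its physical sector size honestly, which not every drive does. The function names and the default device name `sda` are illustrative only and are not part of any tool used in this bug.

```python
from pathlib import Path


def classify_sector_format(logical_bytes, physical_bytes):
    """Name the sector layout implied by the two reported sizes."""
    if logical_bytes == 512 and physical_bytes == 512:
        return "512n"   # native 512-byte sectors
    if logical_bytes == 512 and physical_bytes == 4096:
        return "512e"   # 4K physical sectors, 512-byte emulation
    if logical_bytes == 4096 and physical_bytes == 4096:
        return "4Kn"    # native 4K sectors
    return "unknown"


def read_sector_sizes(device="sda", sysfs_root="/sys/block"):
    """Read (logical, physical) sector sizes in bytes from sysfs.

    Assumption: the sysfs queue attributes exist for this device.
    """
    queue = Path(sysfs_root) / device / "queue"
    logical = int((queue / "logical_block_size").read_text().strip())
    physical = int((queue / "physical_block_size").read_text().strip())
    return logical, physical
```

Calling `classify_sector_format(*read_sector_sizes("sda"))` on the installed system would be expected to give `512e` for the 4K drives described above, provided the drive reports a 4096-byte physical sector.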

affects: grub (Ubuntu) → grub2 (Ubuntu)
Revision history for this message
Robbie Williamson (robbiew) wrote :

For the record: I understand why this is Critical to Canonical OEM, but given there is a workaround, I'm marking it as High from an Ubuntu perspective.

Changed in grub2 (Ubuntu Natty):
status: New → Incomplete
status: Incomplete → Confirmed
importance: Undecided → High
assignee: nobody → Canonical Foundations Team (canonical-foundations)
Revision history for this message
Colin Watson (cjwatson) wrote :

It may not be anything to do with this since you mentioned one 512-byte-sector drive was affected too, but I wonder if this patch to fetch the sector size from the BIOS might help. I should warn that I've never actually been able to test this directly, beyond knowing that it still boots in KVM.

tags: added: patch
Chris Van Hoof (vanhoof)
tags: added: hwe-blocker
Revision history for this message
Kent Baxley (kentb) wrote :

I just tested the patch on a Latitude on a 4K-sector drive that was known to have problems in AHCI mode. I was able to boot the operating system all the way with this drive in AHCI, RaidOn, and ATA modes.

Tomorrow, I'll need to double and triple check my results with a few other disks just to make sure, as well as try one of the known bad 512-sector drives.

Thanks, Colin!

Revision history for this message
Colin Watson (cjwatson) wrote : Re: [Bug 741799] Re: OS fails to boot certain SATA drives in AHCI mode.

Wow. I genuinely didn't really expect that to work. :-)

If it checks out more widely, I'll send it upstream for review.

Revision history for this message
Kent Baxley (kentb) wrote :

It turns out that the patch does not work after all.

The drive that I used that was known to be bad in some machines was actually working in AHCI mode all along, patch or no patch.

So, I took another disk that had issues and tried it out with the grub patch. This time, I did not get any improvement in behavior. When booting in AHCI mode, the system drops me to the 'grub rescue' prompt as before.

Revision history for this message
Kent Baxley (kentb) wrote :

@Colin,

Current suspicion from BIOS side is LBA48 (48 bit logical block addressing) may be causing some of the issues we're seeing on various drives. Are there any areas in that part of the grub code that could be causing issues?

Revision history for this message
Kent Baxley (kentb) wrote :

We tested some potential BIOS fixes for this issue, and so far everything is looking good with the drives that would not boot previously. In a nutshell, my understanding is that there was a problem with the way the BIOS was reading some of the LBA information off the drives in question.

@Colin, is there anything in the grub2 code that would possibly cause issues down the road with what was mentioned in comment #9? If not, then I think we can close this one out in the next few days pending an OK from the customer.

Thanks for your help.

Changed in oem-priority:
importance: Critical → High
Revision history for this message
Colin Watson (cjwatson) wrote :

I was under the impression that we were generally LBA48-clean, but it does indeed rely on the BIOS also being LBA48-clean, so that's a plausible source for the problem.

Revision history for this message
Tony Espy (awe) wrote :

@Colin

Is LBA48 deprecated at this point?

If so, then it makes sense that neither GRUB nor the BIOS uses it. If the BIOS isn't LBA48-clean, is there any way to detect this?

Revision history for this message
Colin Watson (cjwatson) wrote :

I don't know why LBA48 would be deprecated. How else should one access
data beyond 128 GiB using BIOS facilities (a common boot loader
requirement)?

The easiest way to detect this is likely to attempt to read data from
either side of the 128 GiB boundary. If both succeed, try bisecting
through the disk until you find something that doesn't correspond to
what the OS sees, and then work out what that boundary might correspond
to.
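
The bisection described above can be sketched in Python as follows. This is only an illustration under stated assumptions: sectors are 512 bytes, so the 128 GiB boundary falls at sector 2^28, and the fault is monotonic, meaning every sector below some boundary reads correctly through the BIOS and every sector at or above it does not. The `reads_match` callback is hypothetical; it stands for whatever manual or scripted comparison is used between what the boot loader reads at a given LBA and what the OS sees there.

```python
SECTOR_SIZE = 512                 # assumed logical sector size in bytes
LBA28_LIMIT_SECTORS = 1 << 28     # 28-bit addressing covers 2**28 sectors


def lba28_limit_bytes(sector_size=SECTOR_SIZE):
    """Byte offset of the first sector that 28-bit LBA cannot address."""
    return LBA28_LIMIT_SECTORS * sector_size


def first_bad_sector(total_sectors, reads_match):
    """Bisect for the lowest LBA where reads_match(lba) is False.

    reads_match is a hypothetical callback: True when the BIOS-side read
    of that sector agrees with what the OS sees. Returns None when the
    last sector still matches (no boundary under the monotonic assumption).
    """
    if total_sectors <= 0 or reads_match(total_sectors - 1):
        return None
    lo, hi = 0, total_sectors - 1     # hi is known to mismatch
    while lo < hi:
        mid = (lo + hi) // 2
        if reads_match(mid):
            lo = mid + 1              # boundary lies above mid
        else:
            hi = mid                  # mid mismatches, boundary at or below
    return lo
```

If the returned sector equals `LBA28_LIMIT_SECTORS`, that would point at a 28-bit addressing limit somewhere in the read path. If the fault is not monotonic (for example, addresses wrapping around to valid but wrong data that happens to match), this search can miss it, so spot checks on either side of the 128 GiB boundary are still worthwhile.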

Revision history for this message
Chris Van Hoof (vanhoof) wrote :

We'll be able to close this bug once we receive a testing update from Kent

Revision history for this message
Kent Baxley (kentb) wrote :

Test BIOSes on Vida and Krug fixed the issue on the problematic drives, so I think we are now good to go. An official BIOS will go out in the next block update, which is in late Spring / Early Summer, I think.

9 Drives were found to be problematic in the factory. Of the 9, 3 are going to be permanently restricted.

Until the official BIOS update lands, the remaining 6 problematic drives are simply going to be restricted for now, and the restriction will be lifted once the BIOS goes out.

Revision history for this message
Chris Van Hoof (vanhoof) wrote :

Marking the grub2 task here as invalid as this was a BIOS issue

Changed in oem-priority:
status: Confirmed → Fix Released
Changed in grub2 (Ubuntu Natty):
status: Confirmed → Invalid
assignee: Canonical Foundations Team (canonical-foundations) → nobody