"failure writing to sector" error at boot with Advanced Format Disks

Bug #1253443 reported by Kent Baxley on 2013-11-20
This bug affects 1 person
Affects                      Status  Importance  Assigned to   Milestone
The Dell-poweredge project           High        Kent Baxley
grub2 (Ubuntu)                       High        Colin Watson
  Precise                            High        Colin Watson

Bug Description

SRU justification:

[Impact] Unable to save GRUB environment on 4K-sector disks, which means the "recordfail" feature won't work there (i.e. the boot menu won't be forcibly displayed if the previous boot failed). In the case at hand, this resulted in the boot being interrupted with an error message without much harm being done; however, it just occurred to me that this could result in data corruption if /boot/grub/grubenv is situated in the first 1/8th of the disk, so we definitely need to fix this.
[Test Case] Install on an Advanced Format disk and make sure it boots without any error messages from GRUB.
[Regression Potential] Confined to the disk writing code, which is rarely used in GRUB; in 12.04, it's only used by the gptsync, parttool, and save_env commands, only save_env of which is widely used.
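
To make the "first 1/8th of the disk" remark in the impact statement concrete, here is a minimal Python sketch. It assumes the faulty write path multiplies the native sector number by 4096/512 = 8, which is what the diagnosis later in this report suggests. The function name and the nominal disk size are illustrative and are not taken from GRUB.

```python
SCALE = 4096 // 512  # assumed mis-scaling factor: native sector size over 512


def classify_misscaled_write(intended_sector, total_sectors, scale=SCALE):
    """Say what happens if intended_sector is wrongly multiplied by scale."""
    if not 0 <= intended_sector < total_sectors:
        raise ValueError("intended sector is not on the disk")
    actual = intended_sector * scale
    if actual >= total_sectors:
        # Past the end of the disk: the write is rejected and an error is shown.
        return "out-of-range"
    if actual == intended_sector:
        # Only sector 0 is unaffected by the multiplication.
        return "correct"
    # Still on the disk, but not where the data belongs: silent corruption.
    return "wrong-sector"


# Nominal 1000GB disk with 4096-byte sectors (rounded figure, an assumption).
total = 1000 * 10**9 // 4096
print(classify_misscaled_write(total // 2, total))   # out-of-range
print(classify_misscaled_write(total // 16, total))  # wrong-sector
```

Under that assumption, any target sector at or beyond total/8 produces the harmless error, while a target below total/8 is written to the wrong place without any error.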

Original report follows:

Testing the 14.04 daily server images, I noticed a minor error popping up at boot time. See attached photo.

I call this 'minor' because the system will still boot by itself within 10 seconds or so of the error popping up.

This only seems to happen so far with Advanced Format disks when installed in EFI mode. I tried two different hard drives and got the same result.

Systems that are not equipped with Advanced Format disks but installed in EFI mode do not show this error.

Steps to reproduce:

1) Install 14.04 amd64 daily server build. Select the 'use entire disk' option to auto-partition the drive.
2) System needs to have an Advanced Format disk installed in it and needs to be booted in EFI mode.

Actual Results:
Installation finishes fine, but upon reboot the "failure writing to sector" error pops up after the grub menu. System will boot on its own after about 10 seconds.

Expected Results:
Boot without the error message.

The disk layout on this drive according to Parted:

Model: SEAGATE ST91000640SS (scsi)
Disk /dev/sda: 1000GB
Partition Table: gpt

Number  Start   End     Size    File system  Name  Flags
 1      1049kB  512MB   511MB                      boot
 2      512MB   983GB   983GB
 3      983GB   1000GB  17.2GB

Kent Baxley (kentb) wrote :
description: updated
Steve Langasek (vorlon) wrote :

The error from the screenshot is:

 error: failure writing sector 0x1a938000 to `hd1'.

 Press any key to continue...

Phillip Susi (psusi) wrote :

Can you run sudo hdparm -I on this drive and show the output?

Kent Baxley (kentb) wrote :

hdparm output below. The logical/physical sector size seems incorrect since this is a 4k/4k disk:

test@dhcp-166-246:~$ sudo hdparm -I /dev/sda

/dev/sda:
SG_IO: bad/missing sense data, sb[]: 70 00 05 00 00 00 00 0a 00 00 00 00 20 00 01 cf 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

ATA device, with non-removable media
Standards:
 Likely used: 1
Configuration:
 Logical max current
 cylinders 0 0
 heads 0 0
 sectors/track 0 0
 --
 Logical/Physical Sector size: 512 bytes
 device size with M = 1024*1024: 0 MBytes
 device size with M = 1000*1000: 0 MBytes
 cache/buffer size = unknown
Capabilities:
 IORDY not likely
 Cannot perform double-word IO
 R/W multiple sector transfer: not supported
 DMA: not supported
 PIO: pio0

Kent Baxley (kentb) wrote :

Also attempted partitioning by hand at install time using a 1GB EFI partition instead of a 512MB one. Not surprisingly, the error still appears.

Phillip Susi (psusi) wrote :

Are you using a usb enclosure or something? What does cat /sys/block/sda/queue/hw_sector_size say?

I'm guessing you are using a usb enclosure that doesn't support the IDENTIFY command so the system can't detect it's a 4k sector drive, and I don't think there's anything we can do about that. I'll bet it will work fine if you connect the drive directly.

Kent Baxley (kentb) wrote :

/sys/block/sda/queue/hw_sector_size says '4096'. The drive is directly connected to an LSI SAS controller in the server.
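
For anyone wanting to run the same check on several machines, the sysfs query above can be scripted. This is a rough Python sketch, not part of any fix: it reads the kernel's `logical_block_size` and `physical_block_size` files, which sit next to `hw_sector_size` in the same `queue` directory. The function names and the `sysfs_root` parameter are made up for illustration, and the labels "4Kn" and "512e" are the usual industry shorthand rather than anything the kernel reports.

```python
from pathlib import Path


def block_sizes(device, sysfs_root="/sys/block"):
    """Return (logical, physical) block sizes in bytes as reported by sysfs."""
    queue = Path(sysfs_root) / device / "queue"
    logical = int((queue / "logical_block_size").read_text().strip())
    physical = int((queue / "physical_block_size").read_text().strip())
    return logical, physical


def describe(logical, physical):
    """Give a rough label for the sector layout of a disk."""
    if logical == 4096:
        return "4K native (4Kn)"
    if logical == 512 and physical == 4096:
        return "512-byte emulation (512e)"
    if logical == 512 and physical == 512:
        return "512-byte native"
    return "other"


# Example call on a Linux system with a disk named sda:
#   print(describe(*block_sizes("sda")))
```

A disk like the one in this report, which exposes 4096-byte logical sectors, would fall into the first category.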

Phillip Susi (psusi) wrote :

Never mind, I'm a dingbat... hdparm doesn't work because it's not ata.

Kent Baxley (kentb) on 2013-11-21
Changed in dell-poweredge:
assignee: nobody → Kent Baxley (kentb)
importance: Undecided → High
status: New → In Progress
Colin Watson (cjwatson) wrote :

Phillip, this is *far* more likely to be an internal bug in GRUB. If it weren't identifying the disk properly due to a hardware problem, then reads would be failing too.

Changed in grub2 (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
Colin Watson (cjwatson) wrote :

I'm looking into this. The EFI status code is EFI_INVALID_PARAMETER, so possibly an alignment problem?

Changed in grub2 (Ubuntu):
assignee: nobody → Colin Watson (cjwatson)
status: Triaged → In Progress
Colin Watson (cjwatson) wrote :

The firmware reports io_align == 0 for that block IO medium, so it probably isn't alignment. Looking into other possibilities.

Colin Watson (cjwatson) wrote :

No, it's actually an out-of-range write: the sector number is almost certainly just being multiplied up by the sector size over 512 (a factor of 8 on a 4K-sector disk), as you might expect. http://git.savannah.gnu.org/gitweb/?p=grub.git;a=commitdiff;h=4dedb13f51bd467f00a2da9c3611f7c00519f889 looks very promising, so I'll try that next.
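
As a sanity check on that diagnosis, the arithmetic can be run against the sector number from the screenshot. The Python sketch below assumes 4096-byte sectors and uses the rounded "1000GB" figure from the Parted output as the disk size, so the totals are approximate. The helper name is hypothetical.

```python
def analyse_reported_sector(reported, disk_bytes, sector_size=4096):
    """Check a reported sector against the disk size and undo the assumed scaling."""
    scale = sector_size // 512            # assumed mis-scaling factor
    total_sectors = disk_bytes // sector_size
    intended = reported // scale          # sector the write was presumably meant for
    return {
        "out_of_range": reported >= total_sectors,
        "intended_sector": intended,
        "intended_in_range": intended < total_sectors,
        "intended_offset_gb": intended * sector_size // 10**9,
    }


# Sector from the GRUB error message; disk size is the rounded Parted figure.
result = analyse_reported_sector(0x1a938000, 1000 * 10**9)
print(result["out_of_range"])           # True
print(hex(result["intended_sector"]))   # 0x3527000
print(result["intended_offset_gb"])     # 228
```

With these assumptions the reported sector lies past the end of the disk, while the sector divided by 8 sits about 228GB in, which would fall inside partition 2 in the layout shown earlier.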

Colin Watson (cjwatson) wrote :

That was it. Backporting now.

Changed in grub2 (Ubuntu Precise):
milestone: none → ubuntu-12.04.4
importance: Undecided → Medium
assignee: nobody → Colin Watson (cjwatson)
status: New → Triaged
Changed in grub2 (Ubuntu):
status: In Progress → Fix Committed
Colin Watson (cjwatson) on 2013-12-12
Changed in grub2 (Ubuntu Precise):
status: Triaged → In Progress
importance: Medium → High
Changed in grub2 (Ubuntu):
importance: Medium → High
Colin Watson (cjwatson) on 2013-12-12
description: updated
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package grub2 - 2.00-22

---------------
grub2 (2.00-22) unstable; urgency=low

  * Backport from upstream:
    - On Linux, read partition start offsets from sysfs if possible
      (LP: #1237519).
    - Fix sector number when writing to non-512B disks (LP: #1253443).
  * Regularise indentation of "recordfail" in /etc/grub.d/10_linux.

 -- Colin Watson <email address hidden> Thu, 12 Dec 2013 01:24:11 +0000

Changed in grub2 (Ubuntu):
status: Fix Committed → Fix Released
Kent Baxley (kentb) wrote :

Confirmed 14.04 boots cleanly now. Thanks!

Will also test 12.04 when I see the notice go out on this bug report.

Changed in dell-poweredge:
status: In Progress → Fix Committed

Hello Kent, or anyone else affected,

Accepted grub2 into precise-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/grub2/1.99-21ubuntu3.15 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in grub2 (Ubuntu Precise):
status: In Progress → Fix Committed
tags: added: verification-needed
Kent Baxley (kentb) wrote :

Confirmed error is gone with the grub2 from proposed on Advanced Format Drives.

tags: added: verification-done
removed: verification-needed
Kent Baxley (kentb) wrote :

Specific version is 1.99-21ubuntu3.15 for Precise.

The verification of the Stable Release Update for grub2 has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package grub2 - 1.99-21ubuntu3.15

---------------
grub2 (1.99-21ubuntu3.15) precise; urgency=low

  [ Colin Watson ]
  * Backport from upstream:
    - Fix sector number when writing to non-512B disks (LP: #1253443).

  [ Yang Bai ]
  * Backport from upstream:
    - Support high resolution for grub terminal (LP: #1297128).

 -- Colin Watson <email address hidden> Tue, 20 May 2014 10:52:20 +0100

Changed in grub2 (Ubuntu Precise):
status: Fix Committed → Fix Released
Kent Baxley (kentb) on 2014-07-01
Changed in dell-poweredge:
status: Fix Committed → Fix Released