Install to 3TB disk fails with "attempt to read or write outside of disk" error on reboot

Bug #1284196 reported by Rod Smith
This bug affects 9 people
Affects: ubiquity (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

I performed a test installation of Trusty server in VirtualBox with a 3TiB virtual disk. I chose default options for the most part, although I opted for a non-LVM partition layout. Ubiquity seemed to install Ubuntu successfully, but on reboot I got the following GRUB error:

Error: attempt to read or write outside of disk `hd0'.
Entering rescue mode...
grub rescue>

It appears that Ubiquity set up a single giant root (/) partition, as shown by gdisk:

Found valid GPT with protective MBR; using GPT.
Disk /dev/sda: 6442450944 sectors, 3.0 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): 042174C8-0513-49FC-A04B-579D7A01D723
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 6442450910
Partitions will be aligned on 2048-sector boundaries
Total free space is 4029 sectors (2.0 MiB)

Number  Start (sector)  End (sector)  Size        Code
   1            2048          4095    1024.0 KiB  EF02
   2            4096    6440355839    3.0 TiB     8300
   3      6440355840    6442448895    1022.0 MiB  8200

My suspicion is that the kernel ended up above the 2TiB mark and so became unloadable to GRUB, either because of a limitation of GRUB or of the BIOS used by VirtualBox. Re-running the installation with manual partitioning and a separate /boot partition at the start of the disk resulted in a working installation.

This issue could result in failures of automated installations and certification testing, should the target system have an over-2TiB disk. My recommendation is that ubiquity default to creating a separate /boot partition at the start of the disk when doing a BIOS-mode installation on an over-2TiB disk.
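
(As a concrete illustration of that recommendation, here is a hedged sketch of such a layout created manually with sgdisk; the device name and sizes are illustrative, and this is not what ubiquity currently does:)

    sudo sgdisk --zap-all /dev/sda                   # destructive: start from an empty GPT
    sudo sgdisk -n 1:2048:4095 -t 1:EF02 /dev/sda    # 1 MiB BIOS boot partition for GRUB's core image
    sudo sgdisk -n 2:0:+1G     -t 2:8300 /dev/sda    # /boot, kept well below the 2 TiB mark
    sudo sgdisk -n 3:0:-16G    -t 3:8300 /dev/sda    # root (/), everything else minus swap
    sudo sgdisk -n 4:0:0       -t 4:8200 /dev/sda    # 16 GiB swap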

Revision history for this message
Rod Smith (rodsmith) wrote :

Upon further investigation, it seems to be the placement of the grub.cfg file, not of the kernel, that's causing GRUB to flake out. Using debugfs, I found that grub.cfg resides at blocks 748716118 and 748716119. Given a 4KiB block size, that works out to about the 2.79TiB mark on the disk, which is presumably above a GRUB or BIOS 2^32 sector limit. Attempting to do an "ls (hd0,gpt2)/boot/grub" from the "grub rescue>" prompt results in the same "attempt to read or write outside disk" error message noted earlier.
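
(For anyone reproducing this check: a sketch of the debugfs invocation and the offset arithmetic, assuming /boot lives on the root filesystem on /dev/sda2:)

    sudo debugfs -R 'blocks /boot/grub/grub.cfg' /dev/sda2
    # byte offset on disk = partition_start_sector * 512 + block_number * block_size
    # e.g. 4096 * 512 + 748716118 * 4096 bytes, which is roughly the 2.79 TiB mark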

This therefore looks like a GRUB and/or BIOS limitation; however, because larger disks are becoming increasingly common, I believe it's prudent to work around this bug in Ubiquity, at least unless and until a workaround or fix can be added to GRUB.

Revision history for this message
Phillip Susi (psusi) wrote :

Hrm... the problem seems to be that you are BIOS-booting on a GPT disk and have no bios_grub partition, combined with a buggy BIOS. The installer needs to auto-create a bios_grub partition on GPT disks.

Revision history for this message
Phillip Susi (psusi) wrote :

Never mind, I'm just not used to looking at it through the eyes of gdisk... your bios_grub partition is at the start of the disk.

I'm quite surprised that VirtualBox's BIOS would have this bug. Given the problems that come with creating a separate /boot partition by default, and, at least so far, the lack of evidence that such a bug is common on real hardware, this probably won't be changed, just as it wasn't for some older machines that couldn't access beyond 8 GB.

Revision history for this message
Jason Harvey (jason-alioth) wrote :

I'm getting this on actual hardware with a 3TB disk. Same issue with grub.cfg residing in blocks beyond the 2^32-sector limit. An ls (hd0,gpt2)/boot/grub results in the same 'attempt to read or write outside of disk (hd0)' error. Most other files on the disk seem to be accessible to grub just fine.

Mobo is a Gigabyte GA-Z68X-UD7-B3, a couple of years old.

Not a big deal for me, as I just set up a /boot partition to work around the issue. It might catch others off guard, though.
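
(A quick way to check whether a file's data sits above the 2TiB mark is filefrag from e2fsprogs; note that the physical offsets it reports are filesystem blocks relative to the start of the partition:)

    sudo filefrag -v /boot/grub/grub.cfg
    # multiply the physical_offset values by the block size (typically 4096),
    # then add the partition's start offset to get the absolute position on disk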

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ubiquity (Ubuntu):
status: New → Confirmed
Revision history for this message
Jason Harvey (jason-alioth) wrote :

Also, I'm running the latest available BIOS.

Revision history for this message
Rod Smith (rodsmith) wrote :

I've seen another report of this problem on a real Cisco server, on IRC in #maas:

<ivoks> have you guys tested maas (in 14.04) with disks > 2TB?
<ivoks> with GPT partition table
<ivoks> something is not right on Cisco UCS servers
<rodsmith> ivoks: Would you happen to be seeing https://bugs.launchpad.net/ubuntu/+source/ubiquity/+bug/1284196 ?
<ivoks> rodsmith: yes
<ivoks> it does set one giant /
<rodsmith> ivoks: If you can switch the firmware to EFI-mode booting, that should clear it.
<ivoks> but i don't think that's a problem
<ivoks> it is in efi boot mode
<rodsmith> ivoks: The Ciscos I've seen don't PXE-boot in EFI mode. :-(

Revision history for this message
David Peall (dkpeall) wrote :

Tried a fresh install of 14.04.2; it is also broken on a 4TB drive.

Revision history for this message
Alexander List (alexlist) wrote :

I experienced the same problem on a system running 15.04. I know that's EOL, but for reference, this is how I ended up in this situation:

My system is a server (Supermicro SYS-6028R-WTRT, latest firmware) with a 4TB RAID1 for the OS and another RAID volume for data. The system supports legacy and UEFI boot, but due to the disk size involved I disabled legacy or "dual mode" booting and forced the system to UEFI only, so that the installer would do the right thing on an EFI system.

I used MAAS to install it. MAAS (or, the preseeds that ship with it) created a 4TB root fs (cloudimg-rootfs) and an EFI partition.

That is the fundamental issue here: it seems grub cannot deal with files located beyond the 2TB limit. I tried to manually load the initrd and got a similar error message.

So I started using the rescue boot provided by the Ubuntu Server install ISO to repartition.

In the end, I created separate LVM logical volumes on the second RAID, moved /usr /var /home to separate filesystems in LVM, and eventually reduced the space consumed by the rootfs to a manageable size.

I created another LV for the root filesystem and one for swap (the MAAS install used /swap.img on the root fs...)

Then I copied all the directories from the root partition (/bin, /sbin, /lib etc) to that new filesystem.

After editing fstab in the new root and rebooting with the rescue image, using the new LV as the root FS, I was able to delete my /dev/sda1, recreate it with ~1.5TB of space, and move everything that had been in /boot on the root filesystem over there.

Summary: I made sure that grub, kernel etc live in an area of the disk that grub can access.

After repartitioning, I ran update-grub, reinstalled the kernel just to be sure, and rebooted; the system now boots fine again.
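
(A condensed, hedged sketch of this sort of migration; the volume group and LV names are illustrative, and the copy assumes a rescue environment with the new LV mounted at /mnt:)

    sudo lvcreate -L 50G -n root vg_data    # new root LV on the second RAID (names illustrative)
    sudo mkfs.ext4 /dev/vg_data/root
    sudo mount /dev/vg_data/root /mnt
    sudo rsync -aAXx / /mnt/                # copy the root tree, staying on one filesystem
    # after editing /mnt/etc/fstab and rebooting into the new root:
    sudo update-grub
    sudo apt-get install --reinstall linux-image-generic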

My suggestion is to still create a ~1GB /boot partition, even on EFI systems. And if the installer still defaults to ~256M for legacy systems, that should get fixed as well when the disk is large enough; auto-removing old kernels was only fixed recently :)

Revision history for this message
Rod Smith (rodsmith) wrote :

I've tried to reproduce Alexander's report of an EFI-mode GRUB failure without success:

I first created a BIOS-mode Ubuntu 15.10 installation in VirtualBox on a 4TiB virtual disk. To "stack the deck" in favor of a failure, I created a separate /boot partition at the END of the disk, ensuring that kernels, initrd files, and GRUB's configuration files and modules all resided above the 2TiB mark. As expected, this failed to boot in BIOS mode, as in my initial bug report, with one exception: the error message was "unknown filesystem," not "attempt to read or write outside of disk `hd0'."

I then switched to EFI mode and used rEFInd to boot the computer, which worked fine; clearly the VirtualBox EFI has no problems reading beyond the 2TiB mark. (rEFInd relies on EFI system calls to read all its files.)

With the system booted, I installed the EFI version of GRUB, including "sudo grub-install" and "sudo update-grub." I then rebooted, and GRUB was able to boot the computer.
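
(For reference, the usual EFI-mode invocations, assuming the ESP is mounted at /boot/efi:)

    sudo grub-install --target=x86_64-efi --efi-directory=/boot/efi
    sudo update-grub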

Thus, I think we can rule out the possibility of GRUB itself, at least in EFI mode, having problems reading beyond the 2TiB mark. I see two possible causes of the problem that Alexander reports:

* Alexander may have mistakenly installed in BIOS mode rather than in EFI mode. This is notoriously easy to do, even for experienced users (a quick check is sketched below).
* The EFI in Alexander's Supermicro computer may have a bug akin to the one in VirtualBox's "BIOS." This is a more disturbing possibility, because it means that, if such bugs are common, we may be seeing more of this problem in the future, as disk sizes increase, even on EFI-based computers.
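
(A quick check of which mode the running system booted in, on any Linux with sysfs mounted:)

    [ -d /sys/firmware/efi ] && echo "booted in EFI mode" || echo "booted in BIOS mode"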

Revision history for this message
Alexander List (alexlist) wrote :

Hi rodsmith,

thanks a lot for your follow-up analysis.

I can confirm that I saw this issue with the server forcibly switched to EFI only, and using grub-efi.

I cannot easily reproduce this because the system is in production use, but I will nevertheless send a report to Supermicro with my findings, so they can try to reproduce it on identical hardware and with 15.10 in EFI mode. Supermicro are usually very responsive and provide fixed BIOS within a very reasonable time frame.

Revision history for this message
Rod Smith (rodsmith) wrote :

At the request of Samantha Jian-Pielak, I've run some additional tests. I installed Ubuntu 16.04.2 desktop on a computer using an ASUS P8-H77I motherboard with an American Megatrends 2.31 UEFI and a Toshiba 3TB hard disk, putting a ~700 MB /boot partition at the END of the disk, above the 2 TiB mark. (I did not bother with a BIOS-mode install.) The computer installed and booted fine.

I then copied /boot/efi/EFI/ubuntu/ to /boot/efi/EFI/BOOT and renamed shimx64.efi in the target directory to bootx64.efi, so as to make it bootable via the fallback filename for the next step: moving the hard disk to a second UEFI-based computer, built around an MSI A88X-G43 motherboard, which also has an American Megatrends 2.31 UEFI. The installed system booted fine there, too.
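
(The copy-and-rename step above, expressed as shell commands, with the ESP mounted at /boot/efi:)

    sudo cp -r /boot/efi/EFI/ubuntu /boot/efi/EFI/BOOT
    sudo mv /boot/efi/EFI/BOOT/shimx64.efi /boot/efi/EFI/BOOT/bootx64.efi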

Thus, it looks like these two systems do not have a bug that prevents booting from over-2TiB disks. I cannot rule out the possibility of such bugs on other computers, particularly those that use other EFI implementations.

Revision history for this message
DigiAngel (jlay) wrote :

I have this exact same issue.

Revision history for this message
DigiAngel (jlay) wrote :

I was able to boot this device with a 4.4.0 kernel, but 4.8.0 kernels give me "attempt to read or write outside of disk 'hd0'."

Revision history for this message
Phillip Susi (psusi) wrote :

The issue isn't related to the kernel used; it happens in GRUB before it has loaded a kernel, and the underlying bug is in the BIOS.

Changed in ubiquity (Ubuntu):
status: Confirmed → Won't Fix
Revision history for this message
Aaditya Bagga (chk1827) wrote :

I was hit by this bug too when installing Zentyal server 5.0 (based on Ubuntu Server 16.04 with Ubiquity) [1] on a Dell PowerEdge R530 with a 4TB RAID 5 hard disk.

When I chose guided partitioning, Ubiquity created 3 partitions: a 1MB bios_grub partition, a ~4TB root partition, and a 16GB swap partition.

With the above partitioning and booting in BIOS mode, I got "Error: attempt to read or write outside of disk `hd0'."

While reinstalling, I added a 2GB /boot partition before the root (/) partition; that worked, and I was able to boot into the installed system.

TL;DR: Grub failed with a 4TB / partition *in BIOS mode*; it worked when a /boot partition was created.

[1]: https://wiki.zentyal.org/wiki/Installation_Guide#Zentyal_Installer

Revision history for this message
Filofel (filofel) wrote (last edit ):

I ran into the same bug too, booting 20.04.4 from a 4TiB GPT partition on a 4TiB disk.
I have the grub boot blocks installed in a bios_grub partition, sectors 34-2047.
The problem seems to be that, by default, Grub uses BIOS drivers to load files from the target partition (32-bit LBAs, so access is limited to the first 2TiB of the disk). This is (tersely) documented in the GNU GRUB manual, under the "nativedisk" command.
When using native grub drivers (ahci in my case), everything works (64-bit LBAs).
I solved the problem by re-running grub-install with the parameter --disk-module=ahci.
The problem with that approach is that any further grub-install without that parameter (such as one an Ubuntu software update might decide to run) will zap the native driver from the Grub partition and break the boot again.
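
(The full invocation, sketched with an illustrative target device; the switch to native drivers can also be tested interactively:)

    sudo grub-install --disk-module=ahci /dev/sda
    # at a working grub prompt, native drivers can be tried at runtime with:
    # grub> nativedisk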

grub-install should never generate a broken boot when it can avoid it. Ideally, when grub-install detects that at least one of its target partitions crosses the 2TiB boundary, it should give a warning and perform the install with the appropriate --disk-module=MODULE parameter.

4TB SSD prices are dropping fast (below 350€ these days), so this problem might show up increasingly often...

Revision history for this message
Rod Smith (rodsmith) wrote :

I've reset this from "Won't Fix" to "Confirmed" because I believe Filofel's analysis merits another look at this issue.

Changed in ubiquity (Ubuntu):
status: Won't Fix → Confirmed