Ubuntu-KVM:Install Ubuntu16.04.01 OS consistently failed on big size DASD Libra HE10 8TB drive

Bug #1619470 reported by bugproxy
Affects: debian-installer (Ubuntu)
Status: Invalid
Importance: Undecided
Assigned to: Dimitri John Ledkov

Bug Description

==== State: Open by: nguyenp on 16 August 2016 14:48:29 ====

Problem Description:
====================
Installing Ubuntu16.04.01 OS on a big DASD drive (8 terabyte Libra HE10) consistently failed.

- I have a Libra HE10 8TB drive in slot UESLL.001.G66S02C-P1-D12 of the Slider LFF drawer.

- From the Ubuntu-KVM host, I tried to install Ubuntu16.04.01 OS on the Libra HE10 8TB drive for a KVM guest. The OS installation completed successfully.

- However, on the reboot right after the OS installation finished, it hit "SCSI-DISK: Access beyond end of device" and dropped to the grub rescue> prompt.

- I have tried installing on different Libra HE10 8TB drives and the problem is the same. I then tried installing the OS on a smaller 4TB AriesKB drive and it works fine.

Please see more details below:
==============================

- The Libra HE10 8TB drive is in UESLL.001.G66S02C-P1-D12 of the Slider LFF drawer
   sdl 0001:03:00.0/0:0:12:0 Physical 4K Disk Active

Manufacturer . . . . . . . . . . . . . . : IBM
Product ID . . . . . . . . . . . . . . . : HUH721008AL4200
Firmware Version . . . . . . . . . . . . : 41323236 (A226)
Serial Number. . . . . . . . . . . . . . : 7SG13LBR
Capacity . . . . . . . . . . . . . . . . : 8001.58 GB
Resource Name. . . . . . . . . . . . . . : /dev/sdl

Physical location
PCI Address. . . . . . . . . . . . . . . : 0001:03:00.0
Resource Path. . . . . . . . . . . . . . : 00-02-0B
SCSI Host Number . . . . . . . . . . . . : 0
SCSI Channel . . . . . . . . . . . . . . : 0
SCSI Id. . . . . . . . . . . . . . . . . : 12
SCSI Lun . . . . . . . . . . . . . . . . : 0
Platform Location. . . . . . . . . . . . : UESLL.001.G66S02C-P1-D12

- I installed Ubuntu16.04.01 OS on the Libra HE10 8TB drive for a KVM guest; the installation completed successfully.

Grub errors seen after install:
...
Trying to load: from: /vdevice/v-scsi@3000/disk@8000000000000000 ... Successfully loaded
SCSI-DISK: Access beyond end of device !
SCSI-DISK: Access beyond end of device !
SCSI-DISK: Access beyond end of device !
SCSI-DISK: Access beyond end of device !
SCSI-DISK: Access beyond end of device !
SCSI-DISK: Access beyond end of device !
SCSI-DISK: Access beyond end of device !
SCSI-DISK: Access beyond end of device !
SCSI-DISK: Access beyond end of device !
SCSI-DISK: Access beyond end of device !
error: failure reading sector 0x1a4445e08 from `ieee1275/disk'.
Entering rescue mode...
grub rescue>

- I ran the ls command at the grub rescue> prompt:

grub rescue> ls
(ieee1275/disk) SCSI-DISK: Access beyond end of device !
SCSI-DISK: Access beyond end of device !
SCSI-DISK: Access beyond end of device !
SCSI-DISK: Access beyond end of device !
SCSI-DISK: Access beyond end of device !
SCSI-DISK: Access beyond end of device !
SCSI-DISK: Access beyond end of device !
SCSI-DISK: Access beyond end of device !
SCSI-DISK: Access beyond end of device !
SCSI-DISK: Access beyond end of device !
(ieee1275/disk,gpt3) (ieee1275/disk,gpt2) (ieee1275/disk,gpt1) (
ieee1275/disk) SCSI-DISK: Access beyond end of device !
SCSI-DISK: Access beyond end of device !
SCSI-DISK: Access beyond end of device !
SCSI-DISK: Access beyond end of device !
SCSI-DISK: Access beyond end of device !
SCSI-DISK: Access beyond end of device !
SCSI-DISK: Access beyond end of device !
SCSI-DISK: Access beyond end of device !
SCSI-DISK: Access beyond end of device !
SCSI-DISK: Access beyond end of device !
(ieee1275/disk,gpt3) (ieee1275/disk,gpt2) (ieee1275/disk,gpt1)
grub rescue>

== Comment: #20 - Kevin W. Rudd - 2016-09-01 17:15:17 ==
The install completes without error, but grub does not seem to be happy with the resulting partitioning.

One odd observation is that the resulting installed OS is treating the drive as if it had 512 byte sectors, but the underlying disk is using 4K sectors:

Partition layout as seen by the host:
-------------------------
# fdisk -l /dev/sdh
Disk /dev/sdh: 7.3 TiB, 8001563222016 bytes, 1953506646 sectors
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x00000000

Device Boot Start End Sectors Size Id Type
/dev/sdh1 1 4294967295 4294967295 16T ee GPT

# parted /dev/sdh print
Error: /dev/sdh: unrecognised disk label
Model: IBM HUH721008AL4200 (scsi)
Disk /dev/sdh: 8002GB
Sector size (logical/physical): 4096B/4096B
Partition Table: unknown
Disk Flags:

Partition layout as reported by guest OS:
-------------------------
# fdisk -l /dev/sda
Disk /dev/sda: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: B5B215ED-9F61-4761-B6F8-B83C787BC306

Device Start End Sectors Size Type
/dev/sda1 2048 16383 14336 7M PowerPC PReP boot
/dev/sda2 16384 15579092991 15579076608 7.3T Linux filesystem
/dev/sda3 15579092992 15628052479 48959488 23.4G Linux swap

# parted /dev/sda print
Model: QEMU QEMU HARDDISK (scsi)
Disk /dev/sda: 8002GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number Start End Size File system Name Flags
 1 1049kB 8389kB 7340kB prep
 2 8389kB 7976GB 7976GB ext4
 3 7976GB 8002GB 25.1GB linux-swap(v1)
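
As a quick cross-check of the two listings above, the sketch below (Python, not part of the original report) confirms that host and guest agree on the byte size of the disk and differ only in the sector size used to count it. It also illustrates why a partition table written in 512-byte units is likely to look broken from the host: a GPT header sits at LBA 1, which is byte offset 512 for the guest but byte offset 4096 for the host. All variable names are illustrative.

```python
# Figures copied from the two fdisk listings above.
HOST_SECTORS, HOST_SECTOR_SIZE = 1953506646, 4096      # host view (/dev/sdh)
GUEST_SECTORS, GUEST_SECTOR_SIZE = 15628053168, 512    # guest view (/dev/sda)

host_bytes = HOST_SECTORS * HOST_SECTOR_SIZE
guest_bytes = GUEST_SECTORS * GUEST_SECTOR_SIZE
print(host_bytes, guest_bytes)

# One 4096-byte host sector holds eight 512-byte guest sectors.
ratio = HOST_SECTOR_SIZE // GUEST_SECTOR_SIZE
print(GUEST_SECTORS == HOST_SECTORS * ratio)

# A GPT header lives at LBA 1, so its byte offset depends on the sector
# size the reader assumes.
gpt_offset_as_written = 1 * GUEST_SECTOR_SIZE   # where the guest put it
gpt_offset_host_reads = 1 * HOST_SECTOR_SIZE    # where the host looks
print(gpt_offset_as_written, gpt_offset_host_reads)

# The protective MBR entry spans 4294967295 sectors; counted as 4096-byte
# sectors, that is the "16T" partition in the host's fdisk output.
protective_tib = 4294967295 * HOST_SECTOR_SIZE / 2**40
print(round(protective_tib, 2))
```

Both products come to 8001563222016 bytes, the size that both fdisk runs report.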

bugproxy (bugproxy) wrote : trimmed sosreport from KVM host

Default Comment by Bridge

tags: added: architecture-ppc64 bugnameltc-145157 severity-high targetmilestone-inin---
bugproxy (bugproxy) wrote : /var/log from installed guest

Default Comment by Bridge

bugproxy (bugproxy) wrote : dmesg from guest rescue mode boot

Default Comment by Bridge

Changed in ubuntu:
assignee: nobody → Taco Screen team (taco-screen-team)
affects: ubuntu → debian-installer (Ubuntu)
Kevin W. Rudd (kevinr)
summary: - STC860:Tuleta-L:gp2fp1:Ubuntu-KVM:Install Ubuntu16.04.01 OS consistently
- failed on big size DASD Libra HE10 8TB drive
+ Ubuntu-KVM:Install Ubuntu16.04.01 OS consistently failed on big size
+ DASD Libra HE10 8TB drive
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-09-07 15:22 EDT-------
Hello Canonical.

Can we get a status update on this bug?

Thanks.

Changed in debian-installer (Ubuntu):
assignee: Taco Screen team (taco-screen-team) → Dimitri John Ledkov (xnox)
Dimitri John Ledkov (xnox) wrote :

The qemu guest usually sees passed-through virtualised disks as 512/512, even when the underlying disk is actually 4k. The partition tables should still work correctly.

The actual errors observed are reminiscent of maximum disk-size limitations in grub itself; 8TB is large.

I will investigate the grub source code further to find the root cause. In the meantime, one workaround that you could try is to perform manual partitioning and create a separate /boot mount point:

(PReP partition should be automatically created/enforced)
/boot - 1GB
/ - 7.2TB
swap - 25.1GB

Or something similar. Hopefully this lets grub succeed, since it would no longer have to read the biggest partition to retrieve and boot the kernel.
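
To see which partition grub was reading when it failed, the failing sector from the error message can be compared against the guest's partition table. This is only a sketch (Python, not part of the original comment). It assumes grub reports the sector in 512-byte units, the same units as the guest's fdisk output, and the helper name is hypothetical.

```python
FAILING_SECTOR = 0x1a4445e08  # from "error: failure reading sector ..."

# (name, first sector, last sector), copied from the guest's fdisk listing.
PARTITIONS = [
    ("sda1 PReP boot", 2048, 16383),
    ("sda2 Linux filesystem", 16384, 15579092991),
    ("sda3 Linux swap", 15579092992, 15628052479),
]

def find_partition(sector):
    """Return the name of the partition containing `sector`, or None."""
    for name, start, end in PARTITIONS:
        if start <= sector <= end:
            return name
    return None

hit = find_partition(FAILING_SECTOR)
offset_tb = FAILING_SECTOR * 512 / 1e12   # assumes 512-byte sectors
print(f"sector {FAILING_SECTOR} is in {hit}, about {offset_tb:.2f} TB into the disk")
```

Under that assumption the sector falls inside /dev/sda2, the 7.3T root filesystem, which is consistent with the idea of keeping /boot in a small partition near the start of the disk.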

Dimitri John Ledkov (xnox) wrote :

"Tags: architecture-ppc64 bugnameltc-145157 severity-high targetmilestone-inin--- "

Currently Ubuntu doesn't have a PPC64 big-endian port; we only have a little-endian one. Previously, bugs for Ubuntu were synced using the architecture-ppc64le tag; now I see a mix of both. Could we please consolidate the tags and use architecture-ppc64le?

At least my subscriptions are sensitive to these tags.

Changed in grub2 (Ubuntu):
assignee: nobody → Dimitri John Ledkov (xnox)
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-09-07 18:16 EDT-------
(In reply to comment #27)
> "Tags: architecture-ppc64 bugnameltc-145157 severity-high
> targetmilestone-inin--- "
>
> Currently ubuntu doesn't have a PPC64 big-endian port, we only have
> little-endian one. Previously bugs for Ubuntu were synced using the
> architecture-ppc64le tag, now I see a mix of both. Could we please
> consolidate the tags and use architecture-ppc64le?
>
> At least my subscriptions are sensitive to these tags.

Sorry. Our fault for not catching the wrong arch before mirroring. It has been corrected to be ppc64le on our side.

tags: added: architecture-ppc64le
removed: architecture-ppc64
Dimitri John Ledkov (xnox) wrote :

Trying to reproduce this issue using sparse files:

$ qemu-img create -f qcow2 disk.qcow2 8T

Next launching qemu, with physical & logical sectors set to 4k, booting off yakkety-server-ppc64el.iso (launchpad will wrap, but below is all one command)

$ qemu-system-ppc64le -M pseries -cpu POWER8 -m 2048 -vga none -nographic -cdrom yakkety-server-ppc64el.iso -enable-kvm -device virtio-scsi-pci,id=scsi -device scsi-hd,drive=hd,physical_block_size=4096,logical_block_size=4096 -drive if=none,id=hd,file=disk.qcow2,format=qcow2

After completing the installation and rebooting, things seem to work correctly; fdisk correctly reports 4k physical and logical sector sizes.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-09-09 15:08 EDT-------
cde00 (<email address hidden>) added native attachment /tmp/AIXOS06128263/dmesg on 2016-09-09 14:04:08
cde00 (<email address hidden>) added native attachment /tmp/AIXOS06128263/log.tbz on 2016-09-09 14:04:08

Dimitri John Ledkov (xnox) wrote :

I am unable to reproduce this bug when the qemu options are set to expose the scsi drive as 4k/4k inside the guest, both before the install is performed and during subsequent boots. Please try again with physical/logical sector sizes set to 4k in the qemu invocation.

no longer affects: slof (Ubuntu)
no longer affects: qemu (Ubuntu)
no longer affects: grub2 (Ubuntu)
Changed in debian-installer (Ubuntu):
status: New → Invalid
bugproxy (bugproxy)
tags: removed: bugnameltc-145157 severity-high
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-09-09 15:43 EDT-------
Thanks. Passing the recommendations on to the submitter.

tags: added: bugnameltc-145157 severity-high
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-09-09 16:37 EDT-------

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-09-22 14:43 EDT-------
cde00 (<email address hidden>) added native attachment /tmp/AIXOS06128263/slof.bin on 2016-09-22 13:38:24

bugproxy (bugproxy) wrote : Add read-capacity-16 and read-16 scsi command in SLOF

------- Comment on attachment From <email address hidden> 2016-09-28 02:03 EDT-------

Two new scsi commands need to be added to boot from disks larger than 2TB (with a sector size of 512 bytes):

read-capacity-16: 8-byte number of blocks
read-16: 8-byte block number
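
The 2TB figure follows from the width of the block-address field. The sketch below (Python, not the SLOF patch itself) assumes the standard SCSI command layouts: READ(10) with opcode 0x28 and a 4-byte block address, and READ(16) with opcode 0x88 and an 8-byte block address. The function names are illustrative, and the failing sector from the grub error is assumed to be counted in 512-byte units.

```python
import struct

SECTOR_SIZE = 512

def read10_cdb(lba, blocks):
    """READ(10): opcode 0x28, 32-bit block address, 16-bit transfer length."""
    return struct.pack(">BBIBHB", 0x28, 0, lba, 0, blocks, 0)

def read16_cdb(lba, blocks):
    """READ(16): opcode 0x88, 64-bit block address, 32-bit transfer length."""
    return struct.pack(">BBQIBB", 0x88, 0, lba, blocks, 0, 0)

# Largest capacity a 32-bit block address can cover with 512-byte sectors.
READ10_LIMIT_BYTES = (1 << 32) * SECTOR_SIZE
print(READ10_LIMIT_BYTES, READ10_LIMIT_BYTES // 2**40, "TiB")

failing = 0x1a4445e08  # sector from the grub error message
try:
    read10_cdb(failing, 1)
    fits_in_read10 = True
except struct.error:
    # The block address does not fit in the 4-byte field.
    fits_in_read10 = False

print("fits in READ(10):", fits_in_read10)
print("READ(16) CDB:", read16_cdb(failing, 1).hex())
```

With 512-byte sectors a 4-byte block address covers 2199023255552 bytes (2 TiB), and the failing sector, 7050911240, is larger than the 4294967295 maximum of that field.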

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-10-11 16:43 EDT-------
Nikunj,

Are the required patches already upstream? If so, which are the commit ids?

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-10-11 23:47 EDT-------
(In reply to comment #49)
> Nikunj,
>
> Are the required patches already upstream? If so, which are the commit ids?

Not yet, I have posted it here with comments addressed:

https://patchwork.ozlabs.org/patch/680993/

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-10-19 13:33 EDT-------
(In reply to comment #51)
> Nikunj - what is the latest status?

Patches are upstream now:

http://git.qemu.org/?p=SLOF.git;a=commit;h=cd8b261a9e68480a8486344d8b32ed4d46db48a4
http://git.qemu.org/?p=SLOF.git;a=commit;h=13ed4d27756b2e970c662c05da66bcd89b6df543

SLOF update is queued for QEMU 2.8 release here:

https://lists.gnu.org/archive/html/qemu-devel/2016-10/msg04334.html

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-10-27 12:04 EDT-------
==== State: Verify by: nguyenp on 27 October 2016 10:30:43 ====

Issue has been resolved. Closing out the defect.
