Can't boot VM with more than 16 disks (SLOF buffer issue)

Bug #1734856 reported by bugproxy on 2017-11-28
This bug affects 2 people

Affects / Status / Importance / Assigned to:
- SLOF - Slimline Open Firmware: New, importance Unknown
- The Ubuntu-power-systems project: importance Critical, assigned to Canonical Server Team
- slof (Ubuntu): importance Critical, assigned to David Britton
  - Xenial: importance Undecided, Unassigned
  - Zesty: importance Undecided, Unassigned
  - Artful: importance Undecided, Unassigned
  - Bionic: importance Critical, assigned to David Britton

Bug Description

[Impact]

 * Booting a KVM guest with many disks considered as potential boot device
   fails on ppc64le

 * In detail this was a buffer overflow; the processing of devices has been
   changed to use dynamic allocation, which works with higher numbers of
   devices.

[Test Case]

 * Comment #12 has the full final test case; it is not repeated here.

[Regression Potential]

 * It is a change to the disk processing in the SLOF loader for ppc64el
   - that means, on one hand, only ppc64el can be affected by an issue
   - on the other hand, there might be a disk combination not covered by my,
     upstream's, or IBM's testing that could now fail with the new code
     (unlikely)
   - given that the change is upstream and was provided by IBM, which I
     consider the authority on that code, I think it is safe.

[Other Info]

 * n/a

----

== Comment: #0 - RAHUL CHANDRAKAR <email address hidden> - 2017-11-28 03:40:37 ==
---Problem Description---
Can't boot VM with more than 16 disks.
It is an issue with qemu/SLOF (Bug: https://github.com/qemu/SLOF/issues/3) and it was fixed by Nikunj A Dadhania. He has made a patch available and it has been tested by the PowerVC team.

We need this fix in Ubuntu 16.04 and later releases.

Machine Type = 8348-21C (P8 Habanero)

---Steps to Reproduce---
 Steps to recreate:
1. Create a VM
2. Attach 50 disks
3. Shutdown from OS
4. Start again and let it boot

---uname output---
Linux neo160.blr.stglabs.ibm.com 4.4.0-101-generic #124-Ubuntu SMP Fri Nov 10 18:29:11 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux

---Debugger---
A debugger is not configured

Patch posted and awaiting response...

http://patchwork.ozlabs.org/patch/842011/

bugproxy (bugproxy) on 2017-11-28
tags: added: architecture-ppc64le bugnameltc-161776 severity-critical targetmilestone-inin---
Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → qemu (Ubuntu)
Changed in ubuntu-power-systems:
importance: Undecided → Critical
assignee: nobody → Canonical Server Team (canonical-server)
Manoj Iyer (manjo) on 2017-11-28
Changed in qemu (Ubuntu):
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → David Britton (davidpbritton)
importance: Undecided → Critical
tags: added: triage-g

------- Comment From <email address hidden> 2017-11-29 01:56 EDT-------
Patch under discussion upstream

http://patchwork.ozlabs.org/patch/842011/

Thanks for the bug; I see the patch is under active discussion at the moment.
Let's take a look at integrating it once that discussion concludes.

Since it is packaged in slof I assigned that as the target and opened up per-release tasks.

affects: qemu (Ubuntu) → slof (Ubuntu)

Also linked up the GH issue and the SLOF project in general.
We should get auto-updates once resolved, but to be sure @nikunj please feel free to ping here once the discussion on the patch concludes and is committed upstream.

Changed in slof:
status: Unknown → New
Andrew Cloke (andrew-cloke) wrote :

Marking as "incomplete" until patch lands upstream.

Changed in ubuntu-power-systems:
status: New → Incomplete
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-12-13 06:03 EDT-------
Patches committed:

https://github.com/aik/SLOF/commit/1e8f6e68d28132e70898bd4bd6f799b591c3bf2b
https://github.com/aik/SLOF/commit/80f74b855f2f67ae5ed45b0d32c57dce46585da7

Should be reflected in the qemu repository in a day.

Thanks for the ping Nikunj!

Changed in slof (Ubuntu Bionic):
status: New → Triaged
tags: added: slof-18.04

@Nikunj - do you happen to know if Alexey is planning an official release soon that would include this fix?

Note: this is the first delta to take on this package, but I'll look into it as it is critical.
We want to get upstream to release a version, have Debian pick it up, and become a sync again in Bionic.

Hi,
I don't want to wait too long, so while waiting for an answer I made a PPA available that should be tested to confirm this fix is good (for Bionic).
Please test on Bionic from the PPA: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/3079

I'll mark the bug Incomplete to clearly reflect that I am waiting on an answer.

Having that verified unblocks the upload, which in turn unblocks the SRU considerations according to the policy.

Changed in slof (Ubuntu Bionic):
status: Triaged → Incomplete

I tried to verify the issue and fix myself, which is a prerequisite for good steps to reproduce for the SRU anyway.
I came up with test steps based on what was initially reported.
$ uvt-simplestreams-libvirt --verbose sync --source http://cloud-images.ubuntu.com/daily arch=ppc64el label=daily release=bionic
$ uvt-kvm create --password=ubuntu cpaelzer-bionic release=bionic arch=ppc64el label=daily
$ for i in {1..20}; do
  h=$(printf "\x$(printf %x $((98+$i)))");
  echo "<disk type='file' device='disk'><driver name='qemu' type='qcow2'/><source file='/var/lib/uvtool/libvirt/images/cpaelzer-bionic-t$i.qcow'/><alias name='virtio-disk0'/><target dev='vd$h' bus='virtio'/></disk>" > disk.xml;
  qemu-img create -f qcow2 /var/lib/uvtool/libvirt/images/cpaelzer-bionic-t$i.qcow 1M;
  virsh attach-device cpaelzer-bionic disk.xml;
done
$ virsh console <guest>
# in guest then
$ sudo reboot
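
The nested `printf` in the loop above derives the target device letter from the counter; a quick way to see the mapping (a sketch assuming a bash-style shell, as the loop above already does):

```shell
# 98 is the ASCII code of 'b', so i=1 yields 'c', i=2 yields 'd', ... i=24 yields 'z';
# these become the vdX target names used in the generated disk XML
for i in 1 2 24; do
  h=$(printf "\x$(printf %x $((98 + i)))")
  echo "vd$h"
done
# prints: vdc, vdd, vdz (one per line)
```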

That works for me with overall 22 disks.
I see a bunch of those:
  virtioblk_transfer: Access beyond end of device!
  virtioblk_transfer failed! type=0, status = 1

But it boots fine still and all disks are attached just fine.
If instead I add all disks to the domain statically before starting it, the result is the same.
Do I just need more devices - any options to be set?

Please also help by providing working steps to reproduce (ideally ones that do not require having 20+ real disks reachable).

Tried the base 2 + 2x20 = 42 disks.
Still working for me as-is.

To summarize what I wait on:
1. better steps to reproduce - if possible slight modification to my suggested workflow and even more so if possible without needing double digit amount of real disks
2. please test/verify the ppa linked in comment #8 in your environment

Nikunj A Dadhania (nikunjad) wrote :

@paelzer you need to have a boot order in each of these disks

<boot order='$i'/>
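
Concretely, each disk element gets its own boot entry; a fragment in the shape used later in this bug's verification XML (path and target are taken from the earlier test script):

```xml
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/var/lib/uvtool/libvirt/images/cpaelzer-bionic-t1.qcow'/>
  <target dev='vdc' bus='virtio'/>
  <boot order='2'/>
</disk>
```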

It was still working with 20 disks and a boot index, but 48 triggered the issue.
Thanks Nikunj for the boot index hint.

Overall testcase:
# Prep a guest
$ uvt-simplestreams-libvirt --verbose sync --source http://cloud-images.ubuntu.com/daily arch=ppc64el label=daily release=bionic
$ uvt-kvm create --password=ubuntu bionic-testguest release=bionic arch=ppc64el label=daily

Use `virsh edit bionic-testguest` to do the following three steps:
1. Remove in the os section
  <boot dev='hd'/>

2. Add this on your former primary disk:
  <boot order='1'/>

3. And then add the xml content generated with:
$ echo "" > disk.xml;
$ for i in {1..24}; do
  h=$(printf "\x$(printf %x $((98+$i)))")
  echo "<disk type='file' device='disk'><driver name='qemu' type='qcow2'/><source file='/var/lib/uvtool/libvirt/images/cpaelzer-bionic-t$i.qcow'/><alias name='virtio-disk0'/><target dev='vd$h' bus='virtio'/><boot order='$((i+1))'/></disk>" >> disk.xml
  qemu-img create -f qcow2 /var/lib/uvtool/libvirt/images/cpaelzer-bionic-t$i.qcow 1M
  echo "<disk type='file' device='disk'><driver name='qemu' type='qcow2'/><source file='/var/lib/uvtool/libvirt/images/cpaelzer-bionic-tb$i.qcow'/><alias name='virtio-disk0'/><target dev='vdb$h' bus='virtio'/><boot order='$((i+24+1))'/></disk>" >> disk.xml
  qemu-img create -f qcow2 /var/lib/uvtool/libvirt/images/cpaelzer-bionic-tb$i.qcow 1M
done

That makes up for 48 extra disks and the two base devices.
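
As a sanity check before `virsh edit`, the number of generated <disk> entries can be counted; a minimal sketch that re-runs only the XML generation (the file name `disk-check.xml` and the simplified disk elements are illustrative, and no qemu-img is needed for this check):

```shell
# regenerate just the skeleton of the fragment and count the entries:
# two <disk> elements per iteration, 24 iterations -> 48 entries
out=disk-check.xml
: > "$out"
for i in $(seq 1 24); do
  echo "<disk type='file' device='disk'><target dev='t$i'/><boot order='$((i+1))'/></disk>" >> "$out"
  echo "<disk type='file' device='disk'><target dev='tb$i'/><boot order='$((i+24+1))'/></disk>" >> "$out"
done
grep -c "<disk " "$out"   # prints 48
```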
Then run it with a console attached to trigger the bug:

$ virsh start --console bionic-testguest
( 300 ) Data Storage Exception [ 1dc4a018 ]

    R0 .. R7 R8 .. R15 R16 .. R23 R24 .. R31
000000001dbe2544 000000001e462008 0000000000000000 000000001dc04c00
000000001e665ff0 000000001dc5e248 0000000000000000 000000001dc09268
000000001dc0ed00 000000001dc4a010 0000000000000000 0000000000000003
0000000000000054 000000001e6663f5 0000000000000000 000000000000f001
000000001dc5e1c0 000000000000005b 000000001dc09438 ffffffffffffffff
000000001dc4a018 0000000000000000 0000000000008000 000000001e462010
000000001dc4a018 0000000000000000 000000000000f003 000000001dbe46d4
000000001e462010 0000000000000000 0000000000000006 4038303030303030

    CR / XER LR / CTR SRR0 / SRR1 DAR / DSISR
        80000404 000000001dbe4ec0 000000001dbe4700 4038303030303030
0000000020000000 000000001dbe46d4 8000000000001000 40000000

That error was reproducible across several restarts.
Then I installed the PPA and re-ran it, which now resulted in a successful boot again.

Prepping SRU Template with that info.

description: updated
Nikunj A Dadhania (nikunjad) wrote :

@paelzer Had a discussion with Alexey, he has created a release label

https://github.com/aik/SLOF/commit/fa981320a1e0968d6fc1b8de319723ff8212b337

Waiting for the mirror to happen on https://git.qemu.org/?p=SLOF.git;a=commit;h=HEAD

Once that happens, he will send a slof update patch to the qemu mailing.

Thanks for the info Nikunj.
That means we can likely soon pick it up as a sync from Debian again.
But for now we can fix it by picking your changes.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in slof (Ubuntu Artful):
status: New → Confirmed
Changed in slof (Ubuntu Xenial):
status: New → Confirmed
Changed in slof (Ubuntu Zesty):
status: New → Confirmed

My former tests focused on the test case.
I deployed a fresh Power8 system and ran some more tests on the proposed change but found no issues, so I am going on.

I have also pushed an MP for review of the packaging changes.

@Nikunj / IBM - waiting on your check of the PPA as well if you can find the time before the final upload.

Nikunj A Dadhania (nikunjad) wrote :

On my test environment with the extracted slof.bin from the DEB package, I do not see the issue.

./ppc64-softmmu/qemu-system-ppc64 -nographic -nodefaults -serial stdio -monitor pty -m 2G \
  -device virtio-scsi-pci \
  `for ((i=2;i<=30;i++)) ; do echo -n " -drive file=/tmp/storage$i.qcow2,if=none,id=drive$i,format=qcow2 -device scsi-hd,drive=drive$i,id=disk$i,channel=0,scsi-id=0,lun=$i,bootindex=$i"; done;` \
  -drive file=../../imgs/guest.disk,if=none,id=drv1,format=qcow2,cache=none -device scsi-hd,drive=drv1 \
  -device virtio-scsi-pci \
  `for ((i=31, j=1;i<=50;i++,j++)) ; do echo -n " -drive file=/tmp/storage$i.qcow2,if=none,id=drive$i,format=qcow2 -device scsi-hd,drive=drive$i,id=disk$i,channel=0,scsi-id=1,lun=$j,bootindex=$i"; done;`
qemu-system-ppc64: -monitor pty: char device redirected to /dev/pts/1 (label compat_monitor0)

SLOF **********************************************************************
QEMU Starting
 Build Date = Dec 13 2017 13:46:58
 FW Version = buildd@ release 20170724
 Press "s" to enter Open Firmware.

Boots fine, I have asked Rahul to verify as well in his environment.

I saw the new upstream release; over time it will be picked up by Debian and we will make the package a sync again then.
For now I pick the fix as tested from the PPA into Bionic.
Once that has migrated I'll look at the SRUs into X/Z/A.

Changed in slof (Ubuntu Bionic):
status: Incomplete → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package slof - 20170724+dfsg-1ubuntu1

---------------
slof (20170724+dfsg-1ubuntu1) bionic; urgency=medium

  * Fix boot with more than 16 disks (LP: #1734856)
    - d/p/0001-boot-do-not-concatenate-bootdev.patch
    - d/p/0002-boot-use-a-temporary-bootdev-buf.patch

 -- Christian Ehrhardt <email address hidden> Wed, 13 Dec 2017 14:46:58 +0100

Changed in slof (Ubuntu Bionic):
status: Fix Committed → Fix Released

MPs for the packaging change in X/Z/A up for review and linked in the bug.

Changed in slof (Ubuntu Artful):
status: Confirmed → In Progress
Changed in slof (Ubuntu Zesty):
status: Confirmed → In Progress
Changed in slof (Ubuntu Xenial):
status: Confirmed → In Progress

Doing another bigger check set on Xenial and then opening up the MPs for review.

Tests good - MPs open for packaging review

Hello bugproxy, or anyone else affected,

Accepted slof into artful-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/slof/20170724+dfsg-1ubuntu0.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-artful to verification-done-artful. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-artful. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in slof (Ubuntu Artful):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-artful
Changed in slof (Ubuntu Zesty):
status: In Progress → Fix Committed
tags: added: verification-needed-zesty
Brian Murray (brian-murray) wrote :

Hello bugproxy, or anyone else affected,

Accepted slof into zesty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/slof/20161019+dfsg-1ubuntu0.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-zesty to verification-done-zesty. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-zesty. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in slof (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-needed-xenial
Brian Murray (brian-murray) wrote :

Hello bugproxy, or anyone else affected,

Accepted slof into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/slof/20151103+dfsg-1ubuntu1.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

bugproxy (bugproxy) on 2017-12-25
tags: removed: verification-needed-xenial
Kalpana S Shetty (kalshett) wrote :

Artful validation:
---------------------------------------------------------------------------------------------------
I was able to recreate the issue.

Initially I added up to 32 boot-order disks apart from the primary disk (boot order 1) and was not able to recreate it. However, with one more disk added with a boot entry, I could recreate the reported issue.

ubuntu login: root
Password:
Welcome to Ubuntu 17.10 (GNU/Linux 4.13.0-16-generic ppc64le)

 * Documentation: https://help.ubuntu.com
 * Management: https://landscape.canonical.com
 * Support: https://ubuntu.com/advantage

46 packages can be updated.
27 updates are security updates.

The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

root@ubuntu:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sr0 11:0 1 1024M 0 rom
vda 252:0 0 1M 0 disk
vdb 252:16 0 1M 0 disk
vdc 252:32 0 1M 0 disk
vdd 252:48 0 1M 0 disk
vde 252:64 0 1M 0 disk
vdf 252:80 0 1M 0 disk
vdg 252:96 0 1M 0 disk
vdh 252:112 0 1M 0 disk
vdi 252:128 0 1M 0 disk
vdj 252:144 0 1M 0 disk
vdk 252:160 0 1M 0 disk
vdl 252:176 0 1M 0 disk
vdm 252:192 0 1M 0 disk
vdn 252:208 0 1M 0 disk
vdo 252:224 0 1M 0 disk
vdp 252:240 0 1M 0 disk
vdq 252:256 0 1M 0 disk
vdr 252:272 0 1M 0 disk
vds 252:288 0 1M 0 disk
vdt 252:304 0 1M 0 disk
vdu 252:320 0 1M 0 disk
vdv 252:336 0 1M 0 disk
vdw 252:352 0 1M 0 disk
vdx 252:368 0 1M 0 disk
vdy 252:384 0 1M 0 disk
vdz 252:400 0 1M 0 disk
vdaa 252:416 0 50G 0 disk
├─vdaa1 252:417 0 7M 0 part
└─vdaa2 252:418 0 50G 0 part /
vdab 252:432 0 1M 0 disk
vdac 252:448 0 1M 0 disk
vdad 252:464 0 1M 0 disk
vdae 252:480 0 1M 0 disk
vdaf 252:496 0 1M 0 disk
vdag 252:512 0 1M 0 disk

After adding one more, the 34th boot-order disk, I could recreate it.

[root@lep8a artful]# vi disk.xml

[root@lep8a artful]# virsh edit kal-artful_vm1
Domain kal-artful_vm1 XML configuration edited.

[root@lep8a artful]# virsh destroy kal-artful_vm1
Domain kal-artful_vm1 destroyed

[root@lep8a artful]# virsh start --console kal-artful_vm1
Domain kal-artful_vm1 started
Connected to domain kal-artful_vm1
Escape character is ^]

SLOF **********************************************************************
QEMU Starting
 Build Date = May 19 2017 07:43:48
 FW Version = mockbuild@ release 20170303
 Press "s" to enter Open Firmware.

Populating /vdevice methods
Populating /vdevice/v-scsi@2000
       SCSI: Looking for devices
          8000000000000000 CD-ROM : "QEMU QEMU CD-ROM 2.5+"
Populating /vdevice/vty@30000000
Populating /vdevice/nvram@71000000
Populating /pci@800000020000000
                     00 f800 (D) : 1af4 1001 virtio [ block ]
              ...


Kalpana S Shetty (kalshett) wrote :

Applied the artful-proposed fix but could not confirm the reported issue is fixed.
Help needed to apply the artful-proposed fix.


------- Comment From <email address hidden> 2017-12-25 15:10 EDT-------
I'm unable to get this fix applied to the artful guest VM though the slof packages are updated.
----------------------------------------------------------------------------------------------------------------------------------
I was able to recreate the issue; initially I added up to 32 boot-order disks apart from the primary disk (boot order 1) and was not able to recreate it.

Applied the artful-proposed fix and was able to boot the guest successfully.

------- Comment From <email address hidden> 2017-12-25 15:13 EDT-------
(In reply to comment #25)
> I'm unable to get this fix applied to artful guest VM thought slof packages
> are updates.
> [remainder of comment #25 (the full Artful validation lsblk output) quoted
> verbatim; trimmed here]

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-12-25 22:03 EDT-------
(In reply to comment #27)
> -----------------------------------------------------------------------------
> ----------------------
> I could able to recreate the issue;
>
> Initially added upto 32 boot order disks apart from primary disk(boot order
> 1), and not able to recreate it. However with one more disks added with boot
> dev, could able to create the reported issue.
>
>
> Applied artful proposed fixes and could not able to see the reported issue
> fixed.
> Help needed to apply artful-proposed fix.

Please ignore my comment #26, where I said I was able to use the artful-proposed fix. I'm facing issues applying the fix; I need help applying the artful-proposed fix on my VM to make sure the issue is no longer reproducible. In my last update I did not describe the issue clearly.

Here is the observation:
1. I was able to recreate the reported issue after adding the 34th disk with a boot entry.
2. I am not able to apply the artful-proposed fix and need help applying it.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-12-29 22:02 EDT-------
I was able to recreate the issue with Ubuntu 17.10 (artful) as host, running Xenial, Zesty, and Artful guest VMs: the guest crashed when trying to add more than 36 disks with a boot order (this is the observation in my testing).

Applied the artful-proposed slof fix on the host and started the Xenial, Zesty, and Artful guests. With the proposed fix, all 3 guests came up and booted successfully, showing all 49 disks added.

Results:
----------------------------------------------------------------------
Installed slof version:
root@lep8d:~# apt search slof
Sorting... Done
Full Text Search... Done
qemu-slof/artful,now 20170724+dfsg-1 all [installed,automatic]
Slimline Open Firmware -- QEMU PowerPC version
---------------------------------------------------------------------
Fixed slof version:
root@lep8d:~# apt-get install qemu-slof/artful-proposed
Reading package lists... Done
Building dependency tree
Reading state information... Done
Selected version '20170724+dfsg-1ubuntu0.1' (Ubuntu:17.10/artful-proposed [all]) for 'qemu-slof'
Suggested packages:
qemu
The following packages will be upgraded:
qemu-slof
1 upgraded, 0 newly installed, 0 to remove and 37 not upgraded.
Need to get 172 kB of archives.
After this operation, 1,024 B of additional disk space will be used.
Get:1 http://ports.ubuntu.com/ubuntu-ports artful-proposed/main ppc64el qemu-slof all 20170724+dfsg-1ubuntu0.1 [172 kB]
Fetched 172 kB in 0s (258 kB/s)
(Reading database ... 103312 files and directories currently installed.)
Preparing to unpack .../qemu-slof_20170724+dfsg-1ubuntu0.1_all.deb ...
Unpacking qemu-slof (20170724+dfsg-1ubuntu0.1) over (20170724+dfsg-1) ...
Setting up qemu-slof (20170724+dfsg-1ubuntu0.1) ...
---------------------------------------------------------------------------
root@lep8d:~# service libvirtd restart

root@lep8d:~# virsh list --all
Id Name State
----------------------------------------------------
16 kal-artful_vm1 running
17 kal-xenial_vm1 running
18 kal-zesty_vm1 running

Inside each guest verified that we can see 49 disks:
root@lep8d:~# virsh console kal-artful_vm1
root@ubuntu:~# lsblk | grep disk | wc -l
49

root@lep8d:~# virsh console kal-xenial_vm1
root@ubuntu:~# lsblk | grep disk | wc -l
49

root@lep8d:~# virsh console kal-zesty_vm1
root@ubuntu:~# lsblk | grep disk | wc -l
49

In summary, I have validated on an Artful host with 3 different guests and it's working fine. Copying Srikanth to take it forward if anything else needs to be validated.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-01-01 00:08 EDT-------
According to https://bugs.launchpad.net/ubuntu/+source/slof/+bug/1734856/comments/32

verification-done-artful

Ubuntu 17.10 (GNU/Linux 4.13.0-16-generic ppc64le)
qemu-slof/artful,now 20170724+dfsg-1

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-01-01 00:15 EDT-------
Since artful verification is done, is zesty verification really needed here? https://www.ubuntu.com/info/release-end-of-life says zesty end of life is 2017. I see verification-needed-xenial was added and then removed. I think we can verify this fix on xenial rather than zesty. Any comments?

On bionic we would verify it once the alpha build comes out.

tags: added: verification-done-artful
removed: verification-needed-artful

Thanks for sorting out the Artful test already bssrikanth.

To be clear (the comments by kalshett are unclear to me, at least): you need to test Xenial/Zesty/Artful HOSTS. The fix is in the slof package that is installed on the KVM host.
It actually doesn't even matter which release the guest runs, but you have to test on the host release you verify.

From my POV verification-needed-xenial was only removed by "you" (=bugproxy) but not yet set to done (or anything else), and the comments are unclear to me :-/.
I'll fix that up to set it to -needed again.

TL;DR: Yes it is needed for Xenial and Zesty as well.

Once Xenial and Zesty are done as well, then also set the global verification-done to be complete.

tags: added: verification-needed-xenial
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-01-02 02:58 EDT-------
Thanks for the clarification. Yes, verification is being done using the host levels: artful, xenial, and zesty.

Ok, so are all three (artful, xenial, and zesty) done already, or are we waiting on xenial and zesty to complete?

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-01-02 04:38 EDT-------
(In reply to comment #35)
> Ok, so are all three artful, xenial and zesty done already or are we waiting
> on xenial and zesty to complete?

Verification on xenial and zesty is in process; Prudhvi will update here once done.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package slof - 20170724+dfsg-1ubuntu0.1

---------------
slof (20170724+dfsg-1ubuntu0.1) artful; urgency=medium

  * Fix boot with more than 16 disks (LP: #1734856)
    - d/p/0001-boot-do-not-concatenate-bootdev.patch
    - d/p/0002-boot-use-a-temporary-bootdev-buf.patch

 -- Christian Ehrhardt <email address hidden> Mon, 18 Dec 2017 14:18:30 +0100

Changed in slof (Ubuntu Artful):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for slof has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.


------- Comment From <email address hidden> 2018-01-02 07:17 EDT-------
On xenial I am still seeing the issue.
Machine details:
1)kernel level
root@ltc84-pkvm1:/var/lib/libvirt/images# uname -a
Linux ltc84-pkvm1 4.4.0-104-generic #127-Ubuntu SMP Mon Dec 11 12:16:48 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux

2) slof package level:
root@ltc84-pkvm1:/var/lib/libvirt/images# dpkg -l | grep slof
ii qemu-slof 20151103+dfsg-1ubuntu1 all Slimline Open Firmware -- QEMU PowerPC version

3)boot the guest with following xml:

<domain type='kvm' id='2'>
<name>prudhvi_ubuntu</name>
<uuid>d8466153-2ee3-4fed-8275-6c31d34f6dd6</uuid>
<memory unit='KiB'>33554432</memory>
<currentMemory unit='KiB'>33554432</currentMemory>
<vcpu placement='static'>32</vcpu>
<resource>
<partition>/machine</partition>
</resource>
<os>
<type arch='ppc64le' machine='pseries-xenial'>hvm</type>
</os>
<cpu>
<topology sockets='1' cores='4' threads='8'/>
</cpu>
<clock offset='utc'/>
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>restart</on_crash>
<devices>
<emulator>/usr/bin/kvm</emulator>
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/var/lib/libvirt/images/ubuntu-16.04.3-ppc64le.qcow2'/>
<backingStore/>
<target dev='vda' bus='virtio'/>
<boot order='1'/>
<alias name='virtio-disk0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</disk>
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/var/lib/libvirt/images/cpaelzer-bionic-t1.qcow'/>
<backingStore/>
<target dev='vdc' bus='virtio'/>
<boot order='2'/>
<alias name='virtio-disk2'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
</disk>
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/var/lib/libvirt/images/cpaelzer-bionic-t2.qcow'/>
<backingStore/>
<target dev='vdd' bus='virtio'/>
<boot order='3'/>
<alias name='virtio-disk3'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
</disk>
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/var/lib/libvirt/images/cpaelzer-bionic-t3.qcow'/>
<backingStore/>
<target dev='vde' bus='virtio'/>
<boot order='4'/>
<alias name='virtio-disk4'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
</disk>
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/var/lib/libvirt/images/cpaelzer-bionic-t4.qcow'/>
<backingStore/>
<target dev='vdf' bus='virtio'/>
<boot order='5'/>
<alias name='virtio-disk5'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
</disk>
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/var/lib/libvirt/images/cpaelzer-bionic-t5.qcow'/>
<backingStore/>
<target dev='vdg' bus='virtio'/>
<boot order='6'/>
<alias name='virtio-disk6'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0'/>
</disk>
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/var/lib/libvirt/images/cpaelzer-bionic-t6.qcow'/>
<backingStore/>
<tar...

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-01-02 07:24 EDT-------
(In reply to comment #38)
> on xenial still i am seeing the issue.
> Machine details:
> 1)kernel level
> root@ltc84-pkvm1:/var/lib/libvirt/images# uname -a
> Linux ltc84-pkvm1 4.4.0-104-generic #127-Ubuntu SMP Mon Dec 11 12:16:48 UTC
> 2017 ppc64le ppc64le ppc64le GNU/Linux
>

Xenial kernel version is 4.4.x, so it looks like the HWE kernel was not used.

> 2) slof package level:
> root@ltc84-pkvm1:/var/lib/libvirt/images# dpkg -l | grep slof
> ii qemu-slof 20151103+dfsg-1ubuntu1
> all Slimline Open Firmware -- QEMU PowerPC version
>
> 3)boot the guest with following xml:
>
> <domain type='kvm' id='2'>
> <name>prudhvi_ubuntu</name>
> <uuid>d8466153-2ee3-4fed-8275-6c31d34f6dd6</uuid>
> <memory unit='KiB'>33554432</memory>
> <currentMemory unit='KiB'>33554432</currentMemory>
> <vcpu placement='static'>32</vcpu>
> <resource>
> <partition>/machine</partition>
> </resource>
> <os>
> <type arch='ppc64le' machine='pseries-xenial'>hvm</type>
> </os>
> <cpu>
> <topology sockets='1' cores='4' threads='8'/>
> </cpu>
> <clock offset='utc'/>
> <on_poweroff>destroy</on_poweroff>
> <on_reboot>restart</on_reboot>
> <on_crash>restart</on_crash>
> <devices>
> <emulator>/usr/bin/kvm</emulator>
> <disk type='file' device='disk'>
> <driver name='qemu' type='qcow2'/>
> <source file='/var/lib/libvirt/images/ubuntu-16.04.3-ppc64le.qcow2'/>
> <backingStore/>
> <target dev='vda' bus='virtio'/>
> <boot order='1'/>
> <alias name='virtio-disk0'/>
> <address type='pci' domain='0x0000' bus='0x00' slot='0x03'
> function='0x0'/>
> </disk>
> <disk type='file' device='disk'>
> <driver name='qemu' type='qcow2'/>
> <source file='/var/lib/libvirt/images/cpaelzer-bionic-t1.qcow'/>
> <backingStore/>
> <target dev='vdc' bus='virtio'/>
> <boot order='2'/>
> <alias name='virtio-disk2'/>
> <address type='pci' domain='0x0000' bus='0x00' slot='0x06'
> function='0x0'/>
> </disk>
> <disk type='file' device='disk'>
> <driver name='qemu' type='qcow2'/>
> <source file='/var/lib/libvirt/images/cpaelzer-bionic-t2.qcow'/>
> <backingStore/>
> <target dev='vdd' bus='virtio'/>
> <boot order='3'/>
> <alias name='virtio-disk3'/>
> <address type='pci' domain='0x0000' bus='0x00' slot='0x07'
> function='0x0'/>
> </disk>
> <disk type='file' device='disk'>
> <driver name='qemu' type='qcow2'/>
> <source file='/var/lib/libvirt/images/cpaelzer-bionic-t3.qcow'/>
> <backingStore/>
> <target dev='vde' bus='virtio'/>
> <boot order='4'/>
> <alias name='virtio-disk4'/>
> <address type='pci' domain='0x0000' bus='0x00' slot='0x08'
> function='0x0'/>
> </disk>
> <disk type='file' device='disk'>
> <driver name='qemu' type='qcow2'/>
> <source file='/var/lib/libvirt/images/cpaelzer-bionic-t4.qcow'/>
> <backingStore/>
> <target dev='vdf' bus='virtio'/>
> <boot order='5'/>
> <alias name='virtio-disk5'/>
> <address type='pci...

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-01-02 07:42 EDT-------
Does the proposed xenial slof fix depend on the kernel used on the host, or should it work with both xenial kernels, 4.4.x and 4.10.x?

It is mostly independent of the kernel; the fix by Nikunj just needs the new SLOF to be on the host.

Please sync internally on whether the test is correct, as this is the fix that was suggested by you, and at least in my tests it worked. Since former verifications went a bit back and forth and this confusion looks similar, please make sure what the actual state of this is now on Xenial and Zesty.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-01-02 08:05 EDT-------
Prudhvi gave me access to the system; I have applied the proposed slof and restarted the guest. It seems to work fine with xenial: the guest booted successfully, showing all 49 disks.

root@ltc84-pkvm1:/etc/apt# apt-get install qemu-slof/xenial-proposed
Reading package lists... Done
Building dependency tree
Reading state information... Done
Selected version '20151103+dfsg-1ubuntu1.1' (Ubuntu:16.04/xenial-proposed [all]) for 'qemu-slof'
Suggested packages:
qemu
The following packages will be upgraded:
qemu-slof
1 upgraded, 0 newly installed, 0 to remove and 4 not upgraded.
Need to get 173 kB of archives.
After this operation, 1,024 B of additional disk space will be used.
Get:1 http://ports.ubuntu.com/ubuntu-ports xenial-proposed/main ppc64el qemu-slof all 20151103+dfsg-1ubuntu1.1 [173 kB]
Fetched 173 kB in 0s (267 kB/s)
(Reading database ... 84595 files and directories currently installed.)
Preparing to unpack .../qemu-slof_20151103+dfsg-1ubuntu1.1_all.deb ...
Unpacking qemu-slof (20151103+dfsg-1ubuntu1.1) over (20151103+dfsg-1ubuntu1) ...
Setting up qemu-slof (20151103+dfsg-1ubuntu1.1) ...

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-01-03 02:12 EDT-------
Tested, it is working fine in both xenial and zesty.

information of zesty:
-----------------------
1)kernel level
root@ltc-fire4:/var/lib/libvirt/images# uname -a
Linux ltc-fire4 4.10.0-42-generic #46-Ubuntu SMP Mon Dec 4 14:35:45 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux
root@ltc-fire4:/var/lib/libvirt/images#

2) slof package level:
root@ltc-fire4:/var/lib/libvirt/images# dpkg -l | grep slof
ii qemu-slof 20161019+dfsg-1ubuntu0.1 all Slimline Open Firmware -- QEMU PowerPC version

3) Recreation steps:
i) Prep a guest
ii) Remove from the <os> section:
<boot dev='hd'/>
iii) Add this to your former primary disk:
<boot order='1'/>
iv) Then add the XML content generated with:
echo "" > disk.xml
for i in {1..24}; do
  # 98+i is the ASCII code of the drive-letter suffix: i=1 -> 'c' (vdc) ... i=24 -> 'z' (vdz)
  h=$(printf "\x$(printf %x $((98+$i)))")
  echo "<disk type='file' device='disk'><driver name='qemu' type='qcow2'/><source file='/var/lib/uvtool/libvirt/images/cpaelzer-bionic-t$i.qcow'/><alias name='virtio-disk0'/><target dev='vd$h' bus='virtio'/><boot order='$((i+1))'/></disk>" >> disk.xml
  qemu-img create -f qcow2 /var/lib/uvtool/libvirt/images/cpaelzer-bionic-t$i.qcow 1M
  # second set of 24 disks on vdb<letter> targets, boot order continuing after the first set
  echo "<disk type='file' device='disk'><driver name='qemu' type='qcow2'/><source file='/var/lib/uvtool/libvirt/images/cpaelzer-bionic-tb$i.qcow'/><alias name='virtio-disk0'/><target dev='vdb$h' bus='virtio'/><boot order='$((i+24+1))'/></disk>" >> disk.xml
  qemu-img create -f qcow2 /var/lib/uvtool/libvirt/images/cpaelzer-bionic-tb$i.qcow 1M
done
v) virsh start guest-name --console

4) Will attach the guest XML and the output of lsblk from the guest.

------- Comment (attachment only) From <email address hidden> 2018-01-03 02:16 EDT-------

------- Comment (attachment only) From <email address hidden> 2018-01-03 02:16 EDT-------

Verification done on:
1. artful
2. xenial
3. zesty
Results in comments #32, #45, and #46.

tags: added: verification-done verification-done-xenial verification-done-zesty
removed: verification-needed verification-needed-xenial verification-needed-zesty

Thank you, Srikanth (and team, since I saw so many people update the bug).
This is already released for Artful, the SRU Team should soon release it for X/Z as well now.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package slof - 20151103+dfsg-1ubuntu1.1

---------------
slof (20151103+dfsg-1ubuntu1.1) xenial; urgency=medium

  * Fix boot with more than 16 disks (LP: #1734856)
    - d/p/0001-boot-do-not-concatenate-bootdev.patch
    - d/p/0002-boot-use-a-temporary-bootdev-buf.patch

 -- Christian Ehrhardt <email address hidden> Mon, 18 Dec 2017 14:12:32 +0100

Changed in slof (Ubuntu Xenial):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package slof - 20161019+dfsg-1ubuntu0.1

---------------
slof (20161019+dfsg-1ubuntu0.1) zesty; urgency=medium

  * Fix boot with more than 16 disks (LP: #1734856)
    - d/p/0001-boot-do-not-concatenate-bootdev.patch
    - d/p/0002-boot-use-a-temporary-bootdev-buf.patch

 -- Christian Ehrhardt <email address hidden> Mon, 18 Dec 2017 14:15:11 +0100

Changed in slof (Ubuntu Zesty):
status: Fix Committed → Fix Released
Manoj Iyer (manjo) on 2018-01-16
Changed in ubuntu-power-systems:
status: Incomplete → Fix Released
bugproxy (bugproxy) on 2018-03-01
tags: added: targetmilestone-inin16044
removed: targetmilestone-inin---