[ARM] : Unable to use Cinder volumes on ARM

Bug #1664737 reported by Raghuram Kota
This bug affects 2 people
Affects: libvirt (Ubuntu)
Status: Fix Released
Importance: Undecided
Assigned to: Sean Feole

Bug Description

From: Dann Frazier (dannf) :

I have an OpenStack setup built on an arm64-only MAAS using the openstack-base bundle with default settings. From the CLI, I am able to create and attach a volume to a running guest. However, that volume never appears in the guest: nothing new in dmesg, nothing new in /proc/partitions. There are no obvious errors, or even informational messages, in the nova, libvirt or cinder logs (included in the attached juju crashdump).

Raghuram Kota (rkota) wrote :

From: Dann Frazier (dannf) :

Note that the reason I went about testing this was because I ran across this one:
  https://bugs.linaro.org/show_bug.cgi?id=2462
The symptoms don't seem to be the same - OpenStack in that case generated an error on attach. In this case, there was no error, but the instance did not see it.

Raghuram Kota (rkota) wrote :

From Ryan Beisner (1chb1n) :

Xenial-Mitaka

I've confirmed that hot-plugging a cinder volume to a uefi arm64 nova instance doesn't work out of the box.

After attaching the volume, it was necessary to reboot the instance in order to detect and use the new volume.

$ openstack volume create --size 60 volume1

$ openstack server add volume xenial-uefi-20170119b221218 volume1

+--------------------------------------+--------------+--------+------+------------------------------------------------------+
| ID | Display Name | Status | Size | Attached to |
+--------------------------------------+--------------+--------+------+------------------------------------------------------+
| 2631f6fe-ff0b-4357-9b65-1b449fc0905b | volume1 | in-use | 60 | Attached to xenial-uefi-20170119b221218 on /dev/vdc |
| cc15681a-4957-4653-b18b-63188430477d | volume0 | in-use | 10 | Attached to xenial-uefi-20170119b221218 on /dev/vdb |
+--------------------------------------+--------------+--------+------+------------------------------------------------------+

# Note the lack of the 60G storage
ubuntu@xenial-uefi-20170119b221218:~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 253:0 0 10G 0 disk
vdb 253:16 0 20G 0 disk
├─vdb1 253:17 0 19.9G 0 part /
└─vdb15 253:31 0 99M 0 part /boot/efi

# After simply rebooting the nova instance, the 60G storage is attached
ubuntu@xenial-uefi-20170119b221218:~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 253:0 0 10G 0 disk
vdb 253:16 0 20G 0 disk
├─vdb1 253:17 0 19.9G 0 part /
└─vdb15 253:31 0 99M 0 part /boot/efi
vdc 253:32 0 60G 0 disk
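
A minimal sketch of the before/after comparison shown above, assuming each device list was captured inside the guest with `lsblk -dn -o NAME`. The function name `new_disks` and the file names are hypothetical and not part of the original report.

```shell
# Sketch: print block devices present in the "after" list but not in "before".
# Each input file holds one device name per line, e.g. from: lsblk -dn -o NAME
new_disks() {
    before_file=$1
    after_file=$2
    # comm needs sorted input, so sort both lists into temporary copies.
    sort "$before_file" > "$before_file.sorted"
    sort "$after_file" > "$after_file.sorted"
    # -13 drops lines unique to "before" and lines common to both,
    # leaving only the names that are new in "after".
    comm -13 "$before_file.sorted" "$after_file.sorted"
    rm -f "$before_file.sorted" "$after_file.sorted"
}
```

Capturing one list before `openstack server add volume` and one after, then running `new_disks before.txt after.txt`, would print nothing in the failing case and `vdc` once the guest actually sees the disk.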

Raghuram Kota (rkota) wrote :

Ryan Beisner (1chb1n) wrote on 2017-01-26:

This does not appear to be specific to arm64. On an x86 internal dev cloud, Trusty-Mitaka, I also have to reboot my nova instance after attaching ceph-backed cinder volumes before they are usable.

Next steps: test again with vanilla cinder (not ceph-backed).

Raghuram Kota (rkota) wrote :

Dmitrii Shcherbakov (dmitriis) wrote on 2017-02-02:

In general, with proper guest kernel support, hot-plug of block devices should work with both virtio and virtio-scsi on any arch:

http://www.linux-kvm.org/page/Hotadd_pci_devices#Add_a_disk

Verified that using a xenial host and a xenial guest (using just libvirt and qemu without openstack, nova, cinder, ceph).

Both addition and removal worked fine.

ii libvirt-bin 1.3.1-1ubuntu10.5 amd64 programs for the libvirt library
ii qemu-system 1:2.5+dfsg-5ubuntu10.6 amd64 QEMU full system emulation binaries
ii qemu-system-x86 1:2.5+dfsg-5ubuntu10.6 amd64 QEMU full system emulation binaries (x86)

guest kernel:
uname -r
4.4.0-59-generic

dmesg:

PCI hot-plug:

Feb 02 19:26:29 xenial kernel: pci 0000:00:09.0: [1af4:1001] type 00 class 0x010000
Feb 02 19:26:29 xenial kernel: pci 0000:00:09.0: reg 0x10: [io 0x0000-0x003f]
Feb 02 19:26:29 xenial kernel: pci 0000:00:09.0: reg 0x14: [mem 0x00000000-0x00000fff]
Feb 02 19:26:29 xenial kernel: pci 0000:00:09.0: BAR 1: assigned [mem 0xc0000000-0xc0000fff]
Feb 02 19:26:29 xenial kernel: pci 0000:00:09.0: BAR 0: assigned [io 0x1000-0x103f]
Feb 02 19:26:29 xenial kernel: virtio-pci 0000:00:09.0: enabling device (0000 -> 0003)
Feb 02 19:26:29 xenial kernel: virtio-pci 0000:00:09.0: virtio_pci: leaving for legacy driver
Feb 02 19:26:36 xenial kernel: do_trap: 63 callbacks suppressed
Feb 02 19:26:36 xenial kernel: traps: pool[3033] trap int3 ip:7fd0284bb9eb sp:7fcffac7b5b0 error:0
Feb 02 19:27:11 xenial kernel: scsi 2:0:0:1: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5

SCSI hot-plug:
Feb 02 19:27:11 xenial kernel: sd 2:0:0:1: Attached scsi generic sg2 type 0
Feb 02 19:27:11 xenial kernel: sd 2:0:0:1: [sdb] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
Feb 02 19:27:11 xenial kernel: sd 2:0:0:1: [sdb] Write Protect is off
Feb 02 19:27:11 xenial kernel: sd 2:0:0:1: [sdb] Mode Sense: 63 00 00 08
Feb 02 19:27:11 xenial kernel: sd 2:0:0:1: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Feb 02 19:27:11 xenial kernel: sd 2:0:0:1: [sdb] Attached SCSI disk
Feb 02 19:28:13 xenial kernel: input: spice vdagent tablet as /devices/virtual/input/input6
Feb 02 19:28:57 xenial kernel: sd 2:0:0:1: [sdb] Synchronizing SCSI cache
Feb 02 19:28:57 xenial kernel: sd 2:0:0:1: [sdb] Synchronize Cache(10) failed: Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Feb 02 19:28:57 xenial kernel: sd 2:0:0:1: [sdb] Sense Key : Illegal Request [current]
Feb 02 19:28:57 xenial kernel: sd 2:0:0:1: [sdb] Add. Sense: Logical unit not supported

# added vda - virtio
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 256G 0 disk
├─sda1 8:1 0 487M 0 part /boot
├─sda2 8:2 0 1K 0 part
└─sda5 8:5 0 255,5G 0 part
  ├─ubuntu--vg-root 252:0 0 253,5G 0 lvm /
  └─ubuntu--vg-swap_1 252:1 0 2G 0 lvm [SWAP]
sr0 11:0 1 1024M 0 rom
vda 253:0 0 20G 0 disk

# added sdb - virtio-scsi
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 256G 0 disk
├─sda1 8:1 0 487M 0 part /boot
├─sda2 8:2 0 1K 0 part
└─sda5 8:5 0 255,5G 0 part
  ├─ubuntu--vg-root 252:0 0 253,5G 0 lvm /
  └─ubuntu--vg-swap_1 252:1 0 2G 0 lvm [SWAP]
sdb 8:16 0 20G 0 disk
sr0 11:0 1 1024M 0 rom
vda 25...


Raghuram Kota (rkota) wrote :

Ryan Beisner (1chb1n) wrote on 2017-02-02:

@dmitrii Can you please independently exercise the following?:

On ServerStack (x86_64):

 - Boot a new nova instance.
 - Create a cinder volume.
 - Attach the cinder volume to the running nova instance (with openstack cli).

We are looking to confirm whether or not the guest instance sees the newly-attached block device without rebooting the instance.

Many thanks!

Raghuram Kota (rkota) wrote :

Dmitrii Shcherbakov (dmitriis) wrote on 2017-02-02:

@beisner

Had issues with deployment - for some reason a couple of undercloud VMs got stuck (juju was waiting for a machine allocation while it already booted) after running juju-deployer. Had to manually remove affected units and 'juju add-unit' them to get it working.

Nevertheless, here's the result with a fully-deployed xenial-mitaka next.yaml bundle from the o-c-t (amd64, serverstack, some package versions included in the paste):

https://pastebin.canonical.com/178064/

I checked it from the nova, libvirt, qemu and guest POV. It looks good; I wasn't able to reproduce the original issue.

virtio-blk devices are used by default so PCI-hotplug (http://wiki.qemu.org/Features/PCIBridgeHotplug) was triggered inside the guest kernel. I verified that libvirt has set up an RBD for the added virtio-blk device and that it was visible by qemu (checked via QMP) and the guest (dmesg + lsblk). Removal worked just fine as well.

Double addition of the same volume to the same VM without reboot worked fine too.

Ryan Beisner (1chb1n)
tags: added: arm64 uosci
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in qemu (Ubuntu):
status: New → Confirmed
Ryan Beisner (1chb1n) wrote :

Need to determine whether or not a fix mentioned here has been landed or proposed:
https://bugs.linaro.org/show_bug.cgi?id=2462

Dmitrii Shcherbakov (dmitriis) wrote :

The upstream commit:
https://libvirt.org/git/?p=libvirt.git;a=commit;h=0701abcb3ba78ba27cf1f47e01b3d9607ad37b72

Proposed is 1.3.1-1ubuntu10.8:
https://launchpad.net/ubuntu/xenial/+source/libvirt

A quick and dirty check for a part of a diff https://libvirt.org/git/?p=libvirt.git;a=blobdiff;f=src/qemu/qemu_hotplug.c;h=9746a06cb57fa728a4e2bc5dad4fdb1df81f9ee2;hp=bcae1b6bdb0e6f0496cad58873117fb6f557ccba;hb=0701abcb3ba78ba27cf1f47e01b3d9607ad37b72;hpb=8550e8585eef1ed7f5850a698d680e20b5cbcdff:

pull-lp-source libvirt xenial proposed
pull-lp-source: Downloading libvirt version 1.3.1-1ubuntu10.8
pull-lp-source: Downloading libvirt_1.3.1.orig.tar.gz from archive.ubuntu.com (28.515 MiB)
pull-lp-source: Downloading libvirt_1.3.1-1ubuntu10.8.debian.tar.xz from archive.ubuntu.com (0.126 MiB)
...

grep -RiP 'qemuMonitorAddObject\(priv->mon, "secret"' libvirt-1.3.1/ ; echo $?
1

So no, this patch is not incorporated into our libvirt package.

Raghuram Kota (rkota) wrote :

This bug seems to have been addressed in upstream libvirt, per comm#18 of https://bugs.linaro.org/show_bug.cgi?id=2462

Per Gem Gomez (Linaro) :

The upstream review and commit of the patch can be found on the libvirt upstream mailing list:
http://www.redhat.com/archives/libvir-list/2016-October/msg00395.html

Here is the message that confirms this patch made it to libvirt:
http://www.redhat.com/archives/libvir-list/2016-October/msg00876.html

Ryan Beisner (1chb1n) wrote :

Please evaluate for backport to Xenial, Yakkety and Zesty libvirt to resolve the issue with Mitaka, Newton and Ocata, respectively.

tags: added: backport-potential
Ryan Beisner (1chb1n)
no longer affects: qemu (Ubuntu)
Christian Ehrhardt  (paelzer) wrote :

Hi,
we have to be careful on the identified patch
https://libvirt.org/git/?p=libvirt.git;a=commit;h=0701abcb3

IMHO this one surely has to go along with it to really work (it is the one identified in Linaro bug 2462):
https://libvirt.org/git/?p=libvirt.git;a=commit;h=fceeeda21

Also there is the related:
https://libvirt.org/git/?p=libvirt.git;a=commit;h=a1344f70a

And I just worked with James (bug 1672367 currently in Y-proposed) to add:
https://libvirt.org/git/?p=libvirt.git;a=commit;h=d53d46508

Ok, complex enough, now trying to sort that out.

Ok, the order and dependency of these is:
hash       appears  does       description
a1344f70a  1.3.5    feature    "Utilize qemu secret objects for RBD auth/secret"
fceeeda21  2.1.0    fix-a1344  "Add secinfo for hotplug virtio disk"
d53d46508  2.2.0    fix-a1344  "Fix the command line generation for rbd"
0701abcb3  2.4.0    fix-fceee  "Add support for using AES secret for SCSI hotplug"

Per Release thoughts:
- Xenial/Mitaka is on 1.3.1, so it has none of those at all (so does UCA-Mitaka then).
So it should work the "old way" whatever it was before adding "qemu secret objects"
Rkota - Tested Xenial-Mitaka in comment #7 and reported that all is fine now there which would match.

- Yakkety/Newton is on 2.1 and thereby has a1344f70a, also d53d46508 is in proposed and fixed it for the case of James Page. It does not yet have fceeeda21 and 0701abcb3, which might be needed.
But UCA-Newton has only Xenial's libvirt, which is on 1.3.1 and should not be affected at all either.

- Zesty/Ocata is on 2.5 and has all these fixes, so Zesty and thereby also Xenial-Ocata should be fine as well right?

---

I can consider this for Yakkety, but that is the only release I can think of that is affected. All others (Xenial, Zesty, Mitaka, Newton, Ocata) should be good right now. I'm not sure this isn't two or three bugs (or local setup issues) mixed together.
Would someone mind sorting that out by testing the following cases one by one, like Rkota did in comment #7, so we know what really is affected?

Done by Rkota:
1. Xenial-Mitaka (is ok)

TODO:
2. Xenial-Newton
3. Xenial-Ocata
4. Yakkety
5. Zesty

Changed in libvirt (Ubuntu):
status: New → Incomplete
Raghuram Kota (rkota) wrote :

@paelzer : Many thanks for looking into this. The Xenial-Mitaka test in comm#7 was actually done by Dmitrii (dmitriis). Thanks go out to him. Based on comm#13, your hypothesis seems to be that this should only occur on Yakkety.

@dmitriis : Are you able to help with tests #2-#5 on ServerStack or otherwise, similar to the testing you did in comm#7, to prove or disprove this hypothesis?

Many Thanks,
Raghu

Ryan Beisner (1chb1n) wrote :

@rkota - fyi Xenial-Newton uses the versions from Yakkety, so it affects both.

Raghuram Kota (rkota) wrote :

@Ryan Beisner (1chb1n) : Oh, my apologies, I was only looking at the libvirt version (as the fix seems to be there) and went by the following statement in comm#13 : " But UCA-Newton has only Xenials libvirt that is on libvirt 1.3.1 and should not be affected at all either."

Are there components beyond libvirt involved in this fix ?

Thanks,
Raghu

Raghuram Kota (rkota) wrote :

It looks like so far we have the following data points. :

ARM64
-----
1) Xenial-Mitaka : Fail (comm#3)

X86-64
---------
a) Trusty - Mitaka : Fail (Comm #4)
b) Xenial-Mitaka : OK (Comm #7)

In addition, Linaro had confirmed that https://bugs.linaro.org/show_bug.cgi?id=2462 (which triggered the testing that sourced this bug) was occurring on both x86 and ARM64, but didn't specify the OpenStack versions. That is something I'll try to follow up on.

@paelzer, @dmitriis : With the above data, what in your view would be a good set of experiments to run?

Christian Ehrhardt  (paelzer) wrote :

Hi,
so far the data we have unfortunately makes it less clear.

All the reported failures are likely not what the referred fix is solving.
Because, as outlined, Xenial (and thereby Mitaka) doesn't have the change (https://libvirt.org/git/?p=libvirt.git;a=commit;h=a1344f70a) that was later fixed as part of the issue Linaro was working on.
It might be that the older stacks need all of them, but that is closer to a feature request and I wonder why it should be arch specific then.

Only a theory, but what about:
- worked on x86 the old way
- all fixes at least up to fceeeda21 are required for arm64
- the issue on arm64 before a1344f70a looks just the same as with a1344f70a but without fceeeda21

Also I wonder about:
a) Trusty - Mitaka : Fail (Comm #4)
b) Xenial-Mitaka : OK (Comm #7)
Those use the same libvirt/qemu/openstack versions so what is going on here?
We should consider questioning our test setup in those cases as well.

I'm afraid someone needs to step up and create some sort of semi-automated test to ensure reproducibility across the retests, and then redo the full matrix with that test:
                       a-x86 b-arm64
0. Trusty-Mitaka
1. Xenial
2. Xenial-Newton
3. Yakkety
4. Zesty
5. Xenial-Ocata

We then have to ensure that the issues are really the same, so get as much logs as possible while doing so.

Andrew McLeod (admcleod) wrote :

For sanity's sake I am currently attempting to reproduce this with pike on arm64.

Andrew McLeod (admcleod) wrote :

I can confirm this is still an issue with pike + arm64

Christian Ehrhardt  (paelzer) wrote :

Ok, if broken in pike+arm64 still that means it is broken in the latest qemu/libvirt we currently have (until we bumped for Bionic/Queens).
That also implies that all fixes that were meant to be identified before can't be the fix as they are in arm64+pike.

This will get more complex once we need SRUs, as I outlined in e.g. comment #19, and the dependency chain of changes grows over time. But before any of that we have to find a combination that works at all. So for now I suggest not thinking about anything but latest devel, as anything else only makes the case more complex (we can think about that later once it works).

I see three options here to continue:
1. we can wait for our bionic/queens bump of the virtualization stack and retry, but there are no indications that we should have more than a "trial and error" level of hope for that

2. someone in contact with Linaro should ask them about their setup exactly on which they fixed those bugs - I have to assume that after such a fix their testcase worked. If that holds true for us we have to start identifying what is different in our testcase.

3. Someone has to break the complexity of openstack out of this test and create a list of reproducible commands that "would do what openstack does" in regard to volume creation, guest spawning and attaching. Once it is reproducible that way we can start debugging and also call on upstream to help think through this case with us. Most likely this will also be required for SRUs later on, so maybe that is the best step to take.
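
As a starting point for option 3, here is a rough sketch of a helper that prints a libvirt `<disk>` element for an RBD-backed virtio disk, which could then be hot-attached without OpenStack in the loop. Everything passed in (pool/image name, monitor host, secret UUID, target device) and the `cinder` username are placeholders and assumptions, not values taken from this bug; the real ones would come from the domain XML that nova generates.

```shell
# Sketch: print a libvirt <disk> element for an RBD-backed virtio disk.
# All argument values are placeholders to be taken from the real deployment.
rbd_disk_xml() {
    rbd_name=$1     # pool/image name (hypothetical)
    mon_host=$2     # Ceph monitor address
    secret_uuid=$3  # UUID of the libvirt secret holding the Ceph key
    target_dev=$4   # guest device name, e.g. vdc
    cat <<EOF
<disk type='network' device='disk'>
  <driver name='qemu' type='raw'/>
  <source protocol='rbd' name='${rbd_name}'>
    <host name='${mon_host}' port='6789'/>
  </source>
  <auth username='cinder'>
    <secret type='ceph' uuid='${secret_uuid}'/>
  </auth>
  <target dev='${target_dev}' bus='virtio'/>
</disk>
EOF
}
```

The output could be written to a file and hot-plugged with `virsh attach-device <guestname> disk.xml --live`, which should exercise the same libvirt hotplug path while keeping nova and cinder out of the picture.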

Andrew McLeod (admcleod) wrote :

Meanwhile, I have been testing this on xenial queens, and am receiving what appears to be a more documented error.

[ 720.314053] virtio_blk virtio4: virtio: device uses modern interface but does not have VIRTIO_F_VERSION_1
[ 720.325082] virtio_blk: probe of virtio4 failed with error -22

https://pastebin.canonical.com/209188/ - this happens every time the volume is added. "Removing" the volume produces an acceptable set of messages, e.g. https://pastebin.canonical.com/209298/

e.g.

https://lists.gnu.org/archive/html/qemu-devel/2016-08/msg03385.html

The implication here is to test toggling disable-modern and disable-legacy although I am not sure how to achieve this via openstack to test.
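
For anyone retesting, a small sketch that pulls the failing device and error code out of guest dmesg text, so repeated attach attempts can be compared quickly. The function name `failed_virtio_probes` is made up for illustration, and the pattern is based only on the two log lines quoted above.

```shell
# Sketch: from dmesg text on stdin, print "<device> <errno>" for every
# virtio probe failure, e.g. a line such as:
#   virtio_blk: probe of virtio4 failed with error -22
failed_virtio_probes() {
    sed -n 's/.*probe of \(virtio[0-9][0-9]*\) failed with error \(-\{0,1\}[0-9][0-9]*\).*/\1 \2/p'
}
```

Inside the guest this would be used as `dmesg | failed_virtio_probes`. An empty result after an attach suggests the probe error above did not occur for that attempt.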

qemu, libvirt and kernel versions:

ubuntu@node-loewy:~$ qemu-system-aarch64 --version
QEMU emulator version 2.10.1(Debian 1:2.10+dfsg-0ubuntu5~cloud0)
Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers

ubuntu@node-loewy:~$ libvirtd --version
libvirtd (libvirt) 4.0.0

ubuntu@node-loewy:~$ uname -a
Linux node-loewy 4.13.0-32-generic #35~16.04.1-Ubuntu SMP Thu Jan 25 10:10:26 UTC 2018 aarch64 aarch64 aarch64 GNU/Linux

qemu-system-aarch64 command: https://pastebin.canonical.com/209299/

domain xml with disk added: https://pastebin.canonical.com/209300/

domain xml diff (with and without disk): https://pastebin.canonical.com/209301/

Christian Ehrhardt  (paelzer) wrote :

That is the new libvirt 4.0 which gives the better messages.
Interesting.

Could be an arm specific lack of virtio (which is generic).
IIRC there was something that arm only worked with modern (virtio 1) and might force this.
But I could be wrong (too old info).

The XML as well as the qemu command you linked had no version set.
In XML that would be like:
<virtio revision='1.0'/> (or 0.9 )
And on the commandline I'd expect either of the controls you already found disable-modern/disable-legacy.

You could try if openstack can leave the guest/volumes around.
Then you could login the host and tweak the xml for virtio revision.
With that it should generate modern/legacy accordingly.
And that might make it work.

Once confirmed if that "is it" we can still think how to make that working in openstack.
Can you test that locally?

Andrew McLeod (admcleod) wrote :

I have tried to add this <virtio revision..../> to the XML but it seems that it is missing from the libvirt domain schema.

error: XML document failed to validate against schema: Unable to validate doc against /usr/share/libvirt/schemas/domain.rng
Extra element devices in interleave
Element domain failed to validate content

Christian Ehrhardt (paelzer) wrote :

Yeah, it seems this was discussed plenty of times but was not accepted in that (or actually any) way.
I tried around a bit and discussed with more people, but didn't find a good way to hard control it.

Note: on IRC it also came up that a hot-attach fails with the message in comment #23.
But on a guest reboot it would work.
Based on that, the theory for now is that by default modern/legacy is not initialized correctly.
The hot add will add a modern-only device that fails to initialize.
On a reboot this will be re-handshaked between guest/host and it works.
(not sure just a thought - could even be vice versa)

@admcleod - does that also imply that if the guest has the disk in its XML right away, it works immediately on a start of the domain as well?

@admcleod - that test (direct start with disk) would also help us to show the cmdline with the rbd device added to check if they would today use modern or old syntax
modern should be: -drive driver=rbd,filename=%s, ... ,password-secret
legacy should be: -drive file=%s, ... ,file.password-secret

Well the hot add will use commands instead of cmdline anyway.

virtio-pci is generic, but you never know, so I first checked whether forcing modern exists as a cmdline arg as expected.
$ qemu-system-aarch64 -machine virt-2.10,accel=kvm,gic-version=3 -device virtio-blk,help 2>&1 | egrep 'modern|legacy'
virtio-blk-pci.disable-modern=bool
virtio-blk-pci.disable-legacy=OnOffAuto (on/off/auto)
virtio-blk-pci.modern-pio-notify=bool (on/off)

In general we should have (see qemu include/hw/compat.h)
<=2.6 disable-modern=on,disable-legacy=off (old)
>2.6 disable-modern=off,disable-legacy=on (new)

So you should be by default using modern - without forcing anything.

I went to check the runtime info, the following gives some info, please feel free to look around for more differences in there:
$ virsh qemu-monitor-command --hmp <guestname> 'info qtree' | grep -A 10 virtio-blk-pci

I already saw that on qemu2.10 x86 vs aarch64 there are differences (with basic discs without rbd):
arm: disable-legacy = "on", disable-modern = false
x86: disable-legacy = "off", disable-modern = false

So at least in my case x86 seems effectively to provide both, but arm ONLY the new one.
There might be reasons for that (bad pre 1.0 support on arm or so), so that is not an issue but interesting.
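
To make that comparison repeatable, a sketch of a filter that keeps only the `disable-legacy` and `disable-modern` lines of each virtio-blk-pci device from `info qtree` output. It assumes the qtree text uses `dev: <type>, id "..."` header lines followed by indented `name = value` properties; the function name `virtio_blk_modes` is hypothetical.

```shell
# Sketch: from "info qtree" text on stdin, print the disable-legacy and
# disable-modern properties of each virtio-blk-pci device.
virtio_blk_modes() {
    awk '
        # Every "dev:" header starts a new device; remember whether it is
        # a virtio-blk-pci device.
        /dev: / { in_blk = ($0 ~ /dev: virtio-blk-pci/) }
        in_blk && /disable-(legacy|modern) = / {
            gsub(/^[ \t]+/, "")   # drop the indentation
            print
        }
    '
}
```

On the host this would be fed with `virsh qemu-monitor-command --hmp <guestname> 'info qtree' | virtio_blk_modes`, run on both the arm64 and x86 systems so the outputs can be diffed directly.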

Note: I was trying to derive a fake-rbd from your device xml, but that hung my host for a while - so maybe you should test the above as you have it set up.

@admcleod - for your described lifecycle it would be interesting to query those attributes above.
1 after fresh guest start
2 after attaching the rbd device (that fails)
3 after reboot with rbd device attached (device now works)
4 after fresh guest start with rbd device
= all of the above comparing the same stack on x86 vs aarch64
Pick all of the devices output in qtree, and "info block" on it as well.

This list we can then compare in Y axis (1-4) and between arches (arm vs x86 on each step).
Hopefully that would lead to an insight that helps.

Finally by more discussions I actually found a way to control disable-modern/legacy via libvirt xml.
You can see an example to get to "both enabled" that works on my arm b...


Sean Feole (sfeole) wrote :

Wanted to update this bug with my findings so far, I have found the linaro commit that fixed this issue here: https://www.redhat.com/archives/libvir-list/2016-October/msg00396.html

I was in the process of testing bionic, but was blocked by a few juju bugs last week (see: https://launchpad.net/juju/2.4/2.4-beta2 ). Those bugs have since been fixed. My test hardware was temporarily removed last week, and I am waiting until resources are back to continue with testing.

Steps moving forward are to deploy Queens + Bionic on arm64 and test cinder volume hotplug. If the same symptoms remain, then recreate the process manually without using openstack, as Christian mentioned in one of the above comments.

Sean Feole (sfeole) wrote :

Today I was able to test Queens+Bionic for the first time in about 2 weeks without being roadblocked by bugs. Due to a hardware resource constraint I was only given one host, so I decided to deploy openstack-on-lxd, which uses the same bits as a multi-node deployment.

ubuntu@alekhin:~/openstack-on-lxd$ openstack volume list
+--------------------------------------+---------+-----------+------+-------------+
| ID | Name | Status | Size | Attached to |
+--------------------------------------+---------+-----------+------+-------------+
| 27ceef74-3746-48eb-88e0-6a3bf1a97dd6 | volume1 | available | 10 | |
+--------------------------------------+---------+-----------+------+-------------+

ubuntu@alekhin:~/openstack-on-lxd$ openstack server list
+--------------------------------------+---------------+--------+-------------------------------------+--------+----------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+---------------+--------+-------------------------------------+--------+----------+
| 4411c938-8a58-4a86-b666-a0e7d041e3f8 | sfeole-bionic | ACTIVE | internal=192.168.20.7, 10.228.22.34 | bionic | m1.small |
+--------------------------------------+---------------+--------+-------------------------------------+--------+----------+

ubuntu@alekhin:~/openstack-on-lxd$ openstack hypervisor list
+----+---------------------+-----------------+---------------+-------+
| ID | Hypervisor Hostname | Hypervisor Type | Host IP | State |
+----+---------------------+-----------------+---------------+-------+
| 1 | juju-ac9015-20.lxd | QEMU | 10.228.22.203 | up |
+----+---------------------+-----------------+---------------+-------+

On the instance we can see that only the root disk is available.

ubuntu@sfeole-bionic:~$ sudo fdisk -l
Disk /dev/vda: 10 GiB, 10737418240 bytes, 20971520 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 4A7C8D71-AC27-4899-89D8-213EAB291FB6

Device Start End Sectors Size Type
/dev/vda1 206848 20971486 20764639 9.9G Linux filesystem
/dev/vda15 2048 204800 202753 99M EFI System

Partition table entries are not in disk order.
ubuntu@sfeole-bionic:~$ sudo lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 252:0 0 10G 0 disk
├─vda1 252:1 0 9.9G 0 part /
└─vda15 252:15 0 99M 0 part /boot/efi

I then attached the volume to the live-running instance:

ubuntu@alekhin:~/openstack-on-lxd$ openstack server add volume sfeole-bionic volume1
ubuntu@alekhin:~/openstack-on-lxd$ openstack volume list
+--------------------------------------+---------+--------+------+----------------------------------------+
| ID | Name | Status | Size | Attached to |
+--------------------------------------+---------+--------+------+----------------------------------------+
| 27ceef74-3746-48eb-88e0-6a3bf1a97dd6 | volume...


Changed in libvirt (Ubuntu):
assignee: nobody → Sean Feole (sfeole)
status: Incomplete → Fix Committed
Sean Feole (sfeole) wrote :

Marking bug as fix committed as this now works, with Bionic-Queens.

Sean Feole (sfeole) wrote :

Some additional information: using libvirt 4.0

ii libvirt-clients 4.0.0-1ubuntu8 arm64 Programs for the libvirt library
ii libvirt-daemon 4.0.0-1ubuntu8 arm64 Virtualization daemon
ii libvirt-daemon-driver-storage-rbd 4.0.0-1ubuntu8 arm64 Virtualization daemon RBD storage driver
ii libvirt-daemon-system 4.0.0-1ubuntu8 arm64 Libvirt daemon configuration files
ii libvirt0:arm64 4.0.0-1ubuntu8 arm64 library for interfacing with different virtualization systems
ii nova-compute-libvirt 2:17.0.3-0ubuntu1 all OpenStack Compute - compute node libvirt support
ii python-libvirt 4.0.0-1 arm64 libvirt Python bindings

using the following juju charms from charmstore

App Version Status Scale Charm Store Rev OS Notes
ceilometer 10.0.0 active 1 ceilometer jujucharms 252 ubuntu
ceilometer-agent 10.0.0 active 1 ceilometer-agent jujucharms 243 ubuntu
ceph-mon 12.2.4 active 3 ceph-mon jujucharms 24 ubuntu
ceph-osd 12.2.4 active 3 ceph-osd jujucharms 261 ubuntu
ceph-radosgw 12.2.4 active 1 ceph-radosgw jujucharms 257 ubuntu
cinder 12.0.1 active 1 cinder jujucharms 271 ubuntu
cinder-ceph 12.0.1 active 1 cinder-ceph jujucharms 232 ubuntu
designate 6.0.1 blocked 1 designate jujucharms 18 ubuntu
designate-bind 9.11.3+dfsg active 1 designate-bind jujucharms 12 ubuntu
glance 16.0.1 active 1 glance jujucharms 264 ubuntu
gnocchi 4.2.4 active 1 gnocchi jujucharms 7 ubuntu
heat 10.0.0 active 1 heat jujucharms 251 ubuntu
keystone 13.0.0 active 1 keystone jujucharms 280 ubuntu
memcached unknown 1 memcached jujucharms 21 ubuntu
mysql 5.7.20-29.24 active 1 percona-cluster jujucharms 264 ubuntu
neutron-api 12.0.1 active 1 neutron-api jujucharms 259 ubuntu
neutron-gateway 12.0.1 active 1 neutron-gateway jujucharms 251 ubuntu
neutron-openvswitch 12.0.1 active 1 neutron-openvswitch jujucharms 249 ubuntu
nova-cloud-controller 17.0.3 active 1 nova-cloud-controller jujucharms 309 ubuntu
nova-compute 17.0.3 active 1 nova-compute jujucharms 282 ubuntu
openstack-dashboard 13.0.0 active 1 openstack-dashboa...


Christian Ehrhardt  (paelzer) wrote :

Thanks a lot Sean, if it works as in Bionic/Queens then it is even released.
Is there a need for the super-deep dive what exactly fixed it to consider SRUs or is Bionic/Queens fine for now?

Changed in libvirt (Ubuntu):
status: Fix Committed → Fix Released