[bionic] libvirtError: Node device not found: no node device with matching name

Bug #1771662 reported by Jason Hobbs
This bug affects 2 people
Affects                        Status        Importance  Assigned to    Milestone
OpenStack Compute (nova)       Invalid       Undecided   Unassigned
OpenStack Nova Compute Charm   Invalid       Undecided   Frode Nordahl
libvirt (Ubuntu)               Fix Released  Undecided   Unassigned
  Bionic                       Fix Released  Undecided   Unassigned
  Cosmic                       Fix Released  Undecided   Unassigned
  Disco                        Fix Released  Undecided   Unassigned

Bug Description

[Impact]

 * Libvirt has assumed that every VF (virtual function) has a PF
   (physical function) assigned, but that does not hold true on some
   special hardware such as the Cavium ThunderX.

 * Dannf helped get some patches, initially from Linaro, accepted
   upstream; those are the ones we want to backport to Bionic and Cosmic.

[Test Case]

 * Verify that virsh nodedev-list shows the onboard NICs:
   $ sudo virsh nodedev-list | grep ^net
   net_enP2p1s0f1_42_ca_74_64_88_75
   net_enP2p1s0f2_42_ca_74_64_88_76
   net_enP2p1s0f3_42_ca_74_64_88_77

 * This needs plenty of setup and special HW, but Jason Hobbs & Dannf are
   willing to do the verification in our test lab.
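
For anyone repeating the test case from a script rather than by eye, here is a minimal Python sketch of the same filter. It assumes the output of `virsh nodedev-list` has already been captured as text. The function name and the two non-net sample entries are made up for illustration; only the three net_ names come from the test case above.

```python
def net_node_devices(nodedev_list_output):
    """Return the node device names that start with 'net_', in order."""
    names = []
    for line in nodedev_list_output.splitlines():
        name = line.strip()
        if name.startswith("net_"):
            names.append(name)
    return names


# Sample text: the first two entries are hypothetical filler, the net_
# entries are the ones shown in the test case.
SAMPLE = """\
computer
pci_0002_01_00_1
net_enP2p1s0f1_42_ca_74_64_88_75
net_enP2p1s0f2_42_ca_74_64_88_76
net_enP2p1s0f3_42_ca_74_64_88_77
"""

if __name__ == "__main__":
    for name in net_node_devices(SAMPLE):
        print(name)
```

An empty result on the affected hardware would correspond to the failing case; the fix is verified when the onboard NICs are listed.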

[Regression Potential]

 * Review hasn't spotted any issues, but in theory there could be negative
   effects on PF/VF pass-through cases. The associated code cleanup
   should not cause issues there, but it might.
   I'd ask Jason to also run a PF/VF workload against the PPA/SRU on other
   hardware (like our x86 test environment) to be sure that
   is ok.

[Other Info]

 * n/a

---

After deploying openstack on arm64 using bionic and queens, no hypervisors show up. On my compute nodes, I have an error like:

2018-05-16 19:23:08.165 282170 ERROR nova.compute.manager libvirtError: Node device not found: no node device with matching name 'net_enP2p1s0f1_40_8d_5c_ba_b8_d2'

In my /var/log/nova/nova-compute.log

I'm not sure why this is happening - I don't use enP2p1s0f1 for anything.
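
To see which interface and MAC address the error refers to, the node device name can be split apart. The sketch below assumes the name follows the pattern net_<interface>_<MAC with underscores>, which is what the example in this report looks like; the helper names are hypothetical and this is not code from nova or libvirt.

```python
import re

# Assumed pattern: "net_" + interface name + "_" + six hex pairs joined by "_".
_NET_NODEDEV = re.compile(
    r"^net_(?P<ifname>.+)_(?P<mac>[0-9a-fA-F]{2}(?:_[0-9a-fA-F]{2}){5})$"
)
_ERROR_NAME = re.compile(r"matching name '([^']+)'")


def device_from_error(log_line):
    """Return the quoted node device name from the error line, or None."""
    match = _ERROR_NAME.search(log_line)
    return match.group(1) if match else None


def split_net_nodedev(name):
    """Split a net node device name into (interface, colon-separated MAC)."""
    match = _NET_NODEDEV.match(name)
    if match is None:
        raise ValueError("not a net node device name: %r" % name)
    return match.group("ifname"), match.group("mac").replace("_", ":")


LINE = (
    "2018-05-16 19:23:08.165 282170 ERROR nova.compute.manager libvirtError: "
    "Node device not found: no node device with matching name "
    "'net_enP2p1s0f1_40_8d_5c_ba_b8_d2'"
)

if __name__ == "__main__":
    print(split_net_nodedev(device_from_error(LINE)))
```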

There are a lot of interesting messages about that interface in syslog:
http://paste.ubuntu.com/p/8WT8NqCbCf/

Here is my bundle: http://paste.ubuntu.com/p/fWWs6r8Nr5/

The same bundle works fine for xenial-queens, with the source changed to the cloud-archive, and using stable charms rather than -next. I hit this same issue on bionic queens using either stable or next charms.

This thread has some related info, I think:
https://www.spinics.net/linux/fedora/libvir/msg160975.html

This is with juju 2.4 beta 2.

Package versions on affected system: http://paste.ubuntu.com/p/yfQH3KJzng/

Jason Hobbs (jason-hobbs) wrote :

Subscribed to field high as this is blocking bionic queens testing and is 100% reproducible.

Christian Ehrhardt (paelzer) wrote :

What puzzles me is Xenial-Queens working and Bionic showing issues.
It seems libvirt is unable to cope with this type of HW, but then it works in one and not the other ...
Yet versions are:
- xenial-queens
    libvirt 4.0.0-1ubuntu7~cloud0
    qemu 1:2.11+dfsg-1ubuntu7~cloud0
- bionic
    libvirt 4.0.0-1ubuntu8
    qemu 1:2.11+dfsg-1ubuntu7.1

Which are the same except a minor bump which UCA will sync in a bit.

And jhobbs reports even the kernels are the same (Xenial with HWE).
So for now, ?!?

Ryan Beisner (1chb1n)
Changed in charm-nova-compute:
status: New → Invalid
Ryan Beisner (1chb1n) wrote :

We think this is an issue in libvirt, related to how it handles the sriov hardware in these machines.

Andrew McLeod (admcleod) wrote :

Further information: Using juju 2.4 beta2 I was able to deploy magpie on bionic in lxd and baremetal via MAAS.

Jason Hobbs (jason-hobbs) wrote :

The deploy works fine with juju 2.4 beta 2 and xenial/queens.

package versions: http://paste.ubuntu.com/p/PF7Jb7gxnX/

we do see this in nova-compute.log, but it's not fatal:
http://paste.ubuntu.com/p/Dh4ZGVTtH8/

Jason Hobbs (jason-hobbs) wrote :

This looks like it is specific to this hardware and the way it does VFs and PFs, so I'm removing field-high.

Jason Hobbs (jason-hobbs) wrote :

given it works with the same libvirt and kernel on 16.04 but not 18.04, I'm suspicious of netplan here.

Steve Langasek (vorlon) wrote :

> I'm suspicious of netplan here.

netplan is only the messenger here, between cloud-init+juju and networkd. Can you show the complete netplan yaml as it's been laid down on the system in question?

Jason Hobbs (jason-hobbs) wrote :

steve captured what I meant in #8 better than I did: 17:46 < slangasek> one could as accurately say "I'm suspicious this is related to us replacing the whole networking stack in Ubuntu" ;-)

Ryan Harper (raharper) wrote :

Please capture:

1) cloud-init collect-logs (writes cloud-init.tar to $CWD)
2) the journal /var/log/journal
3) /etc/netplan and /run/systemd
4) /etc/udev/rules.d
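
Items 2 to 4 of the list above could be bundled into one archive for attaching to the bug. This is only a sketch, not a replacement for `cloud-init collect-logs`: the function name, the default archive name, and the choice to skip missing paths silently are my assumptions, and reading these directories typically needs root.

```python
import os
import tarfile
import tempfile

# Paths taken from the request above (items 2 to 4).
PATHS = ["/var/log/journal", "/etc/netplan", "/run/systemd", "/etc/udev/rules.d"]


def archive_existing(paths, out_path=None):
    """Add every path that exists to a gzip'd tar; return the paths added."""
    if out_path is None:
        # Hypothetical default name, placed in the system temp directory.
        out_path = os.path.join(tempfile.gettempdir(), "bug-debug.tar.gz")
    added = []
    with tarfile.open(out_path, "w:gz") as tar:
        for path in paths:
            if os.path.exists(path):
                tar.add(path)  # directories are added recursively
                added.append(path)
    return added


if __name__ == "__main__":
    print(archive_existing(PATHS))
```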

Ryan Harper (raharper) wrote :

And for the xenial deployment version, can we get what's in /etc/network/interfaces* (including the .d)?

I'm generally curious w.r.t what interfaces are managed by the OS, and which ones are being delegated to the guests.

Ryan Harper (raharper) wrote :

To make it clearer, the hardware SRIOV device is different from normal:

<cpaelzer> TL;DR this special device has VFs that have NO PF associated
<cpaelzer> software doesn't understand this

Though per comment #3; it seems odd that a Xenial/Queens with the same kernel (HWE) works OK. So some tracing in libvirt/nova to confirm different paths, I think.

Jason Hobbs (jason-hobbs) wrote :

@rharper still working on getting the other stuff you've asked for, but here is the uname -a output from xenial vs bionic:
http://paste.ubuntu.com/p/rJDpK5SyW9/

Ryan Harper (raharper) wrote :

Some package level deltas that may be relevant:

ii linux-firmware 1.173
ii linux-firmware 1.157.18

ii pciutils 1:3.3.1-1.1ubuntu1.2
ii pciutils 1:3.5.2-1ubuntu

libvirt0:arm64 4.0.0-1ubuntu7~cloud0
libvirt0:arm64 4.0.0-1ubuntu8

Less likely to have an impact is the guest firmware, but it is a delta nonetheless:

qemu-efi-aarch64 0~20180205.c0d9813c-2
qemu-efi 0~20160408.ffea0a2c-2

Jason Hobbs (jason-hobbs) wrote :

@rharper here are the logs you asked for from the bionic deploy

Jason Hobbs (jason-hobbs) wrote :

all of /var/log and /etc from the bionic deploy.

Ryan Harper (raharper) wrote :

Thanks for the logs.

I generally don't see anything *fatal* to libvirt. In the nova logs, I can see that virsh capabilities returns host information. It is certainly failing to find the VFs on the SRIOV device; it's not clear whether that's because the device is misbehaving (the kernel events show the driver being reset: enP2p1s0f1 renamed to eth0, then eth0 renamed back to enP2p1s0f1, which can only happen if the driver has been reset) or because probing the device's PCI address space is triggering a reset.

Note that netplan has no skin in this game; it applies a DHCP and DNS config to enP2p1s0f3 which stays up the whole time, juju even bridges en..f3 etc. The other interfaces found during boot are set to "manual" config; that is netplan writes a .link file for setting the name, but note that the name is the predictable name it gets from the default udev policy anyhow.

At this point, we can compare the logs to Xenial, but I think the next step is back to the charms/nova-compute to determine how a node reports back to openstack that a compute node is ready.

Christian Ehrhardt (paelzer) wrote :

Newly deployed Cavium System with 18.04 to get my own view onto this (without openstack/charms in the way)

1. start a basic guest
   $ sudo apt install uvtool-libvirt qemu-efi-aarch64
   $ uvt-simplestreams-libvirt --verbose sync --source http://cloud-images.ubuntu.com/daily arch=arm64 label=daily release=bionic
   $ uvt-kvm create --password=ubuntu b1 release=bionic arch=arm64 label=daily

=> Just works, nothing special in logs
Since it was stated that the special VFs/PFs are not used, this already breaks the argument made in the bug report: my guest just works on this system.

2. check the odd PF/VF situation

Please note that I had only the initial renames to the new naming scheme, but no others:
dmesg | grep renamed
[ 10.450002] thunder-nicvf 0002:01:00.2 enP2p1s0f2: renamed from eth1
[ 10.489989] thunder-nicvf 0002:01:00.1 enP2p1s0f1: renamed from eth0
[ 10.629936] thunder-nicvf 0002:01:00.4 enP2p1s0f4: renamed from eth3
[ 10.877936] thunder-nicvf 0002:01:00.3 enP2p1s0f3: renamed from eth2
[ 10.957933] thunder-nicvf 0002:01:00.5 enP2p1s0f5: renamed from eth4

None of the devices has a phys_port_id, but that is not fatal,
because I found the same on other platforms; e.g. on ppc64el some have it and some don't:
/sys/devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:09.0/0003:09:00.0/net/enP3p9s0f0/phys_port_id': Operation not supported
/sys/devices/pci0005:00/0005:00:00.0/0005:01:00.3/net/enP5p1s0f3/phys_port_id 0400000000334233343130363730453131

It will just use NULL, which essentially means there is just one phys port, and that is fine.

It is more interesting that it later checks physfn which exists on Cavium (but not on ppc64 for example)
ll /sys/devices/pci0002:00/0002:00:02.0/0002:01:01.4/physfn
lrwxrwxrwx 1 root root 0 May 18 06:23 /sys/devices/pci0002:00/0002:00:02.0/0002:01:01.4/physfn -> ../0002:01:00.0/

If this did NOT exist, it would give up here.
But it does exist, so it tries to go on with it and then fails as it doesn't find anything.
That would match what we read in the reported upstream mail discussion.

But none of this matters as per jhobbs it should not use those devices at all.

FYI code in libvirt around that:
virNetDevGetPhysicalFunction
-> virNetDevGetPhysPortID
   -> virNetDevSysfsFile
   This gives you something like
   /sys/devices/pci0002:00/0002:00:02.0/0002:01:00.4/net/enP2p1s0f4/phys_port_id
-> virNetDevSysfsDeviceFile
-> virPCIGetNetName
If none of these functions failed BUT returned no path then the reported message appears.
On other HW it either works OR just doesn't find the paths and gives up before the error message.
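
The three outcomes described above can be reproduced against a fake sysfs tree. This is a Python sketch of the decision as I read it from this comment, not libvirt's actual code: it assumes the PF's network device is looked up as an entry under physfn/net, and `classify_vf` and `demo` are hypothetical names.

```python
import os
import tempfile


def classify_vf(pci_dev_dir):
    """Classify a PCI device directory by its physfn link and the PF's netdev."""
    physfn = os.path.join(pci_dev_dir, "physfn")
    if not os.path.exists(physfn):
        # No physfn link: the lookup gives up quietly (the ppc64 case above).
        return ("no-physfn", None)
    try:
        names = sorted(os.listdir(os.path.join(physfn, "net")))
    except FileNotFoundError:
        names = []
    if names:
        # Normal SR-IOV hardware: the PF exposes a network device.
        return ("pf-netdev", names[0])
    # physfn exists but leads to no netdev: the situation reported here.
    return ("pf-without-netdev", None)


def demo():
    """Build a throwaway tree and classify a plain device and a VF."""
    with tempfile.TemporaryDirectory() as root:
        pf = os.path.join(root, "0002:01:00.0")
        vf = os.path.join(root, "0002:01:01.4")
        plain = os.path.join(root, "0003:09:00.0")
        for directory in (pf, vf, plain):
            os.makedirs(directory)
        # Relative link, like the physfn symlink shown in the listing above.
        os.symlink("../0002:01:00.0", os.path.join(vf, "physfn"))
        before = classify_vf(vf)
        os.makedirs(os.path.join(pf, "net", "eth0"))  # give the PF a netdev
        after = classify_vf(vf)
        return classify_vf(plain), before, after


if __name__ == "__main__":
    print(demo())
```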

3. check libvirt capabilities and status
As I asked before, we would need to know the libvirt action that fails, as all I tried just works.

Also general probing like one would expect on an initial nova node setup:
  $ virsh capabilities
  $ virsh domcapabilities
  $ virsh sysinfo
  $ virsh nodeinfo
works just fine without the reported errors.

4. Let's even use those devices now
The host uses enP2p1s0f1, that is:
0002:01:00.1 Ethernet controller: Cavium, Inc. THUNDERX Network Interface Controller virtual function (rev 09)
So let's use its siblings
As passthrough host-interface
  0002...


Changed in libvirt (Ubuntu):
status: New → Incomplete
Jason Hobbs (jason-hobbs) wrote :

Christian, thanks for digging in. Yes, I really just set up base openstack and hit this condition. I'm not doing anything to set up devices as passthrough or anything along those lines, and I'm not trying to start instances.

Jason Hobbs (jason-hobbs) wrote :

@rharper, here are the logs you requested from the xenial deploy.

Changed in charm-nova-compute:
status: Invalid → New
Changed in libvirt (Ubuntu):
status: Incomplete → New
Jason Hobbs (jason-hobbs) wrote :

marked new on nova-compute-charm due to rharper's comment #18, and new on libvirt because I've posted all the requested logs now.

Ryan Harper (raharper) wrote : Re: [Bug 1771662] Re: libvirtError: Node device not found: no node device with matching name

Comparing the kernel logs, on Xenial, the second nic comes up:

May 22 15:00:27 aurorus kernel: [ 24.840500] IPv6: ADDRCONF(NETDEV_UP): enP2p1s0f2: link is not ready
May 22 15:00:27 aurorus kernel: [ 25.472391] thunder-nicvf 0002:01:00.2 enP2p1s0f2: Link is Up 10000 Mbps Full duplex

But on bionic, we only ever have f3 up. Note this isn't a network
configuration, but rather the state of the Nic and the switch.
It doesn't appear to matter, 0f3 is what gets bridged by juju anyhow.
But it does suggest that something is different.

There is a slight kernel version variance as well:

Xenial:
May 22 15:00:27 aurorus kernel: [ 0.000000] Linux version
4.15.0-22-generic (buildd@bos02-arm64-038) (gcc version 5.4.0 20160609
(Ubuntu/Lin

Bionic:
May 17 18:03:47 aurorus kernel: [ 0.000000] Linux version
4.15.0-20-generic (buildd@bos02-arm64-029) (gcc version 7.3.0
(Ubuntu/Linaro 7.3.

Looks like Xenial does not use unified cgroup namespaces; not sure
what effect this may have on what's running in those lxd juju
containers.

% grep DENIED *.log
bionic.log:May 17 18:19:33 aurorus kernel: [ 983.592228] audit:
type=1400 audit(1526581173.043:70): apparmor="DENIED"
operation="mount" info="failed flags match" error=-13
profile="lxd-juju-657fe9-1-lxd-1_</var/lib/lxd>"
name="/sys/fs/cgroup/unified/" pid=24143 comm="systemd"
fstype="cgroup2" srcname="cgroup" flags="rw, nosuid, nodev, noexec"
bionic.log:May 17 18:19:33 aurorus kernel: [ 983.592476] audit:
type=1400 audit(1526581173.043:71): apparmor="DENIED"
operation="mount" info="failed flags match" error=-13
profile="lxd-juju-657fe9-1-lxd-1_</var/lib/lxd>"
name="/sys/fs/cgroup/unified/" pid=24143 comm="systemd"
fstype="cgroup2" srcname="cgroup" flags="rw, nosuid, nodev, noexec"
bionic.log:May 17 18:19:41 aurorus kernel: [ 991.818402] audit:
type=1400 audit(1526581181.267:88): apparmor="DENIED"
operation="mount" info="failed flags match" error=-13
profile="lxd-juju-657fe9-1-lxd-1_</var/lib/lxd>"
name="/run/systemd/unit-root/var/lib/lxcfs/" pid=24757
comm="(networkd)" flags="ro, nosuid, nodev, remount, bind"
bionic.log:May 17 18:19:46 aurorus kernel: [ 997.271203] audit:
type=1400 audit(1526581186.719:90): apparmor="DENIED"
operation="mount" info="failed flags match" error=-13
profile="lxd-juju-657fe9-1-lxd-2_</var/lib/lxd>"
name="/sys/fs/cgroup/unified/" pid=25227 comm="systemd"
fstype="cgroup2" srcname="cgroup" flags="rw, nosuid, nodev, noexec"
bionic.log:May 17 18:19:46 aurorus kernel: [ 997.271425] audit:
type=1400 audit(1526581186.723:91): apparmor="DENIED"
operation="mount" info="failed flags match" error=-13
profile="lxd-juju-657fe9-1-lxd-2_</var/lib/lxd>"
name="/sys/fs/cgroup/unified/" pid=25227 comm="systemd"
fstype="cgroup2" srcname="cgroup" flags="rw, nosuid, nodev, noexec"
bionic.log:May 17 18:19:55 aurorus kernel: [ 1006.285863] audit:
type=1400 audit(1526581195.735:108): apparmor="DENIED"
operation="mount" info="failed flags match" error=-13
profile="lxd-juju-657fe9-1-lxd-2_</var/lib/lxd>"
name="/run/systemd/unit-root/" pid=26209 comm="(networkd)" flags="ro,
remount, bind"
bionic.log:May 17 18:20:12 aurorus kernel: [ 1022.760512] audit:
type=1400 audit(1526581212.211:110): apparmor="D...
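
To summarise denials like these without reading each record, the key="value" fields can be pulled out and counted per profile and mount target. This is a rough Python sketch that assumes each record has first been rejoined onto one line (the mail wrapping above splits them); the function names are hypothetical.

```python
import re
from collections import Counter

# Matches key="value" pairs; unquoted fields such as pid=... are ignored.
_FIELD = re.compile(r'(\w+)="([^"]*)"')


def audit_fields(record):
    """Return the quoted key/value fields of one audit record as a dict."""
    return dict(_FIELD.findall(record))


def denied_mount_targets(records):
    """Count apparmor mount denials per (profile, name) pair."""
    counts = Counter()
    for record in records:
        fields = audit_fields(record)
        if fields.get("apparmor") == "DENIED" and fields.get("operation") == "mount":
            counts[(fields.get("profile"), fields.get("name"))] += 1
    return counts


# Two records from the log above, rejoined onto single lines.
RECORDS = [
    'bionic.log:May 17 18:19:33 aurorus kernel: [ 983.592228] audit: '
    'type=1400 audit(1526581173.043:70): apparmor="DENIED" operation="mount" '
    'info="failed flags match" error=-13 '
    'profile="lxd-juju-657fe9-1-lxd-1_</var/lib/lxd>" '
    'name="/sys/fs/cgroup/unified/" pid=24143 comm="systemd" '
    'fstype="cgroup2" srcname="cgroup" flags="rw, nosuid, nodev, noexec"',
    'bionic.log:May 17 18:19:41 aurorus kernel: [ 991.818402] audit: '
    'type=1400 audit(1526581181.267:88): apparmor="DENIED" operation="mount" '
    'info="failed flags match" error=-13 '
    'profile="lxd-juju-657fe9-1-lxd-1_</var/lib/lxd>" '
    'name="/run/systemd/unit-root/var/lib/lxcfs/" pid=24757 '
    'comm="(networkd)" flags="ro, nosuid, nodev, remount, bind"',
]

if __name__ == "__main__":
    for key, count in denied_mount_targets(RECORDS).items():
        print(count, key)
```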


Jason Hobbs (jason-hobbs) wrote :

ok; looks like that 4.15.0-22-generic just released and wasn't what I
used in the first reproduction... I doubt that's it.


Jason Hobbs (jason-hobbs) wrote :

Do you really want a tar? How about ls -alR? xenial:

http://paste.ubuntu.com/p/wyQ3kTsyBB/


Jason Hobbs (jason-hobbs) wrote :

cd /sys/bus/pci/devices && grep -nr . *

xenial:
http://paste.ubuntu.com/p/F5qyvN2Qrr/


Ryan Harper (raharper) wrote :

Looks like the ls -aLR contains more data; we can compare bionic.


Jason Hobbs (jason-hobbs) wrote :

ls -alR /sys on bionic http://paste.ubuntu.com/p/nrxyRGP3By/

The bionic kernel has also bumped:
Linux aurorus 4.15.0-22-generic #24-Ubuntu SMP Wed May 16 12:14:36 UTC
2018 aarch64 aarch64 aarch64 GNU/Linux


Revision history for this message
Ryan Beisner (1chb1n) wrote : Re: libvirtError: Node device not found: no node device with matching name

If this is a bug on the OpenStack side, it's not in the charm. It would be in nova proper.

Changed in charm-nova-compute:
status: New → Opinion
Revision history for this message
Ryan Harper (raharper) wrote : Re: [Bug 1771662] Re: libvirtError: Node device not found: no node device with matching name

After comparing the sysfs data, I don't see any differences w.r.t. the
physical paths in sysfs for the thunder NIC.

I wonder if there is something that detects "xenial" and does one
thing vs. "bionic", despite the xenial host using the same kernel
level.
The apparmor denial on the namespaces only shows up under bionic, but
both kernels are at the same level, so we should be seeing the
same errors if both stacks were using the same cgroups.

Can we check charms, juju or lxd w.r.t. how those cgroups are
mounted? That may not be related, but we're running out of
differences.

On Tue, May 22, 2018 at 9:21 PM, Jason Hobbs <email address hidden> wrote:
> ls -alR /sys on bionic http://paste.ubuntu.com/p/nrxyRGP3By/
>
> The bionic kernel has also bumped:
> Linux aurorus 4.15.0-22-generic #24-Ubuntu SMP Wed May 16 12:14:36 UTC
> 2018 aarch64 aarch64 aarch64 GNU/Linux
>
> On Tue, May 22, 2018 at 7:10 PM, Ryan Harper <email address hidden> wrote:
>> Looks like the ls -aLR contains more data; we can compare bionic.
>>
>> On Tue, May 22, 2018 at 6:53 PM, Jason Hobbs <email address hidden> wrote:
>>> cd /sys/bus/pci/devices && grep -nr . *
>>>
>>> xenial:
>>> http://paste.ubuntu.com/p/F5qyvN2Qrr/
>>>
>>> On Tue, May 22, 2018 at 5:27 PM, Jason Hobbs <email address hidden> wrote:
>>>> Do you really want a tar? How about ls -alR? xenial:
>>>>
>>>> http://paste.ubuntu.com/p/wyQ3kTsyBB/
>>>>
>>>> On Tue, May 22, 2018 at 5:14 PM, Jason Hobbs <email address hidden> wrote:
>>>>> ok; looks like that 4.15.0-22-generic just released and wasn't what I
>>>>> used in the first reproduction... I doubt that's it.
>>>>>

Revision history for this message
Chris Gregan (cgregan) wrote : Re: libvirtError: Node device not found: no node device with matching name

This defect seems to have stalled somewhat. Is there more information we can gather for this to move forward again?

Revision history for this message
Ryan Harper (raharper) wrote : Re: [Bug 1771662] Re: libvirtError: Node device not found: no node device with matching name

I'm not certain we can rule out the charm; the observed behavior is
that the compute nodes do not get enrolled.
Certainly the lack of a nova-compute node being registered has some
touch point with the charms.
The follow-up, I think, comes from the OpenStack team to walk through
where the charm leaves off with the nova-compute package,
then how nova-compute interacts with libvirt and what ultimately
triggers the registration of a compute node with the cloud.

Christian and I have looked at the logs, and while libvirt and
nova-compute are noisy w.r.t. the virtual functions,
the node does not appear to be prevented from launching a guest, but
that could be confirmed to help rule out where
the failure to register the compute node is happening.

@Beisner thoughts?


Revision history for this message
Ryan Beisner (1chb1n) wrote : Re: libvirtError: Node device not found: no node device with matching name

In order to make progress from the charm front, I would need access to at least one machine with the hardware which is specific to this bug, plus two adjacent machines for control/data plane. Can we arrange that access for openstack charms engineering?

Changed in charm-nova-compute:
status: Opinion → Incomplete
Revision history for this message
Ryan Beisner (1chb1n) wrote :

@raharper - I concur, there is a workflow gap in the nova-compute charm with regard to hypervisor registration success with nova, and I've raised a separate bug to address that generically. However, that won't fix this bug, it will just make it more visible by blocking the juju charm unit and juju charm application states.

https://bugs.launchpad.net/charm-nova-compute/+bug/1775690

Revision history for this message
Chris Gregan (cgregan) wrote :

Escalated due to the delay in triage and fix, given our contract with ARM.

Revision history for this message
David Britton (dpb) wrote :

Incomplete in libvirt pending debug from live system by openstack team.

Changed in libvirt (Ubuntu):
status: New → Incomplete
Revision history for this message
Ryan Beisner (1chb1n) wrote :

To be clear, on our lab machines (gigabyte arm64), we don't observe this issue with Bionic + Queens, hence the request to try to triage on the specific kit involved. Thanks!

Revision history for this message
Frode Nordahl (fnordahl) wrote :

1) The 'No compute node record for host phanpy: ComputeHostNotFound_Remote: Compute host phanpy could not be found.' message is benign; it appears on first start of the `nova-compute` service. It keeps appearing in the log here because registering the available resources fails. See 3).

2) Technically, the compute hosts are partially registered with `nova`:
$ nova service-list
+----+----------------+---------------------+----------+---------+-------+----------------------------+-----------------+
| Id | Binary         | Host                | Zone     | Status  | State | Updated_at                 | Disabled Reason |
+----+----------------+---------------------+----------+---------+-------+----------------------------+-----------------+
| 1  | nova-conductor | juju-302a0a-2-lxd-2 | internal | enabled | up    | 2018-06-29T10:28:00.000000 | -               |
| 14 | nova-scheduler | juju-302a0a-2-lxd-2 | internal | enabled | up    | 2018-06-29T10:28:01.000000 | -               |
| 15 | nova-compute   | phanpy              | nova     | enabled | up    | 2018-06-29T10:28:01.000000 | -               |
| 16 | nova-compute   | aurorus             | nova     | enabled | up    | 2018-06-29T10:28:05.000000 | -               |
| 26 | nova-compute   | zygarde             | nova     | enabled | up    | 2018-06-29T10:28:05.000000 | -               |
+----+----------------+---------------------+----------+---------+-------+----------------------------+-----------------+

3) However, the compute hosts do not have any resources. The reason no resources appear in `nova` is that the `nova-compute` service hits a traceback during initial host registration:

2018-06-29 06:25:57.161 35528 ERROR nova.compute.manager Traceback (most recent call last):
2018-06-29 06:25:57.161 35528 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 7277, in update_available_resource_for_node
2018-06-29 06:25:57.161 35528 ERROR nova.compute.manager rt.update_available_resource(context, nodename)
2018-06-29 06:25:57.161 35528 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/compute/resource_tracker.py", line 664, in update_available_resource
2018-06-29 06:25:57.161 35528 ERROR nova.compute.manager resources = self.driver.get_available_resource(nodename)
2018-06-29 06:25:57.161 35528 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 6438, in get_available_resource
2018-06-29 06:25:57.161 35528 ERROR nova.compute.manager self._get_pci_passthrough_devices()
2018-06-29 06:25:57.161 35528 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5945, in _get_pci_passthrough_devices
2018-06-29 06:25:57.161 35528 ERROR nova.compute.manager pci_info.append(self._get_pcidev_info(name))
2018-06-29 06:25:57.161 35528 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5906, in _get_pcidev_info
2018-06-29 06:25:57.161 35528 ERROR nova.compute.manager device.update(_get_device_capabilities(device, address))
2018-06-29 06:25:57.161 35528 ERROR nova....


Revision history for this message
Frode Nordahl (fnordahl) wrote :
Changed in charm-nova-compute:
status: Incomplete → Invalid
Ryan Beisner (1chb1n)
Changed in charm-nova-compute:
assignee: nobody → Frode Nordahl (fnordahl)
Revision history for this message
Frode Nordahl (fnordahl) wrote :

Executive summary for kernel team:
What makes both libvirt and Nova unhappy about the Cavium ThunderX NIC is that reads of the sysfs node phys_port_id on its virtual functions fail with "Operation not supported".

Example:
'/sys/devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:09.0/0003:09:00.0/net/enP3p9s0f0/phys_port_id': Operation not supported
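
The same check can be run across every interface on a host. A minimal sketch, assuming only the usual sysfs layout /sys/class/net/<iface>/phys_port_id; the function name and the sysfs_root parameter are made up for illustration and are not part of libvirt or nova:

```python
import errno
import os


def probe_phys_port_id(sysfs_root="/sys/class/net"):
    """Try to read phys_port_id for every entry under sysfs_root.

    Returns {iface: ("ok", value)} or {iface: ("error", errno_name)}.
    """
    results = {}
    for iface in sorted(os.listdir(sysfs_root)):
        path = os.path.join(sysfs_root, iface, "phys_port_id")
        try:
            with open(path) as f:
                results[iface] = ("ok", f.read().strip())
        except OSError as exc:
            # A failed read is recorded by errno name instead of being raised,
            # so one unsupported device does not abort the whole scan.
            name = errno.errorcode.get(exc.errno, str(exc.errno))
            results[iface] = ("error", name)
    return results
```

Calling probe_phys_port_id() with no arguments scans the live system; interfaces whose read is rejected the way the example above is would be listed as errors rather than stopping the scan.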

Ryan Beisner (1chb1n)
summary: - libvirtError: Node device not found: no node device with matching name
+ [bionic] libvirtError: Node device not found: no node device with
+ matching name
Revision history for this message
Paolo Pisati (p-pisati) wrote :

Actually the -EOPNOTSUPP error is the default behaviour unless your driver implements the .ndo_get_phys_port_id() callback, and at the moment (4.18-rc3) only 7 drivers (out of several hundred) implement it:

linux$ grep -ri do_get_phys_port_id drivers/net/
drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c: .ndo_get_phys_port_id = cxgb4_mgmt_get_phys_port_id,
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c: .ndo_get_phys_port_id = bnx2x_get_phys_port_id,
drivers/net/ethernet/mellanox/mlx4/en_netdev.c: .ndo_get_phys_port_id = mlx4_en_get_phys_port_id,
drivers/net/ethernet/mellanox/mlx4/en_netdev.c: .ndo_get_phys_port_id = mlx4_en_get_phys_port_id,
drivers/net/ethernet/sfc/efx.c: .ndo_get_phys_port_id = efx_get_phys_port_id,
drivers/net/ethernet/emulex/benet/be_main.c: .ndo_get_phys_port_id = be_get_phys_port_id,
drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c: .ndo_get_phys_port_id = qlcnic_get_phys_port_id,
drivers/net/ethernet/intel/i40e/i40e_main.c: .ndo_get_phys_port_id = i40e_get_phys_port_id,

or you fall back to the "default behaviour" in net/core/dev.c:

/**
 * dev_get_phys_port_id - Get device physical port ID
 * @dev: device
 * @ppid: port ID
 *
 * Get device physical port ID
 */
int dev_get_phys_port_id(struct net_device *dev,
                         struct netdev_phys_item_id *ppid)
{
        const struct net_device_ops *ops = dev->netdev_ops;

        if (!ops->ndo_get_phys_port_id)
                return -EOPNOTSUPP;
        return ops->ndo_get_phys_port_id(dev, ppid);
}
EXPORT_SYMBOL(dev_get_phys_port_id);

Revision history for this message
Frode Nordahl (fnordahl) wrote :

That is interesting indeed.

The difference is that on other systems the virtual functions are disabled by default, which I guess is why no one is running into this problem with other hardware.

An example from a system with the ixgbe driver:
# cat /sys/devices/pci0000\:00/0000\:00\:03.0/0000\:01\:00.1/sriov_totalvfs
63
# cat /sys/devices/pci0000\:00/0000\:00\:03.0/0000\:01\:00.1/sriov_numvfs
0
# ls -l /sys/devices/pci0000\:00/0000\:00\:03.0/0000\:01\:00.1/virtfn*
ls: cannot access '/sys/devices/pci0000:00/0000:00:03.0/0000:01:00.1/virtfn*': No such file or directory

While on a system with the Cavium Thunder X card:
# cat /sys/devices/pci0002\:00/0002\:00\:02.0/0002\:01\:00.0/sriov_totalvfs
128
# cat /sys/devices/pci0002\:00/0002\:00\:02.0/0002\:01\:00.0/sriov_numvfs
18
# ls -l /sys/devices/pci0002\:00/0002\:00\:02.0/0002\:01\:00.0/virtfn*
lrwxrwxrwx 1 root root 0 Jun 29 08:24 /sys/devices/pci0002:00/0002:00:02.0/0002:01:00.0/virtfn0 -> ../0002:01:00.1
lrwxrwxrwx 1 root root 0 Jun 29 08:24 /sys/devices/pci0002:00/0002:00:02.0/0002:01:00.0/virtfn1 -> ../0002:01:00.2
lrwxrwxrwx 1 root root 0 Jun 29 08:24 /sys/devices/pci0002:00/0002:00:02.0/0002:01:00.0/virtfn10 -> ../0002:01:01.3
lrwxrwxrwx 1 root root 0 Jun 29 08:24 /sys/devices/pci0002:00/0002:00:02.0/0002:01:00.0/virtfn11 -> ../0002:01:01.4
lrwxrwxrwx 1 root root 0 Jun 29 08:24 /sys/devices/pci0002:00/0002:00:02.0/0002:01:00.0/virtfn12 -> ../0002:01:01.5
lrwxrwxrwx 1 root root 0 Jun 29 08:24 /sys/devices/pci0002:00/0002:00:02.0/0002:01:00.0/virtfn13 -> ../0002:01:01.6
lrwxrwxrwx 1 root root 0 Jun 29 08:24 /sys/devices/pci0002:00/0002:00:02.0/0002:01:00.0/virtfn14 -> ../0002:01:01.7
lrwxrwxrwx 1 root root 0 Jun 29 08:24 /sys/devices/pci0002:00/0002:00:02.0/0002:01:00.0/virtfn15 -> ../0002:01:02.0
lrwxrwxrwx 1 root root 0 Jun 29 08:24 /sys/devices/pci0002:00/0002:00:02.0/0002:01:00.0/virtfn16 -> ../0002:01:02.1
lrwxrwxrwx 1 root root 0 Jun 29 08:24 /sys/devices/pci0002:00/0002:00:02.0/0002:01:00.0/virtfn17 -> ../0002:01:02.2
lrwxrwxrwx 1 root root 0 Jun 29 08:24 /sys/devices/pci0002:00/0002:00:02.0/0002:01:00.0/virtfn2 -> ../0002:01:00.3
lrwxrwxrwx 1 root root 0 Jun 29 08:24 /sys/devices/pci0002:00/0002:00:02.0/0002:01:00.0/virtfn3 -> ../0002:01:00.4
lrwxrwxrwx 1 root root 0 Jun 29 08:24 /sys/devices/pci0002:00/0002:00:02.0/0002:01:00.0/virtfn4 -> ../0002:01:00.5
lrwxrwxrwx 1 root root 0 Jun 29 08:24 /sys/devices/pci0002:00/0002:00:02.0/0002:01:00.0/virtfn5 -> ../0002:01:00.6
lrwxrwxrwx 1 root root 0 Jun 29 08:24 /sys/devices/pci0002:00/0002:00:02.0/0002:01:00.0/virtfn6 -> ../0002:01:00.7
lrwxrwxrwx 1 root root 0 Jun 29 08:24 /sys/devices/pci0002:00/0002:00:02.0/0002:01:00.0/virtfn7 -> ../0002:01:01.0
lrwxrwxrwx 1 root root 0 Jun 29 08:24 /sys/devices/pci0002:00/0002:00:02.0/0002:01:00.0/virtfn8 -> ../0002:01:01.1
lrwxrwxrwx 1 root root 0 Jun 29 08:24 /sys/devices/pci0002:00/0002:00:02.0/0002:01:00.0/virtfn9 -> ../0002:01:01.2
# cat /sys/devices/pci0002\:00/0002\:00\:02.0/0002\:01\:00.0/virtfn0/net/enP2p1s0f1/phys_port_id
cat: '/sys/devices/pci0002:00/0002:00:02.0/0002:01:00.0/virtfn0/net/enP2p1s0f1/phys_port_id': Operation not supported

And this is on by default, without an operator having...
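
The sysfs inspection above can be scripted. A minimal sketch, assuming the sriov_numvfs file and virtfnN symlinks shown in the output; the function name is hypothetical:

```python
import os
import re


def list_virtual_functions(pf_path):
    """Return (sriov_numvfs, [(vf_index, vf_pci_address), ...]) for a PF directory."""
    with open(os.path.join(pf_path, "sriov_numvfs")) as f:
        numvfs = int(f.read().strip())

    vfs = []
    for entry in os.listdir(pf_path):
        match = re.fullmatch(r"virtfn(\d+)", entry)
        if match:
            # Each virtfnN is a symlink such as ../0002:01:00.1
            target = os.readlink(os.path.join(pf_path, entry))
            vfs.append((int(match.group(1)), os.path.basename(target)))
    # Sort numerically so virtfn10 follows virtfn9, unlike the ls output.
    return numvfs, sorted(vfs)
```

A device without SR-IOV support has no sriov_numvfs file, so the function raises FileNotFoundError for it; callers that scan every PCI device would need to catch that.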


Revision history for this message
mahmoh (mahmoh) wrote :

Hey guys,

This looks a lot like a problem Linaro reported [1] and if so it's a Cavium driver bug that I believe they're working on.

Have you tried this on any other Arm hardware platform to see if you hit the same problem?

[1] https://bugs.linaro.org/show_bug.cgi?id=3778

Thank you.

Revision history for this message
James Page (james-page) wrote :

@mahmoh

Thanks for the reference - yes we believe this is the same issue and we're not seeing it on other platforms.

Revision history for this message
dann frazier (dannf) wrote :

I tracked down why libvirt generates the message "internal error: The PF device for VF XXX has no network device name" on startup when built in bionic, but not in xenial. When populating device capabilities, we see:

virNetDevGetFeatures():
  virNetDevSwitchdevFeature():
    virNetDevGetPhysicalFunction()

However, virNetDevSwitchdevFeature() is stubbed out at build time unless HAVE_DECL_DEVLINK_CMD_ESWITCH_GET is defined. In bionic, this is defined in /usr/include/linux/devlink.h, which didn't exist in xenial.

Since all of our OpenStack/arm64 testing on bionic is blocked because our test systems all happen to be impacted by https://bugs.linaro.org/show_bug.cgi?id=3778 , I'm wondering if there's some kind of temporary hack we can carry to detect these devices, disable some set of (currently broken) features, and allow our testing to proceed until this problem is addressed upstream.

My understanding is that our testing succeeds with xenial, but fails with bionic, while the source version of libvirt remains constant. I therefore wonder if virNetDevSwitchdevFeature() is the (only) thing causing this to escalate to nova failure. In that case, could we e.g. compare vendor/device ids, and add a hack to return 0 if they match?

I've pushed a libvirt build to ppa:dannf/test with such a hack, if someone w/ a full arm64 openstack setup can try it to see if it would be able to unblock us.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Nice find Dann!

There are already examples of excluding some devices from that check.
E.g. in libvirt 4.1 non-PCI devices were excluded [1].

I like your change even being only a workaround so far.
Linaro has adopted a much more aggressive temp fix [2] with more compatibility issues.
The only thing that bothers me is that if there is a fix in something other than libvirt (e.g. the Cavium kernel driver), the workaround will still block the usage.

The test on the PPA is great, but eventually I wonder if we could make this part of one of the conffiles and use something like virConfGetValueStringList [3], with a config default of 0x177d:0xa034 that an administrator could extend or trim.
It might even default to an empty list if that is more acceptable upstream, but allow installations to mask broken devices as needed.
Unfortunately none of the existing configs is used in the scope that we'd need it, which implies it would likely be a new config file [4] that is needed.
There is a lot of the usual overhead (check paths, permissions, ...) to be added just for that, but maybe it would make the hack upstreamable.
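
To make the idea concrete, here is a rough sketch in Python rather than libvirt's C. It assumes the vendor and device files that sysfs exposes for each PCI device; the names DEFAULT_MASKED_IDS, parse_masked_ids, read_pci_id and is_masked are invented for illustration and do not exist in libvirt:

```python
import os

# Default taken from the ID quoted above; an administrator could edit the list.
DEFAULT_MASKED_IDS = ["0x177d:0xa034"]


def parse_masked_ids(entries):
    """Turn 'vendor:device' strings into a set of (vendor, device) integer pairs."""
    masked = set()
    for entry in entries:
        vendor, device = entry.split(":")
        masked.add((int(vendor, 16), int(device, 16)))
    return masked


def read_pci_id(pci_dev_path):
    """Read the vendor and device IDs that sysfs exposes for a PCI device."""
    ids = []
    for name in ("vendor", "device"):
        with open(os.path.join(pci_dev_path, name)) as f:
            ids.append(int(f.read().strip(), 16))
    return tuple(ids)


def is_masked(pci_dev_path, masked_ids):
    """True if the feature probe should be skipped for this device."""
    return read_pci_id(pci_dev_path) in masked_ids
```

An empty list masks nothing, which matches the "default to an empty list" variant mentioned above.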

Hmm, OTOH maybe that would be over-engineering and we should just use the simple change you suggested, which would declare this network card as not supporting switchdev offloads (even if fixed in the kernel driver, it is unlikely to reach Bionic trivially other than maybe via HWE kernels).

But for now let's see what result the test on your PPA delivers.

[1]: https://libvirt.org/git/?p=libvirt.git;a=commit;h=71d56a397925a1bd55d3aee30afdbdcd1a14f9a8
[2]: https://git.linaro.org/people/radoslaw.biernacki/libvirt.git/commit/?h=wip_thunder_fix&id=da79ade2f18bec11d1436dc12980f32b12fbad3c
[3]: https://libvirt.org/git/?p=libvirt.git;a=blob;f=src/util/virconf.c;h=e0a3fd12c04f9df0ae2a3a7054292f1093ab8693;hb=HEAD#l936
[4]: https://libvirt.org/git/?p=libvirt.git;a=blob;f=src/util/virconf.c;h=e0a3fd12c04f9df0ae2a3a7054292f1093ab8693;hb=HEAD#l746

Revision history for this message
Andrew McLeod (admcleod) wrote :

Have tested libvirt-bin from dann's PPA re comment 45 with partial success.

This results in hypervisors being listed via openstack hypervisor list, and instances can be launched.

https://pastebin.canonical.com/p/pDDmYQsvSr/

However, the instance build is not really successful - there is no network connectivity to the tenant network, as the guest instance just drops to an initramfs prompt.

https://pastebin.canonical.com/p/N2JQPvfsGv/

Revision history for this message
dann frazier (dannf) wrote :

@Andrew: thanks. Hard to tell what the root cause is there - might need more logs. The only obvious concern I see here is:

[ 0.062805] acpi PNP0A08:00: Bus 0000:00 not present in PCI namespace

Would you mind trying a newer guest (bionic/cosmic) and seeing if that is any better? The xenial GA kernel is lacking a lot of ACPI support, and maybe libvirt has grown to expect more.

Revision history for this message
Andrew McLeod (admcleod) wrote :

@Dann - good news, I am able to deploy a bionic guest using the non-uefi image (still tagged as such) and connect to it with bionic-rocky:

Result of running uname -a on 10.245.172.3: Linux bionic-101103 4.15.0-34-generic #37-Ubuntu SMP Mon Aug 27 15:22:18 UTC 2018 aarch64 aarch64 aarch64 GNU/Linux

Will also now validate on bionic-queens

Revision history for this message
dann frazier (dannf) wrote :

Now that testing is looking good, here's a cleaned-up debdiff that is hopefully more suitable for carrying. It is as used in the cosmic build in ppa:dannf/test. This is still just the simple hack though (vs. a new config) - just with improved function name/return type and w/ a DEP-3 header.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Nice, did you intend to propose that upstream?
Or was the intention to carry this sort of forever in Ubuntu only?

The Linaro bug was sort of "we keep it for ourselves"; I wonder if we should try to get upstream feedback?

Revision history for this message
dann frazier (dannf) wrote :

My original intent was to just carry it until there's a proper fix upstream, and we can evaluate that for backporting. But, if you think generic black-listing is a desired feature upstream (but w/ e.g. configurable ids), I can take a look at implementing that. However, that'll take some time, so I'd still be in favor of merging the existing patch to unblock running Ubuntu OpenStack on these systems in the short term.

tags: added: patch
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I'd not go the "implement the perfect thing" route without strictly requiring it.

I only meant to get upstream's thoughts on it.
They don't have to accept it "that way" for us to consider it as a fix.
But suggesting it there on the old thread [1], which has been dead since then, would help.

I would not expect them to take it as-is, but who knows; and vice versa, they might spot an issue with doing it that way when reviewing it.

No matter what, submitting it upstream would help to ensure the patch is good and to know whether we need more (or not) to eventually bring it upstream.
Please CC me on that upstream submission so I can stay in the loop.

FYI: I'm out a week after today, but there is no libvirt upload in flight for cosmic. I checked the PPA [2] and found no issues with it (and it should be the same for 18.10, although I haven't tested it), so after you got a generally positive upstream reply feel free to upload to Cosmic next week if you want.
Otherwise I'll take a look when I'm back.

[1]: https://www.redhat.com/archives/libvir-list/2018-March/msg01383.html
[2]: https://launchpad.net/~dannf/+archive/ubuntu/test

Revision history for this message
Radosław Biernacki (rbiernacki) wrote :

As this is my first post, I would like to say Hello to everybody.
Seems that you are trying to fix the same issue as we are at Cavium/Linaro [1], so I would like to share some findings with you, which I hope you will find helpful. It's a rather long explanation ;)

So first of all, this issue is triggered by one of the ThunderX SoC devices, the VNIC network device to be exact. Its PF does not expose a netdev, which causes trouble for the libvirt code.

It means there is no interface name under which the PF device can be found in /sys or addressed using an RTM_SETLINK message (there are some alternatives to that, but more on this later).
From the information I have at hand, there is no requirement in the SR-IOV standard (nor any other) that a PF has to have the same functionality as its VFs, so IMHO all assumptions in the libvirt code about VF->PF->VF mapping using interface names are false. Basically that's why the proper fix is hard, as it requires a lot of rework in the virtnetdev layer of libvirt. Someone might argue that Intel cards expose such interfaces, but this does not mean that libvirt should assume that it is "standard" behavior.

I started working on those [2] but ended up with quite invasive fixes which for sure require some discussion on the libvirt-dev list (comments on [2] are more than welcome; I will also start an RFC on libvirt-dev for that).

The simple fix for this issue should:
- suppress the initialization error (more or less this is what Dann Frazier does in his patch [3], or what I'm doing in the first patch of the series [2])
- fix some NULL dereference bugs in libvirt (third and fourth patch in the series [2]) so libvirt will not crash when an <interface> config is given for a VNIC VF netdev, but just throw an error that, due to missing HW support, it is not able to configure the interface (I haven't introduced proper error messages yet in my patch set).

That's the fast fix, which I should be able to extract from my fixes quite easily, and it should be upstreamable.

---

Besides libvirt, for the full fix the ThunderX VNIC kernel driver needs to be enhanced, as it currently does not support VLANs and VF MAC setting, which is essential to make ThunderX VNIC a fully supported device under libvirt. In fact this is the major part of the work.

The true fix needs to:
- fix libvirt's wrong assumptions about SR-IOV netdevs and its handling of VF->PF->VF mapping by netdev names (instead, the PCI BDF should be used in the whole virtnetdev layer, or some non-netdev name, just a generic device-related addressing scheme)
- fix libvirt's PF VLAN and MAC handling code to use a global port number instead of the PF name for RTM_SETLINK
- make the ThunderX VNIC driver support VLANs per VF (MCAM), possibly with dynamic VF creation, as well as switchdev functionality (currently VFs are created based on the active port count read from BGX)
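
To illustrate what mapping by PCI address rather than by netdev name could look like, here is a small Python sketch (not libvirt code). It relies on the physfn symlink that sysfs provides in a VF's device directory and treats the PF's net/ directory as optional; the function name and the pci_root parameter are hypothetical:

```python
import os


def physical_function_of(vf_bdf, pci_root="/sys/bus/pci/devices"):
    """Map a VF PCI address to (pf_bdf, pf_netdev_names).

    pf_netdev_names is an empty list when the PF exposes no netdev.
    Returns None if the device has no physfn link, i.e. it is not a VF.
    """
    physfn = os.path.join(pci_root, vf_bdf, "physfn")
    if not os.path.islink(physfn):
        return None
    pf_bdf = os.path.basename(os.readlink(physfn))
    net_dir = os.path.join(pci_root, pf_bdf, "net")
    names = sorted(os.listdir(net_dir)) if os.path.isdir(net_dir) else []
    return pf_bdf, names
```

The point of the sketch is that the PF is still identified when it has no netdev: the address comes from the PCI topology, and the interface name is only extra information when present.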

---

@dann I understand the idea in your patch (filtering out the incompatible NICs), but can you take a look at [4]?

Not sure what exactly the intention of the code inside virNetDevSwitchdevFeature() was, but if you look at the following line:
pci_device_ptr = pfname ? virNetDevGetPCIDevice(pfname) : virNetDevGetPCIDevice(ifname);
It might suggest th...


Revision history for this message
dann frazier (dannf) wrote :

@rbiernacki: Thanks for commenting here - and sorry for my delayed response as I was out on PTO. While I haven't tested it in our setup, I agree that your [4] is likely a better/more generic "fast fix" solution than what I prepared. Feel free to CC me on your RFC.

Revision history for this message
dann frazier (dannf) wrote :

@rbiernacki I just wanted to check - do you still have plans to propose your "fast fix" upstream?

Revision history for this message
Radosław Biernacki (rbiernacki) wrote :

Hi Dann, just sent the patches.
I also decided to share some other fixes, but the first one in the series is the one you are looking for.

Revision history for this message
dann frazier (dannf) wrote :

Thanks Radoslaw! I've created PPAs for the rocky & queens ubuntu cloud archives w/ your v2 patches integrated for testing.

ppa:dannf/queens-arm64
ppa:dannf/rocky-arm64

I plan to keep this up to date as your patch set iterates, as well as rebasing on latest QEMU until merged in mainline and backported to Ubuntu.

Revision history for this message
Radosław Biernacki (rbiernacki) wrote :

Thank you Dann.
Those fixes should unblock startup of libvirt on ThunderX. Keep in mind that only a hostdev config will work on this platform, as <interface type="hostdev"> is not supported.

<hostdev mode="subsystem" type="pci" managed="yes">
  <driver name="vfio"/>
  <source>
    <address type="pci" domain="0x0002" bus="0x1" slot="0x0" function="0x2"/>
  </source>
</hostdev>

In case you find any issues, keep me informed.
I will send v3. The changes for v3 are about where to report the error and should not affect functionality.
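
For anyone generating such a snippet per VF, a minimal Python sketch follows. It only mirrors the element and attribute names from the example above; the function name is made up, and the zero-padded attribute values it emits are an assumption about formatting, not something taken from this thread:

```python
import re
import xml.etree.ElementTree as ET


def hostdev_xml(bdf):
    """Build a <hostdev> element for a PCI address like '0002:01:00.2'."""
    match = re.fullmatch(
        r"([0-9a-fA-F]{4}):([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])", bdf
    )
    if not match:
        raise ValueError(f"not a PCI address: {bdf!r}")
    domain, bus, slot, function = (int(part, 16) for part in match.groups())

    hostdev = ET.Element("hostdev", mode="subsystem", type="pci", managed="yes")
    ET.SubElement(hostdev, "driver", name="vfio")
    source = ET.SubElement(hostdev, "source")
    ET.SubElement(
        source, "address", type="pci",
        domain=f"0x{domain:04x}", bus=f"0x{bus:02x}",
        slot=f"0x{slot:02x}", function=f"0x{function:x}",
    )
    return ET.tostring(hostdev, encoding="unicode")
```

Passing the address of the second VF from the earlier listing, hostdev_xml("0002:01:00.2"), yields an element equivalent to the hand-written one above.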

dann frazier (dannf)
Changed in libvirt (Ubuntu):
status: Incomplete → In Progress
Revision history for this message
dann frazier (dannf) wrote :

Fixes have now landed upstream:

04983c3c6a util: Fixing invalid error checking from virPCIGetNetname()
8fac64db5e util: Fix for NULL dereference
10bca495e0 util: Code simplification
6452e2f5e1 util: fixing wrong assumption that PF has to have netdev assigned

Changed in libvirt (Ubuntu Disco):
status: In Progress → Triaged
Changed in nova:
status: New → Invalid
Changed in libvirt (Ubuntu Cosmic):
status: New → Triaged
Changed in libvirt (Ubuntu Bionic):
status: New → Triaged
tags: added: libvirt-19.04
Revision history for this message
Radosław Biernacki (rbiernacki) wrote :

Thank you Dann for finishing this!

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 5.0.0-1ubuntu1

---------------
libvirt (5.0.0-1ubuntu1) disco; urgency=medium

  * Merged with Debian unstable
    Among many other new features and fixes this includes fixes for:
    LP: #1754871 - 1799446 zPCI passthrough support for KVM
    LP: #1811198 - remove arbitrary limit on socket_id/core_id
    Remaining changes:
    - Disable libssh2 support (universe dependency)
    - Disable firewalld support (universe dependency)
    - Set qemu-group to kvm (for compat with older ubuntu)
    - Additional apport package-hook
    - Autostart default bridged network (As upstream does, but not Debian).
      In addition to just enabling it our solution provides:
      + do not autostart if subnet is already taken (e.g. in guests).
      + iterate some alternative subnets before giving up
    - d/p/ubuntu/Allow-libvirt-group-to-access-the-socket.patch: This is
      the group based access to libvirt functions as it was used in Ubuntu
      for quite long.
      + d/p/ubuntu/daemon-augeas-fix-expected.patch fix some related tests
        due to the group access change.
      + d/libvirt-daemon-system.postinst: add users in sudo to the libvirt
        group.
    - ubuntu/parallel-shutdown.patch: set parallel shutdown by default.
    - Update Vcs-Git and Vcs-Browser fields to point to launchpad
    - Xen related
      - d/p/ubuntu/ubuntu-libxl-qemu-path.patch: this change was split. The
        section that adapts the path of the emulator to the Debian/Ubuntu
        packaging is kept.
      - d/p/ubuntu/ubuntu-libxl-Fix-up-VRAM-to-minimum-requirements.patch: auto
        set VRAM to minimum requirements
      - d/p/ubuntu/xen-default-uri.patch: set default URI on xen hosts
      - Add libxl log directory
      - libvirt-uri.sh: Automatically switch default libvirt URI for users on
        Xen dom0 via user profile (was missing on changelogs before)
    - d/p/ubuntu/apibuild-skip-libvirt-common.h: drop libvirt-common.h from
      included_files to avoid build failures due to duplicate definitions.
    - Update README.Debian with Ubuntu changes
    - Enable some additional features on ppc64el and s390x (for arch parity)
      + systemtap, zfs, numa and numad on s390x.
      + systemtap on ppc64el.
    - d/t/control, d/t/smoke-qemu-session: fixup smoke-qemu-session by making
      vmlinuz available and accessible (Debian bug 848314)
    - d/t/control, d/t/smoke-lxc: fix up lxc smoke test isolation
    - d/p/ubuntu/ubuntu_machine_type.patch: accept ubuntu types as pci440fx
    - Further upstreamed apparmor Delta, especially any new one
      Our former delta is split into logical pieces and is either Ubuntu only
      or is part of a continuous upstreaming effort.
      Listing related remaining changes in debian/patches/ubuntu-aa/:
      + 0001-apparmor-Allow-pygrub-to-run-on-Debian-Ubuntu.patch: apparmor:
        Allow pygrub to run on Debian/Ubuntu
      + 0003-apparmor-libvirt-qemu-Allow-read-access-to-overcommi.patch:
        apparmor, libvirt-qemu: Allow read access to overcommit_memory
      + 0007-apparmor-libvirt-qemu-Allow-owner-read-access-to-PRO.patch:
        apparmor, libvirt-qemu: Allow owner rea...

Changed in libvirt (Ubuntu Disco):
status: Triaged → Fix Released
description: updated
Christian Ehrhardt  (paelzer) wrote :

FYI - the fix for this bug is ready as an SRU and testable from a PPA for
Cosmic [1] and Bionic [2].

Since verifying this bug requires special hardware, I'd appreciate it if you could pre-check whether these PPAs fix the issue. That would ensure that:
a) the fix is most likely to work when pushed as an SRU
b) our plan to verify the actual SRU through your testing will work

In addition, I'll push these PPAs through the automated regression tests for qemu/libvirt.

[1]: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/3620
[2]: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/3621

dann frazier (dannf) wrote :

@Christian: both verify fine for me:

$ dpkg -s libvirt-daemon-system | grep ^Version
Version: 4.6.0-2ubuntu3.3~ppa1
$ sudo virsh nodedev-list | grep ^net
net_enP2p1s0f1_40_8d_5c_b1_e4_44
net_enP2p1s0f2_40_8d_5c_b1_e4_45
net_enP2p1s0f3_40_8d_5c_b1_e4_46

$ dpkg -s libvirt-daemon-system | grep ^Version
Version: 4.0.0-1ubuntu8.7~ppa1
$ sudo virsh nodedev-list | grep ^net
net_enP2p1s0f1_42_ca_74_64_88_75
net_enP2p1s0f2_42_ca_74_64_88_76
net_enP2p1s0f3_42_ca_74_64_88_77

description: updated
Christian Ehrhardt  (paelzer) wrote :

Thanks in advance to everybody involved for all the help!
All pre-checks are done, and the packages are uploaded to the SRU queue, waiting for approval.

Brian Murray (brian-murray) wrote : Please test proposed package

Hello Jason, or anyone else affected,

Accepted libvirt into cosmic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/libvirt/4.6.0-2ubuntu3.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-cosmic to verification-done-cosmic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-cosmic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in libvirt (Ubuntu Cosmic):
status: Triaged → Fix Committed
tags: added: verification-needed verification-needed-cosmic
Changed in libvirt (Ubuntu Bionic):
status: Triaged → Fix Committed
Brian Murray (brian-murray) wrote :

Hello Jason, or anyone else affected,

Accepted libvirt into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/libvirt/4.0.0-1ubuntu8.7 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

tags: added: verification-needed-bionic
dann frazier (dannf) wrote :

bionic verification:
ubuntu@seidel-FLAKYMEMORY:~$ dpkg -s libvirt-daemon-system | grep ^Version
Version: 4.0.0-1ubuntu8.7
ubuntu@seidel-FLAKYMEMORY:~$ sudo virsh nodedev-list | grep ^net
net_enP2p1s0f1_1c_1b_0d_0d_52_d6
net_enP2p1s0f2_1c_1b_0d_0d_52_d7
net_enP2p1s0f3_1c_1b_0d_0d_52_d8
net_enP2p1s0f4_1c_1b_0d_0d_52_d9
net_enP2p1s0f5_1c_1b_0d_0d_52_da

cosmic verification:
ubuntu@seuss-FLAKYMEMORY:~$ dpkg -s libvirt-daemon-system | grep ^Version
Version: 4.6.0-2ubuntu3.3
ubuntu@seuss-FLAKYMEMORY:~$ sudo virsh nodedev-list | grep ^net
net_enP2p1s0f1_40_8d_5c_ba_cd_c4
net_enP2p1s0f2_40_8d_5c_ba_cd_c5
net_enP2p1s0f3_40_8d_5c_ba_cd_c6
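
For anyone repeating the two checks above on other hardware, they could be wrapped roughly as follows. This is only a sketch: package_version and count_net_devices are hypothetical helper names (not part of dpkg or libvirt), and both simply filter the output of the commands already shown in this comment.

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical names, not part of dpkg or libvirt.

# Read `dpkg -s <package>` output on stdin and print the bare version string.
package_version() {
    sed -n 's/^Version: //p'
}

# Read `virsh nodedev-list` output on stdin and print how many net_* node
# devices it contains (prints 0 if there are none).
count_net_devices() {
    grep -c '^net_' || true
}

# Example use on a real host (needs libvirt installed and root for virsh):
#   dpkg -s libvirt-daemon-system | package_version
#   sudo virsh nodedev-list | count_net_devices
```

A count of 0 on a machine with onboard NICs would suggest the net devices are still not being listed; a non-zero count matches the fixed behaviour shown in the transcripts above.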

tags: added: verification-done-bionic verification-done-cosmic
removed: verification-needed-bionic verification-needed-cosmic
tags: added: verification-done
removed: verification-needed
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for libvirt has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 4.6.0-2ubuntu3.3

---------------
libvirt (4.6.0-2ubuntu3.3) cosmic; urgency=medium

  * d/p/ubuntu/lp-1811198-utils-Remove-arbitrary-limit-on-socket_id-core_id
    .patch: fix arm servers with high core_id (LP: #1811198)
  * d/p/ubuntu/lp-1771662-*: fix assumption that all VFs have PFs assigned
    (LP: #1771662)

 -- Christian Ehrhardt <email address hidden> Thu, 31 Jan 2019 12:29:37 +0100

Changed in libvirt (Ubuntu Cosmic):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 4.0.0-1ubuntu8.7

---------------
libvirt (4.0.0-1ubuntu8.7) bionic; urgency=medium

  * d/p/ubuntu/lp-1811198-utils-Remove-arbitrary-limit-on-socket_id-core_id
    .patch: fix arm servers with high core_id (LP: #1811198)
  * d/p/ubuntu/lp-1771662-*: fix assumption that all VFs have PFs assigned
    (LP: #1771662)

 -- Christian Ehrhardt <email address hidden> Thu, 31 Jan 2019 12:45:18 +0100

Changed in libvirt (Ubuntu Bionic):
status: Fix Committed → Fix Released