Net tools cause kernel soft lockup after DPDK touched VirtIO-pci devices

Bug #1570195 reported by Thiago Martins on 2016-04-14
Affects                Status        Importance  Assigned to
dpdk (Ubuntu)          Fix Released  Medium      Christian Ehrhardt
dpdk (Ubuntu Xenial)   Fix Released  Undecided   Unassigned
linux (Ubuntu)         Invalid       Medium      Unassigned
linux (Ubuntu Xenial)  Invalid       Undecided   Unassigned

Bug Description

Guys,

 I'm facing an issue here with both "ethtool" and "ip" while trying to manage VirtIO PCI devices that are black-listed by DPDK.

 You'll need an Ubuntu Xenial KVM guest, with 4 VirtIO vNICs, to run these tests.

 PCI device example from inside a Xenial guest:

---
# lspci | grep Ethernet
00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
00:04.0 Ethernet controller: Red Hat, Inc Virtio network device
00:05.0 Ethernet controller: Red Hat, Inc Virtio network device
00:06.0 Ethernet controller: Red Hat, Inc Virtio network device
---

Here "ens3" is the first / default interface, attached to Libvirt's "default" network; "ens4" is reserved for the "ethtool / ip" tests (attached to another Libvirt network without IPs or DHCP); "ens5" will become "dpdk0" and "ens6" "dpdk1"...

---
 *** How it works

 1- To enable multi-queue on the DPDK devices, boot your Xenial guest and run:

 ethtool -L ens5 combined 4
 ethtool -L ens6 combined 4

 2- Install openvswitch-switch-dpdk, configure DPDK and OVS, and fire it up.

 https://help.ubuntu.com/16.04/serverguide/DPDK.html

 service openvswitch-switch stop
 service dpdk stop

 OVS DPDK Options (/etc/default/openvswitch-switch):

--
DPDK_OPTS='--dpdk -c 0x1 -n 4 --socket-mem 1024 --pci-blacklist 0000:00:03.0,0000:00:04.0'
--

 service dpdk start
 service openvswitch-switch start

 - Enable multi-queue on OVS+DPDK inside of the VM:

 ovs-vsctl set Open_vSwitch . other_config:n-dpdk-rxqs=4
 ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0xff00

 * Multi-queue apparently works! ovs-vswitchd consumes more than 100% CPU, meaning that multi-queue is there...

 *** Where it fails

 1- Reboot the VM and try to run ethtool again (or go straight to 2 below):

 ethtool -L ens5 combined 4

 2- Try to fire up ens4:

 ip link set dev ens4 up

 # FAIL! Both commands hang, consuming 100% of the guest's CPU...
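Since a stuck ethtool pins a CPU and cannot be killed, it may help to wrap the reproduction step in timeout(1) so the lockup shows up as a failed command instead of a hung shell. This is a minimal sketch; the interface name, queue count, and 10-second limit are arbitrary example values:

```shell
# Hedged sketch: run the queue change under timeout(1) so a soft lockup
# surfaces as timeout's exit code 124 instead of a shell stuck at 100% CPU.
# "ens5", the queue count, and the 10s limit are example values.
check_queues() {
    timeout 10 ethtool -L "$1" combined "$2"
    rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "HANG: ethtool -L $1 timed out (virtio control queue not answering?)"
    elif [ "$rc" -ne 0 ]; then
        echo "FAIL: ethtool exited with status $rc"
    else
        echo "OK: $1 now has $2 combined queues"
    fi
    return "$rc"
}

# check_queues ens5 4
```

Exit code 124 is timeout(1)'s standard "command timed out" status, so it distinguishes the hang from an ordinary ethtool error.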

 So it looks like a Linux fault, because it "allows" the DPDK VirtIO app (a userland app) to interfere with kernel devices in a strange way...

Best,
Thiago

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-4.4.0-18-generic 4.4.0-18.34
ProcVersionSignature: Ubuntu 4.4.0-18.34-generic 4.4.6
Uname: Linux 4.4.0-18-generic x86_64
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Apr 14 00:35 seq
 crw-rw---- 1 root audio 116, 33 Apr 14 00:35 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.1-0ubuntu1
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: [Errno 2] No such file or directory: 'fuser'
CRDA: N/A
Date: Thu Apr 14 01:27:27 2016
HibernationDevice: RESUME=UUID=833e999c-e066-433c-b8a2-4324bb8d56de
InstallationDate: Installed on 2016-04-07 (7 days ago)
InstallationMedia: Ubuntu-Server 16.04 LTS "Xenial Xerus" - Beta amd64 (20160406)
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lsusb:
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
MachineType: QEMU Standard PC (i440FX + PIIX, 1996)
PciMultimedia:

ProcFB: 0 VESA VGA
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.4.0-18-generic root=UUID=9911604e-353b-491f-a0a9-804724350592 ro
RelatedPackageVersions:
 linux-restricted-modules-4.4.0-18-generic N/A
 linux-backports-modules-4.4.0-18-generic N/A
 linux-firmware N/A
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 04/01/2014
dmi.bios.vendor: SeaBIOS
dmi.bios.version: Ubuntu-1.8.2-1ubuntu1
dmi.chassis.type: 1
dmi.chassis.vendor: QEMU
dmi.chassis.version: pc-i440fx-wily
dmi.modalias: dmi:bvnSeaBIOS:bvrUbuntu-1.8.2-1ubuntu1:bd04/01/2014:svnQEMU:pnStandardPC(i440FX+PIIX,1996):pvrpc-i440fx-wily:cvnQEMU:ct1:cvrpc-i440fx-wily:
dmi.product.name: Standard PC (i440FX + PIIX, 1996)
dmi.product.version: pc-i440fx-wily
dmi.sys.vendor: QEMU

Thiago Martins (martinx) wrote :

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed

FYI - I ran out of time today, but since I work on DPDK anyway I'll try to reproduce this tomorrow morning.

Changed in dpdk (Ubuntu):
status: New → Confirmed
importance: Undecided → Medium
assignee: nobody → Christian Ehrhardt (paelzer)

Repro:
OVS-DPDK starting up seems fine initializing my non-blacklisted card
  DPDK_OPTS are '--dpdk -c 0x6 -n 4 --pci-blacklist 0000:00:03.0 -m 2048'
Allowing PMDs on two CPUs consuming 2G of huge pages

Before adding Ports config is done with
ovs-vsctl set Open_vSwitch . other_config:n-dpdk-rxqs=2
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x6

Port is added like:
ovs-vsctl add-port ovsdpdkbr0 dpdk0 -- set Interface dpdk0 type=dpdk

Two PMDs are seen
dpif_netdev|INFO|Created 2 pmd threads on numa node 0
bridge|INFO|bridge ovsdpdkbr0: added interface dpdk0 on port 1
dpif_netdev(pmd12)|INFO|Core 2 processing port 'dpdk0'
dpif_netdev(pmd13)|INFO|Core 1 processing port 'dpdk0'
  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
26062 root 10 -10 2816828 105612 16328 R 99.9 1.7 1:09.46 pmd12
26061 root 10 -10 2816828 105612 16328 R 99.9 1.7 1:09.42 pmd13

Now I should be in a similar state as you are.

I know the assumption so far was that the reboot (which resets the #queues on the device) might be involved.
But I first wanted to try what changing queues without reboot would do.

Even setting it down from 4 to 3 (remember I only use 2 actively) runs into the hang.
ethtool -L eth1 combined 3
=> I can see a hang of the ethtool program

Good thing: at least it seems we can rule the reboot out of our thinking. Just changing the number of queues with (OVS-)DPDK attached is enough to trigger it.
Please note the discussion that could be related: http://dpdk.org/ml/archives/dev/2016-April/037443.html

For confirmation @Thiago - does only the ethtool program hang for you, or the whole guest?

Next steps:
- gather debug data on hanging ethtool
- check what happens if we ethtool after we stopped OVS-DPDK (match upstream discussion)
- check what happens if we have testpmd enabled instead of openvswitch-dpdk


Appears running:
F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
4 0 26330 26263 20 0 7588 980 - R+ pts/2 33:52 \_ ethtool -L eth1 combined 3

Everything that touches it seems to get affected, so e.g. ltrace/strace get stuck as well.

Meanwhile the log on virsh console of the guest goes towards soft lockups:
[ 568.394870] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [ethtool:26330]
[ 575.418868] INFO: rcu_sched self-detected stall on CPU
[ 575.419674] 0-...: (14999 ticks this GP) idle=66d/140000000000001/0 softirq=21127/21127 fqs=14994
[ 575.420779] (t=15000 jiffies g=11093 c=11092 q=9690)

More Info in the journal:
 NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [ethtool:26330]
 Modules linked in: openvswitch nf_defrag_ipv6 nf_conntrack isofs ppdev kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul parport_pc parport joydev serio_raw iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear psmouse aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd floppy
 CPU: 0 PID: 26330 Comm: ethtool Not tainted 4.4.0-18-generic #34-Ubuntu
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
 task: ffff8801b747d280 ti: ffff8800ba58c000 task.ti: ffff8800ba58c000
 RIP: 0010:[<ffffffff815f1a43>] [<ffffffff815f1a43>] virtnet_send_command+0xf3/0x150
 RSP: 0018:ffff8800ba58fb60 EFLAGS: 00000246
 RAX: 0000000000000000 RBX: ffff8800bba62840 RCX: ffff8801b64a9000
 RDX: 000000000000c010 RSI: ffff8800ba58fb64 RDI: ffff8800bba6c400
 RBP: ffff8800ba58fbf8 R08: 0000000000000004 R09: ffff8801b9001b00
 R10: ffff8801b671b080 R11: 0000000000000246 R12: 0000000000000002
 R13: ffff8800ba58fb88 R14: 0000000000000000 R15: 0000000000000004
 FS: 00007fb57d56c700(0000) GS:ffff8801bfc00000(0000) knlGS:0000000000000000
 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007fb57cd7b680 CR3: 00000000ba85a000 CR4: 00000000001406f0
 Stack:
  ffff8800ba58fc28 ffffea0002ee9882 0000000200000940 0000000000000000
  0000000000000000 ffffea0002ee9882 0000000100000942 0000000000000000
  0000000000000000 ffff8800ba58fb68 ffff8800ba58fc10 ffff8800ba58fb88
 Call Trace:
  [<ffffffff815f1d9a>] virtnet_set_queues+0x9a/0x100
  [<ffffffff815f1e52>] virtnet_set_channels+0x52/0xa0
  [<ffffffff8171fc3c>] ethtool_set_channels+0xfc/0x140
  [<ffffffff81720afd>] dev_ethtool+0x40d/0x1d70
  [<ffffffff811cafc5>] ? page_add_file_rmap+0x25/0x60
  [<ffffffff8172f8d5>] ? __rtnl_unlock+0x15/0x20
  [<ffffffff8171ec61>] ? netdev_run_todo+0x61/0x320
  [<ffffffff8118d8a9>] ? unlock_page+0x69/0x70
  [<ffffffff81733b42>] dev_ioctl+0x182/0x580
  [<ffffffff811bf9f4>] ? handle_mm_fault+0xe44/0x1820
  [<ffffffff816fb932>] sock_do_ioctl+0x42/0x50
  [<ffffffff816fbe32>] sock_ioctl+0x1d2/0x290
  [<ffffffff8121ff9f>] do_vfs_ioctl+0x29f/0x490
  [<ffffffff8106b554>] ? __do_page_fault+0x1b4/0x400
  [<ffffffff81220209>] SyS_ioctl+0x79/0x90
  [<ffffffff818243b2>] entry_SYSCALL_64_fastpath+0x16/0x71
 Code: 44 89 e2 4c 89 6c c5 b0 e8 3b dc ec ff 48 8b 7b 08 e8 f2 db...


I tested changing a device touched (initialized) by DPDK, but not yet added to an Openvswitch bridge (no port added).
That stalls as well, so it is not required to add it to openvswitch-dpdk.

Next step was to exclude openvswitch completely, therefore I ran testpmd against those ports.
  /usr/bin/testpmd --pci-blacklist 0000:00:03.0 --socket-mem 2048 -- --interactive --total-num-mbufs=2048

After the test I ran ethtool and boom, hangs again.
Test gets simpler and simpler.

I confirmed that before running testpmd I can change the number of used queues just fine.

This is not happening on an ixgbe device that was formerly used by DPDK.
But then, that device had to be bound to uio_pci_generic and rebound to be reachable by ethtool.

Now, in the virtual environment, force a reinit of the driver after DPDK used it.

Reinitialize the driver by rebinding it
apt-get install linux-image-extra-virtual
/usr/bin/testpmd --pci-blacklist 0000:00:03.0 --socket-mem 2048 -- --interactive --total-num-mbufs=2048
dpdk_nic_bind -b uio_pci_generic 0000:00:04.0
dpdk_nic_bind -b virtio-pci 0000:00:04.0

We see the device "reinitialized" back on the virtio-pci driver.
It is also back to 1 of 4 queues being used (as after the reboot).

Now this works fine:
ethtool -L eth1 combined 4
ethtool -L eth1 combined 3
ethtool -L eth1 combined 4

summary: - Network tools like "ethtool" or "ip" freezes when DPDK Apps are running
- with VirtIO
+ Net tools cause kernel soft lockup after DPDK touched VirtIO-pci
+ devices

There is no obvious run-until-success loop in any of the involved code.
Only this, in virtnet_send_command, could be related:
/* Spin for a response, the kick causes an ioport write, trapping
 * into the hypervisor, so the request should be handled immediately.
 */
while (!virtqueue_get_buf(vi->cvq, &tmp) &&
       !virtqueue_is_broken(vi->cvq))
       cpu_relax();

We need to catch who is calling whom, and how often, to get a better idea of what is going on when it gets stuck.
Interesting are from the stack:

cpu_relax
virtnet_send_command
virtnet_set_queues
virtnet_set_channels
ethtool_set_channels
dev_ethtool

cd /sys/kernel/debug/tracing
echo 0 > tracing_on
echo function_graph > current_tracer
tail -f trace
# get global and one on each of our 4 CPUs from trace and per_cpu/cpu[0-3]/trace
echo 1 > tracing_on
ethtool -L eth1 combined 3

The system is stuck badly enough that all of these hang immediately without reporting anything.
Need to go deeper with debugging, but that is probably monday then.

Changed in linux (Ubuntu):
importance: Undecided → Medium

Unlike the upstream discussion I linked above (around a similar, but apparently unrelated, issue), in our case interrupts, memory, and the rest of the lspci and /proc/interrupts output stay just "as-is".
No change due to running dpdk on that device.

I'd not even consider it all too broken if the tools said "go away, I'm broken" until the device was reinitialized by e.g. a driver reload.
But the hang is too severe.

Since ftrace failed me I switched to gdb via the qemu -s parameter.

Debuginfo and source of guest kernel on the Host:
sudo apt-get install linux-tools-4.4.0-18-dbgsym
sudo pull-lp-source linux 4.4.0-18.34
sudo mkdir -p /build/linux-XwpX40; sudo ln -s /home/ubuntu/linux-4.4.0 /build/linux-XwpX40/linux-4.4.0

Edit that into the guest and restart:
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <qemu:commandline>
    <qemu:arg value='-s'/>
  </qemu:commandline>
gdb /usr/lib/debug/boot/vmlinux-4.4.0-18-generic

b dev_ethtool
b ethtool_set_channels
b virtnet_set_channels
b virtnet_set_queues

Then on the guest run
sudo /usr/bin/testpmd --pci-blacklist 0000:00:03.0 --socket-mem 2048 -- --interactive --total-num-mbufs=2048

Attach gdb with
target remote :1234

Then on the guest trigger the bug
sudo ethtool -L eth1 combined 3

It really is "hanging" in that virtnet_send_command call.
As expected, the loop never breaks.

1010 /* Spin for a response, the kick causes an ioport write, trapping
1011 * into the hypervisor, so the request should be handled immediately.
1012 */
1013 while (!virtqueue_get_buf(vi->cvq, &tmp) &&
1014 !virtqueue_is_broken(vi->cvq))
1015 cpu_relax();
1016
1017 return vi->ctrl_status == VIRTIO_NET_OK;
(gdb) n
1014 !virtqueue_is_broken(vi->cvq))
(gdb)
1013 while (!virtqueue_get_buf(vi->cvq, &tmp) &&
(gdb)
1015 cpu_relax();
[...]
Infinite loop.

Breaking on the two check functions and the calling one to see where things break:

b virtnet_send_command
# virtqueue_get_buf gets hit by __do_softirq -> napi_poll -> virtnet_poll -> virtnet_receive -> virtqueue_get_buf all the time.
Need to keep that disabled and step INTO from virtnet_send_command.
b virtqueue_get_buf
b virtqueue_is_broken

Here is what we see in the two checkers then
virtqueue_get_buf (_vq=0xffff8801b6b7d000, len=0xffff8801b7f17b64) at /build/linux-XwpX40/linux-4.4.0/drivers/virtio/virtio_ring.c:478
p *(_vq)
$12 = {list = {next = 0xffff8801b69c8b00, prev = 0xffff8801b640d000}, callback = 0x0 <irq_stack_union>, name = 0xffffffff81d094d7 "control", vdev = 0xffff8801b69c8800,
  index = 8, num_free = 63, priv = 0x1c010}

 if (unlikely(!vq->data[i])) {
         BAD_RING(vq, "id %u is not a head!\n", i);
         return NULL;
 }
 ret = vq->data[i];
 [...]
 return ret;
So this should for sure be valid when returning or we would see the BAD_RING.
But then it is looping on after returning on
 while !virtqueue_get_buf(vi->cvq, &tmp) && !virtqueue_is_broken(vi->cvq)

So we "should be" (tm) safe to assume that we always get a good buffer back, but then lack one?

Too much is optimized out by default to take a much deeper look.
I need to understand more what happens there, so I'm going to recompile the kernel with extra stuff, more debug and less optimization.

pull-lp-source linux 4.4.0-18.34
Build from source with oldconfig and such
Enable all kind of debug for virtio
Add some checks where we expect it to fail
mkdir /home/ubuntu/4.4.0-debug
# not needed make INSTALL_MOD_PATH=/home/ubuntu/4.4.0-debug modules_install
make INSTALL_PATH=/home/ubuntu/4.4.0-debug install

    <kernel>/home/ubuntu/linux-4.4.0/vmlinuz-4.4.6</kernel>
    <cmdline>root=/dev/vda1 console=tty1 console=ttyS0 net.ifnames=0</cmdline>

Attach debugger as before and retrigger the bug
Ensure /home/ubuntu/linux-4.4.0/scripts/gdb/vmlinux-gdb.py gets loaded properly for helpers

On boot my debug output starts to work on the one device that gets initialized on boot:
[ 3.557697] __virtqueue_get_buf: Entry checks passed - vq ffff8800bbae6400 from _vq ffff8800bbae6400
[ 3.559320] __virtqueue_get_buf: Exit checks passed - ffff8801b74b2840 vq->data[i]
[ 3.560515] __virtqueue_get_buf: Returning ret ffff8801b74b2840

Prep issue:
sudo /usr/bin/testpmd --pci-blacklist 0000:00:03.0 --socket-mem 2048 -- --interactive --total-num-mbufs=2048

* it might be worth mentioning that nothing regarding the queues came up while running testpmd - neither on the console nor in gdb

Trigger hang:
sudo ethtool -L eth1 combined 3

__virtqueue_is_broken: - vq ffff8800bbae7000 from _vq ffff8800bbae7000 -> broken 0
__virtqueue_is_broken: - vq ffff8800bbae7000 from _vq ffff8800bbae7000 -> broken 0
[...]

With the debug we have we can check the vvq's status
BTW - the offset of that container_of is 0 - so we can just cast it :-/

$4 = {vq = {list = {next = 0xffff8800bb892b00, prev = 0xffff8801b7518000}, callback = 0x0 <irq_stack_union>, name = 0xffffffff81d0f164 "control",
    vdev = 0xffff8800bb892800, index = 8, num_free = 63, priv = 0x1c010}, vring = {num = 64, desc = 0xffff8801b7514000, avail = 0xffff8801b7514400,
    used = 0xffff8801b7515000}, weak_barriers = true, broken = false, indirect = true, event = true, free_head = 1, num_added = 0, last_used_idx = 0,
  avail_flags_shadow = 1, avail_idx_shadow = 1, notify = 0xffffffff814bca40 <vp_notify>, data = 0xffff8800bbae7078}

So it considers itself not broken.

But I've seen it run over this pr_debug, which is usually disabled (so we don't see it by default):
pr_debug("No more buffers in queue\n");

That depends on !more_used(vq)
Which is:
  return vq->last_used_idx != virtio16_to_cpu(vq->vq.vdev, vq->vring.used->idx);
               0 != 0

(gdb) p ((struct vring_virtqueue *)0xffff8800bbae7000)->vring
$19 = {num = 64, desc = 0xffff8801b7514000, avail = 0xffff8801b7514400, used = 0xffff8801b7515000}
(gdb) p *((struct vring_virtqueue *)0xffff8800bbae7000)->vring.used
$21 = {flags = 0, idx = 0, ring = 0xffff8801b7515004}
(gdb) p *((struct vring_virtqueue *)0xffff8800bbae7000)->vring.avail
$22 = {flags = 1, idx = 1, ring = 0xffff8801b7514404}
(gdb) p *((struct vring_virtqueue *)0xffff8800bbae7000)->vring.desc
$23 = {addr = 3140568064, len = 48, flags = 4, next = 1}

0 != 0 => false -> so more_used() returns false.
But the call is !more_used, so virtqueue_get_buf returns NULL - and that is all it does "forever".
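The progress check the loop depends on can be modeled as a toy shell function - purely illustrative, not kernel code - mirroring more_used() from drivers/virtio/virtio_ring.c with the idx values from the dump above:

```shell
# Toy model of the more_used() check in drivers/virtio/virtio_ring.c:
# the host advances used->idx when it completes a request; equal indices
# mean there is no new buffer for the guest to fetch.
more_used() {  # $1 = vq->last_used_idx, $2 = vq->vring.used->idx
    [ "$1" -ne "$2" ]
}

# Healthy control queue: the host handled the command and bumped used->idx.
more_used 2 3 && echo "buffer available - virtnet_send_command can return"
# This bug: used->idx stays equal to last_used_idx (2 == 2 above), so
# virtqueue_get_buf() keeps returning NULL and the guest spins forever.
more_used 2 2 || echo "no progress - guest spins in cpu_relax()"
```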

Before going into discussions of how it "should" be, I added more debug code and gathered some good-case vs bad-case data.

First of all, it is "ok" to have no more buffers.
I had a printk in a codepath that only triggers when !more_used is true,
and I've seen plenty of hits for all kinds of idx values.
On adding virtio traffic it triggers a few times as well.
After all, that is what the loop is for: to wait until there is a buffer it can get.
So things aren't broken if this ever triggers - but of course they are if it never changes.

IIRC: last_used == vring_used->idx just means nothing has happened since our last interaction (to be confirmed).

Good case:
Some !more_used hits might occur, but not related and not infinitely:
[ 393.542550] __virtqueue_get_buf: No more buffers in vq ffff8801b74b3000 - vq->last_used_idx 303 == vq->vring.used->idx 303
[ 394.097117] __virtqueue_get_buf: No more buffers in vq ffff8801b74b3000 - vq->last_used_idx 304 == vq->vring.used->idx 304
[ 394.097413] __virtqueue_get_buf: No more buffers in vq ffff8801b74b4000 - vq->last_used_idx 125 == vq->vring.used->idx 125
[...]
[ 394.449672] __virtqueue_get_buf: Entry checks passed - vq ffff8800bbaef000 from _vq ffff8800bbaef000
[ 394.452734] __virtqueue_get_buf: Exit checks passed - ffff8801b74b5840 vq->data[i]
[ 394.455087] __virtqueue_get_buf: Returning ret ffff8801b74b5840
Done

Bad case (after DPDK ran):
Now both debug printks trigger.
I get a LOT of
[ 552.018862] __virtqueue_is_broken: - vq ffff8800bbaef000 from _vq ffff8800bbaef000 -> broken 0
Followed by sequences like this in between:
[ 554.157376] __virtqueue_get_buf: No more buffers in vq ffff8800bbaef000 - vq->last_used_idx 2 == vq->vring.used->idx 2
[ 554.158916] __virtqueue_is_broken: - vq ffff8800bbaef000 from _vq ffff8800bbaef000 -> broken 0
[ 554.160135] __virtqueue_get_buf: No more buffers in vq ffff8800bbaef000 - vq->last_used_idx 2 == vq->vring.used->idx 2
[ 554.161583] __virtqueue_is_broken: - vq ffff8800bbaef000 from _vq ffff8800bbaef000 -> broken 0
[ 554.162776] __virtqueue_get_buf: No more buffers in vq ffff8800bbaef000 - vq->last_used_idx 2 == vq->vring.used->idx 2
[ 554.164189] __virtqueue_is_broken: - vq ffff8800bbaef000 from _vq ffff8800bbaef000 -> broken 0
[...] (infinite loop)

Current assumption: DPDK disables something in the host side of the virtio device that makes the host no longer respond correctly.
By unbinding/binding the driver we can reinitialize that; if we don't, we run into this hang.
Remember: we only initialize DPDK with testpmd - no load whatsoever is driven by it.

We likely need two fixes:
1. find what DPDK does "to" the device and avoid it
2. the kernel should give up after some number of retries and return a failure (not good, but much better than hanging)

Tested the latest upstream versions to be sure we're not just missing a patch that already exists.
The bug still happens with linux-4.6-rc4.tar.xz.

Discussing with Thomas Monjalon revealed a set of post-2.2 patches.
These will no longer let you initialize DPDK while a kernel driver - like virtio-pci - is still bound.

I already proved earlier in this bug that rebinding it to virtio-pci properly sets it up and makes it workable again.
So I intend to backport and test those patches, together with some more, for the next upload.
This might need some more doc updates, and it will also stop users from accidentally killing their connection by failing to blacklist their main virtio device.

Thanks to Thomas for identifying these.

@Thiago - until then the proper "workaround" is to reinitialize via e.g.:
dpdk_nic_bind -b uio_pci_generic 0000:00:04.0
dpdk_nic_bind -b virtio-pci 0000:00:04.0
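For completeness, the same rebind can be driven through sysfs directly when dpdk_nic_bind is not at hand. This is a hypothetical helper using the standard kernel sysfs bind/unbind interface (run as root; the PCI address shown is an example):

```shell
# Hypothetical sysfs equivalent of the dpdk_nic_bind workaround: detach the
# device from whatever driver holds it, then hand it back to virtio-pci,
# which reinitializes it. Standard kernel sysfs layout, nothing DPDK-specific.
rebind_virtio() {
    bdf="$1"
    # Sanity-check the domain:bus:device.function address first.
    if ! echo "$bdf" | grep -Eq '^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$'; then
        echo "invalid PCI address: $bdf" >&2
        return 1
    fi
    # Unbind from the current driver, if the device has one bound.
    if [ -e "/sys/bus/pci/devices/$bdf/driver" ]; then
        echo "$bdf" > "/sys/bus/pci/devices/$bdf/driver/unbind"
    fi
    # Bind back to virtio-pci, reinitializing the device.
    echo "$bdf" > /sys/bus/pci/drivers/virtio-pci/bind
}

# rebind_virtio 0000:00:04.0   # afterwards ethtool -L works again
```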

The attachment "printk debugging around the issue of never getting out of the loop in virtnet_send_command" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch

Now working - a device still in use by the kernel is now rejected:

EAL: probe driver: 1af4:1000 rte_virtio_pmd
EAL: Error - exiting with code: 1
  Cause: Requested device 0000:00:05.0 cannot be used

You have to at least unbind them now to use them with DPDK:
sudo dpdk_nic_bind -u 0000:00:04.0
You can assign them to uio_pci_generic if you want, but it is not required
sudo dpdk_nic_bind -b uio_pci_generic 0000:00:05.0

Using testpmd now on those works as before (you still need to blacklist/whitelist as it can't know which ones to use).

Then reassign the kernel driver to use them "normally" again:
sudo dpdk_nic_bind -b virtio-pci 0000:00:04.0
sudo dpdk_nic_bind -b virtio-pci 0000:00:05.0

After this re-init I can properly use them again e.g.:
sudo ethtool -L ens5 combined 4

I'll try to make the rejection error more "readable" and check that the docs still match.
Other than that it will be in the next upload for DPDK.

Until then (at your own risk) one can try https://launchpad.net/~paelzer/+archive/ubuntu/dpdk-packaging-tests

Xenial is released, so we are back in SRU mode.
Therefore I'm adding the matching SRU template for the upload of 2.2.0-0ubuntu8, which is in the unapproved queue atm.

[Impact]

 * using devices with DPDK and the kernel at once drives the system into hangs
 * the fix avoids letting DPDK use devices that are still in use by the kernel
 * the fix is a backport of an upstream-accepted patch

[Test Case]

 * run dpdk in a guest on virtio-pci devices
 * afterwards do anything that touches the queues of the device like ethtool -L

[Regression Potential]

 * Some existing setups might no longer work if they set up DPDK on kernel-owned devices. But that is intentional, as such setups are only one step away from breaking their systems
 * The documentation in the server guide has been adapted to reflect the new requirements (merge proposal waits for ack)
 * also the comments and examples in the config files have been adapted to reflect the new style
  * passed ADT tests on i386/amd64/amd64-lowmem and our full CI (https://code.launchpad.net/~ubuntu-server/ubuntu/+source/dpdk-testing/+git/dpdk-testing)

Hello Thiago, or anyone else affected,

Accepted dpdk into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/dpdk/2.2.0-0ubuntu8 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-needed

FYI - Verified in Proposed.

Next I need to prep some Yakkety tests to reasonably request an upload to Yakkety to allow migration, as Martin indicated.

tags: added: verification-done-xenial
tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package dpdk - 2.2.0-0ubuntu9

---------------
dpdk (2.2.0-0ubuntu9) yakkety; urgency=medium

  * d/p/ubuntu-backport-[36-37] fix virtio issues (LP: #1570195):
    - don't let DPDK initialize virtio devices still in use by the kernel
    - this avoids conflicts between kernel and dpdk usage of those devices
    - an admin now has to unbind/bind devices as on physical hardware
    - this is in the dpdk 16.04 release and delta can then be dropped
    - d/dpdk-doc.README.Debian update for changes in virtio-pci handling
    - d/dpdk.interfaces update for changes in virtio-pci handling
  * d/p/ubuntu-backport-38... fix for memory leak (LP: #1570466):
    - call vhost_destroy_device on removing vhost user ports to fix memory leak
    - this likely is in the dpdk 16.07 release and delta can then be dropped
  * d/p/ubuntu-fix-vhost-user-socket-permission.patch fox (LP: #1546565):
    - when vhost_user sockets are created they are owner:group of the process
    - the DPDK api to create those has no way to specify owner:group
    - to fix that without breaking the API and potential workaround code in
      consumers of the library like openvswitch 2.6 for example. This patch
      adds an EAL commandline option to specify user:group created vhost_user
      sockets should have.

 -- Christian Ehrhardt <email address hidden> Wed, 27 Apr 2016 07:52:48 -0500

Changed in dpdk (Ubuntu):
status: Confirmed → Fix Released
Thiago Martins (martinx) wrote :

Just for the record, after upgrading DPDK (proposed repo), OpenvSwitch+DPDK isn't starting up anymore inside a VM...

I am double checking everything again...

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package dpdk - 2.2.0-0ubuntu8

---------------
dpdk (2.2.0-0ubuntu8) xenial; urgency=medium

  * d/p/ubuntu-backport-[36-37] fix virtio issues (LP: #1570195):
    - don't let DPDK initialize virtio devices still in use by the kernel
    - this avoids conflicts between kernel and dpdk usage of those devices
    - an admin now has to unbind/bind devices as on physical hardware
    - this is in the dpdk 16.04 release and delta can then be dropped
    - d/dpdk-doc.README.Debian update for changes in virtio-pci handling
    - d/dpdk.interfaces update for changes in virtio-pci handling
  * d/p/ubuntu-backport-38... fix for memory leak (LP: #1570466):
    - call vhost_destroy_device on removing vhost user ports to fix memory leak
    - this likely is in the dpdk 16.07 release and delta can then be dropped
  * d/p/ubuntu-fix-vhost-user-socket-permission.patch fox (LP: #1546565):
    - when vhost_user sockets are created they are owner:group of the process
    - the DPDK api to create those has no way to specify owner:group
    - to fix that without breaking the API and potential workaround code in
      consumers of the library like openvswitch 2.6 for example. This patch
      adds an EAL commandline option to specify user:group created vhost_user
      sockets should have.

 -- Christian Ehrhardt <email address hidden> Mon, 25 Apr 2016 11:42:40 +0200

Changed in dpdk (Ubuntu Xenial):
status: New → Fix Released

The verification of the Stable Release Update for dpdk has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Chris J Arges (arges) on 2016-05-04
Changed in linux (Ubuntu):
status: Confirmed → Invalid
Changed in linux (Ubuntu Xenial):
status: New → Invalid