Networking issues after upgrade to 1:2.5+dfsg-5ubuntu10.37

Bug #1829245 reported by sw0x2A on 2019-05-15
This bug affects 1 person
Affects          Status         Importance   Assigned to   Milestone
qemu (Ubuntu)    Invalid        Undecided    Unassigned
Xenial           Fix Released   Critical     Unassigned

Bug Description

Since yesterday's upgrade to qemu packages version 1:2.5+dfsg-5ubuntu10.37, new VMs have networking issues. Their network interfaces stop working after a short time. Some are not even able to PXE boot, and those that do boot lose connectivity after a few seconds (the longest was around a minute). Downgrading to 1:2.5+dfsg-5ubuntu10.36 fixed the issue again.

Unfortunately there are no error messages thrown. This is all I can provide:

2019-05-15 12:40:14.103+0000: starting up libvirt version: 1.3.1, package: 1ubuntu10.25 (Marc Deslauriers <email address hidden> Wed, 13 Mar 2019 08:10:12 -0400), qemu version: 2.5.0 (Debian 1:2.5+dfsg-5ubuntu10.37), hostname: vh-4.YYYY.XXXXXXX.net
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/kvm -name one-1110 -S -machine pc-i440fx-xenial,accel=kvm,usb=off -cpu host -m 4096 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid d006f44c-bb54-410b-ba94-15a46d0cfd46 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-one-1110/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/one//datastores/101/1110/disk.0,format=raw,if=none,id=drive-virtio-disk0,cache=writeback -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -drive file=/var/lib/one//datastores/101/1110/disk.1,format=raw,if=none,id=drive-ide0-0-0,readonly=on -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,fd=44,id=hostnet0,vhost=on,vhostfd=55 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=02:00:3d:86:51:7b,bus=pci.0,addr=0x3,bootindex=1 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 0.0.0.0:1110,password -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -msg timestamp=on
Domain id=63 is tainted: host-cpu
char device redirected to /dev/pts/18 (label charserial0)
2019-05-15T13:44:09.552393Z qemu-system-x86_64: terminating on signal 15 from pid 50288
2019-05-15 13:44:09.753+0000: shutting down
2019-05-15 13:44:27.508+0000: starting up libvirt version: 1.3.1, package: 1ubuntu10.25 (Marc Deslauriers <email address hidden> Wed, 13 Mar 2019 08:10:12 -0400), qemu version: 2.5.0 (Debian 1:2.5+dfsg-5ubuntu10.36), hostname: vh-4.YYYY.XXXXXXX.net
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/kvm -name one-1110 -S -machine pc-i440fx-xenial,accel=kvm,usb=off -cpu host -m 4096 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 414fcc19-2440-4ff1-9c29-50f2770c94f1 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-one-1110/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/one//datastores/101/1110/disk.0,format=raw,if=none,id=drive-virtio-disk0,cache=writeback -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -drive file=/var/lib/one//datastores/101/1110/disk.1,format=raw,if=none,id=drive-ide0-0-0,readonly=on -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,fd=44,id=hostnet0,vhost=on,vhostfd=53 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=02:00:3d:86:51:7b,bus=pci.0,addr=0x3,bootindex=1 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 0.0.0.0:1110,password -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -msg timestamp=on
Domain id=66 is tainted: host-cpu
char device redirected to /dev/pts/18 (label charserial0)
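
When a domain log covers several restarts, as the one above does, it helps to list which qemu package each start actually ran under. The following is a minimal sketch, assuming the "starting up" lines keep the layout shown above; the function name `qemu_versions_by_start` and the abbreviated `SAMPLE_LOG` are illustrative and not part of libvirt.

```python
import re

# Matches a libvirt "starting up" line and captures the timestamp plus the
# Debian/Ubuntu package version reported for qemu on that start.
STARTUP_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+\+\d{4}): starting up "
    r".*?qemu version: \S+ \(Debian (?P<pkg>[^)]+)\)"
)


def qemu_versions_by_start(log_text):
    """Return a list of (timestamp, qemu package version), one per domain start."""
    starts = []
    for line in log_text.splitlines():
        match = STARTUP_RE.match(line)
        if match:
            starts.append((match.group("ts"), match.group("pkg")))
    return starts


# Abbreviated copy of the log above (uploader details and command lines omitted).
SAMPLE_LOG = (
    "2019-05-15 12:40:14.103+0000: starting up libvirt version: 1.3.1, "
    "package: 1ubuntu10.25, qemu version: 2.5.0 "
    "(Debian 1:2.5+dfsg-5ubuntu10.37), hostname: vh-4.YYYY.XXXXXXX.net\n"
    "Domain id=63 is tainted: host-cpu\n"
    "2019-05-15 13:44:09.753+0000: shutting down\n"
    "2019-05-15 13:44:27.508+0000: starting up libvirt version: 1.3.1, "
    "package: 1ubuntu10.25, qemu version: 2.5.0 "
    "(Debian 1:2.5+dfsg-5ubuntu10.36), hostname: vh-4.YYYY.XXXXXXX.net\n"
)

if __name__ == "__main__":
    for timestamp, package in qemu_versions_by_start(SAMPLE_LOG):
        print(timestamp, package)
```

For the log above this shows the first start on ...10.37 and the second, after the downgrade, on ...10.36.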

sw0x2A (sw0x2a) wrote :

I just noticed you released a new version already. Testing 1:2.5+dfsg-5ubuntu10.38.

Sebastien Bacher (seb128) wrote :

Thank you for the bug report. Let us know how testing of the new version goes.

I'm wondering, as 1:2.5+dfsg-5ubuntu10.37 never existed in the wild.
The archive should have gone from .36 to .38, I'd think.

There is a single upload [1] covering both.
.37 only existed in proposed I'd think.

Nevertheless, it contained a few changes to networking.
One to slirp (not your case), but also:
  * d/p/lp1823458/add-VirtIONet-vhost_stopped-flag-to-prevent-multiple.patch,
    d/p/lp1823458/do-not-call-vhost_net_cleanup-on-running-net-from-ch.patch:
    - Prevent crash due to race condition on shutdown;
      this is fixed differently upstream (starting in Bionic), but
      the change is too large to backport into Xenial. These two very
      small patches work around the problem in an unintrusive way.
      (LP: #1823458)

I'll subscribe mdeslaur, sbeattie (uploaders) and ddstreet (patch author) to consider your case.

Please add whatever else you found, in particular:
- does every qemu start fail networking for you?
- if not, which part of the config seems to be required to hit the issue?
- how is your networking set up in general (normal libvirt network, OVS, ...)?
- which network HW is attached (in case it makes a difference)?
- the guest cmdline looks like a libvirt qemu-cmdline, so you might add the guest XML

[1]: https://launchpad.net/ubuntu/+source/qemu/1:2.5+dfsg-5ubuntu10.37

tags: added: regression-update
sw0x2A (sw0x2a) wrote :

Version 1:2.5+dfsg-5ubuntu10.38 doesn't fix it entirely but works better than ...10.37. It seems to lose network connectivity only on reboots of a VM. Poweroff/resume (killing and restarting the kvm process) fixes it again.

Still no error messages of any kind. I'm willing to provide more information once you tell me what you need.

Dan Streetman (ddstreet) wrote :

@sw0x2a can you try the qemu from this ppa:
https://launchpad.net/~ddstreet/+archive/ubuntu/lp1829245

sw0x2A (sw0x2a) wrote :

This is an OpenNebula cluster. AFAICS all hosts are affected as soon as they run with the new versions. Hosts have different hardware but similar network configuration: 2 bonded NICs and a bridged interface to be used by VMs.

# Network setup on VH host
auto lo
iface lo inet loopback
# eth0 is manually configured, and slave to the "bond1" bonded NIC
auto eth0
iface eth0 inet manual
  bond-master bond1
# eth1 ditto, thus creating a 2-link bond.
auto eth1
iface eth1 inet manual
  bond-master bond1
# bond1 is the bonded NIC and can be used like any other normal NIC.
auto bond1
iface bond1 inet manual
# bond1 uses standard IEEE 802.3ad LACP bonding protocol
  bond-mode 802.3ad
  bond-miimon 100
  bond-lacp-rate 1
  bond-xmit-hash-policy layer3+4
  bond-slaves eth0 eth1
# Bridged interface to be used by VMs
auto ipblsrvrs
iface ipblsrvrs inet static
  bridge_ports bond1
  address 172.20.4.106
  gateway 172.20.4.1
  netmask 255.255.252.0
  bridge_stp on
  bridge_fd 1
  bridge_hello 2
  bridge_maxage 12

# Guest XML created by OpenNebula looks like this:

<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
 <name>one-1110</name>
 <vcpu><![CDATA[2]]></vcpu>
 <cputune>
  <shares>205</shares>
 </cputune>
 <memory>4194304</memory>
 <os>
  <type arch='x86_64'>hvm</type>
 </os>
 <cpu mode='host-passthrough'>
 </cpu>
 <devices>
  <emulator><![CDATA[/usr/bin/kvm]]></emulator>
  <disk type='file' device='disk'>
   <source file='/var/lib/one//datastores/101/1110/disk.0'/>
   <target dev='vda'/>
   <boot order='2'/>
   <driver name='qemu' type='raw' cache='writeback'/>
  </disk>
  <disk type='file' device='cdrom'>
   <source file='/var/lib/one//datastores/101/1110/disk.1'/>
   <target dev='hda'/>
   <readonly/>
   <driver name='qemu' type='raw'/>
  </disk>
  <interface type='bridge'>
   <source bridge='ipblsrvrs'/>
   <mac address='02:00:3d:86:51:7b'/>
   <target dev='one-1110-0'/>
   <boot order='1'/>
   <model type='virtio'/>
  </interface>
  <graphics type='vnc' listen='0.0.0.0' port='7010' passwd='secret'/>
 </devices>
 <features>
  <acpi/>
 </features>
 <devices><serial type='pty'><source path='/dev/pts/5'/><target port='0'/></serial><console type='pty' tty='/dev/pts/5'><source path='/dev/pts/5'/><target port='0'/></console></devices>
 <metadata>
  <system_datastore><![CDATA[/var/lib/one//datastores/101/1110]]> </system_datastore>
 </metadata>
</domain>
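
To compare guests quickly, the network-relevant parts of such a domain XML (bridge, MAC, model, tap device name) can be pulled out with the Python standard library. This is only a sketch: `list_interfaces` is a hypothetical helper name, and `SAMPLE_XML` is a trimmed copy of the XML above.

```python
import xml.etree.ElementTree as ET


def list_interfaces(domain_xml):
    """Return one dict per <interface> element found in a libvirt domain XML string."""
    root = ET.fromstring(domain_xml)
    interfaces = []
    # ".//interface" searches the whole tree, so it also works when the
    # domain carries more than one <devices> element, as the XML above does.
    for iface in root.iterfind(".//interface"):

        def attr(tag, name):
            node = iface.find(tag)
            return node.get(name) if node is not None else None

        interfaces.append({
            "type": iface.get("type"),
            "bridge": attr("source", "bridge"),
            "mac": attr("mac", "address"),
            "model": attr("model", "type"),
            "target": attr("target", "dev"),
        })
    return interfaces


# Trimmed copy of the guest XML above (disks, graphics and metadata omitted).
SAMPLE_XML = """<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
 <name>one-1110</name>
 <devices>
  <interface type='bridge'>
   <source bridge='ipblsrvrs'/>
   <mac address='02:00:3d:86:51:7b'/>
   <target dev='one-1110-0'/>
   <boot order='1'/>
   <model type='virtio'/>
  </interface>
 </devices>
 <devices><serial type='pty'><target port='0'/></serial></devices>
</domain>"""

if __name__ == "__main__":
    for entry in list_interfaces(SAMPLE_XML):
        print(entry)
```

For this guest it reports a single virtio interface attached to the ipblsrvrs bridge, which matches the virtio-net-pci device with vhost=on in the qemu command line.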

sw0x2A (sw0x2a) wrote :

@dan Ok, will try and report back soon.

sw0x2A (sw0x2a) wrote :

@dan It works! I cannot reproduce the bug on the patched host (version 1:2.5+dfsg-5ubuntu10.38+bug1829245v20190516b1 from your ppa).

sw0x2A (sw0x2a) wrote :

> I'm wondering, as 1:2.5+dfsg-5ubuntu10.37 never existed in the wild.
> The archive should have gone from .36 to .38, I'd think.

@christian: We mirror Ubuntu repositories in a way that keeps versions in our repo even when they have been removed from the original one. I guess this means version .37 was published for a moment and quickly removed again from the official repos, but our sync had already copied it into ours.

Dan Streetman (ddstreet) on 2019-05-16
Changed in qemu (Ubuntu Xenial):
assignee: nobody → Dan Streetman (ddstreet)
Changed in qemu (Ubuntu):
status: New → Invalid
Changed in qemu (Ubuntu Xenial):
status: New → In Progress
importance: Undecided → Critical
assignee: Dan Streetman (ddstreet) → nobody
Dan Streetman (ddstreet) wrote :

@sw0x2a thanks - my patch (that breaks things) will be reverted in a new release through the security pocket, and I'll reapply the fixed version through the normal SRU process.

Changed in qemu (Ubuntu Xenial):
status: In Progress → New
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 1:2.5+dfsg-5ubuntu10.39

---------------
qemu (1:2.5+dfsg-5ubuntu10.39) xenial-security; urgency=medium

  * Disable patches from 1:2.5+dfsg-5ubuntu10.37 to prevent regression
    (LP: #1829245)
    - d/p/lp1823458/add-VirtIONet-vhost_stopped-flag-to-prevent-multiple.patch
    - d/p/lp1823458/do-not-call-vhost_net_cleanup-on-running-net-from-ch.patch

 -- Marc Deslauriers <email address hidden> Thu, 16 May 2019 07:11:54 -0400
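
For anyone checking a fleet of hosts, the versions named in this report can be turned into a simple classifier. The sketch below assumes only what the thread states: ...10.37 and ...10.38 showed the regression, ...10.36 worked, and ...10.39 disables the two patches. The names `AFFECTED_REVISIONS` and `classify` are made up for illustration, and builds with an extra suffix (such as the PPA test build) are deliberately reported as unknown.

```python
import re

# Revisions of 1:2.5+dfsg-5ubuntu10.NN that this report describes as regressed.
AFFECTED_REVISIONS = {37, 38}

# Only plain archive versions of this exact series are recognised.
VERSION_RE = re.compile(r"1:2\.5\+dfsg-5ubuntu10\.(\d+)")


def classify(version):
    """Classify a qemu package version string against this bug report."""
    match = VERSION_RE.fullmatch(version)
    if match is None:
        # Different series or a suffixed build (e.g. a PPA test package).
        return "unknown"
    if int(match.group(1)) in AFFECTED_REVISIONS:
        return "affected"
    return "not reported as affected"


if __name__ == "__main__":
    for candidate in (
        "1:2.5+dfsg-5ubuntu10.36",
        "1:2.5+dfsg-5ubuntu10.37",
        "1:2.5+dfsg-5ubuntu10.38",
        "1:2.5+dfsg-5ubuntu10.39",
        "1:2.5+dfsg-5ubuntu10.38+bug1829245v20190516b1",
    ):
        print(candidate, "->", classify(candidate))
```

The installed version string to feed in could come from the package manager on each host; how it is collected is left open here.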

Changed in qemu (Ubuntu Xenial):
status: New → Fix Released
Dan Streetman (ddstreet) on 2019-05-16
no longer affects: cloud-archive