We're running Kata Containers using QEMU and with v5.1.0rc0 and rc1 have noticed a problem at startup where QEMu appears to hang. We are not seeing this problem on our bare metal nodes and only on a VSI that supports nested virtualization.
We unfortunately see nothing at all in the QEMU logs to help understand the problem and a hung process is just a guess at this point.
Using git bisect we first see the problem with...
---
f19bcdfedd53ee93412d535a842a89fa27cae7f2 is the first bad commit
commit f19bcdfedd53ee93412d535a842a89fa27cae7f2
Author: Jason Wang <email address hidden>
Date: Wed Jul 1 22:55:28 2020 +0800
virtio-pci: implement queue_enabled method
With version 1, we can detect whether a queue is enabled via
queue_enabled.
Signed-off-by: Jason Wang <email address hidden>
Signed-off-by: Cindy Lu <email address hidden>
Message-Id: <email address hidden>
Reviewed-by: Michael S. Tsirkin <email address hidden>
Signed-off-by: Michael S. Tsirkin <email address hidden>
Acked-by: Jason Wang <email address hidden>
hw/virtio/virtio-pci.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
---
Reverting this commit (on top of 5.1.0-rc1) seems to work and prevent the hanging.
---
Here's how kata ends up launching qemu in our environment --
/opt/kata/bin/qemu-system-x86_64 -name sandbox-849df14c6065931adedb9d18bc9260a6d896f1814a8c5cfa239865772f1b7a5f -uuid 6bec458e-1da7-4847-a5d7-5ab31d4d2465 -machine pc,accel=kvm,kernel_irqchip -cpu host,pmu=off -qmp unix:/run/vc/vm/849df14c6065931adedb9d18bc9260a6d896f1814a8c5cfa239865772f1b7a5f/qmp.sock,server,nowait -m 4096M,slots=10,maxmem=30978M -device pci-bridge,bus=pci.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2,romfile= -device virtio-serial-pci,disable-modern=true,id=serial0,romfile= -device virtconsole,chardev=charconsole0,id=console0 -chardev socket,id=charconsole0,path=/run/vc/vm/849df14c6065931adedb9d18bc9260a6d896f1814a8c5cfa239865772f1b7a5f/console.sock,server,nowait -device virtio-scsi-pci,id=scsi0,disable-modern=true,romfile= -object rng-random,id=rng0,filename=/dev/urandom -device virtio-rng-pci,rng=rng0,romfile= -device virtserialport,chardev=charch0,id=channel0,name=agent.channel.0 -chardev socket,id=charch0,path=/run/vc/vm/849df14c6065931adedb9d18bc9260a6d896f1814a8c5cfa239865772f1b7a5f/kata.sock,server,nowait -chardev socket,id=char-396c5c3e19e29353,path=/run/vc/vm/849df14c6065931adedb9d18bc9260a6d896f1814a8c5cfa239865772f1b7a5f/vhost-fs.sock -device vhost-user-fs-pci,chardev=char-396c5c3e19e29353,tag=kataShared,romfile= -netdev tap,id=network-0,vhost=on,vhostfds=3:4,fds=5:6 -device driver=virtio-net-pci,netdev=network-0,mac=52:ac:2d:02:1f:6f,disable-modern=true,mq=on,vectors=6,romfile= -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults -nographic -daemonize -object memory-backend-file,id=dimm1,size=4096M,mem-path=/dev/shm,share=on -numa node,memdev=dimm1 -kernel /opt/kata/share/kata-containers/vmlinuz-5.7.9-74 -initrd /opt/kata/share/kata-containers/kata-containers-initrd_alpine_1.11.2-6_agent.initrd -append tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k console=hvc0 console=hvc1 iommu=off cryptomgr.notests net.ifnames=0 pci=lastbus=0 debug panic=1 nr_cpus=4 agent.use_vsock=false scsi_mod.scan=none init=/usr/bin/kata-agent -pidfile /run/vc/vm/849df14c6065931adedb9d18bc9260a6d896f1814a8c5cfa239865772f1b7a5f/pid -D /run/vc/vm/849df14c6065931adedb9d18bc9260a6d896f1814a8c5cfa239865772f1b7a5f/qemu.log -smp 2,cores=1,threads=1,sockets=4,maxcpus=4
---
Seems an odd one to be the culprit? Still lets ask Jason.
Can you just clarify; is this qemu 5.1.0-rc on both the host and the kata, or are you just changing to using qemu 5.1 as the host, and you're sticking with the existing kata qemu?
Which host CPU have you got?