Comment 6 for bug 2051965

Matthieu Baerts (matttbe) wrote : Re: QEmu with TCG acceleration (without KVM) causes kernel panics with kernels >=6.3

Hi Sergio,

Thank you for your reply!

> The next steps here would be to come up with a simple reproducer for the bug, so that I can write the SRU text. Then, I will prepare uploads for Jammy and Mantic, and you can help us by testing the packages once they are uploaded to the archive.

Thank you! Sure, I can test new packages.

> Since you were bisecting the problem, it seems to me that you have the reproducer pretty much nailed already, right? Would you be able to tell me so that I can try reproducing the bug locally as well?

I have a reproducer, but it is not a "simple" one. Here is what can be done:

  # Download the Linux kernel source from kernel.org or git, at least Linux 6.3, ideally a recent one, e.g. v6.7.4
  cd [linux kernel source code]

  # modify a test to stop after what triggers the kernel panic (ping)
  sed -i '/ping tests"/a exit $ret' tools/testing/selftests/net/mptcp/mptcp_connect.sh

  # run the ping test at most 250 times in the next step
  echo 'run_loop_n 250 run_selftest_one mptcp_connect.sh' > .virtme-exec-run

  # use a Docker image based on Ubuntu 23.10 including QEmu 8.0.4 with the bug + tools
  # this will build the kernel and dependencies, then run the 'mptcp_connect.sh' test 250 times
  # docker is used without "--privileged", so KVM will not be used (on purpose)
  docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --rm -it -e INPUT_BUILD_SKIP_PERF=1 \
    --pull always mptcp/mptcp-upstream-virtme-docker:latest \
    auto-normal
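
To see what the sed edit in the steps above does before touching the real selftest, here is a minimal sketch run against a throwaway file. The file contents are made up for illustration and are not the actual lines of mptcp_connect.sh; only the sed expression is taken from the steps. It assumes GNU sed (for '-i' and the one-line 'a' form).

```shell
# Throwaway stand-in for mptcp_connect.sh (hypothetical contents).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
ret=0
run_ping "ping tests"
echo "later tests"
EOF

# Same expression as in the steps: after the line containing 'ping tests"',
# append a line that exits with the current value of $ret.
sed -i '/ping tests"/a exit $ret' "$tmp"

# Keep the edited contents around, show them, then clean up.
result=$(cat "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

The appended 'exit $ret' line lands directly after the matching line, so everything after the ping tests is skipped.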

When I tested different versions of QEmu, I used the above command with 'cmd bash' instead of 'auto-normal', and ran the commands manually to compile QEmu and execute the tests from the VM (./.virtme/scripts/virtme.expect).
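
For reference, the interactive variant can be sketched as a small wrapper. The function name 'virtme_docker' and the 'DRY_RUN' switch are hypothetical helpers added for illustration; the docker arguments are the ones from the command above, with 'cmd bash' passed in place of 'auto-normal'.

```shell
# Hypothetical wrapper: only the docker arguments come from the steps above.
virtme_docker() {
    # With DRY_RUN set, print the command instead of running it.
    ${DRY_RUN:+echo} docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --rm -it \
        -e INPUT_BUILD_SKIP_PERF=1 \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        "$@"
}

# Print the command that would open a shell in the container.
DRY_RUN=1 virtme_docker cmd bash
```

Dropping 'DRY_RUN=1' runs the container for real; 'virtme_docker auto-normal' is then equivalent to the original command.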

I don't have a simple C program to reproduce this concurrency bug. Is it an issue?

It is not clear how other people managed to reproduce the bug. According to the original cover letter [1], I think the bug was visible by just booting (?) Fedora rawhide with kernel-core-6.5.0-0.rc0.20230703gita901a3568fd2.8.fc39.x86_64.rpm. According to [2], the bug was also seen by the LKFT team when running kernel selftests on various kernel versions, but no further details are given. If needed, I guess we can contact these people.

[1] https://lore<email address hidden>/
[2] https://lore.kernel.org<email address hidden>/

> Meanwhile, I'll see about checking the commits you mentioned and start backporting them.

Thanks! Do not hesitate to look at the commits from https://gitlab.com/matttbe/qemu/-/commits/lp-2051965/
But this is the first time I have looked at the QEmu code, so I hope I resolved the conflicts properly.