nested 9pfs read fail
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
QEMU | Invalid | Undecided | Unassigned |
Bug Description
tl;dr: A virtfs read fails. Since the init is on this virtfs (mounted by the initrd), the Linux guest kernel is unable to boot and panics. The fact that qemu still takes 100% CPU after the kernel panic makes me think it's a qemu bug.
Here is the setup (some hashes replaced with "..."):
* A (NixOS) host system, with /nix/store as a btrfs on lvm on cryptsetup
* Running a qemu-kvm NixOS guest, with /nix/.ro-store as a virtfs mapping to host /nix/store:
```
exec /nix/store/
-name test \
-m 8192 \
-cpu kvm64 \
-net nic,vlan=
-virtfs local,path=
-virtfs local,path=
-virtfs local,path=
-drive index=0,
-kernel /nix/store/
-initrd /nix/store/
-append "$(cat /nix/store/
-vga std -usbdevice tablet
```
* With /nix/store as an overlayfs between /nix/.ro-store and /nix/.rw-store
* Inside that guest, running a nested qemu-kvm NixOS guest, with /nix/.ro-store as a virtfs mapping to the outer guest's /nix/store/:
```
/nix/store/
-name nginx -m 128 -smp 2 -cpu kvm64 \
-nographic -serial unix:"/
-drive file="/
-virtfs local,path=
-netdev type=tap,
-kernel /nix/store/
-initrd /nix/store/
-append "$(cat /nix/store/
-virtfs local,path=
-virtfs local,path=
-virtfs local,path=
```
* With /nix/store as an overlayfs between /nix/.ro-store and /nix/.rw-store
* With init in /nix/store
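For reference, the mount sequence the initrd performs can be sketched roughly as follows. The mount tag, option values, and directory names here are assumptions for illustration, not taken from the actual NixOS initrd:

```shell
# Sketch of the initrd's mount sequence (tag "store" and msize are assumed):
mount -t 9p -o trans=virtio,version=9p2000.L store /nix/.ro-store
mkdir -p /nix/.rw-store/store /nix/.rw-store/work
mount -t overlay overlay \
  -o lowerdir=/nix/.ro-store,upperdir=/nix/.rw-store/store,workdir=/nix/.rw-store/work \
  /nix/store
# init is then expected to be readable under /nix/store
```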
What happens is that at boot time the inner VM fails to read the init script after the initrd has mounted the 9pfs and overlayfs.
What makes me think it's a qemu bug is that htop in the outer VM shows the inner VM's qemu process using 100% CPU, even though the kernel it runs has panicked and should therefore be doing nothing.
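To see what the spinning qemu process is actually doing, it can be inspected from the outer VM. The pgrep pattern below is an assumption based on the `-name nginx` flag in the command line above:

```shell
# Find the inner VM's qemu process (pattern assumed from "-name nginx"):
pid=$(pgrep -f 'qemu.*-name nginx' | head -n1)
# Summarize its syscall activity for a few seconds:
timeout 5 strace -c -p "$pid"
# Or dump userspace stacks of all threads, to see if it loops in the 9p server:
gdb -p "$pid" -batch -ex 'thread apply all bt'
```

If the stacks repeatedly land in virtfs/9p code paths rather than in the KVM ioctl wait, that would support the theory that qemu's 9p server is busy-looping.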
What do you think about this?
Oh, I forgot to mention: it worked for some time at first. Then, in the middle of a shell session running over a screen on /var/lib/vm/consoles/nginx/screen from the outer VM (socat-linked to /var/lib/vm/consoles/nginx/socket.unix to provide a predictable pty link), the 9pfs stopped returning any data. The problem did not go away after a reboot of the inner VM, which then no longer managed to boot.
I was unfortunately unable to identify exactly which operation made it stop working, but my best guess is zsh's full-path autocompletion in paths that include directories with ~700 entries, though I'm not certain of that.
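The suspected trigger (a readdir over a directory with ~700 entries) is easy to reproduce synthetically. This sketch uses a temporary directory as a stand-in; on the real setup it would have to be run inside the guest against a directory on the 9p-backed /nix/store:

```shell
# Create a directory with 700 entries and list it, mimicking what zsh's
# path autocompletion does. "entry-$i" names are purely illustrative.
dir=$(mktemp -d)
for i in $(seq 1 700); do : > "$dir/entry-$i"; done
ls "$dir" | wc -l   # → 700
rm -rf "$dir"
```

On a healthy 9p mount this completes quickly; a hang here while qemu spins at 100% CPU would narrow the bug down to the 9p readdir path.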