HWRNG access failing in Docker container for libvirt, killing VMs on start

Bug #1938644 reported by Boris Lukashev
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| kolla-ansible | New | Undecided | Unassigned | |

Bug Description

To give VMs better access to the available hardware, I enabled `rng_dev_path=/dev/hwrng` for nova-compute and set up a udev rule that makes the device 0666 for consumers. This is the same setup we use in our older, non-containerized OpenStack deployments (Liberty and Mitaka, built with Fuel), where it works well on much older hardware.
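The host-side setup described above can be sketched as two small shell functions. They are hypothetical helpers, not part of kolla-ansible: the rule file name `99-hwrng.rules` and the override path `/etc/kolla/config/nova/nova-compute.conf` are assumptions about a typical layout, and `rng_dev_path` is placed in nova's `[libvirt]` section.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the host-side setup; file names are illustrative.

# Write a udev rule that gives /dev/hwrng mode 0666 whenever the kernel
# creates the node. Defaults to the standard udev rules directory.
write_hwrng_rule() {
  local dir=${1:-/etc/udev/rules.d}
  mkdir -p "$dir"
  printf '%s\n' 'KERNEL=="hwrng", MODE="0666"' > "$dir/99-hwrng.rules"
}

# Write a kolla-ansible config override pointing nova-compute at /dev/hwrng.
# NOTE: this overwrites any existing nova-compute.conf in that directory.
write_nova_rng_override() {
  local dir=${1:-/etc/kolla/config/nova}
  mkdir -p "$dir"
  cat > "$dir/nova-compute.conf" <<'EOF'
[libvirt]
rng_dev_path = /dev/hwrng
EOF
}
```

The rule only sets the mode of a node that already exists or is created later; running `udevadm control --reload` and then `udevadm trigger` applies it to a node that is already present.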
The Docker container can "see" the device node:
```
(nova-libvirt)[root@compute4 /]# ls -alh /dev/hwrng
crw-rw-rw- 1 root root 10, 183 Jul 31 02:24 /dev/hwrng

```
However, all VMs _silently_ fail to start. Nothing useful is recorded in Elasticsearch; the only thing ES receives is "qemuMonitorIO:618 : internal error: End of file from qemu monitor". The following error appears on the CLI when manually starting the instance:
```
# virsh start instance-00000019; tail -f /var/log/libvirt/qemu/instance-0000001*
Domain instance-00000019 started

==> /var/log/libvirt/qemu/instance-00000016.log <==
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 \
-object rng-random,id=objrng0,filename=/dev/hwrng \
-device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x7 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
2021-08-01 13:23:50.508+0000: Domain id=3 is tainted: host-cpu
**
ERROR:/build/qemu-rbeYHu/qemu-4.2/backends/rng-random.c:48:entropy_available: assertion failed: (len != -1)
Bail out! ERROR:/build/qemu-rbeYHu/qemu-4.2/backends/rng-random.c:48:entropy_available: assertion failed: (len != -1)
2021-08-01 13:24:03.876+0000: shutting down, reason=crashed
```
The container can clearly _see_ the device node, but cannot read from it:
```
# dd if=/dev/hwrng of=/tmp/rnd.bin bs=25 count=1
dd: error reading '/dev/hwrng': No such device
0+0 records in
0+0 records out
0 bytes copied, 0.000517002 s, 0.0 kB/s
```
which, if I am grokking this correctly, prevents the VM from starting due to some Docker-specific behavior.
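The distinction above (the node is visible to `ls`, but `dd` fails with "No such device") can be checked with a small shell function. This is a sketch: `check_rng` is a hypothetical helper, its 25-byte default mirrors the `dd` test, and a short read is treated as a failure.

```shell
#!/usr/bin/env bash
# Report whether an RNG device node exists *and* actually yields data.
# Returns 0 if OK, 1 if the node exists but reads fail or come up short,
# 2 if the path is not a character device at all.
check_rng() {
  local dev=${1:-/dev/hwrng} want=${2:-25} got
  if [ ! -c "$dev" ]; then
    echo "MISSING: $dev is not a character device"
    return 2
  fi
  # head stops after $want bytes; a read error (e.g. ENODEV) leaves the count short.
  got=$(( $(head -c "$want" "$dev" 2>/dev/null | wc -c) ))
  if [ "$got" -lt "$want" ]; then
    echo "UNREADABLE: $dev is visible, but only $got of $want bytes could be read"
    return 1
  fi
  echo "OK: read $got bytes from $dev"
}
```

Run it on the host with `check_rng /dev/hwrng`. For the container, `docker exec nova_libvirt head -c 25 /dev/hwrng | wc -c` performs the same read (assuming kolla's usual `nova_libvirt` container name); a count of 0 reproduces the failure shown above.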

Boris Lukashev (rageltman) wrote:

This appears to be an order-of-operations issue: loading the kernel module after the container has started is the problem. On a fresh boot of a compute node, udev loads the module before the container starts, and everything works. Loading the module while the container is running produces the behavior described above. Stopping and then starting the container after the module is loaded also resolves it.
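Because the failure seems to depend on whether the RNG device appeared before or after the container started, one rough heuristic is to compare the device node's ctime (set when the node is created, and again when udev applies its mode) with the container's start time. This is a sketch under stated assumptions: the container name `nova_libvirt`, GNU `stat` and `date`, and the rule "newer ctime means restart" are all assumptions, not something kolla-ansible provides.

```shell
#!/usr/bin/env bash
# Heuristic: did /dev/hwrng appear (or change) after the container started?
CONTAINER=${CONTAINER:-nova_libvirt}   # assumed kolla container name
DEV=${DEV:-/dev/hwrng}

# Epoch seconds of the node's last status change (creation or chmod by udev).
device_ctime() { stat -c %Z "$DEV"; }

# Epoch seconds at which the container was last started.
container_started() {
  local ts
  ts=$(docker inspect -f '{{.State.StartedAt}}' "$CONTAINER") || return 1
  date -d "$ts" +%s
}

# Returns 1 if the device is newer than the container (restart suggested),
# 2 if either timestamp is unavailable, 0 otherwise.
check_start_order() {
  local dev_t ctr_t
  dev_t=$(device_ctime) || { echo "$DEV does not exist on the host"; return 2; }
  ctr_t=$(container_started) || { echo "cannot inspect $CONTAINER"; return 2; }
  if [ "$dev_t" -gt "$ctr_t" ]; then
    echo "$DEV changed after $CONTAINER started: restart the container"
    return 1
  fi
  echo "$DEV predates the current $CONTAINER start"
}
```

A later `chmod` of the node also bumps its ctime, so this check can suggest a restart that is not strictly needed; it errs on the side of restarting.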

I think the only actual bug here is that the logging stack does not output or collect the relevant logs correctly, plus possibly some quirk of Ubuntu (the host OS).

So if anyone else tries this in their setup: first load the kernel modules involved and install the udev rule that sets /dev/hwrng to 0666, and only then run `kolla-ansible reconfigure --tags nova`, which should restart the containers in the right order. If it still doesn't work, my "fix" may actually have been fully stopping the container and then starting it back up.
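The workaround order described here can be sketched as a shell function. It is a hypothetical helper, not part of kolla-ansible: the module argument is a placeholder for whatever RNG driver your hardware needs, `nova_libvirt` is assumed to be the container name, and it uses the full `docker stop`/`docker start` fallback rather than a `kolla-ansible reconfigure --tags nova` run. Setting `DRY_RUN=1` prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Print instead of executing when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "+ $*"; else "$@"; fi
}

# Apply the steps in order: driver first, then udev, then the container.
# $1: RNG kernel module for this host (placeholder), $2: container name.
fix_hwrng_order() {
  local module=${1:-} container=${2:-nova_libvirt}
  [ -n "$module" ] || { echo "usage: fix_hwrng_order MODULE [CONTAINER]" >&2; return 2; }
  run modprobe "$module"                       # 1. load the driver on the host
  run udevadm control --reload                 # 2. pick up the 0666 rule
  run udevadm trigger --subsystem-match=misc   #    and re-apply it to /dev/hwrng
  run udevadm settle                           #    wait for udev to finish
  run docker stop "$container"                 # 3. full stop, then start, so the
  run docker start "$container"                #    container sees the working device
}
```

For example, `DRY_RUN=1 fix_hwrng_order <module>` shows the sequence without touching the host. The restart comes last on purpose: per the comment above, restarting before the module is loaded reproduces the failure.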
