# BUG Description after dump analysis
- net_cleanup calls vhost_net_stop.
- vhost_net_stop iterates over all vhost network devices and stops them one by one.
- The idea is to stop the virtqueues cleanly, releasing their resources.
- In order to stop a virtqueue, vhost has to get the vring base (by sending a VHOST_USER_GET_VRING_BASE message).
- The char device then reads the vring base from the socket.
- If it reads nothing, the QEMU TCP char driver disconnects the socket.
- When the socket is disconnected, vhost_user stops all the queues attached to that vhost_user socket.
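The sequence above can be sketched as a small toy model. This is only an illustration of the control flow described in the list, not QEMU code: every name here (`toy_backend`, `toy_get_vring_base`, `toy_on_closed`, `toy_net_stop`) is hypothetical, and the socket is reduced to a flag plus a reply length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in QEMU. */
enum { TOY_MAX_QUEUES = 4 };

struct toy_backend {
    bool connected;                       /* state of the vhost-user socket */
    bool queue_running[TOY_MAX_QUEUES];   /* one flag per virtqueue */
    size_t nqueues;
    size_t reply_len;                     /* bytes the peer returns for a request */
    int close_events;                     /* how many times the close path ran */
};

/* Close path: on disconnect, every queue on this socket is stopped. */
static void toy_on_closed(struct toy_backend *b)
{
    b->close_events++;
    for (size_t i = 0; i < b->nqueues; i++) {
        b->queue_running[i] = false;
    }
}

/* Stands in for sending the "get vring base" request and reading the reply.
 * An empty read is treated as a dead peer: disconnect and run the close path. */
static bool toy_get_vring_base(struct toy_backend *b)
{
    if (!b->connected) {
        return false;
    }
    if (b->reply_len == 0) {
        b->connected = false;
        toy_on_closed(b);
        return false;
    }
    return true;
}

/* Cleanup loop: try to stop each queue in turn. */
static void toy_net_stop(struct toy_backend *b)
{
    assert(b != NULL);
    for (size_t i = 0; i < b->nqueues; i++) {
        if (toy_get_vring_base(b)) {
            b->queue_running[i] = false;
        }
    }
}
```

In this model the close path runs from inside the cleanup loop, on the first empty read. That nesting is the situation the dump below describes.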
From the dump:
The error is reached when the charnet2 device is disconnected. Since the char device has already been disconnected, vhost_user_stop tries to stop all queues, but it mistakenly treats all of them as vhost-user queues (charnet4 is a TAP device, not a VHOST_USER one).
#### Logic Error:
Here is the charnet2 data at the time of the error:
- Name: filename (from CharDriverState)
- Details: 0x556a934b0a90 "disconnected:unix:/run/openvswitch/vhostuser-vcic"
- Default: 0x556a934b0a90 "disconnected:unix:/run/openvswitch/vhostuser-vcic"
- Decimal: 93916226062992
- Hex: 0x556a934b0a90
- Binary: 10101010110101010010011010010110000101010010000
- Octal: 02526522322605220
When the char driver realizes the connection is gone, it raises an event:
qemu_chr_be_event(chr, CHR_EVENT_CLOSED);
Which will call:
net_vhost_user_event
net_vhost_user_event looks up all NetClientState entries using a pointer called "name".
The event originated from device charnet2, but the event callback runs against charnet4. This explains why the assertion failed: the code asserts that a TAP device is a VHOST_USER one.
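A minimal sketch of how a borrowed name pointer can misroute the callback, under stated assumptions: the types and functions below (`demo_client`, `demo_find`, `demo_on_close_event`) are hypothetical stand-ins, not QEMU's. A reused scratch buffer models the stale "name" so the example stays well-defined C rather than an actual use after free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the net client types in the report. */
enum demo_kind { DEMO_TAP, DEMO_VHOST_USER };

struct demo_client {
    const char *name;
    enum demo_kind kind;
};

/* Return the first client whose name matches, or NULL if none does. */
static const struct demo_client *demo_find(const struct demo_client *clients,
                                           size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(clients[i].name, name) == 0) {
            return &clients[i];
        }
    }
    return NULL;
}

/* Close-event callback: resolves its target through a borrowed name pointer.
 * Returns whether the target is a vhost-user client; the report says the
 * real code asserts on this condition instead of returning it. */
static bool demo_on_close_event(const struct demo_client *clients, size_t n,
                                const char *borrowed_name)
{
    const struct demo_client *target = demo_find(clients, n, borrowed_name);
    return target != NULL && target->kind == DEMO_VHOST_USER;
}
```

If the buffer behind `borrowed_name` held "charnet2" at registration time and is later reused for "charnet4", the same callback resolves to the TAP client and the vhost-user check fails, which matches the behaviour seen in the dump.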
#### Possible Fix
There is already a commit upstream that might address this:
commit c1bf3531aecf4a0ba25bb150dd5fe21edf406c88
Author: Marc-André Lureau <email address hidden> 2016-02-23 18:10:49
Committer: Michael S. Tsirkin <email address hidden> 2016-03-11 14:59:12
Branches: master, origin/HEAD, origin/master, origin/stable-2.10, origin/stable-2.6, origin/stable-2.7, origin/stable-2.8, origin/stable-2.9
vhost-user: fix use after free
"name" is freed after visiting options, instead use the first NetClientState
name. Adds a few assert() for clarifying and checking some impossible states.
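The idea in that commit message can be sketched as follows. This is an illustration of the approach as the message describes it, not the upstream patch: `fix_client`, `fix_handler` and `fix_handler_init` are hypothetical names, and the assertions are placed where they seemed natural for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical client that owns its name storage. */
struct fix_client {
    char name[16];
    bool is_vhost_user;
};

/* Hypothetical handler state: borrows a name for later lookups. */
struct fix_handler {
    const char *name;
};

/* Take the name from the first client rather than from a temporary option
 * string, so the pointer stays valid for as long as the client exists. */
static void fix_handler_init(struct fix_handler *h,
                             const struct fix_client *clients, size_t n)
{
    assert(n > 0);                          /* a client must exist */
    assert(clients[0].name[0] != '\0');     /* and it must be named */
    h->name = clients[0].name;
}
```

With this arrangement, freeing or reusing the temporary string no longer affects the handler, because the handler's pointer refers to storage owned by the client itself.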