Comment 24 for bug 1743637

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

Alright, so I removed the proposed patch for Xenial since it was only dealing with half of the issue as you could see by my previous post, about the stack trace on a crash file I got. What I did not say in the last comment was about some other discoveries.

Basically, as I said, the qemu-chr code really implements a CharBackend mechanism (differentiating backend and frontend), and it does add a check for null arguments on some of the qemu-chr functions. The point is that, previously, it was up to the caller to check if argument was good or not (and that is why it doesn't have checkers on 2.5 qemu version).

Nevertheless, I haven't pointed why the argument was null =). On:

vhost_net_stop():

...
    for (i = 0; i < total_queues; i++) {
        vhost_net_stop_one(get_vhost_net(ncs[i].peer), dev);
    }
...

get_vhost_net(ncs[i].peer) returned null because the vhost_net struct was already cleaned up by the qemu char event itself (after the socket was closed). There is an upstream fix for this:

commit e6bcb1b61782cefacecc8b9ee7a5f782373dab2d
Author: Marc-André Lureau <email address hidden>
Date: Wed Jul 27 01:15:12 2016 +0400

    vhost-user: keep vhost_net after a disconnection

And it changes the place where vhost_net struct was freed (to the tap interface cleanup). It also replaces (freeing properly) the vhost_net if the nic is restarted.

I'm attaching the fix for 2 cases, found in 2 stack traces in previous comments, and I'm asking the final user to provide feedback on these fixes. Hopefully the vhost-user shutdown logic will be fixed for good.