QEMU

Bug #1673976
Comment #25

Éric Hoffman (ehoffman-videotron) wrote on 2018-03-23:

Hello

Sorry for the delay...

Actually, the parent only needs to get the status from the child, and that can be passed some other way than through shared memory.

The idea is to use a pipe (pipefd) to wait for the child to either terminate or successfully call execve. As follows:

When the TARGET_NR_clone syscall is trapped, you do:
- Call do_fork(), as currently done
- In do_fork(), at the beginning, if the CLONE_VFORK flag is set, keep track of it (i.e. do not clear the flag; just clear CLONE_VM, as currently done, so that a normal fork is performed and the child has its own copy of the memory segments).
- Just before the call to fork(), create a pipe (pipefd).
- The parent branch then (if CLONE_VFORK is set) closes its own copy of the write end of the pipe and starts looping on the read fd, waiting for status updates from the child. The loop could run indefinitely, but preferably some sort of timeout logic would be added.
- The child branch closes its own forked copy of the read end of the pipe, sets the FD_CLOEXEC flag on the write-end fd (with fcntl()), and stores the write fd in its QEMU state variables (parent vfork fd).
- The child then moves on.
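
The fork-side steps above could be sketched roughly as follows, in plain C outside of QEMU. The names fork_with_status_pipe and parent_vfork_fd are hypothetical: parent_vfork_fd only stands in for the QEMU state variable mentioned above and is not an existing QEMU symbol, and do_fork() itself is not shown.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the per-process QEMU state variable
 * ("parent vfork fd"). -1 means "this process is not a vfork child". */
int parent_vfork_fd = -1;

/*
 * Fork with a status pipe between parent and child.
 * Returns the result of fork(). In the parent, *read_fd receives the
 * read end of the pipe. In the child, parent_vfork_fd holds the write
 * end, marked close-on-exec.
 */
pid_t fork_with_status_pipe(int *read_fd)
{
    int fds[2];
    pid_t pid;

    *read_fd = -1;
    if (pipe(fds) < 0) {
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: drop its copy of the read end, keep the write end. */
        close(fds[0]);
        if (fcntl(fds[1], F_SETFD, FD_CLOEXEC) < 0) {
            _exit(127);
        }
        parent_vfork_fd = fds[1];
        return 0;
    }

    /* Parent: drop its copy of the write end, so that end of file on
     * the read end depends only on the child. */
    close(fds[1]);
    *read_fd = fds[0];
    return pid;
}
```

Closing the parent's copy of the write end matters: as long as any write end stays open in the parent, read() would never see end of file.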

When the TARGET_NR_execve syscall is trapped (this is in child context), you do:
- Do everything as currently done, up to just before the safe_execve() call.
- Just before the call to safe_execve(), check whether the QEMU state variable (parent vfork fd) is defined. If so, tell the parent (through the pipe) that we are good so far and about to call execve(). Note that the parent just updates the child status and keeps looping.
- Call execve().
- If the above call returns, an error occurred. In that case, check whether the QEMU state variable (parent vfork fd) is defined. If so, report whatever error status you got to the parent (through the pipe). The parent updates its child status but, again, continues to loop.
- Continue normally.
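
As a rough illustration of the execve-side steps above, here is a small sketch in plain C. It calls execve() directly rather than QEMU's safe_execve(), and the names execve_reporting, report_to_parent and parent_vfork_fd are hypothetical. The status format (one int per update, 0 meaning "about to exec", otherwise the errno value) is also just an assumption.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical stand-in for the QEMU state variable ("parent vfork fd").
 * -1 means "not defined", i.e. this process is not a vfork child. */
int parent_vfork_fd = -1;

/* Send one int-sized status update to the parent, if there is one.
 * Best effort: a failed write is ignored. */
void report_to_parent(int status)
{
    ssize_t n;

    if (parent_vfork_fd < 0) {
        return;
    }
    do {
        n = write(parent_vfork_fd, &status, sizeof(status));
    } while (n < 0 && errno == EINTR);
}

/* Wrap execve(): report "good so far" before the call, and report the
 * errno value if the call returns. Only returns on failure. */
int execve_reporting(const char *path, char *const argv[],
                     char *const envp[])
{
    int err;

    report_to_parent(0);        /* 0 = about to call execve() */
    execve(path, argv, envp);   /* on success, never returns */

    err = errno;
    report_to_parent(err);      /* tell the parent what went wrong */
    errno = err;                /* restore errno for the caller */
    return -1;
}
```

On success nothing more is written: the FD_CLOEXEC flag closes the write end during the exec, and that is what the parent ends up seeing.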

That's pretty much the bulk of the work done! What will happen is one of the following:
- The child eventually calls execve and it succeeds, at which point the write end of the pipe is closed (because we set it to close on exec, with the FD_CLOEXEC flag).
- The child could be playing games with us and call execve() multiple times (possibly with different arguments, executable paths, etc.), but every time the parent will just receive a status update through the pipe. Eventually the case above will occur (success), and the pipe will be closed.
- The child calls _exit(), which also closes the pipe.
- The child gets some horrible signal, gets killed, or whatever else... The pipe still gets closed.

The parent, on its side, just keeps updating the status UNTIL the other end of the pipe gets closed. At that point, read() on the pipe will return 0 (end of file). This signals the parent to move on and return whatever status the child last provided.
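
The parent's loop could look something like the sketch below. The function name wait_for_vfork_child is hypothetical, the one-int-per-update status format is an assumption, and the timeout logic mentioned earlier is left out.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Read int-sized status updates from the child until the write end of
 * the pipe is closed. Returns the last status received, or
 * initial_status if the child never sent one. Closes read_fd.
 */
int wait_for_vfork_child(int read_fd, int initial_status)
{
    int last = initial_status;
    int status;

    for (;;) {
        ssize_t n = read(read_fd, &status, sizeof(status));

        if (n == (ssize_t)sizeof(status)) {
            last = status;      /* status update, keep looping */
        } else if (n == 0) {
            break;              /* end of file: write end is closed */
        } else if (n < 0 && errno == EINTR) {
            continue;           /* interrupted by a signal, retry */
        } else {
            break;              /* short read or real error: give up */
        }
    }

    close(read_fd);
    return last;
}
```

Passing an error value as initial_status covers the case where the child dies or calls _exit() before it ever reaches execve().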

Note that this status could initially be set to an error state (in case the child dies or calls _exit() before calling execve()).

The only thing that could make the parent hang is the child hanging (never calling execve() or _exit(), and never dying...). But the beauty is that this is perfectly fine, because that is exactly the required behavior when the CLONE_VFORK flag is set (the parent waits for the child).

This is a lot of description, but it should be relatively easy and straightforward to implement. Could this work?

There are a few similar examples on the Web, using pipefd, fork and execve, for different applications. Here, we just pass the status.

Regards,
Eric
