9pfs does not honor open file handles on unlinked files
| Affects | Status | Importance | Assigned to | Milestone | |
|---|---|---|---|---|---|
| | QEMU |
Undecided
|
Unassigned | ||
Bug Description
This was originally filed over here: https:/
The open-unlink-fstat idiom used in some places to create an anonymous private temporary file does not work in a QEMU guest over a virtio-9p filesystem.
Version-Release number of selected component (if applicable):
qemu-kvm-
qemu-system-
(those are fedora RPMs)
How reproducible:
Always. See this example C program:
https:/
Steps to Reproduce:
1. Export a filesystem with virt-manager for the guest.
(type: mount, driver: default, mode: passthrough)
2. Start guest and mount that filesystem
(mount -t 9p -o trans=virtio,
3. Run a program that uses open-unlink-fstat
(in my case it was trying to compile Perl 5.20)
Actual results:
fstat fails:
open("/
unlink(
fstat(3, 0x23aa1a8) = -1 ENOENT (No such file or directory)
close(3)
Expected results:
open("/
unlink(
fstat(3, {st_mode=
fcntl(3, F_SETFD, FD_CLOEXEC) = 0
close(3)
Additional info:
There was a patch put into the kernel back in '07 to handle this very problem for other filesystems; maybe its helpful:
http://
There is also a thread on LKML from last December specifically about this very problem:
https:/
There was a discussion on the QEMU list back in '11 that doesn't seem to have come to a conclusion, but did provide the test program that i've attached to this report:
| Mark Glines (infinoid) wrote : | #1 |
| Eric Van Hensbergen (ericvh) wrote : Re: [Qemu-devel] [Bug 1336794] Re: 9pfs does not honor open file handles on unlinked files | #2 |
I've done some digging from the client side. As is mentioned in Miklos'
original patch (which appears to have been shot down) the higher level
implementation appear to be broken for this. Here's what the code looks
like for fstat in fs/stat.c:
int vfs_fstat(unsigned int fd, struct kstat *stat)
{
struct fd f = fdget_raw(fd);
int error = -EBADF;
if (f.file) {
}
return error;
}
In other words, it only uses the open fd to derrive a path and then
executes the getattr off of that path. If that path no longer exists
(because of unlink or remove) then you are hosed. In my understanding, the
"work around" I suppose is the so-called 'silly renaming' where
remove/unlink simply does a rename until all open instances are closed.
The frustrating thing is that the 9p protocol is setup to work off of the
fid, which maps to the fd -- so its perfectly capable of the original
semantic but the high level code in fs/stat.c only wants to give a path.
I can do a work around on the client where a getattr "favors" open fids for
the operation or I can do the sillyrename hack that NFS and CIFS has used
but both of those look god-awful.
-eric
On Fri, Apr 10, 2015 at 7:30 AM, Mark Glines <email address hidden> wrote:
> This bug makes it difficult to run a Debian Jessie guest with a 9pfs
> root filesystem, because "apt-get update" uses the open-unlink-fstat
> idiom. With this bug present, apt dies with the following error:
>
> E: Unable to determine file size for fd 7 - fstat (2: No such file or
> directory)
>
> --
> You received this bug notification because you are a member of qemu-
> devel-ml, which is subscribed to QEMU.
> https:/
>
> Title:
> 9pfs does not honor open file handles on unlinked files
>
> Status in QEMU:
> New
>
> Bug description:
> This was originally filed over here:
> https:/
>
> The open-unlink-fstat idiom used in some places to create an anonymous
> private temporary file does not work in a QEMU guest over a virtio-9p
> filesystem.
>
> Version-Release number of selected component (if applicable):
>
> qemu-kvm-
> qemu-system-
> (those are fedora RPMs)
>
> How reproducible:
>
> Always. See this example C program:
>
> https:/
>
> Steps to Reproduce:
> 1. Export a filesystem with virt-manager for the guest.
> (type: mount, driver: default, mode: passthrough)
> 2. Start guest and mount that filesystem
> (mount -t 9p -o trans=virtio,
> 3. Run a program that uses open-unlink-fstat
> (in my case it was trying to compile Perl 5.20)
>
> Actual results:
>
> fstat fails:
>
> open("/
> unlink(
> fstat(3, 0x23aa1a8) = -1 ENOENT (No such file or
> directory)
> close(3)
>
> Expected results:
>
> open("/
> unlin...
| Eric Van Hensbergen (ericvh) wrote : | #3 |
On Sun, Apr 12, 2015 at 9:09 AM, Al Viro <email address hidden> wrote:
> On Sun, Apr 12, 2015 at 12:42:35PM -0000, Eric Van Hensbergen wrote:
>
> > In other words, it only uses the open fd to derrive a path and then
> > executes the getattr off of that path. If that path no longer exists
> > (because of unlink or remove) then you are hosed. In my understanding,
> the
> > "work around" I suppose is the so-called 'silly renaming' where
> > remove/unlink simply does a rename until all open instances are closed.
>
> What do you mean, "no longer exists"? Don't confuse path with pathname -
> it's a mount,dentry pair. And dentry in question bloody well ought to
> still
> have the FID associated with it - you shouldn't use the same FID for
> TREMOVE and for TREAD/TWRITE.
Quite right, the fid is still there, I just don't have an easy way to get
at it. vfs_file operations have a direct notion of the fid because they
cache it in the struct file private data. dentries have several fids
associated with them and stored in thier private data, so I've got to
"guess" the right one. In most cases this probably won't cause a problem,
but it just feels a bit off.
There was a thread a few years back where Miklos was arguing for fstat to
pass through struct file information since the the fd given the fstat had a
file structure associated with it (which in 9p's case has a direct pointer
to the "right" fid):
http://
In any case, I've drafted a quick patch which takes the approach of
searching the dentry fid list for the fid, but it doesn't feel like the
right answer and I'm fairly certain I need to iterate on it a few times to
make sure I haven't hosed something up.
> TREMOVE clunks the FID passed to it; on
> some servers you really have no choice - server discards the file
> completely
> and on any FID that used to refer to it you get an error from that point
> on.
> On those you'd really have to do something like sillyrename - the only
> way to keep IO going for a file sitting on such server is to have it
> visible somewhere. Normal fs(4) is that way; e.g. u9fs(4) isn't - there
> FID maps to opened file descriptor on host and TREMOVE on another FID
> doesn't break it, as long as host supports IO on opened-but-unlinked files.
> I don't remember where qemu 9pfs falls in that respect, but I'd expect it
> to be more like u9fs...
>
>
Sort of, the servers in kvmtool and qemu (and diod) have a fid with the
open handle. However, all of them seem to implement getattr assuming they
have to re-walk to the file. In order to test my aforementioned changes to
the client, I also did a quick patch to kvmtool which checks and sees if
the fid it receives has an fd and just uses fstat instead of lstat. Patch
is here at the moment, I'll send upstream once I'm happy with the client
side changes and look into creating a patch for qemu/diod:
https:/
> Which FID are you passing to server on unlink()?
>
>
Unlink/remove look to be getting a proper fid (in other words, not using
the one that is open). The problem is that getattr is using a reference
fid (an open f...
| Eric Van Hensbergen (ericvh) wrote : Re: [V9fs-developer] [Qemu-devel] [Bug 1336794] Re: 9pfs does not honor open file handles on unlinked files | #4 |
Well, technically fid 3 isn't 'open', only fid 2 is open - at least
according to the protocol. fid 3 and fid 2 are both clones of fid 1.
However, thanks for the alternative workaround. If you get a chance, can
you check that my change to the client to partially fix this for the other
servers doesn't break nfs-ganesha:
https:/
On Mon, Apr 13, 2015 at 3:27 AM, Dominique Martinet <
<email address hidden>> wrote:
> Hi,
>
> for what it's worth, the sample code given works with nfs-ganesha
> server, so I'm not sure what's not working here.
>
> Here's the server traces:
> TATTACH: tag=1 fid=0 afid=-1 uname='nobody' aname='/export' n_uname=-1
> RATTACH: tag=1 fid=0 qid=(type=
> TGETATTR: tag=1 fid=0 request_mask=0x7ff
> RGETATTR: tag=1 valid=0x7ff qid=(type=
> uid=0 gid=0 nlink=3 rdev=192 size=4096 blksize=4096 blocks=8
> atime=(
> gen=0, data_version=0
> TATTACH: tag=1 fid=1 afid=-1 uname='' aname='/export' n_uname=0
> RATTACH: tag=1 fid=1 qid=(type=
> TGETATTR: tag=1 fid=1 request_mask=0x3fff
> RGETATTR: tag=1 valid=0x7ff qid=(type=
> uid=0 gid=0 nlink=3 rdev=192 size=4096 blksize=4096 blocks=8
> atime=(
> gen=0, data_version=0
> TWALK: tag=1 fid=1 newfid=2 nwname=1 (component 1/1 :test.txt)
> RERROR(_9P_TWALK) tag=1 err=(2|No such file or directory)
> TWALK: tag=1 fid=1 newfid=2 nwname=0
> RWALK: tag=1 fid=1 newfid=2 nwqid=0 fileid=1 pentry=0x8278c0 refcount=4
> TLCREATE: tag=1 fid=2 name=test.txt flags=0100102 mode=0100644 gid=0
> RLCREATE: tag=1 fid=2 name=test.txt
> qid=(type=
> pentry=
> TWALK: tag=1 fid=1 newfid=3 nwname=1 (component 1/1 :test.txt)
> RWALK: tag=1 fid=1 newfid=3 nwqid=1 fileid=
> pentry=
> TGETATTR: tag=1 fid=3 request_mask=0x17ff
> RGETATTR: tag=1 valid=0x7ff qid=(type=
> mode=0100644 uid=0 gid=0 nlink=1 rdev=192 size=0 blksize=4096 blocks=0
> atime=(
> gen=0, data_version=0
> TWRITE: tag=1 fid=2 offset=0 count=6
> RWRITE: tag=1 fid=2 offset=0 input count=6 output count=6
> TUNLINKAT: tag=1 dfid=1 name=test.txt
> TUNLINKAT: tag=1 dfid=1 name=test.txt
> TGETATTR: tag=1 fid=3 request_mask=0x3fff
> RGETATTR: tag=1 valid=0x7ff qid=(type=
> mode=0100644 uid=0 gid=0 nlink=0 rdev=192 size=6 blksize=4096 blocks=0
> atime=(
> gen=0, data_version=0
> TCLUNK: tag=1 fid=2
> RCLUNK: tag=1 fid=2
> TCLUNK: tag=1 fid=3
> RCLUNK: tag=1 fid=3
>
> (if you're not familiar with 9P, ATTACH = mount, WALK = create a new
> 'fid' either clone of current file (nwname = 0) or lookup, CLUNK ~=
> close. Rest is obvious enough)
>
>
> There's no lookup between the unlink and the getattr, so what I think is
> missing is that both qemu and d...
| Eric Van Hensbergen (ericvh) wrote : Re: [Qemu-devel] [V9fs-developer] [Bug 1336794] Re: 9pfs does not honor open file handles on unlinked files | #5 |
That patch looks fine by me. Happy to put it in the queue. Thanks Al.
On Tue, Apr 14, 2015 at 11:07 AM Al Viro <email address hidden> wrote:
> On Mon, Apr 13, 2015 at 04:05:28PM +0000, Eric Van Hensbergen wrote:
> > Well, technically fid 3 isn't 'open', only fid 2 is open - at least
> > according to the protocol. fid 3 and fid 2 are both clones of fid 1.
> > However, thanks for the alternative workaround. If you get a chance, can
> > you check that my change to the client to partially fix this for the
> other
> > servers doesn't break nfs-ganesha:
> >
> >
> https:/
>
> BTW, what the hell is going on in v9fs_vfs_mknod() and v9fs_vfs_link()?
> You allocate 4Kb buffer, sprintf into it ("b %u %u", "c %u %u", or "%d\n")
> feed it to v9fs_vfs_
> a local array of char? In the worst case it needs to be char name[24] -
> surely, we are not so tight on stack that extra 16 bytes (that array
> instead
> of a pointer) would drive us over the cliff?
>
> IOW, do you have any problem with this:
> diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
> index 703342e..cda68f7 100644
> --- a/fs/9p/vfs_inode.c
> +++ b/fs/9p/vfs_inode.c
> @@ -1370,6 +1370,8 @@ v9fs_vfs_
> *dentry, const char *symname)
> return v9fs_vfs_
> }
>
> +#define U32_MAX_DIGITS 10
> +
> /**
> * v9fs_vfs_link - create a hardlink
> * @old_dentry: dentry for file to link to
> @@ -1383,7 +1385,7 @@ v9fs_vfs_
> inode *dir,
> struct dentry *dentry)
> {
> int retval;
> - char *name;
> + char name[1 + U32_MAX_DIGITS + 2]; /* sign + number + \n + \0 */
> struct p9_fid *oldfid;
>
> p9_debug(
> @@ -1393,20 +1395,12 @@ v9fs_vfs_
> inode *dir,
> if (IS_ERR(oldfid))
> return PTR_ERR(oldfid);
>
> - name = __getname();
> - if (unlikely(!name)) {
> - retval = -ENOMEM;
> - goto clunk_fid;
> - }
> -
> sprintf(name, "%d\n", oldfid->fid);
> retval = v9fs_vfs_
> - __putname(name);
> if (!retval) {
> v9fs_refresh_
> v9fs_invalidate
> }
> -clunk_fid:
> p9_client_
> return retval;
> }
> @@ -1425,7 +1419,7 @@ v9fs_vfs_
> *dentry, umode_t mode, dev_t rde
> {
> struct v9fs_session_info *v9ses = v9fs_inode2v9se
> int retval;
> - char *name;
> + char name[2 + U32_MAX_DIGITS + 1 + U32_MAX_DIGITS + 1];
> u32 perm;
>
> p9_debug(
> @@ -1435,26 +1429,16 @@ v9fs_vfs_
> *dentry, umode_t mode, dev_t rde
> if (!new_valid_
> return -EINVAL;
>
> - name = __getname();...
| Eric Van Hensbergen (ericvh) wrote : Re: [V9fs-developer] [Qemu-devel] [Bug 1336794] Re: 9pfs does not honor open file handles on unlinked files | #6 |
good to know, thanks dominique. I gave it a sniff test with FSX and a few
other benchmarks, but I need to hit it with some multithreaded
regressions. Any pointers to reproducible failure cases would be
beneficial.
On Wed, Apr 15, 2015 at 6:28 AM Dominique Martinet <
<email address hidden>> wrote:
> Eric Van Hensbergen wrote on Mon, Apr 13, 2015 at 04:05:28PM +0000:
> > Well, technically fid 3 isn't 'open', only fid 2 is open - at least
> > according to the protocol. fid 3 and fid 2 are both clones of fid 1.
>
> Right, they're clone fids, but nothing in the protocol says what should
> happen to non-open fids when you unlink the file either - I guess both
> behaviours are OK as long as the client can handle it, so it would make
> sense to at least call fstat() on the fid matching the fd, but while
> I think this is how the kernel currently behaves the kernel doesn't HAVE
> to make one open, separate fid per open file descriptor either.
>
> > However, thanks for the alternative workaround. If you get a chance, can
> > you check that my change to the client to partially fix this for the
> other
> > servers doesn't break nfs-ganesha:
> >
> >
> https:/
>
> I'm afraid I haven't had much time lately, but while fstat-after-unlink
> still works I'm getting some Oops with my basic test suite (create empty
> files and rm -rf, kernel compilation, etc - nothing fancy) to the point
> of locking myself out of my test box pretty quickly.
>
> I'll try to debug patches a bit more individually (trying everything at
> once isn't helping), but at the very least something is fishy
>
> --
> Dominique Martinet
>


This bug makes it difficult to run a Debian Jessie guest with a 9pfs root filesystem, because "apt-get update" uses the open-unlink-fstat idiom. With this bug present, apt dies with the following error:
E: Unable to determine file size for fd 7 - fstat (2: No such file or directory)