GVFS-SMB write problems to SMB2 shares

Bug #1815395 reported by Scott A. Wozny
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
gvfs (Ubuntu)
Invalid
Undecided
Unassigned

Bug Description

Summary:

I believe there’s a bug in GVFS-SMB (or its implementation in Nautilus) that manifests when writing to SMB2 shares. It occurs inconsistently (but enough to be annoying) in everyday use, but it can be replicated consistently with a simple echo command redirected to a file on the share. This doesn’t happen when using the ‘mount’ command to connect to SMB2 shares. It also doesn’t happen when making an SMB1 connection through Nautilus OR mount. I’ve also tested this against 2 different SMB servers (a Synology DS216J and a share on a Win7 Pro system) and the results are consistent so I don’t believe it’s a server side issue.

I don’t know enough about this part of the system to identify where the issue is or fix it, but I’ve included data below to show the basis for my assertions above. The workaround is either going back to SMB1 or mounting my shares with the mount command, but I was hoping someone could take this on and fix it long-term. As Microsoft progressively phases out SMB1, I think this will eventually become a wider scale problem so now would be an opportune time to address it. If there’s any further data I can provide, please let me know.

Also, I’m sorry this bug report is so long, but I’m erring on the side of too much detail rather than too little.

Detail:

On my Ubuntu 18.04 system, when I attempt to echo a string into a file on a GVFS mounted share (left hand nav of Nautilus + Other Locations) and that share is configured to SMB1 (client side left to negotiate, server side locked to SMB1) this works as expected. I tested against SMB shares on both a Synology Diskstation DS216J NAS and a Windows 7 Professional workstation. When I was on Ubuntu 16.04 that was SMB1 only out of the box I didn’t know this problem even existed. Now that I’m on Ubuntu 18.04 and my share was willing to make connection at SMB2 I started to have weird, inconsistent disk problems. Much troubleshooting and analysis later, I can recreate this issue consistently with a simple echo command to a file on the share.

To the SMB share on a Synology Diskstation NAS set to SMB1:

echo "This is a test." > /run/user/1000/gvfs/smb-share\:server\=diskstation\,share\=fileshare/Test-File-SMB1-GVFS-NAS.txt

The command executes with no feedback (other than the return of the prompt awaiting the next command) and the file of the name specified with the contents specified appears at the root of the NAS file share.

To the SMB share on a Windows 7 Professional workstation set to SMB1:

echo "This is a test." > /run/user/1000/gvfs/smb-share\:server\=hp_laptop\,share\=videos/Test-File-SMB1-GVFS-Win7.txt

The command also executes with no feedback (other than the return of the prompt awaiting the next command) and the file of the name specified with the contents specified appears at the root of the “videos” file share on the Win7 workstation.

I also took PCAPs of this occurring (attached). What they show is about what I would expect for this operation. Client SMB pings the server in packet 1 who responds in packet 2, queries info on the folder in packet 4 and gets some basic info back in packet 5, queries the filename (to make sure it doesn’t already exist) several times (not sure why), each time getting a name not found reply in packets 6 through 17, then creates the file in packet 18, gets back a success message in packet 19, queries info on the new file in packet 20, gets a success response in packet 21, writes the contents in packet 22, gets a good response in packet 23, queries the file one more time (presumably to make sure the new size is good) in packet 24, gets a response with the info in packet 25, requests the handle be closed in packet 26, and gets an OK to that in packet 27.

This is all wonderful. Perhaps slightly inefficient, but I don’t know what’s going on under the hood, so who am I to judge? When I switch to SMB2 on the server side and reconnect via GVFS using Nautilus, things fall apart.

To the SMB share on a Synology Diskstation NAS set to SMB2:

echo "This is a test." > /run/user/1000/gvfs/smb-share\:server\=diskstation\,share\=fileshare/Test-File-SMB2-GVFS-NAS.txt

returns

bash: /run/user/1000/gvfs/smb-share:server=diskstation,share=fileshare/Test-File-SMB2-GVFS-NAS.txt: Invalid argument

The file is where I expected it, but it is 0 bytes.

To the SMB share on a Windows 7 Professional workstation set to SMB2:

echo "This is a test." > /run/user/1000/gvfs/smb-share\:server\=hp_laptop\,share\=videos/Test-File-SMB2-GVFS-Win7.txt

returns

bash: /run/user/1000/gvfs/smb-share:server=hp_laptop,share=videos/Test-File-SMB2-GVFS-Win7.txt: Invalid argument

Again, the file is there in the right location, but it is 0 bytes.

Going to take a look at the PCAPs, it’s pretty clear what’s happened, but less clear why. The PCAPs were consistent between the NAS and the Win7 Pro workstation. So the client requests a file with no name (presumably as shortcut for pwd) be opened in packet 1 and a handle was returned in packet 2. Info about this handle was requested in packet 3 and returned in packet 4 (filename is \) followed by a request to close that handle in packet 5 and a success response in packet 6. I assume up until now, it was just overhead to make sure the share worked properly. Then 4 repeated requests of disposition 1 (open if exists, else fail) for the filename I echoed to with the related (and expected) name not found errors returned in packets 7 through 14. Then finally a disposition 3 request (create; fail if file already exists) for the filename specified in packet 15 followed by a success message in packet 16. This is the point where things start to go wrong. We then have 4 repeated file_all_info requests each followed by an access denied in packets 17 through 24. Of course, one looks at this as says WTF? You JUST created the file (and the 0 byte file does, indeed, appear in the location requested) so why would a request for file info return an access denied? I checked the file handles and they appear consistent. Unfortunately, I don’t know enough about the inner workings of SMB to see if there was anything incorrect about the file create in packet 15 or the response in packet 16, but that seems like a good place to start looking. After the access denied packets, the client must get the hint and requests a close of the file handle in packet 25 and gets an OK from the server to that in packet 26.

So, I stumbled around with this problem for a while comparing SMB1 with SMB2 (e.g. apples and oranges or, I guess more accurately, oranges and grapefruits) until someone in the forums suggested using the mount command instead of GVFS / Nautilus. I did that and things worked just fine.

To the SMB share on a Synology Diskstation NAS set to SMB2:

sudo mount -t cifs //diskstation/fileshare /media/NAS -o username=sawozny,vers=2.0,uid=1000

echo "This is a test." > /media/NAS/Test-File-SMB2-MOUNT-NAS.txt

The command executes with no feedback (other than the return of the prompt awaiting the next command) and the file of the name specified with the contents specified appears at the root of the NAS file share.

To the SMB share on a Windows 7 Professional workstation set to SMB2:

sudo mount -t cifs //hp_laptop/videos /media/Win7 -o username=sawozny,vers=2.0,uid=1000

echo "This is a test." > /media/Win7/Test-File-SMB2-MOUNT-Win7.txt

The command also executes with no feedback (other than the return of the prompt awaiting the next command) and the file of the name specified with the contents specified appears at the root of the “videos” file share on the Win7 workstation.

Note: I know I shouldn’t need to specify SMB dialect when doing the mount because the server is set to only make an SMB2 connection, but mounting doesn’t work for me if I don’t specify the version number. Maybe I’m doing something wrong here, but since the point was to make an SMB2 connection and I did I’m not burning a lot of cycles on why I needed to use an additional parameter that shuoldn’t have been needed.

Digging into the PCAPs (attached), it’s clear that what’s going on under the hood is very different. The PCAPs both appear identical, so there do not appear to be any server side differences affecting the testing. The PCAP starts right out of the gate with a disposition 5 (overwrite if the file already exists; otherwise, create it) request of the correct name in packet 1 and a handle returned in packet 2, then a file internal info request in packet 4 and a response in packet 5, followed by a set EOF info in packet 6 with a success response in packet 7. Next a disposition 1 (open if file exists, else fail) request in packet 8 followed by a success return in packet 9 followed by a set info setting the last write and last change dates in packet 10, followed by a success return in packet 11 and a close of the handle opened in packet 9 in packet 12 with a success return in packet 13, then the actual data write in packet 14 followed by a success return in packet 15, a close handle (from packet 2) in packet 16 and a successful return of that request in packet 17.

And the file is where it’s supposed to be with the correct contents on an SMB2 share.

Unfortunately, this doesn’t really help pin down the nature of the issue. If the SMB2 conversations were the same to a certain point and diverged it would be easier to point to a spot and say “look here” but like everything else in computing there’s more than one way to do almost any function. If we were to focus on the actual moment of file creation, then in the GVFS SMB2 PCAPs there is no OpLock (0) vs batch OpLock (9) in the MOUNT PCAP, also the Access Mask in the GVFS PCAP is 0x00120116 vs 0x40000080 in the MOUNT PCAP, the Share Access is 0x0000003 GVFS vs 0x0000007 MOUNT, the disposition of the create is different as described and there is an SMB2_CREATE_DURABLE_HANDLE_REQUEST in the MOUNT PCAP that is not in the GVFS PCAP, but I don’t know if anything that happened before or after the moment of file creation in both the failed and successful PCAPs affected that moment, making those differences make sense. So all the PCAPs are attached for your review to see what else might be making the difference.

Finally, per the instructions at https://wiki.gnome.org/Projects/gvfs/debugging I ran 2 debugs of this issue. One was with the base instructions for a debug and the other was with GVFS_SMB_DEBUG set to 10. These were not done at the same time as the PCAPs to avoid Heisenberg differences, but the mount steps and echo command were the same, so if you follow the latter part of the debug (the packet captures only have the actual transfer where the debug log has to have everything from the launch of Nautilus) it should be pretty easy to match up what’s happening. Unfortunately (AGAIN) not much additional info seems to be offered. The failed requests for the file name show up (2 “No such directory or file” returns in the log, rather than the 4 STATUS_OBJECT_NAME_NOT_FOUND packets in the PCAP; not sure why the discrepancy) but the write returns no error code in the log, despite the fact that in the PCAP it doesn’t appear to even have gotten to the stage of writing. Perhaps the repeated STATUS_ACCESS_DENIED in the PCAP were part of that stage in the process, but a failure at that point wasn’t something GVFS-SMB knew how to handle so it didn’t return an error code on that function call (but also didn’t do the actual writing). The only other thing I can find in that file of interest is a GvfsJobCreateMonitor call that returned “Operation not supported by backend” but in context it seems to be for the creation of a directory monitor which doesn’t seem to be that life threatening a failure. Anyway, I’m just guessing at this point so I included the logs in case there’s something important in there I missed.

If you’ve made it this far, I hope this has been useful to tracking down this bug. If there is anything I can do to help in terms of further information gathering or testing, please let me know.

Revision history for this message
Scott A. Wozny (sawozny) wrote :
Revision history for this message
Sebastien Bacher (seb128) wrote :

Thanks for the detailled bug, maybe it would be better if you reported that directly upstream to the people writing the code though
https://gitlab.gnome.org/GNOME/gvfs/issues

no longer affects: ubuntu
Revision history for this message
Scott A. Wozny (sawozny) wrote :

Will do. I wanted to post here before cross-posting it in case the issue was something to do with the Ubuntu implementation of GVFS and not GVFS itself. I was also kind of hoping someone could do my tests to confirm this is actually a bug and not some craziness that only affects me but I suppose the GVFS folks could do that, too.

Revision history for this message
Sebastien Bacher (seb128) wrote :

We might have a look but we don't really have a gvfs maintainer for Ubuntu, also the package is not patched in any Ubuntu specific way so if there is an issue it's probably an upstream one

Revision history for this message
Scott A. Wozny (sawozny) wrote :

Well, I'm not sure what the issue or fix was, but this issue can be closed.

I posted the bug to the GVFS list and, amongst other suggestions, one of the maintainers noticed an error in the GVFS debug log of 2 malformed parameters in smb.conf min client protocol and max client protocol (vs the correct client min protocol and client max protocol). These nonsense parameters were also both set to SMB3 so the first thing I did was remove them from smb.conf before doing the other tests recommended. However, now my repeatable "issue" scenario writes the files properly (and the traces show a perfectly good response to the file info request that repeatedly returned access denied errors before, BUT the file creation parameter values are now different for no reason I can figure out). I tried adding the parameter lines back, as they were before, but I still can't recreate the issue. I also tried "fixing" those bad parameter lines and it prevented me from connecting to my SMB2 share at all, which I would expect when the client side is locked to SMB3. So after I pulled the lines again to let the client side negotiate naturally, I can't get the issue to happen again after repeated tests with different programs, some of which had consistent write problems and some of which had intermittent write problems. I didn't do any updates to the system in between when I posted this and when I "fixed" it and the one change I DID make I reversed, but it didn't cause the problem to recur.

I suppose this is why I wanted to see if anyone could replicate this. I would bet now that they couldn't but I doubt I'll ever know what the actual issue was, now.

Anyway, I'm closing this bug. Sorry for the false alarm. :)

Thanks,

Scott

Revision history for this message
Scott A. Wozny (sawozny) wrote :

Sorry, spoke too soon. I'm not sure which status to change this to. It was never confirmed by anyone else, but the traces definitely showed something bad happening. Any suggestions what I should set the status to in order to properly file it away?

Scott A. Wozny (sawozny)
Changed in gvfs (Ubuntu):
status: New → Invalid
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.