Comment 42 for bug 2060780

Daniel Dawson (dotdawson) wrote :

Hi, I believe that kernel 5.15.0-105-generic may not have solved the issue entirely.

I upgraded to 104 from -proposed, and when 105 became available I updated to 105, then removed 104 and the -proposed repo entirely. While the issue does not present itself as frequently, I still encounter similar, if not the same, CIFS errors that were not present in older kernels.

After updating to 105, I no longer encounter the issue when I first connect to the SMB share. However, I do encounter it during long file copies to an SMB share. When I try to copy 20GB of data to the SMB server, the copy is interrupted after some time, and the mount point is then broken and cannot be accessed.

I am using Docker's CIFS volumes to mount SMB shares into Docker containers. The host uses CIFS and mounts these into the /var/lib/docker/volumes/... directories, which are remapped to containers in some way.

I have encountered the issue on 105 after the following steps.
On the host, I added a mount directly in `/etc/fstab`, ran `systemctl daemon-reload`, and mounted the share with `mount -a`.
I then copied 20GB of data to the share with `cp -r /path/to/my/data /mnt/rf/data`. After about a minute, the copy terminated.
In `dmesg`, I see the CIFS errors. When I look at the host CIFS mount point, I can see that the folder is inaccessible.

uname -a
```
Linux docker-gpu-01 5.15.0-105-generic #115-Ubuntu SMP Mon Apr 15 09:52:04 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
```

apt list --installed | grep "cifs-utils"
```
cifs-utils/jammy-updates,jammy-security,now 2:6.14-1ubuntu0.1 amd64 [installed]
```

cat /etc/fstab
```
...
//10.0.6.2/rf /mnt/rf cifs uid=1000,gid=1000,credentials=/home/user/.smb 0 0
```

sudo dmesg
```
[138420.302415] CIFS: Attempting to mount \\10.0.6.2\rf
[138420.315798] CIFS: VFS: parse_server_interfaces: malformed interface info
[138538.756550] CIFS: VFS: \\10.0.6.2 sends on sock 000000009d8f9284 stuck for 15 seconds
[138538.757040] CIFS: VFS: \\10.0.6.2 Error -11 sending data on socket to server
[138543.032429] CIFS: reconnect tcon failed rc = -13
[138543.051221] CIFS: VFS: No writable handle in writepages rc=-13
[138543.053159] CIFS: VFS: No writable handle in writepages rc=-13
[138543.063685] CIFS: VFS: No writable handle in writepages rc=-13
[138543.065707] CIFS: VFS: No writable handle in writepages rc=-13
[138543.077380] CIFS: VFS: No writable handle in writepages rc=-13
[138543.080055] CIFS: VFS: No writable handle in writepages rc=-13
[138543.090757] CIFS: VFS: No writable handle in writepages rc=-13
[138543.092996] CIFS: VFS: No writable handle in writepages rc=-13
[138543.151054] CIFS: VFS: \\10.0.6.2\media Close unmatched open for MID:116
```

Notes:
You can see that at `138420` I ran `mount -a` and the mount was added.
There was an error `parse_server_interfaces: malformed interface info`, but the file system was mounted and available.
I started the file transfer shortly afterwards, probably 20-30 seconds later.
This file transfer failed by `138538` and produced the `VFS: \\10.0.6.2 sends on sock 000000009d8f9284 stuck for 15 seconds` error.
I can see on the server that about `6GB/20GB` of data had transferred at this point.
Finally, you can see that at `138543`, a different mount (managed by docker) failed with the error `VFS: \\10.0.6.2\media Close unmatched open for MID:116`.
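
For anyone who wants to line the events up, here is a rough Python sketch that turns the dmesg timestamps from the notes above into offsets from the mount. The three log lines are copied from my output. The `parse` helper and the regex are just illustrative and assume the usual `[seconds.microseconds] message` dmesg format.

```python
import re

# Three key lines copied verbatim from the dmesg output in this comment.
LOG = r"""
[138420.302415] CIFS: Attempting to mount \\10.0.6.2\rf
[138538.756550] CIFS: VFS: \\10.0.6.2 sends on sock 000000009d8f9284 stuck for 15 seconds
[138543.151054] CIFS: VFS: \\10.0.6.2\media Close unmatched open for MID:116
"""

# Assumed format: "[<seconds since boot>] <message>"
STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def parse(log):
    """Return a list of (timestamp_seconds, message) tuples."""
    events = []
    for line in log.strip().splitlines():
        match = STAMP.match(line)
        if match:
            events.append((float(match.group(1)), match.group(2)))
    return events


events = parse(LOG)
mount_time = events[0][0]

# Print each event as an offset in seconds from the mount attempt.
for timestamp, message in events:
    print(f"+{timestamp - mount_time:8.3f}s  {message}")
```

The offsets show the "stuck for 15 seconds" message roughly 118 seconds after the mount, and the failure on the Docker-managed mount a few seconds after that.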

The storage system has a 1Gbps link to the server. Assuming 80MBps, the file copy would have made it through 6GB in about 75 seconds before it encountered this error, which is consistent with the dmesg timeline and my experience while watching the file copy.
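
The estimate above can be checked with a few lines of Python. This is only a sanity check under my own assumptions: 80MB/s sustained throughput, decimal units (1GB = 1000MB), a 20-30 second delay before I started the copy, and the socket having already been stuck for the 15 seconds the kernel message reports.

```python
def transfer_seconds(size_gb, rate_mb_per_s):
    """Time to move size_gb at rate_mb_per_s, using 1 GB = 1000 MB."""
    return size_gb * 1000 / rate_mb_per_s


# Assumed sustained throughput on the 1Gbps link.
RATE_MB_PER_S = 80

copied = transfer_seconds(6, RATE_MB_PER_S)    # time to copy the 6GB that arrived
full = transfer_seconds(20, RATE_MB_PER_S)     # time the whole 20GB would have needed

# Timestamps from dmesg: mount attempt and the "stuck for 15 seconds" message.
gap = 138538.756550 - 138420.302415

# Remove the 15 s stall and the assumed 20-30 s delay before the copy started.
window_low = gap - 15 - 30
window_high = gap - 15 - 20

print(f"6GB at {RATE_MB_PER_S}MB/s: {copied:.0f}s (full copy: {full:.0f}s)")
print(f"active copy window from dmesg: {window_low:.0f}s to {window_high:.0f}s")
```

The 75 seconds needed for 6GB falls inside the window derived from the dmesg timestamps, so the numbers hang together.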

The mount point on the host now looks like this:

sudo ls -hal /mnt
```
ls: cannot access '/mnt/rf': Permission denied
total 8.0K
drwxr-xr-x 3 root root 4.0K Apr 21 10:03 .
drwxr-xr-x 19 root root 4.0K Mar 26 21:02 ..
d????????? ? ? ? ? ? rf
```

I have not looked at the bugfix code, or the code that introduced the bug, but I am not fully convinced that the issue is solved.