Shared folder randomly not mounted

Bug #1687273 reported by Jonathan on 2017-04-30
This bug affects 19 people
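Affects: Linux | Status: Unknown | Importance: Unknown
Affects: cifs-utils (Ubuntu) | Importance: Undecided | Assigned to: Unassigned
  Xenial | Importance: Undecided | Assigned to: Unassigned
Affects: linux (Debian) | Status: Fix Released | Importance: Unknown
Affects: linux (Ubuntu) | Importance: High | Assigned to: Joseph Salisbury
  Xenial | Importance: High | Assigned to: Joseph Salisbury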

Bug Description

Hello Everybody,

Since last night's update on Ubuntu 16.04.2, my servers are randomly unable to read from and write to shared folders. I get this message:

lsof: WARNING: can't stat() cifs file system /mnt/record
      Output information may be incomplete.
lsof: WARNING: can't stat() cifs file system /mnt/record
      Output information may be incomplete.
lsof: WARNING: can't stat() cifs file system /mnt/record
      Output information may be incomplete.
...
rsync: ERROR: cannot stat destination "/mnt/record/": Host is down (112)
rsync error: errors selecting input/output files, dirs (code 3) at main.c(652) [Receiver=3.1.1]

This comes from my cron script.

When I type df -h, it sometimes takes very long to get the list. It can also happen that the mounted share is disconnected and I have to run mount -a again (which also takes long).

My fstab has this entry:

//address /mnt/storage cifs auto,nofail,iocharset=utf8,rw,credentials=/path/.smb,uid=1000,gid=1000,file_mode=0660,dir_mode=0770 0 0

My update list from last night was this:

linux-image-4.4.0-75-generic:amd64 (4.4.0-75.96, automatic)
linux-image-extra-4.4.0-75-generic:amd64 (4.4.0-75.96, automatic)
linux-headers-4.4.0-75-generic:amd64 (4.4.0-75.96, automatic)
linux-headers-4.4.0-75:amd64 (4.4.0-75.96, automatic)
Upgrade: libdns-export162:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.5, 1:9.10.3.dfsg.P4-8ubuntu1.6)
linux-headers-generic:amd64 (4.4.0.72.78, 4.4.0.75.81)
linux-libc-dev:amd64 (4.4.0-72.93, 4.4.0-75.96)
mysql-client-5.7:amd64 (5.7.17-0ubuntu0.16.04.2, 5.7.18-0ubuntu0.16.04.1)
libapt-inst2.0:amd64 (1.2.19, 1.2.20)
mysql-server-5.7:amd64 (5.7.17-0ubuntu0.16.04.2, 5.7.18-0ubuntu0.16.04.1)
libsystemd0:amd64 (229-4ubuntu16, 229-4ubuntu17)
linux-image-generic:amd64 (4.4.0.72.78, 4.4.0.75.81)
apt:amd64 (1.2.19, 1.2.20)
mysql-server:amd64 (5.7.17-0ubuntu0.16.04.2, 5.7.18-0ubuntu0.16.04.1)
udev:amd64 (229-4ubuntu16, 229-4ubuntu17)
libapt-pkg5.0:amd64 (1.2.19, 1.2.20)
libudev1:amd64 (229-4ubuntu16, 229-4ubuntu17)
cifs-utils:amd64 (2:6.4-1ubuntu1, 2:6.4-1ubuntu1.1)
dpkg:amd64 (1.18.4ubuntu1.1, 1.18.4ubuntu1.2)
mysql-client-core-5.7:amd64 (5.7.17-0ubuntu0.16.04.2, 5.7.18-0ubuntu0.16.04.1)
libisc-export160:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.5, 1:9.10.3.dfsg.P4-8ubuntu1.6)
linux-virtual:amd64 (4.4.0.72.78, 4.4.0.75.81)
systemd-sysv:amd64 (229-4ubuntu16, 229-4ubuntu17)
distro-info-data:amd64 (0.28ubuntu0.2, 0.28ubuntu0.3)
systemd:amd64 (229-4ubuntu16, 229-4ubuntu17)
mysql-common:amd64 (5.7.17-0ubuntu0.16.04.2, 5.7.18-0ubuntu0.16.04.1)
apt-utils:amd64 (1.2.19, 1.2.20)
libmysqlclient20:amd64 (5.7.17-0ubuntu0.16.04.2, 5.7.18-0ubuntu0.16.04.1)
linux-headers-virtual:amd64 (4.4.0.72.78, 4.4.0.75.81)
linux-image-extra-virtual:amd64 (4.4.0.72.78, 4.4.0.75.81)
thermald:amd64 (1.5-2ubuntu3, 1.5-2ubuntu4)
qemu-guest-agent:amd64 (1:2.5+dfsg-5ubuntu10.10, 1:2.5+dfsg-5ubuntu10.11)
libfreetype6:amd64 (2.6.1-0.1ubuntu2.1, 2.6.1-0.1ubuntu2.2)
linux-image-virtual:amd64 (4.4.0.72.78, 4.4.0.75.81)
libdpkg-perl:amd64 (1.18.4ubuntu1.1, 1.18.4ubuntu1.2)
mysql-server-core-5.7:amd64 (5.7.17-0ubuntu0.16.04.2, 5.7.18-0ubuntu0.16.04.1)
libxslt1.1:amd64 (1.1.28-2.1, 1.1.28-2.1ubuntu0.1)
dpkg-dev:amd64 (1.18.4ubuntu1.1, 1.18.4ubuntu1.2)

I saw that somebody else has the same problem:

https://askubuntu.com/questions/910058/updates-broke-cifs-smb-mounts-in-16-04

I had to remove the latest kernel update; after that it works again.

Regards

Jonathan (jb-alvarado) on 2017-04-30
description: updated
tags: removed: ubuntu
Jonathan (jb-alvarado) on 2017-04-30
description: updated
Jonathan (jb-alvarado) on 2017-05-01
description: updated
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in cifs-utils (Ubuntu):
status: New → Confirmed
Michael May (mjmay) on 2017-05-02
no longer affects: cifs-utils
Jonathan (jb-alvarado) on 2017-05-02
description: updated
Jonathan (jb-alvarado) wrote :

Hello,
I only wanted to inform you that the problem still exists in kernel 4.4.0-77.

Regards,
Jonathan

Not sure if this helps. This is what DebugData looks like after disconnecting.
% cat /proc/fs/cifs/DebugData
Display Internal CIFS Data Structures for Debugging
---------------------------------------------------
CIFS Version 2.08
Features: dfs fscache lanman posix spnego xattr acl
Active VFS Requests: 0
Servers:
Number of credits: 1
1) Name: <redacted> Domain: WORKGROUP Uses: 1 OS: Windows 6.1
        NOS: Samba 4.3.13 Capability: 0x8080f3fc
        SMB session status: 1 TCP status: 4
        Local Users To Server: 1 SecMode: 0x7 Req On Wire: 1
        Shares:
        1) \\<redacted> Mounts: 1 Type: NTFS DevInfo: 0x20 Attributes: 0x1002f
        PathComponentMax: 255 Status: 1 type: DISK DISCONNECTED

        MIDs:
        State: 2 com: 43 pid: 4894 cbdata: ffff88007aee6000 mid 0

Upgrading to SMB version 3 seems to have fixed it. It has now been running stable for 12 hours, whereas the mount point went away after approx. 50 minutes with the old version.
Current mount options:
iocharset=utf8,rw,credentials=/path,uid=0,gid=0,file_mode=0660,dir_mode=0770,vers=3.0
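For anyone who wants to apply the vers=3.0 workaround to several fstab entries, here is a minimal Python sketch. The function name add_cifs_option is made up for illustration, and it only rewrites a line in memory; it assumes the usual six whitespace-separated fstab fields and does not touch /etc/fstab itself.

```python
def add_cifs_option(fstab_line, option="vers=3.0"):
    """Return fstab_line with `option` set in the options field of a cifs entry.

    Hypothetical helper: lines that are not cifs entries are returned unchanged.
    """
    fields = fstab_line.split()
    # fstab fields: device, mount point, fs type, options, dump, pass.
    if len(fields) < 4 or fields[0].startswith("#") or fields[2] != "cifs":
        return fstab_line
    key = option.split("=", 1)[0]
    # Drop any existing setting for the same key so it is not given twice.
    options = [o for o in fields[3].split(",") if o.split("=", 1)[0] != key]
    options.append(option)
    fields[3] = ",".join(options)
    # Note: runs of whitespace between fields are collapsed to single spaces.
    return " ".join(fields)
```

Called on a line such as "//address /mnt/storage cifs auto,nofail,rw 0 0", it returns the same entry with ",vers=3.0" appended to the options. The share still has to be remounted for the change to take effect, and the server must support SMB 3.0.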

Casey Knolla (cknolla) wrote :

Reverting to kernel (GNU/Linux 4.4.0-72-generic x86_64) undoes the damage this bug causes. My shares were disconnecting in about 5 minutes from mount. Quite annoying.
Confirmed 4.4.0-75 and 4.4.0-77 are both affected.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1687273

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Iggy (ignatius-pang) on 2017-05-18
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Iggy (ignatius-pang) wrote :

This is a huge issue for me as I cannot access my shared drives from my Ubuntu computer. If every update causes an issue like this, it will have a big impact on people's work. I think this is a critical issue that needs to be addressed asap.

Iggy (ignatius-pang) wrote :

Something serious like this should not technically appear on a Long Term Support Ubuntu version, and if it does, should be fixed in a very short time.

Iggy (ignatius-pang) wrote :

Another problem is that, while the CIFS shared drive is mounted, it causes the Nautilus file manager to crash. This slows down all processes on my computer and renders it unable to work at all; it needs to be restarted about every 3 hours after that, which completely prevents us from working efficiently. I have to spend precious work time copying the files from my shared drive to my local computers and making sure that the files were copied correctly and that the connection did not die or fail. This is not practical.

A kernel downgrade is not a long-term solution, as downgrading the kernel could cause something else to break (for example VirtualBox, which relies on the Linux kernel to work properly). It also potentially introduces vulnerabilities to attack. Given that downgrading could cause more problems, it is just not safe for me to do.

As much as I like Ubuntu, using it for 10+ years now, this has failed big time. Surely will look into other linux distro or switching over to Mac next time I get a new computer.

Iggy (ignatius-pang) wrote :

I mounted the following shared drive at around 8 am, and it worked fine then. It is now 10 am and the shared drive has started to stall. The CIFS debug data gave me the following messages:

Display Internal CIFS Data Structures for Debugging
---------------------------------------------------
CIFS Version 2.08
Features: dfs fscache lanman posix spnego xattr acl
Active VFS Requests: 3
Servers:
Number of credits: 0
1) entry for <redacted> not fully displayed
 TCP status: 4
 Local Users To Server: 1 SecMode: 0x7 Req On Wire: 15
 Shares:
 1) <redacted> Mounts: 1 Type: NTFS DevInfo: 0x20 Attributes: 0xc700ff
 PathComponentMax: 255 Status: 1 type: DISK DISCONNECTED

 MIDs:
 State: 2 com: 43 pid: 10016 cbdata: ffff880128c80000 mid 0
 State: 2 com: 43 pid: 10016 cbdata: ffff880128c80000 mid 0
 State: 2 com: 43 pid: 10016 cbdata: ffff880128c80000 mid 0
 State: 2 com: 43 pid: 10016 cbdata: ffff880128c80000 mid 0
 State: 2 com: 43 pid: 10016 cbdata: ffff880128c80000 mid 0
 State: 2 com: 43 pid: 10016 cbdata: ffff880128c80000 mid 0
 State: 2 com: 43 pid: 10016 cbdata: ffff880128c80000 mid 0
 State: 2 com: 43 pid: 10016 cbdata: ffff880128c80000 mid 0
 State: 2 com: 43 pid: 10016 cbdata: ffff880128c80000 mid 0
 State: 2 com: 43 pid: 10016 cbdata: ffff880128c80000 mid 0
 State: 2 com: 43 pid: 10016 cbdata: ffff880128c80000 mid 0
 State: 2 com: 43 pid: 10016 cbdata: ffff880128c80000 mid 0
 State: 2 com: 43 pid: 10016 cbdata: ffff880128c80000 mid 0
 State: 2 com: 43 pid: 8737 cbdata: ffff880128c80000 mid 0
 State: 2 com: 114 pid: 10421 cbdata: ffff8807e105da00 mid 3050
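To compare dumps like the two posted in this thread, a small parser can pull out the fields that change when the share stalls. This is only a sketch: summarize_debugdata is a made-up name, and the patterns are based solely on the CIFS 2.08 output quoted here; the layout of /proc/fs/cifs/DebugData may differ on other kernel versions.

```python
import re


def summarize_debugdata(text):
    """Count disconnected shares and pending MIDs in a DebugData dump.

    Hypothetical helper; it matches only the line shapes seen in this thread.
    """
    disconnected = 0
    pending_mids = 0
    req_on_wire = []
    for line in text.splitlines():
        stripped = line.strip()
        # A stalled share is flagged at the end of its "type:" line.
        if stripped.endswith("DISCONNECTED"):
            disconnected += 1
        # Each outstanding request is listed as "State: ... mid N".
        if stripped.startswith("State:") and " mid " in stripped:
            pending_mids += 1
        match = re.search(r"Req On Wire: (\d+)", stripped)
        if match:
            req_on_wire.append(int(match.group(1)))
    return {
        "disconnected_shares": disconnected,
        "pending_mids": pending_mids,
        "req_on_wire": req_on_wire,
    }
```

Feeding it the contents of /proc/fs/cifs/DebugData (for example via open(...).read()) gives a dictionary that is easy to log from a cron job, so the moment the share flips to DISCONNECTED can be matched against other logs.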

Kai-Heng Feng (kaihengfeng) wrote :
Emory Penney (ejpenney) wrote :

I've had some luck adding this line to my crontab:

*/1 * * * * ls /mnt/record

It seems that by listing the contents of the mount every 60 seconds I stave off its death, but YMMV. It's definitely a hack, and I look forward to an actual fix, but it will do in the meantime.
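One caveat with a bare ls in cron is that it can itself hang once the share has stalled, so the jobs may pile up. Below is a minimal Python sketch of the same keepalive with a timeout. The name probe_mount and the 10 second default are assumptions for illustration, and whether touching the mount actually prevents the disconnect is only what was reported above, not something this code guarantees.

```python
import subprocess


def probe_mount(path, timeout=10.0):
    """List `path` in a child process; return True only if ls exits 0 in time.

    Hypothetical helper for a cron-driven keepalive.
    """
    proc = subprocess.Popen(
        ["ls", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        return proc.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # The kill may not take effect right away if the child is blocked in
        # uninterruptible I/O on the stalled share, so we do not wait for it.
        proc.kill()
        return False
```

A cron script could call probe_mount("/mnt/record") and log or alert when it returns False, instead of blocking on the share.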

Jonathan (jb-alvarado) wrote :

A better way, at the moment, is to add "vers=3.0" to the mount options, as Christian Drexler wrote. This helps.

DanielW (daniel-watsonbros) wrote :

This problem hit me on a xenial install with HWE and linux-generic-hwe-16.04 with generic kernels 4.8.0-52 and 4.8.0-53. Problem is alleviated by using the mainline kernel 4.12-rc2.

Please note: I cannot use the vers=3.0 workaround as described above because one of my servers is running Windows Server 2003 which, as I understand, still uses the older SMB1 protocol.

This problem happened on a user's workstation and, due to the freezing problem with the desktop and file manager, her system was unusable for about 2-3 days.

It took me a little while to pin-point the problem to a complete cifs freeze which was occurring within 5-45 minutes of logging on.

Patrick Tenhaken (paret) wrote :

Hello,
Today I upgraded all my systems and was affected by this problem. The file server runs Windows Server 2016 Datacenter. All my Ubuntu releases have problems (16.04 LTS, 16.10 and 17.04).

The vers=3.0 workaround solves the problem for me, but overall it isn't really funny, because when the problem occurs there is from time to time a full system freeze, so I was not able to switch away from my frozen tty session.

Iggy (ignatius-pang) wrote :

Updating the kernel to the latest version did not fix the problem for me, unfortunately. This is still quite unacceptable, not being able to access essential files and folders!

This is some sort of race condition: every 15 minutes it causes a flood/DDoS that breaks the connection.

debian bugtracker: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=856843
kernel bugtracker: https://bugzilla.kernel.org/show_bug.cgi?id=194531
patch: http://www.spinics.net/lists/linux-cifs/msg12456.html

Using v3 is not an option for me, because my provider does not offer Unix extensions with versions > v1. This bug was not fixed in the HWE kernels (4.8), so there is no fixed version available for them. Please backport the patch that was mentioned above, thank you!

The attachment "fix.patch" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
affects: cifs-utils → linux
Changed in cifs-utils (Debian):
status: Unknown → Fix Released

any update? what is necessary to include this patch?

tags: added: regression-update

still no update, any chance to get this fixed?

Steve Langasek (vorlon) on 2017-06-30
affects: cifs-utils (Debian) → linux (Debian)
Changed in cifs-utils (Ubuntu):
status: Confirmed → Invalid
Changed in linux (Ubuntu):
status: Confirmed → Triaged
importance: Undecided → High
Changed in linux (Ubuntu):
assignee: nobody → Joseph Salisbury (jsalisbury)
John Kovalcyk (bxdfpbvdga) wrote :

any chance to get this resolved after no response at all, @Joseph Salisbury? i'm affected too

Changed in linux (Ubuntu):
status: Triaged → In Progress
Changed in linux (Ubuntu Xenial):
status: New → In Progress
importance: Undecided → High
assignee: nobody → Joseph Salisbury (jsalisbury)
Joseph Salisbury (jsalisbury) wrote :

The patch posted in comment #19 was included in mainline as commit 62a6cfddcc0a5313e7da3e8311ba16226fe0ac10 and was cc'd to upstream stable.

This commit is in the Ubuntu-4.4.0-79.100 kernel. Can you apply the latest updates and see if this bug is resolved?
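To check a batch of machines against the version named above, a short Python sketch can compare the ABI number in the kernel release string. The function name xenial_44_has_fix is made up, and it only understands the Xenial 4.4.0-N series; it returns None for anything else (for example HWE or mainline kernels). Note that 4.4.0-72 also lacks the commit, although it was reported in this thread as not showing the problem.

```python
import re

# From the comment above: the commit is in Ubuntu-4.4.0-79.100.
FIXED_ABI = 79


def xenial_44_has_fix(release):
    """Return True/False for a 4.4.0-N release string, None for other series.

    Hypothetical helper; `release` is a string such as "4.4.0-75-generic".
    """
    match = re.match(r"4\.4\.0-(\d+)", release)
    if match is None:
        return None
    return int(match.group(1)) >= FIXED_ABI
```

On a running system the release string can be taken from platform.release() in the standard library, which gives the same value as uname -r.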

Seems to be resolved; the issue did not occur again.

Changed in linux (Ubuntu):
status: In Progress → Fix Released
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in cifs-utils (Ubuntu Xenial):
status: New → Confirmed
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Released
Brad Figg (brad-figg) on 2019-07-24
tags: added: cscc