Regular recoverable disconnections over SAMBA on OSX clients

Bug #1666843 reported by Eric Altman
This bug affects 1 person
Affects: samba (Ubuntu)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

On Ubuntu 16.04.2 running SAMBA 4.3.11, various OS X (now macOS) clients have issues with our SAMBA server, which uses ZFS as the filesystem.

During their regular work, with no trigger that we have been able to pin down, every share the client is connected to disconnects, with a notification from macOS. This occurs on our Yosemite, El Capitan, and Sierra clients. The shares can be remounted immediately without needing a reboot or resetting services.

This happens anywhere between once and many times a day, and not a day goes by without it happening at least once.

The ZFS pool is basic but large (10 pools of 10 disks in a z2), and there is 128GB of RAM dedicated to caching. There have been no reported issues with ZFS that I can see: no errors, missing devices, corruption, etc. The pool is set to xattr=sa.
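For anyone wanting to confirm the xattr setting across every dataset behind the shares, here is a minimal Python sketch. It assumes the tab-separated output of `zfs get -H -o name,value xattr`; the pool and dataset names are made up for illustration and are not taken from this server.

```python
import subprocess


def datasets_without_sa(zfs_get_output):
    """Return dataset names whose xattr value is not 'sa'.

    Expects the tab-separated text printed by
    `zfs get -H -o name,value xattr ...` (one dataset per line).
    """
    mismatched = []
    for line in zfs_get_output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, value = line.split("\t")
        if value.strip() != "sa":
            mismatched.append(name)
    return mismatched


def query_xattr(pool):
    """Query xattr for all filesystems under `pool` (requires the zfs CLI)."""
    result = subprocess.run(
        ["zfs", "get", "-H", "-r", "-t", "filesystem",
         "-o", "name,value", "xattr", pool],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


# Hypothetical sample output; these dataset names are assumptions.
sample = "tank\tsa\ntank/projects\tsa\ntank/scratch\ton\n"
print(datasets_without_sa(sample))
```

On the server itself, `datasets_without_sa(query_xattr("tank"))` would do the same check against live data, with "tank" replaced by the real pool name.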

As you can see from the attached smb.conf, fruit is active and configured. We have tried countless variations of the smb.conf but could have missed something, so questions about it are welcome.

More details: The server is connected over a LAG (link-aggregated connection) to a switch, which then serves the clients. Some clients connect through a Dell switch and some through a Supermicro switch, each with its own LAG. This doesn't seem to matter.

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: samba 2:4.3.11+dfsg-0ubuntu0.16.04.3
ProcVersionSignature: Ubuntu 4.4.0-59.80-generic 4.4.35
Uname: Linux 4.4.0-59-generic x86_64
NonfreeKernelModules: zfs zunicode zcommon znvpair zavl
ApportVersion: 2.20.1-0ubuntu2.5
Architecture: amd64
Date: Wed Feb 22 20:45:15 2017
InstallationDate: Installed on 2017-02-02 (19 days ago)
InstallationMedia: ShareOS 16.04.1 2017.02.01 amd64 "ShareOS Xenial"
NmbdLog:

OtherFailedConnect: Yes
ProcEnviron:
 LANGUAGE=en_US
 TERM=xterm-256color
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SambaServerRegression: No
SmbConfIncluded: Yes
SmbLog:

SourcePackage: samba
UpgradeStatus: No upgrade log present (probably fresh install)

Eric Altman (ericdaltman) wrote :
summary: - Regular recoverable disconnections on OSX clients
+ Regular recoverable disconnections over SAMBA on OSX clients
description: updated
Nish Aravamudan (nacc) wrote :

Hello and thank you for filing this bug report!

This does seem like a real bug, but it might be quite tricky to debug, as there are several different axes to consider.

Is it possible to:

a) Reproduce this without ZFS (just for testing, in a separate share)

b) Reproduce this with a newer Ubuntu release (just for testing; yakkety would be interesting, but zesty is probably better initially, because we need to ensure it is fixed in the latest release).

c) Does it happen with Ubuntu Samba clients, or only macOS? Is it possibly a bug in macOS?

d) I don't know much about networking itself, but if we could remove the nuances of the local configuration and reproduce it with a simple setup (one server and one client on a single switch using a standard filesystem backing the Samba share), then I think we can more readily debug the issue.

Thanks!
-Nish

Changed in samba (Ubuntu):
status: New → Triaged
Eric Altman (ericdaltman) wrote :

My apologies for not getting back to this. I realized my settings here were a little off and I wasn't getting notifications of responses. This is a production server with no real downtime, in use 7 days a week, on a continent on the other side of the world from me.

That said, the incidents have reduced significantly (about once a week now, instead of daily) since I tried creating one share and then using the 'copy' feature to define the other shares.
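For readers unfamiliar with the 'copy' parameter, the layout described above can be sketched roughly as follows. This is a small Python helper that prints an smb.conf fragment in which one fully configured share comes first and the remaining shares clone it with `copy`, overriding only `path`. The share names, paths, and option values are assumptions for illustration; they are not taken from the attached smb.conf.

```python
def render_shares(base_name, base_options, copies):
    """Render smb.conf text: one full share, then shares that copy it.

    The copied share is written first, since smb.conf requires the
    service being copied to appear earlier in the file.
    """
    lines = [f"[{base_name}]"]
    lines += [f"    {key} = {value}" for key, value in base_options.items()]
    for name, path in copies.items():
        lines += [
            "",
            f"[{name}]",
            f"    copy = {base_name}",
            f"    path = {path}",  # set in this section, so it overrides the copied path
        ]
    return "\n".join(lines) + "\n"


# Hypothetical values: names, paths, and options are made up for this sketch.
base_options = {
    "path": "/tank/projects",
    "vfs objects": "catia fruit streams_xattr",
    "read only": "no",
}
conf = render_shares(
    "projects",
    base_options,
    {"archive": "/tank/archive", "renders": "/tank/renders"},
)
print(conf)
```

Parameters set in the copying section take precedence over the copied ones, so per-share differences stay in one or two lines while everything else is defined once.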

ZFS is a little broken right now in the newer kernel, so I'm unwilling to update until it is well tested here in my Lab.

The network only has Mac clients, unfortunately, and while this is possibly a macOS bug, I couldn't find anything in the console aside from the usual notifications of the disconnect.

The clients are connected in one of three ways. Some are directly connected over 10GbE, some go through a 1GbE switch (coming in with a 2x10GbE LAG), and others go through a 10GbE switch (coming in with a 2x40GbE LAG).

I think closing this out for now and re-reporting if there is a new instance with more metrics I can share is prudent.

Christian Ehrhardt (paelzer) wrote :

Thanks Eric for coming back to this!
Quite understandable that you don't want to mess with the remote production server to test this.

I'm happy that the issues at least show up less now, although the updates that came in were only security related and should not be the cause for this improvement :-/.

Following your closing statement, I am setting the bug to Incomplete for now. If you gain extra insights or metrics, please share them and reopen.

Changed in samba (Ubuntu):
status: Triaged → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for samba (Ubuntu) because there has been no activity for 60 days.]

Changed in samba (Ubuntu):
status: Incomplete → Expired