nfs-kernel-server requires a real interface to be up

Bug #848823 reported by Tim Lunn
This bug affects 3 people
Affects: ifupdown (Ubuntu)
Status: Triaged
Importance: Medium
Assigned to: Clint Byrum

Bug Description

For the last few weeks, whenever I try to mount an NFS share from another machine, I get an access denied message from the server.

Once I restart via 'service nfs-kernel-server restart', it starts working again. This is required at least once per day.

The NFS server is running on a fully updated 11.10 (Oneiric)
clients are on 11.04 and 10.04

I never experienced such issues prior to 11.10

ProblemType: Bug
DistroRelease: Ubuntu 11.10
Package: nfs-kernel-server 1:1.2.4-1ubuntu2
ProcVersionSignature: Ubuntu 3.0.0-10.16-generic 3.0.4
Uname: Linux 3.0.0-10-generic x86_64
NonfreeKernelModules: nvidia
ApportVersion: 1.22.1-0ubuntu2
Architecture: amd64
Date: Tue Sep 13 20:02:19 2011
InstallationMedia: Ubuntu 11.10 "Oneiric Ocelot" - Alpha amd64 (20110803.1)
SourcePackage: nfs-utils
UpgradeStatus: Upgraded to oneiric on 2011-08-13 (31 days ago)

Steve Langasek (vorlon) wrote :

Thanks for the report.

When this happens, what does 'showmount $server' show? Does the error happen only after a server reboot, or randomly at any time?

What mount options do you use on the client, and what does /etc/exports show on the server?

Changed in nfs-utils (Ubuntu):
importance: Undecided → Medium
status: New → Incomplete
Tim Lunn (darkxst) wrote : Re: [Bug 848823] Re: mounting of nfs shares fails and must restart nfs-kernel-server

I thought it was happening randomly; however, it does actually appear to happen after rebooting the server. I do notice some errors on boot relating to exportfs, but I can't find them in any logs (or get a chance to actually read what they say).

options on client in fstab are 'nfs rsize=8192,wsize=8192,timeo=14,intr'
manually trying to mount from client results in this error
"mount.nfs: access denied by server while mounting"

options in /etc/exports are '(rw,no_subtree_check)'

restarting nfs-kernel-server after I have booted up results in this error
"exportfs: scandir /etc/exports.d: No such file or directory"

however everything does work fine afterwards

Steve Langasek (vorlon) wrote : Re: mounting of nfs shares fails and must restart nfs-kernel-server

Well, I asked for the output of 'showmount $server' and what /etc/exports shows (not just the options) because I suspected you are having a name resolution problem at startup. Your responses so far appear to confirm this possibility. How is the network configured on this server - via NetworkManager, or via ifupdown (/etc/network/interfaces)? How does this server do DNS resolution: does it have a local DNS server, or is it pointed at another server somewhere on the network?

Tim Lunn (darkxst) wrote :

$ showmount server (this does work, so the server does seem to be running)
Hosts on server:
192.168.1.32

'/etc/exports' (there are a bunch of other folders shared, but they all have the same options)
/media/drive1 client1(rw,no_subtree_check) client2(rw,no_subtree_check)

The network on the server is configured via NetworkManager; DNS comes from a dd-wrt router, which hands out static IPs via DHCP.

Steve Langasek (vorlon) wrote :

Ok. If I'm not mistaken, this is a regression introduced by the switch of rc-sysinit.conf from starting on 'network-interface-up IFACE!=lo' to 'static-network-up'. Previously, we had a guarantee that scripts in runlevel 2 would not be started until at least one non-loopback interface is up. Now, if there are *no* interfaces configured in ifupdown (i.e., all networking is done via NM), the 'static-network-up' event will be emitted as soon as the loopback interface itself comes up.

That means services such as nfs-kernel-server will try to start before there's a route to the DNS server, resulting in exactly the failure you describe here.

We need to fix this in ifupdown for release, so that we never emit static-network-up until at least one non-loopback interface is brought up. (And if none are brought up, we wait for the 'failsafe' event as before.)

affects: nfs-utils (Ubuntu) → ifupdown (Ubuntu)
Changed in ifupdown (Ubuntu):
assignee: nobody → Clint Byrum (clint-fewbar)
importance: Medium → High
status: Incomplete → Triaged
summary: - mounting of nfs shares fails and must restart nfs-kernel-server
+ if all interfaces are configured via NM, static-network-up emitted when
+ only lo is configured
Clint Byrum (clint-fewbar) wrote : Re: if all interfaces are configured via NM, static-network-up emitted when only lo is configured

Discussed with Steve on IRC, and we confirmed that the assumption that this is the rc-sysinit change is probably false. Prior to 11.10, the rc-sysinit start on condition was

start on filesystem and net-device-up IFACE=lo

So in the case of a NetworkManager only system, runlevel 2 is reached at precisely the same point as before.

My guess is that this has always been racing with network-manager, and somewhere between 11.04 and 11.10, we changed things enough that rc-sysinit + everything in /etc/rc2.d/S* before nfs-kernel-server overtook network-manager, which does start very early (local-filesystems and started dbus), so this seems quite feasible.

Since all listening is, by default, on 0.0.0.0 (::0 for IPv6), I have to wonder if you have it binding to a specific interface (possibly by passing -H some.host.name in /etc/default/nfs-kernel-server in RPCNFSDOPTS).

If so, it may be preferable for you to move the configuration of that interface to /etc/network/interfaces to prevent this type of race condition.

Either way, I don't believe this is due to the rc-sysinit change, so I'm moving the status of this back to New, though keeping myself assigned.

Changed in ifupdown (Ubuntu):
status: Triaged → New
Tim Lunn (darkxst) wrote :

nfsd on my system is just configured as default. The only file I have touched is 'exports'.

Have moved the interface into /etc/network/interfaces, and yes it does prevent the race condition.

Clint Byrum (clint-fewbar) wrote : Re: [Bug 848823] Re: if all interfaces are configured via NM, static-network-up emitted when only lo is configured

Excerpts from Tim's message of Tue Sep 20 00:19:49 UTC 2011:
> nfsd on my system is just configured as default. the only file I have
> touched is 'exports'.
>
>
> Have moved the interface into /etc/network/interfaces, and yes it does prevent the race condition.
>

Very strange though that it caused an issue to start w/o a specific bind
address. Would you mind running

sudo netstat -tnlp

on the system and pasting the output (redact any public IPs)?

Thanks!

Tim Lunn (darkxst) wrote : Re: if all interfaces are configured via NM, static-network-up emitted when only lo is configured

output from 'sudo netstat -tnlp' is attached.

Clint Byrum (clint-fewbar) wrote :

Thanks Tim!

So the bug seems to be more that nfs-kernel-server may require a real interface to be up to start, since it does look like it is listening on :: and 0.0.0.0, which shouldn't need any interfaces, and certainly none beyond lo. I'll re-title as such and re-assign to nfs-utils.

summary: - if all interfaces are configured via NM, static-network-up emitted when
- only lo is configured
+ nfs-kernel-server requires a real interface to be up
affects: ifupdown (Ubuntu) → nfs-utils (Ubuntu)
Changed in nfs-utils (Ubuntu):
assignee: Clint Byrum (clint-fewbar) → nobody
Steve Langasek (vorlon) wrote :

No, nfs-kernel-server isn't failing to *start*; the "access denied" from the bug description contradicts this. All signs point to this being a race caused by nfs-kernel-server starting before DNS is available (because the network isn't up yet), causing hostnames in the config to fail to resolve.
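
One way to check the name-resolution theory on an affected server is to test whether each hostname listed in /etc/exports resolves at that moment. The following is only a rough sketch, not part of nfs-utils: the function names exports_clients and check_exports_resolution are made up for illustration, and the parsing is deliberately simplified (it skips wildcard, netgroup and subnet entries and does not handle line continuations).

```shell
#!/bin/sh
# exports_clients FILE: print the unique client names found in an exports file.
# Simplified parser (assumption): one export per line, no line continuations.
exports_clients() {
    awk '
        /^[ \t]*#/ || NF < 2 { next }          # skip comments and lines without clients
        {
            for (i = 2; i <= NF; i++) {
                client = $i
                sub(/\(.*$/, "", client)        # drop the "(options)" suffix
                # skip anonymous, wildcard, netgroup and subnet entries
                if (client != "" && client !~ /[*?@]/ && index(client, "/") == 0)
                    print client
            }
        }' "$1" | sort -u
}

# check_exports_resolution FILE: report every client name that does not resolve.
check_exports_resolution() {
    status=0
    for client in $(exports_clients "$1"); do
        if ! getent hosts "$client" >/dev/null 2>&1; then
            echo "cannot resolve: $client" >&2
            status=1
        fi
    done
    return "$status"
}

# Example invocation (not run here):
#   check_exports_resolution /etc/exports
```

If this reports unresolved names shortly after boot but not after a manual restart of nfs-kernel-server, that would be consistent with the race described above.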

It's true that ifupdown hasn't regressed vs. how we were starting runlevel 2 before, but it's still buggy, because classically init scripts in runlevel 2 are allowed to assume that the network is fully up before they're called. When we changed this to only wait for lo, it was a compromise because we hadn't worked out a way to keep the boot from hanging forever in the case of problems. Now that we have a way to do that, we should use it, deferring the static-network-up event until at least one non-loopback interface is up even when all interfaces are managed by NM. No additional delay if the network config actually *is* static, and more robust handling of init scripts if it isn't.

(Clint, I've assigned this bug back to you as part of the reassignment to ifupdown, but if this isn't on your priority list, feel free to unassign... or assign it to one of your colleagues :)

affects: nfs-utils (Ubuntu) → ifupdown (Ubuntu)
Changed in ifupdown (Ubuntu):
assignee: nobody → Clint Byrum (clint-fewbar)
Steve Langasek (vorlon) wrote :

Ok, reversing my position after talking with Clint on IRC. :)

We can't delay the static-network-up event because doing so would slow down the entire desktop boot sequence for all users. This needs to be fixed instead by converting nfs-kernel-server to use an upstart job that's 'start on [...] net-device-up IFACE!=lo'.

Tim, a workaround for you would be to configure your network via /etc/network/interfaces instead of network-manager. That would eliminate the race in your particular case (i.e., running servers on a desktop).

affects: ifupdown (Ubuntu) → nfs-utils (Ubuntu)
Changed in nfs-utils (Ubuntu):
assignee: Clint Byrum (clint-fewbar) → nobody
Clint Byrum (clint-fewbar) wrote : Re: [Bug 848823] Re: nfs-kernel-server requires a real interface to be up

Excerpts from Steve Langasek's message of Tue Sep 20 20:09:58 UTC 2011:
> No, nfs-kernel-server isn't failing to *start*; the "access denied" from
> the bug description contradicts this. All signs point to this being a
> race caused by nfs-kernel-server starting before DNS is available
> (because the network isn't up yet), causing hostnames in the config to
> fail to resolve.
>
> It's true that ifupdown hasn't regressed vs. how we were starting
> runlevel 2 before, but it's still buggy, because classically init
> scripts in runlevel 2 are allowed to assume that the network is fully up
> before they're called. When we changed this to only wait for lo, it was
> a compromise because we hadn't worked out a way to keep the boot from
> hanging forever in the case of problems. Now that we have a way to do
> that, we should use it, deferring the static-network-up event until at
> least one non-loopback interface is up even when all interfaces are
> managed by NM. No additional delay if the network config actually *is*
> static, and more robust handling of init scripts if it isn't.
>
> (Clint, I've assigned this bug back to you as part of the reassignment
> to ifupdown, but if this isn't on your priority list, feel free to
> unassign... or assign it to one of your colleagues :)

That all makes sense Steve, and I hadn't thought about DNS problems.

The current situation where static-network-up is emitted only when
/etc/network/interfaces is handled is a compromise to keep from regressing
the desktop boot speed waiting for connections that may never come.

We could probably interrogate NetworkManager and find out if there are
"boot time" network interfaces that are going to be brought up, and if so,
delay while waiting for those. I'll keep this task as part of ifupdown
and see if we can do just that early in the next dev cycle.

Changed in nfs-utils (Ubuntu):
status: New → Triaged
Clint Byrum (clint-fewbar) wrote :

Reassigning to ifupdown after talking to Steve again. Reducing importance to Medium since there is a workaround and no new regression.

affects: nfs-utils (Ubuntu) → ifupdown (Ubuntu)
Changed in ifupdown (Ubuntu):
assignee: nobody → Clint Byrum (clint-fewbar)
importance: High → Medium
Joseph Brown (1st2be) wrote :

I've run into a similar issue after upgrading from 11.04 to 11.10 on server:

$ showmount server
Hosts on server:
192.168.111.110
192.168.111.57

Placing the following in the clients' fstab resolved the consistent NFS halts:

server:/path /local/path nfs proto=udp

The halts were occurring consistently until I coerced it to use nfs3 AND udp.

This could be a DNS issue, but I had /etc/network/interfaces configured both before and after uninstalling network-manager, and that did not resolve it. My local nameserver/firewall does a bad job of tracking local hosts. I tried placing the hosts in /etc/hosts, with little to no effect.

NFS seems to keep working on the server (nfswatch etc. look okay), but clients intermittently hang as if they were not connected.

Clint Byrum (clint-fewbar) wrote :

Joseph, this doesn't sound related. The issue here is that if you configure nfs exports that need DNS, you either need to configure a static network interface in /etc/network/interfaces instead of network-manager, or you need to refresh exports every time network interfaces come up.
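
The second option, refreshing exports whenever an interface comes up, could be sketched as an ifupdown hook. This is an illustration under assumptions, not a tested fix: the file name refresh-nfs-exports and the helper should_refresh are hypothetical, and it assumes that hooks in /etc/network/if-up.d are also run for NetworkManager-managed interfaces on the release in question, which is worth verifying first.

```shell
#!/bin/sh
# Hypothetical hook, e.g. saved as /etc/network/if-up.d/refresh-nfs-exports
# and made executable. ifupdown passes the interface name in $IFACE.

# should_refresh IFACE: succeed only for a named, non-loopback interface.
should_refresh() {
    case "$1" in
        ""|lo) return 1 ;;
        *) return 0 ;;
    esac
}

if should_refresh "${IFACE:-}" && [ -x /usr/sbin/exportfs ]; then
    # -r re-exports everything, resynchronising the export table with
    # /etc/exports; -a applies the operation to all directories.
    /usr/sbin/exportfs -ra
fi
```

Re-exporting is cheap, so running it on every interface-up event should be harmless even when name resolution was already working.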

DJ (ke7mbz) wrote :

I can confirm this issue. It seemed to have either started after upgrading to 11.10 from 11.04, or after changing my exports from IPs to DNS names. Apparently, it was both.

I did see an error message the other day which said that NFS was unable to resolve DNS, but I couldn't remember where it was. I can't find it in any of my log files. It must have been on reboot. Where would that be logged?

Also, it seems to occur after rebooting. Desktop server. So an exact match.

I also found, on my laptop, that my ifdown scripts that stopped NFS and SSHFS connections from hanging when the network connection dropped, stopped working in 11.10. Do those scripts not get run anymore?

Doug Jones (djsdl) wrote :

This is affecting me as well. It started when I upgraded from 11.04 to 11.10.

I am identifying machines with IPs, not DNS. I set it up as per this tutorial:

https://mostlylinux.wordpress.com/network/nfshowto/

These instructions work fine for 10.04, 10.10, and 11.04. (However, the commands it describes for restarting services no longer apply in 10.10 and later; my workaround has been to simply reboot the machine instead, and that works fine).

I still have machines running those older versions, and they still talk to each other, but the one running 11.10 is incommunicado.

I did get the error message Tim mentioned about "/etc/exports.d: No such file or directory", but I simply created an empty directory at /etc/exports.d and that message no longer appears. But NFS still doesn't work.

I have tried the workaround Joseph Brown describes but that doesn't seem to help.

I would like to try the workaround where one uses /etc/network/interfaces instead of network-manager, but I have no idea how to do that. If someone could spell that out, at the user-friendliness level of the tutorial I mentioned above, that would be great.
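
As a rough illustration of that workaround: an interface is handed to ifupdown by giving it a stanza in /etc/network/interfaces. The sketch below only prints such a stanza for review. The helper name print_dhcp_stanza and the interface name eth0 are assumptions for the example; DHCP is used because the original reporter's addresses come from the router, and a different method would be needed for a truly static setup.

```shell
#!/bin/sh
# print_dhcp_stanza IFACE: print an interfaces(5) stanza that brings IFACE
# up at boot and configures it via DHCP.
print_dhcp_stanza() {
    cat <<EOF
auto $1
iface $1 inet dhcp
EOF
}

# Review the output first, then append it as root, for example:
#   print_dhcp_stanza eth0 | sudo tee -a /etc/network/interfaces
# "eth0" is only an example; check the real name with "ip link".
```

On a default Ubuntu install, NetworkManager is expected to leave interfaces listed in that file alone, though a reboot or a restart of networking may be needed before the change takes effect.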

BTW, I know at least two other people who are using NFS on 10.04 LTS, using basically the same setup I have (from that same tutorial), and who probably would be upgrading to the upcoming LTS if they hadn't already been warned about NFS being borked. I wonder how many other LTS users are about to get a nasty surprise.

Stéphane Graber (stgraber) wrote :

Clint: Any progress on this?

Would the following work:
 - Introduce new network-has-address event.
 - Change our scripts to emit this event at the same time as static-network-up when we have a non-loopback interface configured.
 - Add a Network Manager hook to emit it too.
 - Implement a fallback job that times out after 2min (similar to that for static-network-up but without holding the boot).

The rationale for a separate event is that I don't think we want to delay the boot on wireless-only desktop machines by whatever time it takes to connect to their wireless network; only the services explicitly requiring it should be delayed.

I wouldn't actually be against not having a timeout at all as these services will typically fail whenever we reach that timeout anyway.

Does that make sense? Did I miss something?

Richard (richard-prior) wrote :

I also hit this problem on 12.04, where exportfs fails at boot.
In /var/log/boot.log I had an error for each client name,
e.g. "exportfs: Failed to resolve clientname"

Searching, I found there are a number of trivial workarounds:
- Adding the names to /etc/hosts.
- Using IP numbers in the configurations.
- Restarting nfs-server.service after bootup.

I took a different route and checked out the script /etc/init.d/nfs-kernel-server. I have now added a sleep at the beginning to delay the NFS start long enough to allow the LAN and name resolution to come up - it seems to work.

The first three dozen lines are below:

#!/bin/bash

### BEGIN INIT INFO
# Provides: nfs-kernel-server
# Required-Start: $remote_fs nfs-common $portmap $time
# Required-Stop: $remote_fs nfs-common $portmap $time
# Should-Start: $named
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: Kernel NFS server support
# Description: NFS is a popular protocol for file sharing across
# TCP/IP networks. This service provides NFS server
# functionality, which is configured via the
# /etc/exports file.
### END INIT INFO

# What is this?
DESC="NFS kernel daemon"
PREFIX=/usr

# Mod by Richard Prior
# Add delay on start to allow network and Name resolution services to be available
# Without delay the exportfs will fail resolving machine names
# Failures are listed in the /var/log/boot.log and look like
# exportfs: Failed to resolve owl
sleep 10

# Exit if required binaries are missing.
[ -x $PREFIX/sbin/rpc.nfsd ] || exit 0
[ -x $PREFIX/sbin/rpc.mountd ] || exit 0
[ -x $PREFIX/sbin/exportfs ] || exit 0
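
A possible refinement of the fixed sleep, offered only as a sketch: poll until a check succeeds, with an upper bound, so that the script neither waits longer than needed nor gives up too early. The helper name wait_until and the host name client1 in the usage comment are illustrative assumptions, not part of the packaged init script.

```shell
#!/bin/sh
# wait_until TIMEOUT CMD [ARGS...]
# Re-run CMD about once per second until it succeeds or TIMEOUT seconds
# have elapsed. Returns 0 on success, 1 on timeout.
wait_until() {
    timeout=$1
    shift
    elapsed=0
    while ! "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Hypothetical use in the start action, in place of "sleep 10":
#   wait_until 30 getent hosts client1 || echo "name resolution not ready" >&2
```

Placing the wait inside the start action rather than at the top of the script would also avoid delaying stop and status calls.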
