Bug #525154 “mountall for /var or other nfs mount races with rpc...” : Lucid (10.04) : Bugs : nfs-utils package : Ubuntu

Revision history for this message

Scott James Remnant (Canonical) (canonical-scott) wrote on 2010-02-24:

#1

The whole /var thing is not really a very well thought out part of the FHS

affects:

upstart (Ubuntu) → nfs-utils (Ubuntu)

Revision history for this message

Brian J. Murrell (brian-interlinx) wrote on 2010-02-24: Re: [Bug 525154] Re: mountall for /var races with rpc.statd

#2

On Wed, 2010-02-24 at 15:06 +0000, Scott James Remnant wrote:
> The whole /var thing is not really a very well thought out part of the
> FHS

OK. What does that mean in terms of this bug though?

Revision history for this message

Steve Langasek (vorlon) wrote on 2010-02-25: Re: mountall for /var races with rpc.statd

#3

statd is "start on (started portmap or mount TYPE=nfs)"
portmap is "start on (local-filesystems and net-device-up IFACE=lo)"

and the statd job tries to start portmap if it's not already running.

So the only possible race conditions I see here are if
- mount TYPE=nfs is emitted before all the local filesystems are mounted
- mount TYPE=nfs is emitted before lo is configured, and this causes portmap to fail
- /var is on a network filesystem *other* than NFS (if it's on NFS, then this can't really be solved, you just get a deadlock if you try)

Can you post your fstab, so I can better understand which of these cases applies? (I'm not sure it's fixable anyway with current upstart, but at least we'll know what we're dealing with)

Changed in nfs-utils (Ubuntu):
status:	New → Incomplete

Revision history for this message

Brian J. Murrell (brian-interlinx) wrote on 2010-02-25: Re: [Bug 525154] Re: mountall for /var races with rpc.statd

#4

On Thu, 2010-02-25 at 13:35 +0000, Steve Langasek wrote:
> statd is "start on (started portmap or mount TYPE=nfs)"
> portmap is "start on (local-filesystems and net-device-up IFACE=lo)"

I'm not terribly conversant on the state language of upstart yet, but
does the above say that statd will be started after portmap has been
started *or* an NFS mount is required and portmap will be started after
local-filesystems has been completed and interface "lo" is up?

> and the statd job tries to start portmap if it's not already running.

Yeah.

> So the only possible race conditions I see here are if
> - mount TYPE=nfs is emitted before all the local filesystems are mounted

Indeed! And I believe this is in fact the race I am running into.

> - mount TYPE=nfs is emitted before lo is configured, and this causes portmap to fail

Nope. I have debugged enough to know this is not the case.

> - /var is on a network filesystem *other* than NFS (if it's on NFS, then this can't really be solved, you just get a deadlock if you try)

Nope. /var is local.

> Can you post your fstab, so I can better understand which of these cases
> applies?

Sure:

# /etc/fstab: static file system information.
#
# file system mount point type options dump pass
/dev/rootvol/ubuntu_root / ext3 defaults 0 0
UUID=9d79e085-9980-444d-b58b-e0a49b5c2edb /boot ext3 rw,nosuid,nodev 0 2

/dev/rootvol/swap none swap sw 0 0
proc /proc proc defaults 0 0
sys /sys sysfs defaults 0 0

/dev/fd0 /mnt/floppy auto noauto,rw,sync,user,exec 0 0
/dev/cdrom /mnt/cdrom auto noauto,ro,user,exec 0 0
/dev/rootvol/ubuntu_var /var ext3 rw,nosuid,nodev 0 2
/dev/rootvol/apt_archives /var/cache/apt/archives ext3 rw,nosuid,nodev 0 2
/dev/rootvol/ubuntu_usr /usr ext3 rw,nodev 0 2
/dev/rootvol/home /home ext3 rw,nosuid,nodev 0 2
/dev/datavol/video /video xfs rw,nosuid,nodev 0 2
pc:/home/brian /autohome/brian nfs auto,exec,dev,suid,rw,bg,rsize=8192,wsize=8192 1 1
linux:/mnt/mp3/library /var/lib/mythtv/music nfs rw,noexec,nodev,nosuid,bg,rsize=8192,wsize=8192 1 1
linux:/usr/src /usr/src nfs rw,nodev,nosuid,bg,rsize=8192,wsize=8192 1 1

I think you will agree that it's the first race condition.

I'm not sure exactly what "local-filesystems" signal is signalling but
assuming it really does mean "local" (i.e. directly attached block
devices) is there any reason the boolean operator in the condition for
starting statd is not "and" rather than "or"? That would ensure
that /var is mounted and portmapper is running before a statd start is
attempted. Doesn't statd require portmapper anyway?

On Thu, 2010-02-25 at 13:35 +0000, Steve Langasek wrote: 
> statd is "start on (started portmap or mount TYPE=nfs)"
> portmap is "start on (local-filesystems and net-device-up IFACE=lo)"

I'm not terribly conversant on the state language of upstart yet, but
does the above say that statd will be started after portmap has been
started *or* an NFS mount is required and portmap will be started after
local-filesystems has been completed and interface "lo" is up?

> and the statd job tries to start portmap if it's not already running.

Yeah.

> So the only possible race conditions I see here are if
>  - mount TYPE=nfs is emitted before all the local filesystems are mounted

Indeed!  And I believe this is in fact the race I am running into.

> - mount TYPE=nfs is emitted before lo is configured, and this causes portmap to fail

Nope.  I have debugged enough to know this is not the case.

> - /var is on a network filesystem *other* than NFS (if it's on NFS, then this can't really be solved, you just get a deadlock if you try)

Nope.  /var is local.

> Can you post your fstab, so I can better understand which of these cases
> applies?

Sure:

# /etc/fstab: static file system information.
#
# file system    mount point   type    options                  dump pass
/dev/rootvol/ubuntu_root /      ext3    defaults                 0    0
UUID=9d79e085-9980-444d-b58b-e0a49b5c2edb /boot ext3 rw,nosuid,nodev 0 2

/dev/rootvol/swap none         swap    sw                       0    0
proc             /proc         proc    defaults                 0    0
sys              /sys          sysfs   defaults                 0    0

/dev/fd0         /mnt/floppy   auto    noauto,rw,sync,user,exec 0    0
/dev/cdrom       /mnt/cdrom    auto    noauto,ro,user,exec      0    0
/dev/rootvol/ubuntu_var /var    ext3    rw,nosuid,nodev          0    2
/dev/rootvol/apt_archives /var/cache/apt/archives ext3 rw,nosuid,nodev 0 2
/dev/rootvol/ubuntu_usr /usr    ext3    rw,nodev                 0    2
/dev/rootvol/home /home        ext3    rw,nosuid,nodev          0    2
/dev/datavol/video /video      xfs     rw,nosuid,nodev          0    2
pc:/home/brian   /autohome/brian nfs   auto,exec,dev,suid,rw,bg,rsize=8192,wsize=8192 1 1
linux:/mnt/mp3/library /var/lib/mythtv/music nfs rw,noexec,nodev,nosuid,bg,rsize=8192,wsize=8192 1 1
linux:/usr/src /usr/src nfs rw,nodev,nosuid,bg,rsize=8192,wsize=8192 1 1

I think you will agree that it's the first race condition.

I'm not sure exactly what "local-filesystems" signal is signalling but
assuming it really does mean "local" (i.e. directly attached block
devices) is there any reason the boolean operator in the condition for
starting statd is not "and" rather than "or"?  That would ensure
that /var is mounted and portmapper is running before a statd start is
attempted.  Doesn't statd require portmapper anyway?

Revision history for this message

Steve Langasek (vorlon) wrote on 2010-04-21: Re: mountall for /var races with rpc.statd

#5

> I'm not sure exactly what "local-filesystems" signal is signalling but
> assuming it really does mean "local" (i.e. directly attached block
> devices) is there any reason the boolean operator in the condition for
> starting statd is not "and" rather than "or"?

Because due to a bug in upstart, this would cause every NFS mount after the first one to block indefinitely, waiting for another 'started portmap' event that will never come.

Anyway, as you're aware, portmap no longer waits for local-filesystems; so that's no longer a guarantee.

Changed in nfs-utils (Ubuntu):
status:	Incomplete → Triaged

Revision history for this message

Brian J. Murrell (brian-interlinx) wrote on 2010-04-29:

#6

Per bug 555661 is there a change in upstart .conf file dependencies that you would like me to test/try?

You said in bug 555661 comment 12:
> > So for lucid, I'm still inclined to update the statd job to 'start on
> > local-filesystems'. Possibly 'start on (local-filesystems and mounting
> > TYPE=nfs)' - if that doesn't cause NFS mount attempts after the first one to
> > deadlock in mountall/upstart. I'll have to test this and propose it as an
> > SRU if it checks out.
>
> Ah, in fact that causes a deadlock in mountall/upstart even before NFS
> mounts are attempted. So 'start on local-filesystems' is as close as we can
> probably get for lucid.

Can you clarify exactly what changes you envision for Lucid, to clear this mess up?

Revision history for this message

Cody Herriges (ody-cat) wrote on 2010-04-29:

#7

This is rather an unacceptable race condition. This will once again cause me an enormous amount of pain in upgrading my nearly 200 Ubuntu servers and desktops that all mount a substantial amount of stuff over NFS, including user home directories.

Revision history for this message

Cody Herriges (ody-cat) wrote on 2010-04-29:

#8

Changing line 6 to the following fixes the problem.

--- statd.conf 2010-04-29 14:22:27.567158573 -0700
+++ /etc/init/statd.conf 2010-04-29 14:18:56.057316910 -0700
@@ -3,7 +3,7 @@
description "NSM status monitor"
author "Steve Langasek <email address hidden>"

-start on (started portmap or mounting TYPE=nfs)
+start on ((started portmap and mounted MOUNTPOINT=/var) or mounting TYPE=nfs)
stop on stopping portmap

expect fork

Revision history for this message

Steve Langasek (vorlon) wrote on 2010-04-30: Re: [Bug 525154] Re: mountall for /var races with rpc.statd

#9

On Thu, Apr 29, 2010 at 09:25:53PM -0000, ody wrote:
> Changing line 6 to the following fixes the problem.

> --- statd.conf 2010-04-29 14:22:27.567158573 -0700
> +++ /etc/init/statd.conf 2010-04-29 14:18:56.057316910 -0700
> @@ -3,7 +3,7 @@
> description "NSM status monitor"
> author "Steve Langasek <email address hidden>"
>
> -start on (started portmap or mounting TYPE=nfs)
> +start on ((started portmap and mounted MOUNTPOINT=/var) or mounting TYPE=nfs)
> stop on stopping portmap

For users with a separate /var partition, yes. For users without, it causes
statd to consistently fail to start at boot.

--
Steve Langasek Give me a lever long enough and a Free OS
Debian Developer to set it on, and I can move the world.
Ubuntu Developer http://www.debian.org/
<email address hidden> <email address hidden>

Revision history for this message

Cody Herriges (ody-cat) wrote on 2010-05-01:

#10

On 04/30/2010 05:21 AM, Steve Langasek wrote:
> For users with a separate /var partition, yes. For users without, it causes
> statd to consistently fail to start at boot.
>

Oh yeah. That patch is a total kludge/hack we put in place so we could
quickly deploy Lucid. Will look forward to the actual fix or might hack
on a better statd.conf change that doesn't break the rest of the world.

Revision history for this message

Nathan Grennan (ngrennan) wrote on 2010-05-04: Re: mountall for /var races with rpc.statd

#11

I am the sysadmin for a company that uses Ubuntu for desktops, and uses nfs heavily. I upgraded to Lucid from Karmic, and everything has been fine. A co-worker upgraded, and ran into this bug. He tried the mounted MOUNTPOINT=/var workaround. It actually seemed to make the problem worse. The first time he booted it just hung the boot process. With a reboot, it came up without the hang.

What upstart needs is more of a "can I write to this directory" option.

Revision history for this message

Cody Herriges (ody-cat) wrote on 2010-05-04: Re: [Bug 525154] Re: mountall for /var races with rpc.statd

#12

On May 4, 2010, at 4:14 PM, Nathan Grennan wrote:

> I am the sysadmin for a company that uses Ubuntu for desktops, and uses
> nfs heavily. I upgraded to Lucid from Karmic, and everything has been
> fine. A co-worker upgraded, and ran into this bug. He tried the mounted
> MOUNTPOINT=/var workaround. It actually seemed to make the problem
> worse. The first time he booted it just hung the boot process. With a
> reboot, it came up without the hang.
>
> What upstart needs is more of a "can I write to this directory" option.
>

I saw a similar freeze when I was hacking about and tried `mounted MOUNTPOINT=/var/run` which is used by /etc/init/mounted-varrun.conf. This new tight integration with upstart is going to take some getting use too.

Revision history for this message

Cody Herriges (ody-cat) wrote on 2010-05-06: Re: mountall for /var races with rpc.statd

#13

statd.patch Edit (132 bytes, text/plain)

Anyone willing to try out this work around? `/etc/init/mountall` emits local-filesystems so if you change line 6 of statd.conf to the following things look to come up normally. This is probably a better more sane solution then what I posted earlier.

start on ((started portmap and local-filesystems) or mounting TYPE=nfs)

Revision history for this message

John Peach (john-launchpad) wrote on 2010-05-08:

#14

Yes, that change works for me....

start on ((started portmap and local-filesystems) or mounting TYPE=nfs)

Revision history for this message

Steve Langasek (vorlon) wrote on 2010-05-10:

#15

This proposed workaround will cause a hang whenever portmap is restarted on a package upgrade.

Revision history for this message

Arjen Verweij (arjen-verweij) wrote on 2010-05-10:

#16

#13 doesn't work for me. The only viable workaround I know of is listing the NFS mounts as noauto and adding them to /etc/rc.local individually

Revision history for this message

Brian J. Murrell (brian-interlinx) wrote on 2010-05-12:

#17

I think the suggested fix in comment #8 is absolutely evil. The intentions are well-placed but the results of using that work-around are evil and I believe lead to the sort of issues reported in bug #543506. I still have to go to a few more machines and "undo" that change to be completely sure. I will update when I have done more testing.

Revision history for this message

Brian J. Murrell (brian-interlinx) wrote on 2010-05-12:

#18

I should add, that even with the suggested patch from comment #8 in place, it didn't stop rpc.statd from being started before /var was mounted, so it was not even helping in that manner.

Revision history for this message

Brian J. Murrell (brian-interlinx) wrote on 2010-05-13:

#19

As an alternative hack/solution to this race (until it's resolved more elegantly within upstart), could we not simply spin in the statd.conf script waiting for /var/lib/nfs to be available?

This would be most interesting to do because, in fact, I believe that /var/lib/nfs not being available when statd.conf runs is not the only issue that is causing rpc.statd startup to fail. I see failures reported by upstart, during boot, even after /var is mounted.

Let me put such a spin lock in place and see how that goes.

Revision history for this message

Brian J. Murrell (brian-interlinx) wrote on 2010-05-13:

#20

OK. The spin loop/lock seems to work much better than the start triggers.

I still find however that mountall is trying to mount nfs filesystems before rpc.statd is started. Do we consider this an nfs-utils bug or an upstart/mountall bug?

Revision history for this message

Joshua Baergen (joshuabaergen) wrote on 2010-05-13:

#21

> I still find however that mountall is trying to mount nfs
> filesystems before rpc.statd is started. Do we consider
> this an nfs-utils bug or an upstart/mountall bug?

But statd's job is triggered on such NFS mounts, so I think that's OK as long as mountall doesn't do it (or give up) too early, is it not?

Revision history for this message

Brian J. Murrell (brian-interlinx) wrote on 2010-05-13: Re: [Bug 525154] Re: mountall for /var races with rpc.statd

#22

On Thu, 2010-05-13 at 19:03 +0000, Joshua Baergen wrote:
>
> But statd's job is triggered on such NFS mounts,

If by triggered you mean that such a mount stops and waits for statd to
be started, nope.

> so I think that's OK as
> long as mountall doesn't do it (or give up) too early, is it not?

But it does do it. Eventually the mount does succeed but it's very ugly
to race like, hoping that a retry will succeed and then there's the spat
of ugly "failure" messages during boot.

All in all, the race needs to be resolved, IMHO.

Revision history for this message

Brian J. Murrell (brian-interlinx) wrote on 2010-05-17: Re: mountall for /var races with rpc.statd

#23

So, I now have a case where this race prevents a successful boot completely, 100% of the time. The only way I can get a normal boot is to manually mount an NFS filesystem in another window while mountall/upstart has stalled.

To clarify, the only way I can get this machine to boot is to boot the kernel with init=/bin/bash.

Once I have the init/bash i open a vt with "open -c 12 /bin/bash". I then exec init with "exec /sbin/init" and normal boot continues until upstart/mountall gets stuck waiting for NFS filesystems to mount, which never do. This will wait here forever if I don't intervene.

To get the boot to continue, I switch to the bash I started on vt 12 and just mount one of the several nfs filesystems mountall is waiting for with:

# mount /autohome/brian

for example. At this point upstart resumes starting services and regular boot completes.

Happy to provide any information needed to progress this issue to resolution.

Revision history for this message

Steve Langasek (vorlon) wrote on 2010-05-17:

#24

In this reproducible case, is this the problem that the network has been brought up before /var is mounted? I.e., if you run 'killall -USR1 mountall' /instead of/ running a mount command by hand, does the boot continue?

Revision history for this message

Brian J. Murrell (brian-interlinx) wrote on 2010-05-18: Re: [Bug 525154] Re: mountall for /var races with rpc.statd

#25

On Mon, 2010-05-17 at 23:09 +0000, Steve Langasek wrote:
> In this reproducible case, is this the problem that the network has been
> brought up before /var is mounted?

I have not been able to test on that platform yet, however...

> I.e., if you run 'killall -USR1
> mountall' /instead of/ running a mount command by hand, does the boot
> continue?

On another platform that I was debugging last night, this indeed did
seem to work.

Hopefully I can get to my 100% reproducible case today and will let you
know.

Revision history for this message

Brian J. Murrell (brian-interlinx) wrote on 2010-05-19:

#26

On Mon, 2010-05-17 at 23:09 +0000, Steve Langasek wrote:
> In this reproducible case, is this the problem that the network has been
> brought up before /var is mounted? I.e., if you run 'killall -USR1
> mountall' /instead of/ running a mount command by hand, does the boot
> continue?

Yes sir, it does! Nice catch.

Now, what's the fix? :-)

Revision history for this message

Brian J. Murrell (brian-interlinx) wrote on 2010-05-19: Re: mountall for /var races with rpc.statd

#27

FWIW, this new 100% reproducible case is 100% reproducible for a reason, which I will get to in a minute.

So, given that in two cases where I have had a stalled boot, signalling mountall with a USR1 has caused the boot to proceed. This seems like a race somewhere.

The reason I was able to reproduce this 100% of the time on one particular given platform is because it's a PXE (i.e. netboot/netroot) system, booting from the network and mounting it's entire root (which includes /usr and /var) from an NFS server. In this scenario I found that allowing ifup to configure an interface that has already been configured by the kernel during the netboot was causing the system to hang.

As a short-term solution until I could research the real, long-term solution, I decided that given that the interface was already up even before init was even started, I would just disable it in /etc/network/interfaces. That of course also prevents the "initctl emit -n net-device-up ..." in /etc/network/if-up.d/upstart from firing.

But it would only prevent it from firing for the ethernet interface. Shouldn't it still fire for lo being ifup'd, giving mountall the USR1 it needs?

Revision history for this message

Steve Langasek (vorlon) wrote on 2010-05-19:

#28

Oh, for NFS root, there's bug #537133. It seems there are known problems still with mountall in that configuration.

Revision history for this message

Brian J. Murrell (brian-interlinx) wrote on 2010-05-19: Re: [Bug 525154] Re: mountall for /var races with rpc.statd

#29

On Wed, 2010-05-19 at 15:13 +0000, Steve Langasek wrote:
> Oh, for NFS root, there's bug #537133. It seems there are known
> problems still with mountall in that configuration.

Funny enough, once I stopped dhclient from resetting an interface's
address to 0.0.0.0 (and filing a bug upstream about it) and then
re-enabled the interface in /etc/network/interfaces, my netboot/nfsroot
system is working just peachy. Well apart from the ugly messages about
failing to mount (other, non /) NFS filesystems first time through.

Brian Murray (brian-murray) on 2010-05-20

tags:

added: patch

Revision history for this message

Tapani Tarvainen (ubuntu-tapani) wrote on 2010-05-29: Re: mountall for /var races with rpc.statd

#30

Same problem here, /var on separate file system, patch in #13 seems to work.

Revision history for this message

Bjarne Steinsbø (bsteinsbo) wrote on 2010-06-12:

#31

Thank you all for providing descriptions and solutions. My systems were upgraded from 9.10 where everything was working OK, and I was about to give up on Ubuntu when the systems didn't boot after upgrade.

computer 1: /var on separate file system, mounting /home from nfs, hangs consistently on boot, "fixed" by removing fstab entry and mounting /home in /etc/rc.local

computer 2: hmm, this is a special one... On one hand it is a perfectly normal dual-boot laptop, running Windows 7 and Ubuntu. On the other hand, that same Ubuntu partition is the disk of a virtual machine, so that I can run the same Ubuntu installation from within VirtualBox. Working OK when booting from BIOS, but was failing to boot from within VirtualBox until i moved the mount of a vboxsf share to rc.local.

Chelmite (steve-kelem) on 2010-06-14

description:	updated
description:	updated

Revision history for this message

Gerb (g.roest) wrote on 2010-06-29:

#32

having /var on a separate filesystem, patch in #13 works for me too.

Revision history for this message

Carl Nobile (cnobile1) wrote on 2010-07-19:

#33

This seems to be the cause of both bugs 573919 and 590570.

Revision history for this message

Azamat S. Kalimoulline (turtle-bazon) wrote on 2010-07-27:

#34

/var on separate filesystem, remote dir moounted using autofs. #13 works for me.

Revision history for this message

Brian J. Murrell (brian-interlinx) wrote on 2010-07-27:

#35

So, what will be done about this (and the host of other mounting bugs) in Lucid? Lucid is an LTS release and as such should be stable for 2 years. It is not. This bug (and the host of other mounting bugs) make it not so. Yet no updates have come forth to solve the problem nor has there been any activity from Ubuntu developers.

Have Ubuntu abandoned this (and the host of other mounting bugs) as simply too difficult to deal with? If not, please, let us know what the path forward, to a stable LTS (with separate /var, and/or NFS rooting, etc.) release is.

Revision history for this message

Steve Langasek (vorlon) wrote on 2010-07-27:

#36

As I wrote in comment #15, we don't have a viable workaround yet that doesn't inroduce other hangs / failures in other scenarios. A "fix" that will break all ability to further upgrade the system is worse than the status quo, because it means security fixes can't be applied.

Until someone is able to identify a solution that doesn't have this disadvantage, there's nothing that can be done here.

> Lucid is an LTS release and as such should be stable for 2 years.

"stable" does not mean "usable for all proposed uses".

Revision history for this message

Brian J. Murrell (brian-interlinx) wrote on 2010-07-27: Re: [Bug 525154] Re: mountall for /var races with rpc.statd

#37

On Tue, 2010-07-27 at 15:28 +0000, Steve Langasek wrote:
> As I wrote in comment #15, we don't have a viable workaround yet that
> doesn't inroduce other hangs / failures in other scenarios.

I guess the question then is whether any effort is being spent towards
such a solution.

> A "fix"
> that will break all ability to further upgrade the system is worse than
> the status quo, because it means security fixes can't be applied.

Fair enough. But leaving systems unbootable for a portion of the users
surely cannot be an acceptable solution either, yes?

> Until someone is able to identify a solution that doesn't have this
> disadvantage, there's nothing that can be done here.

Who would this "someone" be? It's sounding an awful lot like nobody is
actually working on the real root cause (and solution) of this issue and
everyone is just hoping that "somebody else" will come up with a
solution.

> "stable" does not mean "usable for all proposed uses".

So having a /var on a separate filesystem is "fringe" enough that those
users should not be able to experience a stable system? /var on it's
own filesystem is the only "responsible" way to manage a system. You
can glom any other crap you want onto / but leaving /var on / to grow
until it fills up / is simply irresponsible for any use-case except
single-user machines. Is this "single-user" use case all that Ubuntu is
interested in satisfying? Multi-user/server (i.e. corporate users)
installations are not a useful user-base for Canonical?

Revision history for this message

Juan Andrés Ghigliazza (tizone) wrote on 2010-07-27: Re: mountall for /var races with rpc.statd

#38

This bug is affecting me too. It would be great to have a definitively solution.

Revision history for this message

MarkG (movieman523) wrote on 2010-07-28:

#39

Same here: this is making my MythTV frontend unusable as about half the time it can't see the MythTV NFS directories after booting.

This used to work OK in 9.10; I upgraded it to Lucid in the hope that the newer kernel might fix USB remote bugs and instead I got an unusable network and lost most of the sound from xbmc.

Revision history for this message

Matthias Steup (matthias-steup) wrote on 2010-08-06:

#40

I'm not an expert but I tested this:

Create a script "startstatd" and save it in "/etc/init.d".
The script:

#! /bin/sh
service statd start
mount /192.168.0.31: ... #add directives for mounting nfs shares

Then create a symlink to /etc/rcS.d, for example @S91startstatd.

It seems to be working well.

Steve Langasek (vorlon) on 2010-12-23

Changed in nfs-utils (Ubuntu):
importance:	Undecided → High

Clint Byrum (clint-fewbar) on 2011-01-05

Changed in nfs-utils (Ubuntu Natty):
status:	Triaged → In Progress
assignee:	nobody → Clint Byrum (clint-fewbar)

Clint Byrum (clint-fewbar) on 2011-01-05

Changed in nfs-utils (Ubuntu Lucid):
status:	New → Confirmed
Changed in nfs-utils (Ubuntu Maverick):
status:	New → Confirmed
Changed in portmap (Ubuntu Natty):
status:	New → Confirmed

Launchpad Janitor (janitor) on 2011-01-11

Changed in portmap (Ubuntu Natty):
status:	Confirmed → Fix Released
Changed in nfs-utils (Ubuntu Natty):
status:	In Progress → Fix Released

Steve Langasek (vorlon) on 2011-01-16

Changed in nfs-utils (Ubuntu Lucid):
status:	Confirmed → Triaged
Changed in portmap (Ubuntu Lucid):
status:	New → Triaged
Changed in nfs-utils (Ubuntu Maverick):
importance:	Undecided → High
Changed in nfs-utils (Ubuntu Lucid):
importance:	Undecided → High
Changed in portmap (Ubuntu Lucid):
importance:	Undecided → High
Changed in portmap (Ubuntu Maverick):
importance:	Undecided → High
status:	New → Triaged
Changed in nfs-utils (Ubuntu Maverick):
status:	Confirmed → Triaged

Steve Langasek (vorlon) on 2011-01-19

description:

updated

Martin Pitt (pitti) on 2011-01-20

Changed in nfs-utils (Ubuntu Lucid):
status:	Triaged → Fix Committed
tags:	added: verification-needed
Changed in portmap (Ubuntu Lucid):
status:	Triaged → Fix Committed

Martin Pitt (pitti) on 2011-01-25

tags:

added: verification-done
removed: verification-needed

Martin Pitt (pitti) on 2011-01-25

Changed in portmap (Ubuntu Maverick):
status:	Triaged → Fix Committed
tags:	removed: verification-done
tags:	added: verification-needed
tags:	added: verification-done
Changed in nfs-utils (Ubuntu Maverick):
status:	Triaged → Fix Committed
tags:	removed: verification-done
tags:	added: verification-done removed: verification-needed
tags:	added: verification-needed

Clint Byrum (clint-fewbar) on 2011-01-25

Changed in nfs-utils (Ubuntu Natty):
assignee:	Clint Byrum (clint-fewbar) → nobody

Steve Langasek (vorlon) on 2011-01-25

tags:

removed: verification-needed

Launchpad Janitor (janitor) on 2011-01-27

Changed in nfs-utils (Ubuntu Lucid):
status:	Fix Committed → Fix Released
Changed in portmap (Ubuntu Lucid):
status:	Fix Committed → Fix Released

Revision history for this message

Launchpad Janitor (janitor) wrote on 2011-02-01:

#87

This bug was fixed in the package portmap - 6.0.0-2ubuntu1.1

---------------
portmap (6.0.0-2ubuntu1.1) maverick-proposed; urgency=low

  * debian/upstart renamed to debian/portmap.portmap.upstart,
    debian/portmap.portmap-boot.upstart, debian/rules: Added to set
    special ON_BOOT flag during boot, which allows statd to use an
    AND with 'started portmap ON_BOOT=y'. This version of portmap is a
    dependency of nfs-utils to fix LP: #525154
  * debian/portmap.portmap-wait.upstart: job to wait for portmap to
    finish starting. also depended on on by nfs-utils.
-- Steve Langasek <email address hidden> Tue, 18 Jan 2011 15:28:05 -0800

Changed in portmap (Ubuntu Maverick):
status:	Fix Committed → Fix Released

Revision history for this message

Launchpad Janitor (janitor) wrote on 2011-02-01:

#88

This bug was fixed in the package nfs-utils - 1:1.2.2-1ubuntu1.1

---------------
nfs-utils (1:1.2.2-1ubuntu1.1) maverick-proposed; urgency=low

  * debian/nfs-common.statd.upstart,
    debian/nfs-common.statd-mounting.upstart: refactor startup to wait for
    local-filesystems. (LP: #525154)
  * debian/control: depend on portmap version that sets ON_BOOT=y and
    has the portmap-wait job.
  * debian/rules: install new statd-mounting upstart job
  * debian/nfs-common.rpc_pipefs.upstart: instantiate this job separately for
    gssd and idmapd, so that the filesystem gets mounted and unmounted
    correctly even if both of gssd and idmapd aren't being run, or if one of
    the two tries to start before the filesystem is fully mounted. Though
    it may be simpler now to move this logic back into the gssd and idmapd
    jobs directly, leave that for a later date.
-- Steve Langasek <email address hidden> Wed, 19 Jan 2011 16:05:07 -0800

Changed in nfs-utils (Ubuntu Maverick):
status:	Fix Committed → Fix Released

Revision history for this message

Alexander Achenbach (xela) wrote on 2011-02-03:

#89

You may also refer to LP #643289 posts #2, #3, #4, which provide fixes for mountall blocking and for various start-up problems of statd as well as for idmapd/gssd/rpc_pipefs. Those fixes are based on the above fixes, but go further to make sure mountall and rpc_pipefs mounting proceed in proper sequence where needed.

Revision history for this message

Stephan Rügamer (sruegamer) wrote on 2011-02-21:

#90

@All:

Did anyone tried to install now nfs-common (in lucid with the latest sru bugfix) in a chroot environment?

If so, did nobody see nfs-common failing in nfs-common.postinst?

Normally, in a chroot, starting services is disabled and/or not allowed (or in some cases the startup mechanism is somehow diverted to /bin/true or /bin/false whatever).
But now, nfs-common.postinst does an invoke.rc statd and this failes in a chroot environment which means, that the package in question is not configured properly.

The upstart dependency (started portmap ON_BOOT= or (local-filesystems and started portmap ON_BOOT=y)) doesn't apply inside a chroot.

One thing that needs to be done is, to not start the services during installation (which means removing all blind magic of dh_installinit) or whatever it takes to come back with the old behaviour.

Regards,
\sh

Revision history for this message

Clint Byrum (clint-fewbar) wrote on 2011-02-21: Re: [Bug 525154] Re: mountall for /var races with rpc.statd

#91

On Mon, 2011-02-21 at 14:10 +0000, Stephan Adig wrote:
> @All:
>
> Did anyone tried to install now nfs-common (in lucid with the latest sru
> bugfix) in a chroot environment?
>
> If so, did nobody see nfs-common failing in nfs-common.postinst?
>
> Normally, in a chroot, starting services is disabled and/or not allowed (or in some cases the startup mechanism is somehow diverted to /bin/true or /bin/false whatever).
> But now, nfs-common.postinst does an invoke.rc statd and this failes in a chroot environment which means, that the package in question is not configured properly.
>
> The upstart dependency (started portmap ON_BOOT= or (local-filesystems
> and started portmap ON_BOOT=y)) doesn't apply inside a chroot.
>
> One thing that needs to be done is, to not start the services during
> installation (which means removing all blind magic of dh_installinit) or
> whatever it takes to come back with the old behaviour.
>

Hi Stephan, thanks for the feedback.

First off, services are always started/stopped with invoke-rc.d on
upgrade in Debian packages and, thusly, have always been started and
stopped in Ubuntu.

This is a known issue in upstart which is under active development.
Basically upstart doesn't know anything about the chroot environment, so
it doesn't read the /etc/init from the chroot.

Take a look at bug #430224 for more information.

A simple workaround to get the postinst to succeed is to edit the
postinst in /var/lib/dpkg/info/nfs-common.postinst and remove the calls
to invoke-rc.d

Revision history for this message

Steve Langasek (vorlon) wrote on 2011-02-21:

#92

On Mon, Feb 21, 2011 at 02:10:11PM -0000, Stephan Adig wrote:

> Did anyone tried to install now nfs-common (in lucid with the latest sru
> bugfix) in a chroot environment?

> If so, did nobody see nfs-common failing in nfs-common.postinst?

> Normally, in a chroot, starting services is disabled and/or not allowed
> (or in some cases the startup mechanism is somehow diverted to /bin/true
> or /bin/false whatever). But now, nfs-common.postinst does an invoke.rc
> statd and this failes in a chroot environment which means, that the
> package in question is not configured properly.

> The upstart dependency (started portmap ON_BOOT= or (local-filesystems
> and started portmap ON_BOOT=y)) doesn't apply inside a chroot.

invoke-rc.d should not fail in a chroot environment. How have you
configured your chroot? *Any* of the standard methods for disabling
services in a chroot should have the intended effect (i.e., diverting
/sbin/initctl to /bin/true, or configuring a policy-rc.d to disallow service
starting).

If you have /sbin/initctl diverted to /bin/*false*, that is a
misconfiguration in your environment.

> One thing that needs to be done is, to not start the services during
> installation (which means removing all blind magic of dh_installinit) or
> whatever it takes to come back with the old behaviour.

No, that is not a thing that needs to be done.

--
Steve Langasek Give me a lever long enough and a Free OS
Debian Developer to set it on, and I can move the world.
Ubuntu Developer http://www.debian.org/
<email address hidden> <email address hidden>

Revision history for this message

Janusz Mordarski (janusz-mordarski) wrote on 2011-03-29: Re: mountall for /var races with rpc.statd

#93

i'm using Ubuntu 10.04 as nfsroot on diskless workstation. For now, only working patch for this issue is putting

restart portmap
restart nfs

in some script started early in /etc/rcS.d (so 'single user' mode stage of init)

Revision history for this message

Clint Byrum (clint-fewbar) wrote on 2011-04-01: Re: [Bug 525154] Re: mountall for /var races with rpc.statd

#94

Download full text (3.3 KiB)

Excerpts from Janusz Mordarski's message of Tue Mar 29 09:55:14 UTC 2011:
> i'm using Ubuntu 10.04 as nfsroot on diskless workstation. For now, only
> working patch for this issue is putting
>
> restart portmap
> restart nfs
>
> in some script started early in /etc/rcS.d (so 'single user' mode stage
> of init)

Janus, can you explain how you think this is related to the bug you've
commented on?

It sounds like you have a different situation, and you should open a new
report or look through some of the other ones against mountall. There
are known issues with nfs root that we haven't addressed yet, though
we'd like to and it will help if we can have enough information from
users like yourself.

If you do open another report, it would be a good idea to come back and
note the new bug # in the comments here.

>
> --
> You received this bug notification because you are a direct subscriber
> of a duplicate bug (692793).
> https://bugs.launchpad.net/bugs/525154
>
> Title:
> mountall for /var races with rpc.statd
>
> Status in “nfs-utils” package in Ubuntu:
> Fix Released
> Status in “portmap” package in Ubuntu:
> Fix Released
> Status in “nfs-utils” source package in Lucid:
> Fix Released
> Status in “portmap” source package in Lucid:
> Fix Released
> Status in “nfs-utils” source package in Maverick:
> Fix Released
> Status in “portmap” source package in Maverick:
> Fix Released
> Status in “nfs-utils” source package in Natty:
> Fix Released
> Status in “portmap” source package in Natty:
> Fix Released
>
> Bug description:
> If one has /var (or /var/lib or /var/lib/nfs for that matter) on its
> own filesystem the statd.conf start races with the mounting of /var as
> rpc.statd needs /var/lib/nfs to be available in order to work.
>
> I am sure this is not the only occurrence of this type of problem.
>
> A knee-jerk solution is to simply spin in statd.conf waiting for
> /var/lib/nfs to be available, but polling sucks, especially for
> something like upstart whose whole purpose is to be an event driven
> action manager.
>
> SRU justification: NFS mounts do not start reliably on boot in lucid
> and maverick (depending on the filesystem layout of the client system)
> due to race conditions in the startup of statd. This should be fixed
> so users of the latest LTS can make reliable use of NFS.
>
> Regression potential: Some systems may fail to mount NFS filesystems
> at boot time that didn't fail before. Some systems may hang at boot.
> Some systems may hang while upgrading the packages (this version or in
> a future SRU). I believe the natty update adequately guards against
> all of these possibilities, but the risk is there.
>
> TEST CASE:
> 1. Configure a system with /var as a separate partition.
> 2. Add one or more mounts of type 'nfs' to /etc/fstab.
> 3. Boot the system.
> 4. Verify whether statd has started (status statd) and whether all NFS filesystems have been mounted.
> 5. Repeat 3-4 until the race condition is triggered.
> 6. Upgrade to the new version of portmap and nfs-common from -proposed.
> 7. Repeat steps 3-4 until satisfied that statd now starts reliably ...

Excerpts from Janusz Mordarski's message of Tue Mar 29 09:55:14 UTC 2011:
> i'm using Ubuntu 10.04 as nfsroot on diskless workstation. For now, only
> working patch for this issue is putting
> 
> restart portmap
> restart nfs
> 
> in some script started early in /etc/rcS.d  (so 'single user' mode stage
> of init)

Janus, can you explain how you think this is related to the bug you've
commented on?

It sounds like you have a different situation, and you should open a new
report or look through some of the other ones against mountall. There
are known issues with nfs root that we haven't addressed yet, though
we'd like to and it will help if we can have enough information from
users like yourself.

If you do open another report, it would be a good idea to come back and
note the new bug # in the comments here.

> 
> -- 
> You received this bug notification because you are a direct subscriber
> of a duplicate bug (692793).
> https://bugs.launchpad.net/bugs/525154
> 
> Title:
>   mountall for /var races with rpc.statd
> 
> Status in “nfs-utils” package in Ubuntu:
>   Fix Released
> Status in “portmap” package in Ubuntu:
>   Fix Released
> Status in “nfs-utils” source package in Lucid:
>   Fix Released
> Status in “portmap” source package in Lucid:
>   Fix Released
> Status in “nfs-utils” source package in Maverick:
>   Fix Released
> Status in “portmap” source package in Maverick:
>   Fix Released
> Status in “nfs-utils” source package in Natty:
>   Fix Released
> Status in “portmap” source package in Natty:
>   Fix Released
> 
> Bug description:
>   If one has /var (or /var/lib or /var/lib/nfs for that matter) on its
>   own filesystem the statd.conf start races with the mounting of /var as
>   rpc.statd needs /var/lib/nfs to be available in order to work.
> 
>   I am sure this is not the only occurrence of this type of problem.
> 
>   A knee-jerk solution is to simply spin in statd.conf waiting for
>   /var/lib/nfs to be available, but polling sucks, especially for
>   something like upstart whose whole purpose is to be an event driven
>   action manager.
> 
>   SRU justification: NFS mounts do not start reliably on boot in lucid
>   and maverick (depending on the filesystem layout of the client system)
>   due to race conditions in the startup of statd.  This should be fixed
>   so users of the latest LTS can make reliable use of NFS.
> 
>   Regression potential:  Some systems may fail to mount NFS filesystems
>   at boot time that didn't fail before.  Some systems may hang at boot.
>   Some systems may hang while upgrading the packages (this version or in
>   a future SRU).  I believe the natty update adequately guards against
>   all of these possibilities, but the risk is there.
> 
>   TEST CASE:
>   1. Configure a system with /var as a separate partition.
>   2. Add one or more mounts of type 'nfs' to /etc/fstab.
>   3. Boot the system.
>   4. Verify whether statd has started (status statd) and whether all NFS filesystems have been mounted.
>   5. Repeat 3-4 until the race condition is triggered.
>   6. Upgrade to the new version of portmap and nfs-common from -proposed.
>   7. Repeat steps 3-4 until satisfied that statd now starts reliably and all non-gss-authenticated NFSv3 filesystems mount correctly at boot time.
> 
> To unsubscribe from this bug, go to:
> https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/525154/+subscribe

Revision history for this message

Janusz Mordarski (janusz-mordarski) wrote on 2011-04-09: Re: mountall for /var races with rpc.statd

#95

well restarting portmap and nfs helped, and i don't want now to revert back to faulty config.

i was getting this mesage when booting my diskless workstations:

mount.nfs: rpc.statd is not running but is required for remote locking.
Either use '-o nolocks' to keep locks local, or start statd.

/root was mounted OK of course, because if not, it wouldn't boot at all
- problem was with /home and other directories mounted by mountall init scripts
- after complete boot, in gdm, there was no /home , i had to once again invoke mount -a (when gdm was running) - i solved it by using rc.local < dirty solution ;)

portmap and nfs services were starting, but this error message was showing up anyway

thanks to this thread i found out, that maybe starting statd was racing with mounting /var by nfs in read write mode or something like this, so i thought that restarting portmap and nfs services in single user mode, before any extra NFS shares are mounted (/home) will solve the problem, and it works fine now. no error messages during boot process. i don't really know if it qualifies for new bug or not.

Revision history for this message

Clint Byrum (clint-fewbar) wrote on 2011-04-09: Re: [Bug 525154] Re: mountall for /var races with rpc.statd

#96

Excerpts from Janusz Mordarski's message of Sat Apr 09 16:46:18 UTC 2011:
> well restarting portmap and nfs helped, and i don't want now to revert
> back to faulty config.
>
> i was getting this mesage when booting my diskless workstations:
>
> mount.nfs: rpc.statd is not running but is required for remote locking.
> Either use '-o nolocks' to keep locks local, or start statd.
>
> /root was mounted OK of course, because if not, it wouldn't boot at all
> - problem was with /home and other directories mounted by mountall init scripts
> - after complete boot, in gdm, there was no /home , i had to once again invoke mount -a (when gdm was running) - i solved it by using rc.local < dirty solution ;)
>
> portmap and nfs services were starting, but this error message was
> showing up anyway
>
> thanks to this thread i found out, that maybe starting statd was racing
> with mounting /var by nfs in read write mode or something like this, so
> i thought that restarting portmap and nfs services in single user mode,
> before any extra NFS shares are mounted (/home) will solve the problem,
> and it works fine now. no error messages during boot process. i don't
> really know if it qualifies for new bug or not.

Janusz, this bug is about statd not being available when NFS mounts are
made. This should be fixed in Lucid and Maverick. Make sure all updates
are applied. If you have customized any of the upstart jobs in /etc/init
that control statd or portmap, that may be causing problems. Look for
files with the extension '.dpkg-new', like /etc/init/statd.conf.dpkg-new,
if those exist, you will want to make sure to merge them into the live
files (like /etc/init/statd.conf).

Revision history for this message

Christian Reis (kiko) wrote on 2011-06-20: Re: mountall for /var races with rpc.statd

#97

Clint, similarly to Janusz I see these exact same issues on my Natty NFS-root system, even when running with the pristing statd* *portmap* init configuration files. I don't quite understand enough about upstart to say what's wrong yet.

I just spent an afternoon chasing this down and am pretty sure a bug still remains somewhere, though I'm finding it hard to see where without complete event logging or an infinite kernel scrollback buffer.

Revision history for this message

Clint Byrum (clint-fewbar) wrote on 2011-06-20:

#98

Christian, if you're using NFS root, you probably have an issue. But it is probably not *this* issue, as this one was not specific to nfs root configurations. It would be quite helpful if you were to raise a new bug report against nfs-utils that detailed what you expect to have happen, and what is actually happening.

Revision history for this message

Brian J. Murrell (brian-interlinx) wrote on 2011-06-21: Re: [Bug 525154] Re: mountall for /var races with rpc.statd

#99

On 11-06-20 07:10 PM, Clint Byrum wrote:
> Christian, if you're using NFS root, you probably have an issue. But it
> is probably not *this* issue, as this one was not specific to nfs root
> configurations. It would be quite helpful if you were to raise a new bug
> report against nfs-utils that detailed what you expect to have happen,
> and what is actually happening.

But the question that begs to be asked is why are common configurations
like NFS-root and separate /var and /usr filesystems STILL not part of
Ubuntu's standard QA processes?

These are not entirely "esoteric" configurations you know and they have
been shown to have problems in past releases so why are current QA
processes not testing for these?

You do understand that effective QA means that when a configuration is
shown to have a potential for regressions that such a configuration be
added to the battery of tests that QA runs. It's simply not effective
to identify a configuration that doesn't work, (think that you have)
fix(ed) it and simply move on and not ever test that configuration again
during a regular release cycle. That's exactly how regressions leak
into GA product. It's embarrassing.

Revision history for this message

Clint Byrum (clint-fewbar) wrote on 2011-06-21:

#100

Excerpts from Brian J. Murrell's message of Tue Jun 21 10:18:06 UTC 2011:
> On 11-06-20 07:10 PM, Clint Byrum wrote:
> > Christian, if you're using NFS root, you probably have an issue. But it
> > is probably not *this* issue, as this one was not specific to nfs root
> > configurations. It would be quite helpful if you were to raise a new bug
> > report against nfs-utils that detailed what you expect to have happen,
> > and what is actually happening.
>
> But the question that begs to be asked is why are common configurations
> like NFS-root and separate /var and /usr filesystems STILL not part of
> Ubuntu's standard QA processes?
>
> These are not entirely "esoteric" configurations you know and they have
> been shown to have problems in past releases so why are current QA
> processes not testing for these?
>
> You do understand that effective QA means that when a configuration is
> shown to have a potential for regressions that such a configuration be
> added to the battery of tests that QA runs. It's simply not effective
> to identify a configuration that doesn't work, (think that you have)
> fix(ed) it and simply move on and not ever test that configuration again
> during a regular release cycle. That's exactly how regressions leak
> into GA product. It's embarrassing.

Brian, much of our QA is still community driven. We devote significant
resources to testing the base system, but multi-server setups like NFS
root are taxing to manually test, and more complex to automate. I'd love
to say we write a regression test for every issue we fix and run it on
every possible configuration. Clearly, we don't.

If NFS root is important to you, I would suggest that you help us out
by gathering other interested users, and putting together a blueprint
for the next UDS. Lets get automated tests setup for this configuration.

I'd support this 100%, but I don't think we can do it without some help
from the actual users of NFS root systems.

Revision history for this message

Christian Reis (kiko) wrote on 2011-06-21: Re: mountall for /var races with rpc.statd

#101

I can definitely file a new bug. I've been on and off fighting this and found that while the solution posted to this bug does not fix this problem with our diskless root, but a related fix does. Here's my statd-starting script below; I noticed that the script is run multiple times and that the "start statd" line isn't actually syncronous, problems which the rpcinfo check solves. I'm not sure you can assume portmap is listening on localhost, but this works for me:

description^I"Trigger a statd run"

start on mounting TYPE=nfs
task
console output

script
    # This apparently is necessary to ensure the statd run completes;
    # it's a hack but it seems to work more reliably than anything else
    while ! rpcinfo -u localhost status; do
        start statd
        echo "Waiting for statd to show up.."
        sleep 1s
    done
end script

Revision history for this message

Christian Reis (kiko) wrote on 2011-06-21:

#102

By using the script above, I don't actually need any "start on" clause in my statd.conf file at all. I posted some stream-of-consciousness entries here:

  - http://www.async.com.br/~kiko/diary.html?date=20.06.2011
  - http://www.async.com.br/~kiko/diary.html?date=21.06.2011
  - http://www.async.com.br/~kiko/diary.html?date=22.06.2011

I've pushed the bzr branch containing my complete init setup to http://bazaar.launchpad.net/~kiko/+junk/init-diskless/files -- feel free to poach and comment.

Revision history for this message

Forest (foresto) wrote on 2011-06-30:

#103

I'm still seeing this problem on a fully updated Natty. My nfs mount occasionally succeeds at boot, but most of the time it doesn't. After editing /etc/init/mountall.conf to log the mountall --debug output, I see these messages in the log:

mounting /myremotedir
spawn: mount -t nfs -o rw,intr,retrans=180,async,noatime,nodiratime 10.0.95.4:/myremotedir /myremotedir
mount.nfs: rpc.statd is not running but is required for remote locking.
mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
mount.nfs: an incorrect mount option was specified
mountall: mount /myremotedir [896] terminated with status 32

When the mount fails, "status statd" reports that statd is running, which makes me think there is still a race condition here.

I'm not sure if it's relevant, but the 10.0.95.x network is reachable via eth1, not eth0.

I see a lot of "fix released" notes on this bug report. Has the fix made it in to the Natty release repositories yet? If so, it looks to me like the fix needs some work.

Revision history for this message

Forest (foresto) wrote on 2011-07-01:

#104

mountall-net-nfs-on-statd.patch Edit (318 bytes, text/plain)

Following up on my own question: I don't see any updated mountall or nfs package in natty-proposed, and my fully-updated natty system still is still failing to mount my nfs shares at boot because of a race with statd.

When I change this line in /etc/init/mountall-net.conf:

start on net-device-up

To this:

start on net-device-up or stopped statd-mounting

mountall gets called again after statd has actually started, and my nfs shares get mounted at startup. I'm attaching a patch for mountall.

Revision history for this message

Brian J. Murrell (brian-interlinx) wrote on 2011-07-02: Re: [Bug 525154] Re: mountall for /var races with rpc.statd

#105

On 11-07-01 02:52 PM, Forest wrote:
> Following up on my own question: I don't see any updated mountall or
> nfs package in natty-proposed, and my fully-updated natty system still
> is still failing to mount my nfs shares at boot because of a race with
> statd.

My advise here would be either (a) give up on running a /var that's
separate from / and just learn to cope with a system that becomes
entirely useless at some point because something writing into /var has
filled your root filesystem or (b) switch to a different distro that
actually pays attention to "server deployment" practices wherein
separating /var from / (and /usr for that matter) is an accepted and
supported practice.

Given that this bug has existed since Lucid (3 releases now) makes it
clear to me that Ubuntu is not at all interested in supporting server
deployments where responsible practice is to keep /var from being able
to cripple an entire system simply because it fills up.

I guess Ubuntu is targeting the desktop and if you want to deploy
servers (where you likely will have budgets for support contracts) you
need to look at a different distro.

Just my perspective having watched this bug stagnate through three releases.

Forest (foresto) on 2011-07-02

summary:

- mountall for /var races with rpc.statd
+ mountall for /var or other nfs mount races with rpc.statd

Revision history for this message

Steve Langasek (vorlon) wrote on 2011-07-17:

#106

The proposed change to mountall-net.conf here is incorrect. Please file new bug reports against the nfs-utils package for races you're seeing between statd startup and mountall calls. From some of the later comments in this bug report, it looks like the statd process isn't ready to serve requests at the time it forks; if so, that's a bug in statd that needs to be fixed there.

Changed in mountall (Ubuntu):
status:	New → Invalid
Changed in mountall (Ubuntu Lucid):
status:	New → Invalid
Changed in mountall (Ubuntu Maverick):
status:	New → Invalid
Changed in mountall (Ubuntu Natty):
status:	New → Invalid

Revision history for this message

Steve Langasek (vorlon) wrote on 2011-07-17:

#107

Brian, in comment #84 you said that the SRUed package fixed the issue for you, but in your latest post you comment that "this bug has existed since Lucid". Did the updated package fix your issue, or did it not? If it didn't, we should reopen this bug report; up to then, you had certainly given the impression that your bug was fixed.

Revision history for this message

Brian J. Murrell (brian-interlinx) wrote on 2011-07-18: Re: [Bug 525154] Re: mountall for /var or other nfs mount races with rpc.statd

#108

On 11-07-17 04:59 AM, Steve Langasek wrote:
> Brian, in comment #84 you said that the SRUed package fixed the issue
> for you, but in your latest post you comment that "this bug has existed
> since Lucid". Did the updated package fix your issue, or did it not?
> If it didn't, we should reopen this bug report; up to then, you had
> certainly given the impression that your bug was fixed.

Yeah. I would say that it has. I guess it must just be all of the
other problems in Lucid (and beyond) that have not been fixed (i.e. all
of the bugs related to /var being on it's own filesystem that are still
open and dangling) that are clouding my judgment in this bug.

b.

Revision history for this message

Steve Langasek (vorlon) wrote on 2011-07-18:

#109

On Mon, Jul 18, 2011 at 05:38:51PM -0000, Brian J. Murrell wrote:
> On 11-07-17 04:59 AM, Steve Langasek wrote:
> > Brian, in comment #84 you said that the SRUed package fixed the issue
> > for you, but in your latest post you comment that "this bug has existed
> > since Lucid". Did the updated package fix your issue, or did it not?
> > If it didn't, we should reopen this bug report; up to then, you had
> > certainly given the impression that your bug was fixed.

> Yeah. I would say that it has. I guess it must just be all of the
> other problems in Lucid (and beyond) that have not been fixed (i.e. all
> of the bugs related to /var being on it's own filesystem that are still
> open and dangling) that are clouding my judgment in this bug.

Oh. Can you give me some specific bug numbers there? I wasn't aware of any
other issues with /var as a separate filesystem, and based on the
architecture I really wouldn't *expect* any bugs not related to NFS. So if
there are other problems, I'd very much like to know what they are so we can
see about getting them fixed.

--
Steve Langasek Give me a lever long enough and a Free OS
Debian Developer to set it on, and I can move the world.
Ubuntu Developer http://www.debian.org/
<email address hidden> <email address hidden>

Revision history for this message

Brian J. Murrell (brian-interlinx) wrote on 2011-07-18:

#110

On 11-07-18 02:49 PM, Steve Langasek wrote:
>
> Oh. Can you give me some specific bug numbers there?

Not at the moment I'm afraid. The number of bugs i have in my
subscribed list is just way to big to go searching right now, but off
the top of my head, there is the ureadahead bug, where people have
actually posted solutions and afaik, it's still open.

Revision history for this message

Andy Hauser (andy-ubuntu-bugzilla) wrote on 2011-07-18:

#111

#484209

#275451

#690401

come to my mind.

Also just having a bind mount to a NFS mount in fstab makes my lucid nodes unbootable.

All in all I think we fixed/worked around 4 bugs in lucid to get them to boot.
All related to the introduction of upstart. And not all have fixes released.

Also I think at this point most people will have migrated to another distro, so don't assume
missing feedback means fixed.

Revision history for this message

ingo (ingo-steiner) wrote on 2011-07-18:

#112

@ Andy
> Also I think at this point most people will have migrated to another distro,

so me!

It's now well over a year since Lucid has been released and still not a single word of those issues in the "Release Notes (known issues)" here http://www.debian.org/releases/stable/amd64/release-notes/index.en.html.

As most of those bugs are obviously "by design" (upstart) they should have been fixed before release or QC should have refused to approve Lucid as a LTS.

Revision history for this message

Steve Langasek (vorlon) wrote on 2011-07-18:

#113

On Mon, Jul 18, 2011 at 07:05:51PM -0000, Brian J. Murrell wrote:
> On 11-07-18 02:49 PM, Steve Langasek wrote:

> > Oh. Can you give me some specific bug numbers there?

> Not at the moment I'm afraid. The number of bugs i have in my
> subscribed list is just way to big to go searching right now, but off
> the top of my head, there is the ureadahead bug, where people have
> actually posted solutions and afaik, it's still open.

This appears to be bug #523484.

To the best of my understanding, this describes a feature that is missing
when using a separate /var (the ureadahead job will run but not do anything
useful). This is certainly a bug, but not something that we are likely to
backport to lucid even when a fix becomes available. I think you would be
hard pressed to convince any of the Ubuntu developers that it has a major
impact on the usability of Ubuntu that systems with a separate /var don't
get the boot speed enhancement from ureadahead!

Also, you say that people have posted solutions; I've reviewed the bug log
and there are no solutions to the bug there. Most of the proposals would
work *only* on systems with detached /var, so are not suitable for
inclusion in the distribution; the closest thing to a fix is Clint's
proposal to add a signal handler and an additional upstart job, but that
code hasn't actually been written.

So I'm afraid I regard this bug's current prioritization as "medium" to be
correct, sorry. Patches welcome, but I think it's unlikely that this is
going to be worked on soon otherwise.

--
Steve Langasek Give me a lever long enough and a Free OS
Debian Developer to set it on, and I can move the world.
Ubuntu Developer http://www.debian.org/
<email address hidden> <email address hidden>

Revision history for this message

ingo (ingo-steiner) wrote on 2011-07-18:

#114

Sorry, my link was to the Squeeze release notes, here the one for Lucid:
https://wiki.ubuntu.com/LucidLynx/ReleaseNotes#Other_known_issues

Revision history for this message

Steve Langasek (vorlon) wrote on 2011-07-18:

#115

On Mon, Jul 18, 2011 at 07:42:54PM -0000, Andy Hauser wrote:
> #484209

Which is fixed.

> #275451

Which has nothing to do with lucid; the bug report dates back to 2008, is
marked "incomplete" in Debian, and does not discuss /var as a separate
partition. That's not an actionable bug report; I've closed it now.

> #690401

This is marked as a duplicate of the present bug report, which is fixed!

> Also just having a bind mount to a NFS mount in fstab makes my lucid
> nodes unbootable.

Is that bug #524972 (where the bind mount refers to a path which is a
symlink)?

> All in all I think we fixed/worked around 4 bugs in lucid to get them to
> boot. All related to the introduction of upstart. And not all have fixes
> released.

I'm sorry you found this to be the case. Unfortunately some of the
NFS-related problems were discovered quite late in the Lucid cycle, and some
of these bugs took quite some time to untangle once identified. But we do
take responsibility for all such critical bugs, which is precisely why I'm
checking to see if there are any such bugs that aren't on our radar.

So far, it doesn't appear that there are.

--
Steve Langasek Give me a lever long enough and a Free OS
Debian Developer to set it on, and I can move the world.
Ubuntu Developer http://www.debian.org/
<email address hidden> <email address hidden>

Revision history for this message

Andy Hauser (andy-ubuntu-bugzilla) wrote on 2011-07-18:

#116

Seems like the further problems in #484209 also lead here.

So maybe only this and the bindmount thing remain.

> > Also just having a bind mount to a NFS mount in fstab makes my lucid
> > nodes unbootable.

> Is that bug #524972 (where the bind mount refers to a path which is a symlink)?

Maybe. Only that I don't remember there being a question. Anyways answering
questions on the tty everytime the cluster nodes reboot is not a solution here.
Not even sure what the resolution of that bug is.

As far as I remember I also had to remove ureadahead.

Anyways. I guess it's nice of you to care. Certainly a great improvement over
the responses of Scott James Remnant at the time these bug reports were
filed ...

Revision history for this message

Brian J. Murrell (brian-interlinx) wrote on 2011-07-19:

#117

On 11-07-18 04:27 PM, Steve Langasek wrote:
>
> This appears to be bug #523484.
>
> To the best of my understanding, this describes a feature that is missing

Uhm, not so much "missing" as "broken".

> when using a separate /var (the ureadahead job will run but not do anything
> useful).

It's not even that nice. What it will do is litter the boot with error
messages, which is annoying and distracting at best and a red herring
when there are other upstart/mountall bugs, at worst.

A first time admin of a system with a separate /var sees these errors on
boot, and even if he is lucky enough that boot succeeds, is still
concerned about the errors (if he is any good) and wastes time chasing
them down only to find that they are a bug that exists and has not and
will not be fixed. Not very classy.

> This is certainly a bug, but not something that we are likely to
> backport to lucid even when a fix becomes available.

So instead you let the above situation continue on ad infinitum,
confusing new users?

> I think you would be
> hard pressed to convince any of the Ubuntu developers that it has a major
> impact on the usability of Ubuntu that systems with a separate /var don't
> get the boot speed enhancement from ureadahead!

It's not even the boot speed enhancement that is the issue. It's the
emission of spurious errors that any good admin will have to waste time
chasing down.

> Also, you say that people have posted solutions; I've reviewed the bug log
> and there are no solutions to the bug there. Most of the proposals would
> work *only* on systems with detached /var, so are not suitable for
> inclusion in the distribution;

Surely they are a good start though, with some conditional code needing
added to test for that separate /var case.

> So I'm afraid I regard this bug's current prioritization as "medium" to be
> correct, sorry. Patches welcome, but I think it's unlikely that this is
> going to be worked on soon otherwise.

Exactly my point and the point of mine (and others') frustration. You
guys let a bug get out into the wild by not testing a use-case that is
and/or should be very common in the server use space and now that it's
out there, you are just letting it ride.

On 11-07-18 04:27 PM, Steve Langasek wrote:
> 
> This appears to be bug #523484.
> 
> To the best of my understanding, this describes a feature that is missing

Uhm, not so much "missing" as "broken".

> when using a separate /var (the ureadahead job will run but not do anything
> useful).

It's not even that nice.  What it will do is litter the boot with error
messages, which is annoying and distracting at best and a red herring
when there are other upstart/mountall bugs, at worst.

A first time admin of a system with a separate /var sees these errors on
boot, and even if he is lucky enough that boot succeeds, is still
concerned about the errors (if he is any good) and wastes time chasing
them down only to find that they are a bug that exists and has not and
will not be fixed.  Not very classy.

> This is certainly a bug, but not something that we are likely to
> backport to lucid even when a fix becomes available.

So instead you let the above situation continue on ad infinitum,
confusing new users?

> I think you would be
> hard pressed to convince any of the Ubuntu developers that it has a major
> impact on the usability of Ubuntu that systems with a separate /var don't
> get the boot speed enhancement from ureadahead!

It's not even the boot speed enhancement that is the issue.  It's the
emission of spurious errors that any good admin will have to waste time
chasing down.

> Also, you say that people have posted solutions; I've reviewed the bug log
> and there are no solutions to the bug there.  Most of the proposals would
> work *only* on systems with detached /var, so are not suitable for
> inclusion in the distribution;

Surely they are a good start though, with some conditional code needing
added to test for that separate /var case.

> So I'm afraid I regard this bug's current prioritization as "medium" to be
> correct, sorry.  Patches welcome, but I think it's unlikely that this is
> going to be worked on soon otherwise.

Exactly my point and the point of mine (and others') frustration.  You
guys let a bug get out into the wild by not testing a use-case that is
and/or should be very common in the server use space and now that it's
out there, you are just letting it ride.

Revision history for this message

Oliver Brakmann (obrakmann) wrote on 2011-07-19:

#118

Hi,

this is turning into a forum discussion real fast, and really has no
place here. I suggest we take it someplace else. Ubuntu-devel, maybe?

On 2011-07-19 12:17, Brian J. Murrell wrote:
> On 11-07-18 04:27 PM, Steve Langasek wrote:
>>
>> This appears to be bug #523484.
>> To the best of my understanding, this describes a feature that is missing
> Uhm, not so much "missing" as "broken".
>> when using a separate /var (the ureadahead job will run but not do anything
>> useful).
>
> It's not even that nice. What it will do is litter the boot with error
> messages

I think "litter" is a bit strong. There's a single line that says that
ureadahead terminated with status soandso. That's hardly littering.
And that's about it with regard to the impact of that bug. I have never
seen a system not boot due to it, and I have used (and am using) Lucid
on bare-metal servers, virtual machines, desktops and road warrior
laptops, all with a separate /var.

> which is annoying and distracting at best and a red herring
> when there are other upstart/mountall bugs, at worst.

I'll agree that it is annoying, but any admin that sees the ureadahead
message when a system shows other problems and goes 'oh, ureadahead
croaked, that must be the cause of it all!' seriously needs to drop it
right then and there and go stack shelves somewhere.

>> This is certainly a bug, but not something that we are likely to
>> backport to lucid even when a fix becomes available.
>
> So instead you let the above situation continue on ad infinitum,
> confusing new users?

That's not what Steve said.

>> I think you would be
>> hard pressed to convince any of the Ubuntu developers that it has a major
>> impact on the usability of Ubuntu that systems with a separate /var don't
>> get the boot speed enhancement from ureadahead!

I agree.

> Exactly my point and the point of mine (and others') frustration. You
> guys let a bug get out into the wild by not testing a use-case that is
> and/or should be very common in the server use space and now that it's
> out there, you are just letting it ride.

Again, I agree that the message is annoying, and it would be nice if we
could get rid of it. But I would describe it as 'cosmetic' at best, and
hardly something to get frustrated about, much less fault the developers
for not giving it highest priority.

So is this seriously something to get so worked up about?

Also, you know, Ubuntu makes alphas public for a reason.

Regards,
Oliver

Hi,

this is turning into a forum discussion real fast, and really has no
place here.  I suggest we take it someplace else. Ubuntu-devel, maybe?

On 2011-07-19 12:17, Brian J. Murrell wrote:
> On 11-07-18 04:27 PM, Steve Langasek wrote:
>>
>> This appears to be bug #523484.
>> To the best of my understanding, this describes a feature that is missing
> Uhm, not so much "missing" as "broken".
>> when using a separate /var (the ureadahead job will run but not do anything
>> useful).
> 
> It's not even that nice.  What it will do is litter the boot with error
> messages

I think "litter" is a bit strong.  There's a single line that says that
ureadahead terminated with status soandso.  That's hardly littering.
And that's about it with regard to the impact of that bug.  I have never
seen a system not boot due to it, and I have used (and am using) Lucid
on bare-metal servers, virtual machines, desktops and road warrior
laptops, all with a separate /var.

> which is annoying and distracting at best and a red herring
> when there are other upstart/mountall bugs, at worst.

I'll agree that it is annoying, but any admin that sees the ureadahead
message when a system shows other problems and goes 'oh, ureadahead
croaked, that must be the cause of it all!' seriously needs to drop it
right then and there and go stack shelves somewhere.

>> This is certainly a bug, but not something that we are likely to
>> backport to lucid even when a fix becomes available.
> 
> So instead you let the above situation continue on ad infinitum,
> confusing new users?

That's not what Steve said.

>> I think you would be
>> hard pressed to convince any of the Ubuntu developers that it has a major
>> impact on the usability of Ubuntu that systems with a separate /var don't
>> get the boot speed enhancement from ureadahead!

I agree.

> Exactly my point and the point of mine (and others') frustration.  You
> guys let a bug get out into the wild by not testing a use-case that is
> and/or should be very common in the server use space and now that it's
> out there, you are just letting it ride.

Again, I agree that the message is annoying, and it would be nice if we
could get rid of it.  But I would describe it as 'cosmetic' at best, and
hardly something to get frustrated about, much less fault the developers
for not giving it highest priority.

So is this seriously something to get so worked up about?

Also, you know, Ubuntu makes alphas public for a reason.

Regards,
Oliver

Revision history for this message

Steve Langasek (vorlon) wrote on 2011-07-28:

#119

On Mon, Jul 18, 2011 at 10:56:50PM -0000, Andy Hauser wrote:
> > Is that bug #524972 (where the bind mount refers to a path which is a
> > symlink)?

> Maybe. Only that I don't remember there being a question.

That could point to a plymouth bug affecting your system.

> Anyways answering questions on the tty everytime the cluster nodes reboot
> is not a solution here.

Certainly not! But if you're affected by bug #524972, there's a
straightforward workaround - specify the real path in /etc/fstab instead of
a symlink.

> As far as I remember I also had to remove ureadahead.

So far, the only confirmed issues with ureadahead in the final release are
cosmetic ones; if you can reproduce any problems with ureadahead causing
boot failures, we'd be interested to know about them.

Revision history for this message

hyper_ch (bugs-launchpad-net-roleplayer) wrote on 2011-08-12:

#120

Download full text (6.6 KiB)

I just had quite a few updates in the pipe today and it seems I run into statd problems also. Not quite sure if it's the same bug.

cat syslog | grep statd gives me this:

Aug 12 18:46:04 doubi statd-pre-start: local-filesystems started
Aug 12 18:46:05 doubi rpc.statd[1008]: Version 1.2.2 starting
Aug 12 18:46:05 doubi rpc.statd[1008]: Flags:
Aug 12 18:46:05 doubi rpc.statd[1008]: unable to register (statd, 1, udp).
Aug 12 18:46:05 doubi init: statd main process (1008) terminated with status 1
Aug 12 18:46:05 doubi init: statd main process ended, respawning
Aug 12 18:46:05 doubi statd-pre-start: local-filesystems started
Aug 12 18:46:05 doubi rpc.statd[1032]: Version 1.2.2 starting
Aug 12 18:46:05 doubi rpc.statd[1032]: Flags:
Aug 12 18:46:05 doubi rpc.statd[1032]: unable to register (statd, 1, udp).
Aug 12 18:46:05 doubi init: statd main process (1032) terminated with status 1
Aug 12 18:46:05 doubi init: statd main process ended, respawning
Aug 12 18:46:05 doubi statd-pre-start: local-filesystems started
Aug 12 18:46:05 doubi rpc.statd[1041]: Version 1.2.2 starting
Aug 12 18:46:05 doubi rpc.statd[1041]: Flags:
Aug 12 18:46:05 doubi rpc.statd[1041]: unable to register (statd, 1, udp).
Aug 12 18:46:05 doubi init: statd main process (1041) terminated with status 1
Aug 12 18:46:05 doubi init: statd main process ended, respawning
Aug 12 18:46:05 doubi statd-pre-start: local-filesystems started
Aug 12 18:46:05 doubi rpc.statd[1049]: Version 1.2.2 starting
Aug 12 18:46:05 doubi rpc.statd[1049]: Flags:
Aug 12 18:46:05 doubi rpc.statd[1049]: unable to register (statd, 1, udp).
Aug 12 18:46:05 doubi init: statd main process (1049) terminated with status 1
Aug 12 18:46:05 doubi init: statd main process ended, respawning
Aug 12 18:46:05 doubi statd-pre-start: local-filesystems started
Aug 12 18:46:05 doubi rpc.statd[1066]: Version 1.2.2 starting
Aug 12 18:46:05 doubi rpc.statd[1066]: Flags:
Aug 12 18:46:05 doubi rpc.statd[1066]: unable to register (statd, 1, udp).
Aug 12 18:46:05 doubi init: statd main process (1066) terminated with status 1
Aug 12 18:46:05 doubi init: statd main process ended, respawning
Aug 12 18:46:05 doubi statd-pre-start: local-filesystems started
Aug 12 18:46:05 doubi rpc.statd[1074]: Version 1.2.2 starting
Aug 12 18:46:05 doubi rpc.statd[1074]: Flags:
Aug 12 18:46:05 doubi rpc.statd[1074]: unable to register (statd, 1, udp).
Aug 12 18:46:05 doubi init: statd main process (1074) terminated with status 1
Aug 12 18:46:05 doubi init: statd main process ended, respawning
Aug 12 18:46:05 doubi statd-pre-start: local-filesystems started
Aug 12 18:46:05 doubi rpc.statd[1082]: Version 1.2.2 starting
Aug 12 18:46:05 doubi rpc.statd[1082]: Flags:
Aug 12 18:46:05 doubi rpc.statd[1082]: unable to register (statd, 1, udp).
Aug 12 18:46:05 doubi init: statd main process (1082) terminated with status 1
Aug 12 18:46:05 doubi init: statd main process ended, respawning
Aug 12 18:46:05 doubi statd-pre-start: local-filesystems started
Aug 12 18:46:05 doubi rpc.statd[1101]: Version 1.2.2 starting
Aug 12 18:46:05 doubi rpc.statd[1101]: Flags:

[... continues for quite some time... ]

Aug 12 18:46:05 doubi rpc.statd[1109]: unable to register (statd, 1, ...

Ubuntu
nfs-utils package

mountall for /var or other nfs mount races with rpc.statd

Bug Description

Related branches

Duplicates of this bug

Other bug subscribers

Patches

Bug attachments

Remote bug watches

	Status	Importance	Assigned to
mountall (Ubuntu)	Invalid	Undecided	Unassigned
Lucid	Invalid	Undecided	Unassigned
Maverick	Invalid	Undecided	Unassigned
Natty	Invalid	Undecided	Unassigned
nfs-utils (Ubuntu)	Fix Released	High	Unassigned
Lucid	Fix Released	High	Unassigned
Maverick	Fix Released	High	Unassigned
Natty	Fix Released	High	Unassigned
portmap (Ubuntu)	Fix Released	Undecided	Unassigned
Lucid	Fix Released	High	Unassigned
Maverick	Fix Released	High	Unassigned
Natty	Fix Released	Undecided	Unassigned

Ubuntunfs-utils package

mountall for /var or other nfs mount races with rpc.statd

Bug Description

Related branches

Duplicates of this bug

Other bug subscribers

Patches

Bug attachments

Remote bug watches

Ubuntu
nfs-utils package