mountall does not honour _netdev

Bug #1313513 reported by Stuart Longland
This bug affects 6 people
Affects            Status     Importance  Assigned to  Milestone
mountall (Ubuntu)  Confirmed  Undecided   Unassigned

Bug Description

Hi,

This is a fresh install of Ubuntu 14.04 LTS AMD64. I tried configuring a Ceph Rados Block Device (rbd) to be mounted during boot on /var/lib/one, containing my OpenNebula configuration and database.

The idea being that should the machine go belly up, I'll have an up-to-date snapshot of the OpenNebula data on Ceph to mount on the new frontend machine.

/etc/ceph/rbdmap is configured, and I set up /etc/fstab with an entry:

/dev/rbd/pool/rbdname /var/lib/one xfs defaults,_netdev 0 1

then rebooted. According to mount(8), _netdev is supposed to tell mountall to skip mounting this device until the network is up.

As seen from the attached snapshot, it doesn't bother to wait, and blindly tries to mount the RBD before connecting to Ceph: this will never work.

mountall seems to rely on *knowing* a list of network file systems: this means when someone comes up with a new network file system, or uses a conventional disk file system with a remote block device, mountall's heuristic falls flat on its face as has been demonstrated here. The problem would also exist for iSCSI, AoE, FibreChannel, nbd and drbd devices.
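
To illustrate the difference between the two approaches, here is a minimal Python sketch. It is not mountall's actual code: the `KNOWN_NETWORK_FS` set and the helper names are hypothetical, chosen only for illustration. It shows that a heuristic based purely on filesystem type classifies the fstab entry above as local, while a check that also looks for `_netdev` does not.

```python
# Hypothetical illustration only; this is not how mountall is implemented.
KNOWN_NETWORK_FS = {"nfs", "nfs4", "cifs", "smbfs", "ceph"}  # assumed list


def parse_fstab_line(line):
    """Split one fstab line and return its first four fields as a dict."""
    fields = line.split()
    if len(fields) < 4:
        raise ValueError("need at least device, mountpoint, type and options")
    device, mountpoint, fstype, options = fields[:4]
    return {
        "device": device,
        "mountpoint": mountpoint,
        "fstype": fstype,
        "options": options.split(","),
    }


def is_network_by_type(entry):
    """Heuristic: a network mount only if the filesystem type is on a list."""
    return entry["fstype"] in KNOWN_NETWORK_FS


def is_network_by_option(entry):
    """User-driven: a network mount if the type is known OR _netdev is set."""
    return is_network_by_type(entry) or "_netdev" in entry["options"]


entry = parse_fstab_line(
    "/dev/rbd/pool/rbdname /var/lib/one xfs defaults,_netdev 0 1"
)
print(is_network_by_type(entry))    # False: xfs looks like a local filesystem
print(is_network_by_option(entry))  # True: the user asked for _netdev
```

The same reasoning would apply to any local filesystem type sitting on a remote block device (iSCSI, AoE, nbd and so on), since the filesystem type alone says nothing about where the block device lives.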

Due to bug 1313497, the keyboard is non-functional. Recovery is useless as the keyboard is broken there too, and now the machine is waiting for a keypress it will never see due to that bug. A headless system would similarly have this problem.

Two suggestions I would have:
1. mountall should honour _netdev to decide whether to mount a device or not: this gives the user the means to manually tell mountall that the device needs network access to operate even if the filesystem looks to be local. I'd wager that if the user specified _netdev, they probably meant it and likely know better than mountall.
2. mountall should time out after a predefined period and NEVER wait indefinitely: even if the disk is local. If a disk goes missing, then it is better the machine tries to boot in its degraded state so it can be remotely managed and raise an alarm, than to wait for someone to notice the machine being down.
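
The second suggestion amounts to a bounded wait. A rough Python sketch of the idea follows; the function name, the default timeout and the polling interval are all assumptions made for illustration, and this is not a patch against mountall.

```python
import os
import time


def wait_for_device(path, timeout=30.0, poll_interval=0.5):
    """Return True once `path` exists, or False if `timeout` seconds pass.

    The wait is bounded: the caller always gets an answer and can decide
    to continue booting in a degraded state instead of hanging forever.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

A caller could try `wait_for_device("/dev/rbd/pool/rbdname")` and, on a `False` result, log the failure and carry on with the boot so that the machine stays reachable over the network.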

Unfortunately since the machine is now effectively bricked, I can only grep proxy server logs to see what packages got installed. mountall_2.53_amd64.deb seems to be the culprit.

Steve Langasek (vorlon) wrote :

While the mount(8) manpage says that _netdev causes the mount to be deferred until the network is up, this manpage was written in a bygone era when "network up" was a discrete event, which it hasn't been for a long time. The current behavior is that _netdev devices will be tried immediately on boot, and tried again each time a network interface comes up. If this doesn't give the desired results, I think this is a bug in the ceph driver - not in mountall, which has been tested with _netdev (and network filesystems) repeatedly and shown to work correctly.

> As seen from the attached snapshot, it doesn't bother to wait,
> and blindly tries to mount the RBD before connecting to Ceph:
> this will never work.

If there is a specific connection that needs to be made before running the mount command, then I don't think that's something mountall can be expected to handle. Something else on the system would need to intercept the request for a ceph mount, and block it until ceph is available.

Stuart Longland (redhatter) wrote : Re: [Bug 1313513] Re: mountall does not honour _netdev

On 29/04/14 10:32, Steve Langasek wrote:
> While the mount(8) manpage says that _netdev causes the mount to be
> deferred until the network is up, this manpage was written in a bygone
> era when "network up" was a discrete event, which it hasn't been for a
> long time.

Ahh, so out-of-date documentation strikes again. Ahh well, we should
perhaps amend that documentation, or re-instate an equivalent feature,
as I believe there are valid use cases for the old _netdev behaviour.

> The current behavior is that _netdev devices will be tried
> immediately on boot, and tried again each time a network interface comes
> up. If this doesn't give the desired results, I think this is a bug in
> the ceph driver - not in mountall, which has been tested with _netdev
> (and network filesystems) repeatedly and shown to work correctly.

The trouble is it hangs waiting for a /dev/rbd device to appear, which
won't happen until the 'rbdmap' service is started.

Once 'rbdmap' has done its duty, mount works as expected (and thus,
mountall should also work).

>> As seen from the attached snapshot, it doesn't bother to wait,
>> and blindly tries to mount the RBD before connecting to Ceph:
>> this will never work.
>
> If there is a specific connection that needs to be made before running
> the mount command, then I don't think that's something mountall can be
> expected to handle. Something else on the system would need to
> intercept the request for a ceph mount, and block it until ceph is
> available.

How about not blocking the entire system boot so the machine remains
unresponsive and impossible to connect to remotely?

Some of the machines we look after are stuck in military bases or
underground in mines: it's not like we can just stroll up to the console
and press a button.

Had mountall not stalled the entire boot sequence, but allowed the
boot to proceed without /var/lib/one whilst continuing to retry, it
might have found that the device it needed appeared in time.

I can understand the "let's wait it out and see if it appears", but not
the "let's halt everything until the device magically appears". The
latter is dangerous for any system for which local console access is
difficult or unavailable. (As is my case here, with the buggered keyboard.)

Regards,
--
Stuart Longland
Systems Engineer
VRT Systems
38b Douglas Street, Milton QLD 4064
T: +61 7 3535 9619
F: +61 7 3535 9699
http://www.vrt.com.au

Stuart Longland (redhatter) wrote :

As a point of interest, this is an `rbdmount` script I use with upstart as a work-around to the mountall issue.

It assumes that the devices listed in /etc/ceph/rbdmap are intended to be mounted locally, and so calls mount on each listed device that appears.

Very much a hack: it'd be more elegant for mountall to not block the boot process.
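
The attached script is not reproduced in this report. Purely as an illustration of the approach described, here is a Python sketch of the same idea. It assumes that each non-comment line of /etc/ceph/rbdmap starts with `pool/image`, that the mapped device appears as /dev/rbd/pool/image, and that /etc/fstab has an entry for each device so that `mount <device>` knows the mount point. The function names are hypothetical.

```python
import os
import subprocess


def rbdmap_devices(rbdmap_text):
    """Yield /dev/rbd/<pool>/<image> for each non-comment rbdmap line."""
    for line in rbdmap_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        spec = line.split()[0]  # first field, e.g. "pool/rbdname"
        yield os.path.join("/dev/rbd", spec)


def mount_available(devices, exists=os.path.exists, run=subprocess.call):
    """Call `mount <device>` for each device node that is present.

    Relies on a matching /etc/fstab entry to supply the mount point.
    Returns the list of devices for which a mount was attempted.
    """
    attempted = []
    for dev in devices:
        if exists(dev):
            run(["mount", dev])
            attempted.append(dev)
    return attempted
```

Run after the rbdmap service has mapped the devices, this would be called as `mount_available(rbdmap_devices(open("/etc/ceph/rbdmap").read()))`. The `exists` and `run` parameters are there only so the logic can be exercised without touching real devices.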

Steve Langasek (vorlon) wrote :

mountall blocks only those parts of the boot process that are marked as depending on the relevant filesystem. And by default, mountall assumes that the 'filesystem' event should not be emitted until all filesystems under /usr and /var are mounted. If this is not the correct policy for your use case, you can override the behavior with the 'nobootwait' option in /etc/fstab.
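
For illustration, this is what adding 'nobootwait' to the fstab entry from the bug description would look like. The small Python helper below is a hypothetical sketch written for this example, not a tool shipped with mountall; it only appends an option to the fourth field of an fstab line.

```python
def add_mount_option(fstab_line, option):
    """Return `fstab_line` with `option` appended to its options field."""
    fields = fstab_line.split()
    if len(fields) < 4:
        raise ValueError("not a complete fstab entry")
    options = fields[3].split(",")
    if option not in options:  # avoid adding the same option twice
        options.append(option)
    fields[3] = ",".join(options)
    return " ".join(fields)


line = "/dev/rbd/pool/rbdname /var/lib/one xfs defaults,_netdev 0 1"
print(add_mount_option(line, "nobootwait"))
# /dev/rbd/pool/rbdname /var/lib/one xfs defaults,_netdev,nobootwait 0 1
```

With that option set, the boot would no longer wait on /var/lib/one, so anything that needs the filesystem has to cope with it being absent at first.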

However, in this case that would seem to just be a workaround for the ceph client itself blocking on the 'filesystem' event when it shouldn't. Are you using the Ubuntu ceph package on the client? If so, this bug should be reassigned there.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in mountall (Ubuntu):
status: New → Confirmed

Juhan Leemet (juhan) wrote :

IMO it is not enough to say "obsolete documentation, someone should remove it". Some of us have been using *nix for years (or decades), and some of these facilities were developed and stabilized years ago. We rely on things that used to work to keep working. At the very least, one should revise old documentation to clarify that some older method has been "deprecated" (as in Java documentation), and the new method should be referenced, together with conversion methods and/or tools. We want to build on previous work. We don't want all of our sand castles to fall down.

In my case, I also find that _netdev does not work on a recent installation of Ubuntu 14.04.2 LTS (upgraded to .3) on a Lenovo D20 (dual Xeon), but mysteriously it DOES work on a one-year-old installation (upgraded to 14.04.3 LTS) on an HP 500-189 (AMD A10). That suggests to me that there must be some interaction(s) between packages? My old installation has everything (including the kitchen sink), but my new installation is rather spare (though not exactly minimal). What is annoying is that I want/need to NFS mount my /home directories on my new install as well. I will try autofs (which waits longer, deferring the mount through the remainder of boot until I try to log in) instead of a hard NFS mount during the boot sequence. Still, it is annoying when things that used to work suddenly stop working.

Jonathan Kamens (jik) wrote :

I'm really not sure what to make of all this. My fstab has several cifs filesystems in it, and none of them mount at boot regardless of whether _netdev is specified. I don't know whether that's because of this bug -- it's not even clear to me _exactly_ what this bug is about -- or some other bug that may or may not be in the database already. I don't know what I can do to help move forward the process of getting this issue -- which apparently affects other people besides me -- fixed. I'm surprised that this bug is almost two years old and nobody seems to care about the fact that network filesystems in /etc/fstab don't work on Ubuntu.

P.S. for me, it's 15.10 that's not working.

Jonathan Kamens (jik) wrote :

By the way, in my case it appears to be because mount can't resolve the host name of the CIFS file server. I think this is because systemd is trying to mount the filesystems before the nameserver is finished launching, so perhaps if bind9.service is enabled on the host, systemd needs to wait for it to start before mounting network filesystems?

Jonathan Kamens (jik) wrote :

Actually, I was wrong, it's not because of DNS problems, it's because systemd is trying to mount the filesystems before the network is up. I changed the host names in /etc/fstab to IP addresses, and it still doesn't work:

Jan 20 08:17:31 jik5 mount[979]: mount error(101): Network is unreachable

So there is something here that is definitely not working properly.

Jonathan Kamens (jik) wrote :

Fixed for me by putting this in /etc/systemd/system/remote-fs-pre.target.d/override.conf:

[Unit]
Requires=NetworkManager-wait-online.service
After=NetworkManager-wait-online.service

Steve Langasek (vorlon) wrote :

On Wed, Jan 20, 2016 at 12:52:34PM -0000, Jonathan Kamens wrote:
> P.S. for me, it's 15.10 that's not working.

You're on the wrong bug. Mountall is an upstart-specific package, which is
not used for mounting in systemd systems (15.04 and later).

Jonathan Kamens (jik) wrote :

OK. Couldn't find another bug about this with systemd, so created #1536294.

Steve Baker (sbaker-gre) wrote :

I seem to be running into this bug on a new build of 14.04.4 when using an LVM volume on a multipathed iSCSI drive, which I am trying to mount at /var/lib/mysql.

I'm no upstart guru, so if there is anything I can post to help verify whether it is this bug that is causing the issue, please let me know.

If I just add the 'nobootwait' option to fstab, will MySQL still wait until the iSCSI drive is mounted correctly?

Thanks,
Steve

Elias Abacioglu (raboo) wrote :

I have a similar problem when trying to mount a CephFS volume on Ubuntu 14.04.
It just freezes and I can't do anything about it.
I'm unable to figure out why it fails. I am running Open vSwitch, and what I see in the boot log is something like this:

* Starting configure network device
* Starting configure network device security
* Starting configure network device security
* Starting Mount network filesystems
* Starting configure network device
* Starting configure network device security
* Stopping Mount network filesystems
* Starting Mount network filesystems
* Starting configure network device
* Starting Open vSwitch switch
* Stopping Mount network filesystems
_

(it's stuck here forever)

And my fstab looks like this:
10.3.60.25,10.3.60.23,10.3.60.21:/opennebula /var/lib/one ceph _netdev,mount_timeout=10,name=opennebula_cephfs,secretfile=/etc/ceph/ceph.client.opennebula_cephfs.secret 0 0

It is worth mentioning, however, that one of the IPs in the list is this machine itself: in this case the second in the list, 10.3.60.23.

vijay (vijayforos) wrote :

On Ubuntu 14.04.5 I am running into the same issue. I have an iSCSI device in fstab with the _netdev option, but it doesn't seem to work. The system just hangs at boot, and I have to go into recovery mode and comment out the filesystem to get the system to boot.

I'm interested in what others have done as a workaround. I'd appreciate any input.

Thanks,
Vj
