mountall causes automatic mounting of gluster shares to fail

Bug #1103047 reported by Joris S'heeren
This bug affects 9 people
Affects            Status   Importance  Assigned to  Milestone
mountall (Ubuntu)  Expired  Low         Unassigned

Bug Description

Ubuntu Server 12.04.1 LTS (GNU/Linux 3.2.0-29-virtual x86_64)
glusterfs 3.3.1 (from semiosis PPA)
mountall 2.36.3

We are running an OpenStack cloud (essex). We use Gluster as a network filesystem for storage of data.
When we updated mountall on our client VMs we experienced a failure of the automatic mounting of the gluster-shares at boot-time. The gluster-shares can be mounted manually when the VMs are booted and online.

When we downgraded the mountall package from new version 2.36.3 to old version 2.36, the gluster-shares mounted again during boot.

boot.log (added --verbose to /etc/init/mountall.conf)
=======
Connected to Plymouth
/ is local
/proc is virtual
/sys is virtual
/sys/fs/fuse/connections is virtual
/sys/kernel/debug is virtual
/sys/kernel/security is virtual
/dev is virtual
/dev/pts is virtual
/tmp is local
/run is virtual
/run/lock is virtual
/run/shm is virtual
/home is nowait
/scratch is nowait
/opt is nowait
/mnt is nowait
mounting event sent for /sys/fs/fuse/connections
mounting event sent for /sys/kernel/debug
mounting event sent for /sys/kernel/security
mounting event sent for /run/lock
mounting event sent for /run/shm
mounting event sent for /
mounted event handled for /sys
swap finished
local 0/2 remote 0/0 virtual 1/10 swap 0/0
mounted event handled for /dev/pts
local 0/2 remote 0/0 virtual 2/10 swap 0/0
mounting event handled for /sys/fs/fuse/connections
mounting /sys/fs/fuse/connections
mount /sys/fs/fuse/connections [282] exited normally
mounting event handled for /sys/kernel/debug
mounting /sys/kernel/debug
mount /sys/kernel/debug [296] exited normally
mounting event handled for /sys/kernel/security
mounting /sys/kernel/security
mount /sys/kernel/security [299] exited normally
mounting event handled for /run/lock
mounting /run/lock
mount /run/lock [301] exited normally
mounting event handled for /run/shm
mounting /run/shm
mount /run/shm [304] exited normally
mounted event handled for /proc
local 0/2 remote 0/0 virtual 3/10 swap 0/0
mounting event handled for /
remounting /
mount / [310] exited normally
mounted event handled for /dev
mount / [321] exited normally
mount /proc [322] exited normally
mount /sys [323] exited normally
mount /sys/fs/fuse/connections [324] exited normally
mount /sys/kernel/debug [325] exited normally
mount /sys/kernel/security [326] exited normally
mount /dev [327] exited normally
mount /dev/pts [328] exited normally
mount /run [329] exited normally
mount /run/lock [330] exited normally
mount /run/shm [331] exited normally
local 0/2 remote 0/0 virtual 4/10 swap 0/0
mounted event handled for /sys/fs/fuse/connections
local 0/2 remote 0/0 virtual 5/10 swap 0/0
mounted event handled for /sys/kernel/security
local 0/2 remote 0/0 virtual 6/10 swap 0/0
mounting event sent for /tmp
mounting event sent for /home
mounting event sent for /scratch
mounting event sent for /opt
mounting event sent for /mnt
mounted event handled for /sys/kernel/debug
local 0/2 remote 0/0 virtual 7/10 swap 0/0
mounted event handled for /run/lock
local 0/2 remote 0/0 virtual 8/10 swap 0/0
mounted event handled for /run/shm
local 0/2 remote 0/0 virtual 9/10 swap 0/0
mounting event handled for /tmp
mounting event handled for /home
mounting /home
Mount failed. Please check the log file for more details.
mountall: mount /home [333] terminated with status 1
mountall: Filesystem could not be mounted: /home
mounting event handled for /scratch
mounting /scratch
Mount failed. Please check the log file for more details.
mountall: mount /scratch [389] terminated with status 1
mountall: Filesystem could not be mounted: /scratch
mounting event handled for /opt
mounting /opt
Mount failed. Please check the log file for more details.
mountall: mount /opt [424] terminated with status 1
mountall: Filesystem could not be mounted: /opt
mounting event handled for /mnt
mounting /mnt
mount /mnt [459] exited normally
mounting event sent for /home
mounting event sent for /scratch
mounting event sent for /opt
mounted event handled for /tmp
local 1/2 remote 0/0 virtual 9/10 swap 0/0
mounting event handled for /home
mounting event handled for /scratch
mounting event handled for /opt
mounted event handled for /mnt
local 1/2 remote 0/0 virtual 9/10 swap 0/0
mounting event sent for /home
mounting event sent for /scratch
mounting event sent for /opt
mounting event handled for /home
mounting event handled for /scratch
mounting event handled for /opt
cloud-init start-local running: Tue, 22 Jan 2013 15:55:55 +0000. up 3.83 seconds
mounted event handled for /run
virtual finished
remote finished
local 1/2 remote 0/0 virtual 10/10 swap 0/0
mounting event sent for /home
mounting event sent for /scratch
mounting event sent for /opt
mounting event handled for /home
mounting event handled for /scratch
mounting event handled for /opt
no instance data found in start-local
mounting event sent for /mnt
mounting event handled for /mnt
Received SIGUSR1 (network device up)
mounting event sent for /home
mounting event sent for /scratch
mounting event sent for /opt
mounting event handled for /home
mounting event handled for /scratch
mounting event handled for /opt
Received SIGUSR1 (network device up)
mounting event sent for /home
mounting event sent for /scratch
mounting event sent for /opt
mounting event handled for /home
mounting event handled for /scratch
mounting event handled for /opt
ci-info: lo : 1 127.0.0.1 255.0.0.0 .
ci-info: eth0 : 1 192.168.0.12 255.255.255.0 fa:16:3e:0a:0b:4c
ci-info: route-0: 0.0.0.0 192.168.0.11 0.0.0.0 eth0 UG
ci-info: route-1: 192.168.0.0 0.0.0.0 255.255.255.0 eth0 U
cloud-init start running: Tue, 22 Jan 2013 15:55:57 +0000. up 5.78 seconds
found data source: DataSourceEc2
mounted event handled for /
local finished
All filesystems mounted
local 2/2 remote 0/0 virtual 10/10 swap 0/0
mounting event sent for /home
mounting event sent for /scratch
mounting event sent for /opt
mounting event handled for /home
mounting event handled for /scratch
mounting event handled for /opt
Skipping profile in /etc/apparmor.d/disable: usr.sbin.rsyslogd
Skipping profile in /etc/apparmor.d/disable: usr.bin.firefox
 * Starting AppArmor profiles [ OK ]
Starting memcached: * Stopping System V initialisation compatibility [ OK ]
 * Starting System V runlevel compatibility [ OK ]
 * Starting automatic crash report generation [ OK ]
 * Starting deferred execution scheduler [ OK ]
 * Starting regular background program processing daemon [ OK ]
 * Starting ACPI daemon [ OK ]
 * Starting save kernel messages [ OK ]
 * Starting CPU interrupts balancing daemon [ OK ]
 * Stopping save kernel messages [ OK ]
 * Starting crash report submission daemon [ OK ]
 * Stopping Handle applying cloud-config [ OK ]
memcached.
 * Starting Name Service Cache Daemon nscd [ OK ]
 * Starting Postfix Mail Transport Agent postfix [ OK ]
 * Starting HTTP accelerator varnishd [ OK ]
 * Starting NTP server ntpd [ OK ]
landscape-client is not configured, please run landscape-config.
Warning: DocumentRoot [/home/www] does not exist
 * Starting web server apache2 [ OK ]
 * Stopping System V runlevel compatibility [ OK ]
 * Starting execute cloud user/final scripts [ OK ]

/etc/fstab
========
LABEL=cloudimg-rootfs / ext4 defaults 0 0
10.0.0.100:/home /home glusterfs defaults,nobootwait,comment=cloudconfig 0 0
10.0.0.100:/scratch /scratch glusterfs defaults,nobootwait,comment=cloudconfig 0 0
10.0.0.100:/ubuntu_11_10_opt /opt glusterfs defaults,nobootwait,comment=cloudconfig 0 0
/dev/vdb /mnt auto defaults,nobootwait,comment=cloudconfig 0 2

gluster log of home volume
======================
[2013-01-22 13:17:58.788134] I [glusterfsd.c:1666:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.3.1
[2013-01-22 13:17:58.862333] E [common-utils.c:125:gf_resolve_ip6] 0-resolver: getaddrinfo failed (Name or service not known)
[2013-01-22 13:17:58.862395] E [name.c:243:af_inet_client_get_remote_sockaddr] 0-glusterfs: DNS resolution failed on host 10.0.0.100
[2013-01-22 13:17:58.862514] E [glusterfsd-mgmt.c:1787:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: Success
[2013-01-22 13:17:58.862546] I [glusterfsd-mgmt.c:1790:mgmt_rpc_notify] 0-glusterfsd-mgmt: -1 connect attempts left
[2013-01-22 13:17:58.863087] W [glusterfsd.c:831:cleanup_and_exit] (-->/usr/sbin/glusterfs(glusterfs_mgmt_init+0x1ff) [0x7f5d23e811af] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_start+0x12) [0x7f5d237cd922] (-->/usr/sbin/glusterfs(+0xd486) [0x7f5d23e81486]))) 0-: received signum (1), shutting down
[2013-01-22 13:17:58.863134] I [fuse-bridge.c:4648:fini] 0-fuse: Unmounting '/home'.

Please let me know if you need any more info.

description: updated
Revision history for this message
Steve Langasek (vorlon) wrote :

mountall doesn't know that glusterfs is a network filesystem type. This means that at boot time, there's a race between bringing the network up and trying to mount /home, and when mountall wins the race, the mount fails and is never re-tried. The newer version of mountall is faster by mounting in parallel, which increases the chance of hitting this race and exposing this problem.

To work around this bug, you should be able to add the '_netdev' option to the fstab entries to signal to mountall that the mount should be retried when network interfaces are brought up. Can you please test that this fixes the issue for you?
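
A minimal sketch of the suggested change, assuming the fstab layout shown in the report. The helper name `add_netdev` is hypothetical; it only filters text from stdin to stdout and does not touch /etc/fstab itself.

```shell
# Hypothetical helper: append _netdev to the options field (4th column) of
# every glusterfs entry that does not already carry it. Reads fstab text on
# stdin and writes the result to stdout. Modified lines are re-joined with
# single spaces; all other lines pass through unchanged.
add_netdev() {
    awk '
        # Pass comments and blank lines through untouched.
        /^[ \t]*#/ || /^[ \t]*$/ { print; next }
        $3 == "glusterfs" && ("," $4 ",") !~ /,_netdev,/ { $4 = $4 ",_netdev" }
        { print }
    '
}

# Example with one of the entries from the report:
echo '10.0.0.100:/home /home glusterfs defaults,nobootwait,comment=cloudconfig 0 0' | add_netdev
```

Running it over a copy of the file (for example `add_netdev < /etc/fstab > /tmp/fstab.new`) and reviewing the result before installing it keeps a bad edit from making the next boot worse.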

Changed in mountall (Ubuntu):
importance: Undecided → Low
status: New → Incomplete
Revision history for this message
Bram De Wilde (gbramdewilde) wrote :

Mounting of gluster volumes with or without the _netdev option in the fstab entry gives a:

unknown option _netdev (ignored)

It's my understanding that _netdev only works for nfs volumes, no?

Revision history for this message
Steve Langasek (vorlon) wrote : Re: [Bug 1103047] Re: mountall causes automatic mounting of gluster shares to fail

On Thu, Jan 24, 2013 at 10:19:03AM -0000, Bram De Wilde wrote:
> Mounting of gluster volumes with or without the _netdev option in the
> fstab entry gives a:

> unknown option _netdev (ignored)

At what point do you get this message?

> It's my understanding that _netdev only works for nfs volumes, no?

No, _netdev is a standard fstab option as described in the mount(8) manpage
for declaring that the filesystem requires network access when this is not
evident from the filesystem type. It's possible that mountall passes this
through to the mount command and the glusterfs mount helper doesn't
understand it, causing this warning message. But by itself this message
shouldn't prevent the filesystem from mounting. Did the filesystems mount
ok on boot after this change?

Revision history for this message
Christian Parpart (trapni) wrote :

I am also running into this problem, and _netdev didn't fix it. As the other commenter said, right after booting, running `mount -a` mounts /home just fine.

Revision history for this message
Steve Langasek (vorlon) wrote :

> I am also running into this problem, and _netdev didn't fix it

Did you also get the "unknown option _netdev (ignored)" message? If so, when and where does it show up?

_netdev is a standard option which should do exactly what's expected here. So if it's not working, I would need to see some logs from mountall --verbose to understand why.

Revision history for this message
Clayton Kramer (clayton-kramer) wrote :

Just had the same thing happen on a number of our 12.04.2 KVM guests. The _netdev parameter is being ignored.

Revision history for this message
Clayton Kramer (clayton-kramer) wrote :

I had to comment out the gluster fstab entry to get these 12.04.2 machines to finish booting. Mountall kept throwing a "mountall event failed" error before the system console was up. Adding and removing _netdev in /etc/fstab had no effect.

My workaround was to boot the KVM guests from the server ISO in recovery mode, edit the fstab, and add the gluster mount to the rc.local file:

sleep 3 && mount.glusterfs [hostname]:/vol1 /mnt/glusterfs/vol1
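
A fixed `sleep 3` only helps if the network is up within three seconds. A hedged alternative is a small retry loop; the `retry` function below is a hypothetical helper, not part of glusterfs or mountall, and the attempt count and delay are arbitrary.

```shell
# retry MAX DELAY CMD...: run CMD until it succeeds, at most MAX times,
# sleeping DELAY seconds between attempts. Returns 0 on the first success,
# 1 if every attempt failed.
retry() {
    max=$1; delay=$2; shift 2
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        "$@" && return 0
        attempt=$((attempt + 1))
        # Sleep only if another attempt is still to come.
        [ "$attempt" -le "$max" ] && sleep "$delay"
    done
    return 1
}

# In rc.local this might replace the fixed sleep (hostname left elided as in
# the comment above):
# retry 10 3 mount.glusterfs [hostname]:/vol1 /mnt/glusterfs/vol1
```

Because `retry` returns non-zero when all attempts fail, an rc.local that runs under `sh -e` would abort at that point unless the call is followed by `|| true`.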

Revision history for this message
Steve Langasek (vorlon) wrote :

Clayton, can you provide the output of 'mountall --verbose' requested in comment #5, when using the _netdev option? To get this output, add --verbose to the commandline in /etc/init/mountall.conf (as in the original bug report).

BTW, glusterfs doesn't require any client daemons to be running for mounts to work, does it (like NFS does)? From the Ubuntu packages, it doesn't look like this is the case.

Revision history for this message
Bruno Léon (bruno-leon) wrote :

Hello,

here is a log: http://pastebin.com/tBJLDVwj

I ran mountall myself from the command line (i.e. this is not a log of the fstab mount at boot).

It might indeed be that glusterfs is passed the _netdev option and does not support it.

Thanks

Revision history for this message
Bruno Léon (bruno-leon) wrote :

From fstab we have this kind of log:

http://pastebin.com/DBFPmRTQ

Revision history for this message
Steve Langasek (vorlon) wrote :

Bruno, your first pastebin shows:

local 3/3 remote 0/1 virtual 11/11 swap 0/0
mounting event handled for /var/www
mounting event sent for /var/www
mounting event handled for /var/www
unknown option _netdev (ignored)
mounted event handled for /var/www
remote finished

This tells us several things:

 - using _netdev causes mountall to correctly identify /var/www as a remote mount
 - the _netdev option is passed to the mount command (which is probably a bug), and is not understood by the glusterfs mount helper, but is ignored and does *not* cause the mount to fail
 - when run after boot, mountall has no difficulty mounting glusterfs.

But I really need to see a log of mountall from *boot* time, showing the failure to mount when using _netdev.
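
When going through a boot-time log, the failed mounts can be pulled out with a short filter. This is a sketch that assumes the message format quoted in this report ("mountall: mount /home [333] terminated with status 1"); the function name `failed_mounts` is hypothetical, and the log path in the usage line is an assumption about where boot.log lives.

```shell
# List each mount point whose mount command exited non-zero, followed by the
# exit status, from mountall output on stdin. Duplicate retries of the same
# mount point with the same status are collapsed by sort -u.
failed_mounts() {
    sed -n 's/^mountall: mount \(.*\) \[[0-9]*] terminated with status \([0-9]*\)$/\1 \2/p' | sort -u
}

# Usage (path assumed):
# failed_mounts < /var/log/boot.log
```

This only summarizes failures; the full --verbose log is still what shows whether a mount was retried after each "network device up" signal.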

Revision history for this message
Bruno Léon (bruno-leon) wrote :

OK, so the situation was as follows.

In boot.log we have:

mounted event handled for /var/www
local 2/3 remote 1/1 virtual 9/10 swap 0/0
Mount failed. Please check the log file for more details.
mountall: mount /var/www [337] terminated with status 1
Filesystem could not be mounted: /var/www

The thing is, the log file mentioned, which I suppose is the gluster log file, does not exist.
After multiple attempts (and I cannot explain this one), boot.log showed, one time only, a message stating that /var/log/glusterfs/var-www.log did not exist.

Actually the folder exists, and the permissions are fine as well.

The only thing is that my /var/log is a dedicated partition, as is often the case on servers.
As a test I commented it out in fstab, and that got /var/log/glusterfs/var-www.log written at boot time.

[2013-03-21 21:55:45.968441] I [glusterfsd.c:1666:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.3.1
[2013-03-21 21:55:45.970384] E [common-utils.c:125:gf_resolve_ip6] 0-resolver: getaddrinfo failed (Name or service not known)
[2013-03-21 21:55:45.970471] E [name.c:243:af_inet_client_get_remote_sockaddr] 0-glusterfs: DNS resolution failed on host storage.domain.com
[2013-03-21 21:55:45.970503] E [glusterfsd-mgmt.c:1787:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: Success
[2013-03-21 21:55:45.970523] I [glusterfsd-mgmt.c:1790:mgmt_rpc_notify] 0-glusterfsd-mgmt: 2 connect attempts left

The fstab line is:
storage.domain.com:web-g-www /var/www glusterfs defaults,_netdev,backupvolfile-server=172.18.2.130,fetch-attempts=1 0 0

I'm not sure where the "2 connect attempts left" comes from, as I set fetch-attempts=1 in the fstab.

Still, we can see that DNS resolution is failing, which may indicate that the network is not really up when mountall tries to mount glusterfs filesystems.

Setting fetch-attempts=3 in the fstab got all filesystems properly mounted, including glusterfs.

So this tells us a few things:
- the network does not seem to be up when mountall tries to mount glusterfs filesystems (at least DNS is failing)
- there may be a missing dependency on having local filesystems up before trying to mount remote ones, because the "log" file that could help in debugging this issue is obviously not available for writing when mountall (and thus glusterfs) tries to write to it. (Not separating /var/log fixes this, but on a server that may be dangerous.)
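
To check whether a given client log shows the same symptom, the DNS failures can be counted per host. This is a sketch that assumes the log message wording quoted above ("DNS resolution failed on host ..."); the function name `dns_failures` is hypothetical.

```shell
# Print each host for which the glusterfs client logged a DNS resolution
# failure, followed by the number of occurrences. Reads a client log on stdin.
dns_failures() {
    awk -F'DNS resolution failed on host ' '
        NF > 1 { count[$2]++ }
        END { for (host in count) print host, count[host] }
    ' | sort
}

# Usage, with the log file named in this comment:
# dns_failures < /var/log/glusterfs/var-www.log
```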

Hope this helps
--
Bruno

Revision history for this message
Steve Langasek (vorlon) wrote :

Bruno, could you please attach the boot.log in question? I need to see the entirety of what mountall is doing to understand what's really going wrong. It's normal that mountall will attempt to mount all filesystems ASAP after boot, which means sometimes it tries to mount them before the network is up; then it will retry the mount for each network interface that's brought up. So seeing the full log will show which part isn't working correctly.

One possibility is that glusterfs is taking so long to fail to mount, that between the time the mount fails and it actually notifies mountall that it /has/ failed, all the network interface events have already gone past. But

> The only thing is that actually my /var/log is a dedicated partition
> like often on servers.

I don't think this is a very common configuration at all. I don't see any reason to have /var/log separate from the rest of /var. On the other hand, I'm not aware of any services that would currently have problems with this except for glusterfs.

Revision history for this message
Zach Bethel (zach-bethel) wrote :

I'm getting this same behavior on Ubuntu 12.04 LTS with an ext4 filesystem (iSCSI volume).

I am repeatedly seeing events like this sprinkled throughout my boot.log file.

mount: special device /dev/mapper/mpath2-part1 does not exist
mount: special device /dev/mapper/mpath3-part1 does not exist
mount: special device /dev/mapper/mpath5-part1 does not exist
mountall: mount /export/users [1133] terminated with status 32
mountall: mount /export/snapshots/users [1135] terminated with status 32
mountall: mount /export/faculty [1138] terminated with status 32
mount: special device /dev/mapper/mpath1-part1 does not exist
mount: special device /dev/mapper/mpath4-part1 does not exist
mountall: mount /export/snapshots/faculty [1140] terminated with status 32
mountall: mount /export/projects [1142] terminated with status 32
mount: special device /dev/mapper/mpath2-part1 does not exist
mount: special device /dev/mapper/mpath3-part1 does not exist
mount: special device /dev/mapper/mpath5-part1 does not exist
mountall: mount /export/users [1175] terminated with status 32
mountall: mount /export/snapshots/users [1179] terminated with status 32
mountall: mount /export/faculty [1184] terminated with status 32
mount: special device /dev/mapper/mpath1-part1 does not exist
mount: special device /dev/mapper/mpath4-part1 does not exist
mount: special device /dev/mapper/mpath2-part1 does not exist
mountall: mount /export/snapshots/faculty [1185] terminated with status 32
mountall: mount /export/projects [1192] terminated with status 32
mountall: mount /export/users [1201] terminated with status 32
mount: special device /dev/mapper/mpath3-part1 does not exist
mountall: mount /export/snapshots/users [1208] terminated with status 32
mount: special device /dev/mapper/mpath5-part1 does not exist
mountall: mount /export/faculty [1216] terminated with status 32
mount: special device /dev/mapper/mpath2-part1 does not exist
mount: special device /dev/mapper/mpath5-part1 does not exist
mount: special device /dev/mapper/mpath3-part1 does not exist
mount: special device /dev/mapper/mpath1-part1 does not exist
mountall: mount /export/users [1265] terminated with status 32
mountall: mount /export/snapshots/users [1270] terminated with status 32
mountall: mount /export/faculty [1271] terminated with status 32

The offending lines in my /etc/fstab look like this:

/dev/mapper/mpath2-part1 /export/users ext4 defaults,user_xattr,acl,barrier=1,errors=remount-ro,_netdev 0 0
/dev/mapper/mpath3-part1 /export/snapshots/users ext4 defaults,user_xattr,acl,barrier=1,errors=remount-ro,_netdev 0 0
/dev/mapper/mpath5-part1 /export/faculty ext4 defaults,quota,user_xattr,acl,barrier=1,errors=remount-ro,_netdev 0 0
/dev/mapper/mpath1-part1 /export/snapshots/faculty ext4 defaults,user_xattr,acl,barrier=1,errors=remount-ro,_netdev 0 0
/dev/mapper/mpath4-part1 /export/projects ext4 defaults,quota,user_xattr,acl,barrier=1,errors=remount-ro,_netdev 0 0

As you can see, _netdev is specified and is being ignored. The filesystems do mount after the network is up, but not without lots of nasty errors.

Here's the boot.log file with mountall set to...

Revision history for this message
Steve Langasek (vorlon) wrote :

On Mon, Apr 15, 2013 at 11:07:10PM -0000, Zach Bethel wrote:
> As you can see, _netdev is specified and is being ignored.

Incorrect. _netdev is being *respected*: _netdev does not tell the system
anything about *which* network device a filesystem mount depends on, so
mountall retries the mount after each network interface is brought up.

> The filesystems do mount after the network is up, but not without lots of
> nasty errors.

That is, therefore, unrelated to this bug report.

Revision history for this message
Juan Pavlik (jjpavlik) wrote :

Hi guys, I have the exact same problem, but with ocfs2. mountall tries to mount my remote volume before o2cb is running, so it fails. There's a really nasty workaround: adding /etc/init.d/o2cb start to the mountall-net.conf file, like this:

# mountall-net - Mount network filesystems
#
# Send mountall the USR1 signal to inform it to try network filesystems
# again.

description "Mount network filesystems"

start on net-device-up

task

script
    /etc/init.d/o2cb start
    PID=$(status mountall 2>/dev/null | sed -e '/start\/running,/{s/.*,[^0-9]*//;q};d')
    [ -n "$PID" ] && kill -USR1 $PID || true
end script

It works, but I'd like to avoid doing these things. I think this bug is really close to this one: https://bugs.launchpad.net/ubuntu/+source/ocfs2-tools/+bug/474215?comments=all , even though gluster and ocfs2 are different things.

Revision history for this message
Bruno MACADRE (bruno-macadre) wrote :

I created a bug (#1205075) about mountall not generating a good mount command line for glusterfs. My bug was marked as a duplicate of this one.

The final problem was the same, but the clues I found are pretty different.

Moreover, this bug has importance=low, but for me it would be CRITICAL because my server doesn't start anymore.

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for mountall (Ubuntu) because there has been no activity for 60 days.]

Changed in mountall (Ubuntu):
status: Incomplete → Expired
Revision history for this message
Louis Zuckerman (semiosis) wrote :

You should always use 'nobootwait' on remote mounts so that the server can boot even when there is some problem with the remote mount.

Drop in to #gluster on Freenode IRC and ping me, semiosis, for help troubleshooting this.

Thanks.
