Kernel BUG with multiple NFS4 kerberos mounts on boot

Bug #1365485 reported by Jani Jaakkola
This bug affects 2 people
Affects: linux (Ubuntu)
Status: Expired
Importance: Medium
Assigned to: Unassigned

Bug Description

The latest Ubuntu Linux kernel image currently has a bug, probably a race condition, which is triggered when there are multiple kerberos nfs4 mounts in /etc/fstab. It does not happen on every boot, so you will probably need a few retries to reproduce it. This happens with the current Ubuntu Linux kernel in 14.04:

# cat /proc/version_signature
Ubuntu 3.13.0-35.62-generic 3.13.11.6

Apparently you need to have _multiple_ kerberos NFS4 mounts in /etc/fstab to trigger this:

xxxxxx.helsinki.fi:/root_vdm_3/fshome1/u1 /home/ad/fshome1/u1 nfs4 sec=krb5,rw,bg,hard 0 0
xxxxxx.helsinki.fi:/root_vdm_3/fshome2/u2 /home/ad/fshome2/u2 nfs4 sec=krb5,rw,bg,hard 0 0
xxxxxx.helsinki.fi:/root_vdm_3/fshome3/u3 /home/ad/fshome3/u3 nfs4 sec=krb5,rw,bg,hard 0 0
xxxxxx.helsinki.fi:/root_vdm_3/fshome4/u4 /home/ad/fshome4/u4 nfs4 sec=krb5,rw,bg,hard 0 0
xxxxxx.helsinki.fi:/root_vdm_3/fshome5/u5 /home/ad/fshome5/u5 nfs4 sec=krb5,rw,bg,hard 0 0
xxxxxx.helsinki.fi:/root_vdm_3/fshome6/u6 /home/ad/fshome6/u6 nfs4 sec=krb5,rw,bg,hard 0 0

When this happens we get a kernel stack trace (complete trace included), which starts like this:

[ 19.999751] gss_pipe_downcall: bad return from gss_fill_context: -4
[ 19.999779] ------------[ cut here ]------------
[ 19.999791] kernel BUG at /build/buildd/linux-3.13.0/net/sunrpc/auth_gss/auth_gss.c:735!
[ 19.999796] invalid opcode: 0000 [#1] SMP
[ 19.999802] Modules linked in: arc4(+) des_generic cmac xcbc nfsv4 rmd160 crypto_null af_key xfrm_algo dm_crypt snd_hda_codec_realtek gpio_ich hp_wmi sparse_keymap snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_seq_midi snd_seq_midi_event snd_rawmidi coretemp dm_multipath snd_seq kvm_intel scsi_dh kvm snd_seq_device bnep serio_raw rfcomm snd_timer bluetooth lpc_ich snd soundcore tpm_infineon mei_me mei mac_hid parport_pc ppdev lp parport binfmt_misc rpcsec_gss_krb5 nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache dm_mirror dm_region_hash dm_log hid_generic usbhid hid nouveau mxm_wmi video i2c_algo_bit ttm e1000e drm_kms_helper ahci psmouse libahci drm wmi ptp pps_core
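
To check whether a given boot hit this BUG, the kernel log can be searched for the signature line shown in the trace above. This is only a minimal sketch: the function name has_auth_gss_bug is made up for illustration, and the pattern assumes the message keeps the same source path as in this report.

```shell
# Hypothetical helper: succeed if the kernel log read from stdin contains
# the "kernel BUG at .../net/sunrpc/auth_gss/auth_gss.c" line from this report.
has_auth_gss_bug() {
    grep -Eq 'kernel BUG at .*net/sunrpc/auth_gss/auth_gss\.c'
}
```

Possible usage: dmesg | has_auth_gss_bug && echo "auth_gss BUG was hit on this boot"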

When this has happened, rpc.gssd gets stuck in D state:

# ps aux|grep gssd
root 452 0.0 0.0 0 0 ? Ds 13:18 0:00 [rpc.gssd]
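
The stuck state shown above can also be detected from a script. The sketch below filters "PID STAT COMMAND" lines for rpc.gssd processes whose state starts with D (uninterruptible sleep); the function name gssd_stuck_pids and the three-column input layout are assumptions made for this example.

```shell
# Hypothetical helper: read "PID STAT COMMAND" lines on stdin and print the
# PIDs of rpc.gssd processes in uninterruptible sleep (state starting with D).
gssd_stuck_pids() {
    awk '$3 ~ /rpc\.gssd/ && $2 ~ /^D/ { print $1 }'
}
```

Possible usage: ps -eo pid=,stat=,comm= | gssd_stuck_pids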

Also NFS4 mounts will fail, with an error message which does not tell what is actually going on:

root@do0-kukad211-07:~# mount -a -t nfs4
mount.nfs4: access denied by server while mounting xxxx.helsinki.fi:/root_vdm_3/fshome1/u1
mount.nfs4: access denied by server while mounting xxxx.helsinki.fi:/root_vdm_3/fshome2/u2
mount.nfs4: access denied by server while mounting xxxx.helsinki.fi:/root_vdm_3/fshome3/u3
mount.nfs4: access denied by server while mounting xxxx.helsinki.fi:/root_vdm_3/fshome4/u4
mount.nfs4: access denied by server while mounting xxxx.helsinki.fi:/root_vdm_3/fshome5/u5
mount.nfs4: access denied by server while mounting xxxx.helsinki.fi:/root_vdm_3/fshome6/u6

This also happened on Ubuntu 12.04, so the bug is probably old. There is a bug report, which is (IMHO) incorrectly reported against nfs-utils: https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/1046762

We will work around this by removing the NFS mounts from fstab and doing them sequentially in startup scripts, but it would be nice if the kernel race were fixed too.
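
A minimal sketch of what such a sequential startup script could look like, assuming the six exports and mount points from the fstab listing above. It is not the script actually deployed at our site: the names mount_all_sequentially, MOUNT_CMD and SERVER are hypothetical, the server name is the redacted placeholder from this report, and the bg option is left out here because the mounts are meant to complete in order.

```shell
#!/bin/sh
# Sketch only: mount the Kerberos NFS4 home directories one at a time
# instead of letting them be mounted in parallel from /etc/fstab.
# MOUNT_CMD can be overridden (e.g. MOUNT_CMD=echo) for a dry run.
MOUNT_CMD="${MOUNT_CMD:-mount}"
# Placeholder server name (redacted in the report); set it to the real one.
SERVER="${SERVER:-xxxxxx.helsinki.fi}"
# Same options as the fstab entries, minus "bg" (assumption of this sketch).
OPTS="sec=krb5,rw,hard"

mount_all_sequentially() {
    rc=0
    for i in 1 2 3 4 5 6; do
        src="$SERVER:/root_vdm_3/fshome$i/u$i"
        dst="/home/ad/fshome$i/u$i"
        # Each mount returns (success or failure) before the next one starts.
        $MOUNT_CMD -t nfs4 -o "$OPTS" "$src" "$dst" || rc=1
    done
    return "$rc"
}
```

The script only defines the function; a startup script would call mount_all_sequentially once networking and rpc.gssd are up. Running it with MOUNT_CMD=echo prints the six mount commands without mounting anything.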
---
ApportVersion: 2.14.1-0ubuntu3.3
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: lightdm 2698 F.... pulseaudio
 /dev/snd/seq: timidity 2607 F.... timidity
CRDA: Error: [Errno 2] No such file or directory
CurrentDmesg:
 [ 23.980615] init: gdm main process (1881) killed by TERM signal
 [ 25.395448] init: plymouth-upstart-bridge main process ended, respawning
 [ 25.402287] init: plymouth-upstart-bridge main process (2248) terminated with status 1
 [ 25.402298] init: plymouth-upstart-bridge main process ended, respawning
 [ 27.805918] init: plymouth-stop pre-start process (2612) terminated with status 1
DistroRelease: Ubuntu 14.04
HibernationDevice: RESUME=UUID=c7fe0bc6-4712-4a0e-9f10-9ade6280a12e
IwConfig:
 eth0 no wireless extensions.

 lo no wireless extensions.
MachineType: Hewlett-Packard HP Compaq 8000 Elite CMT PC
Package: linux (not installed)
ProcFB: 0 nouveaufb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.13.0-35-generic root=UUID=268aac80-328a-4ab2-9e21-f38b9897ba79 ro adhome quiet splash vt.handoff=7
ProcVersionSignature: Ubuntu 3.13.0-35.62-generic 3.13.11.6
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-3.13.0-35-generic N/A
 linux-backports-modules-3.13.0-35-generic N/A
 linux-firmware 1.127.5
RfKill:

Tags: trusty
Uname: Linux 3.13.0-35-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:

_MarkForUpload: True
dmi.bios.date: 10/22/2009
dmi.bios.vendor: Hewlett-Packard
dmi.bios.version: 786G7 v01.02
dmi.board.asset.tag: CZC042D2X6
dmi.board.name: 3647h
dmi.board.vendor: Hewlett-Packard
dmi.chassis.asset.tag: CZC042D2X6
dmi.chassis.type: 6
dmi.chassis.vendor: Hewlett-Packard
dmi.modalias: dmi:bvnHewlett-Packard:bvr786G7v01.02:bd10/22/2009:svnHewlett-Packard:pnHPCompaq8000EliteCMTPC:pvr:rvnHewlett-Packard:rn3647h:rvr:cvnHewlett-Packard:ct6:cvr:
dmi.product.name: HP Compaq 8000 Elite CMT PC
dmi.sys.vendor: Hewlett-Packard

Jani Jaakkola (jj-lousa) wrote :

Mikko Rauhala (mjr-iki) wrote :

I'll just chime in (from the same site as Jani) that lacking a proper fix, it would be nice if the boot-time automounting worked around this by default by forgoing parallel mounts, or have the facilities to be instructed to do so.

Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1365485

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: trusty
Jani Jaakkola (jj-lousa) wrote : apport information

Attached files: AlsaInfo.txt, BootDmesg.txt, Lspci.txt, Lsusb.txt, ProcCpuinfo.txt, ProcEnviron.txt, ProcInterrupts.txt, ProcModules.txt, UdevDb.txt, UdevLog.txt, WifiSyslog.txt

tags: added: apport-collected
description: updated

Jani Jaakkola (jj-lousa) wrote :

I added the required information from another (identical) host with the same problem. Please ignore it, since the actual hardware used is not really relevant. We can reproduce this on other hardware too. This happens in classroom installations at the University of Helsinki and has been going on for at least two years.

Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.17 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.17-rc3-utopic/

Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: kernel-da-key
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired