kernel soft lockup on nfs server when using a kerberos mount
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
nfs-utils (Ubuntu) | Confirmed | Medium | Unassigned |
Bug Description
The kernel seems to lock up when accessing a Kerberos-mounted NFS share as a user holding a Kerberos ticket. This happens in a VirtualBox VM (run by Vagrant), and it also occurs in a VMware VM installed from a standard Ubuntu 14.04 server install disk.
To reproduce:
Join the machine to a Windows Active Directory domain using sssd
Install nfs-kernel-server and enable NEED_SVCGSSD
Enable NEED_GSSD for the client (this can be the same host as the server)
$ sudo mount -t nfs4 <fqdn>:/ /mnt/nfs -o sec=krb5
$ sudo ls -l /mnt/nfs # this works ok
$ kinit <a_domain_user>
$ ls /mnt/nfs
Permission denied error # I don't recall the exact wording
# wait a few moments, and the kernel starts reporting a soft lockup.
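The two "enable" steps above can be sketched as a small helper that sets a flag to yes in a defaults-style file. This is only a sketch: the report does not say which files were edited, so the paths in the usage comments (/etc/default/nfs-kernel-server and /etc/default/nfs-common, the usual locations on Ubuntu 14.04) are an assumption, and enable_flag is a hypothetical helper name.

```shell
# Hypothetical helper: set KEY=yes in a defaults-style file,
# replacing an existing assignment or appending one if none exists.
enable_flag() {
    file="$1"
    key="$2"
    if grep -q "^${key}=" "$file"; then
        # Rewrite the existing assignment in place (GNU sed -i).
        sed -i "s/^${key}=.*/${key}=yes/" "$file"
    else
        printf '%s=yes\n' "$key" >> "$file"
    fi
}

# Assumed usage on the server and client (run as root, paths are assumptions):
#   enable_flag /etc/default/nfs-kernel-server NEED_SVCGSSD
#   enable_flag /etc/default/nfs-common NEED_GSSD
```

The NFS services need a restart afterwards for the daemons to be started.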
I think a few other things trigger this error as well. It basically makes NFS with Kerberos unusable.
I have the kernel crash dump and the debug symbols installed, so if there is any other information I can provide, please let me know.
lsb_release -rd:
Description: Ubuntu 14.04.2 LTS
Release: 14.04
$ apt-cache policy nfs-common
nfs-common:
Installed: 1:1.2.8-6ubuntu1.1
Info from crash:
KERNEL: /usr/lib/
DUMPFILE: dump.201506181954 [PARTIAL DUMP]
CPUS: 2
DATE: Thu Jun 18 19:54:08 2015
UPTIME: 00:05:48
LOAD AVERAGE: 1.50, 0.45, 0.18
TASKS: 120
NODENAME: t-fileserver
RELEASE: 3.13.0-53-generic
VERSION: #89-Ubuntu SMP Wed May 20 10:34:39 UTC 2015
MACHINE: x86_64 (3581 Mhz)
MEMORY: 511.6 MB
PANIC: "Kernel panic - not syncing: softlockup: hung tasks"
PID: 1353
COMMAND: "rpc.svcgssd"
TASK: ffff880014dce000 [THREAD_INFO: ffff88001514e000]
CPU: 0
STATE: TASK_RUNNING (PANIC)
Backtrace:
PID: 1353 TASK: ffff880014dce000 CPU: 0 COMMAND: "rpc.svcgssd"
#0 [ffff88001fc03d18] machine_kexec at ffffffff8104ace2
#1 [ffff88001fc03d68] crash_kexec at ffffffff810e7423
#2 [ffff88001fc03e30] panic at ffffffff8171bcc4
#3 [ffff88001fc03ea8] watchdog_timer_fn at ffffffff8110dc85
#4 [ffff88001fc03ed8] __run_hrtimer at ffffffff8108e8c7
#5 [ffff88001fc03f18] hrtimer_interrupt at ffffffff8108f08f
#6 [ffff88001fc03f80] local_apic_
#7 [ffff88001fc03f98] smp_apic_
#8 [ffff88001fc03fb0] apic_timer_
--- <IRQ stack> ---
#9 [ffff88001514fd58] apic_timer_
[exception RIP: qword_addhex+176]
RIP: ffffffffa01c2df0 RSP: ffff88001514fe08 RFLAGS: 00000206
RAX: 0000000000000001 RBX: 0000000000000006 RCX: 00000000000001f6
RDX: ffff880015e54678 RSI: ffff88001514fe84 RDI: ffff88001514fe88
RBP: ffff88001514fe18 R8: ffff880015e57cf5 R9: 000000000000030b
R10: 0000000000000039 R11: 0000000000000027 R12: 0000000000000006
R13: ffffea0000422420 R14: ffffea00003e46e0 R15: ffff88001514fe98
ORIG_RAX: ffffffffffffff10 CS: 0010 SS: 0018
#10 [ffff88001514fe20] rsi_request at ffffffffa01f61bb [auth_rpcgss]
#11 [ffff88001514fe48] cache_read at ffffffffa01c5045 [sunrpc]
#12 [ffff88001514fec0] cache_read_procfs at ffffffffa01c51a1 [sunrpc]
#13 [ffff88001514fee8] proc_reg_read at ffffffff81224a6d
#14 [ffff88001514ff08] vfs_read at ffffffff811bdf55
#15 [ffff88001514ff40] sys_read at ffffffff811bea69
#16 [ffff88001514ff80] system_
RIP: 00007f85cf9f3810 RSP: 00007ffdba7dfe78 RFLAGS: 00000206
RAX: 0000000000000000 RBX: ffffffff8173391d RCX: ffffffffffffffff
RDX: 0000000000001000 RSI: 00000000008746f0 RDI: 0000000000000004
RBP: 00000000006083f8 R8: 0000000000000000 R9: 0000000000878820
R10: 00007f85cfcc67b8 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000004 R14: 0000000000608400 R15: 00000000008744b0
ORIG_RAX: 0000000000000000 CS: 0033 SS: 002b
Log:
[ 348.084011] BUG: soft lockup - CPU#0 stuck for 23s! [rpc.svcgssd:1353]
[ 348.084011] Modules linked in: cts vboxsf(OX) nfsv4 rpcsec_gss_krb5 nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache dm_crypt ip6t_REJECT ppdev xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT xt_comment xt_limit xt_tcpudp xt_addrtype nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack ip6table_filter parport_pc ip6_tables parport nf_conntrack_
[ 348.084011] CPU: 0 PID: 1353 Comm: rpc.svcgssd Tainted: G OX 3.13.0-53-generic #89-Ubuntu
[ 348.084011] Hardware name: innotek GmbH VirtualBox/
[ 348.084011] task: ffff880014dce000 ti: ffff88001514e000 task.ti: ffff88001514e000
[ 348.084011] RIP: 0010:[<
[ 348.084011] RSP: 0018:ffff880015
[ 348.084011] RAX: 0000000000000001 RBX: 0000000000000006 RCX: 00000000000001f6
[ 348.084011] RDX: ffff880015e54678 RSI: ffff88001514fe84 RDI: ffff88001514fe88
[ 348.084011] RBP: ffff88001514fe18 R08: ffff880015e57cf5 R09: 000000000000030b
[ 348.084011] R10: 0000000000000039 R11: 0000000000000027 R12: 0000000000000006
[ 348.084011] R13: ffffea0000422420 R14: ffffea00003e46e0 R15: ffff88001514fe98
[ 348.084011] FS: 00007f85d02f374
[ 348.084011] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 348.084011] CR2: 0000000000408000 CR3: 00000000152f7000 CR4: 00000000000006f0
[ 348.084011] Stack:
[ 348.084011] ffff88001478d580 ffff88001514fe88 ffff88001514fe40 ffffffffa01f61bb
[ 348.084011] ffff8800150b2840 0000000000001000 ffff88001f48f500 ffff88001514feb8
[ 348.084011] ffffffffa01c5045 0002000000000001 ffff88000c084a60 ffff880015c2bb60
[ 348.084011] Call Trace:
[ 348.084011] [<ffffffffa01f6
[ 348.084011] [<ffffffffa01c5
[ 348.084011] [<ffffffffa01c5
[ 348.084011] [<ffffffff81224
[ 348.084011] [<ffffffff811bd
[ 348.084011] [<ffffffff811be
[ 348.084011] [<ffffffff811d3
[ 348.084011] [<ffffffff81733
[ 348.084011] Code: e0 27 42 8d 44 20 30 41 88 40 fe 41 0f b6 c2 83 f8 0a 44 89 d8 0f 4c c3 41 83 e9 02 83 e9 01 46 8d 54 10 30 0f 95 c0 41 83 f9 01 <45> 88 50 ff 7f aa 45 85 c9 7f 1f 5b 41 5c 5d 41 b9 ff ff ff ff
[ 348.084011] Kernel panic - not syncing: softlockup: hung tasks
[ 348.084011] CPU: 0 PID: 1353 Comm: rpc.svcgssd Tainted: G OX 3.13.0-53-generic #89-Ubuntu
[ 348.084011] Hardware name: innotek GmbH VirtualBox/
[ 348.084011] 000000000000012d ffff88001fc03e28 ffffffff81722e1e ffffffff81a62b16
[ 348.084011] ffff88001fc03ea0 ffffffff8171bcbd 0000000000000008 ffff88001fc03eb0
[ 348.084011] ffff88001fc03e50 0000000000000086 0000000000000046 0000000000000007
[ 348.084011] Call Trace:
[ 348.084011] <IRQ> [<ffffffff81722
[ 348.084011] [<ffffffff8171b
[ 348.084011] [<ffffffff8110d
[ 348.084011] [<ffffffff8108e
[ 348.084011] [<ffffffff8110d
[ 348.084011] [<ffffffff8108f
[ 348.084011] [<ffffffff81043
[ 348.084011] [<ffffffff81735
[ 348.084011] [<ffffffff81734
[ 348.084011] <EOI> [<ffffffffa01c2
[ 348.084011] [<ffffffffa01f6
[ 348.084011] [<ffffffffa01c5
[ 348.084011] [<ffffffffa01c5
[ 348.084011] [<ffffffff81224
[ 348.084011] [<ffffffff811bd
[ 348.084011] [<ffffffff811be
[ 348.084011] [<ffffffff811d3
[ 348.084011] [<ffffffff81733
ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: nfs-common 1:1.2.8-6ubuntu1.1
ProcVersionSign
Uname: Linux 3.13.0-53-generic x86_64
ApportVersion: 2.14.1-0ubuntu3.11
Architecture: amd64
Date: Thu Jun 18 20:46:05 2015
ProcEnviron:
TERM=xterm-
PATH=(custom, no user)
XDG_RUNTIME_
LANG=en_US.UTF-8
SHELL=/bin/bash
SourcePackage: nfs-utils
UpgradeStatus: No upgrade log present (probably fresh install)
Changed in nfs-utils (Ubuntu):
importance: Undecided → Medium
I worked around this by setting NO_AUTH_DATA_REQUIRED on the userAccountControl attribute in LDAP for the server account, which prevents the PAC from being added to the Kerberos ticket. My guess is that when svcgssd gets a Kerberos ticket that is too large, it gets unhappy and stuck in a loop.
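For anyone applying the same workaround, here is a minimal sketch of the bit arithmetic involved. NO_AUTH_DATA_REQUIRED is the 0x2000000 bit of userAccountControl. The starting value 4096 is only a hypothetical example (a plain computer account), and the function names are made up for illustration. The sketch only computes the new value; writing it back to the directory is left to whatever LDAP tool you use.

```shell
# NO_AUTH_DATA_REQUIRED bit of userAccountControl (0x2000000 = 33554432).
NO_AUTH_DATA_REQUIRED=$((0x2000000))

# Print the given userAccountControl value with the bit set.
set_no_auth_data() {
    echo $(( $1 | NO_AUTH_DATA_REQUIRED ))
}

# Succeed if the bit is already set in the given value.
has_no_auth_data() {
    [ $(( $1 & NO_AUTH_DATA_REQUIRED )) -ne 0 ]
}

# Hypothetical current value: 4096 (WORKSTATION_TRUST_ACCOUNT).
set_no_auth_data 4096   # prints 33558528
```

Setting the bit with OR rather than overwriting the attribute keeps the other account flags intact.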
A few references:
https://lists.samba.org/archive/samba/2013-June/174045.html
http://blog.evad.io/2014/11/04/kerberos-protected-nfs-with-active-directory-and-the-pac/