Kernel Oops - unable to handle kernel paging request at fffffffffffffff8; EIP is at gss_wrap_req_priv.isra.10+0x183/0x2d0

Bug #1559563 reported by aschmitz
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Expired
Importance: Medium
Assigned to: Unassigned

Bug Description

This is a kernel oops that appears to occur under heavy load. Unfortunately, it is not easy to reproduce, and I haven't been able to isolate whether it's associated with NFS load, CPU load, both, or neither. (NFS appears to be relevant: I'm using NFSv4 with Kerberos in krb5p mode, and the oops is in GSS code. I am not using Kerberos for login or other features.)

The oops output is attached. Nothing occurs in the console output for hours before it, and I suspect the second oops is related to the first one in some way. Shortly afterward, soft lockups on all other cores (including those on a physically separate processor) make the computer unresponsive.

The issue appears to be in net/sunrpc/auth_gss/auth_gss.c's gss_wrap_req_priv, in the following line:
    tmp = page_address(rqstp->rq_enc_pages[rqstp->rq_enc_pages_num - 1]);

It appears that rqstp is set, but both rqstp->rq_enc_pages and rqstp->rq_enc_pages_num are zero, so the kernel attempts to dereference an invalid address. Because this surfaces only under load, and on a multiprocessor system, I'm inclined to guess there is a race condition between accesses to rqstp on different cores, but I haven't yet been able to determine where.
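
As a sanity check on that reading, the faulting address in the oops is what the index arithmetic yields when the array base is NULL and the count is 0. The following is a minimal userspace sketch, assuming 8-byte pointers (a 64-bit kernel). `slot_address` and `last_enc_page_slot` are hypothetical helpers written for illustration, not kernel code, and they only compute an address without dereferencing anything.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not kernel code): the address that would be read
 * when evaluating base[index] for an array of elem_size-byte elements.
 * Uses unsigned 64-bit arithmetic so the wraparound is well defined and
 * nothing is actually dereferenced.
 */
static uint64_t slot_address(uint64_t base, int index, uint64_t elem_size)
{
    return base + (uint64_t)(int64_t)index * elem_size;
}

/*
 * Models the address read by rq_enc_pages[rq_enc_pages_num - 1],
 * assuming each array element is an 8-byte pointer.
 */
static uint64_t last_enc_page_slot(uint64_t rq_enc_pages, int rq_enc_pages_num)
{
    return slot_address(rq_enc_pages, rq_enc_pages_num - 1, 8);
}
```

With both fields zero, `last_enc_page_slot(0, 0)` evaluates to 0xfffffffffffffff8, which is the address reported in the paging request failure.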

Unfortunately, /proc/version_signature isn't currently available, as the computer was updated to a 3.13.0-77-generic kernel upon restart, but the oops occurred on a stock "3.13.0-76-generic #120-Ubuntu" kernel on 14.04/trusty.

Tags: trusty
aschmitz (aschmitz-) wrote :
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1559563

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: trusty
aschmitz (aschmitz-) wrote :

The kernel currently running on the hardware is the -77 revision, not the -76 revision, so I believe apport-collect output would be misleading rather than helpful. I'm therefore marking this as confirmed. If you require additional information anyway, let me know and I can gather it.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.5 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.5-wily/

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired