Oops __d_lookup+0x88/0x194

Bug #1440536 reported by dann frazier
This bug affects 2 people
Affects          Status   Importance  Assigned to  Milestone
Linux            Unknown  Unknown
linux (Ubuntu)   Invalid  Critical    Unassigned
  Vivid          Invalid  Critical    Unassigned

Bug Description

This started happening on a Mustang board after upgrading to 3.19.0-9.9 and persists in 3.19.0-11.

ProblemType: Bug
DistroRelease: Ubuntu 15.04
Package: linux-image-3.19.0-11-generic 3.19.0-11.11
ProcVersionSignature: Ubuntu 3.19.0-11.11-generic 3.19.3
Uname: Linux 3.19.0-11-generic aarch64
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Dec 31 1969 seq
 crw-rw---- 1 root audio 116, 33 Dec 31 1969 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.17-0ubuntu1
Architecture: arm64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/timer', '/dev/snd/seq'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory: 'iw'
Date: Sun Apr 5 08:32:28 2015
HibernationDevice: RESUME=UUID=014663f6-5135-4075-bf04-d2f42c4fc90b
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to initialize libusb: -99
PciMultimedia:

ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB:

ProcKernelCmdLine: console=ttyS0,115200n8 ro earlyprintk=uart8250-32bit,0x1c020000
RelatedPackageVersions:
 linux-restricted-modules-3.19.0-11-generic N/A
 linux-backports-modules-3.19.0-11-generic N/A
 linux-firmware 1.143
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
SystemImageInfo:
 current build number: 0
 device name: ?
 channel: daily
 last update: Unknown
UpgradeStatus: No upgrade log present (probably fresh install)

dann frazier (dannf) wrote :
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Tim Gardner (timg-tpi) wrote :

From what version did you upgrade?

dann frazier (dannf) wrote : Re: [Bug 1440536] Re: Oops __d_lookup+0x88/0x194

On Mon, Apr 6, 2015 at 6:26 AM, Tim Gardner <email address hidden> wrote:
> From what version did you upgrade?

It looks like 3.19.0-4.4. I'll downgrade to see if going back resolves
the issue and, if so, try and bisect it down.

Alberto Salvia Novella (es20490446e) wrote :

What are the visible consequences of this bug?

When answered, please set this bug status back to "confirmed". Thank you.

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
dann frazier (dannf) wrote :

It seems like a corruption issue. The machine is sometimes usable for short periods after this error occurs, but is otherwise deadlocked.

I did go back to 3.19.0-4.4, and the problem is persisting. I also tried building a kernel from Linus' git tree and the problem is still there.

 My guess is that something changed outside of the kernel that is now triggering this bug. Perhaps it was the switch to systemd, or perhaps it is that I have a juju local provider deployment going on this system now - or a combination of the two.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in linux (Ubuntu):
importance: Undecided → Critical
tags: added: kernel-net
tags: added: kernel-bug-exists-upstream kernel-oops
Alberto Salvia Novella (es20490446e) wrote :

Please:
  1. Report to <https://bugzilla.kernel.org/>.
  2. Paste the new report URL here.
  3. Set this bug status back to "confirmed".

Thank you.

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
tags: added: asked-to-upstream
Changed in linux (Ubuntu Vivid):
status: Incomplete → Triaged
tags: added: kernel-da-key
Alberto Salvia Novella (es20490446e) wrote :

@ Joseph Salisbury

Could you explain your change?

Changed in linux (Ubuntu Vivid):
status: Triaged → Incomplete
dann frazier (dannf) wrote :

Diagnosing further, I've found that this happens when accessing a certain file:

dannf@mustang:~$ ls -l /usr/bin/qemu-system-aarch64
-rwxr-xr-x 1 root root 4998496 Mar 12 21:17 /usr/bin/qemu-system-aarch64
dannf@mustang:~$ ls -l /usr/lib/python2.7/six.pyc
[ 3586.472110] Unable to handle kernel paging request at virtual address 3ffffffffffffc
[ 3586.479819] pgd = ffffffc0fdebf000
[ 3586.483237] [3ffffffffffffc] *pgd=00000040fde7d003, *pud=00000040fde7d003, *pmd=0000000000000000
[ 3586.492027] Internal error: Oops: 96000004 [#13] SMP
[ 3586.496965] Modules linked in: xt_conntrack ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables ip6t
[ 3586.539676] CPU: 5 PID: 1932 Comm: ls Tainted: G D 3.19.0-12-generic #12-Ubuntu
[ 3586.547982] Hardware name: APM X-Gene Mustang board (DT)
[ 3586.553266] task: ffffffc3e8a38c40 ti: ffffffc0fc68c000 task.ti: ffffffc0fc68c000
[ 3586.560712] PC is at __d_lookup_rcu+0x80/0x17c
[ 3586.565133] LR is at lookup_fast+0x4c/0x2f0
[ 3586.569293] pc : [<ffffffc000229eec>] lr : [<ffffffc00021ce68>] pstate: 00000145
[ 3586.576648] sp : ffffffc0fc68fb40
[ 3586.579942] x29: ffffffc0fc68fb40 x28: ffffffc0fc68c000
[ 3586.585245] x27: ffffffffffffffff x26: 0000000000000007
[ 3586.590546] x25: ffffffc3ed26a033 x24: ffffffc0fc68fd20
[ 3586.595848] x23: 0000000725cabf8e x22: ffffffc0fc68fbfc
[ 3586.601150] x21: ffffffc3ed53b780 x20: 003ffffffffffff8
[ 3586.606451] x19: 0040000000000000 x18: 0000000000000000
[ 3586.611754] x17: 00000000004292d0 x16: ffffffc000215d3c
[ 3586.617057] x15: ffffffffffffffff x14: 0000007faf35c16c
[ 3586.622360] x13: ffffffffffffffff x12: 0000000000000010
[ 3586.627662] x11: 0000000000000000 x10: fefefefefefefeff
[ 3586.632965] x9 : 0000007fca1ade6c x8 : 535f474458006379
[ 3586.638268] x7 : 0000000000000000 x6 : 0000000000000020
[ 3586.643571] x5 : 0000000000000000 x4 : ffffffc000b64f60
[ 3586.648872] x3 : 0000000000000015 x2 : ffffffc3eec00000
[ 3586.654174] x1 : 000000000000000b x0 : 00000000001ed681
[ 3586.659475]
[ 3586.660956] Process ls (pid: 1932, stack limit = 0xffffffc0fc68c058)
[ 3586.667275] Stack: (0xffffffc0fc68fb40 to 0xffffffc0fc690000)
[ 3586.672991] fb40: fc68fba0 ffffffc0 0021ce68 ffffffc0 fc68fd10 ffffffc0 00000000 00000000
[ 3586.681126] fb60: fc68fc70 ffffffc0 fc68fc60 ffffffc0 dfa06020 ffffffc3 ed53b780 ffffffc3
[ 3586.689260] fb80: 0000011a 00000000 0000004f 00000000 00b6e000 ffffffc0 c3e4c240 ffffffbe
[ 3586.697395] fba0: fc68fc00 ffffffc0 0021f578 ffffffc0 fc68fd10 ffffffc0 00000000 00000000
[ 3586.705529] fbc0: 00000040 00000000 ed26a020 ffffffc3 ffffff9c 00000000 00000015 00000000
[ 3586.713663] fbe0: 0000011a 00000000 ed26a020 ffffffc3 ffffff9c 00000000 000994ec ffffffc0
[ 3586.721798] fc00: fc68fc90 ffffffc0 0021fbe4 ffffffc0 ed26a000 ffffffc3 fc68fd10 ffffffc0
[ 3586.729932] fc20: ed26a000 ffffffc3 00000000 00000000 ffffff9c 00000000 00000015 00000000
[ 3586.738066] fc40: 0000011a 00000000 0000004f 00000000 00b6e000 ffffffc0 ca1ade4c 0000007f
[ 3586.746201] fc60: 00000000 00000000 00000000 00000000 fc68fe20 ffffffc0 00000015 00000000
[ 3586.754335] fc80: fc68fcd0 ffffffc0 0022133c ffffffc0 fc68fcd0 ffffffc0 0022...

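The register dump above can be cross-checked against the faulting address. The sketch below models the first few fields of struct dentry with ctypes. The field layout is an assumption (a 3.19-era 64-bit build without lock debugging), not something stated in this report. Under that assumption, x20 would be the dentry pointer derived from the hash-list node in x19, and the faulting access would be the read of d_seq.

```python
import ctypes

# Assumed, simplified layout of the start of struct dentry
# (64-bit build, no lock debugging). Not taken from this report.
class HlistBlNode(ctypes.Structure):
    _fields_ = [("next", ctypes.c_uint64), ("pprev", ctypes.c_uint64)]

class Qstr(ctypes.Structure):
    _fields_ = [("hash", ctypes.c_uint32), ("len", ctypes.c_uint32),
                ("name", ctypes.c_uint64)]

class Dentry(ctypes.Structure):
    _fields_ = [("d_flags", ctypes.c_uint32),
                ("d_seq", ctypes.c_uint32),
                ("d_hash", HlistBlNode),
                ("d_parent", ctypes.c_uint64),
                ("d_name", Qstr)]

# Values copied from the oops above.
X19 = 0x0040000000000000         # hash-list node pointer
X20 = 0x003ffffffffffff8         # candidate dentry pointer
FAULT_ADDR = 0x3ffffffffffffc    # "Unable to handle kernel paging request"

def check_registers():
    """Return (x19 maps to x20, x20 + d_seq offset is the fault, bits set in x19)."""
    node_to_dentry = X19 - Dentry.d_hash.offset
    seq_addr = X20 + Dentry.d_seq.offset
    bits_set = bin(X19).count("1")
    return node_to_dentry == X20, seq_addr == FAULT_ADDR, bits_set

print(check_registers())
```

With these values the node pointer in x19 has exactly one bit set, which is not a valid kernel address. The dump alone does not show how the pointer got that value.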

dann frazier (dannf) wrote :
Changed in linux (Ubuntu Vivid):
status: Incomplete → Confirmed
Brad Figg (brad-figg) wrote : Test with newer development kernel (3.19.0-13.13)

Thank you for taking the time to file a bug report on this issue.

However, given the number of bugs that the Kernel Team receives during any development cycle it is impossible for us to review them all. Therefore, we occasionally resort to using automated bots to request further testing. This is such a request.

We have noted that there is a newer version of the development kernel than the one you last tested when this issue was found. Please test again with the newer kernel and indicate in the bug if this issue still exists or not.

You can update to the latest development kernel by simply running the following commands in a terminal window:

    sudo apt-get update
    sudo apt-get dist-upgrade

If the bug still exists, change the bug status from Incomplete to New. If the bug no longer exists, change the bug status from Incomplete to Fix Released.

If you want this bot to quit automatically requesting kernel tests, add a tag named: bot-stop-nagging.

 Thank you for your help, we really do appreciate it.

Changed in linux:
status: New → Incomplete
tags: added: kernel-request-3.19.0-13.13
dann frazier (dannf) wrote :

Tested w/ 4.0, still exists.

Changed in linux (Ubuntu Vivid):
status: Confirmed → New
status: New → Confirmed
Changed in linux (Ubuntu Vivid):
status: Confirmed → Triaged
Changed in linux:
importance: Undecided → Unknown
status: Incomplete → Unknown
dann frazier (dannf) wrote :

I'm going to mark this as invalid for the following reasons:

 - I reinstalled the system and the issue persisted.
 - I was unable to reproduce on another, identically configured system.
 - After removing one of the two DIMMs, the system is no longer seeing the problem.

This all points to a localized hardware failure.

Changed in linux (Ubuntu):
status: Triaged → Invalid
Changed in linux (Ubuntu Vivid):
status: Triaged → Invalid
dann frazier (dannf) wrote :

Reopening as this has now been seen on 2 other systems.

Changed in linux (Ubuntu Vivid):
status: Invalid → Confirmed
Changed in linux (Ubuntu):
status: Invalid → Confirmed
Changed in linux (Ubuntu):
status: Confirmed → Triaged
Changed in linux (Ubuntu Vivid):
status: Confirmed → Triaged
Ming Lei (tom-leiming) wrote :

It looks like there was a similar report upstream:

http://marc.info/?l=linux-fsdevel&m=142865378923064&w=2

but it still has no resolution or follow-up.

I also tried using fbench to create lots of files and list directories concurrently
to reproduce the issue, but haven't been able to reproduce it yet.
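For anyone who wants to try a similar load without fbench, here is a minimal sketch of that kind of workload: one thread creates files while others list and stat the same directory. It is not fbench and it is not a known reproducer (the attempt described above did not trigger the oops). The function name and parameters are made up for illustration.

```python
import os
import tempfile
import threading

def stress_lookups(num_files=200, num_listers=4, rounds=5):
    """Create files in a scratch directory while other threads list and stat it.

    Returns (files created, total stat() calls made by the listers).
    """
    lookups = [0] * num_listers
    with tempfile.TemporaryDirectory() as root:

        def creator():
            for i in range(num_files):
                with open(os.path.join(root, f"f{i:06d}"), "w") as fh:
                    fh.write("x")

        def lister(idx):
            for _ in range(rounds):
                for name in os.listdir(root):
                    # Each stat() forces a path lookup for the entry.
                    os.stat(os.path.join(root, name))
                    lookups[idx] += 1

        threads = [threading.Thread(target=creator)]
        threads += [threading.Thread(target=lister, args=(i,))
                    for i in range(num_listers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        created = len(os.listdir(root))
    return created, sum(lookups)

if __name__ == "__main__":
    print(stress_lookups())
```

Files are never deleted during the run, so every name returned by os.listdir() is still present when it is passed to os.stat().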

Ming Lei (tom-leiming) wrote :

BTW, it doesn't look related to a specific filesystem. In the most recent occurrence, it happened
while walking a path inside the proc filesystem:

[24993.562923] Call trace:
[24993.565357] [<ffffffc00022a278>] __d_lookup+0x88/0x194
[24993.570467] [<ffffffc00022a3bc>] d_lookup+0x38/0x64
[24993.575319] [<ffffffc00022a43c>] d_hash_and_lookup+0x54/0x6c
[24993.580948] [<ffffffc00027e1a4>] proc_flush_task+0xa8/0x1a8
[24993.586491] [<ffffffc0000b6920>] release_task+0x5c/0x46c
[24993.591774] [<ffffffc0000b6e70>] wait_task_zombie+0x140/0x67c
[24993.597489] [<ffffffc0000b7734>] wait_consider_task+0x388/0x670
[24993.603377] [<ffffffc0000b7b04>] do_wait+0xe8/0x234
[24993.608229] [<ffffffc0000b8cf4>] SyS_wait4+0x7c/0xe8

Also, judging from the register information in the log at the link in #15, that report
was triggered on arm64 as well.

So it might be an arm64-specific issue.

Ming Lei (tom-leiming) wrote :

From dann's reports:

1) system1
Code: 14000012 f9400273 b4000213 d1002274 (b9402280)

2) system2
Code: 14000012 f9400273 b4000213 d1002274 (b9402280)

And the upstream report in #15:
Code: 14000003 f9400273 b4000213 d1002274 (b9402282)

The code should correspond to the following snippet from __d_lookup() in fs/dcache.c:

ffffffc0001aae68: 14000003 b ffffffc0001aae74 <__d_lookup+0x84>
         *
         * See Documentation/filesystems/path-lookup.txt for more details.
         */
        rcu_read_lock();

        hlist_bl_for_each_entry_rcu(dentry, node, b, d_hash) {
ffffffc0001aae6c: f9400273 ldr x19, [x19]
ffffffc0001aae70: b40001f3 cbz x19, ffffffc0001aaeac <__d_lookup+0xbc>

                if (dentry->d_name.hash != hash)
ffffffc0001aae74: d1002274 sub x20, x19, #0x8
ffffffc0001aae78: b9402282 ldr w2, [x20,#32] #faulted instruction

So the problem appears to be caused by a bad pointer in the dcache.
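The last two words of each Code: line can be decoded by hand to confirm they match the sub/ldr pair in the disassembly. The sketch below handles only the two A64 encodings needed here, SUB (immediate, 64-bit) and LDR (immediate, unsigned offset, 32-bit). It is an illustration written for this thread, not a general disassembler, and it prints register 31 as x31/w31 rather than sp/wzr.

```python
def decode(word):
    """Decode SUB (immediate, 64-bit) or LDR (immediate, unsigned offset, 32-bit).

    Returns an assembly-like string, or None for any other encoding.
    """
    rn = (word >> 5) & 0x1F        # base / first source register
    rd = word & 0x1F               # destination (Rd for SUB, Rt for LDR)
    imm12 = (word >> 10) & 0xFFF
    if word & 0xFF800000 == 0xD1000000:
        # SUB Xd, Xn, #imm with an optional left shift by 12 (bit 22).
        shift = 12 if (word >> 22) & 1 else 0
        return f"sub x{rd}, x{rn}, #{imm12 << shift:#x}"
    if word & 0xFFC00000 == 0xB9400000:
        # LDR Wt, [Xn, #imm]; the immediate is scaled by the 4-byte access size.
        return f"ldr w{rd}, [x{rn},#{imm12 * 4}]"
    return None

for w in (0xD1002274, 0xB9402280, 0xB9402282):
    print(f"{w:08x}: {decode(w)}")
```

Decoded this way, d1002274 is the same `sub x20, x19, #0x8` in all three reports. The faulting word is a 32-bit load from `[x20,#32]` in each case and differs only in the destination register (w0 in the two Ubuntu reports, w2 in the upstream one).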

Riku Voipio (riku-voipio) wrote :

This has also been reproduced on wily 4.2.0-16-generic #19-Ubuntu

Ming Lei (tom-leiming) wrote :

Riku,

Did you reproduce the issue when booting with UEFI or with U-Boot? And is it on Mustang?

Thanks,

Loc Ho (lho-m) wrote :

This is a known issue with FW build number 1.15.20 or below. It is resolved in FW build 1.15.22 or above. That FW is currently under testing and will be released for the Mustang board when ready.
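The version rule above can be written down as a small check. This is only a sketch: the function name is made up, the build string is assumed to be a dotted number such as "1.15.20", and how to read the build number from the board is not covered in this thread. Builds between the two stated bounds are reported as unknown rather than guessed.

```python
def firmware_status(build):
    """Classify a Mustang FW build string according to the comment above."""
    version = tuple(int(part) for part in build.split("."))
    if version <= (1, 15, 20):
        return "affected"   # "1.15.20 or below"
    if version >= (1, 15, 22):
        return "fixed"      # "1.15.22 or above"
    return "unknown"        # not covered by the statement above

for build in ("1.15.20", "1.15.21", "1.15.22"):
    print(build, firmware_status(build))
```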

-Loc

Fathi Boudra (fboudra) wrote :

Ming Lei,

yes, on Mustang. We're using U-Boot.

Ming Lei (tom-leiming) wrote : Re: [Bug 1440536] Re: Oops __d_lookup+0x88/0x194

On Tue, Oct 27, 2015 at 11:03 PM, Fathi Boudra <email address hidden> wrote:
> Ming Lei,
>
> yes, on Mustang. We're using U-Boot.

OK, we found that the issue is triggered during booting, and
APM's firmware fix does make the issue disappear, but
it isn't released yet.


Fathi Boudra (fboudra) wrote :

I flashed 1.15.22 firmware and indeed it helped. I don't observe the oops anymore.

dann frazier (dannf)
Changed in linux (Ubuntu):
status: Triaged → Invalid
Changed in linux (Ubuntu Vivid):
status: Triaged → Invalid