Need to apply patch: "ARM: Do not call flush_cache_user_range with mmap_sem held"

Bug #965840 reported by Olof Johansson
This bug affects 1 person
Affects                       Status        Importance  Assigned to
linux (Ubuntu)                Fix Released  Undecided   Andy Whitcroft
  Natty                       Won't Fix     Undecided   Unassigned
  Oneiric                     Fix Released  Undecided   Unassigned
  Precise                     Fix Released  Undecided   Andy Whitcroft
linux-ac100 (Ubuntu)          New           Undecided   Unassigned
  Oneiric                     Won't Fix     Undecided   Unassigned
  Precise                     Won't Fix     Undecided   Unassigned
linux-linaro-lt-mx5 (Ubuntu)  New           Undecided   Unassigned
  Oneiric                     Won't Fix     Undecided   Unassigned
  Precise                     Won't Fix     Undecided   Unassigned
linux-ti-omap4 (Ubuntu)       Fix Released  Undecided   Andy Whitcroft
  Natty                       Won't Fix     Undecided   Unassigned
  Oneiric                     Fix Released  Undecided   Unassigned
  Precise                     Fix Released  Undecided   Andy Whitcroft

Bug Description

There's a well-known bug on ARM SMP systems: a process can hang if it issues the cache flush syscall while a page fault is being handled at the same time. A patch has been posted, but it unfortunately breaks pre-ARMv6 systems, so it has not yet been applied.
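The hang is a lock-ordering problem. Roughly, as the commit that eventually fixed it (7409/1, quoted later in this thread) explains: the cache flush syscall holds `mmap_sem` for reading while calling `flush_cache_user_range`; if the flush faults, the page fault handler takes `mmap_sem` for reading again. That nested read usually succeeds, but if another thread (for example one calling `mmap()`) is already queued for the write lock, the fair rwsem makes the second reader wait behind the writer, and the writer waits for the first read hold to be released. The sketch below is a single-threaded toy model of that interleaving, not kernel code: `rwsem_model`, `model_down_read` and the other names are hypothetical, and the queueing rule is a simplification of the kernel's rwsem.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_WAITERS 4

/* Toy model of a fair reader/writer semaphore: once any task is queued,
 * a new reader queues behind it instead of joining the current readers. */
struct rwsem_model {
    int readers;                 /* number of read holds */
    int writer;                  /* tid holding the write lock, or -1 */
    struct { int tid; bool write; } queue[MAX_WAITERS]; /* FIFO of sleepers */
    int nwait;
};

static void model_init(struct rwsem_model *s)
{
    s->readers = 0;
    s->writer = -1;
    s->nwait = 0;
}

static void model_enqueue(struct rwsem_model *s, int tid, bool write)
{
    assert(s->nwait < MAX_WAITERS);
    s->queue[s->nwait].tid = tid;
    s->queue[s->nwait].write = write;
    s->nwait++;
}

/* The "down" calls return true if granted, false if the task now sleeps. */
static bool model_down_read(struct rwsem_model *s, int tid)
{
    if (s->writer < 0 && s->nwait == 0) {
        s->readers++;
        return true;
    }
    model_enqueue(s, tid, false);
    return false;
}

static bool model_down_write(struct rwsem_model *s, int tid)
{
    if (s->writer < 0 && s->readers == 0 && s->nwait == 0) {
        s->writer = tid;
        return true;
    }
    model_enqueue(s, tid, true);
    return false;
}

/* Grant queued requests in FIFO order while they are compatible. */
static void model_wake(struct rwsem_model *s)
{
    while (s->nwait > 0) {
        bool write = s->queue[0].write;
        if (s->writer >= 0 || (write && s->readers > 0))
            break;
        if (write)
            s->writer = s->queue[0].tid;
        else
            s->readers++;
        for (int i = 1; i < s->nwait; i++)
            s->queue[i - 1] = s->queue[i];
        s->nwait--;
    }
}

static void model_up_read(struct rwsem_model *s)
{
    assert(s->readers > 0);
    s->readers--;
    model_wake(s);
}

static void model_up_write(struct rwsem_model *s)
{
    assert(s->writer >= 0);
    s->writer = -1;
    model_wake(s);
}

/* Task 1 runs the cache flush syscall; task 2 calls mmap() meanwhile. */
static bool old_ordering_deadlocks(void)
{
    struct rwsem_model mmap_sem;
    model_init(&mmap_sem);
    model_down_read(&mmap_sem, 1);  /* syscall holds mmap_sem for reading */
    model_down_write(&mmap_sem, 2); /* mmap() queues behind that reader */
    /* The flush faults; the fault handler re-takes the read lock and
     * queues behind the waiting writer. */
    bool task1_runs = model_down_read(&mmap_sem, 1);
    /* Task 1 waits for task 2, which waits for task 1's first read hold. */
    return !task1_runs && mmap_sem.nwait == 2 && mmap_sem.readers == 1;
}

static bool new_ordering_deadlocks(void)
{
    struct rwsem_model mmap_sem;
    model_init(&mmap_sem);
    model_down_read(&mmap_sem, 1);  /* look up the range only */
    model_up_read(&mmap_sem);       /* drop mmap_sem before flushing */
    model_down_write(&mmap_sem, 2); /* mmap() gets the lock at once */
    bool task1_runs = model_down_read(&mmap_sem, 1); /* flush faults: sleeps */
    model_up_write(&mmap_sem);      /* mmap() finishes and wakes task 1 */
    if (!task1_runs)
        task1_runs = mmap_sem.readers == 1 && mmap_sem.nwait == 0;
    return !task1_runs;
}
```

In this model `old_ordering_deadlocks()` returns true and `new_ordering_deadlocks()` returns false: once the lock is dropped before flushing, a faulting flush simply waits for the writer to finish instead of waiting on itself.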

Many other projects, such as Android and Chrome OS, have applied this patch locally in their trees. The patch is good and valid for ARMv6 and v7 systems, since their cache flush code has the appropriate exception handling tables. Until equivalent handling is added for v5 and earlier, though, the patch can't go upstream.
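Why the exception tables matter, roughly: once `mmap_sem` is no longer held during the flush, another thread can unmap the range between the VMA lookup and the flush, so the cache maintenance instructions can fault on an address that has no mapping. Kernel code survives such a fault only if the faulting instruction has an exception table entry pointing at fixup code; otherwise the fault is treated as a kernel bug (an oops). Below is a simplified userspace model of that lookup, assuming a table sorted by instruction address and searched with `bsearch`; `extable_entry`, `fixup_fault_model` and the return-0-for-oops convention are illustrative, and real fault handling does much more.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified exception table entry: an instruction that is allowed to
 * fault on a user address, and the address to resume at if it does. */
struct extable_entry {
    uintptr_t insn;
    uintptr_t fixup;
};

/* bsearch() comparator: key is a faulting PC, elem is a table entry. */
static int cmp_insn(const void *key, const void *elem)
{
    uintptr_t pc = *(const uintptr_t *)key;
    const struct extable_entry *e = elem;
    return (pc > e->insn) - (pc < e->insn);
}

/* Look up the faulting PC in a table sorted by insn. */
static const struct extable_entry *
search_extable_model(const struct extable_entry *table, size_t n, uintptr_t pc)
{
    assert(table != NULL);
    return bsearch(&pc, table, n, sizeof(*table), cmp_insn);
}

/* Model of a kernel-mode fault on a bad user address: returns the PC to
 * resume at, or 0 to stand for "no fixup, oops". */
static uintptr_t fixup_fault_model(const struct extable_entry *table,
                                   size_t n, uintptr_t pc)
{
    const struct extable_entry *e = search_extable_model(table, n, pc);
    return e ? e->fixup : 0;
}
```

With a small table, a fault at a listed instruction resumes at its fixup address, while a fault anywhere else returns 0, the model's stand-in for an oops; in this picture, pre-v6 flush code behaves like an instruction with no entry.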

Please pick this up for Ubuntu on the v7 platforms, Panda in particular. I have a user here who can very reliably hit this when running a specific workload, and I would prefer that he keep running a distro kernel rather than a locally built one.

Patch is at:

http://marc.info/?l=linux-arm-kernel&m=132068730012063&w=2

Note that the discussion went off on a tangent: the patch is valid for v6 and v7, and some of the misunderstanding in the thread stemmed from that.

Thanks!

-Olof

Adam Conrad (adconrad) wrote:

I'm told via IRC that this is theoretically triggerable (though less frequently) on UP (uniprocessor) systems, so I'm adding tasks for the omap and mx5 kernels as well.

Brad Figg (brad-figg) wrote: Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 965840

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Adam Conrad (adconrad)
no longer affects: linux-ac100 (Ubuntu Natty)
no longer affects: linux-linaro-lt-mx5 (Ubuntu Natty)
Olof Johansson (olof) wrote:

I don't have local access to the machine in question, so I can't collect the logs. I also see no way to mark this as Confirmed myself; I'm guessing I lack permission to do so?

Olof Johansson (olof) wrote:

Ah, there it was. Horrible UI.

Changed in linux (Ubuntu Precise):
status: Incomplete → Confirmed
Andy Whitcroft (apw)
tags: added: bot-quit-nagging
Andy Whitcroft (apw)
Changed in linux (Ubuntu Precise):
assignee: nobody → Andy Whitcroft (apw)
status: Confirmed → In Progress
Andy Whitcroft (apw) wrote:

Ok. I've added the suggested patch to an 'omap' build from precise master:
    http://people.canonical.com/~apw/lp965840-precise/

And to an omap4 build from ti-omap4:
    http://people.canonical.com/~apw/lp965840+ti-omap4-precise/

If those of you with the hardware could at least confirm that these kernels boot OK, that would help. If you know how to reproduce the problem, please also confirm whether they fix it. Please report any test results back here. Thanks.

Changed in linux-ti-omap4 (Ubuntu Precise):
assignee: nobody → Andy Whitcroft (apw)
status: New → In Progress
Derek Schuff (dschuff) wrote:

I tested the omap4 kernel deb on a PandaBoard running Oneiric: it boots, and the fix seems to work with the small reproducer we have. I've started our full bot suite on it and will let you know how it goes.

Mark Seaborn (mrs) wrote:

For context, we hit the bug when testing Native Client's dynamic code loading support; the issue tracking this for Native Client is http://code.google.com/p/nativeclient/issues/detail?id=2678.

Attached is a small test case that reproduces the hang.

Derek Schuff (dschuff) wrote:

Update: As far as I can tell, the patched kernel fixes this particular bug. I just can't use it, because the board seems to lock up (presumably for unrelated reasons) when I leave it running overnight. There's not even any output on the serial console; it just stops responding and has to be reset.

Adam Conrad (adconrad) wrote:

Derek, can you confirm that the lockups are unrelated to this patch by installing the precise 1411 kernel from the archive and checking that it locks up in the same way for you?

Paolo Pisati (p-pisati)
Changed in linux-ti-omap4 (Ubuntu Natty):
status: New → Won't Fix
Paolo Pisati (p-pisati) wrote:

commit 4340608237cd7dc7437f7edaeceaccf1672104dc "ARM: 7409/1: Do not call flush_cache_user_range with mmap_sem held"

and

commit 99a98086bb950128e611aefa8d042caf81954f00 "ARM: 7365/1: drop unused parameter from flush_cache_user_range"

are both present in origin/master (and [OP]/omap4 are rebased trees, so they carry them too). Natty is EOL, and Q got them from upstream (both patches were upstreamed).
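The commit titles above describe the shape of the fix: do the VMA lookup under `mmap_sem`, release the lock, and only then flush. A minimal userspace sketch of that flow follows. It assumes, as the ARM cache flush syscall path of that era appears to do, that the requested range is clamped to the first VMA overlapping it; `mm_model`, `do_cache_op_model`, `flush_model` and the -1 error convention are hypothetical, and the lock is modelled as a read-hold counter so the flush can record whether it was still held. The flush stand-in receives no VMA pointer, in the spirit of 7365/1 dropping the unused parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's VMA and mm. */
struct vma_model {
    unsigned long vm_start, vm_end;  /* covers [vm_start, vm_end) */
};

struct mm_model {
    const struct vma_model *vmas;    /* sorted by address, non-overlapping */
    size_t nvmas;
    int mmap_sem_readers;            /* read holds currently taken */
};

/* Like find_vma(): first VMA whose end lies above addr, or NULL. */
static const struct vma_model *find_vma_model(const struct mm_model *mm,
                                              unsigned long addr)
{
    for (size_t i = 0; i < mm->nvmas; i++)
        if (addr < mm->vmas[i].vm_end)
            return &mm->vmas[i];
    return NULL;
}

/* What the modelled flush saw: its range and whether mmap_sem was held. */
struct flush_record {
    unsigned long start, end;
    bool lock_held;
};

static void flush_model(const struct mm_model *mm, unsigned long start,
                        unsigned long end, struct flush_record *rec)
{
    rec->start = start;
    rec->end = end;
    rec->lock_held = mm->mmap_sem_readers > 0;
}

/* Returns 0 on success, -1 if no VMA overlaps [start, end). */
static int do_cache_op_model(struct mm_model *mm, unsigned long start,
                             unsigned long end, struct flush_record *rec)
{
    if (end < start)
        return -1;

    mm->mmap_sem_readers++;                   /* down_read(mmap_sem) */
    const struct vma_model *vma = find_vma_model(mm, start);
    if (!vma || vma->vm_start >= end) {
        mm->mmap_sem_readers--;               /* up_read on the error path */
        return -1;
    }
    if (start < vma->vm_start)                /* clamp the range to the VMA */
        start = vma->vm_start;
    if (end > vma->vm_end)
        end = vma->vm_end;
    mm->mmap_sem_readers--;                   /* up_read BEFORE flushing */

    flush_model(mm, start, end, rec);         /* may fault; lock not held */
    return 0;
}
```

For example, with mappings at [0x1000, 0x3000) and [0x8000, 0x9000), a request for [0x800, 0x2000) flushes [0x1000, 0x2000) with the lock already released, and a request for [0x3000, 0x4000) fails because nothing is mapped there.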

The tester hasn't reported any problems in 6 months, so I'm closing this.

Changed in linux (Ubuntu Natty):
status: New → Won't Fix
Changed in linux (Ubuntu Precise):
status: In Progress → Fix Released
Changed in linux (Ubuntu Oneiric):
status: New → Fix Released
Changed in linux-ti-omap4 (Ubuntu Oneiric):
status: New → Fix Released
Changed in linux-ti-omap4 (Ubuntu Precise):
status: In Progress → Fix Released
Paolo Pisati (p-pisati)
Changed in linux (Ubuntu):
status: In Progress → Fix Released
Changed in linux-ti-omap4 (Ubuntu):
status: In Progress → Fix Released
Rolf Leggewie (r0lf) wrote:

Oneiric has seen the end of its life and is no longer receiving any updates. Marking the Oneiric task for this ticket as "Won't Fix".

Changed in linux-ac100 (Ubuntu Oneiric):
status: New → Won't Fix
Changed in linux-linaro-lt-mx5 (Ubuntu Oneiric):
status: New → Won't Fix
Steve Langasek (vorlon) wrote:

The Precise Pangolin has reached end of life, so this bug will not be fixed for that release.

Changed in linux-ac100 (Ubuntu Precise):
status: New → Won't Fix
Changed in linux-linaro-lt-mx5 (Ubuntu Precise):
status: New → Won't Fix