Fixup tlbie vs store ordering issue on POWER9

Bug #1848193 reported by bugproxy on 2019-10-15
This bug affects 1 person
Affects                           Importance  Assigned to
The Ubuntu-power-systems project  Medium      Canonical Kernel Team
linux (Ubuntu)                    Medium      Manoj Iyer
Eoan                              Medium      Unassigned

Bug Description

== Comment: #0 - Michael Ranweiler - 2019-10-07 01:52:38 ==
On POWER9, under some circumstances, a broadcast TLB invalidation will fail to invalidate the ERAT cache on some threads when there are parallel mtpidr/mtlpidr happening on other threads of the same core. This can cause stores to continue to go to a page after it's unmapped.

The workaround is to force an ERAT flush by issuing an additional tlbie with PID=0 or LPID=0. This additional TLB flush triggers the ERAT cache invalidation. Because it uses PID=0 or LPID=0, it is not filtered out by the TLB snoop filtering logic.

We still need to follow this up with another tlbie to take care of the store vs tlbie ordering issue explained in commit a5d4b5891c2f ("powerpc/mm: Fixup tlbie vs store ordering issue on POWER9"). The presence of the ERAT cache implies we can still get new stores, and they may miss the store queue marking flush.
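
The ordering described above can be sketched as a small user-space model. This is only an illustration of the sequence, not the kernel code from the commits listed below: the names flush_step, flush_plan, plan_add and plan_pid_flush, and the single needs_workaround flag, are hypothetical and chosen for this example. No tlbie is actually issued; the code only records which flushes would be issued and in what order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the flush sequence; not the kernel's real API. */
enum flush_step {
    STEP_TLBIE_TARGET_PID, /* the normal broadcast invalidation */
    STEP_TLBIE_PID0_ERAT,  /* extra tlbie with PID=0 (or LPID=0): forces the ERAT flush */
    STEP_TLBIE_STQ_FIXUP   /* follow-up tlbie for the store vs tlbie ordering issue */
};

#define MAX_STEPS 3

struct flush_plan {
    enum flush_step steps[MAX_STEPS];
    size_t count;
};

/* Append one step to the plan, guarding against overflow. */
static void plan_add(struct flush_plan *p, enum flush_step s)
{
    assert(p->count < MAX_STEPS);
    p->steps[p->count++] = s;
}

/*
 * Build the ordered list of flushes for one PID invalidation.
 * needs_workaround is an assumed flag meaning "this is an affected
 * POWER9 system"; when it is false only the normal tlbie is planned.
 */
struct flush_plan plan_pid_flush(bool needs_workaround)
{
    struct flush_plan p = { .count = 0 };

    plan_add(&p, STEP_TLBIE_TARGET_PID);
    if (needs_workaround) {
        /* PID=0 is not dropped by the TLB snoop filter, so the ERAT is flushed. */
        plan_add(&p, STEP_TLBIE_PID0_ERAT);
        /* New stores may still have slipped in, so one more tlbie follows. */
        plan_add(&p, STEP_TLBIE_STQ_FIXUP);
    }
    return p;
}
```

Calling plan_pid_flush(true) yields the three steps in the order given in the description, while plan_pid_flush(false) yields only the normal invalidation.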

The upstream git commits to resolve the problem:

https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?h=fixes&id=677733e296b5c7a37c47da391fc70a43dc40bd67

https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?h=fixes&id=09ce98cacd51fcd0fa0af2f79d1e1d3192f4cbb0

https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?h=fixes&id=047e6575aec71d75b765c22111820c4776cd1c43

bugproxy (bugproxy) on 2019-10-15
tags: added: architecture-ppc64le bugnameltc-181791 severity-medium targetmilestone-inin---
Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → linux (Ubuntu)
Changed in ubuntu-power-systems:
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
importance: Undecided → Medium
Manoj Iyer (manjo) on 2019-10-18
Changed in linux (Ubuntu):
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Manoj Iyer (manjo)
importance: Undecided → Medium
Manoj Iyer (manjo) wrote :

IBM, can you please test the Eoan kernel in this PPA: https://launchpad.net/~ubuntu-power-triage/+archive/ubuntu/lp1848193/

Also, please share the following information with us:

1. Your testcase, and instructions on how to run the test cases.
2. Your system config information and firmware levels.
3. Is there any significant customer impact if this is not backported to earlier releases?

Manoj Iyer (manjo) on 2019-10-21
Changed in linux (Ubuntu):
status: New → In Progress
Changed in ubuntu-power-systems:
status: New → In Progress
Manoj Iyer (manjo) on 2019-10-22
Changed in linux (Ubuntu):
status: In Progress → Incomplete
Changed in ubuntu-power-systems:
status: In Progress → Incomplete
Manoj Iyer (manjo) wrote :

IBM, any progress on the test results?

------- Comment From <email address hidden> 2019-12-16 05:06 EDT-------
I tested the PPA kernel and it works. I also tried an earlier kernel, from before the fix, to verify.

You can test this with the testcase from the kernel selftests: tools/testing/selftests/powerpc/mm/tlbie_test. I used a Witherspoon with OP920. When the bug is hit, the end result can be unexpected results.

Andrew Cloke (andrew-cloke) wrote :

Thanks for testing the PPA kernel. Moving back to "triaged".

Changed in ubuntu-power-systems:
status: Incomplete → Triaged
Frank Heimes (fheimes) on 2019-12-16
Changed in linux (Ubuntu):
status: Incomplete → Triaged
Manoj Iyer (manjo) wrote :
Changed in linux (Ubuntu Eoan):
importance: Undecided → Medium
status: New → Fix Released
Changed in linux (Ubuntu):
status: Triaged → Fix Released
Changed in ubuntu-power-systems:
status: Triaged → Fix Released
Manoj Iyer (manjo) wrote :

$ uname -a
Linux bobone 5.3.0-24-generic #26-Ubuntu SMP Thu Nov 14 01:12:58 UTC 2019 ppc64le ppc64le ppc64le GNU/Linux

$ sudo ./tlbie_test
start_cpu list:0
number of worker threads:4 + 1 snapshot thread
Allocated address:0x000075e535810000 + secondary map:0x000075e535570000
logdir at : /tmp/logdir-26561
Timeout: 1200 seconds
=================================
     Starting Test
     Mon Dec 16 16:59:06 2019
=================================

$

Frank Heimes (fheimes) wrote :

Cool, many thx Manoj!
