[public] SBP (Static Branch Prediction) problem - disable it

Bug #1006217 reported by mahmoh on 2012-05-30
This bug affects 1 person
Affects: The Eilt project | Importance: Undecided | Assigned to: Ike Panhc
Affects: linux-armadaxp (Ubuntu) | Importance: Undecided | Assigned to: Jani Monoses

Bug Description

A very rare CPU issue in the SBP (Static Branch Prediction) causes wrong execution. It was reproduced while doing intensive native kernel builds.

mahmoh (mahmoh) wrote :
Changed in eilt:
assignee: nobody → Ike Panhc (ikepanhc)
visibility: private → public
summary: - SBP (Static Branch Prediction) problem - disable it
+ [public] SBP (Static Branch Prediction) problem - disable it
Seif Mazareeb (seif-l) wrote :

Please apply this as well, since this is an erratum.

Tawfik Bayouk (tawfik) wrote :

I am adding some explanation about the issue in case it might explain other unexpected behaviors.
This issue was found while doing intensive native kernel builds and in most cases showed up as an ICE (internal compiler error).
It has also been found in a Java benchmark, where it was reproduced and root-caused easily because it was consistent.

The root cause of the bug is a wrong branch decision in the Static branch prediction feature of the BPU.
The issue was root caused and reproduced in logical simulations.
As a workaround for this bug, the SBP feature was disabled (attached patch performs this).
No performance impact and no behavior change are expected.
This bug is already fixed in the new silicon revision.
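
As an illustration of the kind of workaround described above, here is a minimal sketch of disabling a feature by clearing one bit in a control register with a read-modify-write. It is not the attached patch: the register, the bit position, and the names `fake_ctrl_reg`, `SBP_ENABLE_BIT`, `sbp_disable`, and `sbp_is_enabled` are all assumptions made for this example, and the register is simulated with an ordinary variable so the code can run in user space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL bit position: the real SBP control bit and register are
 * defined by the vendor erratum and are not given in this report. */
#define SBP_ENABLE_BIT (UINT32_C(1) << 0)

/* Stand-in for a CPU control register. Real kernel code would read and
 * write a hardware register instead of a variable in memory. */
typedef struct {
    uint32_t value;
} fake_ctrl_reg;

/* Clear the (hypothetical) SBP enable bit, leaving all other bits intact.
 * Returns the new register value. */
uint32_t sbp_disable(fake_ctrl_reg *reg)
{
    assert(reg != NULL);
    reg->value &= ~SBP_ENABLE_BIT;
    return reg->value;
}

/* Report whether the (hypothetical) SBP enable bit is currently set. */
int sbp_is_enabled(const fake_ctrl_reg *reg)
{
    assert(reg != NULL);
    return (reg->value & SBP_ENABLE_BIT) != 0;
}
```

The point of the read-modify-write is that only the one feature bit changes, so other settings held in the same register are preserved.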

tags: added: patch
Ike Panhc (ikepanhc) on 2012-05-31
tags: added: ike-radar
Ike Panhc (ikepanhc) on 2012-06-04
Changed in linux-armadaxp (Ubuntu):
assignee: nobody → Ike Panhc (ikepanhc)
Changed in eilt:
status: New → In Progress
Changed in linux-armadaxp (Ubuntu):
status: New → In Progress
Li Li (lli5) wrote :

@Ike, As the vendor has confirmed this is a CPU related issue and found the root cause already, I think we should merge this into the kernel tree, with a comment stating this should be removed later (once the old silicon is replaced by the new one). What's your thought?

Jani Monoses (jani) wrote :

Sorry, I assigned myself the wrong bug :(

Changed in linux-armadaxp (Ubuntu):
assignee: Ike Panhc (ikepanhc) → Jani Monoses (jani)
Changed in eilt:
assignee: Ike Panhc (ikepanhc) → Jani Monoses (jani)
assignee: Jani Monoses (jani) → Ike Panhc (ikepanhc)
Jani Monoses (jani) wrote :

Added patch to the oem-armadaxp.git tree on kernel.ubuntu.com

Adam Conrad (adconrad) wrote :

This has been in precise-proposed for quite a while now, though it was obviously not accepted in any formal SRUish way, or the bug status would reflect that. Can you guys verify that the binaries do indeed pass some regression testing, as well as fix the bug at hand and update the bug tag as appropriate?

In the future, could you perhaps adopt the kernel team's concept of "tracking bugs" for your SRUs? (See https://bugs.launchpad.net/bugs/1002329 as an example), they really make it much simpler for people to see what tasks are done or need doing so we can get updates through the pipe as efficiently as we can.

Changed in linux-armadaxp (Ubuntu):
status: In Progress → Fix Committed
tags: added: verification-needed
Adam Conrad (adconrad) wrote :

It was pointed out to me that there is a tracking bug (https://bugs.launchpad.net/ubuntu/+source/linux-armadaxp/+bug/1004556) and it just wasn't in the changelog, so we can probably move verification discussion there. Just leaving this comment here as a pointer. ;)

Li Li (lli5) wrote :

@Adam, I am trying to get some (regression) test cases from Marvell, as the bug is really hard to reproduce by ourselves. As Tawfik stated, they have found the root cause internally and it will be fixed in the new silicon revision. So basically the two patches are software workarounds for the current revision and can hopefully be removed once we move to the new hardware. I will ping them to get a clearer picture and update the status later.

Li Li (lli5) wrote :

According to Maen, this patch is a software fix for AXP A0 only. The HW issue should be fixed in Armada XP B0 (the next revision). So we need to maintain the software fix for AXP A0 (until B0 replaces it).
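
The A0/B0 split can be pictured as a revision-gated workaround. The sketch below is only an illustration under assumptions: the enum values and the function `axp_needs_sbp_workaround` are hypothetical names, and this thread does not describe how the kernel identifies Armada XP revisions. The changelog entry in this report refers to a build-time option (CONFIG_SHEEVA_ERRATA_ARM_CPU_6409), so a runtime check like this one is a different mechanism, shown here just to make the idea concrete.

```c
#include <stdbool.h>

/* HYPOTHETICAL revision identifiers; how the real kernel identifies
 * Armada XP revisions is not described in this report. */
enum axp_revision {
    AXP_REV_A0,
    AXP_REV_B0
};

/* Per the report: A0 needs the software workaround, while B0 is
 * expected to carry the hardware fix. */
bool axp_needs_sbp_workaround(enum axp_revision rev)
{
    return rev == AXP_REV_A0;
}
```

Keeping the decision in one small function would make the workaround easy to drop once A0 parts are no longer supported.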

To reproduce the issue, we can run the derby subtest of the SPECJVM benchmark. "the test will simply crash w/o the WA mentioned below." - Shadi

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-armadaxp - 3.2.0-1603.6

---------------
linux-armadaxp (3.2.0-1603.6) precise-proposed; urgency=low

  [ Jani Monoses ]

  * SAUCE: Add CONFIG_SHEEVA_ERRATA_ARM_CPU_6409
    - LP: #1006217

  [ Ubuntu: 3.2.0-24.39 ]

  * Release Tracking Bug
    - LP: #1002329
  * SAUCE: ata_piix: add a disable_driver option
    - LP: #994870

  [ Ubuntu: 3.2.0-24.38 ]

  * Release Tracking Bug
    - LP: #991925
  * linux: add Build-Depends for libnewt-dev, to enable perf TUI support
    - LP: #981717
  * SAUCE: Allow filtering of cpufreq drivers
    - LP: #984288
  * x86 bpf_jit: fix a bug in emitting the 16-bit immediate operand of AND
    - LP: #981162
  * tg3: Fix 5717 serdes powerdown problem
    - LP: #981162
  * sky2: dont overwrite settings for PHY Quick link
    - LP: #981162
  * rose_dev: fix memcpy-bug in rose_set_mac_address
    - LP: #981162
  * net: usb: cdc_eem: fix mtu
    - LP: #981162
  * Fix non TBI PHY access; a bad merge undid bug fix in a previous commit.
    - LP: #981162
  * ASoC: wm8994: Update WM8994 DCS calibration
    - LP: #981162
  * mtd: ixp4xx: oops in ixp4xx_flash_probe
    - LP: #981162
  * mtd: mips: lantiq: reintroduce support for cmdline partitions
    - LP: #981162
  * mtd: nand: gpmi: use correct member for checking NAND_BBT_USE_FLASH
    - LP: #981162
  * mtd: sst25l: initialize writebufsize
    - LP: #981162
  * mtd: block2mtd: initialize writebufsize
    - LP: #981162
  * mtd: lart: initialize writebufsize
    - LP: #981162
  * mtd: m25p80: set writebufsize
    - LP: #981162
  * ACPI: Do cpufreq clamping for throttling per package v2
    - LP: #981162
  * PNPACPI: Fix device ref leaking in acpi_pnp_match
    - LP: #981162
  * modpost: fix ALL_INIT_DATA_SECTIONS
    - LP: #981162
  * genirq: Adjust irq thread affinity on IRQ_SET_MASK_OK_NOCOPY return
    value
    - LP: #981162
  * tracing: Fix ftrace stack trace entries
    - LP: #981162
  * tracing: Fix ent_size in trace output
    - LP: #981162
  * m68k/mac: Add missing platform check before registering platform
    devices
    - LP: #981162
  * mac80211: fix possible tid_rx->reorder_timer use after free
    - LP: #981162
  * rtlwifi: rtl8192ce: rtl8192cu: rtl8192de: Fix low-gain setting when
    scanning
    - LP: #981162
  * drm: Validate requested virtual size against allocated fb size
    - LP: #981162
  * drm/radeon/kms: fix fans after resume
    - LP: #981162
  * drm/i915: no-lvds quirk on MSI DC500
    - LP: #981162
  * drm/i915: Add lock on drm_helper_resume_force_mode
    - LP: #981162
  * drm/i915: quirk away broken OpRegion VBT
    - LP: #981162
  * r8169: runtime resume before shutdown.
    - LP: #981162
  * target: Fix unsupported WRITE_SAME sense payload
    - LP: #981162
  * kgdb,debug_core: pass the breakpoint struct instead of address and
    memory
    - LP: #981162
  * kgdbts: Fix kernel oops with CONFIG_DEBUG_RODATA
    - LP: #981162
  * kgdbts: (1 of 2) fix single step awareness to work correctly with SMP
    - LP: #981162
  * kgdbts: (2 of 2) fix single step awareness to work correctly with SMP
    - LP: #981162
  * x86,kgdb: Fix DEBUG_RODATA limitation using text_poke()
    - LP: #981162
  * CIFS: Fix V...

Changed in linux-armadaxp (Ubuntu):
status: Fix Committed → Fix Released
Ike Panhc (ikepanhc) on 2012-08-21
Changed in eilt:
status: In Progress → Fix Released