workaround cavium thunderx silicon erratum 23144

Bug #1589704 reported by dann frazier
This bug affects 1 person
Affects          Status         Importance   Assigned to    Milestone
linux (Ubuntu)   Fix Released   High         dann frazier
Xenial           Fix Released   High         dann frazier

Bug Description

[Impact]
This affects two-socket Cavium ThunderX systems using pass-1.1 silicon and can result in I/O hangs.

[Test Case]
$ dmesg | grep "ITS command queue timeout"
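
A minimal sketch of how the check above might be scripted, assuming that a matching line means the symptom is present and that no match means it was not seen. The function name `its_timeout_seen` is hypothetical; it reads kernel log text on stdin so it can be fed from `dmesg` or from a saved log.

```shell
#!/bin/bash
# Hypothetical helper: succeeds (exit 0) if the ITS timeout message
# from the test case appears in the log text read from stdin.
its_timeout_seen() {
    grep -q "ITS command queue timeout"
}

# Usage sketch (assumes dmesg is readable by the current user):
#   if dmesg | its_timeout_seen; then
#       echo "symptom present"
#   else
#       echo "symptom not seen"
#   fi
```

Note that a clean result only shows the message has not appeared since boot; it does not by itself prove the hang cannot occur.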

[Regression Risk]
The workaround lives in GICv3-specific code and is only activated when the hardware revision in the IIDR matches ThunderX pass 1.x silicon, so the risk to other platforms is minimal.
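
The gating described above amounts to a masked comparison of the ITS identification register against a known value. A rough shell sketch of that idea follows. The function name is hypothetical, and the ID and mask constants are assumptions recalled from the upstream quirk table; they are not stated in this report and should be checked against drivers/irqchip/irq-gic-v3-its.c rather than taken as authoritative.

```shell
#!/bin/bash
# Assumed values (not given in this report): ID and mask for
# ThunderX pass 1.x. The mask clears bits 15:12, so those bits
# are ignored in the comparison.
THUNDERX_PASS1_IIDR=0xa100034c
THUNDERX_PASS1_MASK=0xffff0fff

# Hypothetical helper: succeeds if the IIDR value in $1 equals the
# expected ID once the masked-out bits are ignored.
iidr_matches_thunderx_pass1() {
    local iidr=$(( $1 ))
    (( (iidr & THUNDERX_PASS1_MASK) == (THUNDERX_PASS1_IIDR & THUNDERX_PASS1_MASK) ))
}

# Usage sketch:
#   iidr_matches_thunderx_pass1 0xa100134c && echo "workaround would apply"
```

Because the match is keyed on the hardware ID, a platform with a different ID never enters the workaround path, which is the basis for the low regression risk.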

dann frazier (dannf)
Changed in linux (Ubuntu):
status: New → Confirmed
assignee: nobody → dann frazier (dannf)
importance: Undecided → High
Changed in linux (Ubuntu Xenial):
status: New → Confirmed
importance: Undecided → High
assignee: nobody → dann frazier (dannf)
description: updated
dann frazier (dannf)
Changed in linux (Ubuntu Xenial):
status: Confirmed → In Progress
dann frazier (dannf)
summary: - workaround cavium thunderx silicon erratum
+ workaround cavium thunderx silicon erratum 23144
Robert Richter (rric.cavium) wrote :

Upstream commit (in v4.7-rc2):

 fbf8f40e1658 irqchip/gicv3-its: numa: Enable workaround for Cavium thunderx erratum 23144

penalvch (penalvch)
tags: added: cherry-pick
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Kamal Mostafa (kamalmostafa) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'.

If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
dann frazier (dannf) wrote :

I rebuilt d-i against the kernel in proposed and booted it a couple of times. I have not reproduced this issue, so marking it verified.

~ # uname -a
Linux cvm3 4.4.0-25-generic #44-Ubuntu SMP Fri Jun 10 18:15:04 UTC 2016 aarch64 GNU/Linux
~ # dmesg | grep -i soft
[ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-di root=UUID=bc611283-1f38-4993-a2b9-883922c7ed1f ro earlycon=pl011,0x87e024000000 hardlockup_all_cpu_backtrace=1 softlockup_all_cpu_backtrace=1 earlycon=pl011,0x87e024000000 apt-setup/proposed=true
[ 0.000000] software IO TLB [mem 0xfbfed000-0xfffed000] (64MB) mapped at [ffff8000fb1ed000-ffff8000ff1ecfff]
[ 0.229859] CPU features: detected feature: Software prefetching using PRFM
[ 174.913338] xor: measuring software checksum speed

tags: added: verification-done-xenial
removed: verification-needed-xenial
tags: added: verification-needed-xenial
removed: verification-done-xenial
dann frazier (dannf) wrote :

Oops, comment #3 was posted against the wrong bug. This one still requires verification.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-25.44

---------------
linux (4.4.0-25.44) xenial; urgency=low

  [ Kamal Mostafa ]

  * Release Tracking Bug
    - LP: #1591289

  * Xenial update to v4.4.13 stable release (LP: #1590455)
    - MIPS64: R6: R2 emulation bugfix
    - MIPS: math-emu: Fix jalr emulation when rd == $0
    - MIPS: MSA: Fix a link error on `_init_msa_upper' with older GCC
    - MIPS: Don't unwind to user mode with EVA
    - MIPS: Avoid using unwind_stack() with usermode
    - MIPS: Fix siginfo.h to use strict posix types
    - MIPS: Fix uapi include in exported asm/siginfo.h
    - MIPS: Fix watchpoint restoration
    - MIPS: Flush highmem pages in __flush_dcache_page
    - MIPS: Handle highmem pages in __update_cache
    - MIPS: Sync icache & dcache in set_pte_at
    - MIPS: ath79: make bootconsole wait for both THRE and TEMT
    - MIPS: Reserve nosave data for hibernation
    - MIPS: Loongson-3: Reserve 32MB for RS780E integrated GPU
    - MIPS: Use copy_s.fmt rather than copy_u.fmt
    - MIPS: Fix MSA ld_*/st_* asm macros to use PTR_ADDU
    - MIPS: Prevent "restoration" of MSA context in non-MSA kernels
    - MIPS: Disable preemption during prctl(PR_SET_FP_MODE, ...)
    - MIPS: ptrace: Fix FP context restoration FCSR regression
    - MIPS: ptrace: Prevent writes to read-only FCSR bits
    - MIPS: Fix sigreturn via VDSO on microMIPS kernel
    - MIPS: Build microMIPS VDSO for microMIPS kernels
    - MIPS: lib: Mark intrinsics notrace
    - MIPS: VDSO: Build with `-fno-strict-aliasing'
    - affs: fix remount failure when there are no options changed
    - ASoC: ak4642: Enable cache usage to fix crashes on resume
    - Input: uinput - handle compat ioctl for UI_SET_PHYS
    - ARM: mvebu: fix GPIO config on the Linksys boards
    - ARM: dts: at91: fix typo in sama5d2 PIN_PD24 description
    - ARM: dts: exynos: Add interrupt line to MAX8997 PMIC on exynos4210-trats
    - ARM: dts: imx35: restore existing used clock enumeration
    - ath9k: Add a module parameter to invert LED polarity.
    - ath9k: Fix LED polarity for some Mini PCI AR9220 MB92 cards.
    - ath10k: fix debugfs pktlog_filter write
    - ath10k: fix firmware assert in monitor mode
    - ath10k: fix rx_channel during hw reconfigure
    - ath10k: fix kernel panic, move arvifs list head init before htt init
    - ath5k: Change led pin configuration for compaq c700 laptop
    - hwrng: exynos - Fix unbalanced PM runtime put on timeout error path
    - rtlwifi: rtl8723be: Add antenna select module parameter
    - rtlwifi: btcoexist: Implement antenna selection
    - rtlwifi: Fix logic error in enter/exit power-save mode
    - rtlwifi: pci: use dev_kfree_skb_irq instead of kfree_skb in
      rtl_pci_reset_trx_ring
    - aacraid: Relinquish CPU during timeout wait
    - aacraid: Fix for aac_command_thread hang
    - aacraid: Fix for KDUMP driver hang
    - hwmon: (ads7828) Enable internal reference
    - mfd: intel-lpss: Save register context on suspend
    - mfd: intel_soc_pmic_core: Terminate panel control GPIO lookup table
      correctly
    - PM / Runtime: Fix error path in pm_runtime_force_resume()
    - cpuidle: Indicate when a device has been unregiste...

Changed in linux (Ubuntu):
status: Confirmed → Fix Released
dann frazier (dannf) wrote :

Verification results from partner:

I was able to successfully test your kernel on a crb2s pass1.1 system.

This was tested:

 * started correct kernel,

 * executed on crb2s pass1.1,

 * 96 cpus detected,

 * node info in sysfs (indicating that numa was detected),

 * all pci bridges found (node 0 and node 1),

 * correct devicetree model provided by fw,

 * dtb containing node info,

 * booting to the prompt,

 * network detected on 10GB port on node 1 pci bridge (ssh working).
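
Two of the checks in the list above (CPU count and NUMA nodes in sysfs) could be scripted roughly as follows. This is only a sketch: the partner's actual test scripts are not reproduced in this report, and both function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print how many nodeN directories exist under
# the given directory (defaults to the sysfs node location).
count_numa_nodes() {
    local base=${1:-/sys/devices/system/node}
    local n=0 d
    for d in "$base"/node[0-9]*; do
        if [ -d "$d" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Hypothetical helper: count "processor" lines in cpuinfo-style
# text read from stdin.
count_cpus() {
    grep -c '^processor'
}

# Usage sketch on a live system:
#   count_numa_nodes
#   count_cpus < /proc/cpuinfo
```

On the two-socket system described here, one would expect these to report 2 nodes and 96 CPUs, in line with the list above.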

See below for some logs.

File: ./linux-common-uname.sh
------------------------------------------------------------
Linux arm64 4.4.0-25-generic #44~14.04.1-Ubuntu SMP Mon Jun 13 15:14:31 UTC 2016 aarch64 aarch64 aarch64 GNU/Linux
------------------------------------------------------------
PASSED: ./linux-common-uname.sh
============================================================
File: ./linux-common-cpurev-dmesg.sh
------------------------------------------------------------
[ 0.000000] Boot CPU: AArch64 Processor [430f0a11]
------------------------------------------------------------
PASSED: ./linux-common-cpurev-dmesg.sh
============================================================
File: ./linux-common-cpurev-cpuinfo.sh
------------------------------------------------------------
CPU architecture: 8
CPU implementer : 0x43
CPU part : 0x0a1
CPU revision : 1
CPU variant : 0x0
------------------------------------------------------------
PASSED: ./linux-common-cpurev-cpuinfo.sh
============================================================
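
The register value in the boot log and the fields in the cpuinfo output above are two views of the same ID: the standard ARMv8 MIDR_EL1 layout splits 0x430f0a11 into implementer, variant, architecture, part number, and revision. A small sketch of that decoding, with a hypothetical function name:

```shell
#!/bin/bash
# Hypothetical helper: decode a MIDR_EL1 value given in hex
# (with or without a leading 0x) into its bit fields.
decode_midr() {
    local midr=$(( 16#${1#0x} ))
    printf 'implementer=0x%02x variant=0x%x architecture=0x%x part=0x%03x revision=%d\n' \
        $(( (midr >> 24) & 0xff )) \
        $(( (midr >> 20) & 0xf )) \
        $(( (midr >> 16) & 0xf )) \
        $(( (midr >> 4) & 0xfff )) \
        $(( midr & 0xf ))
}

decode_midr 430f0a11
# prints: implementer=0x43 variant=0x0 architecture=0xf part=0x0a1 revision=1
```

The implementer, part, variant, and revision fields agree with the cpuinfo lines. The architecture field value 0xf indicates that the architecture is identified through the CPU ID registers, which is why cpuinfo reports it separately as 8.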
File: ./linux-numa-cpuinfo.sh
------------------------------------------------------------
96
processor : 95
------------------------------------------------------------
PASSED: ./linux-numa-cpuinfo.sh
============================================================
File: ./linux-numa-sysfs-nodes.sh
------------------------------------------------------------
/sys/devices/system/node/node0
/sys/devices/system/node/node1
------------------------------------------------------------
PASSED: ./linux-numa-sysfs-nodes.sh
============================================================
File: ./linux-numa-pci-bridges-ecam-dmesg.sh
------------------------------------------------------------
pci 0000:00:01.0: PCI bridge to [bus 01]
pci 0000:00:14.0: PCI bridge to [bus 02]
pci 0000:00:15.0: PCI bridge to [bus 03]
pci 0000:00:16.0: PCI bridge to [bus 04]
pci 0002:00:02.0: PCI bridge to [bus 01]
pci 0004:00:01.0: PCI bridge to [bus 01]
pci 0004:00:14.0: PCI bridge to [bus 02]
pci 0004:00:15.0: PCI bridge to [bus 03]
pci 0004:00:16.0: PCI bridge to [bus 04]
pci 0006:00:02.0: PCI bridge to [bus 01]
------------------------------------------------------------
PASSED: ./linux-numa-pci-bridges-ecam-dmesg.sh
============================================================
File: ./linux-numa-pci-bridges-ecam-lspci.sh
------------------------------------------------------------
0000:00:01.0 PCI bridge: Cavium Networks Device a002 (rev 01)
0000:00:14.0 PCI bridge: Cavium Networks Device a002 (rev 01)
0000:00:15.0 PCI bridge: Cavium Networks Device a002 (rev 01)
0000:00:16.0 PCI bridge: Cavium Networks Device a002 (rev 01)
0002:00:02.0 PCI br...


tags: added: verification-done-xenial
removed: verification-needed-xenial
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-28.47

---------------
linux (4.4.0-28.47) xenial; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1595874

  * Linux netfilter local privilege escalation issues (LP: #1595350)
    - netfilter: x_tables: don't move to non-existent next rule
    - netfilter: x_tables: validate targets of jumps
    - netfilter: x_tables: add and use xt_check_entry_offsets
    - netfilter: x_tables: kill check_entry helper
    - netfilter: x_tables: assert minimum target size
    - netfilter: x_tables: add compat version of xt_check_entry_offsets
    - netfilter: x_tables: check standard target size too
    - netfilter: x_tables: check for bogus target offset
    - netfilter: x_tables: validate all offsets and sizes in a rule
    - netfilter: x_tables: don't reject valid target size on some architectures
    - netfilter: arp_tables: simplify translate_compat_table args
    - netfilter: ip_tables: simplify translate_compat_table args
    - netfilter: ip6_tables: simplify translate_compat_table args
    - netfilter: x_tables: xt_compat_match_from_user doesn't need a retval
    - netfilter: x_tables: do compat validation via translate_table
    - netfilter: x_tables: introduce and use xt_copy_counters_from_user

  * Linux netfilter IPT_SO_SET_REPLACE memory corruption (LP: #1555338)
    - netfilter: x_tables: validate e->target_offset early
    - netfilter: x_tables: make sure e->next_offset covers remaining blob size
    - netfilter: x_tables: fix unconditional helper

linux (4.4.0-27.46) xenial; urgency=low

  [ Kamal Mostafa ]

  * Release Tracking Bug
    - LP: #1594906

  * Support Edge Gateway's Bluetooth LED (LP: #1512999)
    - Revert "UBUNTU: SAUCE: Bluetooth: Support for LED on Marvell modules"

linux (4.4.0-26.45) xenial; urgency=low

  [ Kamal Mostafa ]

  * Release Tracking Bug
    - LP: #1594442

  * linux: Implement secure boot state variables (LP: #1593075)
    - SAUCE: UEFI: Add secure boot and MOK SB State disabled sysctl

  * failures building userspace packages that include ethtool.h (LP: #1592930)
    - ethtool.h: define INT_MAX for userland


Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released