In ZZ-BML (POWER9):ubuntu17.04 installation Fails

Bug #1675771 reported by bugproxy on 2017-03-24
This bug affects 1 person
Affects           Status  Importance  Assigned to  Milestone
linux (Ubuntu)            Undecided   Tim Gardner
  Zesty                   Undecided   Tim Gardner

Bug Description

The Ubuntu 17.04 installation fails on ZZ-BML (POWER9): a machine check oops in debootstrap is followed by repeated "rcu_sched detected stalls" call traces.

Steps to reproduce:
1. Start the installation.

During package installation, RCU stalls are detected and the installation fails.

Firmware version : FW910.00 (UL910_006)

LOG:

Installing the base system (installer progress: 32%)
[ 466.603008] Oops: Machine check, sig: 7 [#1]
[ 466.603069] SMP NR_CPUS=2048
[ 466.603071] NUMA
[ 466.603108] Harmless Hypervisor Maintenance interrupt [Recovered]
[ 466.603111] Error detail: Processor Recovery done
[ 466.603113] HMER: 2040000000000000
[ 466.603117] Harmless Hypervisor Maintenance interrupt [Recovered]
[ 466.603119] Error detail: Processor Recovery done
[ 466.603121] HMER: 2040000000000000
[ 466.603123] Harmless Hypervisor Maintenance interrupt [Recovered]
[ 466.603125] Error detail: Processor Recovery done
[ 466.603128] HMER: 2040000000000000
[ 466.603909] PowerNV
[ 466.603963] Modules linked in: xfs jfs btrfs ntfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ipr scsi_dh_alua scsi_dh_emc scsi_dh_hp_sw scsi_dh_rdac usb_storage tg3
[ 466.604408] CPU: 16 PID: 15340 Comm: debootstrap Tainted: G M 4.10.0-13-generic #15-Ubuntu
[ 466.604580] task: c0000000031bc800 task.stack: c00000000324c000
[ 466.604704] NIP: c000000000079aa4 LR: c0000000006124c4 CTR: c00000000034a980
[ 466.604851] REGS: c00000000fe7bd80 TRAP: 0200 Tainted: G M (4.10.0-13-generic)
[ 466.605020] MSR: 9000000000209033 <SF,HV,EE,ME,IR,DR,RI,LE>
[ 466.605043] CR: 88002881 XER: 20000000
[ 466.605214] CFAR: c000000000079a8c DAR: 00000000525984f0 DSISR: 00000400 SOFTE: 1
[ 466.605214] GPR00: c0000000006124b0 c00000000324fbe0 c00000000143c700 c0000003adf3ea8c
[ 466.605214] GPR04: 00000000525984f0 0000000000000001 c00000000324fd20 ffffffffffffffff
[ 466.605214] GPR08: c000000000000000 c000000000b60000 c000000000000000 c000000000b61060
[ 466.605214] GPR12: c00000000034a980 c00000000fb89000 0000000000000000 0000000000000000
[ 466.605214] GPR16: 0000000000000000 00003ffffd8246d8 000000004b7de900 00003ffffd83fefc
[ 466.605214] GPR20: 000000004b7de8c0 000000004b7e2528 0000000000000002 00000000525984f0
[ 466.605214] GPR24: c00000000324fd70 000000000000ea8c 0000000000000000 0000000000000001
[ 466.605214] GPR28: c00a000000eb7cc0 c0000003adf3ea8c 0000000000000001 c00000000324fd20
[ 466.607454] NIP [c000000000079aa4] __copy_tofrom_user_power7+0x250/0x7cc
[ 466.607687] LR [c0000000006124c4] copy_page_from_iter+0xe4/0x2e0
[ 466.607911] Call Trace:
[ 466.608010] [c00000000324fbe0] [c0000000006124b0] copy_page_from_iter+0xd0/0x2e0 (unreliable)
[ 466.608327] [c00000000324fc50] [c00000000034bca4] pipe_write+0x514/0x560
[ 466.608556] [c00000000324fd00] [c00000000033c8ec] new_sync_write+0xec/0x150
[ 466.608786] [c00000000324fd90] [c00000000033e414] vfs_write+0xd4/0x240
[ 466.609017] [c00000000324fde0] [c00000000033ffc8] SyS_write+0x68/0x110
[ 466.609250] [c00000000324fe30] [c00000000000b184] system_call+0x38/0xe0
[ 466.609476] Instruction dump:
[ 466.609614] 38630008 409d0014 80040000 38840004 90030000 38630004 409e0014 a0040000
[ 466.609894] 38840002 b0030000 38630002 409f000c <88040000> 98030000 38600000 4e800020
[ 466.610181] ---[ end trace cea5171b162d1d13 ]---
[ 466.610372]
[ 487.618915] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 487.619114] 16-...: (1 GPs behind) idle=093/140000000000000/0 softirq=2896/2940 fqs=2626
[ 487.619259] (detected by 14, t=5252 jiffies, g=17122, c=17121, q=3265)
[ 487.619393] Task dump for CPU 16:
[ 487.619468] debootstrap R running task 0 15340 8860 0x00042004
[ 487.619617] Call Trace:
[ 487.619677] [c00000000324faf0] [c00000000324fdd0] 0xc00000000324fdd0 (unreliable)
[ 550.642918] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 550.643121] 16-...: (1 GPs behind) idle=093/140000000000000/0 softirq=2896/2940 fqs=10504
[ 550.643269] (detected by 26, t=21008 jiffies, g=17122, c=17121, q=6735)
[ 550.643404] Task dump for CPU 16:
[ 550.643481] debootstrap R running task 0 15340 8860 0x00042004
[ 550.643635] Call Trace:
[ 550.643694] [c00000000324faf0] [c00000000324fdd0] 0xc00000000324fdd0 (unreliable)
[ 613.666915] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 613.667029] 16-...: (1 GPs behind) idle=093/140000000000000/0 softirq=2896/2940 fqs=18383
[ 613.667099] (detected by 8, t=36764 jiffies, g=17122, c=17121, q=10254)
[ 613.667229] Task dump for CPU 16:
[ 613.667303] debootstrap R running task 0 15340 8860 0x00042004
[ 613.667452] Call Trace:
[ 613.667509] [c00000000324faf0] [c00000000324fdd0] 0xc00000000324fdd0 (unreliable)
[ 676.690913] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 676.691106] 16-...: (1 GPs behind) idle=093/140000000000000/0 softirq=2896/2940 fqs=26262
[ 676.691252] (detected by 5, t=52520 jiffies, g=17122, c=17121, q=13570)
[ 676.691383] Task dump for CPU 16:
[ 676.691457] debootstrap R running task 0 15340 8860 0x00042004
[ 676.691606] Call Trace:
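
The MSR and HMER lines in the oops print raw register values. As a minimal sketch of how the flag list is derived, the Python below tests a subset of MSR bits, using the bit positions from the Linux powerpc MSR_*_LG definitions (LSB-0 numbering), and reproduces <SF,HV,EE,ME,IR,DR,RI,LE> from 9000000000209033. It also lists the set HMER bits in the MSB-0 numbering used by the Power ISA; which event each HMER bit stands for is not decoded here. The helper names are illustrative, not kernel functions.

```python
# Subset of MSR bits, as (LSB-0 bit position, name), in the order the
# kernel prints them. Bits not listed here are ignored by decode_msr().
MSR_FLAGS = [
    (63, "SF"), (60, "HV"), (25, "VEC"), (23, "VSX"),
    (15, "EE"), (14, "PR"), (13, "FP"), (12, "ME"),
    (5, "IR"), (4, "DR"), (1, "RI"), (0, "LE"),
]


def decode_msr(value: int) -> str:
    """Return the names of the listed MSR flags that are set in `value`."""
    names = [name for bit, name in MSR_FLAGS if (value >> bit) & 1]
    return "<" + ",".join(names) + ">"


def set_bits_msb0(value: int, width: int = 64) -> list[int]:
    """List set bits using Power ISA MSB-0 numbering (bit 0 = most significant)."""
    return sorted(width - 1 - i for i in range(width) if (value >> i) & 1)


print(decode_msr(0x9000000000209033))          # MSR from the oops
print(set_bits_msb0(0x2040000000000000))       # HMER from the HMI messages
```

For this log the HMER value has MSB-0 bits 2 and 9 set; the meaning of each bit should be taken from the processor documentation rather than inferred from the bit number.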

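All four rcu_sched reports carry the same grace-period numbers (g=17122, c=17121), so one grace period never completed while CPU 16 stayed stuck in debootstrap after the machine check. The sketch below parses those report lines and converts t= from jiffies to seconds. It assumes HZ=250, which the log is consistent with: successive t= values grow by 15756 jiffies while the printk timestamps advance by about 63.02 s, and 15756 / 250 = 63.024. Under that assumption the first report fires about 21 s into the stall, in line with the 21-second default of CONFIG_RCU_CPU_STALL_TIMEOUT (assuming it was left at its default).

```python
import re

# Assumption: CONFIG_HZ=250 (inferred from the log, not stated in it).
HZ = 250

STALL_LOG = """\
[ 487.619259] (detected by 14, t=5252 jiffies, g=17122, c=17121, q=3265)
[ 550.643269] (detected by 26, t=21008 jiffies, g=17122, c=17121, q=6735)
[ 613.667099] (detected by 8, t=36764 jiffies, g=17122, c=17121, q=10254)
[ 676.691252] (detected by 5, t=52520 jiffies, g=17122, c=17121, q=13570)
"""

LINE_RE = re.compile(
    r"\[\s*([\d.]+)\] \(detected by (\d+), t=(\d+) jiffies, g=(\d+), c=(\d+)"
)


def parse_stalls(text):
    """Yield (timestamp_s, detecting_cpu, t_jiffies, g, c) per stall report."""
    for m in LINE_RE.finditer(text):
        yield (float(m.group(1)), int(m.group(2)), int(m.group(3)),
               int(m.group(4)), int(m.group(5)))


stalls = list(parse_stalls(STALL_LOG))
for ts, cpu, t, g, c in stalls:
    # t counts jiffies since the stalled grace period started.
    print(f"{ts:11.3f}  detected by CPU {cpu:2d}  stalled {t / HZ:6.1f} s  g={g} c={c}")

# Spacing between reports: printk timestamps vs. growth of t= in jiffies.
gaps = [round(b[0] - a[0], 1) for a, b in zip(stalls, stalls[1:])]
jiffy_deltas = [b[2] - a[2] for a, b in zip(stalls, stalls[1:])]
```

If the timestamp gaps and the jiffy deltas divided by the assumed HZ disagree, the HZ assumption is wrong for that kernel.
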
== Comment: #5 - Breno Henrique Leitao <email address hidden> - 2017-03-24 08:09:57 ==
We need to have these patches included in 17.04:

https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?h=fixes&id=1363875bdb6317a2d0798284d7aaf320f0782f6d

https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?h=fixes&id=c1bbf387d6191e6e18f3adc4db45b922822c2ba4

https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?h=fixes&id=7b9f71f974a12740e79e918cfd58c2fce0b5b580
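
One way to check whether a given kernel tree already carries these fixes is sketched below. It extracts the commit hashes from the cgit URLs above and searches a local clone's history for them. A cherry-picked backport gets a new hash, so the sketch assumes, as a hypothesis rather than a guarantee, that backports record the upstream hash in their commit message (for example in a "(cherry picked from commit ...)" line). The helper names are illustrative.

```python
import subprocess
from urllib.parse import parse_qs, urlparse

# The three upstream fixes listed in comment #5.
PATCH_URLS = [
    "https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?h=fixes&id=1363875bdb6317a2d0798284d7aaf320f0782f6d",
    "https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?h=fixes&id=c1bbf387d6191e6e18f3adc4db45b922822c2ba4",
    "https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?h=fixes&id=7b9f71f974a12740e79e918cfd58c2fce0b5b580",
]


def commit_id(url: str) -> str:
    """Extract the commit hash from the `id` query parameter of a cgit URL."""
    return parse_qs(urlparse(url).query)["id"][0]


def find_backport(repo: str, upstream_id: str) -> list[str]:
    """Return hashes of commits in `repo` whose message mentions `upstream_id`.

    Assumes backports quote the upstream hash in their commit message.
    """
    result = subprocess.run(
        ["git", "-C", repo, "log", "--format=%H", "-F", f"--grep={upstream_id}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()
```

Usage, from a clone of the kernel tree being checked: call `find_backport(path, commit_id(url))` for each entry in `PATCH_URLS`; an empty list means no commit message in that history mentions the upstream hash.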


bugproxy (bugproxy) on 2017-03-24
tags: added: architecture-ppc64le bugnameltc-152883 severity-critical targetmilestone-inin1704
Changed in ubuntu:
assignee: nobody → Taco Screen team (taco-screen-team)
affects: ubuntu → linux (Ubuntu)

Leann,

More kernel patches supporting upcoming hardware.

                  Michael

On 03/24/2017 06:19 AM, Launchpad Bug Tracker wrote:
> bugproxy (bugproxy) has assigned this bug to you for Ubuntu:

Tim Gardner (timg-tpi) on 2017-03-24
Changed in linux (Ubuntu Zesty):
assignee: Taco Screen team (taco-screen-team) → Tim Gardner (timg-tpi)
status: New → Fix Committed

------- Comment From <email address hidden> 2017-03-24 14:57 EDT-------
*** Bug 152890 has been marked as a duplicate of this bug. ***

bugproxy (bugproxy) on 2017-03-28
tags: removed: bugnameltc-152883 severity-critical
bugproxy (bugproxy) on 2017-04-03
tags: added: bugnameltc-152883 severity-critical
Launchpad Janitor (janitor) wrote:

This bug was fixed in the package linux - 4.10.0-15.17

---------------
linux (4.10.0-15.17) zesty; urgency=low

  [ Tim Gardner ]

  * Release Tracking Bug
    - LP: #1675868

  * In ZZ-BML (POWER9):ubuntu17.04 installation Fails (LP: #1675771)
    - powerpc/64s: fix handling of non-synchronous machine checks
    - powerpc/64s: allow machine check handler to set severity and initiator
    - powerpc/64s: POWER9 machine check handler

  * [Feature] R3 mwait support for Knights Mill (LP: #1637550)
    - x86/cpufeature: Enable RING3MWAIT for Knights Landing
    - x86/cpufeature: Enable RING3MWAIT for Knights Mill
    - x86/msr: Add MSR_MISC_FEATURE_ENABLES and RING3MWAIT bit
    - x86/elf: Add HWCAP2 to expose ring 3 MONITOR/MWAIT
    - x86/cpufeature: Add RING3MWAIT to CPU features

  * [Feature] GLK:New device IDs (LP: #1645951)
    - mfd: intel-lpss: Add Intel Gemini Lake PCI IDs
    - pwm: lpss: Add Intel Gemini Lake PCI ID
    - i2c: i801: Add support for Intel Gemini Lake
    - spi: pxa2xx: Add support for Intel Gemini Lake
    - [Config] CONFIG_PINCTRL_GEMINILAKE=m
    - pinctrl: intel: Add Intel Gemini Lake pin controller support

  * Zesty update to v4.10.5 stable release (LP: #1675032)
    - net/mlx5e: Register/unregister vport representors on interface attach/detach
    - net/mlx5e: Do not reduce LRO WQE size when not using build_skb
    - net/mlx5e: Fix broken CQE compression initialization
    - net/mlx5e: Update MPWQE stride size when modifying CQE compress state
    - net/mlx5e: Fix wrong CQE decompression
    - vxlan: correctly validate VXLAN ID against VXLAN_N_VID
    - vti6: return GRE_KEY for vti6
    - vxlan: don't allow overwrite of config src addr
    - ipv4: add missing initialization for flowi4_uid
    - ipv4: mask tos for input route
    - sctp: set sin_port for addr param when checking duplicate address
    - net sched actions: decrement module reference count after table flush.
    - l2tp: avoid use-after-free caused by l2tp_ip_backlog_recv
    - vxlan: lock RCU on TX path
    - geneve: lock RCU on TX path
    - mlxsw: spectrum_router: Avoid potential packets loss
    - net: bridge: allow IPv6 when multicast flood is disabled
    - net: don't call strlen() on the user buffer in packet_bind_spkt()
    - net: net_enable_timestamp() can be called from irq contexts
    - ipv6: orphan skbs in reassembly unit
    - dccp: Unlock sock before calling sk_free()
    - amd-xgbe: Stop the PHY before releasing interrupts
    - amd-xgbe: Be sure to set MDIO modes on device (re)start
    - amd-xgbe: Don't overwrite SFP PHY mod_absent settings
    - bonding: use ETH_MAX_MTU as max mtu
    - strparser: destroy workqueue on module exit
    - tcp: fix various issues for sockets morphing to listen state
    - net: fix socket refcounting in skb_complete_wifi_ack()
    - net: fix socket refcounting in skb_complete_tx_timestamp()
    - net/sched: act_skbmod: remove unneeded rcu_read_unlock in tcf_skbmod_dump
    - dccp: fix use-after-free in dccp_feat_activate_values
    - team: use ETH_MAX_MTU as max mtu
    - vrf: Fix use-after-free in vrf_xmit
    - net/tunnel: set inner protocol in network gro hooks
    - uapi: fix linux/packet_diag.h use...
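
The oops was produced by 4.10.0-13-generic #15-Ubuntu, and the fix landed in linux 4.10.0-15.17. As a rough sketch for checking a running kernel, the Python below rebuilds the package version from `uname -r` / `uname -v` style strings, assuming the usual Ubuntu convention (ABI number in the release string, upload number after `#`), and compares purely numeric components. This simplifies Debian version ordering; `dpkg --compare-versions` is the authoritative check. The helper names are illustrative.

```python
import re

# First package version carrying the fix, per the changelog entry.
FIXED = "4.10.0-15.17"


def package_version(release: str, build: str) -> str:
    """Rebuild '<upstream>-<abi>.<upload>' from uname -r / uname -v strings.

    Assumes uname -r looks like '4.10.0-13-generic' and uname -v starts
    with '#<upload>-Ubuntu'.
    """
    upstream, abi, _flavour = release.split("-", 2)
    m = re.match(r"#(\d+)", build)
    if m is None:
        raise ValueError(f"unexpected build string: {build!r}")
    return f"{upstream}-{abi}.{m.group(1)}"


def version_key(version: str) -> tuple[int, ...]:
    """Numeric key for purely numeric versions (not full dpkg semantics)."""
    return tuple(int(part) for part in re.split(r"[.-]", version))


def has_fix(release: str, build: str) -> bool:
    return version_key(package_version(release, build)) >= version_key(FIXED)


print(package_version("4.10.0-13-generic", "#15-Ubuntu"))  # kernel from the oops
```

Comparing numeric tuples is enough for versions shaped like these; versions with letters or tildes need real dpkg ordering.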


Changed in linux (Ubuntu Zesty):
status: Fix Committed → Fix Released
bugproxy (bugproxy) on 2017-05-19
tags: added: targetmilestone-inin16043
removed: targetmilestone-inin1704
Brad Figg (brad-figg) on 2019-07-24
tags: added: cscc