14.04: PANIC with "dcache shrink list corruption?" problem

Bug #1354234 reported by Tom Zhou on 2014-08-08
This bug affects 3 people
Affects          Importance   Assigned to
linux (Ubuntu)   Undecided    Unassigned
Trusty           Undecided    Dave Chiluk
Utopic           Undecided    Unassigned

Bug Description

Hello,

The 14.04 kernel linux_3.13.0-32.57 panicked because of the "dcache shrink list
corruption?" problem. Please see the following:
- details at https://lkml.org/lkml/2014/5/2/555.
- argument at https://lkml.org/lkml/2014/4/29/402
- patch set at https://lkml.org/lkml/2014/5/4/7

One of the 9 patches has already been applied to linux_3.13.0-32.57.
Could you apply the other 8 patches to the next 14.04 kernel?
We badly need that kernel at the beginning of August. Without it, we
will not be able to start our tests before the service goes live.
Could you release it as soon as possible?

We applied the other 8 patches to linux_3.13.0-32.57 ourselves, and the
panic no longer occurs.

Public Bug URL:
https://lkml.org/lkml/2014/5/2/555

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1354234

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Tom Zhou (zhouqt) wrote :

Hey there,

The log files are being collected, I'll attach them later.

Thanks,
Tom

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
tags: added: cherry-pick
Tim Gardner (timg-tpi) on 2014-08-11
Changed in linux (Ubuntu Utopic):
status: Confirmed → Fix Released
Changed in linux (Ubuntu Trusty):
assignee: nobody → Tim Gardner (timg-tpi)
status: New → In Progress
tags: added: trusty utopic
Dave Chiluk (chiluk) wrote :

Here's a soft lockup that is reportedly fixed by this patchset:

BUG: soft lockup - CPU#1 stuck for 23s! [test.sh:1928]
Kernel panic - not syncing: softlockup: hung tasks
CPU: 1 PID: 1928 Comm: test.sh Tainted: G W 3.13.0-24-generic #47-Ubuntu
Hardware name: NEC Mercury CPU Module [NQ2250-902]/GC-MDC10-NJ, BIOS 5.6.0005 03/05/2014
000000000000419b ffff88017fc43e28 ffffffff81715ac4 ffffffff81a5ee64
ffff88017fc43ea0 ffffffff8170ecc5 0000000000000008 ffff88017fc43eb0
ffff88017fc43e50 0000000000000086 00000000a954a952 0000000000000007
Call Trace:
<IRQ> [<ffffffff81715ac4>] dump_stack+0x45/0x56
[<ffffffff8170ecc5>] panic+0xc8/0x1d7
[<ffffffff8110d015>] watchdog_timer_fn+0x165/0x170
[<ffffffff8108e537>] __run_hrtimer+0x77/0x1d0
[<ffffffff8110ceb0>] ? watchdog_cleanup+0x10/0x10
[<ffffffff8108ed3f>] hrtimer_interrupt+0xef/0x230
[<ffffffff81043087>] local_apic_timer_interrupt+0x37/0x60
[<ffffffff8172887f>] smp_apic_timer_interrupt+0x3f/0x60
[<ffffffff8172721d>] apic_timer_interrupt+0x6d/0x80
<EOI> [<ffffffff8171da2c>] ? _raw_spin_lock+0x1c/0x50
[<ffffffff811d02ce>] shrink_dentry_list+0x4e/0xe0
[<ffffffff811d10f8>] shrink_dcache_parent+0x28/0x70
[<ffffffff81224826>] proc_flush_task+0xb6/0x1b0
[<ffffffff81068540>] release_task+0x30/0x440
[<ffffffff8109e1b1>] ? thread_group_cputime_adjusted+0x41/0x50
[<ffffffff8106922f>] wait_consider_task+0x8df/0xb20
[<ffffffff81069570>] do_wait+0x100/0x240
[<ffffffff8106a6a4>] SyS_wait4+0x64/0xe0
[<ffffffff810682a0>] ? task_stopped_code+0x60/0x60
[<ffffffff817266bf>] tracesys+0xe1/0xe6

Also, some machines can reliably reproduce this soft lockup with the following script.
"
#!/bin/bash

mkdir -p testdir
cd testdir

while true; do if ! (exec >xx1 2>xx2 ; rm -f xx*; host www.google.com); then echo failed; fi ; done &
while true; do ls -lR /proc >& /dev/null; done &
while true; do lsof > /dev/null; done &
while true; do ls -H >& /dev/null; done &

wait
"
I have not personally been able to reproduce the lockup with this script, however.
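
For anyone running the reproducer with softlockup panics disabled, a saved kernel log can be checked for the same signature as the trace above. The following is only a minimal sketch: the function names count_lockups and lockup_in_dcache are made up for illustration, and it assumes the log has already been written to a file (for example with dmesg > kern.txt).

```shell
#!/bin/bash
# Hypothetical helpers (not part of any Ubuntu tool) for checking a saved
# kernel log for the signature shown in the trace above.

# count_lockups FILE: print how many lines of FILE report a soft lockup.
count_lockups() {
    grep -c 'BUG: soft lockup' "$1"
}

# lockup_in_dcache FILE: succeed if FILE mentions shrink_dentry_list,
# the dcache frame that appears in the call trace above.
lockup_in_dcache() {
    grep -q 'shrink_dentry_list' "$1"
}
```

A log that reports a soft lockup and also mentions shrink_dentry_list resembles the trace above, but the match is only a hint. The two checks are independent, so the full call trace still has to be read to confirm that the frame belongs to the lockup.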

Dave Chiluk (chiluk) on 2014-08-12
tags: added: ua
Tim Gardner (timg-tpi) on 2014-08-19
Changed in linux (Ubuntu Trusty):
assignee: Tim Gardner (timg-tpi) → Dave Chiluk (chiluk)
status: In Progress → Fix Committed
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-trusty' to 'verification-done-trusty'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-trusty
Dave Chiluk (chiluk) on 2014-09-08
tags: added: verification-done-trusty
removed: verification-needed-trusty
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.13.0-36.63

---------------
linux (3.13.0-36.63) trusty; urgency=low

  [ Joseph Salisbury ]

  * Release Tracking Bug
    - LP: #1365052

  [ Feng Kan ]

  * SAUCE: (no-up) irqchip:gic: change access of gicc_ctrl register to read
    modify write.
    - LP: #1357527
  * SAUCE: (no-up) arm64: optimized copy_to_user and copy_from_user
    assembly code
    - LP: #1358949

  [ Ming Lei ]

  * SAUCE: (no-up) Drop APM X-Gene SoC Ethernet driver
    - LP: #1360140
  * [Config] Drop XGENE entries
    - LP: #1360140
  * [Config] CONFIG_NET_XGENE=m for arm64
    - LP: #1360140

  [ Stefan Bader ]

  * SAUCE: Add compat macro for skb_get_hash
    - LP: #1358162
  * SAUCE: bcache: prevent crash on changing writeback_running
    - LP: #1357295

  [ Suman Tripathi ]

  * SAUCE: (no-up) arm64: Fix the csr-mask for APM X-Gene SoC AHCI SATA PHY
    clock DTS node.
    - LP: #1359489
  * SAUCE: (no-up) ahci_xgene: Skip the PHY and clock initialization if
    already configured by the firmware.
    - LP: #1359501
  * SAUCE: (no-up) ahci_xgene: Fix the link down in first attempt for the
    APM X-Gene SoC AHCI SATA host controller driver.
    - LP: #1359507

  [ Tuan Phan ]

  * SAUCE: (no-up) pci-xgene-msi: fixed deadlock in irq_set_affinity
    - LP: #1359514

  [ Upstream Kernel Changes ]

  * iwlwifi: mvm: Add a missed beacons threshold
    - LP: #1349572
  * mac80211: reset probe_send_count also in HW_CONNECTION_MONITOR case
    - LP: #1349572
  * genirq: Add an accessor for IRQ_PER_CPU flag
    - LP: #1357527
  * arm64: perf: add support for percpu pmu interrupt
    - LP: #1357527
  * cifs: sanity check length of data to send before sending
    - LP: #1283101
  * KVM: nVMX: Pass vmexit parameters to nested_vmx_vmexit
    - LP: #1329434
  * KVM: nVMX: Rework interception of IRQs and NMIs
    - LP: #1329434
  * KVM: vmx: disable APIC virtualization in nested guests
    - LP: #1329434
  * HID: Add transport-driver functions to the USB HID interface.
    - LP: #1353021
  * ahci_xgene: Removing NCQ support from the APM X-Gene SoC AHCI SATA Host
    Controller driver.
    - LP: #1358498
  * fold d_kill() and d_free()
    - LP: #1354234
  * fold try_prune_one_dentry()
    - LP: #1354234
  * new helper: dentry_free()
    - LP: #1354234
  * expand the call of dentry_lru_del() in dentry_kill()
    - LP: #1354234
  * dentry_kill(): don't try to remove from shrink list
    - LP: #1354234
  * don't remove from shrink list in select_collect()
    - LP: #1354234
  * more graceful recovery in umount_collect()
    - LP: #1354234
  * dcache: don't need rcu in shrink_dentry_list()
    - LP: #1354234
  * lift the "already marked killed" case into shrink_dentry_list()
  * split dentry_kill()
    - LP: #1354234
  * expand dentry_kill(dentry, 0) in shrink_dentry_list()
    - LP: #1354234
  * shrink_dentry_list(): take parent's ->d_lock earlier
    - LP: #1354234
  * dealing with the rest of shrink_dentry_list() livelock
    - LP: #1354234
  * dentry_kill() doesn't need the second argument now
    - LP: #1354234
  * dcache: add missing lockdep annotation
    - LP: #1354234
  * fs: convert use of typedef ctl_table to struct ctl_table
 ...


Changed in linux (Ubuntu Trusty):
status: Fix Committed → Fix Released
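
To check whether a given Trusty kernel package is at or past the release named above (3.13.0-36.63), the version strings can be compared. This is a rough sketch, not an official check: has_dcache_fix is a hypothetical name, it relies on GNU sort -V ordering, and it is only meaningful for version strings from the Trusty 3.13.0 series.

```shell
#!/bin/bash
# Package version that the changelog above names as carrying the fix.
FIXED_VERSION="3.13.0-36.63"

# has_dcache_fix VERSION (hypothetical helper): succeed if VERSION sorts
# at or after FIXED_VERSION under GNU version sort.
has_dcache_fix() {
    local candidate="$1"
    local lowest
    # The smaller of the two versions ends up on the first line.
    lowest=$(printf '%s\n%s\n' "$FIXED_VERSION" "$candidate" | sort -V | head -n 1)
    [ "$lowest" = "$FIXED_VERSION" ]
}
```

For example, has_dcache_fix "3.13.0-32.57" fails, while has_dcache_fix "3.13.0-36.63" succeeds. On an installed system the version string could be obtained with something like dpkg-query -W -f='${Version}' "linux-image-$(uname -r)".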