watchdog bug: soft lockup

Bug #1845178 reported by Philip Vanloo
Affects                  Status   Importance  Assigned to  Milestone
linux-raspi2 (Ubuntu)    Invalid  Undecided   Unassigned
Xenial                   Invalid  Undecided   Unassigned

Bug Description

Hello

I have an Ubuntu Core 16 device (RPi CM3) which sometimes, at random, loses its Ethernet connection.

Core snap is 16-2.40
Kernel snap is pi2-kernel 4.4.0-1120.129

I've now managed to capture serial console output from it, and it seems the device crashes completely: it never recovers and never performs a full reset.

[18037.433255] INFO: rcu_sched self-detected stall on CPU
[18037.441257] INFO: rcu_sched detected stalls on CPUs/tasks:
[18037.441264] 1-...: (3708966 ticks this GP) idle=78d/140000000000001/0 softirq=7574/7589 fqs=1588
[18037.441270] (detected by 3, t=4495427 jiffies, g=2096, c=2095, q=35614)
[18037.441272] Task dump for CPU 1:
[18037.441278] manager-control R running 0 2310 2308 0x00000082
[18037.441285] rcu_sched kthread starved for 4493705 jiffies! g2096 c2095 f0x2 s3 ->state=0x0
[18037.720278] 1-...: (3708966 ticks this GP) idle=78d/140000000000001/0 softirq=7574/7589 fqs=1588
[18037.782050] (t=4495512 jiffies g=2096 c=2095 q=35615)
[18037.813311] rcu_sched kthread starved for 4493798 jiffies! g2096 c2095 f0x2 s3 ->state=0x0
[18037.873341] Task dump for CPU 1:
[18037.901730] manager-control R running 0 2310 2308 0x00000082
[18037.933017] [<80112554>] (unwind_backtrace) from [<8010d7dc>] (show_stack+0x20/0x24)
[18037.990378] [<8010d7dc>] (show_stack) from [<801571bc>] (sched_show_task+0xb8/0x110)
[18038.048268] [<801571bc>] (sched_show_task) from [<80159a00>] (dump_cpu_task+0x48/0x4c)
[18038.107812] [<80159a00>] (dump_cpu_task) from [<801919b8>] (rcu_dump_cpu_stacks+0x9c/0xd4)
[18038.168554] [<801919b8>] (rcu_dump_cpu_stacks) from [<80195fec>] (rcu_check_callbacks+0x5c0/0x8bc)
[18038.230663] [<80195fec>] (rcu_check_callbacks) from [<8019c3cc>] (update_process_times+0x4c/0x74)
[18038.294469] [<8019c3cc>] (update_process_times) from [<801b04dc>] (tick_sched_handle+0x64/0x70)
[18038.358337] [<801b04dc>] (tick_sched_handle) from [<801b0550>] (tick_sched_timer+0x68/0xbc)
[18038.423493] [<801b0550>] (tick_sched_timer) from [<8019d1b0>] (__hrtimer_run_queues+0x188/0x364)
[18038.490259] [<8019d1b0>] (__hrtimer_run_queues) from [<8019db4c>] (hrtimer_interrupt+0xd8/0x244)
[18038.557715] [<8019db4c>] (hrtimer_interrupt) from [<80743e5c>] (arch_timer_handler_phys+0x40/0x48)
[18038.626525] [<80743e5c>] (arch_timer_handler_phys) from [<8018bd2c>] (handle_percpu_devid_irq+0x80/0x194)
[18038.696876] [<8018bd2c>] (handle_percpu_devid_irq) from [<80186f64>] (generic_handle_irq+0x34/0x44)
[18038.768565] [<80186f64>] (generic_handle_irq) from [<80187270>] (__handle_domain_irq+0x6c/0xc4)
[18038.840084] [<80187270>] (__handle_domain_irq) from [<801096f4>] (handle_IRQ+0x28/0x2c)
[18038.910902] [<801096f4>] (handle_IRQ) from [<801015e4>] (bcm2836_arm_irqchip_handle_irq+0xb8/0xbc)
[18038.982964] [<801015e4>] (bcm2836_arm_irqchip_handle_irq) from [<808a3678>] (__irq_svc+0x58/0x78)
[18039.055005] Exception stack(0xaea2b9d0 to 0xaea2ba18)
[18039.091313] b9c0: b9e3e01c 00000000 00000025 00000024
[18039.161335] b9e0: ae800040 ffffffff aea2bae4 00011000 aea2bae4 80e0355c 000b0000 aea2ba2c
[18039.230628] ba00: aea2ba30 aea2ba20 8027a628 808a2c30 200f0013 ffffffff
[18039.267788] [<808a3678>] (__irq_svc) from [<808a2c30>] (_raw_spin_lock+0x40/0x54)
[18039.335854] [<808a2c30>] (_raw_spin_lock) from [<8027a628>] (unmap_single_vma+0x1e8/0x64c)
[18039.404789] [<8027a628>] (unmap_single_vma) from [<8027bab4>] (unmap_vmas+0x64/0x78)
[18039.473309] [<8027bab4>] (unmap_vmas) from [<80282e38>] (exit_mmap+0x110/0x214)
[18039.511274] [<80282e38>] (exit_mmap) from [<80122264>] (mmput+0x6c/0x138)
[18039.548375] [<80122264>] (mmput) from [<80128a74>] (do_exit+0x32c/0xb38)
[18039.585148] [<80128a74>] (do_exit) from [<8010db5c>] (die+0x37c/0x38c)
[18039.621494] [<8010db5c>] (die) from [<8011bee0>] (__do_kernel_fault.part.0+0x74/0x1f4)
[18039.688470] [<8011bee0>] (__do_kernel_fault.part.0) from [<808a40d8>] (do_page_fault+0x244/0x3c4)
[18039.756059] [<808a40d8>] (do_page_fault) from [<80101284>] (do_DataAbort+0x58/0xe8)
[18039.822179] [<80101284>] (do_DataAbort) from [<808a35e4>] (__dabt_svc+0x44/0x80)
[18039.887651] Exception stack(0xaea2bd00 to 0xaea2bd48)
[18039.921312] bd00: b9f55fc0 00000037 00000038 00000000 b9036000 ad5f32a0 b9f55fc0 00017000
[18039.987397] bd20: ae80005c b8af3eec 80e0354c aea2bd6c aea2bd70 aea2bd50 802855e8 802a4a5c
[18040.054541] bd40: 60010113 ffffffff
[18040.086842] [<808a35e4>] (__dabt_svc) from [<802a4a5c>] (mem_cgroup_begin_page_stat+0x94/0xa0)
[18040.153350] [<802a4a5c>] (mem_cgroup_begin_page_stat) from [<802855e8>] (page_add_file_rmap+0x1c/0xa4)
[18040.220784] [<802855e8>] (page_add_file_rmap) from [<8027c2d4>] (do_set_pte+0xec/0x100)
[18040.287422] [<8027c2d4>] (do_set_pte) from [<802498e0>] (filemap_map_pages+0x27c/0x298)

and

[18064.097260] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [manager-control:2310]
[18064.163238] Modules linked in: cfg80211 nls_ascii bcm2835_wdt bcm2835_gpiomem spi_bcm2835 uio_pdrv_genirq uio i2c_bcm2708
[18064.233969] CPU: 1 PID: 2310 Comm: manager-control Tainted: G D L 4.4.0-1120-raspi2 #129-Ubuntu
[18064.302713] Hardware name: BCM2709
[18064.334894] task: b83f1440 ti: aea2a000 task.ti: aea2a000
[18064.368692] PC is at _raw_spin_lock+0x40/0x54
[18064.400880] LR is at unmap_single_vma+0x1e8/0x64c
[18064.432726] pc : [<808a2c30>] lr : [<8027a628>] psr: 200f0013
[18064.432726] sp : aea2ba20 ip : aea2ba30 fp : aea2ba2c
[18064.497639] r10: 000b0000 r9 : 80e0355c r8 : aea2bae4
[18064.528646] r7 : 00011000 r6 : aea2bae4 r5 : ffffffff r4 : ae800040
[18064.560481] r3 : 00000024 r2 : 00000025 r1 : 00000000 r0 : b9e3e01c
[18064.591720] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
[18064.623483] Control: 10c5383d Table: 301ec06a DAC: 00000051
[18064.653583] CPU: 1 PID: 2310 Comm: manager-control Tainted: G D L 4.4.0-1120-raspi2 #129-Ubuntu
[18064.711017] Hardware name: BCM2709
[18064.737944] [<80112554>] (unwind_backtrace) from [<8010d7dc>] (show_stack+0x20/0x24)
[18064.793410] [<8010d7dc>] (show_stack) from [<804be668>] (dump_stack+0xc8/0x10c)
[18064.825162] [<804be668>] (dump_stack) from [<80109a78>] (show_regs+0x1c/0x20)
[18064.856288] [<80109a78>] (show_regs) from [<801ed21c>] (watchdog_timer_fn+0x258/0x2c0)
[18064.911793] [<801ed21c>] (watchdog_timer_fn) from [<8019d1b0>] (__hrtimer_run_queues+0x188/0x364)
[18064.968537] [<8019d1b0>] (__hrtimer_run_queues) from [<8019db4c>] (hrtimer_interrupt+0xd8/0x244)
[18065.025759] [<8019db4c>] (hrtimer_interrupt) from [<80743e5c>] (arch_timer_handler_phys+0x40/0x48)
[18065.084353] [<80743e5c>] (arch_timer_handler_phys) from [<8018bd2c>] (handle_percpu_devid_irq+0x80/0x194)
[18065.145475] [<8018bd2c>] (handle_percpu_devid_irq) from [<80186f64>] (generic_handle_irq+0x34/0x44)
[18065.208195] [<80186f64>] (generic_handle_irq) from [<80187270>] (__handle_domain_irq+0x6c/0xc4)
[18065.272665] [<80187270>] (__handle_domain_irq) from [<801096f4>] (handle_IRQ+0x28/0x2c)
[18065.338842] [<801096f4>] (handle_IRQ) from [<801015e4>] (bcm2836_arm_irqchip_handle_irq+0xb8/0xbc)
[18065.407526] [<801015e4>] (bcm2836_arm_irqchip_handle_irq) from [<808a3678>] (__irq_svc+0x58/0x78)
[18065.477520] Exception stack(0xaea2b9d0 to 0xaea2ba18)
[18065.513738] b9c0: b9e3e01c 00000000 00000025 00000024
[18065.583538] b9e0: ae800040 ffffffff aea2bae4 00011000 aea2bae4 80e0355c 000b0000 aea2ba2c
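
For anyone triaging a similar serial capture, the following is a minimal sketch (not part of the original report) of how the two kinds of messages above could be picked out of a console log. The function names `clean_line` and `scan_console_log` are hypothetical, and the patterns are assumptions based only on the message wording seen in this capture; other kernel versions may phrase these messages differently.

```python
import re

# Real ANSI cursor sequences (ESC [ row ; col H and similar).
ANSI_RE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")
# Assumption: some captures lose the ESC byte and leave a literal "e[..H"
# right after the timestamp, as in the log above.
LOST_ESC_RE = re.compile(r"(?<=\] )e\[[0-9;]*H")
# Kernel timestamp prefix, e.g. "[18037.433255] message".
TIMESTAMP_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")

# Patterns taken from the wording in this capture only.
MARKERS = {
    "rcu_stall": re.compile(r"rcu_sched (?:self-)?detected stall"),
    "soft_lockup": re.compile(r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s"),
}


def clean_line(raw):
    """Strip the trailing newline and terminal cursor-positioning debris."""
    line = ANSI_RE.sub("", raw.rstrip("\n"))
    return LOST_ESC_RE.sub("", line)


def scan_console_log(lines):
    """Return (timestamp, kind, message) for each stall or lockup line."""
    events = []
    for raw in lines:
        match = TIMESTAMP_RE.match(clean_line(raw))
        if not match:
            continue  # not a timestamped kernel line
        timestamp, message = float(match.group(1)), match.group(2)
        for kind, pattern in MARKERS.items():
            if pattern.search(message):
                events.append((timestamp, kind, message))
    return events


if __name__ == "__main__":
    sample = [
        "[18037.433255] INFO: rcu_sched self-detected stall on CPU",
        "[18064.097260] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [manager-control:2310]",
    ]
    for timestamp, kind, message in scan_console_log(sample):
        print(f"{timestamp:.6f} {kind}: {message}")
```

Feeding it the lines of a saved capture (for example, an open file object) gives a short timeline of stall and lockup reports without the backtrace noise.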

I was told to report this bug here, following the discussion at https://forum.snapcraft.io/t/watchdog-soft-lockup/13375

Juerg Haefliger (juergh)
Changed in linux-raspi2 (Ubuntu):
status: New → Invalid
Juerg Haefliger (juergh) wrote :

Given that the ticket is over a year old, is this still an issue?

Philip Vanloo (fragiledj) wrote :

Hi there, we've since moved to UC18 and another kernel. This issue is therefore obsolete and may be closed from my end.

Juerg Haefliger (juergh) wrote :

Will do, thanks for the response.

Changed in linux-raspi2 (Ubuntu Xenial):
status: New → Invalid