NOHZ: local_softirq_pending 40 during download of kvm.qcow2.gz

Bug #1243882 reported by Mike Holmes
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
linaro-networking
Fix Released
Medium
viresh kumar
Revision history for this message
Mike Holmes (mike-holmes) wrote :
Changed in linaro-networking:
importance: Undecided → Medium
status: New → Confirmed
Revision history for this message
Kim Phillips (kim-phillips) wrote :

we're thinking of adding a wget-a-large-file-and-make-sure-nothing-shows-up-in-dmesg, but this is lower priority vs. getting and seeing whether this is more appropriate in doing in the netperf tests, once they get in the CI loop.

Revision history for this message
Mike Holmes (mike-holmes) wrote :
Changed in linaro-networking:
assignee: nobody → viresh kumar (viresh.kumar)
importance: Medium → High
Revision history for this message
viresh kumar (viresh.kumar) wrote :

Mike: Have you tried this on a local board as well? Would be helpful if you can write down all the minute steps required to be done in order to reproduce it..

Revision history for this message
Kim Phillips (kim-phillips) wrote :

Viresh, I had tried this once on my local board & network setup, and failed to reproduce (but I didn't try that hard either).

The 'minute steps' to reproduce can be found in the job definition files pointed to by the validation links above.

Revision history for this message
viresh kumar (viresh.kumar) wrote :

This is what I found about such messages:

The NOHZ patch contains a check for softirqs pending when a CPU goes
idle. The BUG is unrelated to NOHZ, it just was made visible by the NOHZ
patch.

http://www.kubuntuforums.net/showthread.php?49223-Kernel-error-messsage-NOHZ-local_softirq_pending-100
https://groups.google.com/forum/#!topic/linux.kernel/YeH1CTNT9vU

Revision history for this message
viresh kumar (viresh.kumar) wrote :

This is what I did to reproduce it:

Download and flash following to my arndale uSD card:
"hwpack": "https://snapshots.linaro.org/kernel-hwpack/linux-lng-le-preempt-rt-arndale/129/hwpack_linaro-arndale_20131023-0042_b129_armhf_supported.tar.gz"

"rootfs": "http://snapshots.linaro.org/ubuntu/images/developer/531/linaro-raring-developer-20131023-531.tar.gz"

Simply booted my board and ran following:

wget --no-check-certificate http://snapshots.linaro.org/kernel-hwpack/linux-vexpress-kvm/linux-vexpress-kvm/kvm.qcow2.gz

Found following in dmesg:

[ 255.947002] NOHZ: local_softirq_pending 40
[ 267.074798] NOHZ: local_softirq_pending 40
[ 271.830864] NOHZ: local_softirq_pending 40
[ 273.179807] NOHZ: local_softirq_pending 40
[ 289.498922] NOHZ: local_softirq_pending 40
[ 303.775357] NOHZ: local_softirq_pending 40
[ 303.787284] NOHZ: local_softirq_pending 40
[ 358.424332] NOHZ: local_softirq_pending 40
[ 366.681314] NOHZ: local_softirq_pending 40
[ 369.909316] NOHZ: local_softirq_pending 40

So, yes I was able to reproduce it.

But there was no problem what so ever with the download..

Revision history for this message
Kim Phillips (kim-phillips) wrote :

sounds like NOHZ made a bug visible somewhere in drivers/net/usb/?

Changed in linaro-networking:
importance: High → Medium
Revision history for this message
Kevin Hilman (khilman-deactivatedaccount) wrote :

Viresh, did you actually enable the full NO_HZ feature the kernel when you were testing this? Or is this seen on a normal kernel boot also?

Revision history for this message
Kevin Hilman (khilman-deactivatedaccount) wrote :

This warning is happening because there is a tasklet pending (TASKLET_SOFTIRQ) when the tick is about to be shut down. This is harmless because the fact that there's a pending tasklet means that the tick does not get shut off and the tasklet runs.

At this point, I'm not sure why this would happen more on a full NO_HZ kernel than on a normal kernel, other than the fact that it's probably trying to shut the tick off more often on the full NO_HZ core(s).

Revision history for this message
viresh kumar (viresh.kumar) wrote :

Kevin: I have tried the image directly from: https://snapshots.linaro.org/kernel-hwpack/linux-lng-le-preempt-rt-arndale/129/
And the .config has:

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y

So, NO_HZ_FULL isn't enabled but NO_HZ_IDLE is. So, this means the core went into idle with a tasklet pending? Because probably in that case only we are going to disable tick, isn't it?

Revision history for this message
Mike Holmes (mike-holmes) wrote : Re: [Bug 1243882] Re: NOHZ: local_softirq_pending 40 during download of kvm.qcow2.gz

The latest LTP RT test runs also have several of the softirq pending
messages if that is relevant.

http://validation.linaro.org/scheduler/job/85420/log_file#L_21_460

rtmigrate1_min measurement:39 units:us result:skip
 rtmigrate1_avg measurement:3664 units:us result:skip
 rtmigrate2_max measurement:80 units:us result:skip
 rtmigrate2_min measurement:41 units:us result:skip
 rtmigrate2_avg measurement:54 units:us result:skip
 [ 1163.110712] NOHZ: local_softirq_pending 182
 [ 1163.110826] NOHZ: local_softirq_pending 82
 [ 1164.094509] NOHZ: local_softirq_pending 182
 [ 1164.094591] NOHZ: local_softirq_pending 82
 [ 1165.078506] NOHZ: local_softirq_pending 82
 [ 1165.078509] NOHZ: local_softirq_pending 182
 [ 1166.062504] NOHZ: local_softirq_pending 182
 [ 1166.062506] NOHZ: local_softirq_pending 82
 [ 1166.066025] NOHZ: local_softirq_pending 182
 [ 1166.066028] NOHZ: local_softirq_pending 82

On 13 November 2013 00:02, viresh kumar <email address hidden> wrote:

> Kevin: I have tried the image directly from:
> https://snapshots.linaro.org/kernel-hwpack/linux-lng-le-preempt-rt-arndale/129/
> And the .config has:
>
> #
> # Timers subsystem
> #
> CONFIG_TICK_ONESHOT=y
> CONFIG_NO_HZ_COMMON=y
> # CONFIG_HZ_PERIODIC is not set
> CONFIG_NO_HZ_IDLE=y
> # CONFIG_NO_HZ_FULL is not set
> CONFIG_NO_HZ=y
> CONFIG_HIGH_RES_TIMERS=y
>
>
> So, NO_HZ_FULL isn't enabled but NO_HZ_IDLE is. So, this means the core
> went into idle with a tasklet pending? Because probably in that case only
> we are going to disable tick, isn't it?
>
> --
> You received this bug notification because you are a member of Linaro
> Networking Group, which is subscribed to linaro-networking.
> Matching subscriptions: LNG all
> https://bugs.launchpad.net/bugs/1243882
>
> Title:
> NOHZ: local_softirq_pending 40 during download of kvm.qcow2.gz
>
> Status in Linaro networking Group:
> Confirmed
>
> Bug description:
> http://validation.linaro.org/scheduler/job/80934/log_file#L_51_509
> appears to dissrupt the kvm.qcow2.gz transfer during 4 fresh
> lava_shell runs, see also
>
> http://validation.linaro.org/scheduler/job/80934/log_file#L_67_511
> http://validation.linaro.org/scheduler/job/80934/log_file#L_19_523
> http://validation.linaro.org/scheduler/job/80934/log_file#L_35_510
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/linaro-networking/+bug/1243882/+subscriptions
>

Revision history for this message
Kevin Hilman (khilman-deactivatedaccount) wrote :

Viresh, the core did *not* go tickless with a tasklet pending. The warning means that a pending tasklet was detected, so it failed to turn off the tick.

Revision history for this message
viresh kumar (viresh.kumar) wrote :

kevin: Yeah I understood it that way only but wrote it badly :)

But what should we do with this bug now. Yes its not affecting our work but it still shouldn't come, isn't it? I have seen it being reported multiple times with older kernels and people actually got it solved with some fixes.

@Mike: Anyway this happens with 3.10.*, do we really care a lot about that version? What about the new versions?

Revision history for this message
Anders Roxell (aroxell) wrote :
Revision history for this message
Kevin Hilman (khilman-deactivatedaccount) wrote :

@Viresh: Can you reproduce this with a mainline kernel on the same hardware under the same load?

Revision history for this message
viresh kumar (viresh.kumar) wrote :

Kevin,

I tried it on today's Linus/master:

2d3c627 Revert "init/Kconfig: add option to disable kernel compression"

And couldn't reproduce it.. So that's almost v3.13-rc1

Revision history for this message
Kevin Hilman (khilman-deactivatedaccount) wrote :

viresh kumar <email address hidden> writes:

> I tried it on today's Linus/master:
>
> 2d3c627 Revert "init/Kconfig: add option to disable kernel compression"
>
> And couldn't reproduce it.. So that's almost v3.13-rc1

OK, so it doesn't happen in mainline and doesn't happen without full
NOHZ which means it's something in the LNG kernels that's triggering
this.

Not sure how else I can help here. Someone in the LNG team needs to
bisect this.

Revision history for this message
viresh kumar (viresh.kumar) wrote :

Mike: Do you want me to spend time on this ? Or maybe wait for 3.13 release and see if it is still present there?

Revision history for this message
Mike Holmes (mike-holmes) wrote :

We can review closing it in the weekly bug call given it is fixed in 3.13
and does not affect performance.
I think 3.10 is important to LNG members and we will be using it on two
platforms for the foreseeable future, so spending a little time to figure
out if it is an easy bisect on 3.10 makes sense.

On 20 November 2013 00:44, viresh kumar <email address hidden> wrote:

> Mike: Do you want me to spend time on this ? Or maybe wait for 3.13
> release and see if it is still present there?
>
> --
> You received this bug notification because you are a member of Linaro
> Networking Group, which is subscribed to linaro-networking.
> Matching subscriptions: LNG all
> https://bugs.launchpad.net/bugs/1243882
>
> Title:
> NOHZ: local_softirq_pending 40 during download of kvm.qcow2.gz
>
> Status in Linaro networking Group:
> Confirmed
>
> Bug description:
> http://validation.linaro.org/scheduler/job/80934/log_file#L_51_509
> appears to dissrupt the kvm.qcow2.gz transfer during 4 fresh
> lava_shell runs, see also
>
> http://validation.linaro.org/scheduler/job/80934/log_file#L_67_511
> http://validation.linaro.org/scheduler/job/80934/log_file#L_19_523
> http://validation.linaro.org/scheduler/job/80934/log_file#L_35_510
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/linaro-networking/+bug/1243882/+subscriptions
>

Revision history for this message
viresh kumar (viresh.kumar) wrote :

On Wednesday 20 November 2013 06:29 PM, Mike Holmes wrote:
> We can review closing it in the weekly bug call given it is fixed in 3.13
> and does not affect performance.
> I think 3.10 is important to LNG members and we will be using it on two
> platforms for the foreseeable future, so spending a little time to figure
> out if it is an easy bisect on 3.10 makes sense.

Will try..

Revision history for this message
Mike Holmes (mike-holmes) wrote :

This is fixed upstream, closing

Changed in linaro-networking:
status: Confirmed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.