Cannot get NO_HZ_FULL to work

Bug #1224324 reported by Magnus Karlsson
This bug affects 1 person

Affects: linaro-networking
Status: Invalid
Importance: High
Assigned to: viresh kumar

Bug Description

Hi,

I am trying to get NO_HZ_FULL to work with linux-lng-preempt-rt-v3.10.10-rt7 and am failing miserably. What am I doing wrong?

Boot options:

setenv bootargs "isolcpus=1 nohz_full=1 rcu_nocbs=1 root=/dev/mmcblk1p2 rw rootwait console=ttySAC2,115200n8 init --no-log"

Config file attached.

From the boot log:

Preemptible hierarchical RCU implementation.
        Experimental no-CBs for all CPUs
        Experimental no-CBs CPUs: 0-1.
NO_HZ: Full dynticks CPUs: 1.

The ticks are off on core 1 when I start the system. But if I run any program on core 1, by putting the code below in the C file, I see the tick starting in cat /proc/interrupts:

cpu_set_t cpu_set;

CPU_ZERO(&cpu_set);
CPU_SET(1, &cpu_set);
if (sched_setaffinity(0, sizeof(cpu_set_t), &cpu_set) == -1)
{
    perror("sched_setaffinity");
}

A ps -e -L -o pid,psr,pcpu,command tells me that the only kernel threads on core 1 are the ones that are not movable.

  PID PSR %CPU COMMAND
   17   1  0.0 [migration/1]
   18   1  0.0 [ksoftirqd/1]
   19   1  0.0 [kworker/1:0]
   20   1  0.0 [kworker/1:0H]
 2628   1  0.0 [kworker/1:1]

I can even make sure with "taskset" that all threads have an affinity mask of 1 (only core 0) when possible, but it does not help. As soon as I run anything on core 1, the tick starts. What am I doing wrong?

Thank you: Magnus

Revision history for this message
Magnus Karlsson (magnus-karlsson) wrote :
Revision history for this message
Gary S. Robertson (gary-robertson) wrote : Re: [Bug 1224324] [NEW] Cannot get NO_HZ_FULL to work

I can't see anything wrong with your configuration or boot command line.
It looks like this should work as advertised, so we will need to see why
you are getting this behavior. I will begin setting up to test this right
away. This may take a bit of time as the Linaro Networking Group is
relatively new and we are still in the process of staffing and getting our
infrastructure established - but rest assured we will be working on this
issue. Thanks for reporting this behavior.

Gary Robertson

On Thu, Sep 12, 2013 at 2:45 AM, Magnus Karlsson <email address hidden> wrote:

> Public bug reported:
> ...

Revision history for this message
Gary S. Robertson (gary-robertson) wrote :

Once again, I suspect this may be a CPU isolation problem. NO_HZ_FULL is turned off on any core as soon as more than one thread is being scheduled there. If the scheduler begins operating on this core as soon as tasks are scheduled on the other core, and schedules the idle task in conjunction with your RT idle-loop application, then NO_HZ_FULL operation would by design cease on core 1.

Changed in linaro-networking:
assignee: nobody → viresh kumar (viresh.kumar)
importance: Undecided → High
Revision history for this message
Mike Holmes (mike-holmes) wrote :

Is this an RT-specific issue, or does it also occur on linux-lng?
I think running on the current RT head as well, just to be sure, might be worth it -> linux-lng-preempt-rt

Revision history for this message
Gary S. Robertson (gary-robertson) wrote : Re: [Bug 1224324] Re: Cannot get NO_HZ_FULL to work

Looking at the symptom report for this bug and also for bug #1224318,
"Preempt_rt kernel enters idle loop even when there are processes ready"
(https://bugs.launchpad.net/linaro-networking/+bug/1224318),

which Magnus also reported, it looks likely to me that both sets of
erroneous behavior may have the same root cause. The scheduler runs the
idle task on the CPU which was supposed to be isolated and running
NO_HZ_FULL. When this happens, NO_HZ_FULL operation will cease by design
on that CPU core because more than one thread is in the scheduler queue
there.
The critical issue is: why is the idle task running on that CPU core?
Either the single process running on that core is sleeping occasionally or
something is broken in CPU isolation or in the scheduler itself.

It was stated in one of these two bug reports that the behavior was the
same on linux-lng as on linux-lng-preempt-rt, so I don't believe it is an
RT-specific behavior.

Magnus mentions a high-priority busy loop running on the CPU which is
isolated and running NO_HZ_FULL. But if this busy loop makes library or
kernel calls which might sleep - for example during the measurement of
latency or the recording of those measurements - then I suspect this might
override the CPU isolation via a trip through the scheduler from the
aforementioned system or library call. If the single process running on a
NO_HZ_FULL core sleeps, I think the scheduler HAS to enter the idle task on
that core. Even a system or library call to read a timer may encounter a
mutex which might cause the process to sleep.

Without knowing how the busy loop process operates we can only speculate
about this. Crafting a process which works successfully with NO_HZ_FULL
may be surprisingly elusive. Ideally the busy loop process should
accumulate measurements in RAM without making any system calls to perform
or store the latency measurements - perhaps by sampling some timer hardware
directly or using a wait loop to determine when it could safely read a
timer count provided in shared memory by an RT process running at a
slightly higher priority on the other core. After some finite loop count
the busy loop would then write its accumulated measurements to disk and
terminate. Also I would suggest using an RT scheduling priority of 49 or
less, since threaded ISRs typically run at priority 50 or 51 as best as I
recall. This would mean the busy loop at priority 99 might actually defer
timer hardware interrupt servicing and thus distort reported measurements.

On Thu, Sep 26, 2013 at 7:36 AM, Mike Holmes <email address hidden> wrote:

> Is this an RT specific issue, or does it also occur on linux-lng ?
> ...

Revision history for this message
Magnus Karlsson (magnus-karlsson) wrote :

Gary,

Thanks for looking into this. I think you are correct in that 1224318 and this issue might have the same root cause. You can find the test application attached to that issue. As you can see there are no system calls at all in the loop. Time is measured by reading HW registers directly from user space. Also, the benchmark runs fine on 3.6 and 3.7.

/Magnus

Changed in linaro-networking:
status: New → In Progress
Revision history for this message
Mike Holmes (mike-holmes) wrote :

Viresh, do you have any updates on this bug? The last comment was 2013-09-27. Is this the same root cause? The referenced bug was closed as fixed in a newer version.

Revision history for this message
viresh kumar (viresh.kumar) wrote :

On Thursday 14 November 2013 09:51 PM, Mike Holmes wrote:
> Viresh, do you have any updates on this bug, the last comment was
> 013-09-27, is this the same root cause, that referred bug was closed as
> fixed in a newer version.
>

I was looking at this bug today and ended up reading about cpusets... I am
still working on it, so you can expect an update soon..

Revision history for this message
viresh kumar (viresh.kumar) wrote :

Hi Kevin,

I was trying this bug today on Arndale. I was running v3.10.13 with the following patches on top:

cb5c69a ARM: Kconfig: allow full nohz CPU accounting
7296e0e nohz: Drop generic vtime obsolete dependency on CONFIG_64BIT
530b0fd vtime: Add HAVE_VIRT_CPU_ACCOUNTING_GEN Kconfig

I tried running your script from https://wiki.linaro.org/WorkingGroups/PowerManagement/Doc/AdaptiveTickless (with the ftrace parts and the cpuset removal taken out).

I see the following messages when I run it:

Cannot move PID 2: kthreadd
Cannot move PID 3: ksoftirqd/0
Cannot move PID 5: kworker/0:0H
Cannot move PID 6: kworker/u4:0
Cannot move PID 7: migration/0
Cannot move PID 17: migration/1
Cannot move PID 18: ksoftirqd/1
Cannot move PID 20: kworker/1:0H
Cannot move PID 21: khelper
Cannot move PID 23: netns
Cannot move PID 24: kworker/u4:1
Cannot move PID 198: writeback
Cannot move PID 200: bioset
Cannot move PID 202: kblockd
Cannot move PID 357: kworker/1:1
Cannot move PID 424: crypto
Cannot move PID 1128: dw-mci-card
Cannot move PID 1130: dw-mci-card
Cannot move PID 1158: kworker/1:2
Cannot move PID 1162: deferwq
Cannot move PID 1241: kworker/0:1H
Cannot move PID 1257: kworker/1:1H
Cannot move PID 1332: ext4-dio-unwrit
Cannot move PID 1909: kworker/0:2
Cannot move PID 1954: kworker/0:0

None of these tasks could be moved to the "rt" group..
ps -aFd gave this:

UID      PID PPID C SZ RSS PSR STIME TTY TIME     CMD
root       2    0 0  0   0   0  1969   ? 00:00:00 [kthreadd]
root       3    2 0  0   0   0  1969   ? 00:00:00 [ksoftirqd/0]
root       5    2 0  0   0   0  1969   ? 00:00:00 [kworker/0:0H]
root       6    2 0  0   0   1  1969   ? 00:00:00 [kworker/u4:0]
root       7    2 0  0   0   0  1969   ? 00:00:00 [migration/0]
root       8    2 0  0   0   0  1969   ? 00:00:00 [rcu_preempt]
root       9    2 0  0   0   0  1969   ? 00:00:00 [rcuop/0]
root      10    2 0  0   0   0  1969   ? 00:00:00 [rcuop/1]
root      11    2 0  0   0   0  1969   ? 00:00:00 [rcu_bh]
root      12    2 0  0   0   0  1969   ? 00:00:00 [rcuob/0]
root      13    2 0  0   0   0  1969   ? 00:00:00 [rcuob/1]
root      14    2 0  0   0   0  1969   ? 00:00:00 [rcu_sched]
root      15    2 0  0   0   0  1969   ? 00:00:00 [rcuos/0]
root      16    2 0  0   0   0  1969   ? 00:00:00 [rcuos/1]
root      17    2 0  0   0   1  1969   ? 00:00:00 [migration/1]
root      18    2 0  0   0   1  1969   ? 00:00:00 [ksoftirqd/1]
root      20    2 0  0   0   1  1969   ? 00:00:00 [kworker/1:0H]
root      21    2 0  0   0   1  1969   ? 00:00:00 [khelper]
root      22    2 0  0   0   0  1969   ? 00:00:00 [kdevtmpfs]
root      23    2 0  0   0   1  1969   ? 00:00:00 [netns]
root      24    2 0  0   0   0  1969   ? 00:00:00 [kworker/u4:1]
root     198    2 0  0   0   1  1969   ? 00:00:00 [writeback]
root     200    2 0  0   0   1  1969   ? 00:00:00 [bioset]
root     202    2 ...

Read more...

Revision history for this message
viresh kumar (viresh.kumar) wrote :

Kevin,

Same behavior observed on today's Linus/master:

2d3c627 Revert "init/Kconfig: add option to disable kernel compression"

No additional patches were required this time, as all your patches are already in..

Revision history for this message
Kevin Hilman (khilman-deactivatedaccount) wrote :

Viresh, it's normal that per-cpu threads do not get moved since they are pinned to CPUs. However, it's not expected that they run and get in the way. If you see those threads running, it would be useful to have a trace of the activity causing it.

Also, you wrote

> So, when I am running my terminal on gp, then the arch timer for CPU1 doesn't show any update in number.
> But as soon as I move my terminal to rt, arch timer count starts increasing..

Hmm, that's exactly what I expect to happen. Can you clarify what you're expecting vs. what you're seeing?

The full NOHZ patch set does not itself *prevent* anything from running on specific CPUs. All it does is allow the tick to be shut down when 1 (or fewer) tasks are running on a CPU. There is still a bunch of manual work to isolate a CPU using affinity/cpusets etc. in order to create the conditions for full NOHZ to work.

Also for linus/master, you'll need a couple of the debugfs patches to disable the 1Hz residual tick:

https://lkml.org/lkml/2013/9/16/499
https://lkml.org/lkml/2013/9/16/500

Revision history for this message
viresh kumar (viresh.kumar) wrote :

On 19 November 2013 03:24, Kevin Hilman <email address hidden> wrote:
> Viresh, it's normal that per-cpu threads do not get moved since they are
> pinned to CPUs. However, it's not expected that they run and get in the
> way. If you see those threads running, it would be useful to have a
> trace of the activity causing it.

Okay..

>> So, when I am running my terminal on gp, then the arch timer for CPU1 doesn't show any update in number.
>> But as soon as I move my terminal to rt, arch timer count starts increasing..
>
> Hmm, That's exactly what I expect to happen. Can you clarify what
> you're expecting vs what you're seeing?

I thought we had just moved a single thread there, so the tick shouldn't
be running..

But it looks like we moved two threads there.. One is the terminal
and the second is the 'ps' command that I ran :)

So, if I simply keep the terminal on CPU0 and somehow run only ps
on CPU1, the tick shouldn't be running?

> The full NOHZ patches set does not itself *prevent* anything from
> running on specific CPUs. All it does allow the tick to be shut down
> when 1 (or less) tasks are running on a CPU. There is still a bunch of
> manual work to isolate a CPU using affinity/cpusets etc. in order to
> create the conditions for full NOHZ to work.

Affinity is already tied to CPU0 for most of the irqs, and I have used
cpusets as well..

> Also for linus/master, you'll need a couple of the debugfs patches to
> disable the 1Hz residual tick:
>
> https://lkml.org/lkml/2013/9/16/499
> https://lkml.org/lkml/2013/9/16/500

I see..

Revision history for this message
Kevin Hilman (khilman-deactivatedaccount) wrote :

On Mon, Nov 18, 2013 at 7:51 PM, viresh kumar <email address hidden> wrote:
>
> I though we have just moved a single thread there and so we shouldn't
> have tick running..
>
> But it looks like we have moved two threads there.. One is the terminal
> and second is the 'ps' command that I have run :)

That's correct, using a shell as a test case is problematic because it
will spawn other processes as you run commands.

> So, if I simply keep the terminal on CPU0 and somehow run only ps
> on CPU1 tick shouldn't be running?

Correct. Either use a utility like taskset with a single-threaded
test app, or use cpusets like I do in my test script.

Revision history for this message
viresh kumar (viresh.kumar) wrote :

On 19 November 2013 21:07, Kevin Hilman <email address hidden> wrote:
> Correct. Either use a a utility like taskset with a single-threaded
> test app, or use CPUsets like I do in my test script.

Thanks for your help Kevin, I was able to shut down the tick for 30 seconds
on CPU1 with the help of the attached script (mostly like yours).

One more thing: when I run 'stress' with --cpu 1, I can see two threads
created for stress on my CPU. Why is that?

Revision history for this message
viresh kumar (viresh.kumar) wrote :

On 27 September 2013 12:32, Magnus Karlsson <email address hidden> wrote:
> Thanks for looking into this. I think you are correct in that 1224318
> and this issue might have the same root cause. You can find the test
> application attached to that issue. As you can see there are no system
> calls at all in the loop. Time is measured by reading HW registers
> directly from user space. Also, the benchmark runs fine on 3.6 and 3.7.

Hi Magnus,

I am able to get NO_HZ working on mainline. I believe there were some problems
in your setup, which is why you failed to get it working earlier..

Your bootargs:

setenv bootargs "isolcpus=1 nohz_full=1 rcu_nocbs=1
root=/dev/mmcblk1p2 rw rootwait console=ttySAC2,115200n8 init
--no-log"

First of all, it's no longer recommended to use isolcpus (as you already
know), and I don't know how it will behave with NO_HZ.. Better to use
cpusets instead..

Then, I didn't need nohz_full=1 rcu_nocbs=1, as I had the following in my
.config: CONFIG_NO_HZ_FULL_ALL=y

After booting the system I used cpusets to move all existing tasks to CPU0
and then ran 'stress --cpu 1' (the 1 here is the number of worker threads, not a CPU number) on CPU1.

When I use max deferment for the tick, I am able to shut the tick off for
almost 30 seconds; without it, the deferment defaults to 1 HZ, so the CPU
gets a tick every second..

Can you please try the script which I already shared on bug tracker:
my-nohz-test-cpuset.sh

You need to run on mainline for that, plus the patches that Kevin suggested.

To make it easy for you to reproduce it I have pushed my branch here:
https://git.linaro.org/gitweb?p=people/vireshk/mylinux.git;a=shortlog;h=refs/heads/nohz-working

This contains the defconfig updates required to get the exact
setup.. Just use exynos_defconfig and it should work..

Let me know if you have any more issues with this stuff..

Changed in linaro-networking:
status: In Progress → Invalid
Revision history for this message
Magnus Karlsson (magnus-karlsson) wrote :

Thanks Viresh. I will try this out.

Revision history for this message
Kevin Hilman (khilman-deactivatedaccount) wrote :

viresh kumar <email address hidden> writes:

> One more thing, when I run 'stress' with --cpu 1, I can see two threads
> created for stress on my CPU. Why so?

Because of the CPUset method you're using, I suspect the shell itself is
the second task you're seeing. If you can generate/send a trace, I
could tell you for sure.

Kevin

Revision history for this message
viresh kumar (viresh.kumar) wrote :

On 20 November 2013 21:19, Kevin Hilman <email address hidden> wrote:
> viresh kumar <email address hidden> writes:
>
>> One more thing, when I run 'stress' with --cpu 1, I can see two threads
>> created for stress on my CPU. Why so?
>
> Because of the CPUset method you're using, I suspect the shell itself is
> the second task you're seeing. If you can generated/send a trace, I
> could tell you for sure.

So, this is what I get normally on shell:

root@linaro-developer:/home/linaro# stress -q --cpu 1 --timeout 2000 &
[1] 21078

root@linaro-developer:/home/linaro# ps
  PID TTY TIME CMD
 1782 ttySAC2 00:00:00 login
 1874 ttySAC2 00:00:00 bash
21078 ttySAC2 00:00:00 stress
21079 ttySAC2 00:00:01 stress
21080 ttySAC2 00:00:00 ps
root@linaro-developer:/home/linaro#

See, two tasks for stress?

Revision history for this message
Kevin Hilman (khilman-deactivatedaccount) wrote :

viresh kumar <email address hidden> writes:

> On 20 November 2013 21:19, Kevin Hilman <email address hidden> wrote:
>> viresh kumar <email address hidden> writes:
>>
>>> One more thing, when I run 'stress' with --cpu 1, I can see two threads
>>> created for stress on my CPU. Why so?
>>
>> Because of the CPUset method you're using, I suspect the shell itself is
>> the second task you're seeing. If you can generated/send a trace, I
>> could tell you for sure.
>
> So, this is what I get normally on shell:
>
> root@linaro-developer:/home/linaro# stress -q --cpu 1 --timeout 2000 &
> [1] 21078
>
> root@linaro-developer:/home/linaro# ps
> PID TTY TIME CMD
> 1782 ttySAC2 00:00:00 login
> 1874 ttySAC2 00:00:00 bash
> 21078 ttySAC2 00:00:00 stress
> 21079 ttySAC2 00:00:01 stress
> 21080 ttySAC2 00:00:00 ps
> root@linaro-developer:/home/linaro#
>
> See, two tasks for stress ??

Sure, one of them is probably a parent waiting for a child. That
doesn't mean 2 threads are active at the same time.

Revision history for this message
viresh kumar (viresh.kumar) wrote :

On 2 December 2013 23:55, Kevin Hilman <email address hidden> wrote:
> Sure, one of them is probably a parent waiting for a child. That
> doesn't mean 2 threads are active at the same time.

I was sure that only one thread was running and so the CPU was isolated, but
I wasn't sure how 'stress' actually works, i.e. that the parent process
starts a child one.. Thanks.
