Preempt_rt kernel enters idle loop even when there are processes ready

Bug #1224318 reported by Magnus Karlsson
Affects: linaro-networking
Status: Invalid
Importance: High
Assigned to: viresh kumar
Milestone: (none)

Bug Description

Hi,

I am comparing the core isolation properties of 3.6/3.7 and 3.10 with preempt_rt, and for some reason the numbers are much worse for the 3.10 preempt_rt kernel (LNG version). This is the experiment: I boot Linux with the following boot options.

setenv bootargs "isolcpus=1 nohz_full=1 rcu_nocbs=1 root=/dev/mmcblk1p2 rw rootwait console=ttySAC2,115200n8 init --no-log"

Linux config options can be found in the attached file (or so I thought; it seems I can only attach a single file. How do I attach multiple?). I run a heavy load on core 0 and only a busy loop on core 1, where I measure how long it takes to traverse the loop. Usually an iteration is very quick, but once in a while there is a tick or some other disturbance and traversing the loop takes much longer. I am only interested in the maximum latency for traversing the loop, measured over a minute or so.
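
For reference, a minimal sketch of the kind of measurement loop described above (my reconstruction, not the actual benchmark; it assumes CLOCK_MONOTONIC timestamps and prints whenever a new maximum is seen):

  /* Sketch of the busy loop: time each iteration and report the
   * worst case. Not the actual benchmark; assumes CLOCK_MONOTONIC. */
  #include <stdint.h>
  #include <stdio.h>
  #include <time.h>

  int main(void)
  {
      struct timespec prev, now;
      int64_t max_ns = 0;

      clock_gettime(CLOCK_MONOTONIC, &prev);
      for (;;) {
          clock_gettime(CLOCK_MONOTONIC, &now);
          int64_t delta_ns = (now.tv_sec - prev.tv_sec) * 1000000000LL
                           + (now.tv_nsec - prev.tv_nsec);
          if (delta_ns > max_ns) {
              max_ns = delta_ns;
              printf("Max latency: %lld ns\n", (long long)max_ns);
          }
          prev = now;
      }
      return 0;
  }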

In 3.6/3.7 the max latency was 15 usec, but with 3.10 it is 10 times as much: 150 usec. If you take a look at the attached function trace log (filtered to show only core 1), it seems like core 1 goes into idle even though I am executing a busy loop at the highest real-time priority, locked to core 1 with SCHED_FIFO, and there is nothing else on that core. It should get 100% of that core, but if I run top, I see that it gets around 95% and sometimes even less. It is also dependent on the load I run on core 0: if I do not load it, the numbers get slightly better. This did not happen in 3.6/3.7.

I am currently running linux-lng-preempt-rt-v3.10.10-rt7. When I was running 3.10.6-rt3 this behavior was even worse: as soon as I started to load core 0, core 1 went into idle for long periods for some reason. It is much better in 3.10.10-rt7, but still no cigar. Could this be a problem with the RCU implementation in preempt_rt? It would be great if you could take a look at this, since the latencies are way too high at the moment.
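
For context, running the loop at the highest real-time priority as described would typically be set up with something like this (an assumed sketch, not taken from the benchmark; sched_setscheduler and sched_get_priority_max are standard POSIX calls):

  /* Sketch: switch the calling thread to SCHED_FIFO at the highest
   * real-time priority. Assumed setup, not from the actual benchmark. */
  #include <sched.h>
  #include <stdio.h>
  #include <string.h>

  static void set_fifo_max(void)
  {
      struct sched_param sp;

      memset(&sp, 0, sizeof(sp));
      sp.sched_priority = sched_get_priority_max(SCHED_FIFO);
      if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1)
          perror("sched_setscheduler");
  }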

In 3.6/3.7, only the tick has an impact on the max latency of the loop.

BTW, I have tried without NO_HZ_FULL in the kernel with the same results.

Thanks: Magnus

Revision history for this message
Magnus Karlsson (magnus-karlsson) wrote :

Added config file.

Revision history for this message
Gary S. Robertson (gary-robertson) wrote :

Since this occurs with or without NO_HZ_FULL configured in, it actually sounds more like a CPU isolation failure than a NO_HZ_FULL failure. It rather sounds as though the scheduler isn't fully honoring the 'isolcpus=1' boot command line option, and is causing the idle task to run on the 'isolated' CPU since no other tasks are 'scheduled' there. But this is just conjecture until we can do some actual testing.

Revision history for this message
viresh kumar (viresh.kumar) wrote :

Magnus,

I expect the test case would behave the same on our non-RT kernel as well (i.e. linux-lng). I don't think there should be any difference here with or without RT support, as isolcpus should work for your busy task.

Can you give it a try on the non-RT kernel? That will help us isolate the offending piece of code.

Revision history for this message
viresh kumar (viresh.kumar) wrote :

Magnus,

Can you provide the test scripts that you run? Also, what hardware did you use for your tests? Exynos? Any other detail you think is important for reproducing this issue would also help, ideally all the minute steps to follow.

Revision history for this message
Magnus Karlsson (magnus-karlsson) wrote :

Viresh,

Here is the latency benchmark that was used.

Revision history for this message
Magnus Karlsson (magnus-karlsson) wrote :

Here is the configuration.

I am running on an Arndale board at 1.4 GHz. Boot cmdline options in previous posting. Once up and running, I load core 0 with:

find /usr -type f -exec scp -q \{\} mkarlsso@10.0.0.1:/dev/null \; &

But you can probably skip this, as the numbers will be bad even without it.

Then I launch the latency benchmark with ./latency. A max latency of around 15 us is expected on 3.6 and 3.7 (ignore the first run, though). On 3.10 I get nearly 5 times as much.

I am using linux-linaro-lng-preempt-rt-3.10.6-2013.08.

BTW, this might not have anything to do with preempt_rt as I can get bad numbers with the regular preempt in linux-linaro-lng-3.10.6-2013.08 too.

Let me know if you need more information.

/Magnus

Revision history for this message
Magnus Karlsson (magnus-karlsson) wrote :

Forgot to add that you have to launch the latency application on core 1, e.g. using the cpuset file system or taskset. Or you could add this to the benchmark itself:

  /* Pin the calling thread to core 1 (needs _GNU_SOURCE and <sched.h>) */
  cpu_set_t cpu_set;

  CPU_ZERO(&cpu_set);      /* clear the mask ...        */
  CPU_SET(1, &cpu_set);    /* ... then allow only cpu 1 */
  if (sched_setaffinity(0, sizeof(cpu_set_t), &cpu_set) == -1)
  {
      perror("sched_setaffinity");
  }

Revision history for this message
Kevin Hilman (khilman-deactivatedaccount) wrote : Re: [Bug 1224318] Re: Preempt_rt kernel enters idle loop even when there are processes ready

Magnus Karlsson <email address hidden> writes:

> Here is the configuration.
>
> I am running on an Arndale board at 1.4 GHz. Boot cmdline options in
> previous posting. Once up and running, I load core 0 with:
>
> find /usr -type f -exec scp -q \{\} mkarlsso@10.0.0.1:/dev/null \; &
>
> But you can probably skip this as the number will be bad even without
> it.
>
> Then I launch the latency benchmark with ./latency. A max latency around
> 15 us is expected from 3.6 and 3.7. ignore the first run though. On 3.10
> I get nearly 5 times as much.
>
> I am using linux-linaro-lng-preempt-rt-3.10.6-2013.08.
>
> BTW, this might not have anything to do with preempt_rt as I can get bad
> numbers with the regular preempt in linux-linaro-lng-3.10.6-2013.08 too.

Can you reproduce against mainline?

Kevin

Revision history for this message
Magnus Karlsson (magnus-karlsson) wrote :

I can reproduce this in mainline. I tried the tip of linux-3.10.y (3.10.13) from kernel.org, and the problem is still there. I have attached the function trace. Search for the "Max latency" marker that I print out when the latency in the loop goes above 500 us. If you go back in time in the trace, you can see that the "latency" app is context switched in from the idle thread on core 1.

Note that I am running with the standard preempt option (not the preempt_full patch or the server version option).

Revision history for this message
Magnus Karlsson (magnus-karlsson) wrote :

Here is the config.

Revision history for this message
Mike Holmes (mike-holmes) wrote :

The latency benchmark is being added to CI, pending B&B supporting out-of-tree kernel module builds.

Changed in linaro-networking:
importance: Undecided → Medium
Changed in linaro-networking:
importance: Medium → High
Changed in linaro-networking:
assignee: nobody → viresh kumar (viresh.kumar)
Revision history for this message
Magnus Karlsson (magnus-karlsson) wrote :

I cannot reproduce this in the latest kernel, so I am going to close it. I do not know exactly what fixed it.

Changed in linaro-networking:
status: New → Invalid
Revision history for this message
viresh kumar (viresh.kumar) wrote :

Magnus,

I tried the cpuset code you talked about earlier, but with it my task isn't actually pinned to cpu 1. I will attach my updated code as well.

I traced it using ps -aF while running latency in the background, and on a number of occasions it ran on cpu 0.

Then I used `taskset -c 1 ./latency` and it worked without any issues; the task sticks to cpu 1.
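
For completeness, one way to double-check from inside the program that the pinning took effect is to read the affinity mask back (a hedged sketch using standard glibc calls; sched_getaffinity, sched_getcpu, and CPU_ISSET all require _GNU_SOURCE):

  /* Sketch: read back the affinity mask to confirm the task is pinned
   * to cpu 1 (requires _GNU_SOURCE and <sched.h>). */
  cpu_set_t verify;

  CPU_ZERO(&verify);
  if (sched_getaffinity(0, sizeof(cpu_set_t), &verify) == -1)
      perror("sched_getaffinity");
  else
      printf("running on cpu %d, pinned to cpu 1: %s\n",
             sched_getcpu(), CPU_ISSET(1, &verify) ? "yes" : "no");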

Revision history for this message
viresh kumar (viresh.kumar) wrote :

The numbers I am getting on 3.10.13, with the topmost commit:
cff43fc Linux 3.10.13

are:

maxrange=16301 cycles = ~12 us (16301 cycles at the Arndale's 1.4 GHz is about 11.6 us).

So I am unable to reproduce the bug with the kernel where Magnus reported it.
