Comment 57 for bug 177895

Colin Ian King (colin-king) wrote :

Hi,

I will try to bring some clarity to the rescheduling interrupts issue. I've been looking at various upstream scheduler patches and devoting time to cornering this problem. From my analysis, some level of rescheduling interrupts due to IPI events will occur on multi-core machines as the scheduler tries to spread load across the available cores. The scheduler tries to use as many cores as possible rather than let one core become so overloaded that it has to move into a higher-frequency P-state. The rule of thumb is that it is better to have the processes spread across many cores running in low-frequency P-states than to have one core running in a high-frequency P-state.

I've looked at one scenario quite deeply. With an idle system and Amarok playing in the background one can see several hundred rescheduling interrupts per second, which superficially looks worrying - surely that's a lot of activity for a nearly idle system, one may conclude. However, the reality is that there are hundreds of wakeups per second occurring and the scheduler is trying to keep the cores in the low power C3 state as much as possible. Upstream patch 33b0c4217dcd67b788318c3192a2912b530e4eef, for instance, superficially looks like a possible way to stop the scheduler overzealously causing rescheduling interrupts. After experimenting with this (and other patches too), one can observe that a reduction in rescheduling interrupts causes an overall increase in residency in the C0 state and a decrease in the low power C3 state - and this consumes more power overall.

For example, with an "idle" system just running Amarok playing a 128 kbit/s MP3 full screen, and PowerTop 1.9 monitoring the system activity, I get:

Patched scheduler, 2.6.24-14:
   Rescheduling Interrupts/Sec: ~210-220
   C0 residency: 37%
   C1 residency: 0%
   C2 residency: 0%
   C3 residency: 63%

Unpatched scheduler, 2.6.24-14:
   Rescheduling Interrupts/Sec: ~240-250
   C0 residency: 23%
   C1 residency: 0%
   C2 residency: 0%
   C3 residency: 77%

The patched kernel, which tries to reduce the "Rescheduling Interrupts", also consumes about 3 Watts more power.
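
To make the trade-off in these figures explicit, here is a small sketch that compares the two measurements. It uses the midpoints of the quoted interrupt ranges, which is my own simplification; the dictionary keys and function names are made up for illustration.

```python
# Figures copied from the PowerTop measurements quoted above.
# Using the midpoint of each "~low-high" range is a simplifying assumption.
patched = {"resched_per_sec": (210, 220), "c0": 37, "c3": 63}
unpatched = {"resched_per_sec": (240, 250), "c0": 23, "c3": 77}


def midpoint(bounds):
    """Return the middle of a (low, high) range."""
    low, high = bounds
    return (low + high) / 2


def compare(patched, unpatched):
    """Compare the patched run against the unpatched run."""
    p = midpoint(patched["resched_per_sec"])
    u = midpoint(unpatched["resched_per_sec"])
    return {
        # Relative change in rescheduling interrupts, in percent.
        "resched_change_pct": (p - u) / u * 100,
        # Absolute change in residency, in percentage points.
        "c0_change_points": patched["c0"] - unpatched["c0"],
        "c3_change_points": patched["c3"] - unpatched["c3"],
    }


result = compare(patched, unpatched)
```

On these numbers the patch cuts rescheduling interrupts by roughly 12%, but moves 14 percentage points of residency from C3 into C0.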

A lot of these "Rescheduling Interrupts" occur when two or more processes are tightly coupled, for example, Amarok updating its graphics and X rendering them. With a fairly idle system one will see IPI events (showing up as Rescheduling Interrupts) as one core triggers another to wake up, do something, and then fall asleep again. I am fairly confident that a lot of the IPI events are due to a lot of wakeups occurring in application space, and that these are now visible in PowerTop with the new kernel as the "alarming" level of Rescheduling Interrupts. However, I do not believe the "Rescheduling Interrupts" are the problem - I think they are a distraction because they look alarming - but in fact the scheduler is doing its best when faced with a lot of wakeups from user space processes.
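
For anyone who wants to watch the rescheduling interrupt rate without PowerTop, the sketch below samples /proc/interrupts twice and computes the rate. It assumes an x86 kernel where the rescheduling IPIs appear on a line labelled "RES:" with one count per CPU; the function names and the 5 second default interval are my own choices.

```python
import time


def rescheduling_total(interrupts_text):
    """Sum the per-CPU counts on the RES line of /proc/interrupts-style text."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "RES:":
            total = 0
            for field in fields[1:]:
                # The per-CPU counts come first; stop at the text description.
                if not field.isdigit():
                    break
                total += int(field)
            return total
    raise ValueError("no RES line found")


def rescheduling_rate(interval=5.0, path="/proc/interrupts"):
    """Return rescheduling interrupts per second, summed over all CPUs."""
    with open(path) as f:
        before = rescheduling_total(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = rescheduling_total(f.read())
    return (after - before) / interval
```

Calling rescheduling_rate() on the affected machine should give a figure comparable to the one PowerTop reports, although the two tools may not sample over the same window.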

Barteq above has stated that he is seeing ~50K wakeups/sec, with PowerTop reporting ~2600 Rescheduling Interrupts/sec, while the system appears to spend 97.7% of its time in the lowest 800 MHz state, consuming ~23 Watts. 50K wakeups/sec is not good, and they are most probably coming from the applications being run; however, from the PowerTop data it is hard to tell which process is the offender. Perhaps culling applications one by one and observing the effect is the next step in finding the rogue process that is causing the extra activity.
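
As a possible aid to that culling exercise, here is a rough sketch that ranks processes by how fast their voluntary context switch count grows. This is only a proxy for wakeups and is my own suggestion, not something PowerTop does; it assumes the kernel exposes a voluntary_ctxt_switches line in /proc/<pid>/status, and the function names are hypothetical.

```python
def voluntary_switches(status_text):
    """Extract the voluntary_ctxt_switches count from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("voluntary_ctxt_switches:"):
            return int(line.split(":", 1)[1])
    raise ValueError("no voluntary_ctxt_switches line found")


def rank_by_rate(before, after, interval):
    """Rank PIDs by voluntary context switches per second, highest first.

    before and after map PID -> count, taken interval seconds apart.
    PIDs missing from either sample are skipped.
    """
    rates = {}
    for pid, count in after.items():
        if pid in before:
            rates[pid] = (count - before[pid]) / interval
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)
```

One would take two snapshots of every /proc/<pid>/status a few seconds apart, feed the counts to rank_by_rate(), and start the culling with whatever sits at the top of the list.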

My current conclusion is that we should not concern ourselves too much with the high levels of Rescheduling Interrupts; instead, we need to focus on what is causing all the extraneous wakeup events.