no_hz full isolation benchmark fails

Bug #1270873 reported by Mike Holmes
This bug affects 1 person
Affects: linaro-networking
Status: Fix Released
Importance: Medium
Assigned to: viresh kumar
Milestone: (none)

Bug Description

https://validation.linaro.org/scheduler/job/98097/log_file#L_21_482

The latest no_hz isolation updates, which use hotplug, break the regression test. An Arndale bug in hotplug is thought to be the cause.

Changed in linaro-networking:
status: New → Confirmed
assignee: nobody → viresh kumar (viresh.kumar)
Mike Holmes (mike-holmes) wrote :

Viresh sent a patch to the linaro-networking list; Gary is to try it out.

Mike Holmes (mike-holmes) wrote :

This is really wanted in time for the results to contribute to LCA14 presentations.

Changed in linaro-networking:
importance: Undecided → Medium
Gary S. Robertson (gary-robertson) wrote :

I'm afraid I should have been adding comments to this bug rather than to #1260397 ("Latest NO_HZ_FULL patches cause CPU stalls & spinlock deadlocks in 3.10 PREEMPT_RT kernel"). I have been working mostly with the 3.12 kernels and the latest NO_HZ_FULL patches for those. However, the non-RT kernels (both 3.10 and 3.12) look as though they may be achieving satisfactory CPU isolation and NO_HZ_FULL operation, but the LAVA test shell becomes unresponsive under those conditions. For some reason it does not time out as intended and return an 'indefinite isolation' result. Instead it just stops communicating, and LAVA subsequently times out and returns with no test results.

The 3.12.10-rt15 kernel on my last run returned quickly with isolation durations which were all zero seconds in length.
The 3.10.27-rt25 kernel appears to have actually crashed during this test sometime after hot-unplugging CPU 1.

Have temporarily suspended further testing on this pending completion of preparations for the RT presentation at LCA14.
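
For readers unfamiliar with the intended behavior, here is a minimal sketch of the time-out logic described above: poll for an interruption on the isolated CPU and, if none is seen before a deadline, report 'indefinite isolation' instead of waiting forever. This is not the is-cpu-isolated.sh implementation. The function names, the `was_interrupted` probe, and the result strings other than 'indefinite isolation' are assumptions made purely for illustration.

```python
import time
from typing import Callable, Optional


def measure_isolation(
    was_interrupted: Callable[[], bool],  # hypothetical probe, supplied by the caller
    timeout_s: float,
    poll_s: float = 0.01,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Return the seconds of isolation observed before the first interruption,
    or None if no interruption was seen within timeout_s."""
    start = clock()
    while True:
        elapsed = clock() - start
        if was_interrupted():
            return elapsed
        if elapsed >= timeout_s:
            # Give up waiting so the caller can still report a result.
            return None
        sleep(poll_s)


def format_result(duration: Optional[float]) -> str:
    """Turn a measurement into a one-line result string (format is illustrative)."""
    if duration is None:
        return "indefinite isolation"
    return f"isolated for {duration:.0f} s"
```

The clock and sleep functions are parameters only so the loop can be exercised without real delays. A probe that reports an interruption on the very first poll yields a duration of zero seconds.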

viresh kumar (viresh.kumar) wrote :

I need a hardware package with the kernel changes we are talking about, so I can test it myself in LAVA. Can you please arrange for one?
Also, please make sure you have the latest copy of the is-cpu-isolated.sh script.

Gary S. Robertson (gary-robertson) wrote : Re: [Bug 1270873] Re: no_hz full isolation benchmark fails

I am set up to give you either a hardware package alone or a complete SD
image including the appropriate OE filesystem. Our tests have been using
the OE FS rather than Ubuntu. Let me know which you prefer - or I can send
both if you like.

On Fri, Feb 14, 2014 at 12:30 AM, viresh kumar <email address hidden> wrote:

> I need a hardware package with the kernel changes we are talking about, to
> test it myself in LAVA.. Can you please arrange for one?
> Also please make sure you have latest copy of: is-cpu-isolated.sh script.
>

viresh kumar (viresh.kumar) wrote :

Gary S. Robertson (gary-robertson) wrote :

Viresh,

I uploaded the binaries overnight to Google Drive and just shared them with
you prior to seeing the email above. For the links you requested, look at
the definitions for LAVA jobs 110505 and 110506 on validation.linaro.org.
They are referencing the same hardware packs stored in my
people.linaro.org public_html directory. I will leave these binaries
there for your access for at least a week... longer if you need them.

On Fri, Feb 14, 2014 at 2:59 AM, viresh kumar <email address hidden> wrote:

> I need a web link of hwpacks which can be passed in my json like this:
>
> "parameters": {
>     "hwpack": "http://snapshots.linaro.org/kernel-hwpack/linux-lng-le-arndale/143/hwpack_linaro-arndale_20131126-1222_b143_armhf_supported.tar.gz",
>     "rootfs": "http://snapshots.linaro.org/ubuntu/images/developer/565/linaro-raring-developer-20131127-565.tar.gz"
> },
>
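
As a rough illustration of the request quoted above, the following sketch builds the same "parameters" fragment from two web links and rejects values that are not plain HTTP(S) URLs or that carry stray whitespace, such as line breaks introduced by email wrapping. The helper name `deploy_parameters` is hypothetical, and the sketch covers only the fragment quoted in this thread, not a complete LAVA job definition.

```python
import json


def deploy_parameters(hwpack_url: str, rootfs_url: str) -> dict:
    """Build the "parameters" fragment quoted in this thread (hypothetical helper)."""
    for name, url in (("hwpack", hwpack_url), ("rootfs", rootfs_url)):
        # Reject wrapped or padded URLs and anything that is not a web link.
        if url != url.strip() or not url.startswith(("http://", "https://")):
            raise ValueError(f"{name} must be a plain http(s) URL, got {url!r}")
    return {"parameters": {"hwpack": hwpack_url, "rootfs": rootfs_url}}


if __name__ == "__main__":
    fragment = deploy_parameters(
        "http://snapshots.linaro.org/kernel-hwpack/linux-lng-le-arndale/143/"
        "hwpack_linaro-arndale_20131126-1222_b143_armhf_supported.tar.gz",
        "http://snapshots.linaro.org/ubuntu/images/developer/565/"
        "linaro-raring-developer-20131127-565.tar.gz",
    )
    print(json.dumps(fragment, indent=4))
```

A hardware pack kept only on a local disk or a private share would fail this check, which is why a publicly reachable link is needed.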

Gary S. Robertson (gary-robertson) wrote :

With the latest patches (CPU hotplug interim patch from I. Singh and disabling THUMB2_KERNEL), both the 3.10.27 and 3.12.10 kernels appear to be passing in the linux-lng (non-preempt-rt) branches. Likewise in both kernels the RT branch fails to achieve isolation and fails the LAVA test.

Mike Holmes (mike-holmes) wrote :

https://validation.linaro.org/dashboard/image-charts/LNG-NO_HZ

no_hz full isolation is now generally working, but it fails specifically for RT, which is a new bug with a different root cause.

Changed in linaro-networking:
status: Confirmed → Fix Released