Ubuntu: bl-agitator fails to switch sometimes in random switching mode

Bug #995857 reported by Avik Sil on 2012-05-07
This bug affects 1 person
Affects: big.LITTLE Reference Switcher
Status: New
Importance: Medium
Assigned to: Omar Ramirez Luna
Milestone:

Bug Description

hwpack: http://snapshots.linaro.org/oneiric/hwpacks/vexpressdt-rtsm/70/

When bl-agitator is run in random switching mode, sometimes it fails to switch:

root@linaro-developer:~/core# bl-agitator -r -l 1000 -n
***bl-agitator***
CPU count: 4
CPU0: big freq 1000000 LITTLE freq 100000
CPU1: big freq 1000000 LITTLE freq 100000
CPU2: big freq 1000000 LITTLE freq 100000
CPU3: big freq 1000000 LITTLE freq 100000
Random switcher seed 0 limit 1000
Random switcher seed 0 limit 1000
Random switcher seed 0 limit 1000
Random switcher seed 0 limit 1000
cpu3 scaling_setspeed target 100000 current 1000000... FAIL
error on iteration 27 period 929
cpu0 scaling_setspeed target 100000 current 1000000... FAIL
error on iteration 75 period 907
^CTime elapsed: 0:03:10.1000
Terminated because of SIG 2
root@linaro-developer:~/core#

Paul Larson (pwlars) on 2012-05-10
Changed in linaro-big-little-reference:
assignee: nobody → Dave Martin (dave-martin-arm)
Changed in linaro-big-little-reference:
importance: Undecided → Medium
Dave Martin (dave-martin-arm) wrote :

I think this is not a bug.

Because the reference switcher does not switch the CPUs independently, the threads will interfere with each other when running the agitator in threaded mode on the reference switcher:

Thread 1: switch to big on CPU0
    * all CPUs move to big
Thread 2: switch to little on CPU1
    * all CPUs move to little
Thread 1: check that CPU0 is big -> fail

The attached patch adds some locking, to help understand whether this analysis is correct: if so, the modified agitator should not display the failures.

This is not a good fix, though -- we *do* want to switch asynchronously from different threads, because this is a good test of the switcher.
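
The attached patch itself is not reproduced in this report. As a rough sketch of what serializing the write and the read-back could look like, the following models the reference switcher as a single cluster-wide frequency. Every name here (fake_cluster_freq, fake_write_setspeed, fake_read_cur_freq, locked_switch_and_check) is a hypothetical stand-in for illustration, not the agitator's real code or its sysfs helpers.

```c
#include <assert.h>
#include <pthread.h>

#define BIG_FREQ    1000000u
#define LITTLE_FREQ  100000u
#define NR_CPUS           4u

/* Hypothetical model: on the reference switcher all CPUs share one state. */
static unsigned fake_cluster_freq = BIG_FREQ;
static pthread_mutex_t switch_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for writing scaling_setspeed: any CPU's request moves every CPU. */
static void fake_write_setspeed(unsigned cpu, unsigned freq)
{
    assert(cpu < NR_CPUS);
    fake_cluster_freq = freq;
}

/* Stand-in for reading cpuinfo_cur_freq. */
static unsigned fake_read_cur_freq(unsigned cpu)
{
    assert(cpu < NR_CPUS);
    return fake_cluster_freq;
}

/*
 * Hold the lock across both the write and the read-back, so that no other
 * thread can switch the cluster between the two steps.
 * Returns 0 when the read-back matches the target, -1 otherwise.
 */
static int locked_switch_and_check(unsigned cpu, unsigned target_freq)
{
    unsigned curr;

    pthread_mutex_lock(&switch_lock);
    fake_write_setspeed(cpu, target_freq);
    curr = fake_read_cur_freq(cpu);
    pthread_mutex_unlock(&switch_lock);

    return curr == target_freq ? 0 : -1;
}
```

With the lock held over both steps, a check can only ever observe the caller's own switch, which is also why this removes the asynchronous behaviour the agitator is meant to exercise.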

One option is to use a counter to detect concurrent switches so we don't flag up errors unnecessarily. Note that we should only do this for the reference switcher (really we should pay attention to the affected_cpus mask in sysfs).

bl_set_frequency() {
    /* Shared by all threads: counts every switch attempt. */
    static volatile unsigned switch_count = 0;
    unsigned expected_count;

...

    /* Record the counter value that results from our own switch attempt. */
    expected_count = __sync_add_and_fetch(&switch_count, 1);
    sysfs_write_file("scaling_setspeed", ...);
    sysfs_read_file("cpuinfo_cur_freq", ...);
    curr_freq = ...;

    /* Treat frequency mismatch as an error, but only if another thread has not done another switch already: */

    __sync_synchronize();
    /* Counter unchanged since our increment means nobody else started a switch. */
    err = curr_freq != target_freq && switch_count == expected_count;

...
}

(This is totally untested)

Mounir Bsaibes (mounir-bsaibes) wrote :

Should this be assigned to Omar to update the agitator?

Dave Martin (dave-martin-arm) wrote :

Yes, that makes sense

@Omar, over to you :)

Changed in linaro-big-little-reference:
assignee: Dave Martin (dave-martin-arm) → Omar Ramirez Luna (omar.ramirez)

Omar Ramirez Luna (omar.ramirez) wrote :

Is this bug still valid? I have spent some time trying to reproduce it, and even modified the agitator to use 16 threads on the same CPU. It also looks like independent switching is supported on the integrated switcher; is this not the case for the reference switcher?

BTW, the hwpack causes the model to be stuck in a loop printing "CPU0: Booted".

Avik Sil (aviksil) wrote :

Yes, it is still reproducible in the latest reference hwpack: https://snapshots.linaro.org/precise/restricted/integrated-big.little-fastmodels/latest/hwpack_linaro-vexpressdt-rtsm-reference_20120605-77_armhf_unsupported.tar.gz

root@linaro-developer:~# bl-agitator -r -l 1000 -n
***bl-agitator***
CPU count: 4
CPU0: big freq 1000000 LITTLE freq 100000
CPU1: big freq 1000000 LITTLE freq 100000
CPU2: big freq 1000000 LITTLE freq 100000
CPU3: big freq 1000000 LITTLE freq 100000
Random switcher seed 0 limit 1000
Random switcher seed 0 limit 1000
Random switcher seed 0 limit 1000
Random switcher seed 0 limit 1000
failed to write scaling_setspeed, errno 22
failed to write scaling_setspeed, errno 22
failed to write scaling_setspeed, errno 22
failed to write scaling_setspeed, errno 22
cpu3 scaling_setspeed target 1000000 current 100000... FAIL
error on iteration 53 period 212
failed to write scaling_setspeed, errno 22
cpu2 scaling_setspeed target 1000000 current 100000... FAIL
error on iteration 59 period 72
failed to write scaling_setspeed, errno 22
cpu1 scaling_setspeed target 1000000 current 100000... FAIL
error on iteration 49 period 44
failed to write scaling_setspeed, errno 22
cpu0 scaling_setspeed target 1000000 current 100000... FAIL
error on iteration 52 period 892
Time elapsed: 0:00:53.1000
root@linaro-developer:~#

Omar Ramirez Luna (omar.ramirez) wrote :

Strange, I can't boot with this hwpack: the model gets stuck in an initial loop, printing the message "CPU0: Booted" indefinitely.

I have tried the following model versions:

Fri Mar 16 12:45:55 CDT 2012 Installer Generator Build 50823 - installed ARM Fast Models RTSM A15x14-A7x14 VE (Build 2)
Fri May 25 12:43:42 CDT 2012 Installer Generator Build 50823 - installed ARM Fast Models RTSM A15x14-A7x14 VE (Build 4)

I then tried version RTSM_A15-A7x14_VE-7.1_3, both 32-bit and 64-bit.

Perhaps I'm missing some parameter when launching the model. I'm following this wiki page (previously pointed out by Avik):
https://wiki.linaro.org/Internal/Projects/Big.Little.Switcher/Ubuntu

I'm using the following tarballs:

linaro-image-tools-2012.04
linaro-precise-developer-20120524-143.tar.gz
hwpack_linaro-vexpressdt-rtsm-reference_20120605-78_armhf_unsupported.tar.gz

I don't have this issue with the integrated switcher tarball.

Avik Sil (aviksil) wrote :

Omar,

The instructions for running the reference switcher are a little different; you need to follow these: https://wiki.linaro.org/RikuVoipio/FastModelNotes

FYI, my fast model version is 7.0.48 (Build date: Feb 16 2012)

Regards,
Avik

Omar Ramirez Luna (omar.ramirez) wrote :

With the new parameters for the model, I was able to reproduce and confirm this behaviour. When an error occurs, a thread that changed the cluster (T1) has been preempted by a thread changing it to the opposite cluster (T2); T2 succeeds and T1 fails because the current frequency is not its expected target.

Solving it with counters works fine for the above scenario, but it can still fail with the same T1 and T2, because switch_count (a static variable) is incremented by every thread while each thread's expected_count (a local variable) only records the value at the time of its own increment. This means:

Thread 1: Preempted
    * expected_count = 1, switch_count = 1
Thread 2: switch to little on CPU1
    * expected_count = 2, switch_count = 2
    * all CPUs move to little
Thread 1: switch to big on CPU0
    * all CPUs move to big
Thread 1: check that CPU0 is big -> PASS
    * No need to check that expected_count = 1, switch_count = 2
Thread 2: check that CPU1 is little -> FAIL
    * expected_count = 2, switch_count = 2

So T2 has no idea that another thread switched the cluster after it did, and the counters don't reflect that this was the case.
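
The interleaving above can be replayed deterministically in a single thread. This is only a sketch under assumptions: the cluster is modelled as one shared frequency, the error test is the one described in the comment of the proposed counter code (a mismatch counts only while the counter still equals the thread's own value), and all names (switch_req, take_ticket, do_switch, flags_error, replay_false_failure) are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define BIG_FREQ    1000000u
#define LITTLE_FREQ  100000u

static unsigned switch_count;            /* shared counter (static in the proposal) */
static unsigned cluster_freq = BIG_FREQ; /* one frequency for the whole cluster */

struct switch_req {
    unsigned target_freq;
    unsigned expected_count; /* per-thread local in the proposal */
};

/* Step 1 of a switch: bump the shared counter and remember the result. */
static void take_ticket(struct switch_req *r, unsigned target)
{
    r->target_freq = target;
    r->expected_count = ++switch_count;
}

/* Step 2: the write to scaling_setspeed, which moves every CPU. */
static void do_switch(const struct switch_req *r)
{
    cluster_freq = r->target_freq;
}

/* Step 3: a mismatch is an error only if the counter has not moved on. */
static bool flags_error(const struct switch_req *r)
{
    return cluster_freq != r->target_freq &&
           switch_count == r->expected_count;
}

/* Replays the trace: T1 takes its ticket, is preempted, T2 runs, T1 resumes. */
static bool replay_false_failure(void)
{
    struct switch_req t1, t2;

    switch_count = 0;
    cluster_freq = BIG_FREQ;

    take_ticket(&t1, BIG_FREQ);    /* T1: expected_count = 1, then preempted */
    take_ticket(&t2, LITTLE_FREQ); /* T2: expected_count = 2 */
    do_switch(&t2);                /* all CPUs move to little */
    do_switch(&t1);                /* all CPUs move to big */

    assert(!flags_error(&t1));     /* T1 sees big: passes */
    return flags_error(&t2);       /* T2 sees big with counts 2 == 2 */
}
```

In this model replay_false_failure() returns true: T2 is flagged even though the mismatch was caused by T1's later write, because T1 incremented the counter before T2 did.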

AFAICS, affected_cpus doesn't reflect the dependency (or lack of it) between CPU clusters, but related_cpus seems to, so I could use that to tell the agitator to expect its changes to affect other CPUs, and serialize the thread transitions with mutexes.
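
A minimal sketch of how the agitator might interpret related_cpus, assuming the file holds a whitespace-separated list of CPU numbers (for example "0 1 2 3" followed by a newline). The helper names parse_cpu_list and switch_affects_other_cpus and the MAX_CPUS limit are hypothetical; reading the file itself is left out so the parsing can be shown on plain strings.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_CPUS 32 /* illustrative limit so the mask fits in an unsigned long */

/*
 * Turn a list such as "0 1 2 3\n" into a bitmask with one bit per CPU.
 * Returns 0 if a CPU number is out of range.
 */
static unsigned long parse_cpu_list(const char *s)
{
    unsigned long mask = 0;
    char *end;

    assert(s != NULL);
    for (;;) {
        unsigned long cpu = strtoul(s, &end, 10);

        if (end == s)        /* no further number could be parsed */
            break;
        if (cpu >= MAX_CPUS)
            return 0;
        mask |= 1UL << cpu;
        s = end;
    }
    return mask;
}

/* Non-zero when a switch requested on `cpu` also moves at least one other CPU. */
static int switch_affects_other_cpus(unsigned cpu, unsigned long related_mask)
{
    return (related_mask & ~(1UL << cpu)) != 0;
}
```

If the check reports that other CPUs are affected, the agitator could either serialize the switches or skip the per-thread verification for that policy.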

However, I don't see much gain from sequential thread execution over the non-threaded option (which already changes all CPUs sequentially). Perhaps I could add a warning that the threaded option is not supported on the reference switcher. Does anybody object?
