NUMA task migration race condition due to stop task not being checked when balancing happens

Bug #1461620 reported by Rafael David Tinoco on 2015-06-03
Affects          Importance   Assigned to
linux (Ubuntu)   Undecided    Unassigned
Trusty           Medium       Rafael David Tinoco
Vivid            Undecided    Rafael David Tinoco

Bug Description

SRU Justification:

Impact:
 - Deadlock when migrating processes in between NUMA domains.
 - Came with 1 kernel dump given to me.
 - Hard to trigger.

Fix:
 - Upstream development after upstream discussion.
 - Discussion: https://lkml.org/lkml/2015/6/15/531
 - commit b17718d02f54b90978d0e0146368b512b11c3e84

Testcase:
 - Stress test in a virtual NUMA environment
 - Wait indefinitely... Hard to trigger
 - https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1461620/comments/8
 - Can, at least, make sure the logic did not introduce regression

----

The following kernel panic was brought to my attention:

"""
[3367068.076488] Code: 23 66 2e 0f 1f 84 00 00 00 00 00 83 fb 03 75 05 45 84 ed 75 66 f0 41 ff 4c 24 24 74 26 89 da 83 fa 04 74 3d f3 90 41 8b 5c 24 20 <39> d3 74 f0 83 fb 02 75 d7 fa 66 0f 1f 44 00 00 eb d8 66 0f 1f
[3367068.092735] BUG: soft lockup - CPU#16 stuck for 22s! [migration/16:153]
[3367068.100368] Modules linked in: iptable_raw xt_nat xt_REDIRECT veth openvswitch(OF) gre vxlan ip_tunnel libcrc32c dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp bridge ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables ipmi_devintf 8021q garp stp mrp llc bonding x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel gpio_ich joydev aes_x86_64 lrw gf128mul glue_helper ablk_helper ipmi_si cryptd sb_edac wmi lpc_ich edac_core mac_hid acpi_power_meter nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache lp parport hid_generic ixgbe fnic libfcoe dca ptp
[3367068.100409] libfc megaraid_sas pps_core mdio usbhid scsi_transport_fc hid enic scsi_tgt
[3367068.100415] CPU: 16 PID: 153 Comm: migration/16 Tainted: GF O 3.13.0-34-generic #60-Ubuntu
[3367068.100417] Hardware name: Cisco Systems Inc UCSC-C220-M3S/UCSC-C220-M3S, BIOS C220M3.1.5.4f.0.111320130449 11/13/2013
[3367068.100419] task: ffff881fd2f517f0 ti: ffff881fd2f1c000 task.ti: ffff881fd2f1c000
[3367068.100420] RIP: 0010:[<ffffffff810f5944>] [<ffffffff810f5944>] multi_cpu_stop+0x64/0xf0
[3367068.100426] RSP: 0000:ffff881fd2f1dd98 EFLAGS: 00000246
[3367068.100427] RAX: ffffffff8180af40 RBX: 0000000000000086 RCX: 000000000000a402
[3367068.100428] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff883e607edb48
[3367068.100430] RBP: ffff881fd2f1ddb8 R08: 0000000000000282 R09: 0000000000000001
[3367068.100431] R10: 000000000000b6d8 R11: ffff881fc374dc80 R12: 0000000000014440
[3367068.100432] R13: ffff881fd291ae00 R14: ffff881fd291ae08 R15: 0000000200000010
[3367068.100433] FS: 0000000000000000(0000) GS:ffff881fffd00000(0000) knlGS:0000000000000000
[3367068.100434] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[3367068.100435] CR2: 00007f6202134b98 CR3: 0000000001c0e000 CR4: 00000000001407e0
[3367068.100437] Stack:
[3367068.100438] ffff883e607edb70 ffff881fffd0ede0 ffff881fffd0ede8 ffff883e607edb48
[3367068.100441] ffff881fd2f1de78 ffffffff810f5b5e ffffffff8109dfc4 ffff881fffd14440
[3367068.100443] ffff881fd2f1de08 ffffffff81097508 0000000000000000 ffff881fffd14440
[3367068.100446] Call Trace:
[3367068.100450] [<ffffffff810f5b5e>] cpu_stopper_thread+0x7e/0x150
[3367068.100454] [<ffffffff8109dfc4>] ? vtime_common_task_switch+0x24/0x40
[3367068.100458] [<ffffffff81097508>] ? finish_task_switch+0x128/0x170
[3367068.100462] [<ffffffff8171fd41>] ? __schedule+0x381/0x7d0
[3367068.100465] [<ffffffff810926af>] smpboot_thread_fn+0xff/0x1b0
[3367068.100467] [<ffffffff810925b0>] ? SyS_setgroups+0x1a0/0x1a0
[3367068.100470] [<ffffffff8108b3d2>] kthread+0xd2/0xf0
[3367068.100473] [<ffffffff8108b300>] ? kthread_create_on_node+0x1d0/0x1d0
[3367068.100477] [<ffffffff8172c6bc>] ret_from_fork+0x7c/0xb0
[3367068.100479] [<ffffffff8108b300>] ? kthread_create_on_node+0x1d0/0x1d0
[3367068.100480] Code: db 85 db 41 0f 95 c5 31 f6 31 d2 eb 23 66 2e 0f 1f 84 00 00 00 00 00 83 fb 03 75 05 45 84 ed 75 66 f0 41 ff 4c 24 24 74 26 89 da <83> fa 04 74 3d f3 90 41 8b 5c 24 20 39 d3 74 f0 83 fb 02 75 d7
"""

I'm explaining WHY this is happening in the first comments and HOW to fix it.

You can follow my comments in LKML:

https://lkml.org/lkml/2015/3/6/484

"""
Basically, in kernel 3.13 we are getting the following situation:

I have a core dump locked on the same place
(state machine for powering cpu down for the task swap) from a 3.13 (+
upstream patches) and this commit wasn't backported yet.

-> multi_cpu_stop -> do { } while (curstate != MULTI_STOP_EXIT);
In my case, curstate is WAY different from enum containing MULTI_STOP_EXIT (4).

Register totally messed up (probably after cpu_relax(), right where
you were trapped -> after the pause instruction).

my case:

PID: 118 TASK: ffff883fd28ec7d0 CPU: 9 COMMAND: "migration/9"
...
    [exception RIP: multi_cpu_stop+0x64]
    RIP: ffffffff810f5944 RSP: ffff883fd2907d98 RFLAGS: 00000246
    RAX: 0000000000000010 RBX: 0000000000000010 RCX: 0000000000000246
    RDX: ffff883fd2907d98 RSI: 0000000000000000 RDI: 0000000000000001
    RBP: ffffffff810f5944 R8: ffffffff810f5944 R9: 0000000000000000
    R10: ffff883fd2907d98 R11: 0000000000000246 R12: ffffffffffffffff
    R13: ffff883f55d01b48 R14: 0000000000000000 R15: 0000000000000001
    ORIG_RAX: 0000000000000001 CS: 0010 SS: 0000
--- <NMI exception stack> ---
 #4 [ffff883fd2907d98] multi_cpu_stop+0x64 at ffffffff810f5944
208 } while (curstate != MULTI_STOP_EXIT);
       ---> RIP
RIP 0xffffffff810f5944 <+100>: cmp $0x4,%edx
       ---> CHECKING FOR MULTI_STOP_EXIT
RDX: ffff883fd2907d98 -> does not make any sense
###

If i'm reading this right,

"""
CPU 05 - PID 14990

do_numa_page
task_numa_fault
numa_migrate_preferred
task_numa_migrate
migrate_swap (curr: 14990, task: 14996)
stop_two_cpus (cpu1=05(14996), cpu2=00(14990))
wait_for_completion

14990 - CPU05
14996 - CPU00

stop_two_cpus:
    multi_stop_data (msdata->state = MULTI_STOP_PREPARE)
    smp_call_function_single (min=cpu2=00, irq_cpu_stop_queue_work, wait=1)
        smp_call_function_single (ran on lowest CPU, 00 for this case)
        irq_cpu_stop_queue_work
            cpu_stop_queue_work(cpu1=05(14996)) # add work
(multi_cpu_stop) to cpu 05 cpu_stopper queue
            cpu_stop_queue_work(cpu2=00(14990)) # add work
(multi_cpu_stop) to cpu 00 cpu_stopper queue
    wait_for_completion() --> HERE
"""

in my case, checking task structs for tasks scheduled when
"waiting_for_completion()":

PID 14990 CPU 05 -> PID 14996 CPU 00
PID 14991 CPU 30 -> PID 14998 CPU 01
PID 14992 CPU 30 -> PID 14998 CPU 01
PID 14996 CPU 00 -> PID 14992 CPU 30
PID 14998 CPU 01 -> PID 14990 CPU 05

AND

> 102 2 6 ffff881fd2ea97f0 RU 0.0 0 0 [migration/6]
> 118 2 9 ffff883fd28ec7d0 RU 0.0 0 0 [migration/9]
> 143 2 14 ffff883fd29d47d0 RU 0.0 0 0 [migration/14]
> 148 2 15 ffff883fd29fc7d0 RU 0.0 0 0 [migration/15]
> 153 2 16 ffff881fd2f517f0 RU 0.0 0 0 [migration/16]

THEN

I am still waiting for 5 cpu_stopper_thread -> multi_cpu_stop just
scheduled (probably in the per cpu's queue of cpus 0,1,5,30), not
running yet.

AND

I don't have any "wait_for_completion" for those "OLDER" migration
threads (6, 9, 14, 15 and 16)
Probably wait_for_completion s...
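
One way to read the PID/CPU pairs listed in the comment above is as a wait-for graph. The sketch below is only an illustration: treating each "->" as "is waiting on" is an assumption of this example, and `find_cycle` is a hypothetical helper, not something taken from the kernel or from crash.

```python
# Edges copied from the list above: source PID -> target PID.
# Reading "->" as "waits on" is an assumption made for this sketch.
swaps = {
    14990: 14996,
    14991: 14998,
    14992: 14998,
    14996: 14992,
    14998: 14990,
}


def find_cycle(edges, start):
    """Follow edges from start; return the cycle reached, or None."""
    position = {}   # node -> index in path when first visited
    path = []
    node = start
    while node in edges and node not in position:
        position[node] = len(path)
        path.append(node)
        node = edges[node]
    if node in position:
        # We came back to a node already on the path: that tail is a cycle.
        return path[position[node]:]
    return None


if __name__ == "__main__":
    print(find_cycle(swaps, 14990))
```

Under that reading, PIDs 14990, 14996, 14992 and 14998 form a closed loop, which is consistent with every one of them sitting in wait_for_completion().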


Changed in linux (Ubuntu):
status: New → In Progress
assignee: nobody → Rafael David Tinoco (inaddy)

Sasha pointed me to a fix for this particular behaviour in between 3.16 and 3.17:

https://lkml.org/lkml/2014/4/10/297

[PATCH] sched: Checking for stop task appearance when balancing happens

This indicates that my previous observation:

"""
--- <NMI exception stack> ---
 #4 [ffff883fd2907d98] multi_cpu_stop+0x64 at ffffffff810f5944
208 } while (curstate != MULTI_STOP_EXIT);
       ---> RIP
RIP 0xffffffff810f5944 <+100>: cmp $0x4,%edx
       ---> CHECKING FOR MULTI_STOP_EXIT
RDX: ffff883fd2907d98 -> does not make any sense
"""

was right, due to a stop task being picked by the scheduler when it should not have been.
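
For context, the loop in that backtrace is a small state machine in which every participating stopper thread must acknowledge a state before the shared state advances. The following is a simplified userspace model, not the kernel code: only MULTI_STOP_EXIT (4) appears in the dump above, the other state names are assumed from the upstream enum, and `simulate` is a hypothetical helper that polls in rounds instead of spinning.

```python
from enum import IntEnum


class MultiStop(IntEnum):
    # Only EXIT = 4 is visible in the dump; the rest are assumed names.
    NONE = 0
    PREPARE = 1
    DISABLE_IRQ = 2
    RUN = 3
    EXIT = 4


def simulate(num_threads, running_cpus, max_rounds=100):
    """Return True if every running stopper reaches EXIT, else False.

    num_threads is how many acks each state needs; running_cpus lists the
    CPUs whose stopper actually gets to run multi_cpu_stop().
    """
    state = MultiStop.PREPARE
    acks_left = num_threads
    curstate = {cpu: MultiStop.NONE for cpu in running_cpus}
    for _ in range(max_rounds):
        for cpu in running_cpus:
            if curstate[cpu] == MultiStop.EXIT:
                continue  # this stopper already left the loop
            if state != curstate[cpu]:
                curstate[cpu] = state
                acks_left -= 1
                # The last ack moves everybody to the next state.
                if acks_left == 0 and state != MultiStop.EXIT:
                    state = MultiStop(state + 1)
                    acks_left = num_threads
        if all(s == MultiStop.EXIT for s in curstate.values()):
            return True
    return False  # still spinning after max_rounds, like a soft lockup


if __name__ == "__main__":
    print(simulate(2, [0, 1]), simulate(2, [0]))
```

In this model, if one of the two stoppers never gets to run, the other one acknowledges the first state and then spins forever, which is the shape of the soft lockup reported on the migration threads.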

This commit is present in:

$ git tag --contains a1d9a3231eac4117cadaf4b6bba5b2902c15a33e
v3.15-rc2
v3.15-rc3
...
v4.1-rc5

So only Trusty is affected.

The fix relies on checking whether a stop task has appeared, in which case task selection must be restarted:

+       if (need_pull_dl_task(rq, prev)) {
                pull_dl_task(rq);
+               /*
+                * pull_rt_task() can drop (and re-acquire) rq->lock; this
+                * means a stop task can slip in, in which case we need to
+                * re-start task selection.
+                */
+               if (rq->stop && rq->stop->on_rq)
+                       return RETRY_TASK;
+       }

This is done by returning RETRY_TASK. That logic is not available in 3.13, and I don't want to jeopardise our 3.13 scheduler.
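
The retry idea can be illustrated with a toy model of class-ordered task selection. This is a sketch under stated assumptions, not the scheduler: `RunQueue`, `pick_stop`, `pick_dl`, `pick_fair` and the `on_pull` hook are hypothetical names invented here, and the hook merely simulates another CPU queueing stop work while the runqueue lock is dropped during a pull.

```python
RETRY_TASK = object()  # sentinel playing the role of the kernel's RETRY_TASK


class RunQueue:
    def __init__(self):
        self.stop_queued = False  # stands in for rq->stop && rq->stop->on_rq
        self.dl_tasks = []
        self.fair_tasks = []
        self.on_pull = None       # hypothetical hook: runs while "unlocked"


def pick_stop(rq):
    if rq.stop_queued:
        rq.stop_queued = False
        return "stop"
    return None


def pick_dl(rq):
    # Pulling may drop and re-take the lock; a stop task can slip in then.
    if rq.on_pull is not None:
        hook, rq.on_pull = rq.on_pull, None
        hook(rq)
        if rq.stop_queued:
            return RETRY_TASK
    return rq.dl_tasks.pop(0) if rq.dl_tasks else None


def pick_fair(rq):
    return rq.fair_tasks.pop(0) if rq.fair_tasks else None


CLASSES = [pick_stop, pick_dl, pick_fair]  # highest priority first


def pick_next_task(rq):
    while True:
        for pick in CLASSES:
            task = pick(rq)
            if task is RETRY_TASK:
                break  # restart from the highest-priority class
            if task is not None:
                return task
        else:
            return "idle"


def queue_stop(rq):
    rq.stop_queued = True


if __name__ == "__main__":
    rq = RunQueue()
    rq.dl_tasks = ["dl-1"]
    rq.on_pull = queue_stop
    print(pick_next_task(rq), pick_next_task(rq), pick_next_task(rq))
```

Without the retry, the deadline task would be returned even though a stop task became runnable in the meantime, so the stop task would be kept waiting behind a lower-priority class.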

To better understand how easily this bug can be triggered, I created the following test case:

I've been using a KVM guest emulating a NUMA environment with 32 different domains (1 for each vCPU):

root@numa:~# numactl -H
available: 32 nodes (0-31)
node 0 cpus: 0
node 0 size: 237 MB
node 0 free: 82 MB
node 1 cpus: 1
node 1 size: 251 MB
node 1 free: 15 MB
node 2 cpus: 2
node 2 size: 251 MB
node 2 free: 52 MB
node 3 cpus: 3
node 3 size: 251 MB
node 3 free: 240 MB
node 4 cpus: 4
node 4 size: 251 MB
node 4 free: 15 MB
node 5 cpus: 5
node 5 size: 251 MB
node 5 free: 15 MB
node 6 cpus: 6
node 6 size: 251 MB
node 6 free: 17 MB
node 7 cpus: 7
node 7 size: 251 MB
node 7 free: 15 MB
node 8 cpus: 8
node 8 size: 251 MB
node 8 free: 16 MB
node 9 cpus: 9
node 9 size: 251 MB
node 9 free: 16 MB
node 10 cpus: 10
node 10 size: 251 MB
node 10 free: 15 MB
node 11 cpus: 11
node 11 size: 187 MB
node 11 free: 13 MB
node 12 cpus: 12
node 12 size: 251 MB
node 12 free: 15 MB
node 13 cpus: 13
node 13 size: 251 MB
node 13 free: 17 MB
node 14 cpus: 14
node 14 size: 251 MB
node 14 free: 15 MB
node 15 cpus: 15
node 15 size: 251 MB
node 15 free: 16 MB
node 16 cpus: 16
node 16 size: 251 MB
node 16 free: 17 MB
node 17 cpus: 17
node 17 size: 251 MB
node 17 free: 17 MB
node 18 cpus: 18
node 18 size: 251 MB
node 18 free: 16 MB
node 19 cpus: 19
node 19 size: 251 MB
node 19 free: 15 MB
node 20 cpus: 20
node 20 size: 251 MB
node 20 free: 16 MB
node 21 cpus: 21
node 21 size: 251 MB
node 21 free: 17 MB
node 22 cpus: 22
node 22 size: 251 MB
node 22 free: 51 MB
node 23 cpus: 23
node 23 size: 251 MB
node 23 free: 37 MB
node 24 cpus: 24
node 24 size: 251 MB
node 24 free: 120 MB
node 25 cpus: 25
node 25 size: 251 MB
node 25 free: 115 MB
node 26 cpus: 26
node 26 size: 251 MB
node 26 free: 41 MB
node 27 cpus: 27
node 27 size: 251 MB
node 27 free: 15 MB
node 28 cpus: 28
node 28 size: 251 MB
node 28 free: 15 MB
node 29 cpus: 29
node 29 size: 251 MB
node 29 free: 17 MB
node 30 cpus: 30
node 30 size: 251 MB
node 30 free: 164 MB
node 31 cpus: 31
node 31 size: 251 MB
node 31 free: 228 MB

I'm stressing the environment (as you can see from the "free" memory on every NUMA node) with a specific tool that allocates a certain amount of memory and "touches" every 32 bytes of it, dirtying it at the end and then starting over. Alongside that, I'm creating enough kernel tasks, running concurrently with these memory allocators, for them to compete for CPU. This forces the memory threads to migrate between CPUs (and between NUMA domains, since every CPU is in a different NUMA domain).
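
The access pattern described above can be sketched as follows. This is not the tool that was used for the test, only a minimal illustration of "allocate, touch every 32 bytes, dirty everything at the end"; `touch_every` and `stress_once` are hypothetical names.

```python
def touch_every(buf, stride=32):
    """Write one byte every `stride` bytes; return how many were written."""
    touched = 0
    for offset in range(0, len(buf), stride):
        buf[offset] = (buf[offset] + 1) % 256
        touched += 1
    return touched


def stress_once(size_bytes, stride=32):
    """One allocate/touch/dirty pass; return the number of strided writes."""
    buf = bytearray(size_bytes)       # allocation, zero-filled
    touched = touch_every(buf, stride)
    buf[:] = b"\xff" * len(buf)       # dirty the whole region at the end
    return touched


if __name__ == "__main__":
    # A real stress run would loop forever; a single pass is shown here.
    print(stress_once(1024 * 1024))
```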

Using ftrace I can confirm that we are frequently triggering the logic responsible for the deadlock, but so far I have not managed to make the deadlock itself happen.

root@numa:~# trace-cmd record -p function -l numa_migrate_preferred -l task_numa_migrate -l migrate_swap -l stop_two_cpus

...
stress-1547 [012] 136.309393: function: numa_migrate_preferred
stress-1547 [012] 136.309394: function: task_numa_migrate
stress-1547 [012] 136.309414: function: migrate_swap
stress-1547 [012] 136.309414: function: stop_two_cpus
stress-1539 [017] 136.309519: function: numa_migrate_preferred
stress-1539 [017] 136.309519: function: task_numa_migrate
stress-1539 [017] 136.309528: function: migrate_swap
stress-1539 [017] 136.309528: function: stop_two_cpus
stress-1563 [006] 136.313389: function: numa_migrate_preferred
stress-1563 [006] 136.313391: function: task_numa_migrate
stress-1428 [004] 136.313415: function: numa_migrate_preferred
stress-1428 [004] 136.313416: function: task_numa_migrate
stress-1428 [004] 136.313434: function: migrate_swap
stress-1428 [004] 136.313434: function: stop_two_cpus
stress-1421 [016] 136.325398: function: numa_migrate_preferred
stress-1464 [025] 136.386219: function: numa_migrate_preferred
stress-1464 [025] 136.386221: function: task_numa_migrate
stress-1464 [025] 136.386240: function: migrate_swap
stress-1464 [025] 136.386241: function: stop_two_cpus
stress-1435 [014] 136.400792: function: numa_migrate_preferred
stress-1435 [014] 136.400793: function: task_numa_migrate
<...>-1513 [023] 136.401345: function: numa_migrate_preferred
stress-1447 [019] 136.410245: function: numa_migrate_preferred
stress-1447 [019] 136.410246: function: task_numa_migrate
stress-1517 [012] 136.413338: function: numa_migrate_preferred
stress-1554 [024] 136.417383: function: numa_migrate_preferred
stress-1554 [024] 136.417384: function: task_numa_migrate
stress-1554 [024] 136.417407: function: migrate_swap
stress-1554 [024] 136.417408: function: stop_two_cpus
<...>-1507 [023] 136.421348: function: numa_migrate_preferred
stress-1500 [018] 136.445321: function: numa_migrate_preferred
stress-1525 [025] 136.473330: function: numa_migrate_preferred
stress-1472 [029] 136.502245: function: numa_migrate_preferred
stress-1472 [029] 136.502247: function: task_numa_migrate
stress-1472 [029] 136.502270: function: migrate_swap
stress-1472 [029] 136.502270: function: stop_two_cpus
stress-1496 [004] 136.569273: function: numa_migrate_preferred
stress-1496 [004] 136.569275: function: task_numa_migrate
...

root@ttwcnuma:~# trace-cmd report | grep stop_two_cpus | wc -l
475

Meaning that I caused a task to be migrated between NUMA domains 475 times in less than 3 seconds.
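
The same count, plus the time window it covers, can be derived from the report text directly. The sketch below assumes the line layout shown in the excerpt above ("comm-pid [cpu] timestamp: function: name"); `summarize` is a hypothetical helper and the sample lines are copied from that excerpt.

```python
import re
from collections import Counter

# Layout assumed from the excerpt: "comm-pid [cpu] timestamp: function: name"
LINE_RE = re.compile(
    r"^(?P<comm>\S+)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+"
    r"(?P<ts>\d+\.\d+):\s+function:\s+(?P<func>\S+)\s*$"
)


def summarize(report_text):
    """Return (per-function counts, seconds between first and last event)."""
    counts = Counter()
    stamps = []
    for line in report_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip headers, "..." and anything unexpected
        counts[m.group("func")] += 1
        stamps.append(float(m.group("ts")))
    span = max(stamps) - min(stamps) if stamps else 0.0
    return counts, span


SAMPLE = """\
stress-1547 [012] 136.309393: function: numa_migrate_preferred
stress-1547 [012] 136.309394: function: task_numa_migrate
stress-1547 [012] 136.309414: function: migrate_swap
stress-1547 [012] 136.309414: function: stop_two_cpus
<...>-1513 [023] 136.401345: function: numa_migrate_preferred
"""

if __name__ == "__main__":
    counts, span = summarize(SAMPLE)
    print(counts["stop_two_cpus"], span)
```

Dividing the stop_two_cpus count by the span gives a migrations-per-second figure that is easier to compare between runs than a raw count.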

But unfortunately I could not reproduce the issue (although I know it is there). I'll create a small piece of logic similar to:

Commit a1d9a3231eac4117cadaf4b6bba5b2902c15a33e
Author: Kirill Tkhai <email address hidden>
Date: Thu Apr 10 17:38:36 2014 +0400

    sched: Check for stop task appearance when balancing happens

    We need to do it like we do for the other higher priority classes..

    Signed-off-by: Kirill Tkhai <email address hidden>
    Cc: Michael wang <email address hidden>
    Cc: Sasha Levin <email address hidden>
    Signed-off-by: Peter Zijlstra <email address hidden>
    Link: http://<email address hidden>
    Signed-off-by: Ingo Molnar <email address hidden>

Where I'll just "bypass" task selection instead of returning RETRY_TASK. Since the 3.13 scheduler does not have the RETRY_TASK logic, it will just be a matter of not choosing the stop worker (kthread) to run under the same conditions (since the rest is pretty much the same).

Asking for kernel team review while I work on this.

Brad Figg (brad-figg) on 2015-06-03
Changed in linux (Ubuntu):
status: In Progress → Invalid
Changed in linux (Ubuntu Trusty):
status: New → In Progress
assignee: nobody → Rafael David Tinoco (inaddy)
Changed in linux (Ubuntu):
assignee: Rafael David Tinoco (inaddy) → nobody
Changed in linux (Ubuntu Trusty):
importance: Undecided → Medium

Just got an update from Peter:

https://lkml.org/lkml/2015/6/15/531

asking for feedback on a patch:

Subject: stop_machine: Fix deadlock between multiple stop_two_cpus()
From: Peter Zijlstra <email address hidden>
Date: Fri, 5 Jun 2015 17:30:23 +0200

Will try to test the latest builds + this patch with the NUMA migration test. Unfortunately it is REALLY hard to reproduce the issue so I cannot know if the patch fixed anything, just test if it looks good or not.

I've been running the NUMA tests on 3.13 for some time now, and it looks like the change did not introduce any regression...

$ uname -a
Linux sf00079894trusty 3.13.11-ckt22-201507231149 #2 SMP Thu Jul 23 13:45:04 BRT 2015 x86_64 x86_64 x86_64 GNU/Linux

I'm using a virtualized 16 Domains / 16 CPUs NUMA environment with the stress test tool:

$ sudo numactl -H
available: 16 nodes (0-15)
node 0 cpus: 0
node 0 size: 363 MB
node 0 free: 23 MB
node 1 cpus: 1
node 1 size: 121 MB
node 1 free: 7 MB
node 2 cpus: 2
node 2 size: 377 MB
node 2 free: 23 MB
node 3 cpus: 3
node 3 size: 377 MB
node 3 free: 23 MB
node 4 cpus: 4
node 4 size: 377 MB
node 4 free: 23 MB
node 5 cpus: 5
node 5 size: 377 MB
node 5 free: 23 MB
node 6 cpus: 6
node 6 size: 377 MB
node 6 free: 35 MB
node 7 cpus: 7
node 7 size: 313 MB
node 7 free: 19 MB
node 8 cpus: 8
node 8 size: 377 MB
node 8 free: 61 MB
node 9 cpus: 9
node 9 size: 377 MB
node 9 free: 57 MB
node 10 cpus: 10
node 10 size: 377 MB
node 10 free: 63 MB
node 11 cpus: 11
node 11 size: 377 MB
node 11 free: 30 MB
node 12 cpus: 12
node 12 size: 377 MB
node 12 free: 67 MB
node 13 cpus: 13
node 13 size: 377 MB
node 13 free: 68 MB
node 14 cpus: 14
node 14 size: 377 MB
node 14 free: 68 MB
node 15 cpus: 15
node 15 size: 377 MB
node 15 free: 64 MB

$ sudo stress --vm 16 --vm-bytes 314572800 --vm-stride 1 --vm-keep &

This causes memory allocations of around 300 MB on each node, "touching" every byte of the allocation (so all the pages are "hot" on the CPU running the worker).
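
As a quick sanity check of the numbers in that command line (the 4 KiB page size is an assumption of this sketch; the byte count and worker count are taken from the command above):

```python
MIB = 1024 * 1024
PAGE_SIZE = 4096          # assumed base page size on x86_64

vm_bytes = 314572800      # --vm-bytes from the stress command
workers = 16              # --vm from the stress command

per_worker_mib = vm_bytes // MIB
total_mib = workers * per_worker_mib
pages_per_worker = vm_bytes // PAGE_SIZE

if __name__ == "__main__":
    print(per_worker_mib, total_mib, pages_per_worker)
```

So each worker maps exactly 300 MiB, and with --vm-stride 1 every byte of every one of those pages is written.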

And generating concurrency:

$ sudo stress --cpu 16 &

So the kernel scheduler has to migrate tasks, exercising the fixed code path. I can confirm the logic is being triggered by using ftrace:

$ sudo trace-cmd record -p function -l numa_migrate_preferred -l task_numa_migrate -l migrate_swap -l stop_two_cpus
$ sudo trace-cmd report | grep stop_two_cpus | wc -l
162

And I can't find any regression.

I'll let the tests run a bit longer and will then suggest the fix to our kernel team for merging as a Stable Release Update for Trusty, Utopic and Vivid.

Changed in linux (Ubuntu Vivid):
status: New → In Progress
assignee: nobody → Rafael David Tinoco (inaddy)
description: updated
Chris J Arges (arges) on 2015-07-23
description: updated
description: updated
Andy Whitcroft (apw) on 2015-07-27
Changed in linux (Ubuntu):
status: Invalid → Fix Committed
Luis Henriques (henrix) on 2015-07-27
Changed in linux (Ubuntu Trusty):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Vivid):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.1.0-3.3

---------------
linux (4.1.0-3.3) wily; urgency=low

  [ Andy Whitcroft ]

  * Release Tracking Bug
    - LP: #1478897

  [ Colin Ian King ]

  * SAUCE: KEYS: ensure we free the assoc array edit if edit is valid
    - CVE-2015-1333

  [ Seth Forshee ]

  * SAUCE: overlayfs: Enable user namespace mounts for the "overlay" fstype
    - LP: #1478578

  [ Upstream Kernel Changes ]

  * sched/stop_machine: Fix deadlock between multiple stop_two_cpus()
    - LP: #1461620
  * x86/nmi: Enable nested do_nmi() handling for 64-bit kernels
  * x86/nmi/64: Remove asm code that saves cr2
  * x86/nmi/64: Switch stacks on userspace NMI entry
  * x86/nmi/64: Reorder nested NMI checks
  * x86/nmi/64: Use DF to avoid userspace RSP confusing nested NMI
    detection

 -- Andy Whitcroft <email address hidden> Tue, 28 Jul 2015 11:59:03 +0100

Changed in linux (Ubuntu):
status: Fix Committed → Fix Released
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-trusty' to 'verification-done-trusty'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-trusty
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-vivid' to 'verification-done-vivid'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-vivid

Started verifying the fix.. will provide results soon.

Trusty verification:

inaddy@sf00079894trusty:~$ uname -a
Linux sf00079894trusty 3.13.0-62-generic #101-Ubuntu SMP Thu Jul 30 09:01:36 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

inaddy@sf00079894trusty:~$ sudo trace-cmd report | grep stop_two_cpus | wc -l
74

In 5 seconds the logic was executed 74 times. I kept it running for quite some time and it does not look like there is a regression. Marking this as verification-done-trusty. Moving on to Vivid's verification...

tags: added: verification-done-trusty
removed: verification-needed-trusty

Vivid verification:

inaddy@sf00079894vivid:~$ uname -a
Linux sf00079894vivid 3.19.0-26-generic #27-Ubuntu SMP Tue Jul 28 18:27:31 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

inaddy@sf00079894vivid:~$ sudo trace-cmd report | grep stop_two_cpus | wc -l
46

In 5 seconds the logic was executed 46 times. I kept it running for quite some time and it does not look like there is a regression. Marking this as verification-done-vivid.

Thank you

tags: added: verification-done
removed: verification-done-trusty verification-needed-vivid
tags: added: sts
tags: added: cts
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.19.0-26.28

---------------
linux (3.19.0-26.28) vivid; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1483630

  [ Upstream Kernel Changes ]

  * Revert "Bluetooth: ath3k: Add support of 04ca:300d AR3012 device"

linux (3.19.0-26.27) vivid; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1479055
  * [Config] updateconfigs for 3.19.8-ckt4 stable update

  [ Chris J Arges ]

  * [Config] Add MTD_POWERNV_FLASH and OPAL_PRD
    - LP: #1464560

  [ Mika Kuoppala ]

  * SAUCE: i915_bpo: drm/i915: Fix divide by zero on watermark update
    - LP: #1473175

  [ Tim Gardner ]

  * [Config] ACORN_PARTITION=n
    - LP: #1453117
  * [Config] Add i40e[vf] to d-i
    - LP: #1476393

  [ Timo Aaltonen ]

  * SAUCE: i915_bpo: Rebase to v4.2-rc3
    - LP: #1473175
  * SAUCE: i915_bpo: Revert "mm/fault, drm/i915: Use pagefault_disabled()
    to check for disabled pagefaults"
    - LP: #1473175
  * SAUCE: i915_bpo: Revert "drm: i915: Port to new backlight interface
    selection API"
    - LP: #1473175

  [ Upstream Kernel Changes ]

  * Revert "tools/vm: fix page-flags build"
    - LP: #1473547
  * Revert "ALSA: hda - Add mute-LED mode control to Thinkpad"
    - LP: #1473547
  * Revert "drm/radeon: adjust pll when audio is not enabled"
    - LP: #1473547
  * Revert "crypto: talitos - convert to use be16_add_cpu()"
    - LP: #1479048
  * module: Call module notifier on failure after complete_formation()
    - LP: #1473547
  * gpio: gpio-kempld: Fix get_direction return value
    - LP: #1473547
  * ARM: dts: imx27: only map 4 Kbyte for fec registers
    - LP: #1473547
  * ARM: 8356/1: mm: handle non-pmd-aligned end of RAM
    - LP: #1473547
  * x86/mce: Fix MCE severity messages
    - LP: #1473547
  * mac80211: don't use napi_gro_receive() outside NAPI context
    - LP: #1473547
  * iwlwifi: mvm: Free fw_status after use to avoid memory leak
    - LP: #1473547
  * iwlwifi: mvm: clean net-detect info if device was reset during suspend
    - LP: #1473547
  * drm/plane-helper: Adapt cursor hack to transitional helpers
    - LP: #1473547
  * ARM: dts: set display clock correctly for exynos4412-trats2
    - LP: #1473547
  * hwmon: (ntc_thermistor) Ensure iio channel is of type IIO_VOLTAGE
    - LP: #1473547
  * mfd: da9052: Fix broken regulator probe
    - LP: #1473547
  * ALSA: hda - Fix noise on AMD radeon 290x controller
    - LP: #1473547
  * lguest: fix out-by-one error in address checking.
    - LP: #1473547
  * xfs: xfs_attr_inactive leaves inconsistent attr fork state behind
    - LP: #1473547
  * xfs: xfs_iozero can return positive errno
    - LP: #1473547
  * fs, omfs: add NULL terminator in the end up the token list
    - LP: #1473547
  * omfs: fix sign confusion for bitmap loop counter
    - LP: #1473547
  * d_walk() might skip too much
    - LP: #1473547
  * dm: fix casting bug in dm_merge_bvec()
    - LP: #1473547
  * hwmon: (nct6775) Add missing sysfs attribute initialization
    - LP: #1473547
  * hwmon: (nct6683) Add missing sysfs attribute initialization
    - LP: #1473547
  * target/pscsi: Don't leak scsi_host if hba is VIRTUAL_HOST
    - LP: #1473547
  * net...

Changed in linux (Ubuntu Vivid):
status: Fix Committed → Fix Released

inaddy@mylinux:~/Work/Kernel/Ubuntu/ubuntu-trusty (master)$ git tag --contains 64863995563d71836fa48b743148dce993154a4e
Ubuntu-3.13.0-60.99
Ubuntu-3.13.0-62.101
Ubuntu-3.13.0-62.102
Ubuntu-3.13.0-63.103
Ubuntu-3.13.0-64.104
Ubuntu-3.13.0-65.105

 linux-image-generic | 3.13.0.24.28 | trusty | amd64, arm64, armhf, i386, ppc64el
 linux-image-generic | 3.13.0.65.71 | trusty-security | amd64, arm64, armhf, i386, ppc64el
 linux-image-generic | 3.13.0.65.71 | trusty-updates | amd64, arm64, armhf, i386, ppc64el

This is already fixed. Updating case status.

Changed in linux (Ubuntu Trusty):
status: Fix Committed → Fix Released