[Hyper-V] Race condition in SMP bootup
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Fix Released | Medium | Unassigned |
Trusty | Invalid | Medium | Joseph Salisbury |
Vivid | Fix Released | Medium | Joseph Salisbury |
Wily | Fix Released | Medium | Joseph Salisbury |
linux-lts-utopic (Ubuntu) | Fix Released | Medium | Joseph Salisbury |
Trusty | Fix Released | Medium | Joseph Salisbury |
Bug Description
Please integrate the following upstream commit.
sched: Fix cpu_active_
There is a race condition in SMP bootup code, which may result in
WARNING: CPU: 0 PID: 1 at kernel/
workqueue_
or
kernel BUG at kernel/
It can be triggered with a bit of luck in Linux guests running
on busy hosts.
CPU0 CPUn
==== ====
_cpu_up()
__cpu_up()
cpumask_
cpu_
<do stuff, see below>
cpumask_
During the various CPU_ONLINE callbacks CPUn is online but not
active. Several things can go wrong at that point, depending on
the scheduling of tasks on CPU0.
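To make the window concrete, here is a minimal userspace sketch of the "online but not active" state. It is a toy model, not kernel code: `struct cpu_masks`, `cpu_bit()`, `mark_online()`, `mark_active()` and `online_but_not_active()` are hypothetical names invented for this illustration, and plain 64-bit bitmasks stand in for the real CPU masks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: plain bitmasks standing in for the online/active CPU masks. */
struct cpu_masks {
    uint64_t online;
    uint64_t active;
};

/* Bit for one CPU; this toy model supports CPUs 0..63. */
static uint64_t cpu_bit(unsigned int cpu)
{
    assert(cpu < 64);
    return UINT64_C(1) << cpu;
}

static void mark_online(struct cpu_masks *m, unsigned int cpu)
{
    m->online |= cpu_bit(cpu);
}

static void mark_active(struct cpu_masks *m, unsigned int cpu)
{
    m->active |= cpu_bit(cpu);
}

/* True while the CPU is visible as online but has not been marked active. */
static bool online_but_not_active(const struct cpu_masks *m, unsigned int cpu)
{
    uint64_t bit = cpu_bit(cpu);

    return (m->online & bit) != 0 && (m->active & bit) == 0;
}
```

In this model, calling `mark_online()` for a CPU and then checking `online_but_not_active()` before `mark_active()` has run returns true. That corresponds to the "<do stuff>" interval on CPU0 in the diagram above.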
Variant 1:
cpu_notify(
workqueue_
rebind_
This call fails because it requires an active CPU; rebind_workers()
ends with a warning:
WARNING: CPU: 0 PID: 1 at kernel/
workqueue_
Variant 2:
cpu_notify(
smpboot_
smpboot_
..
..
The ->wake_cpu of the unparked thread is not allowed, making a call
to select_
find an allowed, active CPU and promptly resets the allowed CPUs, so
that the task in question ends up on CPU0.
When those unparked tasks are eventually executed, they run
immediately into a BUG:
kernel BUG at kernel/
Just changing the order in which the online/active bits are set
(and adding some memory barriers) would solve the two issues
above. However, it would change the order of operations back to
the one before commit 6acbfb96976f ("sched: Fix hotplug vs.
set_cpus_
problem.
Going further back into history, we have at least the following
commits touching this topic:
- commit 2baab4e90495 ("sched: Fix select_
- commit 5fbd036b552f ("sched: Cleanup cpu_active madness")
Together, these give us the following non-working solutions:
- secondary CPU sets active before online, because active is assumed to
be a subset of online;
- secondary CPU sets online before active, because the primary CPU
assumes that an online CPU is also active;
- secondary CPU sets online and waits for primary CPU to set active,
because it might deadlock.
Commit 875ebe940d77 ("powerpc/smp: Wait until secondaries are
active & online") introduces an arch-specific solution to this
arch-independent problem.
Now, go for a more general solution without explicit waiting and
simply set active twice: once on the secondary CPU after online
was set and once on the primary CPU after online was seen.
set_cpus_
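The "set active twice" idea described above can be sketched as a small, deterministic userspace model. This is only an illustration under stated assumptions: it replays one fixed ordering of steps rather than real concurrency, it ignores memory barriers entirely, and every name in it (`struct cpu_masks`, `cpu_bit()`, `mark_active()`, `active_when_callbacks_run()`) is hypothetical and does not come from the kernel patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: plain bitmasks standing in for the online/active CPU masks. */
struct cpu_masks {
    uint64_t online;
    uint64_t active;
};

/* Bit for one CPU; this toy model supports CPUs 0..63. */
static uint64_t cpu_bit(unsigned int cpu)
{
    assert(cpu < 64);
    return UINT64_C(1) << cpu;
}

/* Setting a bit that is already set changes nothing, so doing it twice is safe. */
static void mark_active(struct cpu_masks *m, unsigned int cpu)
{
    m->active |= cpu_bit(cpu);
}

/*
 * Replays one bring-up of `cpu` in a fixed order and reports whether the
 * CPU was active at the moment the primary CPU ran its online callbacks.
 *
 * secondary_is_late:   the secondary has set online but has not yet reached
 *                      its own "set active" step when the callbacks run.
 * primary_sets_active: the primary also sets active once it has seen online.
 */
static bool active_when_callbacks_run(unsigned int cpu,
                                      bool secondary_is_late,
                                      bool primary_sets_active)
{
    struct cpu_masks m = { 0, 0 };
    bool seen;

    m.online |= cpu_bit(cpu);              /* secondary: set online */
    if (!secondary_is_late)
        mark_active(&m, cpu);              /* secondary: set active in time */

    if (primary_sets_active && (m.online & cpu_bit(cpu)) != 0)
        mark_active(&m, cpu);              /* primary: online seen, set active */

    seen = (m.active & cpu_bit(cpu)) != 0; /* state the callbacks observe */

    mark_active(&m, cpu);                  /* late secondary catches up; no-op if set */
    return seen;
}
```

With `primary_sets_active` false and `secondary_is_late` true, the function returns false, which mirrors the failing case. With `primary_sets_active` true it returns true whether or not the secondary is late, because the second write is idempotent and neither CPU has to wait for the other.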
tags: | added: kernel-da-key kernel-hyper-v |
Changed in linux (Ubuntu): | |
importance: | Undecided → Medium |
Changed in linux (Ubuntu Trusty): | |
status: | New → In Progress |
importance: | Undecided → Medium |
assignee: | nobody → Joseph Salisbury (jsalisbury) |
Changed in linux (Ubuntu Vivid): | |
assignee: | nobody → Joseph Salisbury (jsalisbury) |
Changed in linux (Ubuntu Wily): | |
assignee: | nobody → Joseph Salisbury (jsalisbury) |
Changed in linux (Ubuntu Vivid): | |
status: | Confirmed → Fix Committed |
Changed in linux (Ubuntu Trusty): | |
status: | In Progress → Invalid |
Changed in linux (Ubuntu Vivid): | |
status: | Fix Committed → Fix Released |
no longer affects: | linux-lts-utopic (Ubuntu Wily) |
no longer affects: | linux-lts-utopic (Ubuntu Vivid) |
Changed in linux-lts-utopic (Ubuntu): | |
status: | New → In Progress |
Changed in linux-lts-utopic (Ubuntu Trusty): | |
status: | New → In Progress |
Changed in linux-lts-utopic (Ubuntu): | |
importance: | Undecided → Medium |
Changed in linux-lts-utopic (Ubuntu Trusty): | |
importance: | Undecided → Medium |
Changed in linux-lts-utopic (Ubuntu): | |
assignee: | nobody → Joseph Salisbury (jsalisbury) |
Changed in linux-lts-utopic (Ubuntu Trusty): | |
assignee: | nobody → Joseph Salisbury (jsalisbury) |
Changed in linux-lts-utopic (Ubuntu): | |
status: | In Progress → Fix Released |
Changed in linux-lts-utopic (Ubuntu Trusty): | |
status: | In Progress → Fix Released |
This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:
apport-collect 1508609
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.