Comment 178 for bug 263555

Revision history for this message
Yingying Zhao (yingying-zhao) wrote :

This is the patch which has passed Intel's testing and with this patch that issue can't be reproduced again now. It looks to be a work-around and with the .28 a fix for the root cause of the problem.

> Date: Wed, 15 Oct 2008 18:21:44 -0400 (EDT)
> From: Steven Rostedt <email address hidden>
> To: LKML <email address hidden>, <email address hidden>
> cc: Linus Torvalds <email address hidden>,
> Andrew Morton <email address hidden>,
> Arjan van de Ven <email address hidden>, <email address hidden>,
> <email address hidden>, Thomas Gleixner <email address hidden>,
> Ingo Molnar <email address hidden>
> Subject: [PATCH -stable] disable CONFIG_DYNAMIC_FTRACE due to possible memory
> corruption on module unload
>
>
> While debugging the e1000e corruption bug with Intel, we discovered
> today that the dynamic ftrace code in mainline is the likely source of
> this bug.
>
> For the stable kernel we are providing the only viable fix patch: labeling
> CONFIG_DYNAMIC_FTRACE as broken. (see the patch below)
>
> We will follow up with a backport patch that contains the fixes. But since
> the fixes are not a one liner, the safest approach for now is to
> disable the code in question.
>
> The cause of the bug is due to the way the current code in mainline
> handles dynamic ftrace. When dynamic ftrace is turned on, it also
> turns on CONFIG_FTRACE which enables the -pg config in gcc that places
> a call to mcount at every function call. With just CONFIG_FTRACE this
> causes a noticeable overhead. CONFIG_DYNAMIC_FTRACE works to ease this
> overhead by dynamically updating the mcount call sites into nops.
>
> The problem arises when we trace functions and modules are unloaded.
> The first time a function is called, it will call mcount and the mcount
> call will call ftrace_record_ip. This records the calling site and
> stores it in a preallocated hash table. Later on a daemon will
> wake up and call kstop_machine and convert any mcount callers into
> nops.
>
> The evolution of this code first tried to do this without the kstop_machine
> and used cmpxchg to update the callers as they were called. But I
> was informed that this is dangerous to do on SMP machines if another
> CPU is running that same code. The solution was to do this with
> kstop_machine.
>
> We still used cmpxchg to test if the code that we are modifying is
> indeed code that we expect to be before updating it - as a final
> line of defense.
>
> But on 32bit machines, ioremapped memory and modules share the same
> address space. When a module would load its code into memory and execute
> some code, that would register the function.
>
> On module unload, ftrace incorrectly did not zap these functions from
> its hash (this was the bug). The cmpxchg could have saved us in most
> cases (via luck) - but with ioremap-ed memory that was exactly the wrong
> thing to do - the results of cmpxchg on device memory are undefined.
> (and will likely result in a write)
>
> The pending .28 ftrace tree does not have this bug anymore, as a general push
> towards more robustness of code patching, this is done differently: we do not
> use cmpxchg and we do a WARN_ON and turn the tracer off if anything deviates
> from its expected state. Furthermore, patch sites are statically identified
> during build time so there's no runtime discovery of dynamic code areas
> anymore, and no room for code unmaps to cause the hash to become out of date.
>
> We believe the fragility of dynamic patching has been sufficiently
> addressed in the development code via the static patching method, but further
> suggestions to make it more robust are welcome.
>
> Signed-off-by: Steven Rostedt <email address hidden>
> Acked-by: Ingo Molnar <email address hidden>
> Acked-by: Thomas Gleixner <email address hidden>
> ---
> kernel/trace/Kconfig | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> Index: linux-compile.git/kernel/trace/Kconfig
> ===================================================================
> --- linux-compile.git.orig/kernel/trace/Kconfig 2008-10-02 10:18:49.000000000
> -0400
> +++ linux-compile.git/kernel/trace/Kconfig 2008-10-15 17:29:34.000000000
> -0400
> @@ -103,7 +103,8 @@ config CONTEXT_SWITCH_TRACER
> all switching of tasks.
>
> config DYNAMIC_FTRACE
> - bool "enable/disable ftrace tracepoints dynamically"
> + bool "enable/disable ftrace tracepoints dynamically (BROKEN)"
> + depends on BROKEN
> depends on FTRACE
> depends on HAVE_DYNAMIC_FTRACE
> default y
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to <email address hidden>
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/