Comment 96 for bug 1007765

Marwan Tanager (marwan-tngr) wrote : Re: [Bug 1007765] Re: brightness adjusting crashes system

On Thu, Aug 22, 2013 at 06:10:50PM -0000, Aditya wrote:
> May be the community is looking for the solution in the wrong direction.

Well, I will elaborate on this bug once and for all.

First of all, we can't blame the Linux kernel for this bug, because it is
firmware-specific.

"Firmware-specific" means that the piece of software (i.e., the backlight
driver) causing this bug is written by the manufacturers of the affected
systems (Dell, HP, etc.) and burned into the BIOS ROM chips of those systems.
This software is closed-source, so there is no way to fix it other than via
firmware updates provided by the manufacturers.

The freeze happens at the time of the interaction between the Linux kernel and
the backlight driver provided by the firmware. Basically, it happens as
follows:

 - The user adjusts the brightness using the keyboard hot keys, or a
   software application (e.g., gnome-control-center).

  - In the first case, the X Window System provides a prioritized
    list of backlight driver interfaces (exported by the Linux
    kernel under /sys/class/backlight/) that are used for
    handling the keyboard key presses. The highest-priority
    interface that is present on the system is used.

  - In the second case, the software application just selects the
    interface to use for adjusting the brightness from the
    currently available interfaces under /sys/class/backlight/

 - Depending on the driver underlying the interface selected at the
   previous step, the system may or may not enter SMM (System Management
   Mode) to adjust the brightness. SMM is an operational mode which the
   system enters when it wants to run critical firmware (e.g., the piece
   of software responsible for shutting down the system when the
   processor temperature hits a critical threshold, or in our case, the
   piece of software responsible for adjusting the brightness level,
   etc.). When the system is executing software in SMM, it is no longer
   under the control of the Linux kernel, and is fully controlled by the
   firmware being executed.

   The acpi_video0 and the dell_backlight (or whatever it's called on
   systems other than Dell) interfaces under /sys/class/backlight/ are
   interfaces, exported by the Linux kernel, for firmware drivers that
   execute in SMM. So if one of them is selected in the previous step,
   the system is going to enter SMM for adjusting the brightness. On the
   other hand, the intel_backlight interface on systems with Intel
   graphics is just an interface for the Linux kernel driver responsible
   for adjusting the backlight level of the Intel graphics chip. It is
   executed just like any other driver in the Linux kernel, without the
   need to enter special modes like SMM. So, if this driver is buggy, we
   can say that the Linux kernel is buggy, because it is considered part
   of the kernel. So, we have two cases:

  1. The driver underlying the interface selected (by the X
     Window System or the software application used for adjusting
     the brightness) is firmware. In this case, when adjusting the
     brightness, the Linux kernel just instructs the processor to
     enter SMM in order to execute the instructions of this
     driver, and when the driver has finished, the kernel takes
     back control of the system.

  2. The driver underlying the interface selected is the Linux
     kernel driver for the graphics chip. In this case, when
     adjusting the brightness, this driver, provided by the Linux
     kernel, is responsible for doing the job, while the system
     is fully controlled by the Linux kernel, and without the
     need for entering any special modes like SMM for executing
     opaque firmware.

   In case 1, we have a problem, because the kernel has another driver,
   exported via /proc/sys/kernel/nmi_watchdog, that uses a hardware
   timer to periodically issue signals called NMIs (Non-Maskable
   Interrupts) every second or two.

   If an NMI is emitted while the system is operating in SMM, the buggy
   firmware executing in SMM causes the system to freeze.

   In case 2, we are fine, because there is no buggy firmware involved.
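
To see which of the two cases an interface falls into, the "type" attribute
that the kernel exports next to each backlight interface can help. A minimal
sketch, assuming the kernel in use provides that attribute (values "firmware",
"platform" or "raw"); the function name and the optional directory argument
are made up for illustration:

```shell
#!/bin/sh
# Print every backlight interface together with the type reported by the
# kernel. Roughly: "firmware" is the ACPI interface (e.g., acpi_video0),
# "platform" is a vendor interface (e.g., dell_backlight), and "raw" is a
# native kernel driver (e.g., intel_backlight).
# The directory argument is hypothetical; it only exists so the function
# can be tried on a scratch directory instead of the real sysfs tree.
list_backlight_types() {
    dir="${1:-/sys/class/backlight}"
    for iface in "$dir"/*; do
        # Skip entries without a readable type attribute
        # (this also covers an empty directory).
        [ -r "$iface/type" ] || continue
        printf '%s\t%s\n' "$(basename "$iface")" "$(cat "$iface/type")"
    done
}
```

Calling list_backlight_types without arguments inspects /sys/class/backlight/.
By the reasoning above, only the interfaces reported as "raw" would be expected
to stay out of SMM.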

The acpi_backlight=vendor solution is not reliable, because all it does is
instruct the Linux kernel to not export the acpi_video0 interface, which is the
interface for the BIOS ACPI backlight driver that is executed from SMM. But it
also instructs the kernel to export the interface for the vendor driver, which
is also firmware and is executed from SMM. So, depending on the priority of
those two drivers in the X Window System configuration (having higher/lower
priority than a kernel-supplied driver like intel_backlight), or the ad-hoc way
in which a backlight-adjusting application selects the interface to use, the
interface for the vendor driver (executing in SMM) may be the one that is
selected after booting with the acpi_backlight=vendor kernel parameter. That's
why using this kernel parameter sometimes doesn't work; we just replaced one
buggy firmware driver executing in SMM with another buggy firmware driver
executing in SMM.

CONCLUSION: The only reliable way of avoiding this bug on systems with buggy
firmware is to put this line in /etc/rc.local:

echo 0 >/proc/sys/kernel/nmi_watchdog

When executed, this command will instruct the kernel to stop emitting NMIs
periodically, and therefore, we can avoid the conflict resulting when the X
Window System or a backlight-adjusting software application selects an
interface exported by the Linux kernel for a firmware driver that has to be
executed from SMM.
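
To check whether the workaround is actually in effect (for example, after a
reboot), the same file can be read back. A small sketch; the function names
and the optional file argument are made up for illustration, and writing to
the real file requires root:

```shell
#!/bin/sh
# Report whether the NMI watchdog is enabled, based on the contents of
# /proc/sys/kernel/nmi_watchdog. The file argument is hypothetical; it
# only exists so the functions can be tried on an ordinary file.
nmi_watchdog_state() {
    file="${1:-/proc/sys/kernel/nmi_watchdog}"
    if [ ! -r "$file" ]; then
        echo "unknown"
    elif [ "$(cat "$file")" = "0" ]; then
        echo "disabled"
    else
        echo "enabled"
    fi
}

# Same effect as the line suggested for /etc/rc.local.
disable_nmi_watchdog() {
    file="${1:-/proc/sys/kernel/nmi_watchdog}"
    echo 0 > "$file"
}
```

Since the file lives under /proc/sys/, the same setting is also reachable as
the kernel.nmi_watchdog sysctl, so "sysctl kernel.nmi_watchdog=0" should be
equivalent to the echo command.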

> Looking in /sys/class/backlight/ lists 3 folders on my Dell inspiron 7520.
> One of the folders is intel_backlight .
>
> Manually doing
> root@Sirius:~# echo 2000 > /sys/class/backlight/intel_backlight/brightness
> works.

That is the wrong way to test for this bug. You have to do quick, successive
adjustments to reproduce it. Try using a script like the
fluctuate_backlight.sh shell script provided in the attachments of the bug
report at https://bugzilla.kernel.org/show_bug.cgi?id=57571 for reproducing the
bug via the sysfs interfaces. I wrote it and used it a lot for testing while
following up on this bug report. Adjusting the brightness always works right
with intel_backlight, though, so the script won't make a difference in your
case, because, as mentioned above, intel_backlight is an interface for a
driver that isn't executed from SMM... However, I was able to use it to
reproduce the bug with all other interfaces for drivers executing from SMM
(e.g., acpi_video0 and dell_backlight on Dell systems).
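
For illustration only, here is a rough sketch of what "quick successive
adjustments" via sysfs could look like. This is not the fluctuate_backlight.sh
attachment itself; the function name, the default number of rounds and the
choice of brightness levels are all made up. On an affected system it may
freeze the machine, so save your work first:

```shell
#!/bin/sh
# Alternate between half and full brightness as fast as the shell can
# write to the interface, with no delay between the writes.
#   $1: interface directory, e.g. /sys/class/backlight/acpi_video0
#   $2: number of rounds (hypothetical default: 100)
fluctuate() {
    iface_dir="$1"
    rounds="${2:-100}"
    # max_brightness is exported by the kernel for every interface.
    max=$(cat "$iface_dir/max_brightness") || return 1
    low=$((max / 2))
    i=0
    while [ "$i" -lt "$rounds" ]; do
        echo "$low" > "$iface_dir/brightness"
        echo "$max" > "$iface_dir/brightness"
        i=$((i + 1))
    done
}
```

Run as root, something like "fluctuate /sys/class/backlight/acpi_video0 500"
would exercise the ACPI interface; pointing it at intel_backlight instead
gives a comparison run against a driver that doesn't execute in SMM.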

Oh, I forgot to mention that there is another, more reliable way of avoiding
this bug: buy a new laptop, and don't forget to try it out in the store before
finishing the deal :-)