Comment 165 for bug 1040557

hanishkvc (hanishkvc) wrote :

Hi Steve/All,

Summary of what I have understood after going through related material over the last few days.

NOTE: This is my understanding. Anyone using this info should cross-check things on their own before experimenting on their Samsung laptop, unless others (who understand this fully) can confirm that my understanding is correct.

NOTE: I am looking at the possibility of installing Ubuntu with UEFI enabled and ideally with Secure Boot enabled as well, so my thoughts are based around that, and around trying to ensure that the possibility of bricking the laptop is eliminated as much as possible.

There are potentially 3 known bugs in the Samsung UEFI firmware which can trip up a Linux/Ubuntu installation. They are:

BUG_A) (As noted by Matthew Garrett) High probability of corruption of NVRAM / firmware when the UEFI Runtime Service (RT) SetVariable functionality is used.

In turn, this will be triggered in Linux if any kernel crash occurs, as the pstore logic of efivars.c will try to write the crash dump to NVRAM.

POSSIBLE_SOLUTIONFOR_A) Passing the noefi kernel boot param will avoid this direct corruption path in the latest Linux kernels (those which have the noefi-related bugfix, i.e. which clear the EFI_RUNTIME_SERVICES flag rather than wrongly clearing the EFI_BOOT flag).
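
For anyone wanting to confirm that the running kernel really was started with noefi, here is a minimal Python sketch. It only checks that the flag is present on the kernel command line; it cannot tell whether the kernel contains the noefi bugfix mentioned above. The function names are my own and purely illustrative.

```python
from pathlib import Path


def has_kernel_param(cmdline: str, param: str) -> bool:
    """Return True if `param` appears as a standalone word on the command line."""
    # Kernel parameters are whitespace separated, so an exact word match
    # avoids false hits such as "foo=noefi".
    return param in cmdline.split()


def booted_with_noefi(cmdline_path: str = "/proc/cmdline") -> bool:
    """Read the running kernel's command line and look for the bare noefi flag."""
    return has_kernel_param(Path(cmdline_path).read_text(), "noefi")
```

Calling booted_with_noefi() on the booted system should return True only when noefi was actually passed by the bootloader.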

However, at this juncture I am not sure whether the pstore logic tied to the Linux kernel's ACPI ERST error logging functionality will trigger the same issue, as this logic is not disabled by passing the noefi boot param. (As the UEFI runtime service SetVirtualAddressMap is not called when noefi is passed, maybe it won't create a problem, but I am not sure, as I haven't dug sufficiently deep into UEFI yet.) So the safest bet may be to disable the pstore logic of ACPI ERST (APEI) in the Linux kernel for now. (This suggestion is partly due to my current lack of full knowledge here, as I haven't dug enough into the UEFI+ACPI interaction and their runtime environment [I don't mean UEFI runtime services here] and its implications for the OS runtime.)

BUG_B) Existence of the Samsung BIOS's old SMM-equivalent handshaking, or its vestiges, in the newer UEFI firmware, leading to an MCE.

Now the samsung-laptop module in the Linux kernel depends on this old mechanism for its functionality, and this in turn would lead to BUG_A mentioned above in the Samsung UEFI firmware, as it would trigger an MCE and a kernel crash dump.

POSSIBLE_SOLUTIONFOR_B) AGAIN, the latest Linux kernels have WORKED AROUND this by disabling the loading of the samsung-laptop module if UEFI booting is used. NOTE that passing noefi doesn't interfere with proper handling of this workaround, as the Linux kernel still knows that it was booted using UEFI even if noefi is passed.
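
To double-check on a given machine that the samsung-laptop module really has not been loaded, something like the following Python sketch could be used. It simply scans /proc/modules, where the module shows up as samsung_laptop (hyphens in module names appear as underscores there). The helper names are my own, not from any existing tool.

```python
from pathlib import Path


def module_loaded(proc_modules_text: str, name: str) -> bool:
    """Return True if `name` is listed in the given /proc/modules content."""
    # /proc/modules reports module names with underscores instead of hyphens.
    wanted = name.replace("-", "_")
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # The first whitespace-separated field of each line is the module name.
        if fields and fields[0] == wanted:
            return True
    return False


def samsung_laptop_loaded(path: str = "/proc/modules") -> bool:
    """Check the running system for the samsung-laptop module."""
    return module_loaded(Path(path).read_text(), "samsung-laptop")
```

If samsung_laptop_loaded() returns True on a UEFI-booted Samsung laptop, the kernel in use presumably lacks the workaround described above.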

BUG_C) The UEFI RT GetNextVariableName service doesn't handle its input parameters properly.

As noted by Jakob Heinemann during his exploration of the Samsung firmware bug, passing a variable name size greater than 128 potentially returns an error from GetNextVariableName (when in reality it shouldn't). However, the Linux efivars.c in the kernel is currently written to pass 1024 as the variable name size, and in turn there is no easy/direct way for an application using efivars to know about any error encountered by efivars (what if EFI_DEVICE_ERROR occurs, or a buggy firmware craps out like in this Samsung case?). That would mean that on these buggy Samsung UEFI firmwares, a Linux system will not be able to read/get any variables; it will appear as if no variables are there. IN TURN, efibootmgr would potentially wrongly overwrite the first boot entry used by the UEFI firmware boot logic for a UEFI firmware module or some other OS or ...

NOTE: I haven't gone through the efibootmgr code yet. If efibootmgr is not reading the kernel error logs to cross-check for any warning messages related to get_next_variable, then ideally it should be updated to do so. Also, if efibootmgr (or programs which use it) is currently not treating an empty efivars file listing (and in turn an empty boot entry listing) as a possible corner case to be handled specially, by cross-checking with the system user rather than blindly writing to the 1st boot entry, then it should be fixed to handle this corner case in a proper user-controlled manner.
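
The kind of guard I have in mind for the empty-listing corner case can be sketched in Python as below. This is NOT how efibootmgr works today (I haven't read its code); the function names are hypothetical, and the sketch assumes the efivars sysfs directory /sys/firmware/efi/vars, whose new_var and del_var entries are control files rather than variables.

```python
import os

# Control files in the efivars sysfs directory that are not variables.
CONTROL_FILES = ("new_var", "del_var")


def list_efi_variables(vars_dir="/sys/firmware/efi/vars"):
    """Return the variable names visible in vars_dir, or [] if it is unreadable."""
    try:
        entries = os.listdir(vars_dir)
    except OSError:
        # Directory missing or unreadable: treat it the same as "no variables".
        return []
    return sorted(e for e in entries if e not in CONTROL_FILES)


def boot_entry_write_allowed(var_names, user_confirmed=False):
    """Allow writing a boot entry only if variables were seen or the user agreed."""
    # An empty listing may mean the firmware failed to enumerate variables,
    # so refuse to write unless the user has explicitly confirmed.
    return bool(var_names) or user_confirmed
```

A caller would run boot_entry_write_allowed(list_efi_variables()) and, on False, ask the user before touching any boot entry.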

POSSIBLE_SOLUTIONFOR_C) If the noefi kernel parameter is passed, then as efivars won't be loaded, efibootmgr shouldn't trigger this corner case. However, one will then have to handle the installation and suitable configuration of the UEFI boot entries for Linux booting on one's own. At a minimum this involves copying the Grub2 EFI bootloader to the EFI partition; if Secure Boot is required, the signed shim by Matthew Garrett or the new signed loader from the Linux Foundation must also be copied to the EFI partition. Secure Boot will also require the equivalent/proper configuration of Grub2/elilo/the Linux kernel (signed/hash) to let the secure boot continue.
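
As a rough illustration of the manual copy step only, here is a Python sketch that copies loader files into a vendor directory under the mounted EFI partition. The mount point, the vendor directory name and the loader file names are all assumptions that the caller must supply for their own system. It defaults to a dry run that only reports what it would copy, and it does not create or change any UEFI boot entry.

```python
import shutil
from pathlib import Path


def install_loaders(esp_root, vendor_dir, loaders, dry_run=True):
    """Plan (and optionally perform) copying loader files to <esp_root>/EFI/<vendor_dir>."""
    target_dir = Path(esp_root) / "EFI" / vendor_dir
    # Each loader keeps its file name; only the directory changes.
    planned = [(Path(src), target_dir / Path(src).name) for src in loaders]
    if not dry_run:
        target_dir.mkdir(parents=True, exist_ok=True)
        for src, dst in planned:
            # copyfile copies contents only, without permissions or timestamps.
            shutil.copyfile(src, dst)
    return planned
```

For example, install_loaders("/boot/efi", "ubuntu", ["grubx64.efi"]) (all three values hypothetical) returns the planned source/destination pairs without writing anything; pass dry_run=False only after verifying them.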