[Samsung NP530U3C-A01] LID close, AC, and battery status events not produced anymore

Bug #1283589 reported by juanmanuel
140
This bug affects 28 people
Affects Status Importance Assigned to Milestone
Linux
Fix Released
High
linux (Ubuntu)
Triaged
Medium
Unassigned

Bug Description

On my Samsung Series 5 NP530U3C-A01 the LID close, AC, and battery status events not produced anymore. The situation is the same (tested with trusty-desktop-amd64.iso 2014-02-22). I created a blog post where I can put all the information related to this bug here: http://zenstep.com.ar/samsung-linux/

In summary, after a suspend sleep with many events (8 plug/unplug, or battery dropping or increasing about 16%, maybe less), the Embedded Controller accumulates those events and stops producing GPE 0x17. Neither Windows nor Linux query those accumulated events, because the EC doesn't produce GPE 0x17 anymore. The issue persists between restarts and shutdowns, and even after re-suspending/re-resuming. Hitting the reset button through the hole in the back fixes the problem temporarily. That is, until the next unlucky suspend.

IMPORTANT: if the laptop is never ever suspended, the issue never comes back.

To force the issue to happen, you can either:
1) Sleep the computer in linux (by closing the lid or any other means).
2) Unplug from the wall, plug, unplug, plug, unplug, plug, unplug, plug (8 actions or more).
3) Resume from sleep. You'll note that the battery icon is fixed, and unplugging or plugging doesn't update battery status anymore. Also note that the ability to suspend by closing the LID is lost.

OR:
1) echo disable > /sys/firmware/acpi/interrupts/gpe17
2) plug, unplug, plug, unplug, plug, unplug, plug, unplug (8 actions)
3) echo enable > /sys/firmware/acpi/interrupts/gpe17

WORKAROUND: Turn off the computer, unplug it, hit the reset button in the back of the laptop, and then plug it again.

WORKAROUND (kernel patch 1): https://bugzilla.kernel.org/show_bug.cgi?id=44161#c133

BETTER WORKAROUND (kernel patch 2): https://bugzilla.kernel.org/show_bug.cgi?id=44161#c149

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: linux-image-3.13.0-11-generic 3.13.0-11.31
ProcVersionSignature: Ubuntu 3.13.0-11.31-generic 3.13.3
Uname: Linux 3.13.0-11-generic x86_64
ApportVersion: 2.13.2-0ubuntu5
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: ubuntu 2546 F.... pulseaudio
CasperVersion: 1.337
Date: Sat Feb 22 22:57:34 2014
LiveMediaBuild: Ubuntu 14.04 LTS "Trusty Tahr" - Alpha amd64 (20140222)
MachineType: SAMSUNG ELECTRONICS CO., LTD. 530U3C/530U4C
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: initrd=/casper/initrd.lz file=/cdrom/preseed/hostname.seed boot=casper quiet splash -- BOOT_IMAGE=/casper/vmlinuz.efi
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-3.13.0-11-generic N/A
 linux-backports-modules-3.13.0-11-generic N/A
 linux-firmware 1.125
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 04/15/2013
dmi.bios.vendor: Phoenix Technologies Ltd.
dmi.bios.version: P14AAJ
dmi.board.asset.tag: Base Board Asset Tag
dmi.board.name: SAMSUNG_NP1234567890
dmi.board.vendor: SAMSUNG ELECTRONICS CO., LTD.
dmi.board.version: FAB1
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 9
dmi.chassis.vendor: SAMSUNG ELECTRONICS CO., LTD.
dmi.chassis.version: 0.1
dmi.modalias: dmi:bvnPhoenixTechnologiesLtd.:bvrP14AAJ:bd04/15/2013:svnSAMSUNGELECTRONICSCO.,LTD.:pn530U3C/530U4C:pvr0.1:rvnSAMSUNGELECTRONICSCO.,LTD.:rnSAMSUNG_NP1234567890:rvrFAB1:cvnSAMSUNGELECTRONICSCO.,LTD.:ct9:cvr0.1:
dmi.product.name: 530U3C/530U4C
dmi.product.version: 0.1
dmi.sys.vendor: SAMSUNG ELECTRONICS CO., LTD.

Revision history for this message
In , email (email-linux-kernel-bugs) wrote :

While the power supply device (ADP1) seems to always understand when the cord is plugged in and removed, the battery (BAT1) does not seem to know when it is discharging or not.

Samsung Series 9 np900x4b with Fedora 17

$ uname -a
Linux Lethe 3.4.4-3.fc17.x86_64 #1 SMP Tue Jun 26 20:54:56 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

(Results of acpi -i -b -a)
*** AC PLUGGED IN ***
Battery 0: Full, 100%
Battery 0: design capacity 8400 mAh, last full capacity 8500 mAh = 100%
Adapter 0: on-line

*** AC REMOVED ***
Battery 0: Charging, 100%, until charged
Battery 0: design capacity 8400 mAh, last full capacity 8500 mAh = 100%
Adapter 0: off-line

*** System Suspended and woken back up ***

*** AC STILL REMOVED ***
Battery 0: Discharging, 100%, 06:33:49 remaining
Battery 0: design capacity 8400 mAh, last full capacity 8500 mAh = 100%
Adapter 0: off-line

*** AC PLUGGED IN ***
Battery 0: Unknown, 98%
Battery 0: design capacity 8400 mAh, last full capacity 8500 mAh = 100%
Adapter 0: on-line

*** SUSPEND AND WAKE ***

*** AC PLUGGED IN ***
Battery 0: Unknown, 98%
Battery 0: design capacity 8400 mAh, last full capacity 8500 mAh = 100%
Adapter 0: on-line

*** AC REMOVED ***
Battery 0: Charging, 98%, charging at zero rate - will never fully charge.
Battery 0: design capacity 8400 mAh, last full capacity 8500 mAh = 100%
Adapter 0: off-line

The same information is reflected when looking directly at the sysfs enteries.
/sys/class/power_supply/ADP1/online shows the correct status all the time. (1 for online, 0 offline)

/sys/class/power_supply/BAT1/status will show the correct status after a fresh suspend, reflecting the above results.

Revision history for this message
In , email (email-linux-kernel-bugs) wrote :

Created attachment 74581
dmesg

Revision history for this message
In , email (email-linux-kernel-bugs) wrote :

Created attachment 74591
lsmod

Revision history for this message
In , email (email-linux-kernel-bugs) wrote :

Created attachment 74601
lspci

Revision history for this message
In , email (email-linux-kernel-bugs) wrote :

Created attachment 74611
upower -d

Revision history for this message
In , email (email-linux-kernel-bugs) wrote :

Created attachment 74621
ACPI DSDT DSL

Revision history for this message
In , email (email-linux-kernel-bugs) wrote :

The effect this has on my system is that upowerd does not always know if the laptop is running on AC or Battery. This also means that things like the battery indicator in Gnome 3 and tuned do not end up in the right state.

Revision history for this message
In , bryan+lk (bryan+lk-linux-kernel-bugs) wrote :

Also affects new Ivy Bridge Samsung Series 9 models (observed on 900X4C). Also tested this using mainline kernels without vendor changes and got the same behavior.

Revision history for this message
In , mark (mark-linux-kernel-bugs) wrote :

Just in case the NP400x4C (Ivy Bridge version) give different values, here are the results of acpi -i -b -a on this system.

** AC Off **

Battery 0: Discharging, 70%, 03:46:06 remaining
Battery 0: design capacity 8400 mAh, last full capacity 8700 mAh = 100%
Adapter 0: off-line

** AC On **

Battery 0: Discharging, 70%, 01:55:16 remaining
Battery 0: design capacity 8400 mAh, last full capacity 8700 mAh = 100%
Adapter 0: on-line

** Suspend and wake **

** AC Still On **

Battery 0: Charging, 70%, 00:50:05 until charged
Battery 0: design capacity 8400 mAh, last full capacity 8700 mAh = 100%
Adapter 0: on-line

** AC Off **

Battery 0: Charging, 70%, 02:11:35 until charged
Battery 0: design capacity 8400 mAh, last full capacity 8700 mAh = 100%
Adapter 0: off-line

** Suspend and wake in this state **

** AC Off **

Battery 0: Discharging, 70%, 05:06:01 remaining
Battery 0: design capacity 8400 mAh, last full capacity 8700 mAh = 100%
Adapter 0: off-line

** AC On **

Battery 0: Discharging, 70%, 406:00:00 remaining
Battery 0: design capacity 8400 mAh, last full capacity 8700 mAh = 100%
Adapter 0: on-line

What I find quite interesting about these is that the times change as the AC state changes and seem to reflect the "correct" time, i.e. time to discharge when disconnected and time to charged when connected. So something is detecting the change, it's just not propagating to all properties.

Revision history for this message
In , mark (mark-linux-kernel-bugs) wrote :

Should have said, the above came from an Ubuntu 12.04 machine using mainline kernel packages.

Linux XXXXXX 3.4.0-030400-generic #201205210521 SMP Mon May 21 09:22:02 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Revision history for this message
In , mail (mail-linux-kernel-bugs) wrote :

Also affected, Samsung 5 Series NP530U3C laptop, same symptoms:

Plugged in:
Battery 0: Charging, 79%, 00:27:20 until charged
Battery 0: design capacity 6100 mAh, last full capacity 5900 mAh = 96%
Adapter 0: on-line

Not plugged in:
Battery 0: Charging, 79%, 01:26:32 until charged
Battery 0: design capacity 6100 mAh, last full capacity 5900 mAh = 96%
Adapter 0: off-line

uname -a
Linux jaytec 3.2.0-27-generic #43-Ubuntu SMP Fri Jul 6 14:25:57 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Also had Ubuntu Quantal 3.5 kernel before wiping and reinstalling 3.2. The bug existed there also.

Revision history for this message
In , mail (mail-linux-kernel-bugs) wrote :

Created attachment 75961
dmesg

Revision history for this message
In , mail (mail-linux-kernel-bugs) wrote :

Created attachment 75971
lshw

Revision history for this message
In , mail (mail-linux-kernel-bugs) wrote :

Created attachment 75981
lsmod

Revision history for this message
In , mail (mail-linux-kernel-bugs) wrote :

Created attachment 75991
lspci

Revision history for this message
In , mail (mail-linux-kernel-bugs) wrote :

Created attachment 76001
upower

Revision history for this message
In , mail (mail-linux-kernel-bugs) wrote :

This for me seems fixed, although my Samsung laptop brightness buttons cause the screen to flicker and thus are useless now :)

uname -a
Linux jaytec 3.2.0-27-generic #43-Ubuntu SMP Fri Jul 6 14:25:57 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Kernel has not updated, not quite sure what update fixed it, possibly a driver update from Ubuntu http://ppa.launchpad.net/ubuntu-x-swat/x-updates/ Ubuntu PPA which offers updated X drivers.

Revision history for this message
In , email (email-linux-kernel-bugs) wrote :

Interesting... though I don't see how it could be x drivers that would fix things. I can say though that the status updates seemed to work at *some* point for me... I remember it working quite clearly... but then it stopped working again for some reason and has not worked since. Almost makes me think it's an ACPI issue on the laptop itself... though things do work in windows so I am really not sure.

(Actually it *may* be broken in windows too, but Windows may be seeing the fact it's discharging and updating the icon based on that vs the ac being unplugged)

Revision history for this message
In , mark (mark-linux-kernel-bugs) wrote :

Just tried booting my NP400X4C from a USB stick built with the Ubuntu Quantal Alpha 3 build and it seems to be somewhat more successful in that the icon states change as the AC adapter is plugged and unplugged. However, acpi -i -b -a reflects the state absolutely fine immediately after applying the AC adapter (i.e. Charging) and then at some point later, fairly quickly - within a minute, the state goes from Charging to Unknown.

I'm not sure if there is a log or something I can monitor to see these state changes and what is triggering them?

Revision history for this message
In , mail (mail-linux-kernel-bugs) wrote :

And now broken again for me, as Lori said had happened.

The only package changes I made were these via normal updates:
Start-Date: 2012-08-02 08:10:51
Commandline: apt-get upgrade
Upgrade: libcupsppdc1:amd64 (1.5.3-0ubuntu1, 1.5.3-0ubuntu2), libcupsimage2:amd64 (1.5.3-0ubuntu1, 1.5.3-0ubuntu2), libcupsimage2:i386 (1.5.3-0ubuntu1, 1.5.3-0ubuntu2), libcupscgi1:amd64 (1.5.3-0ubuntu1, 1.5.3-0ubuntu2), libcupsdriver1:amd64 (1.5.3-0ubuntu1, 1.5.3-0ubuntu2), empathy-common:amd64 (3.4.2.1-0ubuntu1, 3.4.2.3-0ubuntu1), cups-client:amd64 (1.5.3-0ubuntu1, 1.5.3-0ubuntu2), empathy:amd64 (3.4.2.1-0ubuntu1, 3.4.2.3-0ubuntu1), cups-ppdc:amd64 (1.5.3-0ubuntu1, 1.5.3-0ubuntu2), cups-common:amd64 (1.5.3-0ubuntu1, 1.5.3-0ubuntu2), libcups2:amd64 (1.5.3-0ubuntu1, 1.5.3-0ubuntu2), libcups2:i386 (1.5.3-0ubuntu1, 1.5.3-0ubuntu2), cups:amd64 (1.5.3-0ubuntu1, 1.5.3-0ubuntu2), cups-bsd:amd64 (1.5.3-0ubuntu1, 1.5.3-0ubuntu2), libcupsmime1:amd64 (1.5.3-0ubuntu1, 1.5.3-0ubuntu2), nautilus-sendto-empathy:amd64 (3.4.2.1-0ubuntu1, 3.4.2.3-0ubuntu1)
End-Date: 2012-08-02 08:11:11

No boot after upgrades so unlikely to have anything to do with these?

Yesterday I booted several times (have this freezing issue as well on this Samsung), then suspended when leaving from work. Restored today and it didn't work.

Maybe returning from suspend broke something? A full reboot though did not help to resolve situation. AC is plugged in but acpi shows:

jaywink@jaytec: pts/2: 4 files 72Kb -> acpi -a -i -b
Battery 0: Discharging, 27%, 00:33:27 remaining
Battery 0: design capacity 6100 mAh, last full capacity 5600 mAh = 91%
Adapter 0: on-line

Revision history for this message
In , email (email-linux-kernel-bugs) wrote :

Just updated to 2.5.0 and the issue is still there :)

Revision history for this message
In , email (email-linux-kernel-bugs) wrote :

Odd you are having a freezing issue Jason - Mine wakes and sleeps several times day for weeks without issue. I am however running Fedora and not Ubuntu for what that matters.

Is the suspend happening on resume?

Revision history for this message
In , email (email-linux-kernel-bugs) wrote :

Jason: Also very weird on the flickering! Do you have the samsung_laptop module loaded? (lsmod | grep samsung)

Revision history for this message
In , email (email-linux-kernel-bugs) wrote :

Jason, on your flickering issue it looks like there is a ticket in the system for that.

https://bugzilla.kernel.org/show_bug.cgi?id=43031

Revision history for this message
In , mail (mail-linux-kernel-bugs) wrote :

Thanks Lori for a note about the bug with flickering. It's interesting that when my screen started flickering, the ACPI issue was fixed - until it broke again. Wondering if these are somehow related.

Also it seems this bug also concerns so far only Samsung users.

My laptop suspends and resumes ok - though it doesn't suspend on closing the lid even though it is not plugged in and power settings have been checked to be correct. Need to find/raise a bug about that too.

The freezing issue happens at random, 0-2 times a day. Suddenly everything just freezes, fan also goes on quite highly. Only way out is holding the power button for a long time. No solution from killing X or trying ctrl-alt-f1 etc.

Revision history for this message
In , email (email-linux-kernel-bugs) wrote :

Jason,

What version of Ubuntu are you running? I was running 12.04 without crashes for a week or so before going back to Fedora.

Have you run Windows on the laptop for any length of time? I'm wondering if the freezing is being caused by an issue with the laptop itself... bad memory or something?

Revision history for this message
In , marc-kernel (marc-kernel-linux-kernel-bugs) wrote :

For information, some Ubuntu users reported that upgrading the BIOS fixed the problem:

https://bugs.launchpad.net/ubuntu/+source/acpi/+bug/971061

Revision history for this message
In , aviram (aviram-linux-kernel-bugs) wrote :

Marc: Users that installed the BIOS upgrade a month or so ago, all reported the bug 'returning' after a while, myself included. It's unclear what causes the bug to re-manifest itself, but it always does.

Revision history for this message
In , kernel (kernel-linux-kernel-bugs) wrote :

Can also verify this on NP530U3C with ubuntu and latest mainline kernel:

Linux notebook 3.5.2-030502-generic #201208151151 SMP Wed Aug 15 15:52:12 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Revision history for this message
In , nils (nils-linux-kernel-bugs) wrote :

I can also confirm this. Tested on 2 samsung 900x3c laptops running ubuntu 12.04 x86_64. upgraded kernel to 3.5.0-13.

Revision history for this message
In , jos.ferraris (jos.ferraris-linux-kernel-bugs) wrote :

i was creating acpid rules for my archlinux from my new np530uc3 and everything was working great, than acpi started to not log stuff like lid closing, brightness button and others so, is an acpi problem or, et least something related to it.
my config worked for at least a week without any issue...
i'm sure to not have installed any kernel update neither acpi, bios or somethig... so i can't figure out what can be the problem

Revision history for this message
In , jlee (jlee-linux-kernel-bugs) wrote :

Please try:
echo 0 > /sys/module/battery/parameters/cache_time

Revision history for this message
In , marcus.m.maertens (marcus.m.maertens-linux-kernel-bugs) wrote :

Just stumbled upon a possible carrot:
http://www.fossfactory.org/project/p330

Revision history for this message
In , mail (mail-linux-kernel-bugs) wrote :

(In reply to comment #31)
> Please try:
> echo 0 > /sys/module/battery/parameters/cache_time

Interesting, things change slightly after this:

/*plugged in */
jaywink@jaytec: pts/5: 24 files 828Kb -> acpi -abi
Battery 0: Full, 100%
Battery 0: design capacity 6100 mAh, last full capacity 5900 mAh = 96%
Adapter 0: on-line

/*unplugged */
jaywink@jaytec: pts/5: 24 files 828Kb -> acpi -abi
Battery 0: Discharging, 100%, 03:39:19 remaining
Battery 0: design capacity 6100 mAh, last full capacity 5900 mAh = 96%
Adapter 0: off-line

I guess that kind of is better. If I change the value back to 1000 which it was before, the following happens:

/*plugged in */
jaywink@jaytec: pts/5: 24 files 828Kb -> acpi -abi
Battery 0: Unknown, 98%
Battery 0: design capacity 6100 mAh, last full capacity 5900 mAh = 96%
Adapter 0: on-line

/*unplugged */
jaywink@jaytec: pts/5: 24 files 828Kb -> acpi -abi
Battery 0: Discharging, 98%, 03:46:44 remaining
Battery 0: design capacity 6100 mAh, last full capacity 5900 mAh = 96%
Adapter 0: off-line

Anyway, even with acpi correctly saying whether it is discharding or charging, Ubuntu battery indicator doesn't notice the change. Anyone know how to trigger it manually?

Revision history for this message
In , jlee (jlee-linux-kernel-bugs) wrote :

That's better also check the daemon on userspace, e.g. upower or HAL.

Different machine might have different situation.

Revision history for this message
In , jos.ferraris (jos.ferraris-linux-kernel-bugs) wrote :

If use a needle and reset the (battery/bios ?!) from the hole on the bottom of the laptop the acpi events work correctly for few days after falls again in acpi issues.

Have to shutdown and without powercord to perform the reset; have to plug the powercord again or the laptop will not wakeup.

Revision history for this message
In , sebastian.riemer (sebastian.riemer-linux-kernel-bugs) wrote :

It seems like everything related to the embedded controller "Device (H_EC)" doesn't work. Power source "ADP1", the battery "BAT1" and "LID0" are connected to it. Bug 45461 is related. LID0 should return the 1-bit value "LIDS" from the Controller registers with method "_LID". This method is described in the ACPI Spec: http://www.acpi.info/DOWNLOADS/ACPIspec50.pdf .

The "_LID" method is only called by the following: "Notify (LID0, 0x80)" and this is part of "Method (_Q5E, 0, NotSerialized)" as well as "Method (_Q5F, 0, NotSerialized)" which are embedded controller queries.

So if nothing queries that embedded controller, then it can't be reported if the lid is closed or not. Perhaps, something has to activate that embedded Controller first.

It is the Intel HM76 chipset.

http://www.intel.com/content/www/us/en/chipsets/performance-chipsets/mobile-chipset-hm76.html

They have an OEM manual where they describe their embedded controller.

http://www.intel.com/content/dam/www/public/us/en/documents/datasheets/7-series-chipset-pch-datasheet.pdf

Look at page 197. The EC is connected to the SMBus via "SMLink1".

Revision history for this message
In , sebastian.riemer (sebastian.riemer-linux-kernel-bugs) wrote :

This seems to be difficult to fix.

The EC on SMLink1 is connected to "Intel ME SMBus Controller 3" and not to SMBus. This requires the "Advanced TCO Mode" and I guess this also requires the MEI/HECI driver. On Linux this driver is quite new. It needs CONFIG_INTEL_MEI and is located in "drivers/misc/mei/" since kernel 3.5.

ME = Management Engine
MEI = Management Engine Interface
HECI = Host Embedded Controller Interface

Revision history for this message
In , leon_ti (leonti-linux-kernel-bugs) wrote :

I can confirm this issue for Samsung 535 laptop, which is AMD A6 platform, not Intel chipset.
No reaction for power supply plugging/unplugging.
CPU0: AMD A6-4455M APU with Radeon(tm) HD Graphics stepping 01

Revision history for this message
In , mail (mail-linux-kernel-bugs) wrote :

Bumped my laptop to Ubuntu 3.7.2-raring kernel (http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.7.2-raring/) with no change:

/* not plugged in */
$ acpi -abi
Battery 0: Discharging, 67%, 01:36:05 remaining
Battery 0: design capacity 6100 mAh, last full capacity 3700 mAh = 60%
Adapter 0: off-line

/* plugged in */
$ acpi -abi
Battery 0: Discharging, 66%, 09:30:07 remaining
Battery 0: design capacity 6100 mAh, last full capacity 3700 mAh = 60%
Adapter 0: on-line

Laptop again is Samsung 530U3C.

Revision history for this message
In , mytools (mytools-linux-kernel-bugs) wrote :

Same problem here. Ubuntu 12.10 all updates installed, 32bit. Notebook Samsung 900X3C. Discharging or Charging always in both cases when AC plugged or not.
Same as in comments 39, 10, 8. Under Win8 all works fine even with no samsung drivers installed.

Some time ago when Ubuntu 12.10 just reliased I do not observe this problem and all works fine too. I think this happens after some update. But I don't know after what.

Revision history for this message
In , tianyu.lan (tianyu.lan-linux-kernel-bugs) wrote :

Hi all:
       It there someone that can check whether the problem also exits on the windows? This seems a bios issue. The battery state directly comes from bios.
The /sys/class/power_supply/BAT1/status only presents the value from bios.

Revision history for this message
In , email (email-linux-kernel-bugs) wrote :

Hi Lan,

I can't say for certain,... but for the couple of days I had Windows on the machine before installing Linux I didn't notice the problem. I did put my Windows image back on the machine attempting to troubleshoot things not long after running into the issue and Windows still seemed to act fine. Same again when I put windows on the machine to upgrade the firmware to see if that would address the issue. (Interestingly, it seemed to make the issue go away for a day or so...)

Having said that, it's also very possible that Window's may be handling battery status differently then we do in Linux.

In Linux, we do actually see when the transformer is plugged into the device... we just can't tell if the battery is discharging/recharging. Window's could just be inferring that because the transformer is unplugged, that we are discharging... again with the reverse.

If you could recommend a tool to get real diagnostic information on the battery state in Windows I will happily flash it back to the machine and get some logs.

I think it's pretty easy to just say "the machine / bios is just broke" (Though in all honesty, I'm very willing to believe such)... but the laptop has obviously passed Q/A at Samsung under Windows. Though this could also just mean there was a workaround implemented, or again that Windows is just not as picky about how it determines the battery state.

Is it possible that perhaps the firmware is reacting poorly to something we are doing? I can't help but to also think of this issue: http://www.pcworld.com/article/2027819/not-just-linux-windows-can-brick-samsung-laptops-too.html

Anyhow, please let me know if there is anything I can do to help.

Revision history for this message
In , mark (mark-linux-kernel-bugs) wrote :

Behaviourally Windows works correctly, I've actually restored the factory load onto my Series 9 as the behaviour of this bug means that Linux is just untenable. By behaviourally I mean that things like power options change correctly when the AC is attached and detached. One of the most impactful implications of this bug was that if the AC was attached after being started on battery the system would still consider itself to be discharging and do things like suspend/hibernate after 10 minutes of inactivity, not what you want if the machine is plugged in a playing music etc. This sort of switch works as expected in Windows. So, it may be that the Windows user-space just looks at different properties and "does the right thing" or it may be that the bug doesn't exist. Either way, the user experience "works" in Windows and doesn't work in Linux and that at the end of the day is the important thing.

Revision history for this message
In , mytools (mytools-linux-kernel-bugs) wrote :

I confirm comment #35. Also I think that this is BIOS/Firmware issue on partial Samsung models. But this is not affected by Windows OS (7/8). So I think linux should do the same as Windows.

Revision history for this message
In , tianyu.lan (tianyu.lan-linux-kernel-bugs) wrote :

Hi all:
         Can you test whether Bug 45461 is the same as this one? Seem they all happen on Samsung system and related with EC.

Revision history for this message
In , mail (mail-linux-kernel-bugs) wrote :

(In reply to comment #45)
> Hi all:
> Can you test whether Bug 45461 is the same as this one? Seem they
> all
> happen on Samsung system and related with EC.

Yes can confirm this too, laptop has no idea lid is closed but suspend works very well indeed when triggered via OS.

Revision history for this message
In , tianyu.lan (tianyu.lan-linux-kernel-bugs) wrote :

Ok. Mark this dupicate

*** This bug has been marked as a duplicate of bug 45461 ***

Revision history for this message
In , mytools (mytools-linux-kernel-bugs) wrote :

I can not confirm this. When I close lid computer suspends well. Heh.

I think this also can be related how ubuntu (linux) have been installed.
My idea is:
1. If ubuntu (linux) installed just right after Win8 -> bugs happen.
2. If installed just right after Win7 -> bugs not happen.
3. If installed just right after Win8 and pressed reset button -> no bugs.

Maybe someone can confirm this...

Revision history for this message
In , nils (nils-linux-kernel-bugs) wrote :

For me this has been fixed on samsung 9 series 900X3C-A01_NL by the
latest biosupdate provided by samsung (P04AAC). I think Windows doesn't
suffer from this since it's using WMI.

Revision history for this message
In , nichtstuer (nichtstuer-linux-kernel-bugs) wrote :

(In reply to comment #41)
> It there someone that can check whether the problem also exits on the
> windows? This seems a bios issue.
When I plug/unplug AC while in bios setup the laptop changes the brightness, but under Linux it doesn't.

(In reply to comment #35)
> If use a needle and reset the (battery/bios ?!) from the hole on the bottom
> of
> the laptop the acpi events work correctly for few days after falls again in
> acpi issues.
Works for me

(In reply to comment #48)
> I think this also can be related how ubuntu (linux) have been installed.
> My idea is:
> 1. If ubuntu (linux) installed just right after Win8 -> bugs happen.
> 2. If installed just right after Win7 -> bugs not happen.
> 3. If installed just right after Win8 and pressed reset button -> no bugs.
> Maybe someone can confirm this...
I can confirm this:
> 2. If installed just right after Win7 -> bugs not happen.

Revision history for this message
In , email (email-linux-kernel-bugs) wrote :

I am not sure why this would be a duplicate... closing the lid works just fine for me.

Revision history for this message
In , tianyu.lan (tianyu.lan-linux-kernel-bugs) wrote :

Oh. Sorry. I am not notice that. Reopen it. Many people suffer the issue on the Samsung machine. But seems the causes are different.

Revision history for this message
In , jos.ferraris (jos.ferraris-linux-kernel-bugs) wrote :

>
> (In reply to comment #35)
> > If use a needle and reset the (battery/bios ?!) from the hole on the
> bottom of
> > the laptop the acpi events work correctly for few days after falls again
> in
> > acpi issues.
> Does this fix you problem?

This is a temporary workaround that last untill something screw again the
bios // acpi connectivity. In general, the issue rise again after a X
resume (can be the first or after several resumes). This problem is closely
related to the LID issue because the Bios // acpi control also it; in fact
i have also this problem while the Bios is screwed... I'm on a Samsung
np530 ultrabook, but the issue is the same!

2013/2/20 <email address hidden>

> https://bugzilla.kernel.org/show_bug.cgi?id=44161
>
>
>
>
>
> --- Comment #53 from Lan Tianyu <email address hidden> 2013-02-20
> 06:16:58 ---
> Hi Lori:
>
> (In reply to comment #35)
> > If use a needle and reset the (battery/bios ?!) from the hole on the
> bottom of
> > the laptop the acpi events work correctly for few days after falls again
> in
> > acpi issues.
> Does this fix you problem?
>
> (In reply to comment #49)
> > For me this has been fixed on samsung 9 series 900X3C-A01_NL by the
> > latest biosupdate provided by samsung (P04AAC). I think Windows doesn't
> > suffer from this since it's using WMI.
> Since there is a newer bios version. Please have a try.
> I check there is no samsung-wmi driver and maybe this is the cause.
>
> --
> Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
> ------- You are receiving this mail because: -------
> You are on the CC list for the bug.
>

Revision history for this message
In , mytools (mytools-linux-kernel-bugs) wrote :

In my Samsung 900X3C I have BIOS ver. P05AAC. After reset button I have NOT observe issue.
I can not test my idea about something happens only usage of Win8. But I think this can be some "interference".

Revision history for this message
In , nichtstuer (nichtstuer-linux-kernel-bugs) wrote :

(In reply to comment #55)
Same with np530u3b and pre-installed Win7.

Revision history for this message
In , fringd (fringd-linux-kernel-bugs) wrote :

Is there some way to log the signals being sent to the battery? if so, i could reset the battery and mess with it until it breaks (suspend and resume a lot seem to be good starting places) and then mark when it fails. Anybody have any clues how to accomplish this?

Revision history for this message
In , nichtstuer (nichtstuer-linux-kernel-bugs) wrote :

After installing opensuse 12.3 the solution with needle doesn't work for me anymore both with suse and with linux mint 14.

Revision history for this message
In , dutra.julio (dutra.julio-linux-kernel-bugs) wrote :

After amount of time my system ACPI returns to reflect charging/discharging status for the battery correctly . I have a 530u3c with Linux 3.8.6.

Revision history for this message
In , jos.ferraris (jos.ferraris-linux-kernel-bugs) wrote :

#59
Are you saing you're not experiencing anymore theACPI problems with kernel 3.8.6?
What Os do you have?

Revision history for this message
In , dutra.julio (dutra.julio-linux-kernel-bugs) wrote :

My english is not so good. Sorry.

No. The problem persists. I turn on my computer without power cord. Then i plug the power cord and ACPI don't detects them. The power state remains in discharging. I continue working and after certain time the power state is detected as charging.

I have Archlinux.

Revision history for this message
In , andfagiani (andfagiani-linux-kernel-bugs) wrote :

I have a samsung series 9 np900x3c affected by the same issue.
I've noticed a pattern in this bug's behavior and I would like to know if someone else can confirm the following:

The bug triggers ONLY if a suspend/resume (hibernate/thaw) event happens while the laptop is connected to the charger. I've tested this for about 3 weeks now; I always unplug the charger before suspending (heck, even before shutting down the system, although that's unlikely to cause any harm) and always plug it in (if needed) only after resume.

Can anyone confirm that I haven't just hit a lucky streak?

Revision history for this message
In , mail (mail-linux-kernel-bugs) wrote :

Weird, the battery events just started working on my Samsung 530u3c - I didn't do anything really, specifically did not reset the battery via the pin on the bottom.

Maybe some update from the Ubuntu repos?

Kernel 3.8.8-030808-generic
Ubuntu 13.04

Revision history for this message
In , reshad (reshad-linux-kernel-bugs) wrote :

@Jason Is the battery and power monitor still working for you.

Can you find out what changed around the time of the update(which packages were updated, etc) Ubuntu stores these logs in /var/log/apt and should either be in the files history.log or in one of the history.log.*.tar.gz files.

I'm on Arch using the stock Arch Kernel 3.9.2-1 and both the battery/charger and lid still are not working for me.

Revision history for this message
In , mail (mail-linux-kernel-bugs) wrote :

(In reply to comment #64)
> @Jason Is the battery and power monitor still working for you.

Sorry forgot to update here. A few days later it stopped working. I think it started working not due to updates but if I remember correctly around that time my laptop had ran out of battery over the weekend. Maybe it worked like the 'battery pin' temporary fix.

Anyway, didn't last :)

Revision history for this message
In , mmmaaarrrnnneeesss (mmmaaarrrnnneeesss-linux-kernel-bugs) wrote :

For me too, battery status, lid and etc do not work as expected. Resetting BIOS solves temporarily.

Samsung 530U
Ubuntu 12.10
kernel package 3.5.0-30-generic

Revision history for this message
In , psep.pentauc (psep.pentauc-linux-kernel-bugs) wrote :

(In reply to comment #65)
> (In reply to comment #64)
> > @Jason Is the battery and power monitor still working for you.
>
> Sorry forgot to update here. A few days later it stopped working. I think it
> started working not due to updates but if I remember correctly around that
> time
> my laptop had ran out of battery over the weekend. Maybe it worked like the
> 'battery pin' temporary fix.
>
> Anyway, didn't last :)

You forgive me for my terrible English.

I really did something similar with the ultrabook, this was without power for a day and the next I was running the battery widget until yesterday.

I use Debian GNU/Linux sid/experimental.
Kernel 3.8-2

Revision history for this message
In , teppot (teppot-linux-kernel-bugs) wrote :

Has anyone tried contacting Samsung over this? Do we have Samsung kernel developers who could maybe help?

Revision history for this message
In , psep.pentauc (psep.pentauc-linux-kernel-bugs) wrote :

(In reply to comment #68)
> Has anyone tried contacting Samsung over this? Do we have Samsung kernel
> developers who could maybe help?

From what I understand samsung developers together with the kernel would be "working" on the problem.

Revision history for this message
In , psep.pentauc (psep.pentauc-linux-kernel-bugs) wrote :

I have updated the kernel to 3.10 in Debian GNU/Linux sid/experimental and the problem still persists.

Revision history for this message
In , w.and.gvb (w.and.gvb-linux-kernel-bugs) wrote :

I have a Samsung NP530U3C with Gentoo Linux and Kernel 3.8.13 and the problem is still present.

Revision history for this message
In , mark (mark-linux-kernel-bugs) wrote :

What can we do to get some progress on this issue?

Is there any further information that can be gathered from the kernel or interfaces to improve diagnosis of the rot cause of the problem?

Or this just not going to get any interest from anyone who is in a position to address it?

Revision history for this message
In , jan.christian.hoffmann (jan.christian.hoffmann-linux-kernel-bugs) wrote :

This problem exists on a NP540U3C, as well.

When the system is in this "broken state" the same problems appear on Windows 8, too. Probably this is because I did a fresh install and only installed the Samsung drivers that were needed when the system was not in the "working state". So, apparently, this bug does happen using the standard Windows ACPI implementation, as well, but on the preinstalled system there must be some Samsung specific driver to "fix" this.

Another thing I noticed: On Windows, the power supply state gets updated when I click on the battery icon.

Now, back to Linux: The problem seems to be that there is no system control interrupt (SCI) when the power supply is plugged in or out. I checked this by looking at /sys/firmware/acpi/interrupts/sci before and after connecting / disconnecting it. That also explains the behaviour on Windows, as it seems to poll the value independently of a SCI when you click on the battery icon

What I don't know, is whether it is possible to cause the system to switch from "working" to "broken state" independently of the operating system. But I can't try that right now, as it would mean using Windows only for some time.

Revision history for this message
In , sebastian.riemer (sebastian.riemer-linux-kernel-bugs) wrote :

@Jan: No wonder, it is the same chipset, the "Intel HM76". Have you installed the HECI driver from Samsung to your Windows?
After installing it, it should work if the doc from Intel is correct that the embedded controller and therefore lid, power and battery are controlled via HECI/MEI instead of SMBus.

Can you check your Linux kernel >= 3.5 config for CONFIG_INTEL_MEI and CONFIG_INTEL_MEI_ME please?

Locations:
-> Device Drivers -> Misc devices -> Intel Management Engine Interface
-> Device Drivers -> Misc devices -> ME Enabled Intel Chipsets

You should have a /dev/mei device then. The other driver provides MEI support for the following Intel chipsets:

7 Series Chipset Family
6 Series Chipset Family
5 Series Chipset Family
4 Series Chipset Family
Mobile 4 Series Chipset Family
ICH9
82946GZ/GL
82G35 Express
82Q963/Q965
82P965/G965
Mobile PM965/GM965
Mobile GME965/GLE960
82Q35 Express
82G33/G31/P35/P31 Express
82Q33 Express
82X38/X48 Express

The HM76 belongs to the 7 Series Chipset Family.

There is also kernel documentation: Documentation/misc-devices/mei/
Sending an email to <email address hidden> could also help.

Intel AMT (Active Management Technology) is required in the user-space it seems:

http://software.intel.com/en-us/blogs/2009/08/21/working-with-intel-amt-and-linux

There are "Power Packages" and the LMS (Local Manageability Service). OpenAMT seems to be legacy.

This is the place for latest Open Source Intel AMT downloads:
http://software.intel.com/en-us/articles/download-the-latest-intel-amt-open-source-drivers
http://software.intel.com/en-us/articles/intel-active-management-technology-start-here-guide-intel-amt-9

Please check that out if you get it working with that!

Revision history for this message
In , david.shirvanyants (david.shirvanyants-linux-kernel-bugs) wrote :

The bug is still present in NP900X3E-A02US, BIOS version P07AKB.
Resetting with a pin fixes the problem for the short period, usually until charger is plugged a few times.
Comparing the dmidecode output in the "fixed" and "broken" states shows only one line difference:

"Fixed":
Wake-up Type: Power Switch

"Broken":
Wake-up Type: Other

My sister has an NP900X3C with windows 7 installed, and that machine has no problems with lid or charger events. I suspect windows samsung drivers somehow reset some internal switch after system wakeup. I could do some tests on windows machine, but I just don't know how to debug system drivers.

Revision history for this message
In , equilibriumtr (equilibriumtr-linux-kernel-bugs) wrote :

Adding

acpi_osi='!Windows 2012'

to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub seems to have fixed the problem for me (linked to the solution given in the comments on https://bugs.launchpad.net/ubuntu/+source/acpi/+bug/971061 ). The fix seems permanent; after about 10 reboots and 15 sleeps, the battery's charge/discharge states are still detected correctly. On top of that, the keyboard backlight started and is still working too :) It adjusts the brightness automatically according to the ambient light. Adjustment from keyboard (Fn+F9, Fn+F10) still has no effect though.

I am on the following configuration:

-Samsung NP900X4C
-Kubuntu 12.04.2
-Kernel 3.10.4-031004-generic
-BIOS P08ABK
-acpid-2.0.10

Revision history for this message
In , bzhmurov (bzhmurov-linux-kernel-bugs) wrote :

> acpi_osi='!Windows 2012' to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub
> seems

> to have fixed the problem

No it doesn't. On my NP900X3C it still does not work. linux-3.4.61.

Revision history for this message
In , psep.pentauc (psep.pentauc-linux-kernel-bugs) wrote :

(In reply to boris from comment #77)
> > acpi_osi='!Windows 2012' to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub
> seems
>
> > to have fixed the problem
>
> No it doesn't. On my NP900X3C it still does not work. linux-3.4.61.

A year has passed and still no solution. I preferred ultrabook change.

Revision history for this message
In , bzhmurov (bzhmurov-linux-kernel-bugs) wrote :

> A year has passed and still no solution. I preferred ultrabook change.

Yep. Right now we are picking notebook vendor for our employes. I'll recommend to buy Lenovo Carbon X1 instead. Samsung has very poor linux support.

Revision history for this message
In , leho (leho-linux-kernel-bugs) wrote :

(In reply to psep from comment #78)
> A year has passed and still no solution. I preferred ultrabook change.

This and lack of mini-DP was part of the reason why Samsung got dumped for Dell here.

Revision history for this message
In , w.and.gvb (w.and.gvb-linux-kernel-bugs) wrote :

(In reply to Sebastian Riemer from comment #74)
> @Jan: No wonder, it is the same chipset, the "Intel HM76". Have you
> installed the HECI driver from Samsung to your Windows?
> After installing it, it should work if the doc from Intel is correct that
> the embedded controller and therefore lid, power and battery are controlled
> via HECI/MEI instead of SMBus.
>
> Can you check your Linux kernel >= 3.5 config for CONFIG_INTEL_MEI and
> CONFIG_INTEL_MEI_ME please?
>
> Locations:
> -> Device Drivers -> Misc devices -> Intel Management Engine Interface
> -> Device Drivers -> Misc devices -> ME Enabled Intel Chipsets
>
> You should have a /dev/mei device then. The other driver provides MEI
> support for the following Intel chipsets:
>
> 7 Series Chipset Family
> 6 Series Chipset Family
> 5 Series Chipset Family
> 4 Series Chipset Family
> Mobile 4 Series Chipset Family
> ICH9
> 82946GZ/GL
> 82G35 Express
> 82Q963/Q965
> 82P965/G965
> Mobile PM965/GM965
> Mobile GME965/GLE960
> 82Q35 Express
> 82G33/G31/P35/P31 Express
> 82Q33 Express
> 82X38/X48 Express
>
> The HM76 belongs to the 7 Series Chipset Family.
>
> There is also kernel documentation: Documentation/misc-devices/mei/
> Sending an email to <email address hidden> could also help.
>
> Intel AMT (Active Management Technology) is required in the user-space it
> seems:
>
> http://software.intel.com/en-us/blogs/2009/08/21/working-with-intel-amt-and-
> linux
>
> There are "Power Packages" and the LMS (Local Manageability Service).
> OpenAMT seems to be legacy.
>
> This is the place for latest Open Source Intel AMT downloads:
> http://software.intel.com/en-us/articles/download-the-latest-intel-amt-open-
> source-drivers
> http://software.intel.com/en-us/articles/intel-active-management-technology-
> start-here-guide-intel-amt-9
>
> Please check that out if you get it working with that!

I've followed your tips and compiled INTEL_MEI into the kernel. The lid event has been working properly ever since. For me the issue seems solved, but I can only speak for myself.

Revision history for this message
In , nichtstuer (nichtstuer-linux-kernel-bugs) wrote :

https://lkml.org/lkml/2013/6/29/98
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1196155
Since then I have MEI and MEI_ME compiled as modules in kernels 3.10-3.11 and the problem with Samsung still exists.

Revision history for this message
In , equilibriumtr (equilibriumtr-linux-kernel-bugs) wrote :

Ignore my previous comment, after about 15 sleep/resumes, all of the errors came back.

Revision history for this message
In , zygimantas.b (zygimantas.b-linux-kernel-bugs) wrote :

I started having this on my SAMSUNG NP530U3C-A05EE machine this September (or at least I did not notice it before, although I had been using it with Quantal since June). Decided to upgrade to Saucy, however this did not fix the issue for me. No ACPI event is triggered after connecting/disconnecting the AC cable and/or closing the laptop lid.

Booted into Windows to see if there is a BIOS update avaible, as suggested by someone here, alas nothing has been provided by SAMSUNG so far. (And I do not believe it will be.)

Revision history for this message
In , webbino (webbino-linux-kernel-bugs) wrote :

I have the same problem !

My PC is a samsung serie 5 (np530u3c), I have upgrade the BIOS to latest version (P09ABH).

Revision history for this message
In , zygimantas.b (zygimantas.b-linux-kernel-bugs) wrote :

(In reply to Andrea Fagiani from comment #62)
> I have a samsung series 9 np900x3c affected by the same issue.
> I've noticed a pattern in this bug's behavior and I would like to know if
> someone else can confirm the following:
>
> The bug triggers ONLY if a suspend/resume (hibernate/thaw) event happens
> while the laptop is connected to the charger. I've tested this for about 3
> weeks now; I always unplug the charger before suspending (heck, even before
> shutting down the system, although that's unlikely to cause any harm) and
> always plug it in (if needed) only after resume.
>
> Can anyone confirm that I haven't just hit a lucky streak?

I have been following your advice for the past week and it indeed seems to be the case. If I unplug it before suspending AND resume with it being unplugged (in fact, you can actually plug it in *after* suspending, just be sure to unplug it before resuming), the bug DOES NOT get triggered.

However, I should I forget to unplug it before suspending or resume while it is plugged in, the bug occurs and the only way to get rid of it is to press the reset button underneath the laptop.

In my case it is a SAMSUNG Series 5 Ultra NP530U3C-A05EE machine.

Revision history for this message
In , abarto (abarto-linux-kernel-bugs) wrote :

The same problem affects my Samsung NP550P5C.

Revision history for this message
In , clancy.kieran+kernel (clancy.kieran+kernel-linux-kernel-bugs) wrote :

Same deal on my NP9003XC-K01AU.

My main work-around is to just suspend and resume whenever I plug/unplug so that the state gets updated.

Revision history for this message
In , zygimantas.b (zygimantas.b-linux-kernel-bugs) wrote :

Kieran, what do you mean? Can you explain the sequence of your actions in more detail?

Revision history for this message
In , clancy.kieran+kernel (clancy.kieran+kernel-linux-kernel-bugs) wrote :

Žygimantas, I have found that the charging state is detected correctly if I suspend and resume.

So, if the state is incorrect (i.e. showing "charging" when unplugged, or vice versa), I press the power button gently to suspend the laptop. Then, one second later, I press the power button again and the charging state will then be correct.

Revision history for this message
In , andfagiani (andfagiani-linux-kernel-bugs) wrote :

You can also run something like

# /usr/sbin/rtcwake -m standby -s 2

standby is somewhat faster than suspend to ram and using rtcwake wakes up your device automatically in 2 seconds.

Anyway, I can now confirm, several months later, than the issue does not trigger if you follow my advice from comment 62.

Revision history for this message
In , zygimantas.b (zygimantas.b-linux-kernel-bugs) wrote :

Kieran, thanks! I will try that out once this bug is triggered. For the time being, I am following Andrea's method of avoiding this.

Andrea, I have been following you advice for nearly a month now and I can only second that. That is clearly a pattern. The bug gets triggered only when I suspend/resume with the charger being connected.

Revision history for this message
In , 3daniele03 (3daniele03-linux-kernel-bugs) wrote :

(In reply to Žygimantas Beručka from comment #92)
> Kieran, thanks! I will try that out once this bug is triggered. For the time
> being, I am following Andrea's method of avoiding this.
>
> Andrea, I have been following you advice for nearly a month now and I can
> only second that. That is clearly a pattern. The bug gets triggered only
> when I suspend/resume with the charger being connected.

I've been tried this trick for a couple of week, but today the bug has come back.

Revision history for this message
In , 3daniele03 (3daniele03-linux-kernel-bugs) wrote :

A stupid question (I admit my ignorance). Why is the status of this bug "NEEDINFO"?
I would be happy to provide more info if needed...

I have a 900X4C with gentoo, kernel 3.10.7-r1 and, as far as I know, the only ways to fully detect the change of power state (including relevant change of /sys/devices/system/cpu/cpu*/cpufreq/bios_limit) are to suspend/resume hibernate/resume or reboot

Revision history for this message
In , sebastian.riemer (sebastian.riemer-linux-kernel-bugs) wrote :

Gabriel followed my instructions to setup HECI/MEI and he says that it works, now. What is required, is that more people test that. If enough people can confirm this, then we could talk to distributions to automatically integrate this.

I don't know why Intel doesn't care for this - they should know!

I'm just a kernel developer for storage with a hobby interest in ACPI and reverse engineering. My colleague with this issue has already sold his Samsung laptop again.
E.g. on LinuxTag in Berlin most speakers come with an IBM ThinkPad or a Mac. The ThinkPad guys have connections to ThinkPad ACPI developers from Japan. This is why they have so good Linux support. But in the thinkpad driver you can also see indication for reverse engineering in comments like "unknown magic constant".

Revision history for this message
In , alessandro.crismani (alessandro.crismani-linux-kernel-bugs) wrote :

Wait,

is having MEI and MEI_ME compiled as kernel modules enough, or do we need them compiled in the kernel (y in the config)?

Second question, do they still break suspend, as other people pointed out earlier?

I have MEI and MEI_ME compiled as modules, and lid detection is broken. I can try to compile them in the kernel, but I need a working suspend!

Revision history for this message
In , nichtstuer (nichtstuer-linux-kernel-bugs) wrote :

(In reply to Sebastian Riemer from comment #95)

I have MEI and MEI_ME compiled as modules, but the problem still exists. Could you write a more detailed guide on how to compile the kernel with them? I'd really appreciate that.

Revision history for this message
In , sebastian.riemer (sebastian.riemer-linux-kernel-bugs) wrote :

Please compile the MEI and MEI_ME drivers into the kernel as Gabriel did (CONFIG_...=y)! This ensures that they are really available during early boot and loaded if the hardware is detected. Required modules are not always integrated into the initrd automatically.

Make sure that Intel AMT services are running in the user-space. Without them the MEI drivers are quite useless.

As I don't have access to such hardware and can only give recommendations, please contact Gabriel for details how he managed it. His email address is available here.

Revision history for this message
In , sebastian.riemer (sebastian.riemer-linux-kernel-bugs) wrote :

(In reply to alex from comment #97)
> I have MEI and MEI_ME compiled as modules, but the problem still exists.
> Could you write a more detailed guide on how to compile the kernel with
> them? I'd really appreciate that.

The issues you had in kernels < 3.12 should be fixed in 3.12.
Download the git source with

git clone https://kernel.googlesource.com/pub/scm/linux/kernel/git/stable/linux-stable

The Google mirror is much faster than kernel.org. Then you can look at the MEI commits with:

git log drivers/misc/mei/

Looks like they've introduced a lot of bugs in 3.9 as they've sent many commits to linux-stable 3.9+. Ubuntu is not really known for picking the most stable kernels. So compiling a kernel yourself can help in some cases.

This should be the fix for suspend issues:
mei: me: clear interrupts on the resume path

And these 3.12 commits look interesting:
mei: make me client counters less error prone
mei: bus: stop wait for read during cl state transition
mei: cancel stall timers in mei_reset

Revision history for this message
In , andfagiani (andfagiani-linux-kernel-bugs) wrote :

@Sebastian Riemer

I can confirm this is fixed on 3.12; I've been running it on ArchLinux since it entered testing a few days ago and I haven't been unable to reproduce the issue!

Revision history for this message
In , marc-kernel (marc-kernel-linux-kernel-bugs) wrote :

Tested on my Samsung np530u3b with kernel 3.12 : problem still here.

Revision history for this message
In , nichtstuer (nichtstuer-linux-kernel-bugs) wrote :

(In reply to Marc MAURICE from comment #101)
> Tested on my Samsung np530u3b with kernel 3.12 : problem still here.

Did you compile MEI and MEI_ME drivers and LMS (see Sebastian tips above)?

Revision history for this message
In , marc-kernel (marc-kernel-linux-kernel-bugs) wrote :

arf no. I just tested with the last ubuntu kernel 3.12.0
it seems to have MEI and MEI_ME, but no LMS.

CONFIG_INTEL_MEI=m
CONFIG_INTEL_MEI_ME=m

I should try compiling LMS when I have some time.

Revision history for this message
In , reshad (reshad-linux-kernel-bugs) wrote :

(In reply to Andrea Fagiani from comment #100)
> I can confirm this is fixed on 3.12; I've been running it on ArchLinux since
> it entered testing a few days ago and I haven't been unable to reproduce the
> issue!

Did you compile the recompile the kenrnel.

I just upgraded to the stock arch kernel to 3.12 and the bug remains.

Let me know if you did anything differet to make it work.

Revision history for this message
In , nichtstuer (nichtstuer-linux-kernel-bugs) wrote :

(In reply to Marc MAURICE from comment #103)
Firstly, you need to compile kernel yourself with
CONFIG_INTEL_MEI=y
CONFIG_INTEL_MEI_ME=y
Ubuntu kernels from http://kernel.ubuntu.com/~kernel-ppa/mainline/ won't work.
Secondly, I faced some problems compiling LMS on Ubuntu. Here are instructions from readme-file and some ubuntu-specific tips:
> To build and install the LMS, call "./configure" with no arguments. Next,
> call "make install".
If you got an error, add
#include <unistd.h>
header in tools/ATVersion.cpp before building.
> To start the LMS use "service lms start".
> To ensure that the LMS will load upon startup, use "chkconfig --level 35 lms
> on" command to activate the service in specific runlevels.
Yo'll probably need to use "sysv-rc-conf" instead of "chkconfig" and add "2" to levels, since Debian-Ubuntu use second as default the default multi-user mode.

Revision history for this message
In , andfagiani (andfagiani-linux-kernel-bugs) wrote :

(In reply to Reshad Patuck from comment #104)
> Let me know if you did anything differet to make it work.

Actually no, I haven't done anything besides upgrading to the official Arch 3.12 package, which by the way has the INTEL_MEI stuff compiled as modules, i.e.
CONFIG_INTEL_MEI=m
CONFIG_INTEL_MEI_ME=m

Also, I am *not* running LMS, the kernel update alone seems to be enough to solve the issue on my NP900X3C; I'll report back if the problem comes up again in the following days.

Revision history for this message
In , mauritiusdadd (mauritiusdadd-linux-kernel-bugs) wrote :

Hi,

I'm running Archlinux on a Samsung NP530U3C-A03IT, and I can also confirm that upgrading to kernel version 3.12 seems to solve this bug, in fact after few days of suspend/resume and hibernation/resume cycles I can't reproduce this bug anymore.

Like Andrea Fagiani (comment #106), I have the kernel provided form official Archlinux repositories that has Intel MEI built as module (CONFIG_INTEL_MEI=m and CONFIG_INTEL_MEI_ME=m) and no LMS.

Best regards.

Revision history for this message
In , nichtstuer (nichtstuer-linux-kernel-bugs) wrote :

Well, I can confirm that neither ubuntu nor vanilla kernel 3.12 solves the problem. I followed Sebastian's instructions (thanks a lot!) and only after compilation of BOTH kernel and LMS the problem seems to be gone (although I'm still not sure).
I have NP530U3B running under Ubuntu 13.04.

Revision history for this message
In , sebastian.riemer (sebastian.riemer-linux-kernel-bugs) wrote :

I've helped Daniele debugging the issue from LMS to kernel level. And on his laptop (Samsung NP900X4C-A03US) the LMS doesn't work.

syslog entry:
LMS: Cannot connect to Intel AMT via MEI driver

We've debugged it further down and the MEI ioctl "IOCTL_MEI_CONNECT_CLIENT" fails.

$ strace -e ioctl -p `pidof -s lms`
Process 6109 attached
ioctl(5, HIDIOCGRDESCSIZE or HIDIOCGVERSION or SNDRV_HWDEP_IOCTL_INFO,
0x7fff86321af0) = -1 ENODEV (No such device)

The kernel returns -ENODEV in drivers/misc/mei/main.c, function mei_ioctl().

We've brought this issue to Tomas Winkler as he is the driver maintainer:

$ scripts/get_maintainer.pl -f drivers/misc/mei/main.c
Tomas Winkler <email address hidden> (supporter:INTEL MANAGEMENT...)

He says that the system is not a (by hardware) vPRO (AMT) enabled system. This is why the driver refuses the LMS to connect. There are other possible MEI usages with less functionality. These are displayed under:
<debugfs>/mei/meclients.

So it looks like Samsung has a special MEI client software running on the systems where AMT or LMS can't be used.

Tomas has sent this issue to the Intel support. But no response, yet. IMHO he should know best what's going on with this hardware. :-/

I would look at hardware access in the driver (setting activation bits. etc.). Perhaps it can be solved by custom kernel hacking.

Revision history for this message
In , andfagiani (andfagiani-linux-kernel-bugs) wrote :

After a few days, the bug has appeared again on my system as well.
I tried using LMS but having a ULV ivy bridge processor (which is not vPro enabled) I encounter the same issue Daniele has.

Revision history for this message
In , lops776 (lops776-linux-kernel-bugs) wrote :

Same Problem as in first post from "Lori Holden" 2012-07-03 and here with

* Samsung Series 9 np900x3f
* Ubuntu 13.10
* Kernel 3.11.0-13-generic #20-Ubuntu SMP Wed Oct 23 07:38:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

Its remarkable that this problem has been existing for so long...

Revision history for this message
In , leho (leho-linux-kernel-bugs) wrote :

(In reply to lops from comment #111)
>
> Its remarkable that this problem has been existing for so long...

I hate to see all these good people's energy being wasted on Samsung's booboo. Went through similar back in the day with an Acer Travelmate 8172 and Broadcom wifi adapter. Wasted so much effort trying to debug the problem it's insane. At least the resulting life rule "never touch Broadcom again" has served me well from then on.

The solution is to support the right companies and right models, and while trying to not spam this bug, I'd like to influence the gatherers here to do the same. Dell and Lenovo do seem to be on the forefront with Linux so that's what I've chosen for my companies.

Maybe I should note I also purchased 900X3A for myself earlier. You have to hand it to Samsung's design team, the Series 9 is a hit look-and-feel wise. Luckily, 900X3A also worked flawlessly on Linux. But this bug single-handedly reminded me of the lurking Windows-company dangers. So when the battery started fading and I needed a higher-res screen + mini-DisplayPort, I dropped all newer S9-s out of the race. Dell XPS14, while weightier and mid-res screen, so far is doing quite well on all heavy duty Linux stuff I've thrown at it.

That being said, I hope someone gets this thing solved. Hate to see good hardware going to waste.

Revision history for this message
In , abarto (abarto-linux-kernel-bugs) wrote :

I agree 100% that people should avoid Samsung hardware if they're planning on using it with Linux. The response upon support requests for Linux issues has always been "use Windows" and we shouldn't encourage that kind of toxic behavior from a manufacturer.

I have an NP550P5C which is also plagued with ACPI issues, including this one, and doing a little bit of research I found a workaround that works for me:

1) Remove the battery and leave it out for a minute or two (some people mentioned that I could use a reset button that's below the battery but my model doesn't have one)
2) Remove the AC plug before suspending or resuming.

If I do 1) once and keep doing 2), the AC detection seems to work. For some reason, suspending and resuming with the AC plugged seems to cause the issue, and the only way to revert it is doing 1). Booting on Windows does nothing if the issue has already occurred.

I hope this helps.

Revision history for this message
In , Robert.Moore (robert.moore-linux-kernel-bugs) wrote :

Please post the acpidump for the original Samsung machine.

Revision history for this message
In , 3daniele03 (3daniele03-linux-kernel-bugs) wrote :

Created attachment 116461
acpidump output, as requested by Robert in Comment #114

Revision history for this message
In , lops776 (lops776-linux-kernel-bugs) wrote :

Created attachment 116471
acpidump for SAMSUNG NP900x3f-k01de

Revision history for this message
In , lops776 (lops776-linux-kernel-bugs) wrote :

Ok, I did it like in post #1:

Samsung Series 9 np900x3f with Ubuntu 13.10

$ uname -a
Linux mars 3.11.0-13-generic #20-Ubuntu SMP Wed Oct 23 07:38:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

(Results of acpi -i -b -a)

1.*** AC IS PLUGGED IN ***
Battery 0: Full, 100%
Battery 0: design capacity 5880 mAh, last full capacity 6100 mAh = 100%
Adapter 0: on-line

2.*** AC REMOVED ***
Battery 0: Charging, 100%, until charged
Battery 0: design capacity 5880 mAh, last full capacity 6100 mAh = 100%
Adapter 0: off-line

3.*** System Suspended and woken back up ***
  *** AC STILL REMOVED ***
Battery 0: Discharging, 100%, 05:56:22 remaining
Battery 0: design capacity 5880 mAh, last full capacity 6100 mAh = 100%
Adapter 0: off-line

4. *** AC PLUGGED IN ***
Battery 0: Unknown, 98%
Battery 0: design capacity 5880 mAh, last full capacity 6100 mAh = 100%
Adapter 0: on-line

5. *** SUSPEND AND WAKE ***
   *** AC STILL PLUGGED IN ***
Battery 0: Unknown, 98%
Battery 0: design capacity 5880 mAh, last full capacity 6100 mAh = 100%
Adapter 0: on-line

6. *** AC REMOVED ***
Battery 0: Charging, 98%, 00:06:57 until charged
Battery 0: design capacity 5880 mAh, last full capacity 6100 mAh = 100%
Adapter 0: off-line

7. *** AC PLUGGED IN ***
Battery 0: Unknown, 98%
Battery 0: design capacity 5880 mAh, last full capacity 6100 mAh = 100%
Adapter 0: on-line

8. *** AC STILL PLUGGED IN ***
   This time acpi -V

Battery 0: Unknown, 98%
Battery 0: design capacity 5880 mAh, last full capacity 6100 mAh = 100%
Adapter 0: on-line
Thermal 0: ok, 29.8 degrees C
Thermal 0: trip point 0 switches to mode critical at temperature 106.0 degrees C
Thermal 0: trip point 1 switches to mode passive at temperature 95.0 degrees C
Thermal 1: ok, 50.0 degrees C
Thermal 1: trip point 0 switches to mode critical at temperature 106.0 degrees C
Thermal 1: trip point 1 switches to mode active at temperature 87.0 degrees C
Thermal 1: trip point 2 switches to mode active at temperature 55.0 degrees C
Cooling 0: LCD 70 of 100
Cooling 1: pkg-temp-0 no state information available
Cooling 2: intel_powerclamp no state information available
Cooling 3: Processor 0 of 10
Cooling 4: Processor 0 of 10
Cooling 5: Processor 0 of 10
Cooling 6: Processor 0 of 10
Cooling 7: Fan 0 of 1
Cooling 8: Fan 0 of 1
Cooling 9: Fan 0 of 1
Cooling 10: Fan 0 of 1
Cooling 11: Fan 0 of 1

9. *** AC STILL PLUGGED ***

 sudo acpidump >acpidump.dat
 Wrong checksum for FADT!

The dump file i have attached

I hope that is the information you need. If not, tell me plz...

greetings

Revision history for this message
In , nichtstuer (nichtstuer-linux-kernel-bugs) wrote :

Created attachment 116481
acpidump output

Revision history for this message
In , andfagiani (andfagiani-linux-kernel-bugs) wrote :

Created attachment 116521
acpidump NP900X3C

Revision history for this message
In , mauritiusdadd (mauritiusdadd-linux-kernel-bugs) wrote :

Created attachment 116571
acpidump for Samsung NP530U3C-A03IT

Revision history for this message
In , mauritiusdadd (mauritiusdadd-linux-kernel-bugs) wrote :

Please also ignore my previous comment #comment 107 because unfortunately today the bug appeared again and on my machine the intel LMS does not work.
best regards.

Revision history for this message
In , david.shirvanyants (david.shirvanyants-linux-kernel-bugs) wrote :

Recent Samsung laptop Book 9 Plus (NP940X3G-K01US) or recent Linux versions seem to have this bug fixed.
I am running Fedora 20 Beta and do not see any problem with lid detection, etc. I have tried to plug and unplug power in sleeping and active states, with lid open and closed and see no problem.

A side note - reset hole on the bottom seems to be gone, so if acpi/power problems arise again the old fix won't work.

Revision history for this message
In , lops776 (lops776-linux-kernel-bugs) wrote :

(In reply to david.shirvanyants from comment #122)

On my Samsung Series 9 np900x3f I have started the Fedora live Image:

 "Fedora-Live-Desktop-x86_64-20-1.iso" from a USB stick.

The Kernel version was 3.11.10-301.fc20.x86_64. None of the issues seem to be solved.

Revision history for this message
In , san (san-linux-kernel-bugs) wrote :

Created attachment 125541
acpidump for NP535U3C-A01PL

Revision history for this message
In , clancy.kieran+kernel (clancy.kieran+kernel-linux-kernel-bugs) wrote :

Possible fix found by juanmanuel at:

https://bugs.launchpad.net/ubuntu/+source/acpi/+bug/971061

See comment numbers 100 through 103 there.

I can confirm that this fix is working since I tried it 20 minutes ago.

It seems that once a particular buffer fills up, no more events are generated. To fix this, it seems to be necessary to manually poll and clear this buffer so that new events are sent.

I wonder if this can be handled better in the kernel, so that an external program is not needed? Also, rather than blindly ignoring built-up events, should the kernel be processing them in some way?

Revision history for this message
In , lops776 (lops776-linux-kernel-bugs) wrote :

(In reply to Kieran Clancy from comment #125)
> Possible fix found by juanmanuel at:

I can confirm that the fix is working on my Samsung Series 9 np900x3f Ubuntu 13.10...!

LID/BATTERY/AC detection is working now

nice work by juanmanuel

Revision history for this message
In , abarto (abarto-linux-kernel-bugs) wrote :

(In reply to Kieran Clancy from comment #125)
> Possible fix found by juanmanuel at:
>
> https://bugs.launchpad.net/ubuntu/+source/acpi/+bug/971061
>
> See comment numbers 100 through 103 there.
>
> I can confirm that this fix is working since I tried it 20 minutes ago.
>
> It seems that once a particular buffer fills up, no more events are
> generated. To fix this, it seems to be necessary to manually poll and clear
> this buffer so that new events are sent.
>
> I wonder if this can be handled better in the kernel, so that an external
> program is not needed? Also, rather than blindly ignoring built-up events,
> should the kernel be processing them in some way?

It seems to fix the problem on my NP550P5C too.

Revision history for this message
In , berend.de.schouwer (berend.de.schouwer-linux-kernel-bugs) wrote :

juanmanuel's workaround WorksForMe(tm)

Samsung NP900X3C
Fedora 20
Kernel 3.12.10-300.fc20.x86_64

Lid open/closed
Power cable plugged/unplugged

A note on deleting the events:

I unplugged/plugged the cable and opened/closed the lid while suspended. Even though those events are now deleted instead of processed, the status icons in gnome-shell and files in /proc/acpi/ were correct pretty much instantaneous.

systemd uses:

/usr/lib/systemd/system-sleep/samsung.sh

Revision history for this message
In , juanmanuel.cabo (juanmanuel.cabo-linux-kernel-bugs) wrote :

Hello! I'm Juan Manuel Cabo
I just registered here to reply to the last post:

> A note on deleting the events:
> I unplugged/plugged the cable and opened/closed the lid while suspended.
> Even > though those events are now deleted instead of processed, the status
> icons in > gnome-shell and files in /proc/acpi/ were correct pretty much
> instantaneous.

The reason that they AC plugged in/unplugged status is reported correctly on resume, is that the OS calls the _WAK method of the DSDT on resume, which checks to see if there were any changes in LID/AC/Battery status and generates the corresponding events through Notify(). This is independant of the EC.
    Also the EC events were discarded anyways on resume, since the GPE's are disabled during that critical stage of resuming.

Now let me try to summarize the bug here, which I already explained in
     https://bugs.launchpad.net/ubuntu/+source/acpi/+bug/971061
(posts #100 and #102).

The embedded controller accumulates events during sleep. If the events are too many (16 according to my "binary search" by hand), as in too many times plugging/unplugging the PSU, or battery change percentage decreasing or increasing 16 times, or LID open/close, a combination of them, during sleep, then the EC decides to not interrupt the OS with any more events (GPE 0x17) until they are queried.

This situation remains even after rebooting or shutting down, and there are only three ways to unstuck the EC:

     1) Force Query the events (which my little C program does, attached in the launchpad issue). The OS will never query the events by itself because the EC's GPE 0x17 is never produced again.

     2) Or remove the battery (these laptops don't have removable battery), which leaves us with hitting the reset button through the small hole in the back of the ultrabook.

     3) Or flash any bios, which seems to reset the EC.

What my little program does is the following:

    do {
        outb(ACPI_EC_COMMAND_QUERY, EC_COMMAND_PORT);
        status = inb(EC_COMMAND_PORT);
        data = inb(EC_DATA_PORT);
    } while (data != 0 && ++count < 10000);

The eventy type comes back in 'data'. When data==0, no more events are left in the EC. And the EC starts to produce them again, comming back to normal immediately.

("normal" meaning that the laptop starts to produce LID close events and AC and battery change events again).

So far, from the feedback in this and the other thread, the models which have the same problem, that can be fixed by this fix are:

     Samsung Series 5 (models NP530U3C, NP535U3C, NP530U3B, NP550P5C)

     Samsung Series 9 (models NP900X3F, NP900X4B, NP900X4C, NP900X4D, NP900X3C)

I personally have a series 5 NP530U3C. My DSDT is posted in the launchpad issue, in post #101.

My workaround fix is attached in post #103. It is a simple C program which reads the events from the EC until there are no more, so that the EC can start producing GPE 0x17 again.

https://bugs.launchpad.net/ubuntu/+source/acpi/+bug/971061

--
Juan Manuel Cabo

Revision history for this message
In , dennis.jansen (dennis.jansen-linux-kernel-bugs) wrote :

I can confirm this. Now we only have to turn this into a generic kernel patch. Does it always make sense to read the EC until it's empty? Could someone confirm this? Maybe this will fix other problems we were not even aware of?
At least it seems the listed Samsung system should get a fix.

Revision history for this message
In , juanmanuel.cabo (juanmanuel.cabo-linux-kernel-bugs) wrote :

(In reply to Dennis Jansen from comment #130)
> I can confirm this. Now we only have to turn this into a generic kernel
> patch. Does it always make sense to read the EC until it's empty? Could
> someone confirm this? Maybe this will fix other problems we were not even
> aware of?
> At least it seems the listed Samsung system should get a fix.

A proper fix would be to have some way to tell the EC to not accumulate events during S3 sleep, failing that, running the loop right after suspend to empty those useless events produced during sleep I think is the best option. A good place to do that would be in <linuxkernel>/drivers/acpi/ec.c function acpi_ec_unblock_transactions_early() or in some other function called from sleep.c at resume.

The events accumulated during S3 sleep are useless, so it is does make sense to clear them all. Again, the best fix would be to have a way to tell the EC not to try to report events during sleep.

After all, the DSDT method _WAK() does Notify() whether the batterycharge/AC/LID changed during sleep, and does this independently of the EC.

--
Juan Manuel Cabo

Revision history for this message
In , aaron.lu (aaron.lu-linux-kernel-bugs) wrote :

Very nice analysis and it's pretty cool to know what happened underneath, thanks a lot.

(In reply to Juan Manuel Cabo from comment #131)
> (In reply to Dennis Jansen from comment #130)
> > I can confirm this. Now we only have to turn this into a generic kernel
> > patch. Does it always make sense to read the EC until it's empty? Could
> > someone confirm this? Maybe this will fix other problems we were not even
> > aware of?
> > At least it seems the listed Samsung system should get a fix.
>
> A proper fix would be to have some way to tell the EC to not accumulate
> events during S3 sleep, failing that, running the loop right after suspend
> to empty those useless events produced during sleep I think is the best
> option. A good place to do that would be in <linuxkernel>/drivers/acpi/ec.c
> function acpi_ec_unblock_transactions_early() or in some other function
> called from sleep.c at resume.

Or add a resume callback for the EC driver and clear all the events gathered during sleep phase there.

>
> The events accumulated during S3 sleep are useless, so it is does make sense
> to clear them all. Again, the best fix would be to have a way to tell the EC
> not to try to report events during sleep.

I've no idea how to do that, sounds like we need some way of manipulating the EC?

>
> After all, the DSDT method _WAK() does Notify() whether the
> batterycharge/AC/LID changed during sleep, and does this independently of
> the EC.

Also, the LID/Battery/AC driver will also update the device's status on their resume callback so there doesn't seem to be any meaning to store these events to me.

Revision history for this message
In , tianyu.lan (tianyu.lan-linux-kernel-bugs) wrote :

Created attachment 126801
ec.patch

I write a debug patch to query EC event during system resume. Please have a try.

Revision history for this message
In , clancy.kieran+kernel (clancy.kieran+kernel-linux-kernel-bugs) wrote :

For what it's worth, Windows seems to use the "stop EC from generating events" approach rather than "clear events on wake". So it must be possible.

However, it seems prudent to do BOTH, as there are situations where just doing one still leads to problems.

That is, if we only clear events on wake and don't stop them on sleep, then the event buffer will still fill and potentially interfere with other operating systems (e.g. if something wrong happens during suspend/resume or the battery goes flat and next boot goes into Windows with a full event buffer, Windows doesn't clear the events and has most of the same problems described here).

Alternatively, if we only stop the EC from generating events during sleep, we won't clear the buffer if it has been filled by another operating system or by other yet unforeseen triggers (for example, if the kernel is locked up or indisposed the buffer could fill in the EC without ever going into sleep).

--

For the same reason, any function to clear the events on wake should also run on normal system boot.

--

I think the ideal solution would be to stop the EC from generating events during sleep in addition to clearing the events on wake and on boot (in case the buffer is full from a previous operating system or kernel).

Revision history for this message
In , juanmanuel.cabo (juanmanuel.cabo-linux-kernel-bugs) wrote :

(In reply to Lan Tianyu from comment #133)
> Created attachment 126801 [details]
> ec.patch
>
> I write a debug patch to query EC event during system resume. Please have a
> try.

After looking at the patch, a long recompile (and nice coffee), and testing thrice, I can confirm that this patch does the job perfectly. Thanks Lan Tianyu!!!!!

One small thing: maybe it'd be reasonable to include one small comment on the source, so that it doesn't get deleted in the future by someone not understanding the issue, such as:

--
    /* This loop consumes events accumulated during sleep by the EC. Some EC's stop reporting GPE events until they are polled, if too many accumulated. (samsung series 5 and 9). Fixes LID close / AC status ignored problem. */
--
(and a 'thank you' note in the patch would not be unwelcome; uncovering this bug took me quite some nights ;-) )

My laptop is the Samsung Series 5 NP530U3C-A01, I'm on Kubuntu 13.10, Here is exactly what I did:

    1) Disabled my workaround first.

    2) I applied the patch it to the source of ubuntu's linux kernel 3.11.0-14-generic_3.11.0-14.21, which is what I used in my previous tests. Recompiled and installed. Reboot into new kernel.

    3) Started 'sudo acpi_listen' in a console and kept running.

    4) Suspended the laptop. Unplug/plug/unplug... (more than 8 times).

    5) Resumed. Observed events in acpi_listen, everything normal, all good. Tried unplugging, it was recognized. Run my little program to see if any event was left, printing each output. None really left ('status' didn't have 0x20 bit).

    6) Tried again, but without running my program afterwards to see if any left. All good.

    7) Restarted, tried again (plugging/unplugging like crazy). All good.

This is what I did to test potential solutions before comming up with the workaround. Nothing resisted the tests. This patch did, and it works like a charm.

More details of my computer:

    Samsung Series 5 NP530U3C-A01

    # dmesg | grep -i dmi
    DMI: SAMSUNG ELECTRONICS CO., LTD. 530U3C/530U4C/SAMSUNG_NP1234567890, BIOS P14AAJ 04/15/2013

    # uname -a
    Linux varinia 3.11.0-14-generic #21 SMP Thu Feb 20 00:49:17 ART 2014 x86_64 x86_64 x86_64 GNU/Linux

    # cat /etc/issue
    Ubuntu 13.10 \n \l

Thank you again Lan Tianyu!!
--
Juan Manuel Cabo

Revision history for this message
In , dennis.jansen (dennis.jansen-linux-kernel-bugs) wrote :

Created attachment 127081
samsung_fix_ec_events patched to show count

I've patched the program to show the count. I often get over 800 EC events even if nothing happens (no AC replugging, no charging). Maybe this is specific to my system. But it seems the issues would appear more than expected. Maybe other people could also use the attached source to find out how many events are in the buffer during normal run time.

Revision history for this message
In , dennis.jansen (dennis.jansen-linux-kernel-bugs) wrote :

Also, for the kernel patch I propose these notes:

BUG: https://bugzilla.kernel.org/show_bug.cgi?id=57271
BUG: https://bugzilla.kernel.org/show_bug.cgi?id=44161

Signed-Off-By: Lan Tianyu <<email address hidden><
Signed-Off-By: Juan Manuel Cabo <email address hidden>
Tested-By: Juan Manuel Cabo <email address hidden>
Tested-By: Dennis Jansen <email address hidden>

Revision history for this message
In , juanmanuel.cabo (juanmanuel.cabo-linux-kernel-bugs) wrote :

(In reply to Dennis Jansen from comment #136)
> Created attachment 127081 [details]
> samsung_fix_ec_events patched to show count
>
> I've patched the program to show the count. I often get over 800 EC events
> [..]

They are not 800 events, its just that my little program doesn't sleep between queries, and gets its queries replied with the same event for a little while. I realized this after I first posted it.

Cheers!
--
Juan Manuel Cabo

Revision history for this message
In , dennis.jansen (dennis.jansen-linux-kernel-bugs) wrote :

Ah, OK. so even if I get a count of e.g. 3000, this does not mean there is more than one event? I've added a check if the data is the same as before. Let's see what count I get now. Of course if the EC is filled with the same data over and over again, this would remain unnoticed.

Thanks so much for your time spent on this, Juan!! Finally my Notebook is starting to really behave well in Linux.

Revision history for this message
In , juanmanuel.cabo (juanmanuel.cabo-linux-kernel-bugs) wrote :

No, adding a check to see if the data is the same as before can be tricky. The best is to add a usleep() or sleep() call, to give time to the EC.

Because sometimes the same event _is_ actually queued many times.

Try a call to sleep(1) inside the loop, and you'll see better.

Anyways, the kernel patch in comment #133 doesn't avoids all this because it uses the actual ec.c functions to query the EC properly (as opposed to my little program which uses a more direct (ahem quick and dirty :-) ) approach).
--
Juan Manuel Cabo

Revision history for this message
In , juanmanuel.cabo (juanmanuel.cabo-linux-kernel-bugs) wrote :

s/doesn't avoids/avoids/

Revision history for this message
juanmanuel (rockerito99) wrote :
description: updated
Revision history for this message
juanmanuel (rockerito99) wrote :

Attached DSDT of a Samsung Series 5 NP530U3C ultrabook with the same problem. Bios version is latest: P14AAJ

description: updated
Revision history for this message
juanmanuel (rockerito99) wrote :

This is the program I made, that "unstucks" the computer so that it can send LID and AC and Battery events again. (it queries the embedded controller queued events, thus unblocking the EC so that it can start sending them again).

Ideally run after resume from sleep, or at any other time.

Revision history for this message
juanmanuel (rockerito99) wrote :

Script that calls the program found in the other attachment after resume from sleep.

Revision history for this message
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
juanmanuel (rockerito99) wrote : Re: LID close and AC and battery status events not produced anymore on samsung ultrabook.

This is a patch created by Lan Tianyu on the kernel bugzilla to do the same that my workaround did, but in a more proper way, and from the kernel:

          https://bugzilla.kernel.org/show_bug.cgi?id=44161#c133

I tested it:

          https://bugzilla.kernel.org/show_bug.cgi?id=44161#c135

and it works.
Lets hope more people test this patch so that it can be included some day in the kernel.

description: updated
Revision history for this message
juanmanuel (rockerito99) wrote :

More description about this issue found I posted in blog format here:

           http://www.zenstep.com.ar/samsung-laptop
--
Juan Manuel Cabo

Revision history for this message
juanmanuel (rockerito99) wrote :
tags: added: patch
penalvch (penalvch)
summary: - LID close and AC and battery status events not produced anymore on
- samsung ultrabook.
+ [Samsung NP530U3C-A01] LID close and AC and battery status events not
+ produced anymore
summary: - [Samsung NP530U3C-A01] LID close and AC and battery status events not
+ [Samsung NP530U3C-A01] LID close, AC, and battery status events not
produced anymore
penalvch (penalvch)
description: updated
Revision history for this message
juanmanuel (rockerito99) wrote :

Christopher: you changed my description of the bug, and in doing so, confused the two different ways to force the issue to show up. You listed the second way as a fourth item of the first way, when it is a different way.

Take a look at the original description.
--
Juan Manuel Cabo

Revision history for this message
In , clancy.kieran+kernel (clancy.kieran+kernel-linux-kernel-bugs) wrote :

Created attachment 127121
acpi_ec_clear function, only run on affected hardware

There is already a function which runs on resume, namely acpi_ec_unblock_transactions(). This patch uses that function rather than setting up a new hook to be run on wake.

To avoid potential side-effects on non-Samsung hardware, I have also added DMI info which is checked along with other quirks when the ACPI bus is initialised. If the system vendor is Samsung, the flag EC_FLAGS_CLEAR_ON_RESUME is set to 1.

When the system resumes and acpi_ec_unblock_transactions() is run, this flag is checked and acpi_ec_clear() is run if necessary.

Finally, acpi_ec_clear() is also run if needed when the EC is finalised in acpi_ec_add(), in case the EC was jammed when the system boots. Otherwise the EC wouldn't be cleared until a manual suspend+resume (which may not happen often, or at all, depending on the user).

I'd also like to reiterate that we should ideally find out how to stop the EC filling during suspend, since this:
a) would prevent problems if resume never happens and another operating system is booted which doesn't clear the EC (e.g. Windows, or a Linux kernel prior to this patch)
b) may reduce power consumption while in sleep - perhaps the EC or other hardware is supposed to be put in a lower power state as part of this process

Revision history for this message
In , clancy.kieran+kernel (clancy.kieran+kernel-linux-kernel-bugs) wrote :

Created attachment 127131
acpi_ec_clear function, only run on affected hardware (fixed)

Fix some problems with the previous patch. A function was used before its definition.

Revision history for this message
In , clancy.kieran+kernel (clancy.kieran+kernel-linux-kernel-bugs) wrote :

Created attachment 127141
acpi_ec_clear function, only run on affected hardware

Fixed newline in pr_info()

Revision history for this message
In , juanmanuel.cabo (juanmanuel.cabo-linux-kernel-bugs) wrote :

(In reply to Kieran Clancy from comment #144)
> Created attachment 127141 [details]
> acpi_ec_clear function, only run on affected hardware
>
> Fixed newline in pr_info()

It works!!! I like this patch better!!

I do see the new pr_info message "Clearing state EC events" in dmesg at startup.

The other message, I don't see. Because by the time I enable debug messages
for ec.c, that pr_debug("Detected sys..") message was already in the past.

The events get cleared both on resume and on reboot according to my tests with your patch.

'acpi_listen' shows events happening correctly at all times.

--------
PLEASE: More people should test this kernel patch. There is a very simple guide for compiling your own kernel if you use ubuntu/kubuntu/xubuntu:

     https://wiki.ubuntu.com/Kernel/BuildYourOwnKernel

To apply the patch, just change directory to the place were the kernel got uncompressed, and do:
     patch < thePachFile.patch
(you might need to point the patch util, if asked interactively, to: drivers/acpi/ec.c )

One more thing: with kernel 3.11 a line of context is different in the patch than in the ec.c, so I had to apply hunk #6 by hand of Kieran's patch. I guess kieran generated the patch against the latest kernel (3.13?). Since the difference is in an unrelated context line, it is a harmless difference.
-------

Thanks Kieran Clancy!
--
Juan Manuel Cabo

Revision history for this message
In , dennis.jansen (dennis.jansen-linux-kernel-bugs) wrote :

Thanks Kieran!

I've tested whether 20 cycles are sufficient on my system and... they are for me. I get to a maximum of 9.

I've modified the patch by Kieran
- to change the ec_reset function into a void and
- to show the number of loops. This way we can do some debugging on that number of 20 loops
- to warn if all loops are used up. On such systems there may be need to increase the number of loops.

I think it might be hard to get the patch in for *all* Samsung systems. We should probably try to collect the concrete systems (535,530,900) and restrict the fix to these systems. My system is
535U3C / NP535U3C-A04DE
(sudo dmidecode | grep Product)

But as there may be more affected systems - also non-Samsung systems, should we try to create a boot parameter which activates the EC clearing as well?

Attaching patch next.

Revision history for this message
In , dennis.jansen (dennis.jansen-linux-kernel-bugs) wrote :

Created attachment 127151
acpi_ec_clear function: void, show loops, warn

penalvch (penalvch)
tags: added: cherry-pick
Revision history for this message
In , dennis.jansen (dennis.jansen-linux-kernel-bugs) wrote :
Revision history for this message
In , clancy.kieran+kernel (clancy.kieran+kernel-linux-kernel-bugs) wrote :

Created attachment 127161
acpi_ec_clear function, only run on affected hardware

Thanks Dennis, some good suggestions there. I've attached a slightly cleaner version of the patch (hopefully).

1. Yes, acpi_ec_clear() should have been void - thanks.
2. I've #define'd a constant ACPI_EC_CLEAR_ITER to store the magical "20".
3. I've incorporated your extra messages

Lastly, I've been thinking about whether this should be more hardware specific. While it would be trivial to instead add a sequence of table entries such as:

        {
        ec_clear_on_resume, "Samsung hardware", {
        DMI_MATCH(DMI_SYS_VENDOR, "SAMSUNG ELECTRONICS CO., LTD."),
        DMI_MATCH(DMI_PRODUCT_NAME, "900X")}, NULL}

I currently feel that just picking up the system vendor is enough, on the basis that:

- this probably affects more Samsung systems than we realise
- it really shouldn't do any harm on systems without this bug
- if you look at the other entries in the table, they turn on quirks for things like "all ASUS systems", so it's not unprecedented

But, if it's knocked back for not being specific enough, here's a suggested list of product name substrings:

530U
535U
540U
550P
900X

(Mine is 900X3F/NP900X3F-K01AU)

As for adding another boot parameter, my personal opinion is that it's not necessary since we have a simple user space program which can be used to check if systems are affected. That is, I can't really think of a situation where the user space program couldn't work just as well as a stop-gap measure until other hardware was added to the table.

Revision history for this message
In , san (san-linux-kernel-bugs) wrote :

I do not know if this question is right here, but is there anyone who can put patched and compiled kernel somewhere so person who want to test it can just install it? I can put it on my server if there is a problem.

Regards!
and thanks for all who supported this patch.

Revision history for this message
Guillaume LAURENT (laurent-guillaume) wrote :

Also affect Serie 9 (NP900X3C in my case)

 Description: Ubuntu 13.10
 Release: 13.10

 Ubuntu 3.11.0-17.31-generic 3.11.10.3

juanmanuel workaround works.

Waiting Ubuntu patch

Revision history for this message
In , dennis.jansen (dennis.jansen-linux-kernel-bugs) wrote :

(In reply to Kieran Clancy from comment #149)

Thanks!

> Lastly, I've been thinking about whether this should be more hardware
> specific. While it would be trivial to instead add a sequence of table
> entries such as:
...

> I currently feel that just picking up the system vendor is enough, on the
> basis that:
>
> - this probably affects more Samsung systems than we realise
> - it really shouldn't do any harm on systems without this bug
> - if you look at the other entries in the table, they turn on quirks for
> things like "all ASUS systems", so it's not unprecedented

Yes, actually personally I think it might even be a good idea to always do this. I'm just not sure if the maintainers will agree.

>
> As for adding another boot parameter, my personal opinion is that it's not
> necessary since we have a simple user space program which can be used to
> check if systems are affected. That is, I can't really think of a situation
> where the user space program couldn't work just as well as a stop-gap
> measure until other hardware was added to the table.

Well, the issue with the user space program is currently that the EC status and data ports may have to be adjusted. At least for non-Samsung system I believe they would have to be. On the other hand - shouldn't it be possible to detect them automatically? You're right, a user space program might be a better way to test if this fixes any issues on different systems if we can make it work across different systems.

@San: Yes, that sounds like a good idea. We could use kernel.ubuntu.com/~kernel-ppa/mainline/ (and similar Fedora sources) as a basis and then create a .deb/rpm package for easy testing. Though I think there's already quite compelling evidence that this helps all around and does not cause any issues.

I'm still surprised how many things seem to work better now.

Revision history for this message
In , dennis.jansen (dennis.jansen-linux-kernel-bugs) wrote :

(In reply to Dennis Jansen from comment #151)
> > As for adding another boot parameter, my personal opinion is that it's not
> > necessary since we have a simple user space program which can be used to
> > check if systems are affected. That is, I can't really think of a situation
> > where the user space program couldn't work just as well as a stop-gap
> > measure until other hardware was added to the table.
>
> Well, the issue with the user space program is currently that the EC status
> and data ports may have to be adjusted. At least for non-Samsung system I
> believe they would have to be. On the other hand - shouldn't it be possible
> to detect them automatically? You're right, a user space program might be a
> better way to test if this fixes any issues on different systems if we can
> make it work across different systems.

Maybe we can grab the information from dmesg?
dmesg| grep ACPI | grep GPE

Revision history for this message
mmalmeida (mmalmeida) wrote :

Alfo affects: Samsung series 9 NP900X4C

Revision history for this message
In , mauritiusdadd (mauritiusdadd-linux-kernel-bugs) wrote :

Hi,
I have tested the mainline kernel (3.14-rc3) with the patch applied and I can confirm that it works on my Samsung NP530U3C-A03IT.
Thank you very much!

@Dennis Jansen (In reply to Dennis Jansen from comment #152)

> Maybe we can grab the information from dmesg?
> dmesg| grep ACPI | grep GPE

I think a better way to find the EC ports is to read the file /proc/ioports

For example on my system I have:

[daddona@530U3C ~]$ cat /proc/ioports | grep EC
  0062-0062 : EC data
  0066-0066 : EC cmd

Revision history for this message
Tim Edwards (tkedwards) wrote :

I have a Samsung NP530U3C-A01 laptop and I can confirm this bug affects my laptop. After applying the workaround script from juanmanuel in #3 and #4 I re-tested and the problem no longer appears.

Thanks juanmanuel for writing the fix and being so patient with Ubuntu's ridiculous bug policies - just don't lose your laptop otherwise we'll have to start this whole bug report process again!

Revision history for this message
Andrea Lazzarotto (Lazza) (andrea-lazzarotto) wrote :

«Thanks juanmanuel for writing the fix and being so patient with Ubuntu's ridiculous bug policies»

I think it's more of a one-man trolling issue than a policy problem.

Revision history for this message
juanmanuel (rockerito99) wrote :

UPDATED v2: Safer because it obtains the EC ports automatically from /proc/ioports, and it micro-pauses between queries, so that the EC returns each event only once. This replaces the little program in #3.
________
This is the program I made, that "unstucks" the computer so that it can send LID and AC and Battery events again. (it queries the embedded controller queued events, thus unblocking the EC so that it can start sending them again).

Ideally run after resume from sleep, or at any other time.

Revision history for this message
In , juanmanuel.cabo (juanmanuel.cabo-linux-kernel-bugs) wrote :

Hello again!
I updated the little program to read ports from /proc/ioports.

You can find it in:
     http://zenstep.com.ar/samsung-linux

or in post #14 here:
     https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1283589

or in post #158 here:
     https://bugs.launchpad.net/ubuntu/+source/linux/+bug/971061

the little program is also now GPL, and it also now micro-pauses between queries so that the EC tends to return each event only once.

Cheers!
--
Juan Manuel Cabo

Revision history for this message
In , san (san-linux-kernel-bugs) wrote :

Works fine on Samsung NP535U3C-A01PL.

Here you can download deb packages:
http://kernel.zabałaganionemiejsce.pl/
or http://kernel.xn--zabaaganionemiejsce-8fd.pl

You can find patch and modified ec.c source.

Build made by this steps:
1. git clone git://kernel.ubuntu.com/ubuntu/ubuntu-saucy.git
2. apply patch from comment #149
3. apply manual what does not apply automagically
4. fakeroot debian/rules clean && fakeroot debian/rules binary-headers binary-generic

Nothing modified with config.
Some package did not install (i do not remember which one...).

san@sammie:~ > uname -a
Linux sammie 3.11.0-18-generic #32 SMP Sun Feb 23 13:43:26 CET 2014 x86_64 x86_64 x86_64 GNU/Linux

san@sammie:~ > cat /proc/ioports | grep EC
  0062-0062 : EC data
  0066-0066 : EC cmd

I use it for an half hour and works fine - it seems.

Regards!

Revision history for this message
Tim Edwards (tkedwards) wrote :

@andrea-lazzarotto
To be fair it does say "One defect, per person, per hardware, per report" at https://wiki.ubuntu.com/Bugs/BestPractices#X.2BAC8-Reporting.Focus_on_One_Issue.

The 'One defect, per person' is obviously stupid when there's >100 of us with the same hardware and same problem, opening 100 bugs isn't going to help anyone. But I wouldn't make it personal against the guy if he's just a volunteer enforcing Canonical's rules.

Revision history for this message
Tomer (tbrisker) wrote :

@tkedwards
Only that rule was made to prevent multiple bugs being cluttered together in one bug report, not to be taken literally as a means of filling the bug system with hundreds of useless duplicates, and I quote:
"Don't amalgamate every issue and hardware you find a problem in after an update... Instead, do a separate report for each distinct issue, on each hardware. This is how different hardware can have similar problematic symptoms (ex. computer won't boot), but different root causes, and patches that fix the different causes."
Obviously this doesn't mean that every affected user should file a separate report for a common problem - otherwise what's the point of having the "this affects me too" on the top.

juanmanuel (rockerito99)
description: updated
Revision history for this message
Jon Cowell (info-synct) wrote :

I also have a Samsung NP-A530U3C-A01UK laptop that exhibits the same behaviour. Since applying the fix from juanmanul all is now well.

Many thanks juanmanual for identifying the cause for this long running issue and for working diligently to find the fix.

Revision history for this message
In , mr203010spam (mr203010spam-linux-kernel-bugs) wrote :

Applied the patch from comment #149 to the most current Kernel available on openSUSE (12.3): linux-3.13.4-1.g6eda950-desktop

The patch applied smoothly without any hickups.

dmidecode | grep Product
        Product Name: 900X3C/900X3D/900X4C/900X4D
        Product Name: SAMSUNG_NP1234567890

On the casing it says NP900X3C-A07DE

Plugging in and unplugging power is recognized now, so is lid close.

Thanks ever so much to all who invested so much work and time to tackle this bug!

I just hope this patch makes it into the main kernel soon!

Revision history for this message
juanmanuel (rockerito99) wrote :

This is a NEW and BETTER patch created by Kieran Clancy on the kernel bugzilla to do the same that my workaround did, but in a more proper way, and from the kernel:

          https://bugzilla.kernel.org/show_bug.cgi?id=44161#c149

I also tested it this weekend, and it works (I forced the issue to happen again, and then saw it being resolved automatically by a kernel compiled with this patch).

The most salient advantage over the previous patch is that it also unstucks the EC when the computer starts (in addition to sleep resume). This is more fool proof and suits more use cases.

It also checks the laptop model.

Lets hope more people test this patch so that it can be included some day in the kernel.

Revision history for this message
juanmanuel (rockerito99) wrote :

> Many thanks juanmanuel for identifying the cause for this long running issue and for working diligently to find the fix.

You're welcome, it fills me with joy reading all this good feedback!!
--
Juan Manuel Cabo
http://zenstep.com.ar/samsung-linux

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

@juanmanuel ,

Do you plan on submitting your patch for inclusion in the mainline kernel?

tags: added: kernel-da-key
Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Triaged
Revision history for this message
juanmanuel (rockerito99) wrote :

Joseph: The best kernel patch so far was made by Kieran Clancy (see post #18 here). His patch is attached to a kernel bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=44161 in comment 149.

I'm just the author of the userspace workaround (post #3, #4, #14) that started it all (see the other issue here https://bugs.launchpad.net/ubuntu/+source/acpi/+bug/971061 comments 102 onwards) . It is a standalone program that unstucks the EC. On the other hand, Kieran's kernel patch now does the same in a cleaner way from inside the kernel.

In my humble opinion, Kieran Clancy's patch can be included in the kernel. I don't know whether Kieran will submit it.

Cheers!
--
Juan Manuel Cabo

Revision history for this message
In , san (san-linux-kernel-bugs) wrote :

cat /proc/acpi/button/lid/LID0/state
is always open on Samsund NP535U3C with AMD A4.

Revision history for this message
In , dennis.jansen (dennis.jansen-linux-kernel-bugs) wrote :

@San: Have you tried keeping it closed for 10-20 seconds? In my case it simply took a long time. Otherwise you can try to reset the computer with the reset button on the bottom side of your notebook. It should keep working afterwards. If not, let us know.
I have an NP535U3C and it works for me. But it takes quite some time to be recognized.

Revision history for this message
In , joseph.salisbury (joseph.salisbury-linux-kernel-bugs) wrote :

@Kieran Clancy, do you plan on submitting your patch for inclusion in the mainline kernel?

Revision history for this message
In , clancy.kieran+kernel (clancy.kieran+kernel-linux-kernel-bugs) wrote :

@Joseph Salisbury

At this stage it's only been tested on a few machines. I'd like to give it a few more days, especially to see what kind of numbers come back from the "... EC events polled" messages.

If all goes well I plan to submit the patch.

Revision history for this message
In , san (san-linux-kernel-bugs) wrote :

san@sammie:~/Dokumenty > cat lid
         #!/bin/bash
         COUNTER=0
         while [ $COUNTER -lt 30 ]; do
             echo About $COUNTER s `cat /proc/acpi/button/lid/LID/state`
             let COUNTER=COUNTER+1
                sleep 1
         done

Agree. After about 11 sec it says closed. Many thanks and best regards.

Revision history for this message
In , juanmanuel.cabo (juanmanuel.cabo-linux-kernel-bugs) wrote :

(In reply to Kieran Clancy from comment #160)
> @Joseph Salisbury
>
> At this stage it's only been tested on a few machines. I'd like to give it a
> few more days, especially to see what kind of numbers come back from the
> "... EC events polled" messages.
>
> If all goes well I plan to submit the patch.

----------------
Testing results with San build of Kieran patch (comment #149 and comment #155)
----------------

*) Boot with an old kernel. Suspend. Unplug/plug/....(more than 8 times). Resume. Issue shows. Install new kernel (download, sudo dpkg -i *.deb). Reboot. Result in /var/log/syslog and dmesg:
     Clearing stale EC events
     8 EC events polled.
(as previously tested, Kieran patch continues to poll at boot. Good!)
Everything returned to normal, issue resolved.

*) Suspend with lid. Open lid while suspended (1 action).
Result: "1 EC events polled.".
I guess it was "lid open event". Will test without the lid so I don't have to subtract its lid open event.

*) Suspend through menu (instead of lid). Resume without doing nothing.
Result: "0 EC events polled."

*) Suspend through menu. Unplug/plug/unplug/plug (4 actions).
Result: "4 EC events polled."

*) Suspend through menu. Unplug/plug... (6 actions).
Result: "6 EC events polled."

*) Suspend through menu. Unplug/plug... (18 actions).
Result: "8 EC events polled."

CONCLUSIONS:

A) The patch keeps working perfectly, resolving all situations.

B) My Series 5 NP530U3C accumulates up to 8 events. Not more as I previously thought.

C) I think 20 is a good number for ACPI_EC_CLEAR_ITER.

So I was wrong about the 16 events number. It is actually definitely 8. I previously assumed it was double, because when unplugging with 'acpi_listen' open, one can see both ADP1 and BAT1 devices being triggered (the CPU devices are notified from the DSDT when the AC _Q51 or _Q52 methods are called, so they don't count). So I assumed that two events were generated per plug. But it seems not, or at least not during sleep.
    When I discovered that 8 unplug or plug actions were necessary to trigger the problem state, I assumed that each accumulated two events, so I thought 16 total. I was wrong there. Definitely it's 8 max accumulated events.

Cheers!
--
Juan Manuel Cabo

Revision history for this message
In , dennis.jansen (dennis.jansen-linux-kernel-bugs) wrote :

(In reply to Juan Manuel Cabo from comment #154)
> Hello again!
> I updated the little program to read ports from /proc/ioports.
>
> You can find it in:
> http://zenstep.com.ar/samsung-linux

Great! And again: thanks so much! :)

Revision history for this message
In , miticotoby (miticotoby-linux-kernel-bugs) wrote :

thanks soo much! been annoying as hell this bug.
can confirm this worked for me too with no issues (Model: NP900X3C)

Revision history for this message
juanmanuel (rockerito99) wrote :

Breaking News!

A fix in the form of a kernel patch has now been posted to the linux-acpi and linux-kernel mailing list:

        "[PATCH v2] ACPI / EC: Clear stale EC events on Samsung systems"
        http://marc.info/?l=linux-acpi&m=139359680828880&w=2

That kernel patch was made by Kieran Clancy and tested by me and others.

--
Juan Manuel Cabo
http://zenstep.com.ar/samsung-linux

Revision history for this message
In , juanmanuel.cabo (juanmanuel.cabo-linux-kernel-bugs) wrote :

Breaking News!

A fix in the form of a kernel patch has now been posted to the linux-acpi and linux-kernel mailing list:

        "[PATCH v2] ACPI / EC: Clear stale EC events on Samsung systems"
        http://marc.info/?l=linux-acpi&m=139359680828880&w=2

That kernel patch was made by Kieran Clancy and tested by me and others.

--
Juan Manuel Cabo
http://zenstep.com.ar/samsung-linux

Revision history for this message
In , clancy.kieran+kernel (clancy.kieran+kernel-linux-kernel-bugs) wrote :

The patch has been accepted into Rafael J. Wysocki's linux-pm development tree:
https://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/commit/?id=27095111cbafd3212c7e9a4a8cef1099b7520ca8

From there it will hopefully land in the 3.15 stable release.

Revision history for this message
zeeeeee (miesogeno) wrote :

juanmanuel,
No matter who found the latest/better solution, you are truly a hero.
After reading all the associated bug reports, I decided to use your script because I don't know how to compile the kernel, and I presume I would have to redo it every time the kernel got an update.

My laptop model is NP530U4C, finally with a normal behavior.
Gracias!

Revision history for this message
In , joseph.salisbury (joseph.salisbury-linux-kernel-bugs) wrote :

Thanks so much for creating that patch, Kieran! Would it be possible for you to also request that the patch be submitted for inclusion in upstream stable?

Revision history for this message
In , abarto (abarto-linux-kernel-bugs) wrote :

For Fedora 20 users, the patch was merged into the latest kernel package. I would also like to thank everyone involved in fixing this bug.

Revision history for this message
In , jacco.ermers (jacco.ermers-linux-kernel-bugs) wrote :

Cool, I already got used to manually suspending my machine. Can't wait for it to make it into the Kernel (using Linux Mint).

So far we are stuck at 3.13...

Now the question remains, can we install using EFI or still legacy BIOS

Revision history for this message
In , info (info-linux-kernel-bugs) wrote :

(In reply to Jacco from comment #169)
> Cool, I already got used to manually suspending my machine. Can't wait for
> it to make it into the Kernel (using Linux Mint).
>
> So far we are stuck at 3.13...
>
Fedora has ported this patch to their 3.13.6 kernel. So with Fedora 20 I don't have this issue anymore. Perhaps Linux Mint has a backport policy as well ?

> Now the question remains, can we install using EFI or still legacy BIOS

Indeed...

Revision history for this message
In , miticotoby (miticotoby-linux-kernel-bugs) wrote :

does anybody know if/when Ubuntu is trying to build in this patch? just wondering when I'll be able to run all the updates again without having to uncheck the kernel every time ;)

Revision history for this message
In , juanmanuel.cabo (juanmanuel.cabo-linux-kernel-bugs) wrote :

(In reply to toby from comment #171)
> does anybody know if/when Ubuntu is trying to build in this patch? just
> wondering when I'll be able to run all the updates again without having to
> uncheck the kernel every time ;)

Hi!!!

The patch has been merged recently to the stable kernel 3.13 in version 3.13.7. (look in http://kernel.org in the changelog of 3.13).

To my knowledge, its too fresh to be in regular ubuntu repositories yet, but you can get the most recent kernel for ubuntu/kubuntu/xubuntu from here:

   http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.13.7-trusty/

and it includes the patch. Just download the "generic" packages from there, and install with "sudo dpkg -i *.deb".

Cheers!
--
Juan Manuel Cabo

Revision history for this message
In , clancy.kieran+kernel (clancy.kieran+kernel-linux-kernel-bugs) wrote :

Created attachment 131151
process rather than discard events in acpi_ec_clear

Hi all,

If you have a machine that was (previously) affected by this bug, could you please take the time to try a new patch for this issue?

A user with an older Samsung machine found that the earlier patch introduced some issues for them, so this is a new patch designed to be a better solution to the problem.

Currently the new patch has only been tested on 2 affected machines (to my knowledge), so I would be very grateful if anyone else can test this patch (especially users with Samsung series 5 or 7 machines).

You will need to use a recent kernel source like 3. (one which includes the earlier acpi_ec_clear patch), or otherwise apply the previous patch first:

https://github.com/torvalds/linux/commit/ad332c8a45330d170bb38b95209de449b31cd1b4.patch

Then apply the attachment ec_process_stale_events.patch

Please test to see if your lid closing / suspending / etc. still work as intended with this patch.

This new patch also provides some debug information after every resume/boot which will show up in dmesg something like:

[ 1192.208056] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[ 1192.217844] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[ 1192.218844] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[ 1192.222835] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[ 1192.223834] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[ 1192.224833] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[ 1192.227833] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[ 1192.228834] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[ 1192.229832] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[ 1192.230828] ACPI : EC: 8 stale EC events cleared

Please report if you get any EC_SC bytes here _other than_ 0x28. I only get 0x28 on my machine but I am very interested to know if other affected machines produce something else.

If you test it, please also report your system's product code, e.g.:

# dmidecode -s system-product-name
900X3F
# dmidecode -s baseboard-product-name
NP900X3F-K01AU

Many thanks,
Kieran

Revision history for this message
In , clancy.kieran+kernel (clancy.kieran+kernel-linux-kernel-bugs) wrote :

Meant to say "You will need to use a recent kernel source like 3.14". Sorry for the extra bug email.

Revision history for this message
In , juanmanuel.cabo (juanmanuel.cabo-linux-kernel-bugs) wrote :

(In reply to Kieran Clancy from comment #173)
> Created attachment 131151 [details]
> process rather than discard events in acpi_ec_clear

> [..] could you
> please take the time to try a new patch for this issue?

Will test tonight
Cheers!
--
Juan Manuel Cabo

Revision history for this message
In , jacco.ermers (jacco.ermers-linux-kernel-bugs) wrote :

NP900X3E here, upgraded to kernel 3.13.7, works like a charm. Had to re-initiate the trackpad after reboot, but all else is fine.

Revision history for this message
In , mauritiusdadd (mauritiusdadd-linux-kernel-bugs) wrote :

(In reply to Kieran Clancy from comment #173)

Hi,

I have just built the mainline kernel 3.14 with the new patch applied, and it works perfectly.

# dmidecode -s system-product-name
530U3C/530U4C

# dmidecode -s baseboard-product-name
530U3C/530U4C

# dmesg | grep "ACPI : EC"
[ 0.288950] ACPI : EC: GPE = 0x17, I/O: command/status = 0x66, data = 0x62
[ 0.288954] ACPI : EC: acpi_ec_clear: EC_SC 0x00
[ 0.290821] ACPI : EC: 0 stale EC events cleared
[ 30.575003] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[ 30.577929] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[ 30.581249] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[ 30.584568] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[ 30.587893] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[ 30.591236] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[ 30.594559] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[ 30.597890] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[ 30.601229] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[ 30.604569] ACPI : EC: 8 stale EC events cleared

Revision history for this message
In , juanmanuel.cabo (juanmanuel.cabo-linux-kernel-bugs) wrote :

Hello!!

I tested last night. My computer is Samsung Series 5 NP530U3C:

    # dmidecode -s system-product-name
    530U3C/530U4C
    # dmidecode -s baseboard-product-name
    SAMSUNG_NP1234567890

for some reason, my baseboard-product-name seems generic.

On boot I got:

    [ 0.226027] ACPI : EC: acpi_ec_clear: EC_SC 0x00
    [ 0.226299] ACPI : EC: 0 stale EC events cleared

Then suspended, unplugged/replugged more than 10 times, and I got:

    [ 101.449890] CPU3 is up
    [ 101.454065] ACPI : EC: acpi_ec_clear: EC_SC 0x28
    [ 101.459620] ACPI : EC: acpi_ec_clear: EC_SC 0x28
    [ 101.463622] ACPI : EC: acpi_ec_clear: EC_SC 0x28
    [ 101.467604] ACPI : EC: acpi_ec_clear: EC_SC 0x28
    [ 101.471603] ACPI : EC: acpi_ec_clear: EC_SC 0x28
    [ 101.475605] ACPI : EC: acpi_ec_clear: EC_SC 0x28
    [ 101.479608] ACPI : EC: acpi_ec_clear: EC_SC 0x28
    [ 101.483609] ACPI : EC: acpi_ec_clear: EC_SC 0x28
    [ 101.487610] ACPI : EC: acpi_ec_clear: EC_SC 0x28
    [ 101.491610] ACPI : EC: 8 stale EC events cleared
    [ 101.491899] ACPI: Waking up from system sleep state S3

Then suspended again, just closed the lid and didn't unplug or replug:

    [ 225.274756] ACPI : EC: acpi_ec_clear: EC_SC 0x28
    [ 225.276221] ACPI : EC: acpi_ec_clear: EC_SC 0x28
    [ 225.280200] ACPI : EC: 1 stale EC events cleared
    [ 225.280478] ACPI: Waking up from system sleep state S3

at that time, after resume, "acpi_listen" showed the following:

    button/lid LID close
    button/lid LID open

Kernel is 3.13.8. I got the .tar.xz from www.kernel.org, applied the patch that you attached in your comment, and compiled with:

     make menuconfig
     make
     make modules_install install

Everything is working fine as expected on my machine.

Cheers!
--
Juan Manuel Cabo

Revision history for this message
In , giannis.koutsou (giannis.koutsou-linux-kernel-bugs) wrote :

(In reply to Kieran Clancy from comment #173)

Hi,

patched kernel 3.13.8 on Arch Linux using attachement from comment #173

So far everything is working perfectly; plug, unplug and lid close events are all being detected. I'm on a Series 9:

# dmidecode -s baseboard-product-name
NP900X3C-A05UK

# dmidecode -s system-product-name
900X3C/900X3D/900X3E/900X4C/900X4D

Up for almost 24hrs, suspended/resumed some five times or so:

[ 0.707289] ACPI : EC: GPE = 0x17, I/O: command/status = 0x66, data = 0x62
[ 0.707292] ACPI : EC: acpi_ec_clear: EC_SC 0x00
[ 0.707534] ACPI : EC: 0 stale EC events cleared
[ 234.145738] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[ 234.148802] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[ 234.152118] ACPI : EC: 1 stale EC events cleared
[ 1144.739072] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[ 1144.742253] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[ 1144.745590] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[ 1144.748935] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[ 1144.752238] ACPI : EC: 3 stale EC events cleared
[39577.534490] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[39577.537478] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[39577.540795] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[39577.544126] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[39577.547465] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[39577.550821] ACPI : EC: 4 stale EC events cleared
[47768.342146] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[47768.357869] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[47768.361202] ACPI : EC: 1 stale EC events cleared
[48527.104686] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[48527.105043] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[48527.108386] ACPI : EC: 1 stale EC events cleared

Cheers, and thanks a lot for the fix!

Revision history for this message
In , thomas (thomas-linux-kernel-bugs) wrote :

Is the patch from comment #173 available in any git tree yet?

Revision history for this message
In , miclud89 (miclud89-linux-kernel-bugs) wrote :

Just wanted to report working acpi events (FINALLY!).
Plug/unplug power, suspend on lid-close etc.

Running Gentoo and just installed kernel from gentoo-sources 3.14.1

# dmidecode -s baseboard-product-name
NP900X3E-A02SE

# dmidecode -s system-product-name
900X3C/900X3D/900X3E/900X4C/900X4D

Revision history for this message
In , pablo.caron (pablo.caron-linux-kernel-bugs) wrote :

I just downloaded kernel 3.14.2 form kernel.org and patched it. I compiled and installed it. Everything is working as expected, plug/unplug power-cord is detected by KDE, and the system goes to suspend when I close the lid.

I'm running OpenSUSE 13.1 x86_64 and KDE 4.13.0.

# dmidecode -s baseboard-product-name
SAMSUNG_NP1234567890

# dmidecode -s system-product-name
530U3C/530U4C

# dmesg | grep "ACPI : EC"
[ 0.163833] ACPI : EC: GPE = 0x17, I/O: command/status = 0x66, data = 0x62
[ 0.163837] ACPI : EC: acpi_ec_clear: EC_SC 0x00
[ 0.164158] ACPI : EC: acpi_ec_clear: EC_SC 0x08
[ 0.164536] ACPI : EC: acpi_ec_clear: EC_SC 0x08
[ 0.164848] ACPI : EC: acpi_ec_clear: EC_SC 0x08
[ 0.165164] ACPI : EC: acpi_ec_clear: EC_SC 0x08
[ 0.165540] ACPI : EC: acpi_ec_clear: EC_SC 0x08
[ 0.165922] ACPI : EC: acpi_ec_clear: EC_SC 0x08
[ 0.166236] ACPI : EC: acpi_ec_clear: EC_SC 0x08
[ 0.166549] ACPI : EC: acpi_ec_clear: EC_SC 0x08
[ 0.166862] ACPI : EC: 8 stale EC events cleared
[ 546.002726] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[ 546.003455] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[ 546.004445] ACPI : EC: 1 stale EC events cleared

Thank you for fixing it!!!!!

Cheers, Pablo

Revision history for this message
In , clancy.kieran+kernel (clancy.kieran+kernel-linux-kernel-bugs) wrote :

Thanks to everyone who tested patches!

A second patch has just been accepted to the kernel to fix a regression on earlier Samsung machines N150/N210/N220.

Thanks to those who tested to make sure the new patch didn't break anything on Samsung series 5/7/9.

The EC_SC data which you've provided should be of some help if future patches are required.

Revision history for this message
In , andfagiani (andfagiani-linux-kernel-bugs) wrote :

On my NP900X3C, since kernel 3.16.3 (all the way up to 3.17rc7), I've been experiencing this bug again.

I've narrowed it down to the following commits:

"ACPI / EC: Add support to disallow QR_EC to be issued when SCI_EVT isn't set"
https://github.com/torvalds/linux/commit/3afcf2ece453e1a8c2c6de19cdf06da3772a1b08

"ACPI / EC: Add support to disallow QR_EC to be issued before completing previous QR_EC"
https://github.com/torvalds/linux/commit/558e4736f2e1b0e6323adf7a5e4df77ed6cfc1a4

On 3.17rc7, reverting the latter seems to solve the issue for acpi events related to the power supply, however LID switch is still broken. Reverting both seems to resolve the issue completely.

Anyone else experiencing similar behavior?

Revision history for this message
In , clancy.kieran+kernel (clancy.kieran+kernel-linux-kernel-bugs) wrote :

@Andrea

I have the same problem on my NP900X3C. I noticed it a week ago and started drafting a regression notice for lkml but things got busy and I forgot about it until now.

I'll post the notice and hopefully the author of those two patches will be able to offer some help, since he knows much more about handling the EC than I do.

Revision history for this message
In , parchd+kernel (parchd+kernel-linux-kernel-bugs) wrote :

Since updating to kernel 3.16.3, same problem has arisen on a Samsung NP900X4D.

Revision history for this message
In , zygimantas.b (zygimantas.b-linux-kernel-bugs) wrote :

I can confirm that 3.17rc7 (earlier RCs too) on Samsung NP530U3C-A05EE exhibits the same bug. That is, it does not detect charging/discharging state changes and does not suspend on closing the laptop lid.

Revision history for this message
In , san (san-linux-kernel-bugs) wrote :

Since i turn off suspend on lid close i do not know if the bug of lid persist (my LVDS does not wake up after return from suspend.

I belive changing form ac to dc and backwards does work but with huge delay (few minutes.

Just to recall: NP535U3C-A01PL.

Revision history for this message
In , bjorn.snoen (bjorn.snoen-linux-kernel-bugs) wrote :

@Andrea
Both of those also affect the Samsung NP900X3E model as well. Started noticing the problem about a week ago, running on Arch Linux, but guessing from the build time of the kernel it's been there a bit longer than that.

With charger plugged in:
acpi -i -b -a
Battery 0: Charging, 92%, 00:08:30 until charged
Battery 0: design capacity 5880 mAh, last full capacity 5500 mAh = 93%
Adapter 0: on-line

After disconnecting charger:
acpi -i -b -a
Battery 0: Charging, 88%, 00:24:19 until charged
Battery 0: design capacity 5880 mAh, last full capacity 5500 mAh = 93%
Adapter 0: off-line

uname -a
Linux Geneva 3.16.3-1-ARCH #1 SMP PREEMPT Wed Sep 17 21:54:13 CEST 2014 x86_64 GNU/Linux

Revision history for this message
In , san (san-linux-kernel-bugs) wrote :

No longer updates for me.

uname -a
Linux sammie 3.17.0-031700rc7-generic #201409281835 SMP Sun Sep 28 22:36:30 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Revision history for this message
In , odi (odi-linux-kernel-bugs) wrote :

(In reply to Andrea Fagiani from comment #184)
> Anyone else experiencing similar behavior?

Same here on 900X3D. Reverting the two patches fixes it.

I think you should open a new issue instead of commenting on this closed one, as it is a regression.

Revision history for this message
In , parchd+kernel (parchd+kernel-linux-kernel-bugs) wrote :

@Ortwin, if you open a new issue please link to it from here so we can subscribe.

Revision history for this message
In , odi (odi-linux-kernel-bugs) wrote :

For now I have emailed the author and ACPI list:

http://thread.gmane.org/gmane.linux.kernel/1813529

Revision history for this message
In , aaron.lu (aaron.lu-linux-kernel-bugs) wrote :

Assign to Lv.

Revision history for this message
In , lv.zheng (lv.zheng-linux-kernel-bugs) wrote :

(In reply to Ortwin Glück from comment #193)
> For now I have emailed the author and ACPI list:
>
> http://thread.gmane.org/gmane.linux.kernel/1813529

Thanks for the report.
I'll post permanent fixes here for Samsung laptops.
IRQs need to be both "polled" and "handled using interrupts" for a driver that tries to support hardware/firmware variants.

For me, it looks like that we need to implement "EC GPE polling" in the driver.
Currently we have "EC GPE polling" implemented for OBF/IBF interrupts.
But for SCI_EVT, we don't have such a polling mode.

I can see that a poller in the userspace can fix this.
So what I need to do is to implement the threaded IRQ (IRQ is polled in the thread and is also reported through interrupts) for EC GPEs in the kernel.

The patches are already ready as I mentioned in the mailing list.
I'll post them for you.
But please wait for a while because I need to do a rebase and basic unit test locally before posting.

Revision history for this message
In , clancy.kieran+kernel (clancy.kieran+kernel-linux-kernel-bugs) wrote :

Note that I also reported this on the mailing lists 4 weeks ago, not sure why that was missed.

Lv, I disagree that we need to implement GPE polling in a new kernel thread. It was working perfectly without that before the two recent patches. It simply needs to be cleared on boot or resume from suspend. After that, the interrupts work fine as intended.

Revision history for this message
In , lv.zheng (lv.zheng-linux-kernel-bugs) wrote :

Created attachment 155351
[PATCH] ACPI/EC: Fix a regression caused by conflict firmware behavior between Samsung and Acer.

If Samsung firmware doesn't require driver to issue QR_EC until 0 (no event) is returned or Samsung firmware doesn't require driver to timely issue QR_EC to poll events, then this fix is sufficient.

Revision history for this message
In , lv.zheng (lv.zheng-linux-kernel-bugs) wrote :

This patch is generated on top of the following git branch:
repository: git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git
branch: linux-next

Please first help to check if only this patch (attachment 155351) can quickly fix the regression.

If this is not sufficient, we will need to implement a QR_EC polling thread to timely poll events or poll events until firmware returning 0 (no event).
I'll post the QR_EC polling thread patches after this comment.

Revision history for this message
In , lv.zheng (lv.zheng-linux-kernel-bugs) wrote :

Created attachment 155371
[PATCH] ACPI/EC: Introduce STARTED/STOPPED flags to replace BLOCKED flag.

My SCI_EVT polling thread is implemented on top of previous patch and a flushing support.
This is the 1st patch to support flushing.

Revision history for this message
In , lv.zheng (lv.zheng-linux-kernel-bugs) wrote :

Created attachment 155381
[PATCH] ACPI/EC: Enhance the checks to apply to QR_EC transactions.

The 2nd patch for flushing support.

Revision history for this message
In , lv.zheng (lv.zheng-linux-kernel-bugs) wrote :

Created attachment 155391
[PATCH] ACPI/EC: Add reference counting for query handlers.

The 3rd patch for flushing support.

Revision history for this message
In , lv.zheng (lv.zheng-linux-kernel-bugs) wrote :

Created attachment 155401
[PATCH] ACPI/EC: Add reference counting for query handlers.

The flushing support.

Revision history for this message
In , lv.zheng (lv.zheng-linux-kernel-bugs) wrote :

Created attachment 155411
[PATCH] ACPI/EC: Add command flushing support.

The flushing support.

Revision history for this message
In , lv.zheng (lv.zheng-linux-kernel-bugs) wrote :

Created attachment 155421
[PATCH] ACPI/EC: Cleanup QR_SC command processing by adding a kernel thread to poll EC events.

The polling event thread support.

Revision history for this message
In , lv.zheng (lv.zheng-linux-kernel-bugs) wrote :

Created attachment 155431
[PATCH] ACPI/EC: Add timely SCI_EVT polling support for Samsung models.

First let's try to use timely polling instead of clear_on_resume for Samsung laptops.
The other option is to let the polling thread to poll events until the firmware returning 0 (no event).

I'll add test requirement after this post.

Revision history for this message
In , lv.zheng (lv.zheng-linux-kernel-bugs) wrote :

The test requests are:

1. Trying the regression fix as mentioned in comment 198.
2. If this cannot fix everything, please try:
   Using the same git branch and apply the following patches orderly:
   attachment 155351
   attachment 155371
   attachment 155381
   attachment 155391
   attachment 155411
   attachment 155421
   attachment 155431
   And try again. Since we have kernel polling thread implemented, the old clear_on_resume quirk code is deleted.

Revision history for this message
In , odi (odi-linux-kernel-bugs) wrote :

Somehow during my tests the power plug detection started working again with any kernel. I don't know what caused it. But the LID switch is fixed only with the complete patch series.

Test results:

vanilla 3.18-rc2:
 - LID switch not detected
 - power plug sometimes stops working

vanilla linux-pm/linux-next:
 - same as 3.18-rc2

vanilla linux-pm/linux-next + ec-flush1.patch:
 - same as 3.18-rc2

vanilla linux-pm/linux-next + ec-flush{1,2,3,4,5,6,7}.patch:
 - LID switch works
 - power plug works

Revision history for this message
In , lv.zheng (lv.zheng-linux-kernel-bugs) wrote :

Hi, Kieran

(In reply to Kieran Clancy from comment #196)
> Note that I also reported this on the mailing lists 4 weeks ago, not sure
> why that was missed.
>
> Lv, I disagree that we need to implement GPE polling in a new kernel thread.
> It was working perfectly without that before the two recent patches. It
> simply needs to be cleared on boot or resume from suspend. After that, the
> interrupts work fine as intended.

Threaded IRQ is very common in Linux, please see kernel/irq/manage.c.
Software has reasons to do so.
IMO, Windows might already handle EC events in both "polling" and "interrupt" modes, so it doesn't suffer from such firmware variants.
If we switched our driver into this mode, then we needn't do additional "drain" after resuming.

OTOH, ec-flush1.patch ensures that the following commit won't affect Samsung laptops.
"ACPI / EC: Add support to disallow QR_EC to be issued when SCI_EVT isn't set"
https://github.com/torvalds/linux/commit/3afcf2ece453e1a8c2c6de19cdf06da3772a1b08

If ec-flush1.patch cannot fix everything, we should check what the following commit brings to us:
"ACPI / EC: Add support to disallow QR_EC to be issued before completing previous QR_EC"
https://github.com/torvalds/linux/commit/558e4736f2e1b0e6323adf7a5e4df77ed6cfc1a4
This commit only ensures there's only one query running. If it must be reverted, then it means Samsung laptops need multiple queries to be issued in an out of control way - which sounds like a "happen to work" environment.
We shouldn't go back to a "happen to work" environment where bugs could be hidden by the hidden logics.

Revision history for this message
In , lv.zheng (lv.zheng-linux-kernel-bugs) wrote :

(In reply to Ortwin Glück from comment #207)
> Somehow during my tests the power plug detection started working again with
> any kernel. I don't know what caused it. But the LID switch is fixed only
> with the complete patch series.
>
> Test results:
>
> vanilla 3.18-rc2:
> - LID switch not detected
> - power plug sometimes stops working
>
> vanilla linux-pm/linux-next:
> - same as 3.18-rc2
>
> vanilla linux-pm/linux-next + ec-flush1.patch:
> - same as 3.18-rc2
>
> vanilla linux-pm/linux-next + ec-flush{1,2,3,4,5,6,7}.patch:
> - LID switch works
> - power plug works

Maybe we need more test results, also test results after a suspend/resume...

Revision history for this message
In , lv.zheng (lv.zheng-linux-kernel-bugs) wrote :

Hi, Kieran and Ortwin

The 2nd commit need to be reverted means Samsung firmware behaves like:
[ +event 1 ] SCI_EVT set
[ +event 2 ] SCI_EVT set
                           QR_EC
                           event returned
[ -event 1 ] SCI_EVT clear
Then without commit 2, Samsung can work:
[ +event 1 ] SCI_EVT set
[ +event 2 ] SCI_EVT set
                           QR_EC
                           can prepare next QR_EC because SCI_EVT is set
                           event returned
[ -event 1 ] SCI_EVT clear
                           QR_EC
                           event returned
[ -event 2 ] SCI_EVT clear
And with commit 2, Samsung cannot work:
[ +event 1 ] SCI_EVT set
[ +event 2 ] SCI_EVT set
                           QR_EC
                           event returned
[ -event 1 ] SCI_EVT clear
                           cannot prepare next QR_EC because SCI_EVT is cleared
The Samsung specific behavior conflicts with Acer one:
[ +event 1 ] SCI_EVT set
[ +event 2 ]
                           QR_EC
                           event returned
[ -event 1 ] SCI_EVT clear
[ +event 2 ] SCI_EVT set
Without commit 2, Acer cannot work when there is only 1 event:
[ +event 1 ] SCI_EVT set
                           QR_EC
                           can prepared next QR_EC because SCI_EVT is set
                           event returned
[ -event 1 ] SCI_EVT clear
                           QR_EC (but SCI_EVT is cleared!!!)
And with commit 2, Acer can work:
[ +event 1 ] SCI_EVT set
[ +event 2 ]
                           QR_EC
                           event returned
[ -event 1 ] SCI_EVT set
                           can prepare next QR_EC because SCI_EVT is cleared

For Samsung behavior, reverting commit 2 will still have chances to have events lost:
[ +event 1 ] SCI_EVT set
[ +event 2 ] SCI_EVT set
                           QR_EC
---
                           event returned
[ -event 1 ] SCI_EVT clear
                           can prepare next QR_EC because SCI_EVT is set
---
[ -event 2 ] SCI_EVT clear
The --- marked period is racy, after sending the QR_EC command, firmware can behave faster than driver, and we still cannot prepare another QR_EC when this happens.
So I believe this is a Samsung EC firmware specific bug. And the better solution for Samsung is to implement timely event polling on top of the polling thread commit so that when this race happens the lost events can still be found back.

But the polling thread is a new feature, and we need to be more careful with new features, it shouldn't be merged as a regression fix. So I'll propose them to the upstream after proposing a regression fix and seeing more test reports here.

As a regression fix, I'll propose 2 simpler patches:
The first patch will revert the 2nd commit - 558e4736f2e1b0e6323adf7a5e4df77ed6cfc1a4, since it is not required by Acer as Acer platform can be fixed the 1st commit.
The second patch will add a Acer specific quirk flag to the 1st commit - 3afcf2ece453e1a8c2c6de19cdf06da3772a1b08.

Thanks

Revision history for this message
In , lv.zheng (lv.zheng-linux-kernel-bugs) wrote :

Hi, Ortwin

To be more careful, can you help to confirm:
1. the test result of vanilla linux-pm/linux-next + ec-flush{1,2,3,4,5,6}.patch. Without ec-flush7.patch, can this still work for Samsung?
2. the post-resuming test result of vanilla linux-pm/linux-next + ec-flush{1,2,3,4,5,6,7}.patch - originally Samsung laptops suffer from post-resuming event problems, and we need to confirm if this set can solve the old problem.

You can perform suspend/resume test in this way:
# echo mem > /sys/power/state
During this period, do something like opening/closing lid
Push the power button to resume

Thanks in advance

Revision history for this message
In , clancy.kieran+kernel (clancy.kieran+kernel-linux-kernel-bugs) wrote :

Thanks Lv, I will do some testing on my own laptop this coming weekend.

To others testing, the right way to ensure that the system is fixed or not with a given patch set is to:
Suspend and then while it is suspended, unplug and replug the AC adapter 8 times, leaving a couple of seconds between each plug/unplug.

If the bug is still present, the EC buffer will become full and cease to work properly upon resume (further AC plugs/unplugs will not be detected, amongst other problems).

Revision history for this message
In , odi (odi-linux-kernel-bugs) wrote :

(In reply to Lv Zheng from comment #211)
I tested both kernels and couldn't see any difference. The power plug detection always worked. But LID switch didn't.

However, due to another regression I get a dark screen after resume, so I shut down cleanly (hacking blindly) and rebooted into the same kernel and tested if the events work.

Revision history for this message
In , lv.zheng (lv.zheng-linux-kernel-bugs) wrote :

(In reply to Ortwin Glück from comment #213)
> (In reply to Lv Zheng from comment #211)
> I tested both kernels and couldn't see any difference. The power plug
> detection always worked. But LID switch didn't.

What do you mean by both?
This - vanilla linux-pm/linux-next + ec-flush{1,2,3,4,5,6}.patch - and which one?

Without timely polling, I think any kernel, even backing to where the Kieran's quirk is first introduced, will still have issues.
Because, Samsung EC will not flag SCI_EVT as long as there is already events queued up.

So IMO, during runtime, we can only allow up to 2 events queued up. While doing testing, frequent actions could cause many events to be queued up, and our EC driver will surely cannot prepare QR_EC from 3rd events because at that time, SCI_EVT surely will not be flagged:
[ +event 1 ] SCI_EVT set
[ +event 2 ]
[ +event 3 ]
                           1st QR_EC
                           can prepare 2nd QR_EC because SCI_EVT=1
                           event returned
[ -event 1 ] SCI_EVT clear
                           2nd QR_EC
                           event returned
                           cannot prepare 3rd QR_EC because SCI_EVT=0
[ -event 2 ] SCI_EVT clear
So IMO, only sending QR_EC timely without checking SCI_EVT or draining events until 0x0 returned can fix this.

Thus I guess the test result depends on how frequently you trigger EC events.

> However, due to another regression I get a dark screen after resume, so I
> shut down cleanly (hacking blindly) and rebooted into the same kernel and
> tested if the events work.

Do you mean currently we are not able to confirm if the patchset can solve old problems because a regression prevents the suspend/resume from working?
I also noticed using linux-3.18-rc1, I could not perform suspend/resume any more. All my local tests are done using patched linux-3.17-rc4.
Maybe other guys here have platforms that are not suffered from this regression.

Revision history for this message
In , lv.zheng (lv.zheng-linux-kernel-bugs) wrote :

Created attachment 155851
[PATCH 7/8] ACPI/EC: Add timely SCI_EVT polling support for Samsung models.

(In reply to Lv Zheng from comment #205)
> Created attachment 155431 [details]
> [PATCH] ACPI/EC: Add timely SCI_EVT polling support for Samsung models.
>
> First let's try to use timely polling instead of clear_on_resume for Samsung
> laptops.
> The other option is to let the polling thread to poll events until the
> firmware returning 0 (no event).
>
> I'll add test requirement after this post.

I can see an issue in this patch.
We need to add acpi_ec_started(ec) check for the timely polling support, or it may prevent system from suspending.

I've updated the patch and please use the new one next time.

Thanks
-Lv

Revision history for this message
In , odi (odi-linux-kernel-bugs) wrote :

(In reply to Lv Zheng from comment #214)
> What do you mean by both?
> This - vanilla linux-pm/linux-next + ec-flush{1,2,3,4,5,6}.patch - and which
> one?

The two that you asked me to test: linux-pm/linux-next + ec-flush{1,2,3,4,5,6}.patch and linux-pm/linux-next + ec-flush{1,2,3,4,5,6,7}.patch

> Do you mean currently we are not able to confirm if the patchset can solve
> old problems because a regression prevents the suspend/resume from working?

Yes, once the power plug detection started working again, I was unable to make it fail even if 3.17.y, using plug/unplug/lid switches during suspend. I think we still lack an actual test case...

Revision history for this message
In , lv.zheng (lv.zheng-linux-kernel-bugs) wrote :

Hi, Ortwin

I'm sorry I cannot understand your test results clearly.
I was confused by what you said on the bugzilla.

The results have conflicts here:
You said in comment #207, without doing suspend/resume:
linux-pm/linux-next +ec-flush{1,2,3,4,5,6,7}.patch
  - LID switch works
  - power plug works
But in comment 213, you said you cannot perform suspend/resume.
And in comment 216, you said this combination works differently:
linux-pm/linux-next +ec-flush{1,2,3,4,5,6,7}.patch
  - LID switch not detected
  - power plug sometimes stops working

Can I confirm this again with you:
1. What's the actual test result of linux-pm/linux-next +ec-flush{1,2,3,4,5,6,7}.patch before a suspend/resume test?
2. Using linux-pm/linux-next +ec-flush{1,2,3,4,5,6,7}.patch, you cannot perform a suspend/resume test, can you?
3. Using linux-pm/linux-next, can you perform a suspend/resume test?
4. Using linux-pm/linux-next+ec-flush{1,2,3,4,5,6}.patch+updated ec-flush7.patch, can you perform a suspend/resume test?
5. Then what's the actual test result of linux-pm/linux-next +ec-flush{1,2,3,4,5,6,7}.patch after a suspend/resume test?

I think for some distribution, each time you've built a new kernel, you may need to cleanup old binaries by deleting /lib/modules/<kernel version> and /boot/vmlinuz-<kernel version> before typing "make modules_install install".
According to my experience, if I don't delete the old binaries, I can always see confused result to do with "git bisect" using the same kernel.

Thanks and best regards

Revision history for this message
In , lv.zheng (lv.zheng-linux-kernel-bugs) wrote :

Created attachment 156591
[PATCH] ACPI/OSL: Add IRQ handler flushing support in the OSL.

This patchset was originally dependent on another "GPE dead lock" fix series.
After rebasing, the dead lock may happen during suspend.
And without the "GPE dead lock" fix series, we need this patch to be applied to prevent this dead lock as a work around.

Let me describe the patch applying order after posting.

Revision history for this message
In , lv.zheng (lv.zheng-linux-kernel-bugs) wrote :

Hi, Ortwin:

Sorry for the inconvenience.

The patchset was originally based on another GPE fix series.
The fix has been discussed for more than 3 months, and didn't get merged.
So my environment is a bit different from yours.

You may suffer from a dead lock during suspending without the fix.
In order to make this patchset working for "system suspend" without the fix, the gpe-handle21.patch workaround is required.

For this test:
4. Using linux-pm/linux-next+ec-flush{1,2,3,4,5,6}.patch+updated ec-flush7.patch, can you perform a suspend/resume test?
It should be updated as:
4. Using linux-pm/linux-next+gpe-handle21.patch+ec-flush{1,2,3,4,5,6}.patch+updated ec-flush7.patch, can you perform a suspend/resume test?

linux-pm/linux-next maintainer has merged the regression fixes, so if you have synchronized the branch, you need to:
1. git revert 79149001
2. git revert df9ff918
3. git am gpe-handle21.patch (attachment 156591)
4. git am ec-flush1.patch (attachment 155351)
5. git am ec-flush2.patch (attachment 155371)
6. git am ec-flush3.patch (attachment 155381)
7. git am ec-flush4.patch (attachment 155391)
8. git am ec-flush5.patch (attachment 155411)
9. git am ec-flush6.patch (attachment 155421)
10.git am ec-flush7.patch (use updated attachment 155851, do not use attachment 155431)
11. rebuild and boot the kernel.
12. try the suspend/resume test again.

For this test:
5. Then what's the actual test result of linux-pm/linux-next +ec-flush{1,2,3,4,5,6,7}.patch after a suspend/resume test?
It also should be updated as:
5. Then what's the actual test result of linux-pm/linux-next +gpe-handle21.patch+ec-flush{1,2,3,4,5,6}.patch+updated ec-flush7.patch after a suspend/resume test?

Sorry again.

Thanks and best regards

Revision history for this message
In , san (san-linux-kernel-bugs) wrote :

Hi!

I'm not sure what i have to do. Some time ago i made kernel and uploaded it onto my server.
I would like to do it again. So I'm now compiling this way:

git clone git://kernel.ubuntu.com/ubuntu/ubuntu-utopic.git
cd ubuntu-utopic/
patch -p1 < ../155351.patch
make mrproper
wget http://beta.paragrafówka.pl/eksport/config-3.16.3 && mv config-3.16.3 .config
fakeroot debian/rules clean && fakeroot debian/rules binary-headers binary-generic

I'm building as suggested in comment 206 (https://bugzilla.kernel.org/show_bug.cgi?id=44161#c206) for now and then I'll try whole set (as suggested in second point).

Is that right way?

Revision history for this message
In , odi (odi-linux-kernel-bugs) wrote :

Lv,

Sorry for the confusion. I am trying to clear the clouds:

1. Power plug detection is no longer a problem once it started working again. I am unable to make it fail on purpose, so I can't test if any kernel fixes that.

2. The lid switch is now non-functional with any kernel (state is always "open"):
- 3.17
- 3.18
- linux-pm /linux-next + gpe-handle21.patch + ec-flush{1,2,3,4,5,6,7}
- linux-pm /linux-next + gpe-handle21.patch + ec-flush{1,2,3,4,5,6} + ec-flush7-updated
However during ~40 lid open/close cycle with the latter kernel it worked 2 times (state = "closed").

3. suspend: doesn't work by closing the lid, works fine by suspending manually (KDE start menu)

4. resume: always works, but there is a backlight regression which has nothing to do with this issue. See https://bugs.freedesktop.org/show_bug.cgi?id=80773

Don't worry about my kernel building skills. I do that very very often and have established a workflow that always works. They are all straight from git, hand-configured per the specific hardware and use no modules. They are loaded by a hand-configured grub2 straight without initramfs.

Revision history for this message
In , lv.zheng (lv.zheng-linux-kernel-bugs) wrote :

Hi, San

Thanks for your help.

(In reply to San from comment #220)
> Hi!
>
> I'm not sure what i have to do. Some time ago i made kernel and uploaded it
> onto my server.
> I would like to do it again. So I'm now compiling this way:
>
> git clone git://kernel.ubuntu.com/ubuntu/ubuntu-utopic.git
> cd ubuntu-utopic/
> patch -p1 < ../155351.patch
> make mrproper
> wget http://beta.paragrafówka.pl/eksport/config-3.16.3 && mv config-3.16.3
> .config
> fakeroot debian/rules clean && fakeroot debian/rules binary-headers
> binary-generic
>
> I'm building as suggested in comment 206
> (https://bugzilla.kernel.org/show_bug.cgi?id=44161#c206) for now and then
> I'll try whole set (as suggested in second point).
>
> Is that right way?

Ortwin is providing me the testing using private email.
I'll post some of the results here after the progress are made for the investigation.
Maybe you can hold a while until the problem is correctly figured out.

The 2 commits are now reverted in linux-pm, it will appear in the Linux mainline in 1 month.

Best regards

Revision history for this message
In , lv.zheng (lv.zheng-linux-kernel-bugs) wrote :

Hi, Ortwin

(In reply to Ortwin Glück from comment #221)
> Lv,
>
> Sorry for the confusion. I am trying to clear the clouds:
>
> 1. Power plug detection is no longer a problem once it started working
> again. I am unable to make it fail on purpose, so I can't test if any kernel
> fixes that.

So the 2 commits didn't make a regression on the power plug?

> 2. The lid switch is now non-functional with any kernel (state is always
> "open"):
> - 3.17
> - 3.18
> - linux-pm /linux-next + gpe-handle21.patch + ec-flush{1,2,3,4,5,6,7}
> - linux-pm /linux-next + gpe-handle21.patch + ec-flush{1,2,3,4,5,6} +
> ec-flush7-updated
> However during ~40 lid open/close cycle with the latter kernel it worked 2
> times (state = "closed").
>
> 3. suspend: doesn't work by closing the lid, works fine by suspending
> manually (KDE start menu)
>
> 4. resume: always works, but there is a backlight regression which has
> nothing to do with this issue. See
> https://bugs.freedesktop.org/show_bug.cgi?id=80773

Can you perform a resume by opening the LID?
Also please upload the output of /proc/acpi/wakeup here.

> Don't worry about my kernel building skills. I do that very very often and
> have established a workflow that always works. They are all straight from
> git, hand-configured per the specific hardware and use no modules. They are
> loaded by a hand-configured grub2 straight without initramfs.

Good to know this!
That makes every testing possible. :-)

Thanks and best regards
-Lv

Revision history for this message
In , andfagiani (andfagiani-linux-kernel-bugs) wrote :

Created attachment 157011
proc/acpi/wakeup

I was finally able to do some tests, here are my results.
- Samsung NP900X3C
- using linux-pm/linux-next with patches from Comment 219.

Upon boot, I get the following behavior:
- Power plug detection appears to be working no matter what.
- Lid switch is recognized (open/close states are reported correctly using acpi_listen), but suspend does NOT trigger.

However, after a while, (maybe after a few ACPI events?), the LID switch actually triggers suspend correctly; I'm not sure the issue is related.
It does not seem to revert to the broken behavior even after some time. It works both on battery and AC power.

Opening the LID does not trigger a resume, but tbh that was never the case on this machine.

Lv, if you have more tests in mind / need any logs just let me know.

Attaching my /proc/acpi/wakeup.

Thanks,
Andrea

Revision history for this message
In , san (san-linux-kernel-bugs) wrote :

I'm unable to compile kernel. It patches ok, but i get:

== code ==
  Using /home/publiczny2/kernel/ubuntu-utopic as source for kernel
  /home/publiczny2/kernel/ubuntu-utopic is not clean, please run 'make mrproper'
  in the '/home/publiczny2/kernel/ubuntu-utopic' directory.
==/code ==

Revision history for this message
In , lv.zheng (lv.zheng-linux-kernel-bugs) wrote :

Hi,

(In reply to Andrea Fagiani from comment #224)
> Created attachment 157011 [details]
> proc/acpi/wakeup
>
> I was finally able to do some tests, here are my results.
> - Samsung NP900X3C
> - using linux-pm/linux-next with patches from Comment 219.
>
> Upon boot, I get the following behavior:
> - Power plug detection appears to be working no matter what.
> - Lid switch is recognized (open/close states are reported correctly using
> acpi_listen), but suspend does NOT trigger.

Can you confirm if they still work after suspend/resume?
See test requirement A below.

> However, after a while, (maybe after a few ACPI events?), the LID switch
> actually triggers suspend correctly; I'm not sure the issue is related.
> It does not seem to revert to the broken behavior even after some time. It
> works both on battery and AC power.
>
> Opening the LID does not trigger a resume, but tbh that was never the case
> on this machine.

According to the /proc/acpi/wakeup - yes, only power button can wakeup the system.

> Lv, if you have more tests in mind / need any logs just let me know.

A. We want to confirm that this set of patches can fix old issues. We need to be careful before removing old quirks.
1. using linux-pm/linux-next with patches from Comment 219.
2. build/boot the kernel
3. do a suspend/resume
Can power plug and LID switch still work after resuming?

B. We want to see some proofs that Samsung EC firmware acts exactly as what we've inferred from the reversion requirements. So could you:
1. using linux-pm/linux-next with patches from Comment 219.
2. Uncomment the following line in the drivers/acpi/ec.c:
     /* #define DEBUG */
3. build/boot the kernel
4. do some LID state changes and make sure they are captured by acpi_listen.
5. upload the dmesg output here.

C. We want to confirm if the timely polling support is really required by Samsung EC firmware:
1. using linux-pm/linux-next
3. git am gpe-handle21.patch (attachment 156591 [details])
4. git am ec-flush1.patch (attachment 155351 [details])
5. git am ec-flush2.patch (attachment 155371 [details])
6. git am ec-flush3.patch (attachment 155381 [details])
7. git am ec-flush4.patch (attachment 155391 [details])
8. git am ec-flush5.patch (attachment 155411 [details])
9. git am ec-flush6.patch (attachment 155421 [details])
Do not apply the ec-flush7.patch, can power plug and LID switch still work, and can still work even after resuming?

D. We want to confirm if this is really a regression and current fix can:
1. Using linux-pm/linux-next
2. git revert 79149001
3. git revert df9ff918
Do not apply any of the posted patches, can power plug and LID switch still work? And
1. Using linux-pm/linux-next
Do not revert the "regression fix" commits and do not apply any of the posted patches, can power plug and LID switch still work?

Thanks in advance.

Revision history for this message
In , andfagiani (andfagiani-linux-kernel-bugs) wrote :

> A. We want to confirm that this set of patches can fix old issues. We need
> to be careful before removing old quirks.
> 1. using linux-pm/linux-next with patches from Comment 219.
> 2. build/boot the kernel
> 3. do a suspend/resume
> Can power plug and LID switch still work after resuming?

Yes, both power plug and LID switch work and are correctly captured by acpi_listen.

> B. We want to see some proofs that Samsung EC firmware acts exactly as what
> we've inferred from the reversion requirements. So could you:
> 1. using linux-pm/linux-next with patches from Comment 219.
> 2. Uncomment the following line in the drivers/acpi/ec.c:
> /* #define DEBUG */
> 3. build/boot the kernel
> 4. do some LID state changes and make sure they are captured by acpi_listen.
> 5. upload the dmesg output here.

Opened/closed 5 times, all of them showed up in acpi_listen.
Attaching dmesg.

> C. We want to confirm if the timely polling support is really required by
> Samsung EC firmware:
> 1. using linux-pm/linux-next
> 3. git am gpe-handle21.patch (attachment 156591 [details])
> 4. git am ec-flush1.patch (attachment 155351 [details])
> 5. git am ec-flush2.patch (attachment 155371 [details])
> 6. git am ec-flush3.patch (attachment 155381 [details])
> 7. git am ec-flush4.patch (attachment 155391 [details])
> 8. git am ec-flush5.patch (attachment 155411 [details])
> 9. git am ec-flush6.patch (attachment 155421 [details])
> Do not apply the ec-flush7.patch, can power plug and LID switch still work,
> and can still work even after resuming?

Both work fine, even after a suspend/resume.

Note that the ec-flush patches depend on the reverts, so I applied them even if they weren't explicitly listed. I can test without them later today if required.

> D. We want to confirm if this is really a regression and current fix can:
> 1. Using linux-pm/linux-next
> 2. git revert 79149001
> 3. git revert df9ff918
> Do not apply any of the posted patches, can power plug and LID switch still
> work? And

Upon boot, it works correctly. After a suspend/resume, power plug detection does not work anymore (it is greatly delayed) and neither does the LID switch (doesn't trigger at all).

> 1. Using linux-pm/linux-next
> Do not revert the "regression fix" commits and do not apply any of the
> posted patches, can power plug and LID switch still work?

I wasn't able to do this last test due to time constraints, I'll try and do that later today.

Revision history for this message
In , andfagiani (andfagiani-linux-kernel-bugs) wrote :

Created attachment 157111
dmesg - #DEBUG enabled

Revision history for this message
In , lv.zheng (lv.zheng-linux-kernel-bugs) wrote :
Download full text (3.6 KiB)

Hi,

The test results are very useful for us to understand this issue. :-)

(In reply to Andrea Fagiani from comment #227)
> > B. We want to see some proofs that Samsung EC firmware acts exactly as what
> > we've inferred from the reversion requirements. So could you:
> > 1. using linux-pm/linux-next with patches from Comment 219.
> > 2. Uncomment the following line in the drivers/acpi/ec.c:
> > /* #define DEBUG */
> > 3. build/boot the kernel
> > 4. do some LID state changes and make sure they are captured by
> acpi_listen.
> > 5. upload the dmesg output here.
>
> Opened/closed 5 times, all of them showed up in acpi_listen.
> Attaching dmesg.

Yes. I also found the 0x5E (5 times) and 0x5F (4 times) in the dmesg.
TBH, I didn't find proofs for comment 210 in it.
So the issue isn't as what I imagined before.

> > C. We want to confirm if the timely polling support is really required by
> > Samsung EC firmware:
> > 1. using linux-pm/linux-next
> > 3. git am gpe-handle21.patch (attachment 156591 [details])
> > 4. git am ec-flush1.patch (attachment 155351 [details])
> > 5. git am ec-flush2.patch (attachment 155371 [details])
> > 6. git am ec-flush3.patch (attachment 155381 [details])
> > 7. git am ec-flush4.patch (attachment 155391 [details])
> > 8. git am ec-flush5.patch (attachment 155411 [details])
> > 9. git am ec-flush6.patch (attachment 155421 [details])
> > Do not apply the ec-flush7.patch, can power plug and LID switch still work,
> > and can still work even after resuming?
>
> Both work fine, even after a suspend/resume.

Thus, in order to fix Samsung issue.
Either we need timely polling quirk (since test A works)
Or we need old CLEAR_ON_RESUME quirk (since test C works).

> Note that the ec-flush patches depend on the reverts, so I applied them even
> if they weren't explicitly listed. I can test without them later today if
> required.

Not necessary :-). I only want the kernel test results __wih__ the reverts (as they conflict with the patches posted here) so I don't need to post another set of rebased patches for current linux-pm/linux-next.

> > D. We want to confirm if this is really a regression and current fix can:
> > 1. Using linux-pm/linux-next
> > 2. git revert 79149001
> > 3. git revert df9ff918
> > Do not apply any of the posted patches, can power plug and LID switch still
> > work? And
>
> Upon boot, it works correctly. After a suspend/resume, power plug detection
> does not work anymore (it is greatly delayed) and neither does the LID
> switch (doesn't trigger at all).

Thus the regression seems to be: "the 2 commits have broken the CLEAR_ON_RESUME quirk".

I guess SCI_EVT is set after resuming but GPE doesn't fire.
If this is the case, this isn't a silicon/firmware bug but a driver bug.
As EC GPE is edge triggered and GPE might have already been fired before suspending.
If so, we need to improve the driver and shouldn't fix the driver bug using any kind of quirk.

So I want to check if the SCI_EVT is set after resuming.
Let me think about a way to confirm this.
I'll add another test later.

> > 1. Using linux-pm/linux-next
> > Do not revert the "regression fix" commits and do not apply any of the
> > posted patches...

Read more...

Revision history for this message
In , lv.zheng (lv.zheng-linux-kernel-bugs) wrote :

Hi, Andrea

Please use either of the following working combinations:
1. test A
2. test C
3. test D2 (which you haven't done yet)

If you'll do test D2 for us, you can just do it and capture the confirmation information at same time.

Please do:
1. suspend
2. do some LID opening/closing to trigger old issue
3. resume
4. dmesg > dmesg-right-after-resume.txt

And post the dmesg-right-after-resume.txt here.
If the kernel log buffer size is not big enough to contain all messages right after resuming, please increase it using the kernel boot parameter - log_buf_len=1M.

Thanks and best regards

Revision history for this message
In , lv.zheng (lv.zheng-linux-kernel-bugs) wrote :

Hi, Andrea

Sorry that I forgot to ask you to enable the EC log to capture the debug information.

2. Uncomment the following line in the drivers/acpi/ec.c:
     /* #define DEBUG */

(In reply to Lv Zheng from comment #230)
> Hi, Andrea
>
> Please use either of the following working combinations:
> 1. test A
> 2. test C
> 3. test D2 (which you haven't done yet)
>
> If you'll do test D2 for us, you can just do it and capture the confirmation
> information at same time.
>
> Please do:
> 1. suspend
> 2. do some LID opening/closing to trigger old issue
> 3. resume
> 4. dmesg > dmesg-right-after-resume.txt
>
> And post the dmesg-right-after-resume.txt here.
> If the kernel log buffer size is not big enough to contain all messages
> right after resuming, please increase it using the kernel boot parameter -
> log_buf_len=1M.

Thanks
-Lv

Revision history for this message
In , andfagiani (andfagiani-linux-kernel-bugs) wrote :

Created attachment 157351
dmesg-right-after-resume_A

Revision history for this message
In , andfagiani (andfagiani-linux-kernel-bugs) wrote :

Created attachment 157361
dmesg-right-after-resume_C

Revision history for this message
In , andfagiani (andfagiani-linux-kernel-bugs) wrote :

Created attachment 157371
dmesg-right-after-resume_D2

Revision history for this message
In , andfagiani (andfagiani-linux-kernel-bugs) wrote :

Hi Lv,
I captured and uploaded the dmesg as requested.

Concerning test D2, both power plug and LID switch seem to work fine even after a suspend/resume. You'll notice I've played with it a little bit before performing the suspend test.

Hope it helps,
Andrea

Revision history for this message
In , lv.zheng (lv.zheng-linux-kernel-bugs) wrote :
Download full text (4.0 KiB)

Hi, Andrea

The results make something clear to me.
Thanks for the great job.

For the 3 test results, SCI_EVT is always flagged after resuming:
Test A (regression fixes + cleanup + timely polling quirk):
[ 42.608720] PM: Restoring platform NVS memory
...
[ 42.649355] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
Test C (regression fixes + cleanup + CLEAR_ON_RESUME quirk):
[ 21.764941] PM: Restoring platform NVS memory
...
[ 21.811916] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
Test D2 (regression fixes + CLEAR_ON_RESUME quirk):
[ 172.102165] PM: Restoring platform NVS memory
...
[ 172.149860] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
This can explain the cause of the original bug identified in comment 129.
The EC behavior is not broekn here. We shouldn't implement a quirk for it since this is a driver bug - if the EC GPE is not fired, we don't have means to poll the events.

For the regression reported in comment 184, I only find the proof in test A that we need both the 2 commits to be reverted:
Test A (regression fixes + cleanup + timely polling quirk):
[ 42.662695] ACPI : EC: ***** Command(QR_EC) started *****
[ 42.662697] ACPI : EC: ===== TASK (2) =====
[ 42.662702] ACPI : EC: EC_SC(R) = 0x00 SCI_EVT=0 BURST=0 CMD=0 IBF=0 OBF=0
[ 42.662703] ACPI : EC: EC_SC(W) = 0x84
[ 42.662707] ACPI : EC: ***** Event running *****
[ 42.662827] ACPI : EC: ##### Query(0x5e) stopped #####
[ 42.666037] ACPI : EC: ===== TASK (2) =====
[ 42.666042] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1
[ 42.666045] ACPI : EC: EC_DATA(R) = 0x5f
[ 42.666046] ACPI : EC: ***** Command(QR_EC) hardware completion *****
[ 42.666047] ACPI : EC: ***** Command(QR_EC) stopped *****
The event of 0x5F is returned when SCI_EVT=0.
So we need to revert the 2 commits to allow QR_EC to be sent when SCI_EVT=0 and there is really non 0x00 events need to be drained when SCI_EVT=0.
TBH, the EC behavior is broekn here, thus IMO this is a different issue than the one identified in comment 129.
It seems both the quirks (drain way of CLEAR_ON_RESUME, timely polling way) can work it around. If this issue can be reproduced during runtime, then the CLEAR_ON_RESUME can not work it around at that time. According to your tests, the problem cannot be easily reproduced during runtime. So we can stay unchanged until someone else blaims.

If we have to assume this is only a problem happened during resuming and if we want to root cause why the SCI_EVT is suddently cleared after resuming, we will have to assume that something during the resuming steps has caused this issue because:
1. In test C, 2 stale events are returned with SCI_EVT=1:
[ 21.814019] ACPI : EC: EC_SC(R) = 0x29 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=1
[ 21.814022] ACPI : EC: EC_DATA(R) = 0x5e
[ 21.817333] ACPI : EC: EC_SC(R) = 0x29 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=1
[ 21.817351] ACPI : EC: EC_DATA(R) = 0x5f
So SCI_EVT will not be spontaneously cleared during acpi_ec_unblock_transactions().
2. In test A, 2 stale events are returned with different SCI_EVT value:
[ 42.656421] ACPI : EC: EC_SC(R) = 0x29 SCI_EVT=1 BURST=0 CMD=1 IBF...

Read more...

Revision history for this message
In , lv.zheng (lv.zheng-linux-kernel-bugs) wrote :

Hi, Andrea

I checked the log again. And the following seems to be wrong.

> It seems both the quirks (drain way of CLEAR_ON_RESUME, timely polling way)
> can work it around. If this issue can be reproduced during runtime, then the
> CLEAR_ON_RESUME can not work it around at that time. According to your
> tests, the problem cannot be easily reproduced during runtime. So we can
> stay unchanged until someone else blaims.

Though you didn't face real functional issues, there are following entries in the test D2 log:
[ 31.072607] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1
[ 31.072613] ACPI : EC: EC_DATA(R) = 0x66
[ 77.835639] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1
[ 77.835647] ACPI : EC: EC_DATA(R) = 0x66
[ 79.939044] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1
[ 79.939053] ACPI : EC: EC_DATA(R) = 0x66
[ 82.006084] ACPI : EC: EC_SC(R) = 0x29 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=1
[ 82.006087] ACPI : EC: EC_DATA(R) = 0x66
Thanks for you long time test or we won't find them out.
The _Q66 is as follows:
                    Method (_Q66, 0, NotSerialized)
                    {
                        P8XH (Zero, 0x66)
                        If (LEqual (B1EX, One))
                        {
                            Notify (BAT1, 0x80)
                        }
                    }
It updates battery status.
Then it means the assumption in comment 210 might be real and we need the "timely polling quirk" for samsung to have everything wroked around...
So I'll prepare the whole patchset posted here for upstream according to this result.

In test A, I can even see such log entries:
[ 1.788647] ACPI : EC: ***** Command(QR_EC) started *****
[ 1.788651] ACPI : EC: ===== TASK (1) =====
[ 1.788656] ACPI : EC: EC_SC(R) = 0x08 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=0
[ 1.788659] ACPI : EC: EC_SC(W) = 0x84
...
[ 1.800289] ACPI : EC: ===== IRQ (0) =====
[ 1.800295] ACPI : EC: EC_SC(R) = 0x29 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=1
[ 1.800300] ACPI : EC: EC_DATA(R) = 0x70
...
[ 1.800385] ACPI : EC: ***** Command(QR_EC) stopped *****
It seems for Samsung EC, the firmware can even return event that happens after we send the query command. There can really be a big time slot between driver pushes the command byte and firmware fetches it.

Thanks and best regards

Revision history for this message
In , clancy.kieran+kernel (clancy.kieran+kernel-linux-kernel-bugs) wrote :

Hi Lv,

Thanks for all your work on this issue. It's certainly not easy, especially if you don't have the hardware available to test on yourself.

You are probably right about the racy behaviour. One important point though, is that you would need to miss either 8 (or was it 16?) events in a row during run time for the buffer to fill and for SCI_EVT to stop being given entirely.

Under normal usage, I think there's almost always going to be an event regularly enough that this "full" condition never occurs. It was only found to be a problem during suspend, where the EC controller seemed to continue logging, and after enough battery level or AC adapter changes it would stop triggering events even after resume.

My HDD was too full recently to build kernels, but I have just freed some space (a bit late, sorry) so if you want me to test anything please let me know.

Revision history for this message
In , lv.zheng (lv.zheng-linux-kernel-bugs) wrote :
Download full text (3.4 KiB)

(In reply to Kieran Clancy from comment #238)
> Hi Lv,
>
> Thanks for all your work on this issue. It's certainly not easy, especially
> if you don't have the hardware available to test on yourself.
>
> You are probably right about the racy behaviour. One important point though,
> is that you would need to miss either 8 (or was it 16?) events in a row
> during run time for the buffer to fill and for SCI_EVT to stop being given
> entirely.
>
> Under normal usage, I think there's almost always going to be an event
> regularly enough that this "full" condition never occurs. It was only found
> to be a problem during suspend, where the EC controller seemed to continue
> logging, and after enough battery level or AC adapter changes it would stop
> triggering events even after resume.
>
> My HDD was too full recently to build kernels, but I have just freed some
> space (a bit late, sorry) so if you want me to test anything please let me
> know.

Hi,

Thanks a lot!
We failed to find this platform selling on China market.

I was thinking this issue doesn't exist during runtime. That's why I said in comment 236 that it might be a BIOS _WAK issue or something in acpi_hw_legacy_wake() has cleared SCI_EVT and we may need further investigation. But finally in comment 237, I found it occurred not only after resuming, but also runtime, so further investigation doesn't matter any more because this is definitely an EC firmware behavior according to the facts.

In our driver, we will queue 2nd QR_EC after sending the 1st QR_EC and before fetching the returned event. At that time, SCI_EVT should still be 1, so we can always send 2 QR_EC for 1 SCI_EVT=1 indication. This somewhat can reduce the reproduction ratio of the Samsung EC issue and that's why during runtime it can hardly be reproduced as we can drain events faster than they are accumulated in the firmware.

But the above code can only make a happen to work environment. If the host acts a bit slower than the target, this issue might still occurr during runtime when there are more than 3 events queued up and the driver will be unable to queue 3rd QR_EC if SCI_EVT is cleared after fetching the 1st event.

It seems the best way for Samsung is always sending QR_EC whatever SCI_EVT is until 0x00 returned. This driver behavior is reasonable from ACPI spec's point of view. The CLEAR_ON_RESUME quirk provided by you did this after resuming, we may need to extend this behavior to runtime.

But we do have many platforms act differently:
1. Some platforms never return 0x00 even when SCI_EVT=0, they return certain event value. Some of them even return the event value for which there is no _Qxx prepared in the namespace. So if we continously send QR_EC until 0x00 is returned, the process will never end on such platforms. The users of such platforms will sure blame us.
2. Some platforms (Acer) do not return anything if SCI_EVT=0, the EC query transaction will be blocked, and our driver cannot issue further EC transaction unless previous one is completed. So if we allow QR_EC to be sent without checking SCI_EVT, users of such platforms will complain.

What I'm going to do is:
1. extending the draining behavior - poll...

Read more...

Revision history for this message
In , lv.zheng (lv.zheng-linux-kernel-bugs) wrote :

Hi, Kieran

Can you:
1. use current linux-pm/linux-next branch
2. enable EC debugging
3. disable quirk
4. trigger the bug
5
. capture the dmesg right after resume
6. upload the dmesg here
for investigation?

Thanks in advance

Revision history for this message
In , clancy.kieran+kernel (clancy.kieran+kernel-linux-kernel-bugs) wrote :

Created attachment 157601
dmesg on linux-next without quirk showing EC working before but not after triggering bug

(In reply to Lv Zheng from comment #240)
> Hi, Kieran
>
> Can you:
> 1. use current linux-pm/linux-next branch
> 2. enable EC debugging

Is it enough to just uncomment the #define DEBUG?

> 3. disable quirk

Is it enough to prevent EC_FLAGS_CLEAR_ON_RESUME from being set? Is this what you meant?

> 4. trigger the bug
> 5. capture the dmesg right after resume
> 6. upload the dmesg here

See attached. I annotated the dmesg with a few extra > /dev/kmsg lines starting with 'KC'. They are:

[ 107.787915] KC just booted and logged in
[ 137.240937] KC screen close/open works as intended
[ 203.943414] KC about to suspend, but not trigger bug this time
[ 226.584995] KC back from suspend
[ 265.000194] KC screen close/open works as intended
[ 298.608956] KC AC change detected as intended
[ 318.874279] KC about to suspend and trigger bug
[ 348.834217] KC back from suspend
[ 374.777099] KC screen open/close not detected
[ 398.000040] KC AC change not detected
[ 425.733615] KC about to manually clear EC events with script
[ 442.663539] KC EC events cleared
[ 469.766095] KC AC change detected again
[ 496.219180] KC screen open/close detected again

So between 0 and 318 you can see the EC operating as intended including one suspend cycle.

After 318 I trigger the bug (by unplugging AC a bunch of times), and after that you can see that there is nothing at all logged when I either closed the screen or unplugged AC.

At 425 I manually simulated the quirk by sending 0x84 EC command queries and reading the data until the data is 0x00 using Juan Manuel's userspace program.

After that, EC events are detected properly again.

----

My analysis is below:

After the bug is triggered, SCI_EVT=1 is set just ONE time, immediately after resume:

[ 337.319012] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0

It does not seem as though we ever handle this event properly though. Namely, there seems to be no corresponding "EC_SC(W) = 0x84". There is a couple of "EC_DATA(W) = 0x84" but I'm pretty sure these are totally different?

----

Further testing shows that this SCI_EVT=1 happens for the first resume after the bug is triggered, but not for the second or subsequent resumes.

That means that we are still going to need a wakeup quirk because if for some reason we fail to clear the EC state before the next suspend, we will never get another SCI_EVT=1 (even after a power cycle, I believe).

Revision history for this message
In , clancy.kieran+kernel (clancy.kieran+kernel-linux-kernel-bugs) wrote :

Created attachment 157611
dmesg on linux-next without quirk showing bug persisting over several suspend cycles

Of the three suspend/resume cycles shown in this dmesg, I trigger the EC bug during the first suspend time.

It comes back with SCI_EVT=1 set the first time, but the 2nd and 3rd resumes do not have this. In a moment I will confirm if this persists across power cycles.

It seems the right behaviour for affected Samsung machines is to send QR_EC until data is 0x00 not just if we get SCI_EVT=1, but additionally on boot or resume.

Revision history for this message
In , clancy.kieran+kernel (clancy.kieran+kernel-linux-kernel-bugs) wrote :

Created attachment 157631
dmesg on linux-next without quirk showing bug persisting over power cycle

Confirming that once the bug is initially triggered, we don't get SCI_EVT=1 even on a power cycle, so we still need the boot/resume quirk. At least, that's my interpretation.

I hope the dmesgs were helpful. Unfortunately I'm not going to have internet access for the next week (going to the outback where there is not even any mobile reception), but I'm happy to test more things when I return.

Revision history for this message
In , lv.zheng (lv.zheng-linux-kernel-bugs) wrote :
Download full text (3.5 KiB)

(In reply to Kieran Clancy from comment #241)
> Created attachment 157601 [details]
> dmesg on linux-next without quirk showing EC working before but not after
> triggering bug
>
> (In reply to Lv Zheng from comment #240)
> > Hi, Kieran
> >
> > Can you:
> > 1. use current linux-pm/linux-next branch
> > 2. enable EC debugging
>
> Is it enough to just uncomment the #define DEBUG?

Yes.

> > 3. disable quirk
>
> Is it enough to prevent EC_FLAGS_CLEAR_ON_RESUME from being set? Is this
> what you meant?

Yes.

> > 4. trigger the bug
> > 5. capture the dmesg right after resume
> > 6. upload the dmesg here
>
> See attached. I annotated the dmesg with a few extra > /dev/kmsg lines
> starting with 'KC'. They are:
>
> [ 107.787915] KC just booted and logged in
> [ 137.240937] KC screen close/open works as intended
> [ 203.943414] KC about to suspend, but not trigger bug this time
> [ 226.584995] KC back from suspend
> [ 265.000194] KC screen close/open works as intended
> [ 298.608956] KC AC change detected as intended
> [ 318.874279] KC about to suspend and trigger bug
> [ 348.834217] KC back from suspend
> [ 374.777099] KC screen open/close not detected
> [ 398.000040] KC AC change not detected
> [ 425.733615] KC about to manually clear EC events with script
> [ 442.663539] KC EC events cleared
> [ 469.766095] KC AC change detected again
> [ 496.219180] KC screen open/close detected again

Great information!
Thanks.

> So between 0 and 318 you can see the EC operating as intended including one
> suspend cycle.
>
> After 318 I trigger the bug (by unplugging AC a bunch of times), and after
> that you can see that there is nothing at all logged when I either closed
> the screen or unplugged AC.
>
> At 425 I manually simulated the quirk by sending 0x84 EC command queries and
> reading the data until the data is 0x00 using Juan Manuel's userspace
> program.
>
> After that, EC events are detected properly again.

Great test cases!
Thanks.

> My analysis is below:
>
> After the bug is triggered, SCI_EVT=1 is set just ONE time, immediately
> after resume:
>
> [ 337.319012] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
>
> It does not seem as though we ever handle this event properly though.
> Namely, there seems to be no corresponding "EC_SC(W) = 0x84". There is a
> couple of "EC_DATA(W) = 0x84" but I'm pretty sure these are totally
> different?

This is a bug, in ec_poll(), there is no code to check SCI_EVT.
And event will thus be lost.

I think this has been fixed in the ec-flush6.patch.
+out:
+ if (status & ACPI_EC_FLAG_SCI &&
+ (!t || t->flags & ACPI_EC_COMMAND_COMPLETE))
+ __acpi_ec_set_event(ec);

We will have this flag checked in the advance_transaction.
So you won't see this with ec-flush[1-6].patch applied.

> Further testing shows that this SCI_EVT=1 happens for the first resume after
> the bug is triggered, but not for the second or subsequent resumes.
>
> That means that we are still going to need a wakeup quirk because if for
> some reason we fail to clear the EC state before the next suspend, we will
> never get another SCI_EVT=1 (even after a power cycle, I believe).

Yes, I think this is requi...

Read more...

Revision history for this message
In , lv.zheng (lv.zheng-linux-kernel-bugs) wrote :

(In reply to Kieran Clancy from comment #242)
> Created attachment 157611 [details]
> dmesg on linux-next without quirk showing bug persisting over several
> suspend cycles
>
> Of the three suspend/resume cycles shown in this dmesg, I trigger the EC bug
> during the first suspend time.
>
> It comes back with SCI_EVT=1 set the first time, but the 2nd and 3rd resumes
> do not have this. In a moment I will confirm if this persists across power
> cycles.
>
> It seems the right behaviour for affected Samsung machines is to send QR_EC
> until data is 0x00 not just if we get SCI_EVT=1, but additionally on boot or
> resume.

Yes.
I was planning to support this in this way.

Thanks

Revision history for this message
In , lv.zheng (lv.zheng-linux-kernel-bugs) wrote :

(In reply to Kieran Clancy from comment #243)
> Created attachment 157631 [details]
> dmesg on linux-next without quirk showing bug persisting over power cycle
>
> Confirming that once the bug is initially triggered, we don't get SCI_EVT=1
> even on a power cycle, so we still need the boot/resume quirk. At least,
> that's my interpretation.
>
> I hope the dmesgs were helpful. Unfortunately I'm not going to have internet
> access for the next week (going to the outback where there is not even any
> mobile reception), but I'm happy to test more things when I return.

Yes, it's very helpful.
Thanks for the help!

Best regards
-Lv

Revision history for this message
In , lv.zheng (lv.zheng-linux-kernel-bugs) wrote :

(In reply to Lv Zheng from comment #244)
> (In reply to Kieran Clancy from comment #241)
> > My analysis is below:
> >
> > After the bug is triggered, SCI_EVT=1 is set just ONE time, immediately
> > after resume:
> >
> > [ 337.319012] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0
> OBF=0
> >
> > It does not seem as though we ever handle this event properly though.
> > Namely, there seems to be no corresponding "EC_SC(W) = 0x84". There is a
> > couple of "EC_DATA(W) = 0x84" but I'm pretty sure these are totally
> > different?
>
> This is a bug, in ec_poll(), there is no code to check SCI_EVT.
> And event will thus be lost.
>
> I think this has been fixed in the ec-flush6.patch.
> +out:
> + if (status & ACPI_EC_FLAG_SCI &&
> + (!t || t->flags & ACPI_EC_COMMAND_COMPLETE))
> + __acpi_ec_set_event(ec);
>
> We will have this flag checked in the advance_transaction.
> So you won't see this with ec-flush[1-6].patch applied.

I should take this back. :-)
This still need to be improved to indicate SCI_EVT even when there is a transaction running.

Thanks
-Lv

Revision history for this message
In , lenb (lenb-linux-kernel-bugs) wrote :

commit 74443bbed72ab22ee005ecb6ecdc657a8018e1db
Author: Lv Zheng <email address hidden>
Date: Wed Jan 14 19:28:47 2015 +0800

    ACPI / EC: Fix issues related to the SCI_EVT handling

shipped in Linux-4.0-rc1
closed.

Revision history for this message
In , el (el-linux-kernel-bugs) wrote :

commit 4c237371f290d1ed3b2071dd43554362137b1cce
Author: Lv Zheng <email address hidden>
Date: Wed Jan 4 11:17:17 2017 +0800

    ACPI / EC: Remove old CLEAR_ON_RESUME quirk

The above reintroduced this bug on my NP900x4c. Reverting the commit makes things work again.

Revision history for this message
In , balazs4web (balazs4web-linux-kernel-bugs) wrote :

(In reply to Elvis Pranskevichus from comment #249)
> commit 4c237371f290d1ed3b2071dd43554362137b1cce
> Author: Lv Zheng <email address hidden>
> Date: Wed Jan 4 11:17:17 2017 +0800
>
> ACPI / EC: Remove old CLEAR_ON_RESUME quirk
>
> The above reintroduced this bug on my NP900x4c. Reverting the commit makes
> things work again.

Confirmed, the bug is back again on samsung latops :-(

https://bbs.archlinux.org/viewtopic.php?pid=1767061#p1767061

Revision history for this message
In , mark (mark-linux-kernel-bugs) wrote :

Not exactly surprising when the change that fixed then gets reverted out of the kernel. Seems like Intel trying to force obsolescence of these systems.

Revision history for this message
In , balazs4web (balazs4web-linux-kernel-bugs) wrote :

(In reply to Mark Syms from comment #251)
> Not exactly surprising when the change that fixed then gets reverted out of
> the kernel. Seems like Intel trying to force obsolescence of these systems.

what has to be done to bring the fix back into the kernel code?
I mean at the first glance it was only for `Samsung hardware`, so I guess it does not affect any other vendors/hardwares.
The mentioned commit looks like just a `code cleanup` changes. but I'm not familiar with the kernel code.

Shall I file a new issue or could this be opened again?

Revision history for this message
In , rui.zhang (rui.zhang-linux-kernel-bugs) wrote :

Lv is not working on this now, and I'm new to the EC code, first let's confirm a quick revert patch.

Revision history for this message
In , rui.zhang (rui.zhang-linux-kernel-bugs) wrote :

Created attachment 274121
revert patch

please confirm
1. the problem exists with the latest upstream kernel, say, 4.16-rc1
2. the problem is gone after applying the patch attached

Revision history for this message
In , odi (odi-linux-kernel-bugs) wrote :

I can confirm that the regression exists with 4.15.3 and that your patch fixes it. 4.16 not tested.

Revision history for this message
In , cribari (cribari-linux-kernel-bugs) wrote :

I am experiencing the same problem (Arch Linux, kernel 4.15.3, KDE). Hardware: Samsung laptop model 900X3L. Will the patch be included in the Linux kernel?

Revision history for this message
In , balazs4web (balazs4web-linux-kernel-bugs) wrote :

Created attachment 274447
revert-patch works

confirmed, pacth is working on top of `4.15`

Revision history for this message
In , cribari (cribari-linux-kernel-bugs) wrote :

I am still facing this problem with Arch Linux + kernel 4.16.0-2. Hardware is a Samsung laptop (model: NP900X3L-KW1BR). Suggestions are welcome.

Revision history for this message
In , cribari (cribari-linux-kernel-bugs) wrote :

I compiled kernels 4.16.0 and 4.16.2 (in Arch Linux) using the patch in Comment #254 and I confirm that the patch fixes the problem. (Hardware: Samsung model NP900X3L-KW1BR.)

Revision history for this message
In , yu.c.chen (yu.c.chen-linux-kernel-bugs) wrote :

Hi @Rui,
may I know if the patch #Comment 254 will be pushed upstream ?

Changed in linux:
importance: Unknown → High
status: Unknown → Confirmed
Revision history for this message
In , jonas (jonas-linux-kernel-bugs) wrote :

Hi

Poll on status for this bug. Does anyone know if/when it will be fixed upstream?

Revision history for this message
In , cribari (cribari-linux-kernel-bugs) wrote :

Has this been fixed upstream? Despite what we read at

https://www.systutorials.com/linux-kernels/59801/acpi-ec-fix-regression-related-to-triggering-source-of-ec-event-handling-linux-4-14-3/

some people still face the problem. Thank you.

Revision history for this message
In , rui.zhang (rui.zhang-linux-kernel-bugs) wrote :

sorry that I thought I have already sent this patch upstream, but apparently I didn't.

(In reply to Francisco Cribari from comment #263)
> Has this been fixed upstream? Despite what we read at
>
> https://www.systutorials.com/linux-kernels/59801/acpi-ec-fix-regression-
> related-to-triggering-source-of-ec-event-handling-linux-4-14-3/
>
I can not access this page.
But if you can confirm the problem still exists in 4.19-rc1, and the revert indeed fixes it, I will send the revert patch out.

Changed in linux:
status: Confirmed → Incomplete
Revision history for this message
Kira (kirasglimmer) wrote :

This happens for me.

Linux 4.15.0-34-generic #37-Ubuntu SMP Mon Aug 27 15:21:48 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

I'm running Linux Mint 19 Cinnamon 3.8.9. If I disconnect power, remove the back cover, disconnect both the main battery and the backup battery, I am able to get power status updates (e.g. battery will show as charging/charged when plugged in, or just regular battery when not). Once I sleep, I have to repeat this process if I want the battery status to work again.

`acpi -p` correctly shows the charging state, but `tlp-stat -b` does not.

Revision history for this message
Kira (kirasglimmer) wrote :

Additionally,

`acpitool -ab` shows when unplugged,
  Battery #1 : Charging, 100.0%, 00:00:00
  AC adapter : off-line

and when plugged in:
  Battery #1 : Charging, 100.0%, 00:00:00
  AC adapter : online

Note: Battery charge percentage does work.

Revision history for this message
In , rui.zhang (rui.zhang-linux-kernel-bugs) wrote :

Created attachment 278797
revert patch on top of 4.19-rc4

please confirm this revert patch works for you on top of 4.19-rc kernel.
I will send it out after got your confirmation.

Revision history for this message
In , odi (odi-linux-kernel-bugs) wrote :

(In reply to Zhang Rui from comment #265)
> please confirm this revert patch works for you on top of 4.19-rc kernel.
> I will send it out after got your confirmation.

It does! Also the CPU max frequency is now back to 2.4GHz after unplug/replug cycle.

VANILLA 4.19-rc4

plugged:
bat ~ # cat /sys/class/power_supply/ADP1/online
1
bat ~ # cat /sys/class/power_supply/BAT1/status
Charging

unplugged:
bat ~ # cat /sys/class/power_supply/ADP1/online
0
bat ~ # cat /sys/class/power_supply/BAT1/status
Charging

PATCHED 4.19-rc4

plugged:
bat ~ # cat /sys/class/power_supply/BAT1/status
Charging

unplugged:
bat ~ # cat /sys/class/power_supply/ADP1/online
0
bat ~ # cat /sys/class/power_supply/BAT1/status
Discharging

replugged:
bat ~ # cat /sys/class/power_supply/ADP1/online
1
bat ~ # cat /sys/class/power_supply/BAT1/status
Charging

Revision history for this message
In , jonas (jonas-linux-kernel-bugs) wrote :

(In reply to Zhang Rui from comment #265)
> Created attachment 278797 [details]
> revert patch on top of 4.19-rc4
>
> please confirm this revert patch works for you on top of 4.19-rc kernel.
> I will send it out after got your confirmation.

Hi Rui,

Any chance this will hit 4.19 rc before final release?

Revision history for this message
In , odi (odi-linux-kernel-bugs) wrote :

ping... what else is required to get this merged?

Revision history for this message
In , pablo.caron (pablo.caron-linux-kernel-bugs) wrote :

The bug is still present in kernel 4.20. I am using Ubuntu and installed the "standard" kernel downloaded from:
https://kernel.ubuntu.com/~kernel-ppa/mainline/v4.20/

Revision history for this message
In , mjg (mjg-linux-kernel-bugs) wrote :

Same problem on 4.20.11 on Fedora 29 (which has a patched upower, so it's not that other bug).

With the revert patch fromm comment #265, everything works as expected (and as it used to before the regression).

Please let me know if you need more info.

In case someone wants to test on Fedora:

https://copr.fedorainfracloud.org/coprs/mjg/kernel-book9/

Revision history for this message
In , rui.zhang (rui.zhang-linux-kernel-bugs) wrote :

The revert patch has been merged by Rafael, and it will show up in v5.1

Revision history for this message
In , rui.zhang (rui.zhang-linux-kernel-bugs) wrote :
Changed in linux:
status: Incomplete → Fix Released
Revision history for this message
In , mjg (mjg-linux-kernel-bugs) wrote :

(In reply to Zhang Rui from comment #271)
> The revert patch has been merged by Rafael, and it will show up in v5.1

Just for anyone wondering: it's in the stable kernel 5.0.9, also.

Revision history for this message
In , pablo.caron (pablo.caron-linux-kernel-bugs) wrote :

(In reply to Michael J Gruber from comment #273)
> (In reply to Zhang Rui from comment #271)
> > The revert patch has been merged by Rafael, and it will show up in v5.1
>
> Just for anyone wondering: it's in the stable kernel 5.0.9, also.

I can confirm that the bug has been fixed in the stable kernel 5.0.9 downloaded from https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.0.9/

Thank you!

You can find some test I did to check the correct behavior.

~$ uname -sr
Linux 5.0.9-050009-generic
Plugged
~$ cat /sys/class/power_supply/BAT1/status
Charging

Unplugged
~$ cat /sys/class/power_supply/ADP1/online
0
~$ cat /sys/class/power_supply/BAT1/status
Discharging

Replugged
~$ cat /sys/class/power_supply/ADP1/online
1
~$ cat /sys/class/power_supply/BAT1/status
Charging

Brad Figg (brad-figg)
tags: added: cscc
Revision history for this message
In , cinemaapk.in (cinemaapk.in-linux-kernel-bugs) wrote :

watch your loving colors tv show online
https://biggboss13tvs.com

Revision history for this message
In , cinemaapk.in (cinemaapk.in-linux-kernel-bugs) wrote :

watch pinoy tv channel and pinoy tv shows free
https://pinoytvz.su

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.