[Samsung NP900X3B-A01US] Lid state changes not detected

Bug #986724 reported by adampaetznick on 2012-04-21
330
This bug affects 66 people
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Undecided
Unassigned

Bug Description

Neither lid open nor lid close events are detected. Running:
cat /proc/acpi/button/lid/LID0/state

always returns:
state: open

This would seem to cause issues with suspend/resume.

WORKAROUND: Suspend via GUI or command line.

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: linux-image-3.2.0-23-generic 3.2.0-23.36
ProcVersionSignature: Ubuntu 3.2.0-23.36-generic 3.2.14
Uname: Linux 3.2.0-23-generic x86_64
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.24.
ApportVersion: 2.0.1-0ubuntu5
Architecture: amd64
ArecordDevices:
 **** List of CAPTURE Hardware Devices ****
 card 0: PCH [HDA Intel PCH], device 0: ALC269VC Analog [ALC269VC Analog]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: adam 1893 F.... pulseaudio
Card0.Amixer.info:
 Card hw:0 'PCH'/'HDA Intel PCH at 0xf0700000 irq 50'
   Mixer name : 'Intel CougarPoint HDMI'
   Components : 'HDA:10ec0269,144dc0cd,00100202 HDA:80862805,80860101,00100000'
   Controls : 22
   Simple ctrls : 10
Date: Sat Apr 21 19:53:30 2012
EcryptfsInUse: Yes
InstallationMedia: Ubuntu 11.10 "Oneiric Ocelot" - Release amd64 (20111012)
MachineType: SAMSUNG ELECTRONICS CO., LTD. 900X3B/900X4B
ProcEnviron:
 LANGUAGE=en_CA:en
 TERM=xterm
 PATH=(custom, user)
 LANG=en_CA.UTF-8
 SHELL=/bin/bash
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.2.0-23-generic root=UUID=0b0abe89-812f-48c8-98ad-18dbf49290f8 ro quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-3.2.0-23-generic N/A
 linux-backports-modules-3.2.0-23-generic N/A
 linux-firmware 1.79
SourcePackage: linux
StagingDrivers: mei
UpgradeStatus: Upgraded to precise on 2012-03-28 (24 days ago)
dmi.bios.date: 03/20/2012
dmi.bios.vendor: Phoenix Technologies Ltd.
dmi.bios.version: P05AAH
dmi.board.asset.tag: Base Board Asset Tag
dmi.board.name: 900X3B/900X4B
dmi.board.vendor: SAMSUNG ELECTRONICS CO., LTD.
dmi.board.version: FAB1
dmi.chassis.asset.tag: Asset Tag
dmi.chassis.type: 9
dmi.chassis.vendor: SAMSUNG ELECTRONICS CO., LTD.
dmi.chassis.version: 0.1
dmi.modalias: dmi:bvnPhoenixTechnologiesLtd.:bvrP05AAH:bd03/20/2012:svnSAMSUNGELECTRONICSCO.,LTD.:pn900X3B/900X4B:pvr0.1:rvnSAMSUNGELECTRONICSCO.,LTD.:rn900X3B/900X4B:rvrFAB1:cvnSAMSUNGELECTRONICSCO.,LTD.:ct9:cvr0.1:
dmi.product.name: 900X3B/900X4B
dmi.product.version: 0.1
dmi.sys.vendor: SAMSUNG ELECTRONICS CO., LTD.

adampaetznick (adampaetznick) wrote :
Brad Figg (brad-figg) on 2012-04-22
Changed in linux (Ubuntu):
status: New → Confirmed
adampaetznick (adampaetznick) wrote :
adampaetznick (adampaetznick) wrote :
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.4kernel[1] (Not a kernel in the daily directory). Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag(Only that one tag, please leave the other tags). This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text.

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.4-rc4-precise/

Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: needs-upstream-testing
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
adampaetznick (adampaetznick) wrote :

Tested with kernel 3.4-rc4-precise. Problem still exists. I changed the tag to kernel-bug-exists-upstream.

tags: added: kernel-fixed-upstream
removed: needs-upstream-testing
tags: added: kernel-bug-exists-upstream
removed: kernel-fixed-upstream
adampaetznick (adampaetznick) wrote :

I should note that lid detection worked briefly for me after updating BIOS. But after a short time (or maybe just after rebooting once??) it stopped working. Similar experience was reported by another person at http://ubuntuforums.org/showthread.php?t=1737086&page=18

Սահակ (petrosyan) on 2012-05-01
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
alistair (alistair-tyeurgain) wrote :

I have an additional problem on my dual-boot U12.04/Win7 machine. I need to have both on this machine. I need to have access to software that will not run on and has no equivalent on Linux.

If I boot Ubuntu and then Windows, the lid-close suspend does not then work on Windows either! The only cure is to shut down, and then stick a pin in the tiny hole in the bottom of the machine that causes some sort of extra-hard shutdown. This suggests to me that somehow, the Linux kernel (or some other bit of GNU/Linux/Ubuntu software) seems to be breaking the lid switch.

As an aside, if you do this hard shutdown, it is not possible to get the machine going again unless it is connected to mains power. If it is on battery, the on-switch has no effect. This has nothing to do with Ubuntu/Linux - this seems to be a design "feature" of the machine.

A

Andrew Minter (agm) wrote :

I'm seeing the exact same behaviour as Alistair on an NP530U3B. The curious thing is that after pressing the magic reset button all works well for a while. Under Linux the lid switch works, the laptop sleeps just fine and nothing I can do seems to break it.

A day or two later, it mysteriously stops generating ACPI events (according to acpi_listen) and it take a press of the reset button to get it all working. Really hard to debug and although this behaviour is consistent, I can't quickly reproduce.

Սահակ (petrosyan) wrote :

After the latest BIOS update (P07AAH) laptop suspends when the lid is closed, but when the lid is opened it doesn't resume and one needs to press the power button in order for laptop to resume from sleep. This is on Samsung Series 9 NP900X3B.

Can anybody else confirm this behavior?

adampaetznick (adampaetznick) wrote :

I confirm that lid detection is working with BIOS version P07AAH (at least for the moment). Suspend works on lid close, but wake must be triggered by pressing the power button.

Marcin Bojko (marcinbojko) wrote :

Hi,
I would like to confirm this - very, very annoing error on Samsung NP530U3B-A02PL
I changed/compiled new kernel- 3.4.4 stable, tweaked acpid, tweaked /etc/pm/config.d. tweaked laptop-mode - error still exists.
It can be confirmed quite fast - just load your battery to 100%, suspend your machine by closing the lid, go back 6-8 hours later (when battery is discharged to 80-90% - and there it is).
Not only lid acpi events are silent but most of them is just gone - laptop-mode still depends on this events and starts to fail AC detection.

Bryan Stine (stinebd) wrote :

Precise on a new 2012 NP900X4C (Ivy Bridge), same issues.

Confirming #6 as well, lid switch detection worked briefly after a firmware update, but has since stopped working. cat /proc/acpi/button/lid/LID0/state shows that it's always open, no matter the physical state.

Also confirming #10, that the machine does not resume on opening the lid (and didn't even when the lid detection was working properly).

Obviously it's likely an issue in ACPI, so I will look through the DSDT and see if any OSI changes have an effect.

Marcin Bojko (marcinbojko) wrote :

I've extracted mine DSDT table - if someone wants to compare it with other models - I am happy to send it.

adampaetznick (adampaetznick) wrote :

Sadly the P07AAH BIOS has returned to the same symptoms as before. Lid close detection no longer works in either Ubuntu or Windows 7.

Սահակ (petrosyan) wrote :

It no longer works for me either.

I wonder how is it possible that it works at first and then stops working?

Marcin Bojko (marcinbojko) wrote :

Ok. I've extracted DSDT table, recompiled it (without any changes) and included in my new 3.5.0rc4 kernel.
Dmesg shows a lots of ACPI warning (conflicted spaces) but the error stays.

adampaetznick (adampaetznick) wrote :

Marcin, have you tried the battery disconnect pin-hole button as described by alistair in comment #7? The hope would be that, once working, your kernel wouldn't cause the lid detection problem to re-occur.

Marcin Bojko (marcinbojko) wrote :

@Adam,
of course, cleaned the state of lid sensor, recompiled (tested on few stanby's) and then left for couple of hours.
I suppose it's not a sensor trouble but whole acpi subsystem/sensor - my Mint stops detecting any AC/battery state changes.
Only pinhole and hairpin are helpful enough :(

But, on the other hand - I just tweaked DSDT again (fixed all warning and errors), waiting for kernel to recompile again.

Andrew Minter (agm) wrote :

I confirm the same behaviour as #11. This procedure reproduces the bug every time. Does NOT happen when the machine is left sleeping while attached to power.

eric (obrowny06) wrote :

On the new 9 serie np900x3c

I confirm everything ; lid not detected, can't reboot after hard shutdown unless you plu in the main power.
suspend seems to work but the wake up doesn't at all (except wit kernel 3.5-rc2)
keyboard backlight doesn't work at all :-(

Same on the 900X4C, which is the bigger brother of the 900X3C

Marcin Bojko (marcinbojko) wrote :

As far as I know - Intel MEI is to blame.
I've blacklisted mei driver in modprobe.d and machine hangs right after first suspend.

Bryan Stine (stinebd) wrote :

While running a 3.5 kernel, I noticed a few ACPI warnings about resource conflicts with the lpc_ich module. After blacklisting the module and resetting the laptop (with the pinhole battery disconnect in the underside), I've been running normally for a few days and still have AC adapter and battery events reporting and even lid switch detection (although it's lagged by about 10 seconds).

I'll give it a try.
mb
W dniu 2012-07-12 02:08, Bryan Stine pisze:
> While running a 3.5 kernel, I noticed a few ACPI warnings about resource
> conflicts with the lpc_ich module. After blacklisting the module and
> resetting the laptop (with the pinhole battery disconnect in the
> underside), I've been running normally for a few days and still have AC
> adapter and battery events reporting and even lid switch detection
> (although it's lagged by about 10 seconds).
>

Solution from #23 didn't work for me.

I tried blacklisting lpc_ich and resetting the laptop with the pinhole and so far battery state changes and lid events are recognised. This is using 3.5-rc6

Aviram Jenik (q-aviram) wrote :

@Bryan and @Marcin: Did you put the pin in when the computer was plugged in? Or without AC power? And for how long, just a quick click or longer?

The reason I'm asking, is when I tried it the Motherboard got fried and I had to go to Samsung for replacement. Before anyone else makes the same mistake I did, it would be good to have your process better documented :-)

Bryan Stine (stinebd) wrote :

A quick update: last night I lost ACPI events again, so blacklisting lpc_ich seems not to be a permanent/consistent fix. I plan to continue testing with this module blacklisted, and then with the kernel argument acpi_enforce_resources=strict. It certainly did last significantly longer than usual with the module blacklisted, but I have to verify that it wasn't just a fluke.

Aviram/#27: Resetting the laptop can be done by shutting it down normally, unplugging the AC adapter, and sticking a pin into the battery disengage hole on the underside of the laptop. There is a button in the hole which lightly clicks, and it does not have to be held down (pressing it once will suffice). WARNING that this will cause the laptop not to boot at all until the AC adapter is plugged back in (this is the state in which the laptop comes when first unboxing it), at which point it will continue to boot normally thereafter.

Marcin Bojko (marcinbojko) wrote :

@Aviram #27
Bryan's description is very acurate.

@Bryan
As I was saying - blacklisting lpc_ich and/or mei module doesn't work for me.
Blacklisting lpc end with hanged window manager/decorator, blacklisting mei ends with immediate power hang (no need to wait 6-8 hours).
BTW. I've found quicker way to test suspend hanging - just suspend machine (closing the lid), connect AC power, disconnect, connect, disconnect - at voila - hangin' there ;(

Aviram Jenik (q-aviram) wrote :

Thanks Bryan and Marcin for the clarification. I followed the instructions (didn't fry the MB this time) but the problem is still there. 3.5.0rc6 from the Ubuntu ppa.

Marcin Bojko (marcinbojko) wrote :

Confirm #30 - 3.5.0rc4/rc5/rc6 - problem still exists.

MarkS (mark-marksyms) wrote :

Possibly related to this other ACPI bug as it's all in ACPI event detection? https://bugzilla.kernel.org/show_bug.cgi?id=44161

alistair (alistair-tyeurgain) wrote :

I installed OpenSUSE 12.2 on this machine as soon as it became final.

Since then, I have booted into SuSE and back into W7, suspended and resumed on both OSes many times, and have had no problems.

In fact, everything* but the "toys" (ambient light sensor and keyboard illumination) works perfectly. (Make sure you configure your trackpad in "System Settings"!)

By default, KDE plays a little jingle just before it shuts down on suspend. This is normally really annoying, but in this case, it is quite handy, because both Ubuntu and W7 suspend silently, and on this computer, which is normally silent in use and has no external LEDs, there is no way to tell whether or not it has suspended, apart from pinging it from another machine on the LAN.

So, maybe this muddies the waters. Is the problem solved in the SuSE kernel, or is it a problem with Ubuntu at a higher level?

A

* Lid open is not detected; I have to press the power button to get the machine to resume. However, I cannot recall any distro *ever* detecting lid open on any post-APM laptop I have owned, and W7 sometimes fails to, so I cannot really blame OpenSUSE for this.

Can you give us a hint with 'uname -a'?
mb

W dniu 2012-09-13 00:03, alistair pisze:
> I installed OpenSUSE 12.2 on this machine as soon as it became final.
>
> Since then, I have booted into SuSE and back into W7, suspended and
> resumed on both OSes many times, and have had no problems.
>
> In fact, everything* but the "toys" (ambient light sensor and keyboard
> illumination) works perfectly. (Make sure you configure your trackpad
> in "System Settings"!)
>
> By default, KDE plays a little jingle just before it shuts down on
> suspend. This is normally really annoying, but in this case, it is
> quite handy, because both Ubuntu and W7 suspend silently, and on this
> computer, which is normally silent in use and has no external LEDs,
> there is no way to tell whether or not it has suspended, apart from
> pinging it from another machine on the LAN.
>
> So, maybe this muddies the waters. Is the problem solved in the SuSE
> kernel, or is it a problem with Ubuntu at a higher level?
>
> A
>
> * Lid open is not detected; I have to press the power button to get the
> machine to resume. However, I cannot recall any distro *ever* detecting
> lid open on any post-APM laptop I have owned, and W7 sometimes fails to,
> so I cannot really blame OpenSUSE for this.
>

alistair@juno:~/Documents> uname -a
Linux junalistair@juno:~/Documents> uname -a
Linux juno.ty-eurgain 3.4.6-2.10-desktop #1 SMP PREEMPT Thu Jul 26 09:36:26 UTC 2012 (641c197) x86_64 x86_64 x86_64 GNU/Linux
o.ty-eurgain 3.4.6-2.10-desktop #1 SMP PREEMPT Thu Jul 26 09:36:26 UTC 2012 (641c197) x86_64 x86_64 x86_64 GNU/Linux

KDE 4.91 from the repo at dowload.opensuse.org .

svenmeier (sven-meiers) wrote :

No change with Ubuntu 12.10 beta:

sven@mithril:~$ uname -a
Linux mithril 3.5.0-14-generic #18-Ubuntu SMP Fri Sep 14 15:52:50 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

nils (5il0) wrote :

I installed opensuse 12.2 after the acpi issues had appeared on ubuntu. Behaviour was exactly the same. Battery indicator in gnome didn't pickup power changes, suspend when lidclose did not work.

I did a battery reset, so for now it works. Lets see if it stays this way or if the same issues appear in a week or so.

On 22/09/12 15:32, nils wrote:
> I installed opensuse 12.2 after the acpi issues had appeared on ubuntu.
> Behaviour was exactly the same. Battery indicator in gnome didn't pickup
> power changes, suspend when lidclose did not work.
>
> I did a battery reset, so for now it works. Lets see if it stays this
> way or if the same issues appear in a week or so.
>
Hi

I did not try Gnome, only KDE, and it has yet to give a problem.

A

I'll try Gnome then :) I find it hard to believe that the source of the problem would be in the desktopenvironment. I now have one ubuntu laptop using opensuse's kernel and one opensuse laptop. Identical hardware, identical kernel. The features are already back on the ubuntu laptop while the opensuse laptop is still going.

My machine went wrong about an hour ago. It failed to suspend on closing the lid. However, it did not need a hard reset for well over two weeks, over numerous suspend /resume cycles and reboots,  which is much better than I ever had with Ubuntu.

So,  I still have no clue what is going on,  but at least the machine is usable with OpenSUSE..

Anils <email address hidden> wrote:I'll try Gnome then :) I find it hard to believe that the source of the
problem would be in the desktopenvironment. I now have one ubuntu laptop
using  opensuse's kernel and one opensuse laptop. Identical hardware,
identical kernel. The features are already back on the ubuntu laptop
while the opensuse laptop is still going.

--
You received this bug notification because you are subscribed to the bug
report.
https://bugs.launchpad.net/bugs/986724

Title:
  Lid state changes not detected on Samsung NP900X3B

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/986724/+subscriptions

tags: added: 900x3a 900x3b 900x3c 900x4c
tags: added: needs-full-computer-model needs-upstream-testing
removed: 900x3a 900x3b 900x3c 900x4c
Changed in linux (Ubuntu):
status: Confirmed → Opinion
status: Opinion → Incomplete
summary: - Lid state changes not detected on Samsung NP900X3B
+ [Samsung NP900X3B-A01US] Lid state changes not detected
tags: added: latest-bios-p10aah
removed: needs-full-computer-model
tags: added: regression-potential
tags: added: kernel-bug-exists-upstream-v3.11-rc5
removed: kernel-bug-exists-upstream needs-upstream-testing
tags: added: needs-upstream-testing
removed: kernel-bug-exists-upstream-v3.11-rc5
tags: added: kernel-bug-exists-upstream kernel-bug-exists-upstream-3.11.0-031100rc7
removed: needs-upstream-testing
tags: added: kernel-bug-exists-upstream-v3.11.rc7
removed: kernel-bug-exists-upstream-3.11.0-031100rc7
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
tags: added: needs-upstream-testing
Սահակ (petrosyan) on 2014-02-21
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
description: updated
288 comments hidden view all 368 comments

Lv,

Sorry for the confusion. I am trying to clear the clouds:

1. Power plug detection is no longer a problem once it started working again. I am unable to make it fail on purpose, so I can't test if any kernel fixes that.

2. The lid switch is now non-functional with any kernel (state is always "open"):
- 3.17
- 3.18
- linux-pm /linux-next + gpe-handle21.patch + ec-flush{1,2,3,4,5,6,7}
- linux-pm /linux-next + gpe-handle21.patch + ec-flush{1,2,3,4,5,6} + ec-flush7-updated
However during ~40 lid open/close cycle with the latter kernel it worked 2 times (state = "closed").

3. suspend: doesn't work by closing the lid, works fine by suspending manually (KDE start menu)

4. resume: always works, but there is a backlight regression which has nothing to do with this issue. See https://bugs.freedesktop.org/show_bug.cgi?id=80773

Don't worry about my kernel building skills. I do that very very often and have established a workflow that always works. They are all straight from git, hand-configured per the specific hardware and use no modules. They are loaded by a hand-configured grub2 straight without initramfs.

Hi, San

Thanks for your help.

(In reply to San from comment #220)
> Hi!
>
> I'm not sure what i have to do. Some time ago i made kernel and uploaded it
> onto my server.
> I would like to do it again. So I'm now compiling this way:
>
> git clone git://kernel.ubuntu.com/ubuntu/ubuntu-utopic.git
> cd ubuntu-utopic/
> patch -p1 < ../155351.patch
> make mrproper
> wget http://beta.paragrafówka.pl/eksport/config-3.16.3 && mv config-3.16.3
> .config
> fakeroot debian/rules clean && fakeroot debian/rules binary-headers
> binary-generic
>
> I'm building as suggested in comment 206
> (https://bugzilla.kernel.org/show_bug.cgi?id=44161#c206) for now and then
> I'll try whole set (as suggested in second point).
>
> Is that right way?

Ortwin is providing me the testing using private email.
I'll post some of the results here after the progress are made for the investigation.
Maybe you can hold a while until the problem is correctly figured out.

The 2 commits are now reverted in linux-pm, it will appear in the Linux mainline in 1 month.

Best regards

Hi, Ortwin

(In reply to Ortwin Glück from comment #221)
> Lv,
>
> Sorry for the confusion. I am trying to clear the clouds:
>
> 1. Power plug detection is no longer a problem once it started working
> again. I am unable to make it fail on purpose, so I can't test if any kernel
> fixes that.

So the 2 commits didn't make a regression on the power plug?

> 2. The lid switch is now non-functional with any kernel (state is always
> "open"):
> - 3.17
> - 3.18
> - linux-pm /linux-next + gpe-handle21.patch + ec-flush{1,2,3,4,5,6,7}
> - linux-pm /linux-next + gpe-handle21.patch + ec-flush{1,2,3,4,5,6} +
> ec-flush7-updated
> However during ~40 lid open/close cycle with the latter kernel it worked 2
> times (state = "closed").
>
> 3. suspend: doesn't work by closing the lid, works fine by suspending
> manually (KDE start menu)
>
> 4. resume: always works, but there is a backlight regression which has
> nothing to do with this issue. See
> https://bugs.freedesktop.org/show_bug.cgi?id=80773

Can you perform a resume by opening the LID?
Also please upload the output of /proc/acpi/wakeup here.

> Don't worry about my kernel building skills. I do that very very often and
> have established a workflow that always works. They are all straight from
> git, hand-configured per the specific hardware and use no modules. They are
> loaded by a hand-configured grub2 straight without initramfs.

Good to know this!
That makes every testing possible. :-)

Thanks and best regards
-Lv

Created attachment 157011
proc/acpi/wakeup

I was finally able to do some tests, here are my results.
- Samsung NP900X3C
- using linux-pm/linux-next with patches from Comment 219.

Upon boot, I get the following behavior:
- Power plug detection appears to be working no matter what.
- Lid switch is recognized (open/close states are reported correctly using acpi_listen), but suspend does NOT trigger.

However, after a while, (maybe after a few ACPI events?), the LID switch actually triggers suspend correctly; I'm not sure the issue is related.
It does not seem to revert to the broken behavior even after some time. It works both on battery and AC power.

Opening the LID does not trigger a resume, but tbh that was never the case on this machine.

Lv, if you have more tests in mind / need any logs just let me know.

Attaching my /proc/acpi/wakeup.

Thanks,
Andrea

I'm unable to compile kernel. It patches ok, but i get:

== code ==
  Using /home/publiczny2/kernel/ubuntu-utopic as source for kernel
  /home/publiczny2/kernel/ubuntu-utopic is not clean, please run 'make mrproper'
  in the '/home/publiczny2/kernel/ubuntu-utopic' directory.
==/code ==

Hi,

(In reply to Andrea Fagiani from comment #224)
> Created attachment 157011 [details]
> proc/acpi/wakeup
>
> I was finally able to do some tests, here are my results.
> - Samsung NP900X3C
> - using linux-pm/linux-next with patches from Comment 219.
>
> Upon boot, I get the following behavior:
> - Power plug detection appears to be working no matter what.
> - Lid switch is recognized (open/close states are reported correctly using
> acpi_listen), but suspend does NOT trigger.

Can you confirm if they still work after suspend/resume?
See test requirement A below.

> However, after a while, (maybe after a few ACPI events?), the LID switch
> actually triggers suspend correctly; I'm not sure the issue is related.
> It does not seem to revert to the broken behavior even after some time. It
> works both on battery and AC power.
>
> Opening the LID does not trigger a resume, but tbh that was never the case
> on this machine.

According to the /proc/acpi/wakeup - yes, only power button can wakeup the system.

> Lv, if you have more tests in mind / need any logs just let me know.

A. We want to confirm that this set of patches can fix old issues. We need to be careful before removing old quirks.
1. using linux-pm/linux-next with patches from Comment 219.
2. build/boot the kernel
3. do a suspend/resume
Can power plug and LID switch still work after resuming?

B. We want to see some proofs that Samsung EC firmware acts exactly as what we've inferred from the reversion requirements. So could you:
1. using linux-pm/linux-next with patches from Comment 219.
2. Uncomment the following line in the drivers/acpi/ec.c:
     /* #define DEBUG */
3. build/boot the kernel
4. do some LID state changes and make sure they are captured by acpi_listen.
5. upload the dmesg output here.

C. We want to confirm if the timely polling support is really required by Samsung EC firmware:
1. using linux-pm/linux-next
3. git am gpe-handle21.patch (attachment 156591 [details])
4. git am ec-flush1.patch (attachment 155351 [details])
5. git am ec-flush2.patch (attachment 155371 [details])
6. git am ec-flush3.patch (attachment 155381 [details])
7. git am ec-flush4.patch (attachment 155391 [details])
8. git am ec-flush5.patch (attachment 155411 [details])
9. git am ec-flush6.patch (attachment 155421 [details])
Do not apply the ec-flush7.patch, can power plug and LID switch still work, and can still work even after resuming?

D. We want to confirm if this is really a regression and current fix can:
1. Using linux-pm/linux-next
2. git revert 79149001
3. git revert df9ff918
Do not apply any of the posted patches, can power plug and LID switch still work? And
1. Using linux-pm/linux-next
Do not revert the "regression fix" commits and do not apply any of the posted patches, can power plug and LID switch still work?

Thanks in advance.

> A. We want to confirm that this set of patches can fix old issues. We need
> to be careful before removing old quirks.
> 1. using linux-pm/linux-next with patches from Comment 219.
> 2. build/boot the kernel
> 3. do a suspend/resume
> Can power plug and LID switch still work after resuming?

Yes, both power plug and LID switch work and are correctly captured by acpi_listen.

> B. We want to see some proofs that Samsung EC firmware acts exactly as what
> we've inferred from the reversion requirements. So could you:
> 1. using linux-pm/linux-next with patches from Comment 219.
> 2. Uncomment the following line in the drivers/acpi/ec.c:
> /* #define DEBUG */
> 3. build/boot the kernel
> 4. do some LID state changes and make sure they are captured by acpi_listen.
> 5. upload the dmesg output here.

Opened/closed 5 times, all of them showed up in acpi_listen.
Attaching dmesg.

> C. We want to confirm if the timely polling support is really required by
> Samsung EC firmware:
> 1. using linux-pm/linux-next
> 3. git am gpe-handle21.patch (attachment 156591 [details])
> 4. git am ec-flush1.patch (attachment 155351 [details])
> 5. git am ec-flush2.patch (attachment 155371 [details])
> 6. git am ec-flush3.patch (attachment 155381 [details])
> 7. git am ec-flush4.patch (attachment 155391 [details])
> 8. git am ec-flush5.patch (attachment 155411 [details])
> 9. git am ec-flush6.patch (attachment 155421 [details])
> Do not apply the ec-flush7.patch, can power plug and LID switch still work,
> and can still work even after resuming?

Both work fine, even after a suspend/resume.

Note that the ec-flush patches depend on the reverts, so I applied them even if they weren't explicitly listed. I can test without them later today if required.

> D. We want to confirm if this is really a regression and current fix can:
> 1. Using linux-pm/linux-next
> 2. git revert 79149001
> 3. git revert df9ff918
> Do not apply any of the posted patches, can power plug and LID switch still
> work? And

Upon boot, it works correctly. After a suspend/resume, power plug detection does not work anymore (it is greatly delayed) and neither does the LID switch (doesn't trigger at all).

> 1. Using linux-pm/linux-next
> Do not revert the "regression fix" commits and do not apply any of the
> posted patches, can power plug and LID switch still work?

I wasn't able to do this last test due to time constraints, I'll try and do that later today.

Created attachment 157111
dmesg - #DEBUG enabled

Download full text (3.6 KiB)

Hi,

The test results are very useful for us to understand this issue. :-)

(In reply to Andrea Fagiani from comment #227)
> > B. We want to see some proofs that Samsung EC firmware acts exactly as what
> > we've inferred from the reversion requirements. So could you:
> > 1. using linux-pm/linux-next with patches from Comment 219.
> > 2. Uncomment the following line in the drivers/acpi/ec.c:
> > /* #define DEBUG */
> > 3. build/boot the kernel
> > 4. do some LID state changes and make sure they are captured by
> acpi_listen.
> > 5. upload the dmesg output here.
>
> Opened/closed 5 times, all of them showed up in acpi_listen.
> Attaching dmesg.

Yes. I also found the 0x5E (5 times) and 0x5F (4 times) in the dmesg.
TBH, I didn't find proofs for comment 210 in it.
So the issue isn't as what I imagined before.

> > C. We want to confirm if the timely polling support is really required by
> > Samsung EC firmware:
> > 1. using linux-pm/linux-next
> > 3. git am gpe-handle21.patch (attachment 156591 [details])
> > 4. git am ec-flush1.patch (attachment 155351 [details])
> > 5. git am ec-flush2.patch (attachment 155371 [details])
> > 6. git am ec-flush3.patch (attachment 155381 [details])
> > 7. git am ec-flush4.patch (attachment 155391 [details])
> > 8. git am ec-flush5.patch (attachment 155411 [details])
> > 9. git am ec-flush6.patch (attachment 155421 [details])
> > Do not apply the ec-flush7.patch, can power plug and LID switch still work,
> > and can still work even after resuming?
>
> Both work fine, even after a suspend/resume.

Thus, in order to fix Samsung issue.
Either we need timely polling quirk (since test A works)
Or we need old CLEAR_ON_RESUME quirk (since test C works).

> Note that the ec-flush patches depend on the reverts, so I applied them even
> if they weren't explicitly listed. I can test without them later today if
> required.

Not necessary :-). I only want the kernel test results __wih__ the reverts (as they conflict with the patches posted here) so I don't need to post another set of rebased patches for current linux-pm/linux-next.

> > D. We want to confirm if this is really a regression and current fix can:
> > 1. Using linux-pm/linux-next
> > 2. git revert 79149001
> > 3. git revert df9ff918
> > Do not apply any of the posted patches, can power plug and LID switch still
> > work? And
>
> Upon boot, it works correctly. After a suspend/resume, power plug detection
> does not work anymore (it is greatly delayed) and neither does the LID
> switch (doesn't trigger at all).

Thus the regression seems to be: "the 2 commits have broken the CLEAR_ON_RESUME quirk".

I guess SCI_EVT is set after resuming but GPE doesn't fire.
If this is the case, this isn't a silicon/firmware bug but a driver bug.
As EC GPE is edge triggered and GPE might have already been fired before suspending.
If so, we need to improve the driver and shouldn't fix the driver bug using any kind of quirk.

So I want to check if the SCI_EVT is set after resuming.
Let me think about a way to confirm this.
I'll add another test later.

> > 1. Using linux-pm/linux-next
> > Do not revert the "regression fix" commits and do not apply any of the
> > posted patches...

Read more...

Hi, Andrea

Please use either of the following working combinations:
1. test A
2. test C
3. test D2 (which you haven't done yet)

If you'll do test D2 for us, you can just do it and capture the confirmation information at same time.

Please do:
1. suspend
2. do some LID opening/closing to trigger old issue
3. resume
4. dmesg > dmesg-right-after-resume.txt

And post the dmesg-right-after-resume.txt here.
If the kernel log buffer size is not big enough to contain all messages right after resuming, please increase it using the kernel boot parameter - log_buf_len=1M.

Thanks and best regards

Hi, Andrea

Sorry that I forgot to ask you to enable the EC log to capture the debug information.

2. Uncomment the following line in the drivers/acpi/ec.c:
     /* #define DEBUG */

(In reply to Lv Zheng from comment #230)
> Hi, Andrea
>
> Please use either of the following working combinations:
> 1. test A
> 2. test C
> 3. test D2 (which you haven't done yet)
>
> If you'll do test D2 for us, you can just do it and capture the confirmation
> information at same time.
>
> Please do:
> 1. suspend
> 2. do some LID opening/closing to trigger old issue
> 3. resume
> 4. dmesg > dmesg-right-after-resume.txt
>
> And post the dmesg-right-after-resume.txt here.
> If the kernel log buffer size is not big enough to contain all messages
> right after resuming, please increase it using the kernel boot parameter -
> log_buf_len=1M.

Thanks
-Lv

Created attachment 157351
dmesg-right-after-resume_A

Created attachment 157361
dmesg-right-after-resume_C

Created attachment 157371
dmesg-right-after-resume_D2

Hi Lv,
I captured and uploaded the dmesg as requested.

Concerning test D2, both power plug and LID switch seem to work fine even after a suspend/resume. You'll notice I've played with it a little bit before performing the suspend test.

Hope it helps,
Andrea

Download full text (4.0 KiB)

Hi, Andrea

The results make something clear to me.
Thanks for the great job.

For the 3 test results, SCI_EVT is always flagged after resuming:
Test A (regression fixes + cleanup + timely polling quirk):
[ 42.608720] PM: Restoring platform NVS memory
...
[ 42.649355] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
Test C (regression fixes + cleanup + CLEAR_ON_RESUME quirk):
[ 21.764941] PM: Restoring platform NVS memory
...
[ 21.811916] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
Test D2 (regression fixes + CLEAR_ON_RESUME quirk):
[ 172.102165] PM: Restoring platform NVS memory
...
[ 172.149860] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
This can explain the cause of the original bug identified in comment 129.
The EC behavior is not broekn here. We shouldn't implement a quirk for it since this is a driver bug - if the EC GPE is not fired, we don't have means to poll the events.

For the regression reported in comment 184, I only find the proof in test A that we need both the 2 commits to be reverted:
Test A (regression fixes + cleanup + timely polling quirk):
[ 42.662695] ACPI : EC: ***** Command(QR_EC) started *****
[ 42.662697] ACPI : EC: ===== TASK (2) =====
[ 42.662702] ACPI : EC: EC_SC(R) = 0x00 SCI_EVT=0 BURST=0 CMD=0 IBF=0 OBF=0
[ 42.662703] ACPI : EC: EC_SC(W) = 0x84
[ 42.662707] ACPI : EC: ***** Event running *****
[ 42.662827] ACPI : EC: ##### Query(0x5e) stopped #####
[ 42.666037] ACPI : EC: ===== TASK (2) =====
[ 42.666042] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1
[ 42.666045] ACPI : EC: EC_DATA(R) = 0x5f
[ 42.666046] ACPI : EC: ***** Command(QR_EC) hardware completion *****
[ 42.666047] ACPI : EC: ***** Command(QR_EC) stopped *****
The event of 0x5F is returned when SCI_EVT=0.
So we need to revert the 2 commits to allow QR_EC to be sent when SCI_EVT=0 and there is really non 0x00 events need to be drained when SCI_EVT=0.
TBH, the EC behavior is broekn here, thus IMO this is a different issue than the one identified in comment 129.
It seems both the quirks (drain way of CLEAR_ON_RESUME, timely polling way) can work it around. If this issue can be reproduced during runtime, then the CLEAR_ON_RESUME can not work it around at that time. According to your tests, the problem cannot be easily reproduced during runtime. So we can stay unchanged until someone else blaims.

If we have to assume this is only a problem happened during resuming and if we want to root cause why the SCI_EVT is suddently cleared after resuming, we will have to assume that something during the resuming steps has caused this issue because:
1. In test C, 2 stale events are returned with SCI_EVT=1:
[ 21.814019] ACPI : EC: EC_SC(R) = 0x29 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=1
[ 21.814022] ACPI : EC: EC_DATA(R) = 0x5e
[ 21.817333] ACPI : EC: EC_SC(R) = 0x29 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=1
[ 21.817351] ACPI : EC: EC_DATA(R) = 0x5f
So SCI_EVT will not be spontaneously cleared during acpi_ec_unblock_transactions().
2. In test A, 2 stale events are returned with different SCI_EVT value:
[ 42.656421] ACPI : EC: EC_SC(R) = 0x29 SCI_EVT=1 BURST=0 CMD=1 IBF...

Read more...

Hi, Andrea

I checked the log again. And the following seems to be wrong.

> It seems both the quirks (drain way of CLEAR_ON_RESUME, timely polling way)
> can work it around. If this issue can be reproduced during runtime, then the
> CLEAR_ON_RESUME can not work it around at that time. According to your
> tests, the problem cannot be easily reproduced during runtime. So we can
> stay unchanged until someone else blaims.

Though you didn't face real functional issues, there are following entries in the test D2 log:
[ 31.072607] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1
[ 31.072613] ACPI : EC: EC_DATA(R) = 0x66
[ 77.835639] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1
[ 77.835647] ACPI : EC: EC_DATA(R) = 0x66
[ 79.939044] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1
[ 79.939053] ACPI : EC: EC_DATA(R) = 0x66
[ 82.006084] ACPI : EC: EC_SC(R) = 0x29 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=1
[ 82.006087] ACPI : EC: EC_DATA(R) = 0x66
Thanks for you long time test or we won't find them out.
The _Q66 is as follows:
                    Method (_Q66, 0, NotSerialized)
                    {
                        P8XH (Zero, 0x66)
                        If (LEqual (B1EX, One))
                        {
                            Notify (BAT1, 0x80)
                        }
                    }
It updates battery status.
Then it means the assumption in comment 210 might be real and we need the "timely polling quirk" for samsung to have everything wroked around...
So I'll prepare the whole patchset posted here for upstream according to this result.

In test A, I can even see such log entries:
[ 1.788647] ACPI : EC: ***** Command(QR_EC) started *****
[ 1.788651] ACPI : EC: ===== TASK (1) =====
[ 1.788656] ACPI : EC: EC_SC(R) = 0x08 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=0
[ 1.788659] ACPI : EC: EC_SC(W) = 0x84
...
[ 1.800289] ACPI : EC: ===== IRQ (0) =====
[ 1.800295] ACPI : EC: EC_SC(R) = 0x29 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=1
[ 1.800300] ACPI : EC: EC_DATA(R) = 0x70
...
[ 1.800385] ACPI : EC: ***** Command(QR_EC) stopped *****
It seems for Samsung EC, the firmware can even return event that happens after we send the query command. There can really be a big time slot between driver pushes the command byte and firmware fetches it.

Thanks and best regards

Hi Lv,

Thanks for all your work on this issue. It's certainly not easy, especially if you don't have the hardware available to test on yourself.

You are probably right about the racy behaviour. One important point though, is that you would need to miss either 8 (or was it 16?) events in a row during run time for the buffer to fill and for SCI_EVT to stop being given entirely.

Under normal usage, I think there's almost always going to be an event regularly enough that this "full" condition never occurs. It was only found to be a problem during suspend, where the EC controller seemed to continue logging, and after enough battery level or AC adapter changes it would stop triggering events even after resume.

My HDD was too full recently to build kernels, but I have just freed some space (a bit late, sorry) so if you want me to test anything please let me know.

Download full text (3.4 KiB)

(In reply to Kieran Clancy from comment #238)
> Hi Lv,
>
> Thanks for all your work on this issue. It's certainly not easy, especially
> if you don't have the hardware available to test on yourself.
>
> You are probably right about the racy behaviour. One important point though,
> is that you would need to miss either 8 (or was it 16?) events in a row
> during run time for the buffer to fill and for SCI_EVT to stop being given
> entirely.
>
> Under normal usage, I think there's almost always going to be an event
> regularly enough that this "full" condition never occurs. It was only found
> to be a problem during suspend, where the EC controller seemed to continue
> logging, and after enough battery level or AC adapter changes it would stop
> triggering events even after resume.
>
> My HDD was too full recently to build kernels, but I have just freed some
> space (a bit late, sorry) so if you want me to test anything please let me
> know.

Hi,

Thanks a lot!
We failed to find this platform selling on China market.

I was thinking this issue doesn't exist during runtime. That's why I said in comment 236 that it might be a BIOS _WAK issue or something in acpi_hw_legacy_wake() has cleared SCI_EVT and we may need further investigation. But finally in comment 237, I found it occurred not only after resuming, but also runtime, so further investigation doesn't matter any more because this is definitely an EC firmware behavior according to the facts.

In our driver, we will queue 2nd QR_EC after sending the 1st QR_EC and before fetching the returned event. At that time, SCI_EVT should still be 1, so we can always send 2 QR_EC for 1 SCI_EVT=1 indication. This somewhat can reduce the reproduction ratio of the Samsung EC issue and that's why during runtime it can hardly be reproduced as we can drain events faster than they are accumulated in the firmware.

But the above code can only make a happen to work environment. If the host acts a bit slower than the target, this issue might still occurr during runtime when there are more than 3 events queued up and the driver will be unable to queue 3rd QR_EC if SCI_EVT is cleared after fetching the 1st event.

It seems the best way for Samsung is always sending QR_EC whatever SCI_EVT is until 0x00 returned. This driver behavior is reasonable from ACPI spec's point of view. The CLEAR_ON_RESUME quirk provided by you did this after resuming, we may need to extend this behavior to runtime.

But we do have many platforms act differently:
1. Some platforms never return 0x00 even when SCI_EVT=0, they return certain event value. Some of them even return the event value for which there is no _Qxx prepared in the namespace. So if we continously send QR_EC until 0x00 is returned, the process will never end on such platforms. The users of such platforms will sure blame us.
2. Some platforms (Acer) do not return anything if SCI_EVT=0, the EC query transaction will be blocked, and our driver cannot issue further EC transaction unless previous one is completed. So if we allow QR_EC to be sent without checking SCI_EVT, users of such platforms will complain.

What I'm going to do is:
1. extending the draining behavior - poll...

Read more...

Hi, Kieran

Can you:
1. use current linux-pm/linux-next branch
2. enable EC debugging
3. disable quirk
4. trigger the bug
5
. capture the dmesg right after resume
6. upload the dmesg here
for investigation?

Thanks in advance

Created attachment 157601
dmesg on linux-next without quirk showing EC working before but not after triggering bug

(In reply to Lv Zheng from comment #240)
> Hi, Kieran
>
> Can you:
> 1. use current linux-pm/linux-next branch
> 2. enable EC debugging

Is it enough to just uncomment the #define DEBUG?

> 3. disable quirk

Is it enough to prevent EC_FLAGS_CLEAR_ON_RESUME from being set? Is this what you meant?

> 4. trigger the bug
> 5. capture the dmesg right after resume
> 6. upload the dmesg here

See attached. I annotated the dmesg with a few extra > /dev/kmsg lines starting with 'KC'. They are:

[ 107.787915] KC just booted and logged in
[ 137.240937] KC screen close/open works as intended
[ 203.943414] KC about to suspend, but not trigger bug this time
[ 226.584995] KC back from suspend
[ 265.000194] KC screen close/open works as intended
[ 298.608956] KC AC change detected as intended
[ 318.874279] KC about to suspend and trigger bug
[ 348.834217] KC back from suspend
[ 374.777099] KC screen open/close not detected
[ 398.000040] KC AC change not detected
[ 425.733615] KC about to manually clear EC events with script
[ 442.663539] KC EC events cleared
[ 469.766095] KC AC change detected again
[ 496.219180] KC screen open/close detected again

So between 0 and 318 you can see the EC operating as intended including one suspend cycle.

After 318 I trigger the bug (by unplugging AC a bunch of times), and after that you can see that there is nothing at all logged when I either closed the screen or unplugged AC.

At 425 I manually simulated the quirk by sending 0x84 EC command queries and reading the data until the data is 0x00 using Juan Manuel's userspace program.

After that, EC events are detected properly again.

----

My analysis is below:

After the bug is triggered, SCI_EVT=1 is set just ONE time, immediately after resume:

[ 337.319012] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0

It does not seem as though we ever handle this event properly though. Namely, there seems to be no corresponding "EC_SC(W) = 0x84". There is a couple of "EC_DATA(W) = 0x84" but I'm pretty sure these are totally different?

----

Further testing shows that this SCI_EVT=1 happens for the first resume after the bug is triggered, but not for the second or subsequent resumes.

That means that we are still going to need a wakeup quirk because if for some reason we fail to clear the EC state before the next suspend, we will never get another SCI_EVT=1 (even after a power cycle, I believe).

Created attachment 157611
dmesg on linux-next without quirk showing bug persisting over several suspend cycles

Of the three suspend/resume cycles shown in this dmesg, I trigger the EC bug during the first suspend time.

It comes back with SCI_EVT=1 set the first time, but the 2nd and 3rd resumes do not have this. In a moment I will confirm if this persists across power cycles.

It seems the right behaviour for affected Samsung machines is to send QR_EC until data is 0x00 not just if we get SCI_EVT=1, but additionally on boot or resume.

Created attachment 157631
dmesg on linux-next without quirk showing bug persisting over power cycle

Confirming that once the bug is initially triggered, we don't get SCI_EVT=1 even on a power cycle, so we still need the boot/resume quirk. At least, that's my interpretation.

I hope the dmesgs were helpful. Unfortunately I'm not going to have internet access for the next week (going to the outback where there is not even any mobile reception), but I'm happy to test more things when I return.

Download full text (3.5 KiB)

(In reply to Kieran Clancy from comment #241)
> Created attachment 157601 [details]
> dmesg on linux-next without quirk showing EC working before but not after
> triggering bug
>
> (In reply to Lv Zheng from comment #240)
> > Hi, Kieran
> >
> > Can you:
> > 1. use current linux-pm/linux-next branch
> > 2. enable EC debugging
>
> Is it enough to just uncomment the #define DEBUG?

Yes.

> > 3. disable quirk
>
> Is it enough to prevent EC_FLAGS_CLEAR_ON_RESUME from being set? Is this
> what you meant?

Yes.

> > 4. trigger the bug
> > 5. capture the dmesg right after resume
> > 6. upload the dmesg here
>
> See attached. I annotated the dmesg with a few extra > /dev/kmsg lines
> starting with 'KC'. They are:
>
> [ 107.787915] KC just booted and logged in
> [ 137.240937] KC screen close/open works as intended
> [ 203.943414] KC about to suspend, but not trigger bug this time
> [ 226.584995] KC back from suspend
> [ 265.000194] KC screen close/open works as intended
> [ 298.608956] KC AC change detected as intended
> [ 318.874279] KC about to suspend and trigger bug
> [ 348.834217] KC back from suspend
> [ 374.777099] KC screen open/close not detected
> [ 398.000040] KC AC change not detected
> [ 425.733615] KC about to manually clear EC events with script
> [ 442.663539] KC EC events cleared
> [ 469.766095] KC AC change detected again
> [ 496.219180] KC screen open/close detected again

Great information!
Thanks.

> So between 0 and 318 you can see the EC operating as intended including one
> suspend cycle.
>
> After 318 I trigger the bug (by unplugging AC a bunch of times), and after
> that you can see that there is nothing at all logged when I either closed
> the screen or unplugged AC.
>
> At 425 I manually simulated the quirk by sending 0x84 EC command queries and
> reading the data until the data is 0x00 using Juan Manuel's userspace
> program.
>
> After that, EC events are detected properly again.

Great test cases!
Thanks.

> My analysis is below:
>
> After the bug is triggered, SCI_EVT=1 is set just ONE time, immediately
> after resume:
>
> [ 337.319012] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
>
> It does not seem as though we ever handle this event properly though.
> Namely, there seems to be no corresponding "EC_SC(W) = 0x84". There is a
> couple of "EC_DATA(W) = 0x84" but I'm pretty sure these are totally
> different?

This is a bug, in ec_poll(), there is no code to check SCI_EVT.
And event will thus be lost.

I think this has been fixed in the ec-flush6.patch.
+out:
+ if (status & ACPI_EC_FLAG_SCI &&
+ (!t || t->flags & ACPI_EC_COMMAND_COMPLETE))
+ __acpi_ec_set_event(ec);

We will have this flag checked in the advance_transaction.
So you won't see this with ec-flush[1-6].patch applied.

> Further testing shows that this SCI_EVT=1 happens for the first resume after
> the bug is triggered, but not for the second or subsequent resumes.
>
> That means that we are still going to need a wakeup quirk because if for
> some reason we fail to clear the EC state before the next suspend, we will
> never get another SCI_EVT=1 (even after a power cycle, I believe).

Yes, I think this is requi...

Read more...

(In reply to Kieran Clancy from comment #242)
> Created attachment 157611 [details]
> dmesg on linux-next without quirk showing bug persisting over several
> suspend cycles
>
> Of the three suspend/resume cycles shown in this dmesg, I trigger the EC bug
> during the first suspend time.
>
> It comes back with SCI_EVT=1 set the first time, but the 2nd and 3rd resumes
> do not have this. In a moment I will confirm if this persists across power
> cycles.
>
> It seems the right behaviour for affected Samsung machines is to send QR_EC
> until data is 0x00 not just if we get SCI_EVT=1, but additionally on boot or
> resume.

Yes.
I was planning to support this in this way.

Thanks

(In reply to Kieran Clancy from comment #243)
> Created attachment 157631 [details]
> dmesg on linux-next without quirk showing bug persisting over power cycle
>
> Confirming that once the bug is initially triggered, we don't get SCI_EVT=1
> even on a power cycle, so we still need the boot/resume quirk. At least,
> that's my interpretation.
>
> I hope the dmesgs were helpful. Unfortunately I'm not going to have internet
> access for the next week (going to the outback where there is not even any
> mobile reception), but I'm happy to test more things when I return.

Yes, it's very helpful.
Thanks for the help!

Best regards
-Lv

(In reply to Lv Zheng from comment #244)
> (In reply to Kieran Clancy from comment #241)
> > My analysis is below:
> >
> > After the bug is triggered, SCI_EVT=1 is set just ONE time, immediately
> > after resume:
> >
> > [ 337.319012] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0
> OBF=0
> >
> > It does not seem as though we ever handle this event properly though.
> > Namely, there seems to be no corresponding "EC_SC(W) = 0x84". There is a
> > couple of "EC_DATA(W) = 0x84" but I'm pretty sure these are totally
> > different?
>
> This is a bug, in ec_poll(), there is no code to check SCI_EVT.
> And event will thus be lost.
>
> I think this has been fixed in the ec-flush6.patch.
> +out:
> + if (status & ACPI_EC_FLAG_SCI &&
> + (!t || t->flags & ACPI_EC_COMMAND_COMPLETE))
> + __acpi_ec_set_event(ec);
>
> We will have this flag checked in the advance_transaction.
> So you won't see this with ec-flush[1-6].patch applied.

I should take this back. :-)
This still need to be improved to indicate SCI_EVT even when there is a transaction running.

Thanks
-Lv

commit 74443bbed72ab22ee005ecb6ecdc657a8018e1db
Author: Lv Zheng <email address hidden>
Date: Wed Jan 14 19:28:47 2015 +0800

    ACPI / EC: Fix issues related to the SCI_EVT handling

shipped in Linux-4.0-rc1
closed.

Սահակ (petrosyan) on 2015-08-29
Changed in linux (Ubuntu):
status: Incomplete → Invalid

commit 4c237371f290d1ed3b2071dd43554362137b1cce
Author: Lv Zheng <email address hidden>
Date: Wed Jan 4 11:17:17 2017 +0800

    ACPI / EC: Remove old CLEAR_ON_RESUME quirk

The above reintroduced this bug on my NP900x4c. Reverting the commit makes things work again.

(In reply to Elvis Pranskevichus from comment #249)
> commit 4c237371f290d1ed3b2071dd43554362137b1cce
> Author: Lv Zheng <email address hidden>
> Date: Wed Jan 4 11:17:17 2017 +0800
>
> ACPI / EC: Remove old CLEAR_ON_RESUME quirk
>
> The above reintroduced this bug on my NP900x4c. Reverting the commit makes
> things work again.

Confirmed, the bug is back again on samsung latops :-(

https://bbs.archlinux.org/viewtopic.php?pid=1767061#p1767061

Not exactly surprising when the change that fixed then gets reverted out of the kernel. Seems like Intel trying to force obsolescence of these systems.

(In reply to Mark Syms from comment #251)
> Not exactly surprising when the change that fixed then gets reverted out of
> the kernel. Seems like Intel trying to force obsolescence of these systems.

what has to be done to bring the fix back into the kernel code?
I mean at the first glance it was only for `Samsung hardware`, so I guess it does not affect any other vendors/hardwares.
The mentioned commit looks like just a `code cleanup` changes. but I'm not familiar with the kernel code.

Shall I file a new issue or could this be opened again?

Lv is not working on this now, and I'm new to the EC code, first let's confirm a quick revert patch.

Created attachment 274121
revert patch

please confirm
1. the problem exists with the latest upstream kernel, say, 4.16-rc1
2. the problem is gone after applying the patch attached

I can confirm that the regression exists with 4.15.3 and that your patch fixes it. 4.16 not tested.

I am experiencing the same problem (Arch Linux, kernel 4.15.3, KDE). Hardware: Samsung laptop model 900X3L. Will the patch be included in the Linux kernel?

Created attachment 274447
revert-patch works

confirmed, pacth is working on top of `4.15`

I am still facing this problem with Arch Linux + kernel 4.16.0-2. Hardware is a Samsung laptop (model: NP900X3L-KW1BR). Suggestions are welcome.

I compiled kernels 4.16.0 and 4.16.2 (in Arch Linux) using the patch in Comment #254 and I confirm that the patch fixes the problem. (Hardware: Samsung model NP900X3L-KW1BR.)

Hi @Rui,
may I know if the patch #Comment 254 will be pushed upstream ?

Changed in linux:
importance: Unknown → High
status: Unknown → Confirmed
no longer affects: linux (Ubuntu)
affects: linux → linux (Ubuntu)
Changed in linux (Ubuntu):
importance: High → Undecided
status: Confirmed → New
status: New → Invalid
Displaying first 40 and last 40 comments. View all 368 comments or add a comment.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.